Genetic epidemiological studies have suggested that several genetic variants increase the risk of hypertension. It is likely that a number of genes, rather than a single gene, account for the heritability of this complex disorder. However, genetic analyses of hypertension have produced complex, inconsistent and nonreproducible results, which makes it difficult to draw conclusions about the association between specific genes and hypertension. Material and methods: In this study, we aimed to analyze SNPs that had been investigated in hypertension. These SNPs were collected from the Text-mined Hypertension, Obesity and Diabetes (T-HOD) database on 31 May 2016. The SNPs reported in association with hypertension were compiled in an Excel sheet and analyzed using several bioinformatics tools and programs. Results: SNPs were evaluated for their deleterious effect on protein function and stability. In the present study, 7 SNPs were predicted to be deleterious (A288S, M731T, R172C, R50Q, G460W, K197N, G75V). The Mutation3D server showed that 3 of the mutations (in the genes STEA4, PLD2 and AZIN2, corresponding to rs28933400, rs2286672 and rs16835244, respectively) were associated with an increased risk of hypertension.
Hypertension (blood pressure exceeding 140/90 mmHg according to WHO criteria) is a common complex disorder that affects 15% - 20% of the adult population in Western societies [
Genetic epidemiological studies have suggested that several genetic variants increase the risk for hypertension [
The Text-mined Hypertension, Obesity and Diabetes candidate gene database (T-HOD) employs state-of-the-art text-mining technologies, including a gene identification (GI) system [
Single nucleotide polymorphisms (SNPs) are the most common type of genetic variation in humans. Around 93% of human genes contain SNPs [
SNPs normally have two different alleles, although triallelic SNPs, in which three different base variations coexist within a population, also occur [
The greatest importance of SNPs in biomedical research lies in comparing regions of the genome between cohorts (such as matched cohorts with and without a disease) in genome-wide association studies. SNPs have been used in genome-wide association studies as high-resolution markers in gene mapping related to diseases or normal traits. SNPs without an observable impact on the phenotype (so-called silent mutations) are still useful as genetic markers in genome-wide association studies, because of their quantity and their stable inheritance over generations [
A single SNP may cause a Mendelian disease. For complex diseases, however, SNPs do not usually function individually; rather, they work in coordination with other SNPs to manifest a disease condition, as has been seen in osteoporosis [
SNPs in non-coding regions can manifest in a higher risk of cancer [
Synonymous substitutions by definition do not result in a change of amino acid in the protein, but can still affect its function in other ways. An example is a seemingly silent mutation in the multidrug resistance gene 1 (MDR1), which codes for a cellular membrane pump that expels drugs from the cell; the mutation can slow down translation and allow the peptide chain to fold into an unusual conformation, causing the mutant pump to be less functional [
Missense: a single base change results in a change of amino acid in the protein and its malfunction, which leads to disease (e.g. the c.1580G > T SNP in the LMNA gene: at position 1580 (nt) of the DNA sequence (CGT codon), guanine is replaced with thymine, yielding a CTT codon, which at the protein level results in the replacement of arginine by leucine at position 527 [
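The LMNA example can be checked with a short calculation. The sketch below is only illustrative: it builds the standard genetic code from the conventional TCAG ordering, and the helper name `missense_change` is hypothetical, not part of any tool used in this study. It assumes the nucleotide position is 1-based and counted within the coding sequence.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def missense_change(ref_codon, alt_codon, nt_position):
    """Return a protein-level label such as 'R527L' (hypothetical helper).

    nt_position is the 1-based position of the changed base in the
    coding sequence (an assumption of this sketch).
    """
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    residue = (nt_position - 1) // 3 + 1  # codon that contains the changed base
    return f"{ref_aa}{residue}{alt_aa}"


# c.1580G>T in LMNA: CGT (arginine) becomes CTT (leucine) in codon 527.
lmna = missense_change("CGT", "CTT", 1580)
print(lmna)
```

The same lookup can be applied to any codon pair written in the `Gca/Tca` style, after converting it to upper case.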
Nonsense: a point mutation in a DNA sequence that results in a premature stop codon, or nonsense codon, in the transcribed mRNA, and in a truncated, incomplete and usually nonfunctional protein product (e.g. cystic fibrosis caused by the G542X mutation in the cystic fibrosis transmembrane conductance regulator gene) [
In this study, we aimed to analyze SNPs that had been investigated in hypertension. These SNPs were collected from the Text-mined Hypertension, Obesity and Diabetes (T-HOD) database on 31 May 2016. The SNPs reported in association with hypertension were compiled in an Excel sheet and analyzed using several bioinformatics tools and programs.
The functional effects of nsSNPs were predicted using several bioinformatics tools and programs. These included SIFT (http://sift.jcvi.org/), PROVEAN (http://provean.jcvi.org/index.php), PhD-SNP (http://snps.biofold.org/phd-snp/phd-snp.html), SNPs & GO (http://snps-and-go.biocomp.unibo.it/snps-and-go/), and MutPred (http://mutpred.mutdb.org/). PolyPhen was additionally used to confirm the PROVEAN results.
SIFT is a sequence homology-based tool that sorts intolerant from tolerant amino acid substitutions and predicts whether an amino acid substitution in a protein will have a phenotypic effect. SIFT is based on the premise that protein evolution is correlated with protein function. Positions important for function should be conserved in an alignment of the protein family, whereas unimportant positions should appear diverse in an alignment [
The effect of an amino acid substitution on protein function was predicted from the degree of conservation of the amino acid in the protein sequence. A substitution with a SIFT score of ≤0.05 is predicted by the algorithm to be damaging, and one with a score of >0.05 is considered tolerated [
PROVEAN is a software tool that predicts whether an amino acid substitution has an impact on the biological function of a protein. The assessment is based on the PROVEAN score: a score of ≤−2.5 indicates that the protein variant is predicted to have a “deleterious” effect, while a score of >−2.5 indicates that the variant is predicted to have a “neutral” effect [
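The two cutoffs described above amount to simple threshold rules. The following is a minimal sketch, not the servers' own code: the function names are hypothetical, the example scores are illustrative, and treating a score exactly at the cutoff as damaging or deleterious is an assumption of this sketch.

```python
SIFT_CUTOFF = 0.05      # scores at or below this are called damaging (assumed boundary)
PROVEAN_CUTOFF = -2.5   # scores at or below this are called deleterious (assumed boundary)


def sift_call(score):
    """Label a SIFT score using the cutoff quoted in the text."""
    return "damaging" if score <= SIFT_CUTOFF else "tolerated"


def provean_call(score):
    """Label a PROVEAN score using the cutoff quoted in the text."""
    return "deleterious" if score <= PROVEAN_CUTOFF else "neutral"


# Illustrative (SIFT score, PROVEAN score) pairs; the variant names are placeholders.
examples = {
    "variant_a": (0.00, -4.465),
    "variant_b": (0.03, -2.31),
    "variant_c": (0.40, -0.50),
}

calls = {name: (sift_call(s), provean_call(p)) for name, (s, p) in examples.items()}
for name, (sift_label, provean_label) in calls.items():
    print(name, sift_label, provean_label)
```

A variant such as `variant_b` shows why the two tools can disagree: it falls below the SIFT cutoff but above the PROVEAN cutoff.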
The PANTHER (Protein ANalysis THrough Evolutionary Relationships) Classification System was designed to classify proteins (and their genes) in order to facilitate high throughput analysis. In this study, the amino acid sequences were analyzed using the PANTHER program to classify proteins.
PolyPhen-2 (Polymorphism Phenotyping v2) is a tool that predicts the possible impact of an amino acid substitution on the structure and function of a human protein using straightforward physical and comparative considerations. In this study, the PolyPhen-2 program was used to classify substitutions as damaging or benign.
It is a support vector machine (SVM)-based method that predicts from the protein sequence whether a mutation is related to disease. The protein sequence was prepared in FASTA format and submitted for analysis. The output classifies each variation as neutral or disease-related; a reliability index (RI) value > 5 indicates a disease-related effect of the mutation on protein function [
The MutPred server (http://mutpred.mutdb.org/) was used to classify amino acid substitutions (AAS) as disease-associated or neutral; it also predicts disease-associated or deleterious amino acids. The output of MutPred contains a general score (G), which is the probability that the amino acid substitution is deleterious or disease-associated, and the top 5 property scores (p).
I-Mutant 2.0 is a support vector machine (SVM)-based tool for the automatic prediction of protein stability changes caused by single point mutations. A positive ΔΔG value indicates that the mutated protein is of higher stability [
MUpro is a support vector machine-based tool for the prediction of protein stability changes upon non-synonymous SNPs. A score < 0 means that the variant decreases protein stability, while a score > 0 means that the variant increases protein stability.
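Both stability predictors are read by the sign of their output. A minimal sketch of that reading is given below, using the ΔΔG values reported for I-Mutant in this study; the function name `stability_effect` is hypothetical, and the handling of a value of exactly zero is an assumption, since the text does not cover that case.

```python
def stability_effect(ddg):
    """Interpret the sign of a predicted stability change (hypothetical helper)."""
    if ddg > 0:
        return "increase"
    if ddg < 0:
        return "decrease"
    return "no change"  # assumption: the text does not define the zero case


# DDG values as reported by I-Mutant for the seven variants in this study.
ddg_values = {
    "A288S": -1.24,
    "M731T": -0.43,
    "R172C": -1.18,
    "R50Q": -0.60,
    "G460W": 0.35,
    "K197N": -2.27,
    "G75V": -1.54,
}

effects = {mutation: stability_effect(ddg) for mutation, ddg in ddg_values.items()}
destabilising = [m for m, effect in effects.items() if effect == "decrease"]
print(effects)
print(len(destabilising), "variants predicted to decrease stability")
```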
ELASPIC is a novel ensemble machine learning approach that predicts the effects of mutations on protein folding and protein-protein interactions. The web server can be used to evaluate the effect of mutations on any protein in the Uniprot database, and allows all predicted results, including modeled wild-type and mutated structures, to be managed and viewed online and downloaded if needed.
The location of nsSNPs in the protein structure was detected using Mutation3D. Mutation3D (http://mutation3d.org) is a functional prediction tool for studying the spatial arrangement of amino acid substitutions on protein models and structures. This tool was used to analyse the protein structures for selected SNPs from the hypertension data in the T-HOD database.
UCSF Chimera is a highly extensible program for interactive visualization and analysis of molecular structures and related data. Chimera (version 1.8) was used to scan the three-dimensional (3D) structure of specific proteins [
ModWeb, a server for protein structure modeling, was used to analyse and remodel the protein sequences of the Q687X5, P35611 and Q9UBU3-2 proteins.
Comparative Modelling uses a combination of multiple templates and iterative optimization of alternative alignments.
Project HOPE is an easy-to-use web-server that analyses the structural effects of a mutation of interest. The server was used to analyse protein sequences in this study. Project HOPE collects and combines available information from a series of web-servers and databases and produces a mutation report complete with results, figures and animations. Where available, Project HOPE will use the 3D structure of the protein, but the server can also build a homology model if necessary. Other information sources include the Uniprot database and a series of DAS prediction servers [
The T-HOD database server was used to retrieve hypertension SNPs. A total of 282 rsSNPs were analyzed using the Variant Effect Predictor. The results showed intron variants 30%, downstream gene variants 23%, non-coding transcript variants 15%, upstream gene variants 11%, missense variants 4%, regulatory region variants 3%, 3 prime UTR variants 3%, 2%. The coding consequences comprised missense variants 69%, synonymous variants 28%, stop-lost 2%, and coding sequence variants 1%. Only missense non-synonymous coding SNPs were chosen for further analysis. The nsSNPs and mutation positions were displayed in
All SNPs were submitted to the SIFT program to predict their effects on the protein. Out of the 282 rsSNPs screened, 27 rsSNPs were tolerated and 7 rsSNPs were damaging, of which 4 were deleterious and 3 were neutral; SIFT could not find 248 SNPs (
The 7 damaging rsSNPs mentioned above were submitted to the PROVEAN server; 4 of them were deleterious while 3 were neutral. A significant correlation was found between the SIFT and PROVEAN results: SIFT showed that 7 of the SNPs were damaging, while PROVEAN found that 4 of the 7 SNPs were deleterious. The SIFT and PROVEAN predictions may suggest disruption of the protein and its function. The Panther server was also used; out of the 7 SNPs, 3 were probably damaging and the rest were benign (

rsSNP | Protein ID | Amino acid change | Polymorphism |
---|---|---|---|
rs16835244 | NP_443724 | A288S | Gca/Tca |
rs28933400 | NP_000693 | M731T | aTg/aCg |
rs2286672 | NP_002654 | R172C | Cgt/Tgt |
rs34911341 | NP_001128413 | R50Q | cGa/cAa |
rs4961 | NP_001110 | G460W | Ggg/Tgg |
rs5370 | NP_001161791 | K197N | aaG/aaT |
rs1981529 | NP_078912 | G75V | gGc/gTc |

rsSNPs | SIFT | SIFT score | PROVEAN | PROVEAN score | Polyphen | Panther |
---|---|---|---|---|---|---|
rs16835244 | damaging | 0 | deleterious | −2.792 | benign | probably damaging |
rs28933400 | damaging | 0 | deleterious | −4.465 | possibly damaging | probably damaging |
rs2286672 | damaging | 0 | deleterious | −2.511 | benign | probably benign |
rs34911341 | damaging | 0 | deleterious | −2.593 | probably damaging | probably damaging |
rs4961 | damaging | 0.03 | neutral | −2.31 | probably damaging | probably benign |
rs5370 | damaging | 0.01 | neutral | −0.928 | possibly damaging | probably benign |
rs1981529 | damaging | 0.03 | neutral | −1.937 | benign | |
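The agreement between predictors can be tallied directly from the SIFT, PROVEAN and PolyPhen calls tabulated for the seven variants. The sketch below is one simple way to do this, assuming that "possibly damaging" counts as a harmful call; that grouping, and the names `HARMFUL` and `harmful_votes`, are choices made for illustration only.

```python
# Calls for the seven variants, as tabulated in this study.
predictions = {
    "rs16835244": {"SIFT": "damaging", "PROVEAN": "deleterious", "PolyPhen": "benign"},
    "rs28933400": {"SIFT": "damaging", "PROVEAN": "deleterious", "PolyPhen": "possibly damaging"},
    "rs2286672": {"SIFT": "damaging", "PROVEAN": "deleterious", "PolyPhen": "benign"},
    "rs34911341": {"SIFT": "damaging", "PROVEAN": "deleterious", "PolyPhen": "probably damaging"},
    "rs4961": {"SIFT": "damaging", "PROVEAN": "neutral", "PolyPhen": "probably damaging"},
    "rs5370": {"SIFT": "damaging", "PROVEAN": "neutral", "PolyPhen": "possibly damaging"},
    "rs1981529": {"SIFT": "damaging", "PROVEAN": "neutral", "PolyPhen": "benign"},
}

# Assumption: every label below is treated as a harmful call.
HARMFUL = {"damaging", "deleterious", "possibly damaging", "probably damaging"}


def harmful_votes(calls):
    """Count how many tools flag a variant as harmful (hypothetical helper)."""
    return sum(label in HARMFUL for label in calls.values())


votes = {rsid: harmful_votes(calls) for rsid, calls in predictions.items()}

# Variants flagged by all three tools.
unanimous = sorted(rsid for rsid, n in votes.items() if n == len(predictions[rsid]))

# Number of variants on which SIFT and PROVEAN give the same verdict.
sift_provean_agree = sum(
    (c["SIFT"] in HARMFUL) == (c["PROVEAN"] in HARMFUL) for c in predictions.values()
)

print(votes)
print(unanimous)
print(sift_provean_agree, "of", len(predictions), "variants agree between SIFT and PROVEAN")
```

A vote count of this kind is only a summary of the individual calls; it does not weight the tools or account for their reliability scores.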
The results showed that, out of the 7 SNPs, 3 were predicted by PHD-SNP to be disease-causing while the rest were neutral; by SNPs & GO, 2 of the SNPs were predicted to be disease-causing (
Changes in protein stability were examined with the I-Mutant 2 and MUpro programs. The I-Mutant 2 results predicted a decrease in the free energy of the proteins for all variants (A288S, M731T, R172C, R50Q, K197N, G75V) except G460W, for which an increase in the free energy was predicted. The MUpro results predicted increased protein stability for all of the variants (
MutPred analysis was performed to determine the degree of tolerance for each amino acid substitution on the basis of physicochemical properties.
SNPs were classified according to their structural location as core or interface. In the present study, 3 variants had a core structural location while the rest were not classified; detailed ELASPIC results were displayed in
The Mutation3D results indicated that 3 of the mutations (in the genes STEA4, PLD2 and AZIN2, corresponding to rs28933400, rs2286672 and rs16835244, respectively) were associated with a high risk of hypertension; they were located in the protein domain. Detailed results were displayed in Figures 1-6.
The 3D protein structure is very important for verifying deleterious mutations and their possible effects on the structure and function of the protein. In this study, 4 proteins were modelled with UCSF Chimera 1.8, and hydrogen-bonding interactions and clashes were calculated with the same program. Modeller server [
rsSNPs | PHD-SNP | PHD-SNP RI | SNPs & GO | SNPs & GO RI |
---|---|---|---|---|
rs16835244 | disease | 5 | disease | 0 |
rs28933400 | disease | 3 | disease | 10 |
rs2286672 | disease | 3 | neutral | 1 |
rs34911341 | neutral | 1 | unclassified | |
rs4961 | disease | 0 | neutral | 4 |
rs5370 | neutral | 3 | unclassified | |
rs1981529 | disease | 2 | neutral | 3 |

rsSNPs | I-Mutant SVM2 | DDG value | MUpro | Confidence score |
---|---|---|---|---|
rs16835244 | decrease | −1.24 | increase stability | −0.805949883 |
rs28933400 | decrease | −0.43 | increase stability | −0.793908856 |
rs2286672 | decrease | −1.18 | increase stability | 0.613637436 |
rs34911341 | decrease | −0.6 | increase stability | 0.707156682 |
rs4961 | increase | 0.35 | increase stability | −0.791147359 |
rs5370 | decrease | −2.27 | increase stability | −0.928539952 |
rs1981529 | decrease | −1.54 | increase stability | 0.955986275 |

Mutation | Prob. deleterious | Loss of sheet | Gain of helix | Loss of loop | Glycos. S283 | Gain of MoRF binding |
---|---|---|---|---|---|---|
A288S | 0.253 | (P = 0.0228) | (P = 0.0893) | (P = 0.2897) | (P = 0.4302) | (P = 0.4656) |
M731T | 0.937 | (P = 0.1358) | (P = 0.1466) | (P = 0.1872) | (P = 0.2205) | (P = 0.2897) |
R50Q | 0.581 | (P = 0.0115) | (P = 0.0921) | (P = 0.132) | (P = 0.1688) | (P = 0.1894) |
K197N | 0.075 | (P = 0.02) | (P = 0.0668) | (P = 0.0997) | (P = 0.1299) | (P = 0.1579) |
G460W | 0.078 | (P = 0.0549) | (P = 0.1312) | (P = 0.1551) | (P = 0.1736) | (P = 0.1907) |
M731T | 0.937 | (P = 0.1358) | (P = 0.1466) | (P = 0.1872) | (P = 0.2205) | (P = 0.2897) |
G75V | 0.388 | (P = 0.0359) | (P = 0.0477) | (P = 0.0556) | (P = 0.0609) | (P = 0.1131) |

rsSNPs | ELASPIC | ΔGwt | ΔGmut | ΔΔG |
---|---|---|---|---|
rs16835244 | core | 195.996 | 196.748 | −0.96341 |
rs28933400 | core | 327.034 | 331.157 | 0.575316 |
rs2286672 | core | 276.518 | 276.932 | 0.34204 |
rs34911341 | unclassified | unclassified | unclassified | unclassified |
rs4961 | unclassified | unclassified | unclassified | unclassified |
rs5370 | unclassified | unclassified | unclassified | unclassified |
rs1981529 | core | 14.7475 | 13.7847 | −0.87053 |

In the present study we aimed to investigate SNPs that have been reported in association with hypertension; as mentioned before, these SNPs were retrieved from the T-HOD database web site. The methods used to assess nsSNPs (mutations) in this study were based on several bioinformatics tools that describe pathogenicity and provide some clues at the molecular level about the effect of mutations. It is very difficult to predict the pathogenic effect of SNPs with a single method or bioinformatics tool, so in the present study we used 12 different in silico prediction algorithms (SIFT, PROVEAN, PHD-SNP, Panther, MUpro, MutPred, I-Mutant 2, PolyPhen, SNPs & GO, Project HOPE, Chimera and Modeller) to sort tolerated from disease-related SNPs.
The findings of this study showed that 7 SNPs were predicted to be damaging by SIFT (A288S, M731T, R172C, R50Q, G460W, K197N, G75V), and 4 of the seven (A288S, M731T, R172C, R50Q) were predicted to be deleterious by PROVEAN, while 4 SNPs were found to be disease-causing by PHD-SNP (A288S, R172C) and 2 SNPs (A288S, M731T, R172C, G75V and G460W) by SNPs & GO. PolyPhen results showed that 4 SNPs (M731T, R50Q, G460W, K197N) were probably or possibly damaging; moreover, Panther results indicated that A288S, M731T and G460W were probably damaging. I-Mutant Suite results showed that 6 mutations decreased protein stability (A288S, M731T, R172C, R50Q, K197N, G75V), while G460W showed increased protein stability. Comparing the output of the 6 in silico bioinformatics tools mentioned above, the A288S, M731T, R172C, G75V, G460W, R50Q and K197N mutations were found to be functionally significant. MutPred was used to determine the degree of tolerance of each amino acid substitution on the basis of physicochemical properties; the results showed that A288S, R50Q, K197N and G460W were harmful, with loss of sheet P = 0.0228, 0.0115, 0.02 and 0.0549, respectively. Furthermore, these 7 SNPs were analysed structurally and functionally using 5 bioinformatics tools: Chimera, Mutation3D, PDB, Modeller and ELASPIC. In the present study, “core” residues were found to be predominant among the mutations; these residues are defined as residues that are exposed in the monomeric protein but buried in the protein complex. Core residues are typically hydrophobic, with a composition strongly divergent from that of the remainder of the protein surface [
rs28933400
The mutant residue is smaller than the wild-type residue. The wild-type residue is more hydrophobic than the mutant residue. The mutated residue is located in a domain that is important for binding of other molecules. The mutated residue is in contact with residues in another domain. It is possible that the mutation disturbs these contacts. The 3D structure of the protein showed that this mutation is located in the core of the protein, which may increase the risk of hypertension. Moreover, the wild-type and mutant residues differ in H-bonding, and these differences may affect protein stability.
rs2286672
The mutant residue is smaller than the wild-type residue. The wild-type residue was positively charged, the mutant residue is neutral. The mutant residue is more hydrophobic than the wild-type residue. The mutation is located within a domain. The mutation introduces an amino acid with different properties, which can disturb this domain and abolish its function. There is a difference in charge between the wild-type and mutant amino acid.
The charge of the wild-type residue will be lost, and this can cause loss of interactions with other molecules or residues. The wild-type and mutant amino acids differ in size. The mutant residue is smaller, and this might lead to loss of interactions. The hydrophobicity of the wild-type and mutant residue differs. The mutation introduces a more hydrophobic residue at this position. This can result in loss of hydrogen bonds and/or disturb correct folding. The 3D structure of the protein showed that this mutation is located in the core of the protein, which may increase the risk of hypertension. Moreover, the wild-type and mutant residues differ in H-bonding, and these differences may affect protein stability.
rs16835244
The wild-type and mutant amino acids differ in size. The mutant residue is bigger than the wild-type residue. The wild-type residue was buried in the core of the protein. The mutant residue is bigger and probably will not fit. The hydrophobicity of the wild-type and mutant residue differs. The mutation will cause loss of hydrophobic interactions in the core of the protein. The 3D structure of the protein showed that this mutation is located in the core of the protein, which may increase the risk of hypertension. Moreover, the wild-type and mutant residues differ in H-bonding, and these differences may affect protein stability.
rs34911341
There is a difference in charge between the wild-type and mutant amino acid. The charge of the wild-type residue will be lost, and this can cause loss of interactions with other molecules or residues. The wild-type and mutant amino acids differ in size. The mutant residue is smaller, and this might lead to loss of interactions.
rs4961
The wild-type and mutant amino acids differ in size. The mutant residue is bigger, this might lead to bumps. The torsion angles for this residue are unusual. Only Glycine is flexible enough to make these torsion angles, mutation into another residue will force the local backbone into an incorrect conformation and will disturb the local structure.
rs5370
The mutation is located within the signal peptide. The sequence of this peptide is important because it is recognized by other proteins and often cleaved off to generate the mature protein.
The new residue that is introduced in the signal peptide differs in its properties from the original one. It is possible that this mutation disturbs recognition of the signal peptide.
There is a difference in charge between the wild-type and mutant amino acid. The charge of the wild-type residue will be lost, and this can cause loss of interactions with other molecules or residues. The wild-type and mutant amino acids differ in size. The mutant residue is smaller, and this might lead to loss of interactions.
rs1981529
The wild-type and mutant amino acids differ in size: the mutant residue is bigger than the wild-type residue. Moreover, the mutation is located on the surface of the protein; mutation of this residue can disturb interactions with other molecules or other parts of the protein. The torsion angles for this residue are unusual. Only glycine is flexible enough to make these torsion angles; mutation into another residue will force the local backbone into an incorrect conformation and will disturb the local structure.
The available hypertension rsSNPs were retrieved from the T-HOD database and analyzed using several bioinformatics tools, and the predicted deleterious SNPs were evaluated for their effect on protein function and stability. In the present study, 7 SNPs were predicted to be deleterious (A288S, M731T, R172C, R50Q, G460W, K197N, G75V). The Mutation3D server showed that 3 of the mutations (in the genes STEA4, PLD2 and AZIN2, corresponding to rs28933400, rs2286672 and rs16835244, respectively) were associated with an increased risk of hypertension.
Alsadig Gassoum, Nahla E. Abdelraheem, Nehad Elsadig (2016) Comprehensive Analysis of rsSNPs Associated with Hypertension Using In-Silico Bioinformatics Tools. Open Access Library Journal, 03, 1-24. doi: 10.4236/oalib.1102839
SIFT: a sequence homology-based tool that Sorts Intolerant From Tolerant amino acid substitutions and predicts whether an amino acid substitution in a protein will have a phenotypic effect
PROVEAN: Protein Variation Effect Analyzer
ELASPIC: Ensemble Learning Approach for Stability Prediction of Interface and Core mutations
SNP: Single Nucleotide Polymorphism