Engineering, 2013, 5, 181-188 http://dx.doi.org/10.4236/eng.2013.510B039 Published Online October 2013 (http://www.scirp.org/journal/eng) Copyright © 2013 SciRes. ENG

Semi-Global Inference in Phenotype-Protein Network

Siliang Xia1, Guangri Quan2, Yongbo Zhao3, Xuhui Jia4
1Institute of Architecture of Application Systems, University of Stuttgart, Stuttgart, Germany
2School of Computer Science, Harbin Institute of Technology at Weihai, Weihai, China
3Institute of Microelectronics, Chinese Academy of Sciences, Beijing, China
4Department of Computer Science, The University of Hong Kong, Hong Kong, China
Email: xiasiliang.hit@gmail.com

Received May 2013

ABSTRACT
Discovering the genetic basis of diseases is an important goal and a challenging problem in bioinformatics research. Inspired by the network-based global inference approach, we propose a Semi-global inference method to capture the complex associations between phenotypes and genes. The method integrates phenotype similarities and protein-protein interactions and builds profile vectors for phenotypes and proteins. The relevance between each candidate gene and the target phenotype is then evaluated, candidate genes are ranked by this relevance mark, and the genes potentially associated with the target disease are identified from the ranking. The model uses a Phenotype Similarity Threshold (PST) to select the nodes of the integrated phenotype-protein network that take part in inference, which sheds light on how to select similar phenotypes for gene prediction. Different vector relevance metrics for computing the relevance marks of candidate genes are discussed. The model is evaluated on Online Mendelian Inheritance in Man (OMIM) data sets, and the experimental evaluation shows that the proposed Semi-global method outperforms existing global inference methods.
Keywords: Disease Gene Prioritization; Phenotype-Protein Network; Semi-Global Inference; Phenotype Similarity Threshold

1. Introduction
Uncovering the genetic basis of diseases is a central challenge for biomedical research. Traditional approaches use linkage analysis and association studies [1], which first localize disease genes to a chromosome region. However, the resolution of this approach is low, and further analysis of the candidate genes in a large genomic region is expensive, which can prevent gene identification even after a region has been detected.
Many studies have tried to discover disease genes computationally, based on annotations [2-4] or on sequences [5]. Methods that rely on functional annotations are limited because only a small fraction of the genes in the genome is currently annotated, and sequence-based methods are expensive. Moreover, these methods treat disease genes as separate and independent, whereas biological processes are carried out not by single molecules but by complex interactions among proteins, and breakdowns in protein interaction networks can result in diseases [6]. Research also indicates that phenotypically similar diseases are caused by functionally related genes [7], and the proteins encoded by such genes usually interact directly or indirectly [8]. From this perspective, disease genes can be investigated through the interaction networks of disease proteins.
Recently, researchers have used computational methods to build biological networks that help explore relationships among biological entities at multiple levels of granularity. Network biology is an area of active research [9] that also facilitates disease gene discovery, and a wide range of network-based methods has been proposed for disease gene prioritization [10-16].
Kasper Lage et al. proposed a method that uses a Bayesian predictor to rank protein complexes linked to human diseases in order to predict the genes underlying inherited human phenotypes [13]. Xuebing Wu et al. proposed a network-based global inference approach [14]. These methods have achieved some success in disease gene prioritization; they rely primarily on the topological properties of PPI networks and on the expectation that the products of genes associated with similar diseases interact heavily with each other. Motivated by these network-based approaches, we propose a network-based Semi-global inference model for disease gene prioritization. The model uses a Phenotype Similarity Threshold (PST) to select the diseases in the integrated phenotype-protein network from which the profile vectors of the candidate genes and the target disease are built. It evaluates the relevance between each candidate gene and the given target phenotype, ranks the candidates by their relevance marks, and prioritizes the genes potentially associated with the target disease according to this ranking. To evaluate its effectiveness, the model is tested on known phenotype-gene pairs from OMIM.
Our research makes three contributions:
• A Semi-global inference method with a Phenotype Similarity Threshold (PST) is proposed to prioritize candidate disease genes. The experimental results show that the proposed Semi-global method outperforms an existing global inference method.
• The Phenotype Similarity Threshold (PST) is defined to separate high similarities from low similarities, and thus diseases closely related to the target disease from less related ones; it specifies which phenotypes in the network are considered and exploited for inference. Two methods for obtaining the PST (S-PST and D-PST) are introduced and compared.
• The performance of the proposed model with different vector relevance metrics (Pearson correlation coefficient, Euclidean distance and Cosine similarity) is evaluated and compared. We show that Semi-global inference works well with Euclidean distance and Cosine similarity.
In Section 2, we briefly introduce the background of network-based candidate gene prioritization by formally describing the problem and discussing related work and its limitations. Section 3 presents the Semi-global inference model and explains how the PST is used to select nodes in the phenotype-protein network.
Section 4 shows experimental results of the proposed Semi-global inference model under varying relevance metrics and PSTs, and compares its performance comprehensively against an existing global inference method. In Section 5, we draw conclusions and outline further work.

2. Background
2.1. Network-Based Candidate Gene Prioritization
Following [17], the network-based disease gene prioritization problem can be described as follows. Given a target disease d, the input consists of two sets of genes: a known set K and a candidate set C. The known set K contains prior knowledge about d; e.g., it is the set of genes known to be associated with d and with diseases similar to d. Each gene g ∈ K is associated with a similarity score σ(g, d) indicating the known degree of association between g and d. The candidate set C contains candidate genes, one or more of which are potentially associated with d (e.g., these genes might lie in the linkage interval of d identified by association studies). The purpose of network-based prioritization is to use a PPI network G = (V, E) to compute a score φ(g, d) for each gene g ∈ C that represents the likelihood that g is associated with d. The PPI network G = (V, E) consists of a set of gene products V and a set of undirected interactions E between these gene products, in which uv ∈ E represents an interaction between u ∈ V and v ∈ V. In this network, the set of interacting partners of a gene product v ∈ V is defined as N(v) = {u ∈ V : uv ∈ E}. Global prioritization methods use this network information to compute φ by propagating σ over G. Candidate genes with high relevance to the target disease are ranked at the top and regarded as disease genes.

2.2. Related Work
Xuebing Wu et al. proposed a network-based global inference approach, the CIPHER algorithm [14], in which the Pearson correlation coefficient is used to evaluate the relevance between candidate genes and the target disease. Another global inference method uses a network propagation algorithm to formulate constraints on the prioritization function [16]. Although these global network-based methods shed some light on disease gene prioritization, they have drawbacks and limitations.
The work of Xuebing Wu et al. assumes a linear correlation between the profiles of phenotypes and disease genes, which introduces some bias against genes whose proteins have few interactions with other proteins [14]. Moreover, as reported in the literature, network-based global inference methods favor genes whose products are highly connected in the network and perform poorly in identifying loosely connected disease genes, owing to the centrality of target disease genes [17] and the incomplete and noisy nature of PPI data [18].
First, global inference methods exploit all the diseases in the phenotype similarity network to generate a prediction, including less related diseases, and thus fail to take into account that more similar diseases may play a more important role in inference. To our knowledge, no prior work has prioritized disease genes using only part of the diseases in the phenotype network, and node selection strategies have not been explored. Secondly, phenotype similarities vary: a target disease has different similarities to the other diseases in the network, yet no selection criterion treats the diseases in the phenotype network differently, and no method distinguishes between high and low similarities, a distinction that could be used to decide which related diseases to refer to in gene prioritization.
Our research aims to explore these open areas and to overcome the limitations of global inference methods. We propose a Semi-global inference method that uses the PST as the criterion for selecting the phenotypes in the network used for inference; this is the essential difference between the proposed Semi-global model and existing global inference methods.

3. Methodology
In this section we present the mathematical model and the general framework of the Semi-global inference gene prioritization algorithm. Furthermore, we explain how the Phenotype Similarity Threshold is exploited for node selection in the phenotype network, which is the core of the Semi-global inference model.
Note that the purpose here is to infer functional associations between genes from the functional and physical interactions between their products; therefore, any reference to interactions between genes in this paper refers to interactions between their products. Likewise, since disease genes are inferred from phenotypically similar diseases, the terms "disease" and "phenotype" are used interchangeably in this paper.

3.1. Mathematical Model
• The undirected graph
$$G_{Phenotype} = (P, S) \qquad (1)$$
is defined as the phenotype similarity network; $P$ is a subset of all the phenotypes, $P = \{p_1, p_2, \ldots, p_m\}$; and the element $s_{jk} \in S$ is the similarity of phenotypes $p_j$ and $p_k$.
• The undirected graph
$$G_{Protein} = (G, I) \qquad (2)$$
is defined as the protein interaction network; $G$ is a subset of all the proteins, $G = \{g_1, g_2, \ldots, g_n\}$; and the element $I_{kl} \in I$ denotes the interaction of proteins $g_k$ and $g_l$.
• Given a phenotype $p_j$, $p_j \in P$, the set
$$\mathrm{CAssociation}_j = \{\, \langle p_j, g_k \rangle \mid g_k \in G \wedge g_k \text{ is associated with } p_j \,\} \qquad (3)$$
is defined as the association set of $p_j$; each element of $\mathrm{CAssociation}_j$ is an association of $p_j$. The set
$$\mathrm{CAssociation} = \bigcup_{j=1}^{m} \mathrm{CAssociation}_j \qquad (4)$$
is defined as the global association set, which contains all phenotype-protein associations.
• Given the phenotype similarity network $G_{Phenotype}$, the protein interaction network $G_{Protein}$ and the global association set $\mathrm{CAssociation}$, the set
$$N = \langle G_{Phenotype}, G_{Protein}, \mathrm{CAssociation} \rangle \qquad (5)$$
is the phenotype-protein network.
• Given a phenotype $p_x \in P$ and a protein $g_k \in G$,
$$f(g_k, p_x) = \max\{\, I_{ky} \mid \langle p_x, g_y \rangle \in \mathrm{CAssociation} \,\} \qquad (6)$$
denotes one dimension of the profile vector of protein $g_k$, i.e., the strongest interaction between $g_k$ and any protein associated with $p_x$.
• The Phenotype Similarity Threshold (PST) is a manually set similarity value that satisfies
$$\min\{\, s_{jk} \mid p_k \in P \,\} \le \mathrm{PST} \le \max\{\, s_{jk} \mid p_k \in P \,\} \qquad (7)$$
• Given a phenotype $p_j$, the set
$$\mathrm{CRP}_j = \{\, p_k \mid p_k \in P \wedge s_{jk} \ge \mathrm{PST} \,\} \qquad (8)$$
contains the phenotypes whose similarities to $p_j$ are higher than or equal to PST. Each element of $\mathrm{CRP}_j$ is called a Closely Related Phenotype of $p_j$.
• Given a phenotype $p_j \in P$, if $p_i \in \mathrm{CRP}_j$, then $s_{ji}$ is used as a dimension of the profile vector of $p_j$; the vector
$$S_j = (s_{ji_1}, s_{ji_2}, s_{ji_3}, \ldots, s_{ji_r}) \qquad (9)$$
characterizes the profile of $p_j$ in the phenotype similarity network $G_{Phenotype}$, in which $\{p_{i_1}, p_{i_2}, \ldots, p_{i_r}\} = \mathrm{CRP}_j$. That is, only the similarities of Closely Related Phenotypes (those not lower than PST) are used to build the profile vector of a target phenotype.
• Given a phenotype $p_j$ and a protein $g_k$, if $p_i \in \mathrm{CRP}_j$, then $f(g_k, p_i)$ is used as a dimension of the profile vector of $g_k$; the vector
$$F_k = (f(g_k, p_{i_1}), f(g_k, p_{i_2}), f(g_k, p_{i_3}), \ldots, f(g_k, p_{i_r})) \qquad (10)$$
characterizes the profile of $g_k$ in the protein interaction network $G_{Protein}$, over the same dimensions as $S_j$.
Given a phenotype $p_j$ and a protein $g_k$, let $\varphi(p_j, g_k)$ denote a relevance metric of the vectors $S_j$ and $F_k$. Three metrics are defined; each characterizes the correlation between the profile vectors of protein $g_k$ and phenotype $p_j$, and thus indicates the relevance of candidate protein $g_k$ to target phenotype $p_j$.
Let $\varphi_1$ denote the Euclidean distance of the two vectors:
$$\varphi_1(p_j, g_k) = \Big( \sum_{t=1}^{r} \big( s_{ji_t} - f(g_k, p_{i_t}) \big)^2 \Big)^{1/2} \qquad (11)$$
Cosine similarity measures the angle between two vectors and is denoted as $\varphi_2$:
$$\varphi_2(p_j, g_k) = \cos(S_j, F_k) = \frac{S_j \cdot F_k}{|S_j|\,|F_k|} \qquad (12)$$
The Pearson correlation coefficient indicates the linear correlation between two vectors and is denoted as $\varphi_3$:
$$\varphi_3(p_j, g_k) = \frac{\operatorname{cov}(S_j, F_k)}{\sigma(S_j)\,\sigma(F_k)} \qquad (13)$$

3.2. Phenotype Similarity Threshold (PST)
Following the biological assumption that phenotypically similar diseases are caused by functionally related genes [7], the proposed Semi-global inference model considers only phenotypes that are highly similar to the target disease, i.e., those whose similarities are not lower than PST. We use only the Closely Related Phenotypes of $p_j$ (see (8)) and the corresponding similarities to characterize the target phenotype. Therefore, in (9) and (10), given a phenotype $p_j$, the dimension of its profile vector is determined by the number of phenotypes in $\mathrm{CRP}_j$, and the dimension of the profile vectors of the candidate genes is reduced correspondingly.

3.3. Semi-Global Inference
Based on the mathematical model above, we now give the computational framework of the proposed Semi-global inference method, which consists of two algorithms for prioritizing candidate disease genes. Algorithm 1, Relevance Mark Calculation, calculates the relevance mark for a given pair of target phenotype $p_j$ and candidate protein $g_k$. Algorithm 2, Disease Gene Prioritization, takes a target phenotype as input, evaluates the relevance mark of every candidate protein in the linkage interval, and ranks the candidate proteins by their relevance marks. Proteins with high relevance marks are regarded as highly related to the target phenotype, and the genes associated with these top-ranked proteins are predicted to be the underlying causal genes of the target disease; this ranking is the output of the Semi-global inference model. In practice, each of the metrics (11), (12) and (13) is tested in turn in Algorithm 1 as the relevance evaluation of candidate proteins, and Algorithm 2 is invoked to prioritize candidate genes for every phenotype of interest.

4. Results
In this section, we comprehensively evaluate the performance of the proposed Semi-global inference model under different settings of relevance metrics and PSTs, and then compare it with the global inference method.

4.1. Datasets
The evaluation requires the following data: a set of phenotypes with quantified similarities between each pair of phenotypes; a set of proteins with quantified interactions between each pair of proteins; and a set of known phenotype-protein pairs (associations), which serves as the validation set.
The phenotype set and the linkage intervals are obtained from the Online Mendelian Inheritance in Man (OMIM) Morbid Map [19], a publicly accessible and comprehensive database of genotype-phenotype relationships in humans. Phenotype similarities come from the work of van Driel et al. [20]; quantified protein interaction scores are extracted from the STRING database [21] to build the PPI network; and the chromosomal locations of proteins are extracted from the Ensembl database [22]. The validation set is built from the phenotype-protein network by extracting the phenotype-gene mapping from the OMIM Morbid Map and the gene-protein mapping from the bioDBnet database [23], and mapping the phenotype network onto the PPI network. Phenotypes that cannot be mapped to proteins, because they lack known associated genes or because the proteins encoded by the genes in their linkage interval are incompletely described, are removed. In total we obtain 1897 phenotypes and 84652 proteins, but only 156584 protein-protein interactions are available; missing PPI records are treated as zero. 2549 known phenotype-gene pairs are retained for evaluation.

4.2. Experimental Setting
We apply leave-one-out cross-validation to evaluate the performance of the different methods in terms of the accuracy of disease gene prioritization.
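How one leave-one-out test case ties together the model of Section 3 can be illustrated with a minimal Python sketch: the target disease's associations are hidden, a PST is taken from the Position Parameter λ (D-PST by default, S-PST optionally), the Closely Related Phenotypes and profile vectors of (8)-(10) are built, each candidate is scored with one of (11)-(13) as in Algorithm 1, and the candidates are ranked as in Algorithm 2. This is our illustration, not the authors' implementation. The nested-dictionary data layout, all function names, clamping the PST index when λ = 1, returning 0 for zero-length or constant vectors, and ranking Euclidean distance in ascending order are assumptions.

```python
import math

# Hypothetical toy layout (an assumption, not from the paper):
#   sim[p][q] -> similarity s_pq between phenotypes p and q
#   ppi[g][h] -> interaction score I_gh between proteins g and h (missing = 0)
#   assoc[p]  -> set of proteins associated with phenotype p (CAssociation)


def feature(g_k, p_x, assoc, ppi):
    """Eq. (6): strongest interaction of g_k with any protein associated with p_x."""
    return max((ppi.get(g_k, {}).get(g_y, 0.0) for g_y in assoc.get(p_x, ())),
               default=0.0)


def position_pst(values, lam):
    """PST = value at index len * lam of the ascending array (index clamped)."""
    ordered = sorted(values)
    return ordered[min(int(len(ordered) * lam), len(ordered) - 1)]


def euclidean(s, f):
    """Eq. (11); a smaller distance is treated as more relevant."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(s, f)))


def cosine(s, f):
    """Eq. (12); returns 0 when either vector has zero length."""
    ns = math.sqrt(sum(a * a for a in s))
    nf = math.sqrt(sum(b * b for b in f))
    if ns == 0 or nf == 0:
        return 0.0
    return sum(a * b for a, b in zip(s, f)) / (ns * nf)


def pearson(s, f):
    """Eq. (13); returns 0 when either vector is empty or constant."""
    n = len(s)
    if n == 0:
        return 0.0
    ms, mf = sum(s) / n, sum(f) / n
    cov = sum((a - ms) * (b - mf) for a, b in zip(s, f))
    vs = sum((a - ms) ** 2 for a in s)
    vf = sum((b - mf) ** 2 for b in f)
    if vs == 0 or vf == 0:
        return 0.0
    return cov / math.sqrt(vs * vf)


def rank_candidates(target, candidates, sim, assoc, ppi, lam,
                    metric=cosine, dynamic=True):
    """Algorithms 1 and 2: score every candidate protein, then rank them."""
    if dynamic:  # D-PST: only the target's own similarities
        pool = list(sim[target].values())
    else:        # S-PST: every similarity in the phenotype network
        pool = [s for row in sim.values() for s in row.values()]
    pst = position_pst(pool, lam)
    crp = sorted(p for p, s in sim[target].items() if s >= pst)  # eq. (8)
    s_j = [sim[target][p] for p in crp]                            # eq. (9)
    marks = {}
    for g in candidates:
        f_k = [feature(g, p, assoc, ppi) for p in crp]             # eq. (10)
        marks[g] = metric(s_j, f_k)                                # Algorithm 1
    # Similarity metrics rank high-to-low; Euclidean distance ranks low-to-high.
    return sorted(marks, key=marks.get, reverse=metric is not euclidean)


def leave_one_out_rank(target, known_gene, candidates, sim, assoc, ppi, lam, **kw):
    """Hide the target's associations, rank candidates, return the known gene's rank."""
    held_out = {p: genes for p, genes in assoc.items() if p != target}
    ranking = rank_candidates(target, candidates, sim, held_out, ppi, lam, **kw)
    return ranking.index(known_gene) + 1
```

With λ = 0 the threshold equals the smallest similarity, so every phenotype is kept and the sketch reduces to global inference; raising λ shrinks $\mathrm{CRP}_j$ and with it the dimension of both profile vectors. On real data, `candidates` would be the proteins encoded by the genes in the target's linkage interval, and averaging the returned ranks over all held-out diseases gives the Average Rank criterion.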
For each disease of interest, we conduct the following experiment:
• We remove all associations of this target disease from the global association set (see (4)).
• All the genes in the linkage interval are regarded as candidate genes to be prioritized. On average, there are 750 candidate genes in the linkage interval of a disease.
• In practice, we use a Position Parameter λ to obtain the PST: the phenotype similarities are sorted in ascending order in an array, and the PST is the value at index (array size × λ), so the PST is determined by assigning λ a value between zero and one. Note that when λ = 0, all the nodes in the phenotype network are considered in inference; in this case the Semi-global model degenerates into global inference. Global inference is thus the special case of the proposed Semi-global model with λ = 0. We conduct experiments with two methods for obtaining the PST:
Static method (S-PST). All the phenotype similarities are sorted in one array, so the PST is a single global value used for all target diseases throughout the experiment.
Dynamic method (D-PST). The PST is retrieved from a smaller array containing only the similarities related to the current target disease, so different PSTs are obtained for different target diseases according to the similarity range of each target disease.
• We conduct the experiment with every combination of relevance metric and PST method.
• To compare the performance of the proposed model systematically, we use the following evaluation criteria:
Average Rank. The average rank of the known disease genes in the proposed model.
Fold Enrichment. The ability to enrich known disease genes over random selection [13].
Distribution of Cases. The percentage of test cases ranked within the top 1%, top 5% and top 10%.

4.3. Experiment with Variation of PST and Relevance Metrics
With Euclidean distance, the average rank of the proposed model improves rapidly as λ increases, although its performance always remains poorer than that of the model with the other two relevance metrics. Using S-PST, the model achieves a good average rank at a high PST (high Position Parameter λ), regardless of the relevance metric adopted. For the model with Euclidean distance and Cosine similarity, fold enrichment increases along with λ. Figures 1 to 4 also show that the proposed model with Cosine similarity performs better than with the other two relevance metrics. Moreover, the trend of performance with increasing λ shows that the model performs better when only highly similar diseases are used to profile the target disease and the candidate genes; in that case the profile vectors consist of only a few dimensions and only a small part of the nodes is exploited (e.g., the diseases holding the top 5% highest similarities in the whole phenotype network in S-PST, and the diseases holding the top 5% highest similarities to the target disease in D-PST). This indicates that referring to only part of the diseases in the proposed Semi-global model works well with these two relevance metrics (especially with Euclidean distance), and that node selection with PST, together with the resulting dimension reduction of the profile vectors, improves performance.
Figure 1. Average rank comparing the performance of the proposed model using S-PST with different relevance metrics.
Figure 2. Average rank comparing the performance of the proposed model using D-PST with different relevance metrics.
Figure 3. Fold enrichment comparing the performance of the S-PST model with different metrics.
Figure 4. Fold enrichment comparing the performance of the S-PST model with different metrics.
The model with the Pearson correlation coefficient reaches its best performance at λ = 0 (the global inference method), and its performance declines as λ increases. Therefore, the proposed Semi-global inference does not improve performance when Pearson correlation is adopted as the relevance metric.

4.4. Comparison to Global Inference Method
Here we consider D-PST with a value of λ chosen to give relatively high performance for each relevance metric, and compare it with the global inference method using the CIPHER algorithm [14] with the same relevance metric.
Tables 1 and 2 demonstrate that the Semi-global model with D-PST and a high λ outperforms the global inference method using the same relevance metric. For Euclidean distance in particular, the Semi-global model performs much better than global inference when λ is assigned a high value.
D-PST and Cosine similarity form the best combination: with it, the proposed model reaches a fold enrichment of 217.62 and an average rank of 16.29, each at a particular value of λ. In this configuration, the Semi-global model takes into account the top 4% most similar diseases of the target disease for inference, and it outperforms both the highest fold enrichment (197.60) and the best average rank (22.31) of the global inference method.

Table 1. Fold enrichment comparing the performance of the proposed Semi-global model with a global inference method.1
                                   ED        CS        PCC
Global inference (CIPHER)          55.44     159.71    197.60
Semi-global inference (D-PST)      ( )       ( )       ( )

Table 2. Average rank comparing the performance of the proposed Semi-global model with a global inference method.1
                                   ED        CS        PCC
Global inference (CIPHER)          1172.60   22.31     67.94
Semi-global inference (D-PST)      ( )       ( )       ( )

Table 3. Percentage of the known disease genes ranked within the top 1%, top 5% and top 10% in the proposed Semi-global model and a global inference model.1,2
                                          Top 1%   Top 5%   Top 10%
Global inference (CIPHER)        ED       0.21     0.25     0.26
                                 CS       0.64     0.93     0.97
                                 PCC      0.72     0.91     0.94
Semi-global inference (D-PST)    ED ( )   0.59     0.82     0.85
                                 CS ( )   0.75     0.94     0.98
                                 PCC ( )  …        …        …

1ED = Euclidean distance, CS = Cosine similarity, PCC = Pearson correlation coefficient.
2The last row of Table 3 is blank because the proposed model with the Pearson correlation coefficient reaches its best performance at λ = 0, where it degenerates into global inference.

Table 3 shows that, with the same relevance metric, more known disease genes are ranked within the top 1%, top 5% and top 10% by the proposed Semi-global model using D-PST than by the global inference method. It also shows that D-PST with Cosine similarity achieves better performance than the other combinations of PST methods and relevance metrics.
We then compare the distribution and accumulation of test cases between the proposed Semi-global model and the global method, each in the experimental setting in which it reaches its highest performance. Figures 5 and 6 give an overview of the distribution and accumulation of test cases for three such settings: the Semi-global model using D-PST with Euclidean distance, the Semi-global model using D-PST with Cosine similarity (each at its chosen λ), and the CIPHER algorithm with the Pearson correlation coefficient as a representative of global inference. The Semi-global model using D-PST and Cosine similarity not only achieves a better average rank and fold enrichment than global inference, but also yields a more desirable distribution of test cases: it ranks more than 75% of the cases within the top 1%, and its accumulated ratio of test cases is higher than that of the global inference method.
Figure 5. Comparison of the distribution of test cases between the proposed Semi-global model using D-PST and a global inference method.1
Figure 6. Comparison of the accumulation of test cases between the proposed Semi-global model using D-PST and a global inference method.1

5. Conclusions
In this paper, a Semi-global inference model with PST is proposed for disease gene prioritization; it uses profile vectors in the phenotype-protein network to characterize the target disease and the candidate genes. The model is evaluated comprehensively on the OMIM data set, and the experimental results show that the proposed Semi-global model outperforms an existing global inference method.
The Phenotype Similarity Threshold (PST) is proposed and Closely Related Phenotypes are defined; the PST serves as the criterion for selecting the diseases in the phenotype network that profile the target disease. By considering only highly similar diseases, the proposed PST is significant for node selection in the phenotype-protein network for gene prioritization, and as a first attempt it offers a new perspective on the widely accepted belief that phenotypically similar diseases are caused by functionally related genes.
The effects of different relevance metrics for the profile vectors, of the different PST methods, and of variation of the PST on the proposed model are discussed. The proposed model with Cosine similarity as the relevance metric performs better than the model using the other two metrics. Moreover, the proposed model improves as the PST increases when Cosine similarity or Euclidean distance is adopted as the relevance metric. We have also shown that the proposed Semi-global model using D-PST achieves a better average rank, a higher fold enrichment and a more favorable distribution of test cases than the global method.
Further research includes configuring the Semi-global model (a proper PST, Position Parameter and relevance metric) to achieve better performance, studying the sensitivity of the proposed model to noise in PPI data, and addressing the bias that occurs in global inference.

6. Acknowledgements
This research is supported by the National Science Foundation of China (60973077).

REFERENCES
[1] D. Botstein and N. Risch, “Discovering Genotypes Underlying Human Phenotypes: Past Successes for Mendelian Disease, Future Approaches for Complex Disease,” Nature Genetics, Vol. 33, 2003, pp. 228-237. http://dx.doi.org/10.1038/ng1090
[2] F. S. Turner, D. R. Clutterbuck and C. Semple, “Pocus: Mining Genomic Sequence Annotation to Predict Disease Genes,” Genome Biology, Vol. 4, 2003, p. R75. http://dx.doi.org/10.1186/gb-2003-4-11-r75
[3] J. Chen, C. Shen and A. Sivachenko, “Mining Alzheimer Disease Relevant Proteins from Integrated Protein Interactome Data,” Pacific Symposium on Biocomputing, Vol. 11, 2006, pp. 367-378.
[4] A. Hamosh, A. F. Scott, J. S. Amberger, C. A. Bocchini and V. A. McKusick, “Online Mendelian Inheritance in Man (OMIM), a Knowledgebase of Human Genes and Genetic Disorders,” Nucleic Acids Research, Vol. 33, Database Issue, 2005.
[5] E. Adie, R. R. Adams, K. L. Evans, D. J. Porteous and B. Pickard, “Speeding Disease Gene Discovery by Sequence Based Candidate Prioritization,” BMC Bioinformatics, Vol. 6, 2005, p. 55. http://dx.doi.org/10.1186/1471-2105-6-55
[6] L. Sam, Y. Liu, J. Li, C. Friedman and Y. A. Lussier, “Discovery of Protein Interaction Networks Shared by Diseases,” Pacific Symposium on Biocomputing, Vol. 12, 2007, pp. 76-87.
[7] G. Jimenez-Sanchez, et al., “Human Disease Genes,” Nature, Vol. 409, 2001, pp. 853-854. http://dx.doi.org/10.1038/35057050
[8] M. Oti and H. G. Brunner, “The Modular Nature of Genetic Diseases,” Clinical Genetics, Vol. 71, 2007, pp. 1-11. http://dx.doi.org/10.1111/j.1399-0004.2006.00708.x
[9] J. H. Jing-Dong, “Understanding Biological Functions through Molecular Networks,” Cell Research, Vol. 18, 2008, pp. 224-237.
[10] J. Chen, B. Aronow and A. Jegga, “Disease Candidate Gene Identification and Prioritization Using Protein Interaction Networks,” BMC Bioinformatics, Vol. 10, No. 1, 2009, p. 73. http://dx.doi.org/10.1186/1471-2105-10-73
[11] M. Oti, B. Snel, M. A. Huynen and H. G. Brunner, “Predicting Disease Genes Using Protein-Protein Interactions,” Journal of Medical Genetics, Vol. 43, 2006, pp. 691-698. http://dx.doi.org/10.1136/jmg.2006.041376
[12] S. Navlakha and C. Kingsford, “The Power of Protein Interaction Networks for Associating Genes with Diseases,” Bioinformatics, Vol. 26, 2010, pp. 1057-1063. http://dx.doi.org/10.1093/bioinformatics/btq076
[13] K. Lage, E. O. Karlberg, Z. M. Storling, P. I. Olason, A. G. Pedersen, O. Rigina, A. M. Hinsby, Z. Tumer, F. Pociot, N. Tommerup, Y. Moreau and S. Brunak, “A Human Phenome-Interactome Network of Protein Complexes Implicated in Genetic Disorders,” Nature Biotechnology, Vol. 25, 2007, pp. 309-316. http://dx.doi.org/10.1038/nbt1295
[14] X. B. Wu, R. Jiang, M. Q. Zhang and S. Li, “Network-Based Global Inference of Human Disease Genes,” Molecular Systems Biology, Vol. 4, 2008, p. 189. http://dx.doi.org/10.1038/msb.2008.27
[15] S. Kohler, S. Bauer, D. Horn and P. N. Robinson, “Walking the Interactome for Prioritization of Candidate Disease Genes,” The American Journal of Human Genetics, Vol. 82, No. 4, 2008, pp. 949-958. http://dx.doi.org/10.1016/j.ajhg.2008.02.013
[16] Y. Li and J. C. Patra, “Genome-Wide Inferring Gene-Phenotype Relationship by Walking on the Heterogeneous Network,” Bioinformatics, Vol. 26, No. 9, 2010, pp. 1219-1224. http://dx.doi.org/10.1093/bioinformatics/btq108
[17] S. Erten and M. Koyutürk, “Role of Centrality in Network-Based Prioritization of Disease Genes,” Proceedings of the 8th European Conf. Evolutionary Computation, Machine Learning, and Data Mining in Bioinformatics (EVOBIO’10), Vol. LNCS 6023, 2010, pp. 13-25.
[18] A. M. Edwards, B. Kus, R. Jansen, D. Greenbaum, J. Greenblatt and M. Gerstein, “Bridging Structural Biology and Genomics: Assessing Protein Interaction Data with Known Complexes,” Trends in Genetics, Vol. 18, No. 10, 2002, pp. 529-536. http://dx.doi.org/10.1016/S0168-9525(02)02763-4
[19] Online Mendelian Inheritance in Man, OMIM®. McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University (Baltimore, MD), World Wide Web URL: http://omim.org/
[20] M. A. van Driel, J. Bruggeman, G. Vriend, et al., “A Text-Mining Analysis of the Human Phenome,” European Journal of Human Genetics, Vol. 14, 2006, pp. 535-542. http://dx.doi.org/10.1038/sj.ejhg.5201585
[21] D. Szklarczyk, A. Franceschini, M. Kuhn, M. Simonovic, A. Roth, P. Minguez, T. Doerks, M. Stark, J. Muller, P. Bork, et al., “The STRING Database in 2011: Functional Interaction Networks of Proteins, Globally Integrated and Scored,” Nucleic Acids Research, Vol. 39, 2011, pp. D561-D568. http://dx.doi.org/10.1093/nar/gkq973
[22] E. Birney, D. Andrews, M. Caccamo, Y. Chen, L. Clarke, G. Coates, T. Cox, F. Cunningham, V. Curwen, T. Cutts, et al., “Ensembl 2006,” Nucleic Acids Research, Vol. 34, 2006, pp. D556-D561. http://dx.doi.org/10.1093/nar/gkj133
[23] U. Mudunuri, A. Che, M. Yi and R. M. Stephens, “bioDBnet: The Biological Database Network,” Bioinformatics, Vol. 25, No. 4, 2009, pp. 555-556. http://dx.doi.org/10.1093/bioinformatics/btn654