A complex breast cancer network, constructed from seventy experimentally verified genes by coordinating seven standard human protein and genome databases, exhibits hierarchical scale-free features. A centrality-based method for identifying inferred genes was applied to this network and predicted forty-nine breast cancer genes and nineteen non-breast cancer genes. Predicting good candidate genes before experimental analysis saves both time and effort. Fourteen of the nineteen genes are found to be involved in various types of cancer and other diseases, and five genes are associated with non-cancer diseases. Some of the inferred genes need proper experimental investigation to understand their fundamental roles in regulating the breast cancer network.
Breast cancer is the most common cancer in women worldwide, and it can be inherited [
It has been reported that most existing networks in nature fall into one of the following types, namely scale-free, small-world, random and hierarchical, or combinations of them [
Integration of breast cancer data: We have incorporated six standard databases of cancer, namely, KEGG (Kyoto Encyclopedia of Genes and Genomes), CGC (Cancer Gene Census), BCGD (Breast Cancer Gene Database), CGAP (Cancer Genome Anatomy Project), GAD (Genetic Association Database) and NCG (Network of Cancer Genes), to obtain a comprehensive list of breast cancer genes. We extracted 2050 genes from these databases, out of which 1332 were found to be unique (
(http://www.genecards.com/) database. Following this method, we arrived at 1332 unique genes. The data were further curated using Agilent Literature Search, a Cytoscape plugin. This process yielded a final list of 70 genes out of the 1332 unique genes. Details of these genes were then obtained by mapping them to UniProt (January 2016).
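The pooling step described above (many database entries reduced to a smaller set of unique genes) can be sketched as follows. This is only an illustration under stated assumptions: the function name `pool_unique_genes`, the toy gene lists, and the alias table are hypothetical and do not reproduce the actual database contents or the authors' curation procedure.

```python
def pool_unique_genes(sources, aliases=None):
    """Pool gene lists from several sources and collapse duplicates.

    sources: dict mapping a database name to an iterable of gene symbols.
    aliases: optional dict mapping an alias to a preferred symbol
             (hypothetical stand-in for a manual alias lookup).
    Returns (total entries seen, dict of unique symbol -> set of source names).
    """
    aliases = aliases or {}
    total = 0
    unique = {}
    for db_name, genes in sources.items():
        for symbol in genes:
            total += 1
            key = symbol.strip().upper()   # normalise case and whitespace
            key = aliases.get(key, key)    # map known aliases to one symbol
            unique.setdefault(key, set()).add(db_name)
    return total, unique


# Toy input for illustration only, not the real database contents.
toy_sources = {
    "KEGG": ["BRCA1", "TP53", "ERBB2"],
    "CGC": ["brca1", "HER2", "PIK3CA"],
    "NCG": ["TP53", "PIK3CA", "CDH1"],
}
toy_aliases = {"HER2": "ERBB2"}  # assumed alias mapping

total, unique = pool_unique_genes(toy_sources, toy_aliases)
print(total, "entries pooled into", len(unique), "unique genes")
```

Recording which sources report each gene keeps the provenance available for later curation steps.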
Construction of primary network: The breast cancer network was constructed following the simple one-gene-one-protein concept. The network was built using the APID2NET plug-in implemented in Cytoscape version 2.8.3, which was used to retrieve all available interaction information from seven main resources, namely DIP (Database of Interacting Proteins), BIND (Biomolecular Interaction Network Database), IntAct, MINT (Molecular Interactions Database), UniProt, BioGRID (The General Repository of Interaction Datasets) and HPRD (Human Protein Reference Database) [
Characterization of network compactness: LCP-DP approach: The LCP-decomposition-plot (LCP-DP) is a two-dimensional representation of the common neighbors ($CN$) index of interacting nodes against local community links ($LCL$), used to characterize the topological properties of a network. It provides information on the number, size, and compactness of communities in a network [
bound is defined by $\max(LCL) = \frac{1}{2}CN(CN-1)$, is the number of internal links in the local community ($LC$). Two nodes most probably link together if their common neighbors are members of an $LC$ [
The LCP correlation ($LCP\text{-}corr$) is the Pearson correlation coefficient of $CN$ and $LCL$, defined by

$$LCP\text{-}corr = \frac{\operatorname{cov}(CN, LCL)}{\sigma_{CN}\,\sigma_{LCL}}, \quad CN > 1,$$

where $\operatorname{cov}(CN, LCL)$ is the covariance between $CN$ and $LCL$, and $\sigma_{CN}$ and $\sigma_{LCL}$ are the standard deviations of $CN$ and $LCL$, respectively.
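The quantities defined above can be computed directly from an edge list. The following is a minimal sketch, assuming an undirected graph given as unique node pairs and taking $CN$ and $LCL$ per linked pair of nodes; the function names and the toy graph are illustrative and are not taken from the tools used in this study.

```python
from itertools import combinations
from math import sqrt


def build_adjacency(edges):
    """Undirected adjacency sets from a list of unique (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def cn_lcl_pairs(edges):
    """(CN, LCL) for every edge whose endpoints share more than one neighbor."""
    adj = build_adjacency(edges)
    pairs = []
    for u, v in edges:
        common = adj[u] & adj[v]          # common neighbors of u and v
        cn = len(common)
        if cn <= 1:                        # LCP-corr is defined for CN > 1
            continue
        # LCL: links among the common neighbors themselves
        lcl = sum(1 for a, b in combinations(common, 2) if b in adj[a])
        pairs.append((cn, lcl))
    return pairs


def lcp_corr(edges):
    """Pearson correlation of CN and LCL over the qualifying edges."""
    pairs = cn_lcl_pairs(edges)
    if len(pairs) < 2:
        return float("nan")
    cns, lcls = zip(*pairs)
    n = len(cns)
    mean_cn, mean_lcl = sum(cns) / n, sum(lcls) / n
    cov = sum((c - mean_cn) * (l - mean_lcl) for c, l in zip(cns, lcls))
    sd_cn = sqrt(sum((c - mean_cn) ** 2 for c in cns))
    sd_lcl = sqrt(sum((l - mean_lcl) ** 2 for l in lcls))
    if sd_cn == 0 or sd_lcl == 0:
        return float("nan")
    return cov / (sd_cn * sd_lcl)


# Toy graph (illustrative): a 5-clique and a separate 4-clique.
toy_edges = list(combinations(range(5), 2)) + list(combinations(range(5, 9), 2))
toy_pairs = cn_lcl_pairs(toy_edges)
toy_corr = lcp_corr(toy_edges)
print(sorted(set(toy_pairs)), toy_corr)
```

In this toy graph every point lies on the upper bound $\max(LCL) = \frac{1}{2}CN(CN-1)$, the situation expected for fully connected local communities.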
Constant Potts model: energy distribution in a network. The state of a persisting system can be estimated by calculating the difference in Hamiltonian energy (HE) between two ensemble states of the system. The HE was calculated for a network or module by considering hub-influencing modules. We first identified the modules in which a particular hub is present at each level. The HE of the system comprising these modules was then calculated according to the formalism of the Constant Potts Model [
$$H[s] = -\sum_{C}\left(e_C - \gamma n_C^2\right)$$

where $e_C$ and $n_C$ are the numbers of edges and nodes in a community $C$, and $\gamma$ is the resolution parameter acting as an edge density threshold. In general, $\gamma$ should be $\leq \frac{1}{(n_C)^2}$.
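As an illustration of the Hamiltonian above, the energy of a given partition can be evaluated by counting the internal edges and nodes of each community. This is a minimal sketch with a hypothetical function name, a toy graph, and an arbitrarily chosen $\gamma$; it evaluates the energy of a partition supplied by the user and does not perform community detection.

```python
def cpm_hamiltonian(edges, membership, gamma):
    """Evaluate H = -sum_C (e_C - gamma * n_C**2) for a given partition.

    edges: list of unique undirected (u, v) pairs.
    membership: dict mapping each node to its community label.
    gamma: resolution parameter.
    """
    n_c = {}
    for community in membership.values():
        n_c[community] = n_c.get(community, 0) + 1   # nodes per community
    e_c = dict.fromkeys(n_c, 0)
    for u, v in edges:
        if membership[u] == membership[v]:           # count internal edges only
            e_c[membership[u]] += 1
    return -sum(e_c[c] - gamma * n_c[c] ** 2 for c in n_c)


# Toy graph (illustrative): two triangles joined by a single edge.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("d", "e"), ("e", "f"), ("d", "f"),
             ("c", "d")]
two_modules = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_module = dict.fromkeys("abcdef", 0)

h_split = cpm_hamiltonian(toy_edges, two_modules, gamma=0.1)
h_merged = cpm_hamiltonian(toy_edges, one_module, gamma=0.1)
print(h_split, h_merged)
```

For this toy case the two-triangle partition has the lower energy, so at this resolution it is the preferred description of the graph.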
Centrality based link prediction: Centrality measurements characterize the most influential candidates in a network, those capable of fast information propagation and reception and sensitive to local and global perturbations, so they can be used to identify important fundamental regulators. For each of the four centralities (degree, betweenness, closeness and eigenvector), we computed the centrality score (using CytoNCA) for each node in the breast cancer network [
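The ranking idea can be illustrated with a small pure-Python sketch that scores every node by the four centralities and counts how often each node appears in the top-$k$ lists. It is not the CytoNCA implementation: the function names, the unnormalized scores, the tie-breaking rule, and the toy graph are assumptions made for illustration, and the graph is assumed to be connected, undirected and unweighted.

```python
from collections import deque


def adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_centrality(adj):
    return {v: len(neighbors) for v, neighbors in adj.items()}


def closeness_centrality(adj):
    scores = {}
    for s in adj:
        dist, queue = {s: 0}, deque([s])
        while queue:                       # breadth-first search from s
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total = sum(dist.values())
        scores[s] = (len(dist) - 1) / total if total else 0.0
    return scores


def betweenness_centrality(adj):
    """Brandes' algorithm for unweighted, undirected graphs."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist, queue = {s: 0}, deque([s])
        while queue:
            u = queue.popleft()
            stack.append(u)
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:  # u lies on a shortest path to w
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}  # each pair was counted twice


def eigenvector_centrality(adj, iterations=200):
    """Power iteration on A + I (same eigenvectors as A, avoids oscillation)."""
    x = dict.fromkeys(adj, 1.0)
    for _ in range(iterations):
        nxt = {v: x[v] + sum(x[w] for w in adj[v]) for v in adj}
        norm = max(nxt.values())
        x = {v: value / norm for v, value in nxt.items()}
    return x


def count_top_k_hits(adj, k):
    """Number of top-k lists (out of four) in which each node appears."""
    hits = {}
    for measure in (degree_centrality, closeness_centrality,
                    betweenness_centrality, eigenvector_centrality):
        scores = measure(adj)
        ranked = sorted(scores, key=lambda v: (-scores[v], str(v)))
        for v in ranked[:k]:
            hits[v] = hits.get(v, 0) + 1
    return hits


# Toy graph (illustrative): a star with hub "h" and four leaves.
star_edges = [("h", leaf) for leaf in "abcd"]
star_hits = count_top_k_hits(adjacency(star_edges), k=1)
print(star_hits)
```

Nodes that recur across several top-$k$ lists are the natural candidates for closer inspection, which mirrors the selection of highly repeated genes in the Results.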
The complex breast cancer network constructed from seventy experimentally verified genes obeys hierarchical characteristics [
$F = (P, C, C_N)^T$, $\frac{F_i(ak)}{F_i(k)} \sim a^{d_i}$, where $a$ is a constant scale factor and $d_i$ is the fractal dimension of the $i$-th component of $F$. Hence, the network properties indicate that the breast cancer network is a hierarchical scale-free fractal network [
Similarly, the centrality parameters, namely the betweenness ($C_B$), closeness ($C_C$) and eigenvector ($C_E$) centralities of the network, also exhibit fractal behavior (
properties of the breast cancer network can be represented by $\Gamma = (F, G)^T$, $\frac{\Gamma_j(ck)}{\Gamma_j(k)} = c^{D_j}$, where $D = (d, q)^T$, maintaining fractal properties.
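The scaling relation above holds for any quantity that follows a power law in $k$: if $\Gamma_j(k) \propto k^{D_j}$, then $\Gamma_j(ck)/\Gamma_j(k) = c^{D_j}$ for every $k$. The sketch below checks this numerically and recovers the exponent with a log-log least-squares fit. The data are synthetic, and the exponent $-1.5$ and the function name are arbitrary choices for illustration, not values or methods from this study.

```python
from math import log


def fit_power_law_exponent(ks, values):
    """Least-squares slope of log(value) against log(k)."""
    xs = [log(k) for k in ks]
    ys = [log(v) for v in values]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic power law with an assumed exponent of -1.5.
true_exponent = -1.5
ks = [1, 2, 4, 8, 16]
values = [k ** true_exponent for k in ks]

fitted = fit_power_law_exponent(ks, values)

# Rescaling k by a factor c changes the value by c**exponent, independent of k.
c = 2
ratios = [(c * k) ** true_exponent / k ** true_exponent for k in ks]
print(fitted, ratios[0], c ** true_exponent)
```

On real network data the points scatter around the fitted line, so the exponent should be reported together with a goodness-of-fit measure.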
Following the centrality-based methodology (see Methods), we examined the top twenty genes identified by each of the degree and centrality measurements (
Cancer associated genes: This category has two sub-categories, depending on the source. The 14 genes comprise 10 genes whose association with cancer was established from the literature and 4 genes whose association was taken from the NCG database (
These 4 genes, namely CTNB1, HS90B, NPM and PABP1, were obtained after verification against NCG (
Non-cancer associated genes: This category holds only 5 genes, which were found to be associated with other diseases but not with cancer (neither in NCG nor in the literature). Of these five genes (EF1A1, 1433G, RL23, RL24 and RS26), RS26 is correlated with conjunctival cancer.
Further, after removing breast cancer related genes from the list, the genes that recur most often across the four measurements (EF1A1, HS90B, CTNB1, KU70, 1433Z) are most probably important inferred genes that help regulate the breast cancer regulatory network, and their regulatory roles should be more significant than those of the other inferred genes. Hence, we further studied the topological properties of the sub-networks associated with these genes to understand their activities (Figures 4(a)-(e)). These sub-networks still follow hierarchical scale-free characteristics, which may be inherited from the main network in keeping with its fractal property. The sub-networks are compact (all $LCP\text{-}corr > 0.8$), with tightly bound nodes (see Methods); their sizes range from 26 to 170, and the points in the LCP-DP plots indicate strong linkage of the nodes in each sub-network [
$P_{LCP} = \frac{x_i}{x_N};\ i = 1, \cdots, 5$, where $x_i$ is the value of the LCP-correlation of the $i$-th
sub-network and $x_N$ is the LCP-correlation of the complete breast cancer network. Since the calculated $P_{LCP}$ values of EF1A1 and 1433Z are the largest, the sub-networks corresponding to these inferred genes correlate strongly with the breast cancer network and actively regulate it, whereas CTNB1 has the lowest $P_{LCP}$ value, indicating weak correlation of the corresponding sub-network with the breast cancer network and weak regulation of it (
Genes | Cancer type involved | Disease other than cancer | Reference |
---|---|---|---|
1433Z (Cc) | Prostate; Lung | Creutzfeldt-Jakob Disease; Prion Disease; Gerstmann-Straussler Disease; Post-Vaccinal Encephalitis etc. | [ |
CTNB1 (Cc) | Colorectal cancer, somatic; Hepatocellular carcinoma, somatic; Ovarian cancer, somatic | Pilomatricoma, somatic; Mental retardation, autosomal dominant 19 | [ |
TF65 (uCb) | Lung cancer; Prostate cancer; Leukemia | HIV-1 | [ |
CKS1 (uCb) | Multiple Myeloma; Hepatocellular Carcinoma; Cervical Squamous Cell Carcinoma; Oral Squamous Cell Carcinoma | Clear Cell Adenofibroma | [ |
EF1A1 (*) | | Bardet-Biedl Syndrome; Fusariosis; Cutaneous Anthrax; Tinea Nigra; Hepatitis | [ |
1433G (uCb) | | Intellectual disability | [ |
KU70 (d) | Renal Cell Carcinoma; Hepatocellular Carcinoma; Lung Cancer | Postencephalitic Parkinson Disease; DNA Ligase IV Deficiency; Middle Cerebral Artery Infarction; Systemic Lupus Erythematosus; Lupus Erythematosus; Neuroblastoma | [ |
HS90B (d) | Multiple cancer types | | [ |
SF3B3 (Ce) | Breast (ER+ cells) | | [ |
RL11 (Ce) | Gastric cancer | Diamond-Blackfan Anemia 1; Pierre Robin Syndrome; Congenital Hypoplastic Anemia; Macrocytic Anemia | [ |
RS3A (Ce) | Lung; Hepatocellular Carcinoma | Diamond-Blackfan Anemia; Huntington Disease | [ |
HNRPU (Ce) | Renal Cancer; Pancreatic Ductal Adenocarcinoma; Cervical Cancer (somatic) | Myotonic Dystrophy 1; Becker Muscular Dystrophy | [ |
RL6 (Ce) | Gastric Cancer; T-Cell Leukemia | Noonan Syndrome 1 | [ |
RL23 (Ce) | | Neuroblastoma; Myelodysplastic Syndrome | [ |
NPM (uCe) | Leukemia, acute myeloid, somatic | | [ |
RL26 (uCe) | Conjunctival Cancer | Diamond-Blackfan Anemia; Macrocytic Anemia; Pierre Robin Syndrome; Rpl26 Related Diamond-Blackfan Anemia | [ |
RL24 (u) | | Bone Remodeling Disease; Bone Resorption Disease; Cauda Equina Neoplasm; Diamond-Blackfan Anemia | [ |
RS26 (u) | Conjunctival | Macrocytic Anemia; Pierre Robin Syndrome; Diamond-Blackfan Anemia | [ |
PABP1 (u) | Lymphoma; Gastric cancer | | [ |
Literature (cancerous genes): green; NCG (cancerous genes): yellow; non-cancer genes: white; (*): gene appeared in all four centralities; (Cc): gene appeared in both Betweenness (Cb) and Closeness (Cc) centralities; (uCb): gene unique to Betweenness centrality (Cb); (d): gene appeared in both Degree (d) and Closeness (Cc) centralities; (Ce): gene appeared in both Degree (d) and Eigenvector (Ce) centralities; (uCe): gene unique to Eigenvector centrality (Ce).
network $H_j$ given by $P_H = \frac{H_s}{N_s};\ i = 1, \cdots, 5$. The calculated $P_H$ values of the sub-networks corresponding to EF1A1 and HS90B are the largest, and those of CTNB1 and 1433Z are the smallest, indicating strong and weak distribution of energies in their respective sub-networks.
The complex breast cancer network constructed from seventy experimentally verified genes is a hierarchical scale-free network in which emergent, diverse modules and sparsely distributed hubs interact to regulate the network. The network is regulated by various breast cancer and non-breast cancer genes. These genes can be identified by centrality-based measurements, which are an important method for identifying inferred genes [
KC and RKBS are financially supported by UPE-II sanction No. . SA and MZM are financially supported by the Indian Council of Medical Research under an SRF (Senior Research Fellowship).
Chirom, K., Ali, S., Malik, M.Z., Ishrat, R. and Singh, R.K.B. (2017) Identification of Inference Genes in Breast Cancer Network. Journal of Biosciences and Medicines, 5, 29-42. https://doi.org/10.4236/jbm.2017.59004