MicroPath-A pathway-based pipeline for the comparison of multiple gene expression profiles to identify common biological signatures

doi:10.4236/jbise.2009.22020

Paper Menu >>

Journal Menu >>

J. Biomedical Science and Engineering, 2009, 2, 106-116

Published Online April 2009 in SciRes. http://www.scirp.org/journal/jbise JBiSE

MicroPath-A pathway-based pipeline for the

comparison of multiple gene expression profiles to

identify common biological signatures

Mohsin Khan1, Chandrasekhar Babu Gorle1, Ping Wang3, Xiao-Hui Liu2, Su-Ling Li1

1Molecular Immunology & bioinformatics Group, Microarray Facility, Division of Bio-Sciences, Brunel University, Uxbridge, UB8 3PH, UK; 2Intelligent Data

Analysis Group, Department of Information Systems and Computing, Brunel University, Uxbridge, UB8 3PH, UK; 3Immunology Group, Institute of Cell and

Molecular Sciences, Barts and London School of Medicine, London, UK. Correspondence should be addressed to Su-Ling Li (su-ling.li@brunel.ac.uk)

Received Jan. 2nd, 2009; revised Feb. 15th, 2009; accepted Mar. 4th, 2009.

ABSTRACT

High throughput gene expression analysis is

swiftly becoming the focal point for deciphering

molecular mechanisms underlying various dif-

ferent biological questions. Testament to this is

the fact that vast volumes of expression profiles

are being generated rapidly by scientists

worldwide and subsequently stored in publicly

available data repositories such as ArrayEx-

press and the Gene Expression Omnibus (GEO).

Such wealth of biological data has motivated

biologists to compare expression profiles gen-

erated from biologically-related microarray ex-

periments in order to unravel biological mecha-

nisms underlying various states of diseases.

However, without the availability of appropriate

software and tools, they are compelled to use

manual or labour-intensive methods of com-

parisons. A scrutiny of current literature makes

it apparent that there is a soaring need for such

bioinformatics tools that cater for the multiple

analyses of expression profiles.

In order to contribute towards this need, we

have developed an efficient software pipeline for

the analysis of multiple gene expression data-

sets, called Micropath, which implements three

principal functions; 1) it searches for common

genes amongst n number of datasets using a

number crunching method of comparison as

well as applying the principle of permutations

and combinations in the form of a search strat-

egy, 2) it extracts gene expression patterns both

graphically and statistically, and 3) it streams

co-expressed genes to all molecular pathways

belonging to KEGG in a live fashion. We sub-

jected MicroPath to several expression datasets

generated from our tolerance-related in-house

microarray experiments as well as published

data and identified a set of 31 candidate genes

that were found to be co-expressed across all

interesting datasets. Pathway analysis revealed

their putative roles in regulating immune toler-

ance. MicroPath is freely available to download

from: www.1066technologies.co.uk/micropath.

Keywords: Co-Expression Analysis, Microarray,

Permutations and Combinations, Multiple Gene

Expression Analysis

1. INTRODUCTION

There is a general consensus amongst scientists and re-

searchers that the fundamental asset of microarray tech-

nology lies in its inherent ability to produce a global

snapshot of the cellular state in the milieu of any given

biological question. It is therefore not surprising that

microarrays have revolutionised the field of molecular

biology by offering an efficient and cost effective me-

dium for biologists to quantify mRNA transcript levels

of several thousands of genes concurrently in order to

observe specific states of the transcriptome (in response

to a particular treatment or specific time point). Owing to

this innate faculty to decipher the transcriptome, gene

expression profiles pertaining to a wide variety of bio-

logical questions are being rapidly generated by scien-

tists worldwide and are deposited and subsequently made

accessible through public repositories such as ArrayEx-

press [1] and the Gene Expression Omnibus [2]. With so

much wealth of high throughput biological data made

available, biologists have become motivated to utilise

these sets of data in an attempt to investigate common

regulatory signatures, which may be implicating the

transcriptome state across multiple gene expression pro-

files sharing a similar biological theme. One of the most

widely accepted methodologies of comparing expression

profiles is based on the assumption that genes across

different biological conditions sharing similar expression

patterns are likely to be involved in the same biological

processes [2], and therefore, may share common regula-

tory signatures. By using this method of comparison,

which is one of the most successful methods to date,

coupled with the availability of publicly available data

M. Khan et al. / J. Biomedical Science and Engineering 2 (2009) 106-116 107

repositories offering gene expression profiles, biologists

have been granted the opportunity to answer complex

biological questions pertinent to biological phenomena

underlying various different disease states.

To this end, we have developed a novel bioinformatics

software pipeline called MicroPath, which specialises in

the cross comparison of multiple gene expression data-

sets and attempts to identify common regulatory signa-

tures from the standpoint of molecular pathway analysis.

When one scrutinises current literature relevant to auto-

mated solutions of gene expression analysis, it becomes

apparent that there is an increasing demand for software

applications that offer an efficient pipeline to the analysis

of multiple gene expression profiles. Although current

meta-analyses studies have been conducted with the

purpose of employing statistical techniques to compare

cDNA and affymetrix gene expression profiles [3,4,5,6],

it cannot be denied that there is a mounting need for this

process to be automated. Nevertheless, various ap-

proaches/algorithms of statistical nature have already

been implemented with the purpose of identifying the

most relevant pathways in a given experiment [7,8,9]

together with methods such as Gene Set Enrichment

Analysis (GSEA), which ranks genes based on the cor-

relations between their expressions and observed pheno-

types in the context of biological pathway discoveries

[10]. There are also tools available that functionally an-

notate gene expression data [11,12]. Albeit, it remains

infeasible for biologists to cross compare several expres-

sion profiles without an automated solution, and hence,

they are faced with the labour-intensive task of employ-

ing manual methods to carry out their comparisons. Mi-

croPath uses the meta-analytic standard and has been

specifically developed to: compare several significantly

expressed sets of genes in order to find the intersection

of common genes using both number crunching methods

as well as the classical permutation and combination

principle, extract putative regulatory signatures using

both statistical and graph-based approaches and finally,

mapping these sub-sets of co-expressed genes to mo-

lecular pathways all in the form of a high throughput

pipeline.

2. IMPLEMENTATION

The front-end of MicroPath was developed in Visual

Basic.Net and Perl, and the database back-end was de-

veloped in MySQL. Upon analysing the users input files

(gene expression profiles), processed data is displayed

intuitively on the graphical user interface, which is

equipped with various interactive objects such as chart-

ing facilities, buttons, drop-down menus and user in-

put/output dialogues. The interface is also equipped with

a function to export processed data into Microsoft excel

for further scrutiny and use.

2.1. System Architecture

MicroPath carries out meta-profiling of multiple gene

expression datasets using two different approaches.

Firstly, the intersection of common genes is identified

across n number of expression profiles, which is then

plotted graphically using a simple number crunching

exercise. The second approach applies to a situation

where an attempt to identify common genes across n

number of expression profiles using the aforementioned

approach fails due to the absence of common genes

across all datasets (this situation is especially common

when a large number of expression profiles are compared,

which reduces the probability of finding a common gene

amongst them). Consequently, MicroPath applies the

permutations and combinations mathematical principle

to solve this problem (refer to implementation of

meta-analysis strategy below for details). Once the in-

tersection of a set of common genes has been identified

and subsequently displayed on the interface (using either

of the above methods), the next stage in the analysis is to

extract patterns from the intersection in order to identify

common genes that are being expressed in accordance

with the biological question. MicroPath offers a semi-

automated graph-based approach to achieve this as well

as classical statistics to identify the overall correlation of

gene expression. Finally, co-expressed genes (common

genes that are expressed in accordance to the relevant

biological question) are mapped to all molecular path-

ways known to date in order to reveal their molecular

dependencies (refer to Figure 1 for the complete system

architecture).

2.2. Implementation of Meta-analysis Strategy

In theory, an intersection of a sub-set of common genes

across multiple gene expression profiles should be eas-

ily attainable using simple number crunching methods

of comparison. In practice, this is not always the case

since the likelihood of identifying genes sharing com-

mon accession identifiers decreases as the number of

profiles to compare increases. This inverse relationship

makes sense both mathematically and biologically.

From a biological perspective, regulatory signatures

tend to be diluted over entire datasets and as a result,

only a proportion of the total number of profiles to

compare may actually share common genes. In such a

scenario, using a simple method of comparison would

break down at some point and no common genes

would be reported to the user, although common genes

may be present within n-1 expression profiles. To pre-

vent potentially interesting biological findings to be

hampered at this point in the analysis, we have applied

the principle of mathematical combinations to the

comparison of multiple gene expression profiles. All

possible combinations of comparing n number of

datasets with each other are firstly computed using the

combination equation:

108 M. Khan et al. / J. Biomedical Science and Engineering 2 (2009) 106-116

This generates the total number of permutations of

comparing datasets (Cr) for given values of n (total

number of datasets imported by user) and r (number of

intended datasets used to search for common genes when

zero common genes are reported across n datasets) (Ta-

ble 1).

Figure 1. Functions of MicroPath. Users are prompted to import up to 10 gene expression profiles, which are then compared using a

direct comparison method. If this method yields zero common genes, MicroPath automatically attempts to identify an intersection of

common genes by reducing the search space to n-1 datasets using permutations and combinations. This process is continued until at

least 1 common gene is reported. Following this, users are provided with a function to search for expression patterns graphically and gene

expression correlations are calculated statistically using the pearson’s correlation coefficient algorithm. Finally, co-expressed genes are

mapped to all molecular pathways of KEGG in a high throughput fashion by automatically accessing its API via SOAP-Lite.

M. Khan et al. / J. Biomedical Science and Engineering 2 (2009) 106-116 109

Table 1. Multiple gene expression profile search strategy generated from applying the principle

of permutations and combinations. The first column represents the total number of expression

datasets, n, that users may import (this is the search space). The second column represents, r,

the number of expression datasets to compare if zero common genes are reported to be

matched across n datasets. The final column represents the total number of mathematical com-

binations possible for each given value of n and r.

Total number of

expression datasets (n)

Number of intended

expression datasets to

compare when

comparing n datasets

yields no results (r)

n - r

Total number of

combinations of r (Cr)

10 9 1 10

10 8 2 45

10 7 3 120

10 6 4 210

10 55252

10 4 6 210

10 3 7 120

10 2 8 45

9 8 1 9

9 7 2 36

9 6 3 84

9 5 4 126

9 4 5 126

9 3 6 84

9 2 7 36

8 7 1 8

8 6 2 28

8 5 3 56

8 4 4 70

8 3 5 56

8 2628

7 6 1 7

7 5 2 21

7 4 3 35

7 3 4 35

7 2521

6 5 1 6

6 4 2 15

6 3320

6 2 4 15

5 4 1 5

5 3 2 10

5 2 3 10

4 3 1 4

4 2 2 6

3 2 1 3

These combinations of datasets (Cr) are then used as a

criterion to search for common genes across r number of

gene expression profiles when comparing n number of

datasets fail to yield any common genes. However in this

scenario, n number of datasets are still used as the search

space from which all possible combinations (Cr ) of r

datasets are compared to each other in order to increase

the probability of finding a common gene. Once common

genes have been identified using this method, MicroPath

will report the results to the interface.

2.3. Extracting Gene Expression Patterns

Graphically and Statistically

Following the identification of common genes across n

datasets using either of the methods described earlier, the

next stage in the analysis is to generate a graphical repre-

sentation of this expression data from which biologically

meaningful patterns can be extracted. Because signals

pertaining to transcriptome states tend to be diluted over

entire profiles, a specific criterion is required to narrow

down the common genes of interest to include only those

genes that are consistently regulated according to the bio-

logical question. The assumption we have made is that any

given common gene across n datasets can exhibit one of

three specific behaviours. It can either be consistently

upregulated across all datasets, downregulated across all

datasets and up or downregulated across all datasets.

Based on the nature of the specific biological question,

users can select the appropriate pattern from the options,

which will result in a graphical display of those genes

which satisfy the search criteria. Together with this faculty

to graphically extract patterns for individual gene expres-

sion data points, MicroPath also implements the pearsons

correlation coefficient statistical test in order to extract a

global gene expression pattern existing between common

genes pertaining to two individual expression profiles. The

correlations are calculated in a pair-wise manner until

each expression data has been statistically compared to all

other datasets within n, according to the pearsons correla-

tion coefficient equation:

(

)

(

)

() ()

⎟

⎠

⎞

⎜

⎝

⎛−

⎟

⎠

⎞

⎜

⎝

⎛−

−

∑∑

∑

110 M. Khan et al. / J. Biomedical Science and Engineering 2 (2009) 106-116

Each pair-wise score is then finally averaged in order

to provide a global measure of correlation existing be-

tween n expression profiles. Scores are reported from-1

(perfect negative correlation) to+1 (perfect positive cor-

relation).

2.4. High Throughput Molecular Pathway

Analysis

To decipher molecular mechanisms fundamental to the

researcher’s biological question, it is necessary to map

common gene expression profiles of co-expressed genes

to molecular pathways. This is because biological path-

ways reveal molecular dependencies that exist between

genes by illustrating how they collaborate with one an-

other when they participate in specific biological func-

tions. Furthermore, pathways reveal various signalling

cascades that play imperative roles in dictating these

gene associations. In light of this, we have implemented

Micropath to access the Application Programming Inter-

face (API) of the molecular pathway database belonging

to KEGG [13] using SOAP-Lite in order to dynamically

interact with the static pathway maps. Perl scripts were

written for MicroPath to specifically 1) search for user’s

co-expressed genes in all biological pathways, 2) high-

light genes on to pathways, and 3) return the results of

the search to Micropath’s interface (i.e. URL’s of colour

coded pathway maps) (Figure 2). Once MicroPath has

searched for all of the user’s co-expressed genes in all of

the molecular pathways, the URL of each pathway is

displayed on the sub-interface. In order to avoid redun-

dancy issues, the URL for each pathway will highlight

all co-expressed genes that participate in a given path-

way. To help users identify biologically meaningful

pathways relevant to their specific biological question,

MicroPath will calculate the number of genes identified

in a given pathway and 1) express this as a percentage in

relation to the total number of common genes from the

intersection and 2) express this as a percentage in rela-

tion to the total number of genes belonging to that path-

way.

Clicking on these links will generate the specific

KEGG pathway in HTML on which users co-expressed

genes will be highlighted.

Figure 2. Flow diagram of how MicroPath carries out high throughput molecular pathway analysis by connecting to the API of KEGG.

M. Khan et al. / J. Biomedical Science and Engineering 2 (2009) 106-116 111

2.5. Generating and Processing Gene Ex-

pression Datasets

Gene expression datasets used for the purpose of this

article were generated from our in-house microarray ex-

periments as well as published datasets, where the fold

change approach was used to select a set of differentially

expressed genes from pre-processed data. Matchminer

[14] and the Synergizer [15] tools were used to convert

gene Hugo identifiers and long names into Genbank ac-

cession Id’s in order to ensure that the gene identifiers

were of the same type across all datasets prior to com-

parison. Raw expression data was generated, filtered and

normalised using GenePix pro 4.1 [16] and Acuity 4.0

[17] software. Although we used cDNA microarray data

for the purpose of demonstrating MicroPath’s capabilities,

other data types generated from different platforms such

as affymetrix can also be analysed provided Genbank ac-

-cession identifiers are used to represent the genes.

3. RESULTS AND DISCUSSION

Regardless of the biological question, a typical microar-

ray experiment almost always results in the generation of

a set of differentially expressed genes, which represents

genes of most importance to the biologist. Therefore, by

carrying out several biologically related microarray ex-

periments, several sets of differentially expressed genes

would be generated, which would need to be compared

and mined efficiently in order to help answer the bio-

logical questions asked by the investigators from differ-

ent research laboratories around the world. Employing

manual methods of comparison in this situation would be

very inefficient and infeasible. In light of this, to demon-

strate the benefits that can be derived from analysing

multiple gene expression profiles using MicroPath, we

employed datasets generated from our in-house microar-

ray experiments as well as published data. The biological

question related to these studies focussed on unravelling

the underlying molecular mechanisms dictating immune

tolerance by analysing the role of Egr-2 in implicating

T-cell tolerance. Although the Early Growth Response

gene (Egr-2) has been recently characterised as a candi-

date tolerance-inducing transcription factor, which inter-

acts with specific genes in order to induce the state of

T-cell tolerance [18,19], the possibility of further puta-

tive unknown target genes exists that may be vital to the

mechanism of tolerance. Hence, the biological purpose

of our experiments was to attempt to identify such po-

tentially important genes via the comparison of biologi-

cally related expression datasets using MicroPath.

Data consisting of a set of differentially expressed

genes generated from the comparison of tolerance Vs

activated mice CD4+ T cells was obtained from the Ar-

rayExpress website (accession number: e-mexp-283).

The first in-house experiment aimed to generate differ-

entially expressed genes from the comparison of an

un-stimulated T cell line from which the Egr-2 gene had

been knocked out and a wild type un-stimulated cell line.

The second in-house experiment focussed on the com-

parison between an Egr-2 knock-out T cell line activated

with CD3/CD28 for 6 hours and a wild type cell line also

activated with CD3/CD28 for 6 hours. Results generated

from these experiments were then compared with the

aforementioned published tolerance data using MiNer in

order to understand the molecular mechanisms control-

ling immune tolerance.

3.1. Comparison of Gene Expression Pro-

files Pertaining to Immune Tolerance

The first step in the analysis was to subject the above-

mentioned expression profiles to MicroPath in order to

identify genes amongst them that had the same accession

identifiers. Having done this, MicroPath identified 31

differentially expressed genes that were common to all

three expression datasets and generated a graph to de-

lineate their expression values (Table 2, Figure 3). A

simple number crunching exercise was used to perform

this task since its use generated a reasonable number of

common genes, which did not warrant the use of permu-

tations and combinations to perform the search. The next

step was to use these 31 differentially expressed genes as

a search space to determine those genes that have the

potential to be co-expressed. In order to do this, we em-

ployed MicroPath’s graphical utility to extract gene ex-

pression patterns, which led to the identification of 6/31

genes that were found to be upregulated in tolerance Vs

activated CD4+T-cells and downregulated in both

p-KOA0 Vs WTA0 and p-KOA6 Vs WTA6 datasets

(Table 2). The remaining 25 common differentially ex-

pressed genes were found to be highly and lowly ex-

pressed in tolerance and knock-out datasets respectively.

Statistical analysis revealed an overall pearson’s correla-

tion score of 0.109 from the pair-wise comparison of

tolerance data with p-KOA0 Vs WTA0 and a score of

-0.123 from the comparison of tolerance with p-KOA6

Vs WTA6. Furthermore, Reverse Transcriptase PCR

experiments confirmed that 15 genes from our tolerance

Vs activated data were found to be highly expressed in

immune tolerance and from these 15 genes, 8 were found

to be common amongst all three expression profiles (Ta-

ble 2).

Because Egr-2 has been previously characterised and

found to be highly upregulated in immune tolerance,

these results generated from MicroPath are biologically

significant because as expected, those genes that were

highly expressed in our tolerance Vs activated datasets

were found to be insignificantly expressed in our

p-KOA6 Vs WTA6 and p-KOA0 Vs WTA0 datasets

(from which the Egr-2 gene was knocked out of the cell

lines). Amongst these genes, Ap1s1, Shd, Surf6, Vil2,

Lilrb4, Tbx21 and Pdcd1lg2 (Table 2) have been con-

firmed to be upregulated in the process of immune toler-

ance [20], all of which were found to exhibit low expres-

sion values in our knock-out expression datasets. This

consistent gene expression pattern can be seen graphi-

cally in Figure 3. However, from the 31 interesting

common genes, 16 were not confirmed to be involved in

112 M. Khan et al. / J. Biomedical Science and Engineering 2 (2009) 106-116

Table 2. Tabulated overview of gene accession ids, Hugo ids and fold change values belonging to 31 common

genes identified from the comparison of tolerant Vs activated CD4+T cells, p-KOA0 Vs WTA0 and p-KOA6 Vs

WTA6 expression datasets. Entries highlighted in bold represent genes that were found to be up-regulated in tol-

erance Vs activated CD4+ T cells and down-regulated in both p-KOA0 Vs WTA0 and p-KOA6 Vs WTA6 datasets.

Entries with * represent genes that have been confirmed to be highly expressed in tolerance by RT-PCR.

GeneID HUGOID FoldChange(p‐

KOA0VsWTA0)

FoldChan ge(p‐

KOA6VsWTA6)

Fol d Change

(ToleranceVs

activated)

NM_007381

Acadl

0.371336 0.6245256.373

NM_007457

Ap1s1

0.542474 0.31525 4.965

NM_007664

Cdh2

0.243646 ‐0.7999 1.658

NM_008205

H2-M9

‐0.080480.116434 2.857

NM_008972

Ptma

‐1.31334 ‐0.46688 5.42

NM_009128

Scd2

‐0.18816 ‐0.393664.552

NM_009168

Shd

‐0.17495 ‐0.535822.838

NM_009298

Surf6

0.272072 0.1263014.365

NM_009465

Axl

0.149539 1.4758063.836

NM_009510

Vil2

‐0.498240.319645 3.151

NM_010102

Edg6

0.313489 0.1326891.573

NM_010413

Hdac6

‐0.90335‐0.8226 4.745

NM_010548

Il10

3.083863 1.6607393.521

NM_010638

Bteb1

0.024803 ‐0.42533 1.613

NM_011125

Pltp

‐0.5354‐0.71558 4.363

NM_011620

Tnnt3

‐0.616460.035844 1.665

NM_011696

Vdac3

‐0.980840.191964 4.701

NM_011705

Vrk1

0.466922 ‐0.34601 2.032

NM_013488

Cd4

0.584494 0.4202774.905

NM_013490

Chka

‐2.13728 ‐0.694585.677

NM_013532

Lilrb4

0.792335 1.1108982.111

NM_013615

Odf2

2.776384 3.0044494.809

NM_013814

Galnt1

‐0.477520.500297 2.246

NM_013866

Zfp385

0.118995 0.4285911.664

NM_016772

Ech1

‐0.06660.053081 4.284

NM_019507

Tbx21

0.124767 ‐0.32731 1.595

NM_019561

Ensa

0.778767 ‐0.44703 1.718

NM_019777

Ikbke

0.291602 ‐0.00772 1.609

NM_020027

Bat2

0.291219 ‐0.23966 5.091

NM_021396

Pdcd1lg2

1.140087 0.0791823.921

NM_021538

Cope

0.154049 0.2645412.035

tolerance by RT-PCR yet some of them also exhibited a

coherent pattern of gene expression. For example, Ptma,

Scd2, Hdac6, Pltp and Chka were all highly expressed in

tolerance and conversely downregulated in both knock

out datasets. There is a possibility that these genes may

also be insignificantly expressed due to the absence of

Egr-2. However, conducting RT-PCR for these specific

genes would be required in order to confirm that their

over-expression results in T-cell tolerance.

3.2. Deciphering Gene Regulatory Networks

of Co-Xpressed Genes Via High Throug

-Hput Molecular Pathway Analysis

The final stage of the analysis entails using MicroPath’s

function to connect to the Application Programming In-

terface (API) of KEGG via SOAP-Lite in order to carry

out high throughput molecular pathway analysis. There-

fore, for this stage in the analysis, we used MicroPath to

map 31 of our co-expressed interesting genes to KEGG

pathways and from these 31 genes, 14/31 were identified

in a total of 31 molecular pathways (Table 3). Interest-

ingly, several of these pathways were related to the study

of immunology and illustrated biological networks such

as MapKinase, Jak-Stat, T-cell receptor signalling and

Cytokine-cytokine interactions. More specifically, the

Pdcd1lg2 gene (accession id: NM_021396) was identi-

fied in the Cell Adhesion Molecules (CAM) pathway

(Table 3) and studies have confirmed that the over-ex-

pression of Pdcd1lg2 has resulted in consistently low

levels of Interleukin-2 (IL-2) in naive CD4(+) T-cells

[21]. Further studies have correlated the over-expression

of this gene to the negative regulation of T-cell activation.

In one particular study, PDL2 (Pdcd1lg2) deficient mice

were created in order to characterise the function of this

gene in T-cell activation and tolerance, and results gen-

erated from this study suggested that Antigen-presenting

cells from PDL2-deficient mice were found to be more

potent in activating T-cells in vitro when compared to the

wild-type counterparts [22]. These findings are conclu-

sive and correlate well with the results generated from

our in-house microarray experiments because using Mi-

croPath to compare all three of our datasets followed by

extracting gene expression patterns from them resulted in

an important finding that Pdcd1lg2 was not only found to

be over-expressed in tolerance (fold change of 3.921),

but it was also under-expressed in our KOA0 Vs WTA0

and KOA6 Vs WTA6 knock-out datasets (with a fold

change of 1.140 and 0.079 respectively) (Table 2). This

M. Khan et al. / J. Biomedical Science and Engineering 2 (2009) 106-116 113

particular finding is in agreement with the aforemen-

tioned studies, concluding that Pdcd1lg2 has a negative

inhibitory role towards the process of T-cell activation.

In addition, molecular pathway analysis of the Inter-

leukin-10 (IL-10) gene using MicroPath, identified its

role in the Cytokine-cytokine interaction, Jak-STAT and

T-cell receptor signalling pathways; all three of which

are important immunological pathways. IL-10 is a well

known cytokine, which has previously been shown to

successfully induce immune tolerance in Dendritic Cells

[23]. Results generated from MicroPath revealed that

IL-10 was highly expressed in our tolerance data with a

fold change of 3.521, which was found to be expressed

lower in our KOA0 Vs WTA0 profile (fold change:

3.084). Interestingly, following activated with

CD3/CD28 for 6 hours, its expression dropped signifi-

cantly to 1.66, perhaps attributable to the absence of

Egr-2. Likewise, other genes from the 31 co-expressed

interesting genes show similar patterns of expression and

perhaps may be candidate genes for Egr-2 mediated

T-cell tolerance. However, this is yet to be confirmed by

publications. Finally, the pathway analysis function of

MicroPath was used to calculate the percentage of genes

identified in each pathway in relation to 1) the intersec-

tion of common genes and 2) the total number of genes

comprising each pathway. From the results, the Cell Ad-

hesion Molecules (CAM) pathway was particularly sig-

nificant since 12.91% of the overall pathway was af-

fected by 6.84% of genes common to all 3 expression

profiles (Table 4).

Figure 3. A preliminary graphical overview of common interesting genes generated from the comparison of tolerant

Vs activated CD4+ T cells (green), p-KOA0 Vs WTA0 (red) and p-KOA6 Vs WTA6 (blue) expression datasets. It can

be seen that genes that are highly expressed in tolerance appear to be expressed poorly in the knock-out datasets.

This pattern is consistent throughout the 31 gene expression data points.

Table 3. Tabulated data generated from high throughput molecular pathway analysis of co-regulated genes. 14/31

common interesting genes were identified in a total of 31 molecular pathway maps of KEGG.

GenBank

Accession ID HUGO ID Pathway IDTotal No of

pathways

GenBank

Accession ID

HUGO

ID Pathway ID Total No of

pathways

NM_007381 Acadl

mmu00071

mmu00280

mmu00410

mmu00640

mmu03320

5 NM_009510 Vil2

mmu04670

mmu04810 2

NM_007664 Cdh2 mmu045141 NM_008205 H2-M9

mmu04514

mmu04612

mmu04940

NM_013488 Cd4

mmu04514

mmu04612

mmu04640

mmu04660

4 NM_013814 Galnt1mmu00512

mmu01030 2

NM_011696 Vdac3 mmu040201 NM_019777 Ikbke

mmu04010

mmu04620 2

NM_011125 Pltp mmu03320 1 NM_010102 Edg6 mmu04080 1

NM_016772 Ech1

mmu00350

mmu00362

mmu00628

3 NM_021396 Pdcd1lg2mmu04514 1

NM_010548 Il10

mmu04060

mmu04630

mmu04660

3 NM_013652 Ccl4

mmu04060

mmu04620 2

114 M. Khan et al. / J. Biomedical Science and Engineering 2 (2009) 106-116

The fundamental strength of MicroPath stems from

the implementation of a novel search strategy for the

comparison of multiple gene expression profiles. Al-

though there are a few software that cater for multiple

gene expression comparison, there is currently no

software that searches for common genes beyond sim-

ple number crunching methods of comparison (Table

5). Just because a direct comparison of a given num-

ber of datasets may not yield any common genes, it

does not mean that the analysis should end here since

there is a potential to identify common genes across

n-1 profiles. MicroPath ensures that such genes are

identified, which current software would overlook.

When coupled with other important functions such as

pattern extraction and pathway analysis, it becomes

apparent that MicroPath would offer valuable assis-

tance to biologists wanting to decipher their high

throughput data.

Table 4. Results generated from pathway analysis showing the extent to which each pathway is affected by common genes from the

intersection. The percentages reflect the proportion of common genes that contribute towards controlling the proportion of each pathway.

Pathway ID Pathway Name GenBank

Accession ID Result from Analysis

mmu00071 Fatty Acid Metabolism NM_007381 3.26% of genes contribute 8.45% role in pathway

mmu00280 Valine, leucine and isoleucine degradation NM_007381 3.26% of genes contribute 2.73% role in pathway

mmu00410 Beta Alanine Metabolism NM_007381 3.26% of genes contribute 7.14% role in pathway

mmu00640 Propanoate Metabolism NM_007381 3.26% of genes contribute 5.88% role in pathway

mmu03320 PPAR Signalling Pathway NM_007381 3.26% of genes contribute 1.92% role in pathway

mmu04514 Cell Adhesion Molecules

NM_007664

NM_008205

NM_013488

NM_021396

12.91% of genes contribute 6.84 % role in pathway

mmu04612 Antigen Processing & Presentation NM_013488 3.26% of genes contribute 2.44% role in pathway

mmu04640 Hematopoietic Cell Lineage NM_013488 3.26% of genes contribute 0.76 % role in pathway

mmu04660 T Cell Receptor Signalling Pathway NM_013488

NM_010548 6.45 % of genes contribute 3.33 % role in pathway

mmu04020 Calcium Signalling Pathway NM_011696 3.26% of genes contribute 2.33 % role in pathway

mmu00350 Tyrosine Metabolism NM_016772 3.26% of genes contribute 2.17 % role in pathway

mmu04060 Cytokine-cytokine receptor interaction NM_010548

NM_013652 6.45 % of genes contribute 0.73 % role in pathway

mmu04630 JAK-STAT Signalling Pathway NM_010548 3.26% of genes contribute 3.85 % role in pathway

mmu04670 Leukocyte Transendothelial Migration NM_009510 3.26% of genes contribute 1.25 % role in pathway

mmu04810 Regulation of Actin Cytoskeleton NM_009510 3.26% of genes contribute 1.47 % role in pathway

mmu04940 Type I Diabetes Mellitus NM_008205 3.26% of genes contribute 4.35 % role in pathway

mmu00512 O-Glycan Biosynthesis NM_013814 3.26% of genes contribute 10 % role in pathway

mmu04010 MAPK Signalling Pathway NM_019777 3.26% of genes contribute 0.83 % role in pathway

mmu04620 Toll-Like Receptor Signalling Pathway NM_019777

NM_013652 6.45% of genes contribute 1.32 % role in pathway

mmu04080 Neuroactive Ligand-Receptor Interaction NM_010102 3.26% of genes contribute 1.15 % role in pathway

Table 5. Functional comparison of MicroPath to similar software packages and applications.

Function MicroPath EXPANDER

[24]

INCLUSIVE

[25]

Pathway

Studio [26] KEGG [13] BioCarta

[27]

MaXlab

[28]

Suitable for high throughput

data analysis YES YES YES YES NO NO YES

Suitable for comparing multiple

gene expression profiles YES YES NO YES NO NO YES

Implementation of efficient

algorithm to search for common

genes from n-1 datasets

YES NO NO NO NO NO NO

Graphical representation of gene

expression values from multiple

datasets

YES NO NO NO NO NO YES

Pattern extraction from Graph

data YES NO NO NO NO NO NO

Construction of pathway maps YES NO NO YES YES YES NO

Mapping gene expression data

to pathway maps YES NO NO YES NO NO NO

User interactive software (S) or

Database (D) S S S S D D S

M. Khan et al. / J. Biomedical Science and Engineering 2 (2009) 106-116 115

4. Conclusion

In this article, we have illustrated the potential benefits

that can be derived from using MicroPath for the analy-

sis of multiple gene expression profiles. Each function of

the software has been developed to streamline the overall

analysis pipeline, providing users with a walkthrough of

how their data is biologically deciphered. Here, we have

applied to our software, microarray datasets generated

from different laboratories pertaining to the molecular

mechanisms underlying immune tolerance. However,

MicroPath is capable of analysing data for any given

biological question, whether the datasets are taken from

public repositories such as ArrayExpress or generated

from in-house microarray experiments. We believe that

its faculty to use both number crunching and permuta-

tions and combinations as the search strategy to identify

the intersection of common genes, coupled with its func-

tion to extract gene expression patterns graphically and

statistically makes this an attractive software for biolo-

gists to use. Finally, its ability to carry out live streaming

of mapping genes to biological pathways makes it a use-

ful tool for the automation of multiple gene expression

analysis.

Availability and requirements

Project name: MicroPath

Project home page: www.1066technologies.co.uk/mi-

cropath

Operating system(s): MicroPath has been tested on

Windows 2000, XP and Vista

Programming language: Visual Basic.Net, Perl

Other requirements: None

License: N/A

Any restrictions to use by non-academics: No

Acknowledgements

This study was supported by grants from the UK Medical Research

Council (MRC) (Grant number: G0300520).

REFERENCES

[1] U. Sarkans, H. Parkinson, G. G. Lara, A. Oezcimen, A. Sharma,

N. Abeygunawardena, S. Contrino, E. Holloway, P. Rocca-

Serra, G. Mukherjee, M. Shojatalab, M. Kapushesky, S. A. San-

sone, A. Farne, T. Rayner and A. Brazma. (2005) The ArrayEx-

press gene expression database: a software engineering and im-

plementation perspective. Bioinformatics 21(8): 1495- 1501.

[2] T. Barrett and R. Edgar. (2006) Mining Microarray Data at

NCBI’s Gene Expression Omnibus (GEO). Methods Mol Biol

338: 175-190.

[3] D. Ghosh, Barette, T. R., Rhodes, D. and Chinnaiyan, A. M.

(2003). Statistical issues and methods for meta-analysis of mi-

croarray data, A case study in prostate cancer. Funct. Integr.

Genomics 3, 180-188.

[4] D. R. Rhodes, T. R. Barrette, M. A. Rubin, D. Ghosh and A. M.

Chinnaiyan, (2002). Meta-analysis of microarrays: Interstudy

validation of gene expression profiles reveals pathway dysregu-

lation in prostate cancer. Cancer Res. 62, 4427-4433.

[5] D. R. Rhodes, J. Yu, K. Shanker, N. Deshpande, R. Varambally,

D. Ghosh, T. Barrette, A. Pandey and A. M. Chinnaiyan (2004).

Large-scale meta-analysis of cancer microarray data identifies

common transcriptional profiles of neoplastic transformation and

progression. Proc. Natl. Acad. Sci. USA 101, 9309-9314.

[6] J. Wang, K. R. Coombes, W. E. Highsmith, M. J. Keating and L.

V. Abruzzo (2004). Differences in gene expression between

B-cell chronic lymphocytic leukemia and normal B cells: A

meta-analysis of three microarray studies. Bioinformatics 20,

3166-3178.

[7] S. Draghici, P. Khatri, A. L. Tarca, K. Amin, A. Done, C.

Voichita, C. Georgescu and Romero, R. (2007). A systems biol-

ogy approach for pathway level analysis. Genome Res. 17,

1537-1545.

[8] J. Stelling, (2004). Mathematical models in microbial systems

biology. Curr. Opin. Microbiol. 7, 513-518.

[9] G. Joshi-Tope, M. Gillespie, I. Vasrik, P. D’Eustachio, E.

Schmidt, B. de Bone, B. Jassal, G. R. Gopinath, G. R. Wu, L.

Matthews, et al. (2005). A knowledgebase of biological path-

ways. Nucleic Acids Res. 33, D428-D432.

[10] A. Subramanian, P. Tamayo, V. K. Mootha, S. Mukherjee, B. L.

Ebert, M. A. Gillette, A. Paulovich, S. L. Pomeroy, T. R. Golub,

E. S. Lander, et al. (2005). Gene set enrichment analysis: A

knowledge-based approach for interpreting genome-wide ex-

pression profiles. Proc. Natl. Acad. Sci. 102: 15545-15550.

[11] S. Khalid, F. Fraser, M. Khan, P. Wang, X. Liu and S. Li,

(2006a). Analysing Microarray Data using the Multi-functional

Immune Ontologiser. J. Integrative Bioinformatics 3, 25.

[12] S. Khalid, M. Khan, P. Wang, X. Liu and S. -L. Li, (2006b).

Application of bioinformatics in the design of gene expression

microarrays. Second International Symposium on Leveraging

Applications of Formal Methods, Verification and Validation

(isola 2006), pp. 146-160.

[13] M. Kanehisa, S. Goto, S. Kawashima, Y. Okuno and M. Hattori,

(2004) The KEGG resource for deciphering the genome. Nucleic

Acids Res, 32.

[14] K. J. Bussey, D. Kane, M. Sunshine, S. Narasimhan, S. Nishi-

zuka, W. C. Reinhold, B. Zeeberg, W. Ajay and J. N. Weinstein,

(2003) MatchMiner: a tool for batch navigation among gene and

gene product identifiers. Genome Biology. 4, R27.

[15] G. F. Berriz and F. P. Roth, The Synergizer service for translat-

ing gene, protein, and other biological identifiers. (2008). Bio-

informatics. [Epub ahead of print].

[16] GenePix pro 4.1: http://www.axon.com

[17] Acuity 4.0: http://www.moleculardevices.com/pages/software/

gn_acuity.html

[18] M. Safford, S. Collins, M. A. Lutz, A. Allen, C. Huang, J.

Kowalski, A. Blackford, M. R. Horton, C. Drake, R. H.

Schwartz and J. D. Powell, (2005) Egr-2 and Egr-3 are negative

regulators of T cell activation. Nature Immunology 6 472-480.

[19] L. E. Warner, J. Svaren, J. Milbrandt and J. R. Lupski, (1999)

Functional consequences of mutations in the early growth re-

sponse 2 gene (EGR2) correlate with severity of human myeli-

nopathies. Hum. Mol. Genet. 8 1245-1251.

[20] P. O. Anderson, B. A. Manzo, A. Sundstedt, S. Minaee, A. Sy-

monds, S. Khalid, M. E. Rodriguez-Cabezas, K. Nicolson, S. Li,

D. C. Wraith and P. Wang, (2006) Persistent antigenic stimula-

tion alters the transcription program in T cells, resulting in anti-

gen-specific tolerance. European Journal of Immunology. 36,

1374-85.

[21] H. Kuipers, F. Muskens, M. Willart, D. Hijdra, F. B. van As-

sema, A. J. Coyle, H. C. Hoogsteden and B. N. Lambrecht

(2006). Contribution of the PD-1 ligands/PD-1 signaling path-

way to dendritic cell-mediated CD4 (+) T cell activation. Euro-

pean Journal of Immunology. 36 (9), 2472-82.

[22] Y. Zhang, Y. Chung, C. Bishop, B. Daugherty, H. Chute, P.

Holst, C. Kurahara, F. Lott, N. Sun, A. A. Welcher and C.

Dong, (2006). Regulation of T cell activation and tolerance

by PDL2. Proc Natl Acad Sci U S A, 103(31), 11695-11700.

116 M. Khan et al. / J. Biomedical Science and Engineering 2 (2009) 106-116

[23] X. Li, K. Dou, H. Liu, F. Zhang and L. Cai, (2007). Immune

tolerance induced by IL-10 and methylprednisolone modified

dendritic cells in vitro. Chinese Journal of cellular and molecular

Immunol. 23 (5), 436-8.

[24] R. Shamir, A. Maron-Katz, A. Tanay, C. Linhart, I. Steinfeld, R.

Sharan, Y. Shiloh and R. Elkon, (2005) EXPANDER-an integra-

tive program suite for microarray data analysis. BMC Bioinfor-

matics, 6: 232.

[25] G. Thijs, Y. Moreau, F. D. Smet, J. Mathys, M. Lescot, S. Rom-

bauts, P. Rouze, B. D. Moor and K. Marchal, (2002) INCLUSive:

Integrated Clustering, Upstream sequence retrieval and motif

Sampling. Bioinformatics, 18, 331-332.

[26] A. Nikitin, S. Egorov, N. Daraselia and I. Mazo,. (2003) Path-

way studio-the analysis and navigation of molecular networks.

Bioinformatics, 19, 2155-2157.

[27] BioCarta, Charting pathways of life. http://www.biocarta.com.

[28] S. Khalid, M. Khan, C. B. Gorle, K. Fraser, P. Wang, X. Liu and

S. Li, MaXlab: A novel application for the cross comparison and

integration of biological signatures from microarray studies. In

Silico Biology 8, 0029: 2008.