J. Abraham, T. LaFramboise / Open Journal of Genetics 2 (2012) 131-135
consequence of multiple testing.
3. RESULTS
The analysis described in the previous section was performed
on all three data sets. For a preliminary analysis of the
difference between the populations we focused on p-values
less than 0.005 and on correlations between CNPs on
different chromosomes. CNPs on different chromosomes may be
considered truly independent, so any correlation found
between them is an indication of the nonrandom nature of
Copy Number Variants. It was
found that there were 33 such p-values in the YRI
population, 5 in the CEU population and just 3 in the
CHBJPT population. The range of p-values also differs: the
lowest p-value is 1.678 × 10⁻⁶ in the YRI population,
1.127 × 10⁻³ in the CEU population and 1.169 × 10⁻³ in the
CHBJPT population. It is also noteworthy that the 7 lowest
p-values in the YRI analysis are smaller than the smallest
p-value in either of the other populations. This suggests
that the correlation structure between CNPs may differ
between populations. However, it might simply reflect the
fact that the number of tests performed is much larger in
the YRI population than in the other two populations.
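The caveat about the number of tests can be made concrete with a
minimal sketch. It assumes the pairwise p-values have already been
computed; the record layout PairTest and the function
screen_interchromosomal are hypothetical names introduced here for
illustration and do not come from the original analysis. The sketch
keeps only inter-chromosomal pairs, lists those below the 0.005
threshold, and reports how many such hits would be expected by chance
if the p-values were roughly uniform under the null hypothesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairTest:
    """One pairwise correlation test between two CNPs (hypothetical layout)."""
    cnp_a: str
    chrom_a: str
    cnp_b: str
    chrom_b: str
    p_value: float


def screen_interchromosomal(tests, alpha=0.005):
    """Return inter-chromosomal hits below alpha and the chance expectation."""
    # Keep only pairs whose two CNPs lie on different chromosomes.
    inter = [t for t in tests if t.chrom_a != t.chrom_b]
    # Hits are sorted so that the smallest p-value comes first.
    hits = sorted((t for t in inter if t.p_value < alpha),
                  key=lambda t: t.p_value)
    # Assumption: under the null, p-values are roughly uniform, so about
    # alpha * m of m tests fall below alpha purely by chance.
    expected_by_chance = alpha * len(inter)
    return hits, expected_by_chance
```

Because the chance expectation grows linearly with the number of
tests, a population with many more testable CNP pairs will show more
small p-values even when no real correlation is present.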
To rule out this possibility, the permutation test described
in the previous section was carried out. To be conservative,
the permuted tests included correlations between CNPs both
on the same chromosome and on different chromosomes when
deciding which small p-values could be a consequence of
multiple testing. At a significance level of 0.05, none of
the associations found in the CEU and CHBJPT samples was
significant after compensating for multiple testing.
However, the most significant association found in the YRI
sample (p-value 1.678 × 10⁻⁶) remained significant, with a
p-value < 0.02, even after taking multiple testing into
account. This association is between polymorphisms on
chromosome 6 (hg17 coordinates 202,353 to 326,149) and on
chromosome 16 (hg17 coordinates 33,208,395 to 33,618,281),
with minor level frequencies of 0.1 and 0.15. Such minor
level frequencies are not atypical of the other two
populations; what distinguishes these two CNPs in the YRI
sample is the extent to which Copy Numbers at these
locations are correlated. These CNPs have identifiers 902
and 2172 in [12]. Using 2 as the baseline for defining no
extra copies in [12], all the YRI individuals could be
considered to have either one or two extra copies at these
locations. In the YRI population, unrelated individuals with
two extra copies at one location tend to have two extra
copies at the other location.
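The multiple-testing correction used above can be illustrated with a
generic max-statistic permutation scheme. This is a sketch under
stated assumptions, not the procedure of the previous section, which
may differ in its test statistic and permutation design. The helper
names pearson, max_abs_corr and permutation_adjusted_p are
hypothetical, and Pearson correlation is assumed as the pairwise
statistic. Each CNP's copy numbers are shuffled independently across
individuals, which removes any association between CNPs while keeping
each CNP's level frequencies fixed; the strongest observed correlation
is then compared with the strongest correlation in each permuted data
set.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def max_abs_corr(columns):
    """Largest |r| over all pairs of CNP columns."""
    return max(abs(pearson(a, b)) for a, b in combinations(columns, 2))


def permutation_adjusted_p(columns, n_perm=2000, seed=0):
    """Family-wise p-value for the strongest observed pairwise correlation.

    columns: one list of copy numbers per CNP, with individuals in the
    same order in every list (assumed input layout).
    """
    rng = random.Random(seed)
    observed = max_abs_corr(columns)
    exceed = 0
    for _ in range(n_perm):
        # Shuffle every CNP independently to break inter-CNP association.
        shuffled = [rng.sample(col, len(col)) for col in columns]
        if max_abs_corr(shuffled) >= observed:
            exceed += 1
    # The add-one correction keeps the estimate strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)
```

Called on a list of per-CNP copy-number vectors, for example
permutation_adjusted_p([cnp_x, cnp_y, cnp_z]), the function returns
the largest absolute correlation and its permutation-adjusted
p-value. Because every permuted data set contributes its own maximum
over all pairs, the adjusted p-value accounts for the number of pairs
tested.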
4. DISCUSSION
Based on the results of the previous section we have
evidence for correlations in Copy Number Variants which are
statistically significant, and whose statistical
significance varies from population to population. This
represents a novel approach for analyzing Copy Number
Variants within a population as well as for contrasting the
patterns of copy number variation in different populations.
Our methodology is conceptually similar to using the
structure of observed Linkage Disequilibrium (LD) between
markers, and not just marker allele frequencies, in order to
compare different populations. Furthermore,
the only significant long-range correlations between CNPs
were found in the population where LD between markers has
the shortest range. It is also interesting to note that no
significant correlations were found in the largest of all
the samples, the CHBJPT sample. This is not what one would
expect if significant p-values were determined only by
sample sizes. Thus the differences observed between
populations cannot be attributed to sample size alone, but
may have their origins in differences in population
histories. In Drosophila melanogaster, for example, the
pattern of Copy Number Variation is influenced by natural
selection [11]. Selection can also give rise to long-range
correlations between markers [14]; this suggests that the
strong correlations observed in the YRI population could be
driven by population genetic events unique to that
population. In this regard, it is
noteworthy that the region on chromosome 6 that we
identified overlaps with the location of the DUSP22 gene,
which participates in the JNK signalling pathway [15], whose
role in cancer proliferation [16] is well documented. Any
possible signals of selection in this region would be of
considerable interest and worthy of further study.
5. ACKNOWLEDGEMENTS
KJA was supported during the course of this investigation by the
United States Department of Agriculture, National Research Initiative
Grant USDA NRI-2009-03924 and also by the program Professor
Visitante do Exterior of Coordenação de Aperfeiçoamento de Pessoal
de Nível Superior (CAPES), Brasil. In addition, KJA wishes to thank
Prof. Cheryl Thompson for valuable and encouraging discussions.
REFERENCES
[1] Sebat, J., Lakshmi, B., Troge, J., et al. (2004) Large-scale
copy number polymorphism in the human genome. Science,
316, 445-449. doi:10.1126/science.1138659
[2] Iafrate, A.J., Feuk, L., Rivera, M.N., et al. (2004) Detection
of large-scale variation in the human genome. Nature
Genetics, 36, 949-951. doi:10.1038/ng1416
[3] Feuk, L., Carson, A.R. and Scherer, S.W. (2006) Structural
variation in the human genome. Nature Reviews Genetics,
7, 85-97.
[4] Cooper, G.M., Nickerson, D.A. and Eichler, E.E. (2007)
Copyright © 2012 SciRes. OPEN ACCESS