The genome of herpes simplex virus type 1 is prone to form short repeat sequences

doi:10.4236/jbm.2013.13006

Paper Menu >>

Journal Menu >>

Journal of Biosciences and Medicines, 2013, 1, 26-30 JBM

http://dx.doi.org/10.4236/jbm.2013.13006 Published Online December 2013 (http://www.scirp.org/journal/jbm/)

OPEN ACCESS

The genome of herpes simplex virus type 1 is prone to form

short repeat seq uen c es

Xiangyan Zhao1*, Xiaolong Wu1*, Lv Qin2, Zhongyang Tan1#, Shifang Li2, Qingjian Ouyang1,

You Tian1

1College of Biology, State Key Laboratory for Chemo/Biosensing and Chemometrics, Hunan University, Changsha, China

2State Key Laboratory of Biology of Plant Diseases and Insect Pests, Institute of Plant Protection, Chinese Academy of Agricultural

Sciences, Beijing, China

Email: #zhongyang@hnu.edu.cn

Received September 2013

ABSTRACT

Herein, we report a very high content of simple se-

quence repeats (SSRs) covering 66.12% of the herpes

simplex virus type 1 (HSV-1) genome when a low

threshold is adopted to define SSRs, indicating that

repeat sequence is a very important character of the

HSV-1 genome. The repeats with two iterations ac-

count for 68.33% of the total repeats. In reality, the

genome of HSV-1 is prone to form shorter repeat se-

quences. For mono-, di- and trinucleotide repeats, the

repeat numbers decreased with the increase of re-

peats iterations, implicating that the formation ten-

dency of SSRs might be from low iterations to high

iterations. The high iterations SSRs might have sub-

jected to strong selected pressure and survived to

perform different functions. The analysis suggested

that the repeats formation may be an essential evolu-

tionary driving force for the HSV-1 genome, and the

results might be helpful for studying the genome

structure, repeats genesis and genome evolution of

HSV-1.

Keywords: Simple Sequence Repeat; HSV-1 Genome;

Microsatellite; SSR

1. INTRODUCTION

Herpes simplex virus type 1 (HSV-1) is a member of the

Simplexvirus genus in the Herpesviridae family and it is

widely distributed in the human population. HSV-1 is the

leading cause of viral induced blindness [1]; it gives rise

to a variety of clinical disorders and is a major cause of

morbidity and mortality worldwide [2]. The global pre-

valence of HSV-1 is approximately 90%, including 65%

more or less in the USA [3] and 52% - 67% in northern

Europe [4]. Like other creatures, the HSV-1 genome

contains amounts of repeats. What’s interesting is that

the content of GC repeats is far high in the HSV-1 ge-

nome, which may be related to some pathogenesis of

HSV-1 [5]. To our knowledge, the small scope of muta-

tion in repeated regions can cause many diseases [6]; in

addition, expansion or contraction of repeats can change

the sequence length [7]. Therefore, researching the re-

petitive sequences we can learn something not only

about the interrelationship between genome structure and

pathogenesis, but about genome evolution of the HSV-1

from the perspective of molecular biology.

In recent decades, increased attention has been paid to

genomic repeat sequences. According to differential

lengths of repetitive units, the repeat sequences are usu-

ally divided into three basic types: microsatellites, mini-

satellites and satellite DNA [8]. Mono-, di-, tri-, te- tra-,

penta- and hexanucleotide repeats are usually classified

as microsatellite-also referred to as short tandem repeats

(STRs) or simple sequence repeats (SSRs) [9], and it

widely studied in the genomes of prokaryotes and euka-

ryotes. Repeats of longer units are referred to as minisa-

tellites, the extreme long units, called satellite DNA [6].

Long tandem repeats are observed to be hypermutable,

but are rare in exons and only occasionally related with

diseases in human [7,10]. However, STRs are extremely

common and scattered in both coding and non-coding

regions in eukaryote [11], prokaryote [12] and also viral

[13] genomes. Why are the SSRs so common? Do they

perform some functions or are they just “junk DNA”

sequences that should perhaps be regarded as “selfish

DNA”? Exploring these questions is important to under-

stand how genomes originated, organized and expanded.

In previous studies, microsatellites were researched by

using the statistics of relative abundance, relative density,

composition and location, etc. In this paper, however, we

mainly use repeat iterations to study repeats of short

length. Different research ers used different thresholds

*Xiaolong Wu a nd Xiangyan Zhao are the C o-First Author.

#Corresponding author.

X. Y. Zhao et al. / Journal of Biosciences and Medicines 1 (2013) 26-30

and indicators to identify a motif as a simple repeat, and

no real agreement has been reached on this issue; such as

someone applied a minimum number of base pairs, whe-

reas, others used a minimum number of repeat units [6].

All these thresholds have made mor e important contribu -

tions in the history of exploring the genome repeat se-

quences. Here, a relatively low threshold is adopted to

study the repeat sequences in the HSV-1 genome, and the

analysis may provide new insight into roles of repeat

sequences in genome origination, organization and evo-

lution.

2. MATERIALS AND METHODS

2.1. Sequences

The HSV-1 genome sequence was selected for analysis

of relationship between repeat sequence and molecular

evolution in the level of genome-wide. The genome se-

quence with FASTA format was downloaded from the

NCBI, and its accession number is NC_001806.1. Addi-

tionally, ten genes were randomly selected from the

HSV-1 genome in order to make a survey whether the

repeats distributed evenly among different fragments.

2.2. Repeats Extraction

A program called IMEx (imperfect microsatellite extrac-

tor) was selected, which can be used to extract perfect

microsatellites [14]. All thresholds of repeat iterations

were set to 2 for mono-, di-, tri-, tetra-, penta- and hex-

anucleotide repeats in this paper. We detected perfect

microsatellites using the “Advanced mode” of IMEx.

Basic information of SSRs in Supplementary Table 1

Table 1. Percentage of repetitive sequences in the entire ge-

nome and ten randomly selected genes of HSV-1.

Gnome

/Gene Starta End G. L.

(bp)b R. L. (bp)c

P (%)d

Entire

genome 1 152,261 152,261 100,674 66.12

UL2 9884 10,888 1005 670 66.67

UL8 18,224 20,476 2253 1476 65.51

UL9 20,704 23,259 2556 1563 61.15

UL17 31,386 33,497 2112 1351 63.97

UL25 48,813 50,555 1743 1102 63.22

UL36 71,049 80,468 9420 6178 65.58

UL44 96,311 97,846 1536 969 63.09

UL52 109,048 112,224 3177 2049 64.49

RS1 127,232 131,128 3897 2812 72.16

US8 141,243 142,895 1653 1056 63.88

aLocation of genes in the HSV-1 genome; bLength of genome/gene se-

lected for analysis; cLength of repeats in the corresponding genome/genes;

dPercentage, it equals that the repeats length divide by the lengths of the

genome or gene. For example, the percentage of gene UL2 = 670/1005 =

66.67%.

and Supplementary Table 2 is referenced in detail.

3. RESULTS AND DISCUSSION

When detecting a sequence for microsatellite, definition

of the minimum number of repeat iterations is an impor-

tant empirical criterion. In most previous studies, the

minimum number of iterations were set to 6, 3, 3, 3, 3

and 3 for mono-, di-, tri, tetra-, penta- and hexanucleo-

tide repeats respectively [15,16], or a repeat spanned a

minimum of 12 nt [6,9,17] or longer [11]. Theoretically,

however, as long as the iteration is more than once, the

motif should be called a repeat sequence. What’s more

important, the formation of repeats with high iterations

isn’t only one step in evolution. Therefore, many shorter

repeats were neglected and a large number of useful in-

formation were not excavated in their studies. Here, set-

ting the threshold value of all kinds of repeat iterations to

2, the results suggested that it significantly increased the

amount of simple repeats, and the results also have more

significance of evolution.

3.1. Repeat Content in Panoramic Scope

In order to give a snapshot of various repeats in the entire

genome, a fragment was randomly selected from the

HSV-1 genome, in which repeats were painted in differ-

ent color depending on difference of iterations (Figure 1).

The figure showed that SSRs were distributed relatively

equally throughout the genome and presented a mosaic-

shaped. Moreover, the longer SSRs (such as pentanuc-

leotide, hexanucleotide repeats…) are more abundant in

short and long terminal repeats (TRL and TRS), com-

pared with unique long (UL) and unique short regions

(US) (data not shown). The observations are consistent

with many previous studies about microsatellites, where

they demonstrated that the SSRs distribution is not ran-

dom in the genome.

Previously, many reports have revealed that a large

number of SSRs are located in transcribed regions of

genomes, including expressed sequence tags (ESTs) and

protein-coding genes [18]. For example, it has been

found that: 10% of identified SSRs in primate [9], 15%

in rabbit [19] located in the open reading frames (ORFs)

or protein-coding genes. In protein-coding regions of all

known proteins, 14% proved to contain repeated se-

quences, with a three times higher abundance of repeats

in eukaryotes as in prokaryotes [20]. However, w e found

31,705 repetitive sequences in the HSV-1 genome under

the criterion of setting the minimum number of iterations

to 2 for mono- ~hexanucleotide repeats (Table 1, Sup-

plementary Table 1). The 31,705 repeat tracts are

100,674 bp in nucleotide length covering 66.12% of the

full genome (Table 1, Supplementary Table 2). In pre-

vious study of the HSV-1 genome, a total of only 1377

X. Y. Zhao et al. / Journal of Biosciences and Medicines 1 (2013) 26-30

Table 2. The count of repeats with different iterations in different repeat motifs of the HSV-1 genome.

Repeat

motif

Iterations Total

2 3 4 5 6 7 8 9 10 11 ≥12

Mono- 16,008a

(62.67)b 5646

(22.10) 2171

(8.50) 1057

(4.14) 417

(1.63) 181

(0.71) 28

(0.11) 9

(0.04) 8

(0.03) 10

(0.04) 8

(0.03) 25,543

[80.56]c

Di- 3256

(90.60) 283

(7.87) 43

(1.20) 8

(0.22) 3

(0.08) 1

(0.03) - - - - - 3594

[11.33]

Tr i- 1776

(92.60) 119

(6.20) 16

(0.83) 4

(0.21) 2

(0.10) 1

(0.05) - - - - - 1918

[6.05]

Tetra- 370

(95.61) 17

(4.39) - - - - - - - - - 387

[1.22]

Penta- 161

(97.58) 2

(1.21) - - 2

(1.21) - - - - - - 165

[0.52]

Hexa- 94

(95.92） 3

(3.06) - - - - - - - - 1

(1.02) 98

[0.31]

Total 21,665

(68.33) 6070

(19.15) 2230

(7.03) 1069

(3.37) 424

(1.34) 183

(0.58) 28

(0.09) 9

(0.03) 8

(0.03) 10

(0.03) 9

(0.03) 31,705

aThe number shown in boldface and italics were neglected in previous studies (in previous studies, a sequence was defined as an repeats when the minimum

number of iterations was set to 6, 3, 3, 3, 3 and 3 for mono-, di- , t ri-, tetra-, penta- and hex anucl eotid e repeat s, res pecti vely). The summation of these numbers

accounts for 96.32% (3 0,539/31,705) of the total r epeats; bThe per centage of r epeats with different i terations. Fo r example, when iteration s = 2, the mononuc-

leotides percentage = 16,008/25,543 = 62.67%; cThe percentage of six kinds of repeat motif. For example, the percentage of mononucleotide repeats =

25,543/31,705 = 80.56%; - Absence of repeats.

Figure 1. Distribution of repeats in partial genome (Start:

10,441; End: 10,860). All repeat motifs in the sequence

that selected randomly from HVS-1 genome were painted

in different c olor d epe ndi n g on d i fferent it er at ion s.

repeat tracts were identified [5]. Therefore, considerable

parts of the short repeats (approximately 96%) were

neglected in previous studies (Ta b l e 2 ). The repeat con-

tent is the highest in gene RS1 and the lowest in UL9

gene; it varies from 60% to 70% among the ten genes

selected randomly from the HSV-1 genome. These num-

bers showed that the percentages of repeats were quite

high in both genome and genes of HSV-1, and the re-

peats were widespread in fragments of the genome with

little preference.

Obviously, simple repeats are the main components of

the genome and they may play important roles in HSV-1.

The significantly high content of repeat sequences in the

genome indicated that the occurrence of repeats is not

random but an essential feature. Therefore, there may be

a mechanism existed to make the genome prone to keep

repeat sequences in the process of replication. In the long

evolutionary history of the HSV-1 genome, the occur-

rence repeats should be ever more or less than 66.12%,

but maybe it holds at this level to f ace selection pressure

and adapt to the environment.

3.2. The Variation Trend of Repeats

The two iterations repeats were found to be the most

abundant accounting for 68.33% of the total repeats (Ta-

ble 2). In detail, the main elements of SSRs with two

iterations include about 62.67% of mononucleotide, and

more than 90% of di-, tri-, tetra-, penta-, and hexanuc-

leotide repeats, respectively. What’s more, the repeat

numbers decreased with the increase of repeats iterations.

In addition, the mononucleotide repeats make up 80.56%

of the total repeats, and repeat numbers also found to

decrease with the increase of length of repeat unit. These

data clearly showed that the HSV-1 genome tends to

form a lot of two iterations repeats, which may be the

basis of forming high repetitive sequences, and it also

tends to form more simple repeats than complicated re-

peats in the genome.

From Table 2, we can also summarized that mono-

nucleotide repeats seldom exceeded 10 bp in length, and

di-, tri-, and tetranucleotide repeats rarely exceeded 12

bp, penta- and hexanucleotide repeats scarcely ever ex-

ceeded 20 bp. Expansions of slippage replication trend to

get long of the SSRs, on the contrary, point mutations

lead to shorten the long SSRs. One opinion thought that

microsatellite repeat length is an equilibrium results from

a balance between length and point mutations [21,22].

The existence of upper limits of repeats lengths had

X. Y. Zhao et al. / Journal of Biosciences and Medicines 1 (2013) 26-30

another explanation that the tendency for repeat length at

a locus to rise via mutation is counteracted by selection,

and such selection might function through an unchara-

cterized mechanism on the length of the repeat sequence

itself or on gene expression as affected by the SSR se-

quence at issue [23].

Microsatellite length might be an important factor in

affecting mutation rate in HSV-1 genome. The count of

SSRs reduces fast with the increase of iterations, and this

indicated that it is relatively hard to form repeats with

high iterations and long-unit, and these repeats seem to

be more instability in HSV-1 genome. However, tetra-,

penta- and hexanucleotide repeats were not seriously

analyzed in the paper, on account of the low content,

compared to mono-, di- and trinucleotide repeats.

3.3. Short Repeats and Genome Evo lutio n

The number of repeats with two iterations w as 21,665 in

the entire genome, accounting for 68.33% of the total

repeats, and they absolutely predominate in the HSV-1

genome (Table 2). In addition, each percentage of two

iterations repeats was significantly higher than 90% ex-

cept mononucleotides (62.67%) in the six kinds of repeat

motifs, and the summation of repeats with iterations less

than 5 accounts for 94.15% of the total repeats. The two

iterations repeats should be subjected to weaker selective

pressure in the HSV-1 evolutionary process. High per-

centage of two times repeats suggested that two times

repeats may be the basis of forming repeats with high

iterations in the HSV-1 genome. The genome of HSV-1

is prone to form shorter repeat sequences, and these re-

peats may provide a molecular bias for fast adapting to

environmental change in the HSV-1 genome.

3.4. Long Repeats of Low Frequency

The exceptions were that there were only two pentanu-

cleotide repeats found with six iterations and one hexa-

nucleotide repeat with more than 12 iterations. The ob-

servation of low frequency of longer, complicated re-

peats possibly result from those repeat sequences are

faced with stronger selection pressure than short simple

repeats. In the long evolutionary history, the occurrence

of long repeat sequences may be lost resulting from

harmful or lethal to genome, this may make the genome

unable to observe growing longer very quickly. Some

short repeats may be neutral to the genomes and been

fixed by DNA replication. The remains of long repeat

sequences may be benefit to the genomes and have sur-

vived from selection pressure to maintain different re-

ported functions [6]. And this repeats prone mechanism

possible lead the genome growing longer and longer.

This mechanism may explain the observation of the mi-

crosatellite polymorphisms deriving mainly from vari-

ability in length rather than in the primary sequence [24]

and an experimental variant of citrus exocortis viroid

(CEVd) expanding its genome with a 96 nucleotide re-

peat sequences [25]. This is possibly an evidence for the

repeat prone mechanism with relation to genome ex-

panding.

4. CONCLUSION

By setting low thresholds, the frequency of repeats in-

creased a lot. SSRs were distributed relatively equally

throughout the genome and presented a mosaic-shaped;

and repeats were quite high in both genome and genes of

HSV-1. Simple repeats are the main components of the

genome and they may play important roles in HSV-1.

HSV-1 genome tends to form a lot of two iterations re-

peats, which may be the basis of forming high repetitive

sequences, and it also tends to form more simple repeats

than complicated repeats in the genome. Formation of

repeat may be essential for evolution of HSV-1 genome,

and the results might be helpful for studying the genome

structure, repeats genesis and evolution of HSV-1.

5. ACKNO WLEDGEMENTS

This work was supported by the Special project for biodiversity 2012,

2013 of the Chinese ministry of environ mental protection and Ministry

of Science and Technology torch plan [12C26214304703].

REFERENCES

[1] Karimi, A. and MacLean, A. (2005) Replication charac-

teristics of herpes simplex virus type 1 (HSV-1) recom-

binants in 3 types of tissue cultures. Iranian Biomedical

Journal, 9, 95-101.

[2] Arduino, P.G. and Porter, S.R. (2006) Oral and perioral

herpes simplex virus type 1 (HSV-1) infection: Review of

its management. Oral Diseases, 12, 254-270.

http://dx.doi.org/10.1111/j.1601-0825.2006.01202.x

[3] Xu, F., Schillinger, J.A., Sternberg, M.R., Johnson, R.E.,

Lee, F.K., et al. (2002) Seroprevalence and coinfection

with herpes simplex virus type 1 and type 2 in the United

States, 1988-1994. The Journal of Infectious Diseases,

185, 1019-1024. http://dx.doi.org/10.1086/340041

[4] Pebody, R.G., Andrews, N., Brown, D., Gopal, R., Melk-

er, H.D., et al. (2004) The seroepidemiology of herpes

simplex virus type 1 and 2 in Europe. Sexually Transmit-

ted Infections, 80, 185-191.

http://dx.doi.org/10.1136/sti.2003.005850

[5] Ouyang, Q., Zhao, X., Feng, H., Tian, Y., Li, D., et al.

(2012) High GC content of simple sequence repeats in

Herpes simplex virus type 1 genome. Gene, 499, 37-40.

http://dx.doi.org/10.1016/j.gene.2012.02.049

[6] Ellegren, H. (2004) Microsatellites: Simple sequences

with complex evolution. Nature Reviews Genetics, 5,

435-445. http://dx.doi.org/10.1038/nrg1348

[7] Mirkin, S.M. (2007) Expandable DNA repeats and hu-

X. Y. Zhao et al. / Journal of Biosciences and Medicines 1 (2013) 26-30

man disease. Nature, 447, 932-940.

http://dx.doi.org/10.1038/nature05977

[8] Ramel, C. (1997) Mini- and Microsatellites. Environ-

mental Health Perspectives, 5, 781-789.

[9] Jurka, J. and Pethiyagoda, C. (1995) Simple repetitive

DNA sequences from primates: Compilation and analy-

sis. Journal of Molecular Evolution, 40, 120-126.

http://dx.doi.org/10.1007/BF00167107

[10] Sutherland, G.R. and Richard, R.L. (1995) Simple tan-

dem DNA repeat and human genetic disease. Proceed-

ings of the National Academy of Sciences, 92, 3636-3641.

http://dx.doi.org/10.1073/pnas.92.9.3636

[11] Toth, G., Gaspari, Z. and Jurka, J. (2000) Microsatellites

in different eukaryotic genomes: Survey and analysis.

Genome Research, 10, 967-981.

http://dx.doi.org/10.1101/gr.10.7.967

[12] Mrazek, J., Guo, X.X. and Shah, A. (2007) Simple se-

quence repeats in prokaryotic genomes. Proceedings of

the National Academy of Sciences, 104, 8472-8477.

http://dx.doi.org/10.1073/pnas.0702412104

[13] Zhao, X., Tian, Y., Yang, R., Feng, H., Ouyang, Q., et al.

(2012) Coevolution between simple sequence repeats

(SSRs) and virus genome size. BMC Genomics, 13, 435.

http://dx.doi.org/10.1186/1471-2164-13-435

[14] Mudunuri, S.B. and Nagarajaram, H.A. (2007) IMEx:

Imperfect microsatellite extractor. Bioinformatics, 23,

1181-1187.

http://dx.doi.org/10.1093/bioinformatics/btm097

[15] Rajendrakumar, P., Biswal, A.K., Balachandran, S.M.,

Srinivasarao, K., Sundaram, R.M., et al. (2007) Simple

sequence repeats in organellar genomes of rice: Frequen-

cy and distribution in genic and intergenic regions. Bio-

informatics, 23, 1-4.

http://dx.doi.org/10.1093/bioinformatics/btl547

[16] Zhao, X., Tan, Z., Feng, H., Yang, R., Li, M., et al. (2011)

Microsatellites in different Potyvirus genomes: Survey

and analysis. Gene, 488, 52-56.

http://dx.doi.org/10.1016/j.gene.2011.08.016

[17] Chen, M., Zeng, G., Tan, Z., Jiang, M., Zhang, J., et al.

(2011) Compound microsatellites in complete Escheri-

chia coli genomes. FEBS Letters, 585, 1072-1076.

http://dx.doi.org/10.1016/j.febslet.2011.03.005

[18] Morgante, M., Hanafey, M. and Powell, W. (2002) Mi -

crosatellitesare preferentially associated with nonrepeti-

tive DNA in plant genomes. Nature Genetics, 30, 194-

200. http://dx.doi.org/10.1038/ng822

[19] van Lith, H.A. and van Zutphen, L.F. (1996) Characteri-

zation of rabbit DNA microsatellites extracted from the

EMBL nucleotide sequence database. Animal Genetics,

27, 387-395.

http://dx.doi.org/10.1111/j.1365-2052.1996.tb00505.x

[20] Marcotte, E.M., Pellegrini, M., Yeates, T.O. and Eisen-

berg, D. (1999) A census of protein repeats. Journal of

Molecular Biology, 293, 151-160.

http://dx.doi.org/10.1006/jmbi.1999.3136

[21] Bell, G.I. and Jurka, J. (1997) The length distribution of

perfect dimer repetitive DNA is consistent with its evolu-

tion by an unbiased single-step mutation process. Journal

of Molecular Evolution, 44, 414-421.

http://dx.doi.org/10.1007/PL00006161

[22] Kruglyak, S., Durrett, R.T., Schug, M.D. and Aquadro,

C.F. (1998) Equilibrium distributions of microsatellite

repeat length resulting from a balance between slippage

events and point mutations. Proceedings of the National

Academy of Sciences, 95, 10774-10778.

http://dx.doi.org/10.1073/pnas.95.18.10774

[23] Gur-Arie, R., Cohen, C.J., Eitan, Y., Shelef, L., Haller-

man, E.M. et al. (2000) Simple sequence repeats in Es-

cherichia coli: Abundance, distribution, composition, and

polymorphism. Genome Research, 10, 62-71.

[24] Mrazek, J. (2006) Analysis of distribution indicates di-

verse functions of simple sequence repeats in mycoplas-

ma genomes. Molecular Biology and Evolution, 23, 1370-

1385. http://dx.doi.org/10.1093/molbev/msk023

[25] Fadda, Z., Daros, J.A., Flores, R. and Duran-Vila, N.

(2003) Identification in eggplant of a variant of citrus

exocortis viroid (CEVd) with a 96 nucleotide duplication

in the right terminal region of the rod-like secondary

structure. Virus Resear ch , 97, 145-149.

http://dx.doi.org/10.1016/j.virusres.2003.08.002