Repeat blocks, microsatellites or simple sequence repeats (SSRs) can produce good co-dominant molecular markers for genetic diversity analysis and the determination of self-pollination rates in progenies originating from open pollination of selected genotypes. The enrichment of guarana genomic libraries was underway when it was confirmed that guarana is a complex polyploid species with 210 chromosomes. The probes (CA)12, (CT)12 and (TC)14 were used to finish the enrichment of four libraries for repeat blocks, and a databank of expressed sequence tags (ESTs) from guarana seeded-fruits was also screened. Fifteen clonal cultivars were genotyped in three replicates at 10 of the 27 identified loci, using the 59 alleles that passed the reproducibility criterion. A large number of short repeat blocks were identified, and this was considered to be a consequence of the recent polyploidization event. However, blocks with eight or more repeats, which are ideal for genotyping, were scarce. Annealing of most probes to short blocks by partial complementarity could explain the scarcity of longer blocks in genomic libraries but cannot explain why they were rare in the ESTs. Due to the complexity of the genotypes, alleles were treated as dominant traits. The functional annotation of ESTs harboring repeat blocks was updated. Locus GRN07 is inserted in a homologue of MOTHER OF FLOWERING LOCUS T AND TFL1 (MFT), whose 3’-UTR displays clear post-transcriptional regulatory features. MFT and its variants are probably involved in the determination of seed germination and embryo growth characteristics. Other loci assessed may be involved in plant architecture and defense reactions. It was concluded that the alleles described in the present work can be used to distinguish guarana cultivars and possibly to analyze segregation using the progenies of controlled pollinations between divergent genitors. Also, the fingerprints obtained can be added to the morphological and agronomic descriptors of the cultivars.
The guarana plant (Paullinia cupana Kunth sorbilis (Mart.) Ducke) is a liana native to the Amazon rainforest that acquires the shrub habit when cultivated in open fields. It is included in tribe Paullinieae [
Over the past 40 years, Embrapa Western Amazon has conducted a breeding program of selected guarana plants aimed at improving productivity and disease resistance. Molecular markers can be important tools in plant breeding programs. They have been used for the identification and distinction (or discrimination) of genotypes and for the quantification of variability in the DNA, which can subsequently be correlated with the divergence of phenotypes. The integration of recombination by breeding, selection, and molecular data analyses leads to a faster gain in results [
Dominant markers have been used for diversity analysis in guarana plants, but they usually support only approximate assertions about whether individuals are identical. The high degree of multi-allelism and the co-dominant Mendelian inheritance of microsatellites (simple sequence repeats or repeat blocks) provide a powerful system for the unique identification of individuals for fingerprinting purposes and parentage testing, particularly when the individuals are expected to be related [
We had already initiated the search for repeat blocks when it was definitively established that guarana plants have 210 chromosomes and are allopolyploids [
The objectives of the present work were to identify repeat blocks in the genome of guarana plants, to design primers, to amplify and choose the loci displaying reliable variability to use as microsatellite markers, and to validate these loci by genotyping guarana cultivars, which were selected for their valuable agronomic characteristics. We used 10 primer pairs, some of which were located within genes involved in physiological processes such as seed viability, determination of the plant architecture and defense against pathogens. Although we still cannot determine which ancestral genomes contributed to the currently cultivated genotypes of the guarana plant or which are the dosages of the alleles, the fingerprints can be used as descriptive characteristics [
Four libraries of Sau3AI and Mse I genomic fragments were enriched for repeat blocks using probes (CA)12, (CT)12, (CA)12 + (CT)12 or (TC)14, according to the procedures described in [
The guarana (Paullinia cupana Kunth sorbilis (Mart.) Ducke) seeded-fruits EST data bank (GenBank accession numbers EC763506-EC778393) was screened for sequences harboring repeat blocks using the TROLL routine from the Staden software package [
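The kind of screening described here can be illustrated with a minimal Python sketch. It is not the TROLL routine and makes no claim about how TROLL works internally; it simply uses a regular expression to locate perfect dinucleotide blocks in a sequence. The function name `find_dinucleotide_blocks`, the `min_repeats` threshold and the example sequence are illustrative assumptions, not taken from the original work.

```python
import re


def find_dinucleotide_blocks(seq, min_repeats=4):
    """Return (start, unit, n_repeats) for each perfect dinucleotide block.

    A block is a unit of two different bases repeated at least
    `min_repeats` times (hypothetical threshold, adjustable).
    """
    seq = seq.upper()
    # Group 1 = first base; the lookahead forces the second base to differ,
    # so homopolymer runs such as AAAA are never reported as blocks.
    pattern = re.compile(
        r"([ACGT])(?!\1)([ACGT])(?:\1\2){%d,}" % (min_repeats - 1)
    )
    blocks = []
    for match in pattern.finditer(seq):
        unit = match.group(1) + match.group(2)
        n_repeats = len(match.group(0)) // 2
        blocks.append((match.start(), unit, n_repeats))
    return blocks


# Made-up example sequence: a (CA)6 block, a T homopolymer, and a (TA)16 block.
example = "GG" + "CA" * 6 + "T" * 10 + "G" + "TA" * 16 + "GC"
blocks = find_dinucleotide_blocks(example)

# Blocks with eight or more repeats are the ones described as ideal for genotyping.
long_blocks = [b for b in blocks if b[2] >= 8]
```

In this toy sequence the short (CA)6 block is reported but filtered out by the eight-repeat cutoff, which mirrors the distinction the text draws between abundant short blocks and scarce longer ones.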
DNA was extracted from liquid N2-frozen leaves of 15 of the only guarana cultivars developed to date worldwide, which are maintained at Embrapa Western Amazon (03˚06'07"S - 60˚01'30"W) and recommended for cultivation in the State of Amazonas (Brazil). The codes used for the identification of these cultivars are 189, 372, 388, 505, 608, 610, 611, 612, 624, 626, 648, 861, 871, 882 and 850. Chemicals and instructions for DNA extraction were provided with the YGP100 kit (BMC Technologies). DNA extracts were quantified using a NanoDrop and examined following resolution in 0.8% agarose gels.
Primers were designed for single ESTs when repeat blocks were identified in the CONTIGs (sets of singlets aligned to construct the most reliable complete sequence) of the databank. The software Primer 3 (
The primers reported for lychee [
Due to the complex polyploid nature of the species under study, the alleles that were reproduced twice within three replicates were tabulated as dominant characters and given a value of “1” for presence and “0” for absence. The power of each allele to distinguish guarana cultivars was evaluated with the software pars, designating the value “1” for a specific allele per turn and “0” for all the others and applying sufficient turns to ensure that the value “1” was given to each allele at least once. The 59 phylograms produced were compared using the software treedist and through visual examination. For the analysis of divergence, 2000 bootstrap permutations were generated using seqboot. Peaks that were reproduced twice within three replicates but displayed characteristically very low intensities and/or very irregular areas were maintained in the analysis with half of the weight given to the other peaks (weight = “1” in half the permutations and weight = “0” in the other half). Matrices of paired distances were produced with restdist set to use the neighbor-joining method and site length = 20. Trees (phylip files) were produced for each of the 2000 permutations using the software neighbor. These phylip files were submitted to consense and to drawtree to obtain the cladogram for the 15 cultivars. Pars, treedist, seqboot, restdist, neighbor, consense and drawtree are part of the PHYLIP package, version 3.6 [
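The scoring step described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the PHYLIP workflow itself: the reproducibility rule is read here as "called in at least two of three replicates", the allele labels (locus plus a fragment size) and all peak calls are invented toy data, and `bootstrap_characters` merely illustrates the idea of resampling characters with replacement rather than reproducing seqboot.

```python
import random


def is_reproducible(replicate_calls, min_hits=2):
    """True when a peak was called in at least `min_hits` replicates (assumed rule)."""
    return sum(replicate_calls) >= min_hits


def build_matrix(raw_calls):
    """Turn per-replicate peak calls into a 0/1 presence/absence matrix.

    raw_calls: {cultivar: {allele_label: (rep1, rep2, rep3)}}
    Returns the sorted allele labels and {cultivar: [0/1, ...]}.
    """
    alleles = sorted({a for calls in raw_calls.values() for a in calls})
    matrix = {
        cultivar: [int(is_reproducible(calls.get(a, (0, 0, 0)))) for a in alleles]
        for cultivar, calls in raw_calls.items()
    }
    return alleles, matrix


def bootstrap_characters(matrix, rng):
    """Resample the characters (columns) with replacement, once."""
    n_chars = len(next(iter(matrix.values())))
    picked = [rng.randrange(n_chars) for _ in range(n_chars)]
    return {cultivar: [row[j] for j in picked] for cultivar, row in matrix.items()}


# Toy data: cultivar codes are from the text, allele labels and calls are hypothetical.
raw_calls = {
    "189": {"GRN01_108": (1, 1, 1), "GRN01_110": (1, 0, 0), "GRN17_180": (0, 1, 1)},
    "372": {"GRN01_108": (0, 0, 0), "GRN01_110": (1, 1, 0), "GRN17_180": (1, 1, 1)},
    "388": {"GRN01_108": (1, 1, 0), "GRN01_110": (1, 1, 1)},
}
alleles, matrix = build_matrix(raw_calls)
replicate = bootstrap_characters(matrix, random.Random(0))

# Fingerprints are useful for identification only if no two cultivars share one.
all_distinct = len({tuple(row) for row in matrix.values()}) == len(matrix)
```

Treating every allele as an independent presence/absence character is what allows a polyploid genotype with unknown allele dosage to be compared across cultivars at all.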
Among the 688 sequences obtained from the enrichment of the genomic libraries, 28% had perfect blocks with at least four dinucleotide repeats (
The data described above differ from those obtained for most plant species using (TC)12-14 and (AC)12 probes, such as Caryocar brasiliense [
The results from the seeded-fruits ESTs followed the same trend. Repeat blocks of dinucleotides were present in 3814 of the 4999 ESTs harboring microsatellites. AG/TC repeats were the most frequent, followed by AC/TG, AT and CG repeats (
A low frequency of perfect dinucleotide blocks can be related to large genome sizes, which are generally associated with the accumulation of “junk DNA”. The wheat haploid genome consists of 5600 Mbp, with 43.4 dinucleotide blocks per Mbp in nontranscribed (genomic sequences in general) regions and 61.0 dinucleotide blocks per Mbp in transcribed regions (ESTs). These numbers are lower than those reported for the 125 Mbp genome of A. thaliana, which displays 77.7 and 137.0 dinucleotide blocks per Mbp in nontranscribed and transcribed regions, respectively. A generalized increase in genome sizes and the accumulation of “junk DNA” were most likely simultaneous consequences of transposon amplification occurring in the evolutionary eras subsequent to the Tertiary, after the ancient genomic polyploidizations that apparently affected most plant species. Transposons are more efficiently eliminated from transcriptionally active hypomethylated regions of chromatin, where most repeat blocks occur. Therefore, the frequency of repeat blocks would be negatively correlated with that of “junk DNA”, which is abundant in large genomes, but would be positively correlated with that of single-copy and low-copy-number genes, which are better represented in ESTs than in other regions of the genome [
Number of repeat blocks identified in the genomic libraries (a) and in the databank of ESTs from seeded-fruits (b) of Paullinia cupana var. sorbilis, the guarana plant. Each genomic library in (a) is identified according to the probe used for the enrichment. The percentage of repeat blocks is relative to the number of cloned and sequenced fragments that were complementary to the probes. Repeat blocks identified in ESTs (b) are represented according to their composition. A logarithmic scale was used to facilitate visualization
15,387 ESTs with average length of 773 nt were accessed [
The accumulation of short repeat blocks in guarana is evident (
Genotyping of 10 loci (
Fifty-nine alleles were included in the diversity analyses. Reproducible peaks with very low intensities or very irregular areas were given half the weight given to more intense peaks displaying classical, approximately triangular areas. Whether these peaks are unique characteristics valuable for the identification of certain cultivars will be verified in future experiments. No two cultivars were identical among the 15 that were analyzed. Bootstrap values above the partitions in the cladogram (
Due to the allopolyploidy, even more than five reproducible alleles per guarana plant were present at some loci (
Characteristics of the primer pairs used to access ten microsatellite loci and genotype guarana (Paullinia cupana var. sorbilis) cultivars.
LOCUS | REPEAT BLOCK | PRIMERS F AND R | AL | EXP SIZE |
---|---|---|---|---|
GRN01 | (CA)6(CA)3 | AGAACTGGTCCAACCGTCTC and CGTGAAAGGTCTGAGTGAAGC | 4 | 110 |
GRN04 | CA(CCA)4CA(CCA) | CATCCATTGTCACCTCTTGG and TGGCATGAGACAATTTGTGG | 1 | 209 |
GRN05 | (TA)5(AT)5 | GATCAGGTGCCACTCCAAGT and TTGTGTGAGTCCTCAGCCTCT | 5 | 233 |
GRN07 | (TA)16 | GGTTCTTTTCAGCGCAGTTG and GGCATAGAGCACCGAGAGAC | 8 | 249 |
GRN09 | (AT)8C(TA)12 | CTGAATGCTGTCCAAGCA and GCAGCCTTCCCATTTTACC | 5 | 256 |
GRN12 | (TG)5(TA)7 | TGTCAAACCCCTTTATGTTC and CAATGGTGCCAGTAAATACA | 5 | 244 |
GRN13 | (CT)5(TA)7 | TCGAAATATATGTGGCATGA and TGCATAAAATCACCAAATCA | 1 | 252 |
GRN14 | (TC)11 | ATGATCATCAGCATCATGG and TATCCAACTCAATTCCCAGA | 13 | 191 |
GRN17 | (CT)9 | TGCGACTGTGAGTGAGCTA and TTGAGATGAACTCAGCCAAC | 10 | 180 |
GRN20 | (CGC)5(CGC)3 | TGGGAGGAGAGAACCGTACA and GCCTTCCCTTCCTATCAAGC | 7 | 222 |
OBS.: annealing temperature was 57˚C for all the primer pairs. EXP SIZE = expected size of the amplicons. AL = alleles passing the reproducibility criterion.
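As a quick consistency check, the AL column of the table can be summed and compared with the 59 alleles reported as passing the reproducibility criterion. The short Python sketch below only restates the table values; the variable names are illustrative.

```python
# Alleles passing the reproducibility criterion per locus (AL column of the table).
alleles_per_locus = {
    "GRN01": 4,
    "GRN04": 1,
    "GRN05": 5,
    "GRN07": 8,
    "GRN09": 5,
    "GRN12": 5,
    "GRN13": 1,
    "GRN14": 13,
    "GRN17": 10,
    "GRN20": 7,
}

total_alleles = sum(alleles_per_locus.values())

# Locus with the largest number of reproducible alleles.
most_variable_locus = max(alleles_per_locus, key=alleles_per_locus.get)

# Loci with a single reproducible allele cannot discriminate cultivars on their own
# unless that allele is absent in some of them.
single_allele_loci = sorted(l for l, n in alleles_per_locus.items() if n == 1)
```

The total agrees with the number of alleles used in the diversity analyses, and the locus with the most reproducible alleles is the (TC)11 block.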
Cultivars of guarana (Paullinia cupana var. sorbilis) grouped by the neighbor-joining method after 2000 bootstrap permutations. The numbers above partitions represent their frequency in the permuted data sets. Cultivars are identified by their code in the breeding program
Chromatograms produced by genotyping guarana (Paullinia cupana var. sorbilis) plants with primers for the microsatellite loci GRN01 (a), GRN14 (b), GRN17 (c) and GRN20 (d)
tree [
It was not possible to determine the number of copies (the dosage) of the alleles. Analyses using the areas under the peaks could be useful for this purpose, but they depend on information about the ancestors, as demonstrated for hemisexual and polyploid Rosa L. Sect. Caninae D.C. [
Seven of the 10 loci used in the present work were identified in the sequences of ESTs included in the guarana seeded-fruits databank [
Locus GRN07 was identified in a CONTIG of ESTs that is homologous to the MOTHER OF FLOWERING LOCUS T AND TFL1 (MFT) coding sequence from Fragaria vesca (
Orthologues of the ESTs in the seeded-fruits databank accessed to develop microsatellite markers used for genotyping guarana (Paullinia cupana var. sorbilis) plants.
LOCUS | ORTHOLOGUE IN GenBank (CODING SEQUENCE) | SPECIES | ID | E-VALUE |
---|---|---|---|---|
GRN07 | mother of flowering locus T and TFL1 | Fragaria vesca | XP4299541.1 | 2 × 10^−89 |
GRN09 | endo-xyloglucan transferase | Gossypium hirsutum | BAA21107.1 | 1 × 10^−136 |
GRN12 | cyclase (tryptophan catabolism) | Arabidopsis thaliana | EFH43390.1 | 2 × 10^−39 |
GRN13 | gibberellin oxidase | Pisum sativum | ABI64150.1 | 4 × 10^−42 |
GRN14 | unnamed protein (TCP transcription factor) | Vitis vinifera | CA048409.1 | 8 × 10^−26 |
GRN17 | amphipathic channel (membrane) | Jatropha curcas | ADU56185.1 | 6 × 10^−24 |
GRN20 | zinc finger protein C2H2 | Ricinus communis | EEF34900.1 | 9 × 10^−49 |
(a) Sequence from the guarana (Paullinia cupana var. sorbilis) seeded-fruit databank that harbors the TA repeat block (in the box) in microsatellite locus GRN07. The amino acids below the codons constitute the deduced peptide homologous to the mother of flowering locus T and TFL1 transcription factor. Three predicted splicing sites are indicated by black arrowheads. A stop codon is represented by an asterisk. The first six adenines in the poly(A) tail are represented by italicized bold letters. The predicted regulatory motifs in the long 3’-UTR are identified by different colors: a consensus sequence for interaction with the EIN3 transcription factor activated by ethylene is shown in blue; a homologue of an smRNA from A. thaliana that also targets the cytosolic malate dehydrogenase coding sequence (ASRP and [39]) is displayed in red; 4-mer motifs significantly enriched among Putative Recognition Elements (PREs) involved in post-transcriptional “regulons” directed by RNA-binding proteins [40] are displayed in green. Putative polyadenylation signals [41] are presented in bold italicized underlined capital letters and often coincide with other potential regulatory signals. (b) Secondary structure for the pre-miRNA predicted to be transcribed from nucleotide 614 to 733. The bracket indicates the predicted mature miRNA
These characteristics were identified through the analysis of untranslated regions in six dicotyledonous families and have presumably been preserved for the last 70 million years, even though they are not protein-coding sequences. Corroborating the hypothesis of regulatory power, these elements are preferentially observed in association with major open reading frames that code for transcription factors [
Variations in the number of TA repeats of the block inside the microsatellite locus GRN07 could influence transcript maturation or processing and be related to differences in the time necessary for seed germination. Changes in the 3’ portion of the TA block could result in the emergence or disruption of proximal/distal polyadenylation signals, which are relatively free variations on the themes UUUGUA and AAUGAA. The efficiency of polyadenylation, the length and the age of poly(A) tails are directly related to the stability and translatability of mRNAs [
Locus GRN13 is inserted in a homologue of gibberellin oxidase from Pisum sativum (
The primers developed for Litchi chinensis [
In conclusion, dinucleotide blocks harboring eight or more repeats are less frequent in the genome of guarana plants than in most plant species. The primer set reported in the present work can provide descriptive characteristics and can be used to distinguish between the genotypes of the 15 guarana (Paullinia cupana var. sorbilis) cultivars analyzed. Some of the primer pairs described here, such as GRN07 and GRN14, can also be useful in future research on seed viability, fruit setting time, the uniformity of fruit maturation and/or plant/inflorescence architecture and defense against pathogens using progenies arising from controlled pollination between phenotyped divergent genitors.
We are thankful to FAPEAM (grant # 9242003) and EMBRAPA for the financial support and to CNPq for the fellowship to MPSL while she was working on this project. We thank Dr. Izeni P. Farias for the collaboration in the cloning and sequencing trials that took place at the Federal University of Amazonas (Manaus-AM) and Dr. Vânia C. R. Azevedo and Dr. Peter W. Inglis for the collaboration in the cloning and sequencing trials that took place at Embrapa Genetic Resources and Biotechnology (Brasília-DF). We thank Cleiton Fantin, Waleska Gravena and Jeferson C. da Cruz for their help in cloning and sequencing genomic libraries at the Federal University of Amazonas (Manaus-AM); Natália D. M. Carvalho and Susan K. B. Soares at Embrapa Western Amazon (Manaus-AM) and Alexandra M. B. de Souza at INPA (Manaus-AM) for their contribution to the visual curation of data produced by sequencing genomic libraries and using TROLL for the EST databank.