Among endonucleases, the homodimeric variety is found in many prokaryotes, where it processes introns out of pre-RNAs. But as the variety and complexity of introns rise with evolution, does the homodimeric endonuclease adapt to the changes? This paper investigates the correlations between evolving pre-RNAs and the adapting homodimeric endonuclease in lower prokaryotes. First, we construct a phylogeny based on the homodimeric endonuclease and observe the appearance of a long branch. To appreciate the finer aspects of the accelerating evolution near this long branch, we delve deeper into the pre-RNA substrates of the endonuclease. Computational evidence of an as-yet-unreported noncoding RNA gene then emerges from this study. The capabilities of the homodimeric endonuclease and the complexities of its pre-RNA substrates appear to evolve together in steps.
In recent years computational approaches to annotation and investigation of noncoding RNAs have become widespread. The subject of noncoding RNAs has grown for more than half a century. It began with ribosomal RNAs and transfer RNAs, but a whole host of newer types have come up in the last couple of decades. Through the years many different aspects of the subject have been extensively studied, and the links between them analysed and established. RNA genes, especially the ribosomal ones, have been used extensively for study of phylogeny and gene-evolution [
In this paper we study the phylogeny of lower prokaryotes based on the homodimeric endonuclease. The reason for choosing this enzyme is its close interaction with noncoding pre-RNAs: it processes the introns out of pre-RNAs. Our main interest is in methanogens, because they have shown promising new features among their tRNAs. For one, there are entirely new tRNA genes that decode the UAG stop codon [
In deciphering the evolutionary history of archaea, the phylogeny has mainly been based on the 16S small ribosomal RNA sequence [8,9]. The 16S rRNA-based tree suggests two main phyla, the euryarchaeota and the crenarchaeota, together with their specific order of emergence and the mutual relationships among their lineages. The other phylogenetic approach, based on the “whole genome”, does not recover the monophyly of euryarchaeota, as halobacteriales are at the base of the archaeal tree [
Among the methanogenic euryarchaea there are five phylogenetically divergent orders: methanobacteriales, methanococcales, methanomicrobiales, methanosarcinales and methanopyrales [
The software MEGA was used for the phylogenetic analysis. Four different phylogenetic trees were investigated, based on 1) the maximum likelihood method, 2) neighbor joining, 3) UPGMA and 4) minimum evolution. We required a high level of congruence between the trees from these four methods.
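The congruence requirement can be made concrete in several ways, and the text does not say which measure was applied. As a minimal sketch, assuming rooted trees written as nested tuples of taxon labels, the fraction of clades shared by two trees gives one simple score. The function names and the placeholder taxa below are illustrative assumptions; they are not part of MEGA and do not reproduce the trees of this study.

```python
def clades(tree):
    """Return (all leaves, set of non-trivial clades) for a rooted tree.

    A tree is a nested tuple whose leaves are taxon-name strings
    (an assumed toy representation, not a MEGA file format).
    """
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    all_leaves = walk(tree)
    found.discard(all_leaves)  # the root clade holds every taxon, so it is uninformative
    return all_leaves, found


def shared_clade_fraction(tree_a, tree_b):
    """Fraction of clades common to both trees (1.0 means identical topology)."""
    leaves_a, clades_a = clades(tree_a)
    leaves_b, clades_b = clades(tree_b)
    if leaves_a != leaves_b:
        raise ValueError("both trees must contain the same taxa")
    union = clades_a | clades_b
    if not union:
        return 1.0
    return len(clades_a & clades_b) / len(union)


# Placeholder topologies, chosen only to show the arithmetic.
tree_ml = ((("taxon_A", "taxon_B"), "taxon_C"), "taxon_D")
tree_nj = ((("taxon_A", "taxon_B"), "taxon_D"), "taxon_C")
print(shared_clade_fraction(tree_ml, tree_ml))  # 1.0
print(shared_clade_fraction(tree_ml, tree_nj))  # one shared clade out of three
```

A score near 1 for every pair of methods would correspond to the high level of congruence required above; the threshold itself is a choice the analyst has to make.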
To check whether tRNAser(CGA) also lies hidden in the genome, the standard and highly successful tRNA gene-finding algorithms tRNAscan-SE (http://lowelab.ucsc.edu/tRNAscanSE/) and ARAGORN (http://130.235.46.10/ARAGORN/), together with databases, were used. These algorithms also locate the gene with high precision when it carries a single intron. The possibility that the gene has more than one intron was investigated using the following algorithm. Introns in archaea have been found to occur at only a few positions in tRNA, and the length of an intron is bounded above by 200 nucleotides. We took the consensus archaeal tRNAser(CGA) and cut it into pieces at the probable intron locations. These pieces of tRNAser(CGA) were then homology searched through the genome of Methanosaeta thermophila, with the intervening intron length varied between 6 and 200. We did not assume anything about the nucleotide composition of the intron sequences.
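The split-gene search just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function name `find_split_gene`, the per-piece mismatch tolerance standing in for a homology search, and the toy sequences are all hypothetical, and only one strand is scanned. The intron length bounds of 6 and 200 come from the text; no constraint is placed on intron composition.

```python
def find_split_gene(genome, pieces, min_intron=6, max_intron=200, max_mismatches=0):
    """Find loci where all pieces occur in order, separated by introns.

    Returns a list of tuples holding the start index of each piece.
    Each piece may differ from the genome at up to `max_mismatches`
    positions (a crude, assumed stand-in for a homology search).
    """
    hits = []

    def matches(piece, start):
        window = genome[start:start + len(piece)]
        if len(window) < len(piece):
            return False  # piece would run past the end of the genome
        return sum(a != b for a, b in zip(window, piece)) <= max_mismatches

    def extend(index, pos, starts):
        # `pos` is the index just after the previously placed piece.
        if index == len(pieces):
            hits.append(tuple(starts))
            return
        for gap in range(min_intron, max_intron + 1):
            start = pos + gap
            if matches(pieces[index], start):
                extend(index + 1, start + len(pieces[index]), starts + [start])

    for start in range(len(genome) - len(pieces[0]) + 1):
        if matches(pieces[0], start):
            extend(1, start + len(pieces[0]), [start])
    return hits


# Toy example: three made-up pieces separated by gaps of 10 and 7 bases.
toy_pieces = ["ACGTAC", "TTGCAA", "GGATCC"]
toy_genome = "TTT" + "ACGTAC" + "G" * 10 + "TTGCAA" + "C" * 7 + "GGATCC" + "AAA"
print(find_split_gene(toy_genome, toy_pieces))  # [(3, 19, 32)]
```

Because every gap length in the allowed range is tried for every piece, the sketch scales poorly with genome size; a practical search would first index the genome or seed on the most conserved piece.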
The trees resulting from endonuclease dataset are in
the maximum likelihood method;
To appreciate the acceleration of evolution of the homodimeric endonuclease near M. thermophila, we delve deeper into its pre-RNA substrates. The phylogenetic trees delineate quite clearly the ‘neighbourhood’ of each element. Yet, when we look at the spectrum of RNAs, there are clear indications of “anomalies”. For instance, according to NCBI, the tRNAser(CGA) gene, which is present in all of its closest neighbours, appears to be absent in M. thermophila. In its neighbourhood lie RC1 and M. burtonii; both have tRNAser(CGA). Interestingly, in both RC1 and M. burtonii the corresponding tRNAser(CGA) genes have introns that are cleaved by the homodimeric endonuclease. It is puzzling, therefore, that the phylogeny based on the homodimeric endonuclease places M. thermophila near RC1 and M. burtonii, and yet this substrate of the endonuclease is so prominently absent in M. thermophila. We are, therefore, prompted to search for the missing tRNAser(CGA) in M. thermophila.
The search for missing or new tRNAs has to satisfy several known constraints. Archaeal tRNAs, especially the ones that are missing, are likely to have canonical (i.e., between positions 37 and 38) and/or noncanonical (at any position other than 37/38) introns. The boundary features between exons and introns require detailed attention. The exon-intron boundaries form a folded motif generically termed Bulge-Helix-Bulge (BHB). This structure consists of two 3 nt (nucleotide) bulges on opposite strands, separated by a 4 bp central helix, the so-called “3-4-3 motif”. The 5′ half of the central helix lies in the exonic region; the complementary 3′ half is intronic. This generic BHB, or more precisely hBHBh′, motif has been observed for both canonical and noncanonical introns. For a few noncanonical introns, however, the canonical hBHBh′ motif is not always observed. Instead a simplified hBH or HBh′ motif, comprising two helices (h and H, or H and h′) and one bulge, can be isolated [7,19-21]. Moreover, in the central helix (H) of the exon-intron boundary motif for canonical introns, a few mispairings, such as A:C, A:G, C:U and U:U, have been observed. One thing that appears to hold with reasonable certainty is that for all types of intron, canonical or noncanonical, and for every possible BHB motif, hBHBh′, hBH or HBh′, the cleavage sites are always located two bases away from the central helix. The cleaving of introns is catalysed by endonucleases. Recent investigations have found that in archaea all three types of intron-cleaving endonuclease, homodimer (α2), homotetramer (α4) and heterotetramer (α2β2), can interact with and splice the 3-4-3 structural substrate [
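The pairing rules for the 4 bp central helix can be illustrated with a short check. This is a hedged sketch, not a published BHB detector: the function name `classify_central_helix` and the decision to count the G:U wobble as a regular pair are assumptions, and the list of tolerated mispairings is taken directly from the paragraph above. Both strands are written 5′ to 3′, so they are compared in antiparallel order.

```python
# Watson-Crick pairs plus the G:U wobble (counting wobble as paired is an assumption).
PAIRS = {frozenset("AU"), frozenset("GC"), frozenset("GU")}
# Mispairings reported in the text for the central helix: A:C, A:G, C:U and U:U.
MISPAIRS = {frozenset("AC"), frozenset("AG"), frozenset("CU"), frozenset({"U"})}


def classify_central_helix(exon_half, intron_half):
    """Label each position of the central helix.

    `exon_half` is the 5' strand of the helix and `intron_half` the
    complementary 3' strand, both given 5'->3'. Each position is labelled
    "pair", "mispair" (one of the reported mispairings) or "other".
    """
    if len(exon_half) != len(intron_half):
        raise ValueError("both halves of the helix must have the same length")
    labels = []
    for base, partner in zip(exon_half, reversed(intron_half)):
        pair = frozenset((base, partner))
        if pair in PAIRS:
            labels.append("pair")
        elif pair in MISPAIRS:
            labels.append("mispair")
        else:
            labels.append("other")
    return labels


# Made-up 4-nt strands: the last position forms a U:C mispairing.
print(classify_central_helix("GCAU", "CUGC"))  # ['pair', 'pair', 'pair', 'mispair']
```

A candidate boundary could then be kept only if it contains no "other" positions and at most a chosen number of "mispair" positions; that tolerance is a parameter the text does not fix.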
Taking a cue from the results of the phylogeny based on the homodimeric endonuclease, especially the close connection linking M. thermophila with RC1 and M. burtonii, we look for the missing tRNAser(CGA) gene in M. thermophila, assuming that it occurs in some novel way. The genome encodes a homodimeric tRNA endonuclease. Taking the consensus archaeal tRNAser(CGA), we cut it into pieces (to take care of the introns) at all probable intron locations. These pieces of tRNAser(CGA) are then homology searched through the genome of M. thermophila. We varied the intervening intron length between 6 and 200. We did not assume anything about the nucleotide composition of the intron sequences. For M. thermophila the above procedure did identify a putative tRNAser(CGA) gene. Even though the endonuclease is homodimeric, this putative tRNAser(CGA) has two noncanonical introns: one 33 bases long in the D-arm, between positions 21 and 22, and another 30 bases long in the T-loop, between positions 59 and 60. After the introns are removed, the cloverleaf structure of tRNAser(CGA) is recovered. All the conserved bases and base pairs of archaeal tRNAser are precisely in place. Notable amongst them are G73, G26 and U44, the unique identity elements of tRNAser recognized by seryl-tRNA synthetase [
The evolving complexity of genomes involves subtle, yet unmistakable, correlations connecting the various encoded components. First, there are the protein-coding parts. But even within these lie the recently discovered hidden invariant correlating patterns [
due to their interactions with phages and other hosts [
The phylogeny based on the homodimeric endonuclease is new. Since the methanogens in the euryarchaeal domain all have this enzyme, a finer characterization and classification emerges. While the trees derived are all in reasonable congruence with the classification based on 16S rRNA, the grouping of RC1 with M. thermophila in the neighbourhood of M. burtonii is noteworthy. Equally noteworthy is the long branch, indicative of paraphyly, for M. thermophila. We interpret it as a signal of an acceleration of the evolution of the endonuclease. Interestingly, while RC1 and M. burtonii both have tRNAser(CGA), in M. thermophila it remains unreported. Since pre-tRNAser(CGA) in RC1 and M. burtonii is a substrate of the homodimeric endonuclease, the complete absence of this tRNA in M. thermophila is inexplicable. In view of the accelerating capabilities of the endonuclease, we take this absence as a signal to search anew for tRNAser(CGA) in M. thermophila.
tRNAser(CGA) is characterized by a large number of unique features. First, its secondary cloverleaf structure is intricate. And on this cloverleaf are special identity elements at very well defined locations [