On Conjugation Partitions of Sets of Trinucleotides

Applied Mathematics
Vol. 3, No. 1 (2012), Article ID: 16758, 6 pages, DOI: 10.4236/am.2012.31017


Lorenzo Bussoli1, Christian J. Michel2, Giuseppe Pirillo1,3

1Dipartimento di Matematica U.Dini, Firenze, Italia

2Equipe de Bioinformatique Théorique, Université de Strasbourg, Strasbourg, France

3Université de Marne-la-Vallée, Marne-la-Vallée, France

Email: {bussoli, pirillo}@math.unifi.it, michel@dpt-info.u-strasbg.fr

Received October 2, 2011; revised December 5, 2011; accepted December 13, 2011

Keywords: trinucleotide; conjugated trinucleotides; code; circular code; self-complementary circular code; complementary circular codes

ABSTRACT

We prove that a trinucleotide circular code is self-complementary if and only if its two conjugated classes are complements of each other. Using only this proposition, we prove that if a circular code is self-complementary, then either both its conjugated classes are circular codes or neither is a circular code.

1. Introduction

We continue our study of the combinatorial properties of trinucleotide circular codes. A trinucleotide is a word of three letters (triletter) over the genetic alphabet. The set of the 64 trinucleotides is a code in the sense of language theory, more precisely a uniform code, but not a circular code [1,2]. Intuitively, the words of a code are read on a straight line while those of a circular code are read on a circle; in both cases, unique decipherability is required.

Comma-free codes, a very particular case of circular codes, have been studied for a long time, e.g. [3-5]. Since the discovery of a circular code with important properties in genes [6], circular codes have been studied as mathematical objects in combinatorics, theoretical computer science and theoretical biology, e.g. [7-23].

There are 528 self-complementary circular codes of 20 trinucleotides [6,24,25] and, as proved here, they are naturally partitioned into two quite symmetric classes.

Let AAA, CCC, GGG and TTT be the four trinucleotides with identical nucleotides. In this paper, we study some particular partitions of the set of the 60 remaining trinucleotides. Indeed, each circular code $X$ can be associated with two other sets of trinucleotides $X_1$ and $X_2$ simply by circularly permuting the trinucleotides of $X$ by one letter and by two letters, respectively. Then, we prove our main result, i.e. a circular code $X$ having 20 elements is self-complementary if and only if the remaining two classes $X_1$ and $X_2$ are complements of each other. Furthermore, we also show that a set of trinucleotides is a circular code if and only if the set consisting of all their complements is a circular code.

As a consequence of these results, we also prove that if a circular code is self-complementary, then either both its conjugated classes are circular codes or neither is a circular code.

In Section 2, we give the necessary definitions and a characterization for a set of trinucleotides to be a circular code. In Section 3, we give the results, mainly expressed by Proposition 7 and Proposition 8.

2. Definitions

The classical notions of alphabet, empty word, length, factor, proper factor, prefix, proper prefix, suffix, proper suffix, lexicographical order, etc. are those of [1]. Let $A_4 = \{A, C, G, T\}$ denote the genetic alphabet, lexicographically ordered with $A < C < G < T$. We use the following notation:

$A_4^*$ (respectively $A_4^+$) is the set of words (respectively non-empty words) over $A_4$;

$A_4^2$ is the set of the 16 words of length 2 (diletters or dinucleotides);

$A_4^3$ is the set of the 64 words of length 3 (triletters or trinucleotides).

We now recall two important genetic maps, the definitions of code and circular code, and the property of $C^3$-self-complementarity for a circular code (see in particular [1,6,17,24,25]).

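Definition 1. The complementarity map $\mathcal{C}: A_4^+ \to A_4^+$ is defined by $\mathcal{C}(A) = T$, $\mathcal{C}(T) = A$, $\mathcal{C}(C) = G$ and $\mathcal{C}(G) = C$, and by $\mathcal{C}(uv) = \mathcal{C}(v)\mathcal{C}(u)$ for all $u, v \in A_4^+$, e.g. $\mathcal{C}(AAC) = GTT$.

The map $\mathcal{C}$ on words is naturally extended to a word set $X$: its complementary trinucleotide set $\mathcal{C}(X)$ is obtained by applying the complementarity map $\mathcal{C}$ to all the trinucleotides of $X$.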

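Definition 2. The circular permutation map $\mathcal{P}: A_4^3 \to A_4^3$ permutes circularly each trinucleotide $l_1 l_2 l_3$ as follows: $\mathcal{P}(l_1 l_2 l_3) = l_2 l_3 l_1$.

The map $\mathcal{P}$ on words is also naturally extended to a word set $X$: its permuted trinucleotide set $\mathcal{P}(X)$ is obtained by applying the circular permutation map $\mathcal{P}$ to all the trinucleotides of $X$. We shortly write $\mathcal{P}^2(X)$ for $\mathcal{P}(\mathcal{P}(X))$.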
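As an illustration of Definitions 1 and 2, the two maps can be written down directly. The following Python sketch is our own; the helper names `complement`, `permute`, `complement_set` and `permute_set` are hypothetical, and trinucleotides are assumed to be plain strings over the letters A, C, G, T.

```python
# Sketch of the complementarity map C (Definition 1) and of the circular
# permutation map P (Definition 2), extended to sets of words.

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(word: str) -> str:
    """C(word): complement each letter and reverse, since C(uv) = C(v)C(u)."""
    return "".join(COMPLEMENT[letter] for letter in reversed(word))


def permute(trinucleotide: str) -> str:
    """P(l1 l2 l3) = l2 l3 l1: move the first letter to the end."""
    assert len(trinucleotide) == 3
    return trinucleotide[1:] + trinucleotide[0]


def complement_set(words):
    """C(X): apply C to every word of X."""
    return {complement(w) for w in words}


def permute_set(trinucleotides, times=1):
    """P^times(X): apply P `times` times to every trinucleotide of X."""
    result = set(trinucleotides)
    for _ in range(times):
        result = {permute(t) for t in result}
    return result


print(complement("AAC"))          # GTT
print(permute("AAC"))             # ACA
print(permute_set({"AAC"}, 2))    # {'CAA'}
```

Note that $\mathcal{C}$ is an involution and $\mathcal{P}^3$ is the identity on $A_4^3$, which the helpers above reproduce.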

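Definition 3. A set $X$ of words is a code if, for each $x_1, \ldots, x_n, x'_1, \ldots, x'_m \in X$, $n, m \geq 1$, the condition $x_1 \cdots x_n = x'_1 \cdots x'_m$ implies $n = m$ and $x_i = x'_i$ for $i = 1, \ldots, n$.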

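Definition 4. A trinucleotide code $X$ is circular if, for each $x_1, \ldots, x_n, x'_1, \ldots, x'_m \in X$, $n, m \geq 1$, $p \in A_4^*$, $s \in A_4^+$, the conditions $s x_2 \cdots x_n p = x'_1 \cdots x'_m$ and $x_1 = ps$ imply $n = m$, $p = \varepsilon$ (empty word) and $x_i = x'_i$ for $i = 1, \ldots, n$.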

Definition 5. A trinucleotide code $X$ is self-complementary if, for each $x \in X$, $\mathcal{C}(x) \in X$.

Definition 6. If $X$ is a subset of $A_4^3$, we denote by $X_1 = \mathcal{P}(X)$ the permuted trinucleotide set and by $X_2 = \mathcal{P}^2(X)$ the permuted trinucleotide set, and we call $X_1$ and $X_2$ the conjugated classes of $X$.

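Definition 7. A trinucleotide circular code $X$ is $C^3$-self-complementary if $X$, $X_1 = \mathcal{P}(X)$ and $X_2 = \mathcal{P}^2(X)$ are circular codes satisfying the following properties: $X = \mathcal{C}(X)$ (self-complementary), $X_2 = \mathcal{C}(X_1)$ (and $X_1 = \mathcal{C}(X_2)$).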

We have proved that there are exactly 528 self-complementary trinucleotide circular codes having 20 elements [6,24,25].

The concept of necklace was introduced by Pirillo [17] in order to characterize circular codes and to develop an efficient algorithm. Let $l_1, l_2, \ldots$ be letters in $A_4$, $d_1, d_2, \ldots$ diletters in $A_4^2$ and $n \geq 2$ an integer.

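Definition 8. Letter Diletter Continued Necklace (LDCN): We say that the ordered sequence $l_1, d_1, l_2, d_2, \ldots, d_{n-1}, l_n$ is an $n$LDCN for a subset $X \subseteq A_4^3$ if

$l_1 d_1, l_2 d_2, \ldots, l_{n-1} d_{n-1} \in X$ and $d_1 l_2, d_2 l_3, \ldots, d_{n-1} l_n \in X$.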
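Definition 8 can be checked mechanically on a given candidate sequence. In the following sketch, `is_ldcn` is a hypothetical helper of ours (not taken from [17]); a candidate necklace is assumed to be a Python list alternating letters and diletters, and $X$ a set of strings.

```python
def is_ldcn(sequence, code):
    """Return True if l1, d1, l2, ..., d_{n-1}, l_n is an nLDCN for `code`.

    `sequence` alternates letters (length-1 strings) and diletters
    (length-2 strings), starting and ending with a letter.
    """
    letters = sequence[0::2]
    diletters = sequence[1::2]
    if len(letters) != len(diletters) + 1:
        return False
    if any(len(l) != 1 for l in letters) or any(len(d) != 2 for d in diletters):
        return False
    for i, d in enumerate(diletters):
        # Both l_i d_i and d_i l_{i+1} must be trinucleotides of the code.
        if letters[i] + d not in code or d + letters[i + 1] not in code:
            return False
    return True


# A, CG, A, CG, A, CG, A, CG, A only uses the trinucleotides ACG and CGA.
necklace = ["A", "CG", "A", "CG", "A", "CG", "A", "CG", "A"]
print(is_ldcn(necklace, {"ACG", "CGA"}))  # True: a 5LDCN
print(is_ldcn(necklace, {"ACG"}))         # False: CGA is missing
```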

Any set of trinucleotides is a code (more precisely, a uniform code [1]), but only a few of them are circular codes. We have the following proposition.

Proposition 1. [17] Let $X$ be a trinucleotide code. The following conditions are equivalent:

1) $X$ is a circular code;

2) $X$ has no 5LDCN.
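Proposition 1 suggests a simple decision procedure. The sketch below is our own illustration, not the algorithm of [17]: it assumes the step relation $l \to l'$ defined by "there is a diletter $d$ with $ld \in X$ and $dl' \in X$", under which a 5LDCN is exactly a chain of four steps $l_1 \to l_2 \to l_3 \to l_4 \to l_5$, and it propagates the set of possible last letters four times. The names `has_5ldcn` and `is_circular` are hypothetical.

```python
from itertools import product

LETTERS = "ACGT"
DILETTERS = ["".join(p) for p in product(LETTERS, repeat=2)]


def has_5ldcn(code):
    """Return True if the trinucleotide set `code` admits a 5LDCN."""
    code = set(code)
    # step[l] = letters m such that l d and d m are in code for some diletter d.
    step = {
        l: {m for m in LETTERS for d in DILETTERS
            if l + d in code and d + m in code}
        for l in LETTERS
    }
    ends = set(LETTERS)              # candidate values of l1
    for _ in range(4):               # extend the chain to l2, l3, l4, l5
        ends = {m for l in ends for m in step[l]}
    return bool(ends)


def is_circular(code):
    """Proposition 1: a trinucleotide code is circular iff it has no 5LDCN."""
    return not has_5ldcn(code)


print(is_circular({"AAC"}))          # True
print(is_circular({"ACG", "CGA"}))   # False
```

For instance, the full set $A_4^3$ is rejected, since A, AA, A, AA, A, AA, A, AA, A is a 5LDCN for it, in agreement with the Introduction.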

The figure below explains the notion of 5LDCN.

3. Results

Proposition 2. If $X$ is a trinucleotide circular code having 20 elements and $X_1$ and $X_2$ are its two conjugated classes, then $X$, $X_1$ and $X_2$ constitute a partition of $A_4^3 \setminus \{AAA, CCC, GGG, TTT\}$.

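Proof. A circular code contains no trinucleotide with identical nucleotides, hence neither do $X_1$ and $X_2$, and each of the three classes has 20 elements, as $\mathcal{P}$ is injective. Since $A_4^3 \setminus \{AAA, CCC, GGG, TTT\}$ has 60 elements, it is enough to prove that $X \cap X_1 = X \cap X_2 = X_1 \cap X_2 = \emptyset$. Suppose that the trinucleotide $l_1 l_2 l_3$ belongs both to the classes $X$ and $X_1$. Then $l_1 l_2 l_3$ and $l_3 l_1 l_2$ are both in class $X$. As no two conjugated trinucleotides can belong to a circular code, we are in contradiction. Suppose that $l_1 l_2 l_3$ belongs both to the classes $X$ and $X_2$. Then $l_1 l_2 l_3$ and $l_2 l_3 l_1$ are both in class $X$, again a contradiction. Suppose that $l_1 l_2 l_3$ belongs both to the classes $X_1$ and $X_2$. Then $l_3 l_1 l_2$ and $l_2 l_3 l_1$ are both in class $X$, again a contradiction. So, $X \cap X_1 = X \cap X_2 = X_1 \cap X_2 = \emptyset$.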
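Proposition 2 can be checked on a concrete 20-element circular code. As an example of our choosing (recalled from [6], and not necessarily the set used in the proofs below), the sketch takes the self-complementary circular code of Arquès and Michel and verifies that $X$, $\mathcal{P}(X)$ and $\mathcal{P}^2(X)$ partition the 60 trinucleotides without identical nucleotides.

```python
from itertools import product

# Assumed example: the 20-trinucleotide circular code of Arques and Michel [6].
X = {"AAC", "AAT", "ACC", "ATC", "ATT", "CAG", "CTC", "CTG", "GAA", "GAC",
     "GAG", "GAT", "GCC", "GGC", "GGT", "GTA", "GTC", "GTT", "TAC", "TTC"}


def permute(t):
    return t[1:] + t[0]           # P(l1 l2 l3) = l2 l3 l1


X1 = {permute(t) for t in X}      # P(X)
X2 = {permute(t) for t in X1}     # P^2(X)
periodic = {"AAA", "CCC", "GGG", "TTT"}
universe = {"".join(p) for p in product("ACGT", repeat=3)} - periodic

pairwise_disjoint = not (X & X1 or X & X2 or X1 & X2)
covers = (X | X1 | X2) == universe
print(len(universe), pairwise_disjoint, covers)  # 60 True True
```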

Proposition 3. The class of self-complementary circular codes having 20 elements with both $X_1$ and $X_2$ in the class of circular codes is non-empty.

Proof. Consider, for example, the following set of 20 trinucleotides

It is enough to prove that $X$ is a self-complementary circular code and that its two conjugated classes $X_1$ and $X_2$ are also circular codes.

$X$ is a self-complementary circular code.

$X$ is self-complementary. Obvious by inspection.

$X$ is a circular code. We use Proposition 1 [17]. By way of contradiction, suppose that $X$ admits a 5LDCN. As can be, , or, it is enough to prove that each choice leads to a contradiction.

1) If then there is no possible as is not a suffix of any trinucleotide of, contradiction.

2) If, there are three possible:

• if (a) or (b) then (c) but there is no possible as is not a prefix of any trinucleotide of, contradiction

• if (d), there is a contradiction as no trinucleotide of has a prefix.

3) If, there are six possible:

• if or, contradiction (a) and (b)

• if or then or:

if, there are three possible: if or then, similarly to (c), contradiction, and if, similarly to (d), contradiction

4) If, similarly to (c), contradiction.

As, for each letter, we cannot complete the assumed 5LDCN for $X$, we are in contradiction. Hence, $X$ is a circular code.

$X_1$ is a circular code. We have to prove that $X_1 = \mathcal{P}(X)$ is a circular code. By way of contradiction, assume that $X_1$ admits a 5LDCN.

1) If, there are four possible:, , and, but no possible, contradiction.

2) If, there are three possible:, and, but no possible, contradiction.

3) If, there are six possible:, and, and the cases, and already seen, but no possible, contradiction.

4) If, there is no possible, contradiction.

Hence, $X_1$ is also a circular code.

$X_2$ is a circular code. Finally, we have to prove that $X_2 = \mathcal{P}^2(X)$ is a circular code. By way of contradiction, assume that $X_2$ admits a 5LDCN.

1) If, there is no possible, contradiction.

2) If, there are six possible:, , , , and, but no possible, contradiction.

3) If, there are three possible:, and which are cases already seen, contradiction.

4) If, there are four possible:, , and, but no possible, contradiction.

Hence, as and, is also a circular code.

Proposition 4. The class of self-complementary circular codes having 20 elements with neither $X_1$ nor $X_2$ in the class of circular codes is non-empty.

Proof. Consider, for example, the following set of 20 trinucleotides

It is enough to prove that $X$ is a self-complementary circular code and that neither its conjugated class $X_1$ nor its conjugated class $X_2$ is a circular code.

$X$ is a self-complementary circular code.

$X$ is self-complementary. Obvious by inspection.

$X$ is a circular code. We use Proposition 1 [17]. By way of contradiction, assume that $X$ admits a 5LDCN.

1) If then there is one possible but no possible, contradiction.

2) If, there are two possible:

• if then (a) and (b) but there is no possible, contradiction

• if (c) then there is no possible, contradiction.

3) If we have seven possible:

• if then or:

if (d) then or:

-       if then and but there is no possible, contradiction

-       if then there is no possible, contradiction

• if, similarly to (b), contradiction

• if, or then, contradiction (a)

• if then or, contradiction (a) and (d)

4) If, similarly to (a), contradiction.

Hence, $X$ is a circular code.

$X_1$ is not a circular code. We have $X_1 = \mathcal{P}(X)$.

We use a technique developed in [23]. Observe that $X_1$ contains So,

is a 5LDCN for this 4-element subset of $X_1$ and, a fortiori, for $X_1$ itself, which, consequently, is not a circular code.

$X_2$ is not a circular code. We have $X_2 = \mathcal{P}^2(X)$.

We again use a technique developed in [23]. Remark that $X_2$ contains. So,

is a 5LDCN for this 4-element subset of $X_2$ and, a fortiori, for $X_2$ itself, which, consequently, is not a circular code.

We need the following propositions, in particular the next one, which states a general property of involutive anti-isomorphisms such as the complementarity map.

Proposition 5. A subset $X$ of $A_4^3$ is a circular code if and only if $\mathcal{C}(X)$ is a circular code.

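Proof. Suppose, first, that $X$ is not a circular code and that $\mathcal{C}(X)$ is a circular code. By Proposition 1, $X$ has a 5LDCN. This means that there are 13 nucleotides, say $n_1, n_2, \ldots, n_{13} \in A_4$, such that the trinucleotides

$n_1 n_2 n_3,\ n_4 n_5 n_6,\ n_7 n_8 n_9,\ n_{10} n_{11} n_{12} \in X$

and

$n_2 n_3 n_4,\ n_5 n_6 n_7,\ n_8 n_9 n_{10},\ n_{11} n_{12} n_{13} \in X$.

Now, consider the sequence

$\mathcal{C}(n_{13}),\ \mathcal{C}(n_{12})\mathcal{C}(n_{11}),\ \mathcal{C}(n_{10}),\ \mathcal{C}(n_9)\mathcal{C}(n_8),\ \mathcal{C}(n_7),\ \mathcal{C}(n_6)\mathcal{C}(n_5),\ \mathcal{C}(n_4),\ \mathcal{C}(n_3)\mathcal{C}(n_2),\ \mathcal{C}(n_1)$.

All the following trinucleotides belong to $\mathcal{C}(X)$:

$\mathcal{C}(n_{13})\mathcal{C}(n_{12})\mathcal{C}(n_{11}) = \mathcal{C}(n_{11} n_{12} n_{13})$, $\mathcal{C}(n_{10})\mathcal{C}(n_9)\mathcal{C}(n_8) = \mathcal{C}(n_8 n_9 n_{10})$, $\mathcal{C}(n_7)\mathcal{C}(n_6)\mathcal{C}(n_5) = \mathcal{C}(n_5 n_6 n_7)$, $\mathcal{C}(n_4)\mathcal{C}(n_3)\mathcal{C}(n_2) = \mathcal{C}(n_2 n_3 n_4)$

and

$\mathcal{C}(n_{12})\mathcal{C}(n_{11})\mathcal{C}(n_{10}) = \mathcal{C}(n_{10} n_{11} n_{12})$, $\mathcal{C}(n_9)\mathcal{C}(n_8)\mathcal{C}(n_7) = \mathcal{C}(n_7 n_8 n_9)$, $\mathcal{C}(n_6)\mathcal{C}(n_5)\mathcal{C}(n_4) = \mathcal{C}(n_4 n_5 n_6)$, $\mathcal{C}(n_3)\mathcal{C}(n_2)\mathcal{C}(n_1) = \mathcal{C}(n_1 n_2 n_3)$,

as they are the complements of trinucleotides in $X$. So, $\mathcal{C}(X)$ admits a 5LDCN and, by Proposition 1, it cannot be a circular code. Contradiction.

The case where $X$ is a circular code and $\mathcal{C}(X)$ is not a circular code is similar: apply the same argument to $\mathcal{C}(X)$, since $\mathcal{C}(\mathcal{C}(X)) = X$.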
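The construction in the proof of Proposition 5 amounts to reversing the 13 nucleotides of the necklace and complementing each of them. The sketch below, with hypothetical helper names of our own, checks on one example that the eight trinucleotides read from the new sequence are exactly the complements of the eight original ones.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(word):
    """C(word): reverse the word and complement each letter."""
    return "".join(COMPLEMENT[c] for c in reversed(word))


def necklace_trinucleotides(nucleotides):
    """The eight trinucleotides l_i d_i and d_i l_{i+1} of a 5LDCN n1..n13."""
    s = "".join(nucleotides)
    assert len(s) == 13
    # l_i d_i start at positions 0, 3, 6, 9; d_i l_{i+1} at 1, 4, 7, 10.
    return {s[k:k + 3] for k in (0, 3, 6, 9, 1, 4, 7, 10)}


def complement_necklace(nucleotides):
    """The sequence C(n13), C(n12), ..., C(n1) used in the proof."""
    return [COMPLEMENT[n] for n in reversed(nucleotides)]


X = {"ACG", "CGA"}
necklace = list("ACGACGACGACGA")        # A, CG, A, CG, A, CG, A, CG, A
assert necklace_trinucleotides(necklace) <= X
image = complement_necklace(necklace)
print("".join(image))                                                # TCGTCGTCGTCGT
print(necklace_trinucleotides(image) <= {complement(t) for t in X})  # True
```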

Proposition 6. Let $Y$ be a self-complementary subset of $A_4^3$. If $Y$ is partitioned into three classes such that two of them are complements of each other, then necessarily the third one is self-complementary.

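Proof. Let $Y_1$, $Y_2$ and $Y_3$ be the three classes of an arbitrary partition of $Y$ and suppose that $Y_1$ and $Y_2$ are complementary, i.e. $Y_1$ and $Y_2$ satisfy $\mathcal{C}(Y_1) = Y_2$ (hence $\mathcal{C}(Y_2) = Y_1$). Let $t$ be a trinucleotide of $Y_3$. We claim that $\mathcal{C}(t) \notin Y_1$. Indeed, in the opposite case, $Y_2$ would not be the complement of $Y_1$, because $t = \mathcal{C}(\mathcal{C}(t)) \in \mathcal{C}(Y_1)$ while $t \notin Y_2$. We also claim that $\mathcal{C}(t) \notin Y_2$. Indeed, in the opposite case, $Y_1$ would not be the complement of $Y_2$, because $t \in \mathcal{C}(Y_2)$ while $t \notin Y_1$. As $Y$ is self-complementary, $\mathcal{C}(t) \in Y$, so the only remaining case is $\mathcal{C}(t) \in Y_3$. So, $Y_3$ is self-complementary.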

Remark 1. Clearly, if $Y_1$, $Y_2$ and $Y_3$ constitute an arbitrary partition of a self-complementary set $Y \subseteq A_4^3$, then the self-complementarity of $Y_3$ is not enough to ensure that $Y_1$ and $Y_2$ are complements of each other. This remark remains true if, in addition, $Y_3$ is a self-complementary circular code having 20 elements. Indeed, in this case, it is easy to partition $Y \setminus Y_3$ into two classes $Y_1$ and $Y_2$ that are not complements of each other. However, if we consider the partition of $A_4^3 \setminus \{AAA, CCC, GGG, TTT\}$ into the three classes given by a self-complementary trinucleotide circular code $X$ having 20 elements and by its two conjugated classes $X_1$ and $X_2$, then the necessary and sufficient condition holds (Proposition 7 below).

Proposition 7. A trinucleotide circular code $X$ having 20 elements is self-complementary if and only if its two conjugated classes $X_1$ and $X_2$ are complements of each other.

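Proof. If part. Suppose that $X_1$ and $X_2$ are complements of each other. By Proposition 2, $X$, $X_1$ and $X_2$ constitute a partition of $A_4^3 \setminus \{AAA, CCC, GGG, TTT\}$, which is a self-complementary set. Hence, $X$ is self-complementary by Proposition 6.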

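Only if part. Suppose that $X$ is self-complementary and consider the partition $X$, $X_1$ and $X_2$ of $A_4^3 \setminus \{AAA, CCC, GGG, TTT\}$. Suppose that the trinucleotide, say $l_1 l_2 l_3$, belongs to $X_1$. Then $l_3 l_1 l_2 \in X$ and, as $X$ is self-complementary, also

$\mathcal{C}(l_3 l_1 l_2) = \mathcal{C}(l_2)\mathcal{C}(l_1)\mathcal{C}(l_3) \in X$.

We have

$\mathcal{P}^2(\mathcal{C}(l_2)\mathcal{C}(l_1)\mathcal{C}(l_3)) = \mathcal{C}(l_3)\mathcal{C}(l_2)\mathcal{C}(l_1) \in X_2$

and

$\mathcal{C}(l_1 l_2 l_3) = \mathcal{C}(l_3)\mathcal{C}(l_2)\mathcal{C}(l_1)$.

As $l_1 l_2 l_3$ is a generic trinucleotide of $X_1$, we get $\mathcal{C}(X_1) \subseteq X_2$, and as

$|\mathcal{C}(X_1)| = |X_1| = 20$ and $|X_2| = 20$,

then $X_2 = \mathcal{C}(X_1)$ is the complement of $X_1$ (and $X_1 = \mathcal{C}(X_2)$, as $\mathcal{C}$ is an involution).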
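The computation in the only if part reduces to the identity $\mathcal{C}(\mathcal{P}(t)) = \mathcal{P}^2(\mathcal{C}(t))$, valid for every trinucleotide $t$; from it, $\mathcal{C}(X_1) = \mathcal{P}^2(\mathcal{C}(X)) = \mathcal{P}^2(X) = X_2$ whenever $\mathcal{C}(X) = X$. The sketch below checks the identity exhaustively over the 64 trinucleotides, then on the code of [6], which we use here only as an assumed example.

```python
from itertools import product

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(word):
    """C(word): reverse the word and complement each letter."""
    return "".join(COMPLEMENT[c] for c in reversed(word))


def permute(t, times=1):
    """P^times(t), with P(l1 l2 l3) = l2 l3 l1."""
    for _ in range(times):
        t = t[1:] + t[0]
    return t


# Identity behind the "only if" part: C(P(t)) = P^2(C(t)) for all t.
identity_holds = all(
    complement(permute(t)) == permute(complement(t), 2)
    for t in ("".join(p) for p in product("ACGT", repeat=3))
)

# Assumed example: the self-complementary circular code of [6].
X = {"AAC", "AAT", "ACC", "ATC", "ATT", "CAG", "CTC", "CTG", "GAA", "GAC",
     "GAG", "GAT", "GCC", "GGC", "GGT", "GTA", "GTC", "GTT", "TAC", "TTC"}
X1 = {permute(t) for t in X}
X2 = {permute(t, 2) for t in X}

print(identity_holds)                         # True
print({complement(t) for t in X} == X)        # True: X is self-complementary
print({complement(t) for t in X1} == X2)      # True: C(X1) = X2
```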

As a consequence, we have the following proposition.

Proposition 8. If a trinucleotide circular code $X$ having 20 elements is self-complementary, then either 1) $X_1$ and $X_2$ are both circular codes or 2) neither $X_1$ nor $X_2$ is a circular code (both admit a 5LDCN).

Proof. We have four possibilities:

$X_1$ is a circular code and $X_2$ is a circular code;

$X_1$ is a circular code and $X_2$ is not a circular code;

$X_1$ is not a circular code and $X_2$ is a circular code;

$X_1$ is not a circular code and $X_2$ is not a circular code.

By Propositions 3 and 4, the first and the last possibilities can effectively be realized.

Suppose, by way of contradiction, that the second possibility is realized. So, $X_1$ is a circular code. By Proposition 7, we have $X_2 = \mathcal{C}(X_1)$. So, by Proposition 5, $X_2$ must also be a circular code. Contradiction.

Suppose, by way of contradiction, that the third possibility is realized. So, $X_2$ is a circular code. By Proposition 7, we have $X_1 = \mathcal{C}(X_2)$. So, by Proposition 5, $X_1$ must also be a circular code. Contradiction.

So, only the first and the last possibilities can occur.

Hence, our proposition holds.

Proposition 9. The 528 self-complementary circular codes having 20 elements are partitioned into two classes: one class contains the codes whose two permuted sets $X_1$ and $X_2$ are both circular codes, while the other class contains the codes whose two permuted sets $X_1$ and $X_2$ are both not circular codes.

Proof. It is enough to apply Proposition 8 to each of the 528 trinucleotide circular codes having 20 elements.
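Proposition 8 can be turned into a small classification routine. The sketch below is our own: it restates the 5LDCN test sketched for Proposition 1 in a self-contained form, and the hypothetical function `classify` returns 1 when both conjugated classes are circular codes and 2 when neither is. Applying it to each of the 528 codes assumes that their list is available; it is not reproduced here. On the code of [6], used as an assumed example, it returns 1.

```python
from itertools import product

LETTERS = "ACGT"
DILETTERS = ["".join(p) for p in product(LETTERS, repeat=2)]


def permute(t):
    return t[1:] + t[0]                 # P(l1 l2 l3) = l2 l3 l1


def is_circular(code):
    """No 5LDCN (Proposition 1): no chain of four letter-to-letter steps."""
    step = {l: {m for m in LETTERS for d in DILETTERS
                if l + d in code and d + m in code} for l in LETTERS}
    ends = set(LETTERS)
    for _ in range(4):
        ends = {m for l in ends for m in step[l]}
    return not ends


def classify(code):
    """Return 1 if P(X) and P^2(X) are both circular, 2 if neither is."""
    x1 = {permute(t) for t in code}
    x2 = {permute(t) for t in x1}
    c1, c2 = is_circular(x1), is_circular(x2)
    if c1 != c2:
        # Excluded by Proposition 8 for self-complementary 20-element codes.
        raise ValueError("exactly one conjugated class is circular")
    return 1 if c1 else 2


X = {"AAC", "AAT", "ACC", "ATC", "ATT", "CAG", "CTC", "CTG", "GAA", "GAC",
     "GAG", "GAT", "GCC", "GGC", "GGT", "GTA", "GTC", "GTT", "TAC", "TTC"}
print(classify(X))  # 1
```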

4. Acknowledgements

We thank Jacques Justin for his advice. The second author thanks the Dipartimento di Matematica U. Dini for its friendly hospitality.

REFERENCES

1. J. Berstel and D. Perrin, “Theory of Codes, Vol. 117, (Pure and Applied Mathematics),” Academic Press, London, 1985.
2. J.-L. Lassez, “Circular Codes and Synchronization,” International Journal of Computer & Information Sciences, Vol. 5, 1976, pp. 201-208.
3. F. H. C. Crick, J. S. Griffith and L. E. Orgel, “Codes without commas,” Proceedings of the National Academy of Sciences of the USA, Vol. 43, 1957, pp. 416-421. doi:10.1073/pnas.43.5.416
4. S. W. Golomb, B. Gordon and L. R. Welch, “Comma-Free Codes,” Canadian Journal of Mathematics, Vol. 10, No. 2, 1958, pp. 202-209. doi:10.4153/CJM-1958-023-9
5. S. W. Golomb, L. R. Welch and M. Delbrück, “Construction and Properties of Comma-Free Codes,” Biologiske Meddel Danske Vidensk Selsk, Vol. 23, 1958, pp. 1-34.
6. D. G. Arquès and C. J. Michel, “A complementary circular code in the protein coding genes,” Journal of Theoretical Biology, Vol. 182, No. 1, 1996, pp. 45-58. doi:10.1006/jtbi.1996.0142
7. A. J. Koch and J. Lehman, “About a Symmetry of the genetic code,” Journal of Theoretical Biology, Vol. 189, No. 2, 1997, pp. 171-174. doi:10.1006/jtbi.1997.0503
8. M.-P. Béal and J. Senellart, “On the bound of the synchronization delay of a local automaton,” Theoretical Computer Science, Vol. 205, No. 1-2, 1998, pp. 297-306. doi:10.1016/S0304-3975(98)80011-X
9. F. Bassino, “Generating function of circular codes,” Advances in Applied Mathematics, Vol. 22, No. 1, 1999, pp. 1-24. doi:10.1006/aama.1998.0613
10. N. Štambuk, “On circular coding properties of gene and protein sequences,” Croatica Chemica Acta, Vol. 72, No. 4, 1999, pp. 999-1008.
11. R. Jolivet and F. Rothen, “Peculiar symmetry of DNA sequences and evidence suggesting its evolutionary origin in a primeval genetic code,” First European Workshop Exo-/Astro-Biology, Noordwijk, 21-23 May 2001, pp. 173-176.
12. G. Frey and C. J. Michel, “Circular codes in archaeal genomes,” Journal of Theoretical Biology, Vol. 223, No. 4, 2003, pp. 413-431. doi:10.1016/S0022-5193(03)00119-X
13. C. Nikolaou and Y. Almirantis, “Mutually symmetric and complementary triplets: difference in their use distinguish systematically between coding and non-coding genomic sequences,” Journal of Theoretical Biology, Vol. 223, No. 4, 2003, pp. 477-487. doi:10.1016/S0022-5193(03)00123-1
14. E. E. May, M. A. Vouk, D. L. Bitzer and D. I. Rosnick, “An error-correcting framework for genetic sequence analysis,” Journal of the Franklin Institute, Vol. 341, No. 1-2, 2004, pp. 89-109. doi:10.1016/j.jfranklin.2003.12.009
15. G. Frey and C. J. Michel, “Identification of circular codes in bacterial genomes and their use in a factorization method for retrieving the reading frames of genes,” Computational Biology and Chemistry, Vol. 30, No. 2, 2006, pp. 87-101. doi:10.1016/j.compbiolchem.2005.11.001
16. J.-L. Lassez, R. A. Rossi and A. E. Bernal, “Crick’s hypothesis revisited: the existence of a universal coding frame,” Proceedings of the 21st International Conference on Advanced Information Networking and Applications Workshops/Symposia (AINA'07), Niagara Falls, Vol. 2, 21-23 May 2007, pp. 745-751.
17. G. Pirillo, “A characterization for a set of trinucleotides to be a circular code,” In: C. Pellegrini, P. Cerrai, P. Freguglia, V. Benci and G. Israel, Eds., Determinism, Holism, and Complexity, Kluwer, Boston, 2003.
18. C. J. Michel, G. Pirillo and M. A. Pirillo, “Varieties of Comma-Free Codes,” Computers & Mathematics with Applications, Vol. 55, No. 5, 2008, pp. 989-996. doi:10.1016/j.camwa.2006.12.091
19. G. Pirillo, “A hierarchy for circular codes,” Theoretical Informatics and Applications, Vol. 42, No. 4, 2008, pp. 717-728.
20. M. V. José, T. Govezensky, J. A. García and J. R. Bobadilla, “On the evolution of the standard genetic code: vestiges of critical scale invariance from the RNA world in current prokaryote genomes,” PLoS One, Vol. 4, No. 2, 2009, p. e4340. doi:10.1371/journal.pone.0004340
21. G. Pirillo, “Some remarks on prefix and suffix codes,” Pure Mathematics and Applications, Vol. 19, No. 2-3, 2008, pp. 53-60.
22. G. Pirillo, “Non Sharing Border Codes,” Advances in Applied Mathematics, Vol. 3, No. 2, 2010, pp. 215-223.
23. L. Bussoli, C. J. Michel and G. Pirillo, “On some forbidden configurations for self-complementary trinucleotide circular codes,” Journal for Algebra Number Theory Academia, Vol. 2, 2011, pp. 223-232.
24. G. Pirillo and M. A. Pirillo, “Growth Function of Self-Complementary Circular Codes,” Biology Forum, Vol. 98, 2005, pp. 97-110.
25. C. J. Michel, G. Pirillo and M. A. Pirillo, “A relation between trinucleotide comma-free codes and trinucleotide circular codes,” Theoretical Computer Science, Vol. 401, No. 1-3, 2008, pp. 17-25. doi:10.1016/j.tcs.2008.02.049