Applied Mathematics
Vol. 3 No. 1 (2012) , Article ID: 16758 , 6 pages DOI:10.4236/am.2012.31017
On Conjugation Partitions of Sets of Trinucleotides
1Dipartimento di Matematica U.Dini, Firenze, Italia
2Equipe de Bioinformatique Théorique, Université de Strasbourg,
Strasbourg, France
3Université de Marne-la-Vallée, Marne-la-Vallée, France
Email: {bussoli, pirillo}@math.unifi.it, michel@dpt-info.u-strasbg.fr
Received October 2, 2011; revised December 5, 2011; accepted December 13, 2011
Keywords: trinucleotide; conjugated trinucleotides; code; circular code; self-complementary circular code; complementary circular codes
ABSTRACT
We prove that a trinucleotide circular code is self-complementary if and only if its two conjugated classes are complement of each other. Using only this proposition, we prove that if a circular code is self-complementary then either both its two conjugated classes are circular codes or none is a circular code.
1. Introduction
We continue our study of the combinatorial properties of trinucleotide circular codes. A trinucleotide is a word of three letters (triletter) on the genetic alphabet. The set of 64 trinucleotides is a code in the sense of language theory, more precisely a uniform code, but not a circular code [1,2]. In order to have an intuitive meaning of these notions, codes are written on a straight line while circular codes are written on a circle, but, in both cases, unique decipherability is required.
Comma free codes, a very particular case of circular codes, have been studied for a long time, e.g. [3-5]. After the discovery of a circular code in genes with important properties [6], circular codes are mathematical objects studied in combinatorics, theoretical computer science and theoretical biology, e.g. [7-23].
There are 528 self-complementary circular codes of 20 trinucleotides [6,24,25] and, as proved here, they are naturally partitioned into two quite symmetric classes.
Let be the four trinucleotides with identical nucleotides. In this paper, we study some particular partitions of
. Indeed, each circular code
can be associated with two other subsets
and
of
simply by operating two circular permutations of one letter and two letters on the trinucleotides of
. Then, we prove our main result, i.e. a circular code is self-complementary if and only if the remaining two classes are complement of each other. Furthermore, we also show that a subset of
is a circular code if and only if the set consisting of all its complements is a circular code.
As a consequence of these results, we also prove that if a circular code is self-complementary then either both its two conjugated classes are circular codes or none is a circular code.
In Section 2, we give the necessary definitions and a characterization for a set of trinucleotides to be a circular code. In Section 3, we give the results, mainly expressed by Proposition 7 and Proposition 8.
2. Definitions
The classical notions of alphabet, empty word, length, factor, proper factor, prefix, proper prefix, suffix, proper suffix, lexicographical order, etc. are those of [1]. Let denote the genetic alphabet, lexicographically ordered with
. We use the following notation:
• (respectively
) is the set of words (respectively non-empty words) over
;
• is the set of the 16 words of length 2 (diletters or dinucleotides);
• is the set of the 64 words of length 3 (triletters or trinucleotides).
We now recall two important genetic maps, the definitions of code and circular code, and the property of C3- self-complementarity for a circular code, in particular [1,6,17,24,25].
Definition 1. The complementarity map:
is defined by
,
,
and
, and by
for all
, e.g.,
.
The map on words is naturally extended to a word set
: its complementary trinucleotide set
is obtained by applying the complementarity map
to all the trinucleotides of
.
Definition 2. The circular permutation map:
permutes circularly each trinucleotide
as follows
.
The map on words is also naturally extended to a word set
: its permuted trinucleotide set
is obtained by applying the circular permutation map
to all the trinucleotides of
. We shortly write
for
.
Definition 3. A set of words is a code if, for each
,
, the condition
implies
and
for
.
Definition 4. A trinucleotide code is circular if, for each
,
,
,
, the conditions
and
imply
,
(empty word) and
for
.
Definition 5. A trinucleotide code is self-complementary if, for each
,
.
Definition 6. If is a subset of
, we denote by
the permuted trinucleotide set
and by
the permuted trinucleotide set
and we call
and
the conjugated classes of
.
Definition 7. A trinucleotide circular code is
- self-complementary if
,
and
are circular codes satisfying the following properties:
(self-complementary),
(and
).
We have proved that there are exactly 528 self-complementary trinucleotide circular codes having 20 elements [6,24,25].
The concept of necklace was introduced by Pirillo [17] in order to characterize the circular codes for an efficient algorithm development. Let be letters in
,
diletters in
and
an integer.
Definition 8. Letter Diletter Continued Necklace (LDCN): We say that the ordered sequence is an
for a subset
if
and
.
Any trinucleotide set is a code (more precisely, a uniform code [1]) but only few of them are circular codes. We have the following proposition.
Proposition 1. [17] Let be a trinucleotide code. The following conditions are equivalent:
1) is a circular code;
2) has no 5LDCN.
The figure below explains the notion of 5LDCN.
3. Results
Proposition 2. If is a trinucleotide circular code having 20 elements and
and
are its two conjugated classes then
,
and
constitute a partition of
.
Proof. It is enough to prove that
. Suppose that the trinucleotide
belongs both to the classes
and
. Then
and
are both in class
. As no two conjugated trinucleotides can belong to a circular code, we are in contradiction. Suppose that the trinucleotide
belongs both to the classes
and
. Then
and
are both in class
. As no two conjugated trinucleotides can belong to a circular code, we are in contradiction. Suppose that the trinucleotide
belongs both to the classes
and
. Then
and
are both in class
. As no two conjugated trinucleotides can belong to a circular code, we are in contradiction. So,
.
Proposition 3. The class of self-complementary circular codes with both
and
in the class of circular codes is non-empty.
Proof. Consider, for example, the following set of 20 trinucleotides
It is enough to prove that is a self-complementary circular code and that its two conjugated classes
and
are also circular codes.
is a self-complementary circular code.
is self-complementary. Obvious by inspection.
is a circular code. We use Proposition 1 [17]. By way of contradiction, suppose that
admits a 5LDCN. As
can be
,
,
or
, it is enough to prove that each choice leads to a contradiction.
1) If then there is no possible
as
is not a suffix of any trinucleotide of
, contradiction.
2) If, there are three possible
:
• if (a) or
(b) then
(c) but there is no possible
as
is not a prefix of any trinucleotide of
, contradiction
• if (d), there is a contradiction as no trinucleotide of
has a prefix
.
3) If, there are six possible
:
• if or
, contradiction (a) and (b)
• if then
, contradiction (c)
• if or
then
or
:
if, there are three possible
: if
or
then
, similarly to (c), contradiction, and if
, similarly to (d), contradiction
if, contradiction (c)
• if, contradiction (d).
4) If, similarly to (c), contradiction.
As, for each letter, we cannot complete the assumed 5LDCN for, we are in contradiction. Hence,
is a circular code.
is a circular code. We have to prove that
is a circular code. By way of contradiction, assume that admits a 5LDCN.
1) If, there are four possible
:
,
,
and
, but no possible
, contradiction.
2) If, there are three possible
:
,
and
, but no possible
, contradiction.
3) If, there are six possible
:
,
and
, and the cases
,
and
already seen, but no possible
, contradiction.
4) If, there is no possible
, contradiction.
Hence, is also a circular code.
is a circular code. Finally, we have to prove that
is a circular code. By way of contradiction, assume that admits a 5LDCN.
1) If, there is no possible
, contradiction.
2) If, there are six possible
:
,
,
,
,
and
, but no possible
, contradiction.
3) If, there are three possible
:
,
and
which are cases already seen, contradiction.
4) If, there are four possible
:
,
,
and
, but no possible
, contradiction.
Hence, as and
,
is also a circular code.
Proposition 4. The class of self-complementary circular codes having 20 elements with neither
nor
in the class of circular codes is non-empty.
Proof. Consider, for example, the following set of 20 trinucleotides
It is enough to prove that is a self-complementary circular code and that neither its conjugated class
nor its conjugated class
are circular codes.
is a self-complementary circular code.
is self-complementary. Obvious by inspection.
is a circular code. We use Proposition 1 [17]. By way of contradiction, assume that
admits a 5LDCN.
1) If then there is one possible
but no possible
, contradiction.
2) If, there are two possible
:
• if then
(a) and
(b) but there is no possible
, contradiction
• if (c) then there is no possible
, contradiction.
3) If we have seven possible
:
• if then
or
:
if (d) then
or
:
- if then
and
but there is no possible
, contradiction
- if then there is no possible
, contradiction
if, contradiction (a)
• if, similarly to (b), contradiction
• if,
or
then
, contradiction (a)
• if then
or
, contradiction (a) and (d)
• if, contradiction (c).
4) If, similarly to (a), contradiction.
Hence, is a circular code.
is not a circular code. We have
We use a technique developed in [23]. Observe that contains
So,
is a 5LDCN for this 4-element subset of and, a fortiori, for
itself which, consequently, is not a circular code.
is not a circular code. We have
We again use a technique developed in [23]. Remark that contains
. So,
is a 5LDCN for this 4-element subset of and, a fortiori, for
itself which, consequently, is not a circular code.
We need the propositions hereafter and, in particular the following one which states a general property of the involutional antiisomorphisms such as the complementary map.
Proposition 5. A subset of
is a circular code if and only if
is a circular code.
Proof. Suppose, first, that is not a circular code and that
is a circular code. So
has a 5LDCN. This means that there are 13 nucleotides, say
such that the trinucleotides
and
Now, consider the sequence
All the following trinucleotides belong to:
and
as they are the complement of trinucleotides in. So,
admits a 5LDCN and it cannot be a circular code. Contradiction.
The case is a circular code and
is not a circular code is similar.
Proposition 6. Let be a self-complementary subset of
. If
is partitioned into three classes such that two of them are the complement of each other then necessarily the third one is self-complementary.
Proof. Let,
and
be the three classes of an arbitrary partition of
and suppose that
and
are complementary, i.e.
and
satisfy
. Let
be a trinucleotide of
. We claim that
. Indeed, in the opposite case,
should not be the complement of
because
. We also claim that
. Indeed, in the opposite case,
should not be the complement of
because
. It remains the case
. So,
is self-complementary.
Remark 1. Clearly, if,
and
constitute an arbitrary partition of
then the self-complementarity of
is not enough to ensure that
and
are complementary of each other. This remark is again true if, in addition,
is a self-complementary circular code having 20 elements. Indeed in this case, it is easy to make a partition
in two classes
and
that are not complementary of each other. Any case, if we consider the partition of
in the three classes given by a self-complementary trinucleotide circular code
having 20 elements and by its two conjugated classes
and
then the necessary and sufficient condition holds (Proposition 7 below).
Proposition 7. A trinucleotide circular code having 20 elements is self-complementary if and only if
and
are complement of each other.
Proof if part. It is a trivial consequence of Proposition 6.
Only if part. Suppose that is self-complementary and consider the partition
,
and
of
. Suppose that the trinucleotide, say
, belongs to
. Then, also
.
We have
and
.
As is a generic trinucleotide of
and as
and
then is the complement of
.
As a consequence, we have the following proposition.
Proposition 8. If a trinucleotide circular code having 20 elements is self-complementary then either 1)
and
are both circular codes or 2)
and
are not circular codes (both have a necklace).
Proof. We have four possibilities:
is a circular code and
is a circular code;
is a circular code and
is not a circular code;
is not a circular code and
is a circular code;
is not a circular code and
is not a circular code.
Now, by applying Propositions 3 and 4, we have that the first and the last possibilities can be effectively realized.
Suppose that, by way of contradiction, the second possibility is realized. So, is a circular code. By Proposition 7, we have
. So, by Proposition 5,
must also be a circular code. Contradiction.
Suppose that, by way of contradiction, the third possibility is realized. So, is a circular code. By Proposition 7, we have
. So, by Proposition 5,
must also be a circular code. Contradiction.
So, only the first and the last possibilities can occur.
Hence, our proposition holds.
Proposition 9. The 528 self-complementary circular codes having 20 elements are partitioned into two classes: one class contains codes with the two permuted sets and
which are both circular codes while the other class contains codes with the two permuted sets
and
which both are not circular codes.
Proof. It is enough to apply Proposition 8 to each of the 528 trinucleotide circular codes having 20 elements.
4. Acknowledgements
We thank Jacques Justin for his advices. The second author thanks the Dipartimento di matematica U. Dini for giving him a friendly hospitality.
REFERENCES
- J. Berstel and D. Perrin, “Theory of Codes, Vol. 117, (Pure and Applied Mathematics),” Academic Press, London, 1985.
- J.-L. Lassez, “Circular Codes and Synchronization,” International Journal of Computer & Information Sciences, Vol. 5, 1976, pp. 201-208.
- F. H. C. Crick, J. S. Griffith and L. E. Orgel, “Codes without commas,” Proceedings of the National Academy of Sciences of the USA, Vol. 43, 1957, pp. 416-421. doi:10.1073/pnas.43.5.416
- S. W. Golomb, B. Gordon and L. R. Welch, “CommaFree Codes,” Canadian Journal of Mathematics, Vol. 10, No. 2, 1958, pp. 202-209. doi:10.4153/CJM-1958-023-9
- S. W. Golomb, L. R. Welch and M. Delbrück, “Construction and Properties of Comma-Free Codes,” Biologiske Meddel Danske Vidensk Selsk, Vol. 23, 1958, pp. 1-34.
- D. G. Arquès and C. J. Michel, “A complementary circular code in the protein coding genes,” Journal of Theoretical Biology, Vol. 182, No. 1, 1996, pp. 45-58. doi:10.1006/jtbi.1996.0142
- A. J. Koch and J. Lehman, “About a Symmetry of the genetic code,” Journal of Theoretical Biology, Vol. 189, No. 2, 1997, pp. 171-174. doi:10.1006/jtbi.1997.0503
- M.-P. Béal and J. Senellart, “On the bound of the synchronization delay of a local automaton,” Theoretical Computer Science, Vol. 205, No. 1-2, 1998, pp. 297-306. doi:10.1016/S0304-3975(98)80011-X
- F. Bassino, “Generating function of circular codes,” Advances in Applied Mathematics, Vol. 22, No. 1, 1999, pp. 1-24. doi:10.1006/aama.1998.0613
- N. Štambuk, “On circular coding properties of gene and protein sequences,” Croatica Chemica Acta, Vol. 72, No. 4, 1999, pp. 999-1008.
- R. Jolivet and F. Rothen, “Peculiar symmetry of DNA sequences and evidence suggesting its evolutionary origin in a primeval genetic code,” First European Workshop Exo-/Astro-Biology, Noordwijk, 21-23 May 2001, pp. 173- 176.
- G. Frey and C. J. Michel, “Circular codes in archaeal genomes,” Journal of Theoretical Biology, Vol. 223, No. 4, 2003, pp. 413-431. doi:10.1016/S0022-5193(03)00119-X
- C. Nikolaou and Y. Almirantis, “Mutually symmetric and complementary triplets: difference in their use distinguish Systematically between coding and non-coding genomic sequences,” Journal of Theoretical Biology, Vol. 223, No. 4, 2003, pp. 477-487. doi:10.1016/S0022-5193(03)00123-1
- E. E. May, M. A. Vouk, D. L. Bitzer and D. I. Rosnick, “An error-Correcting framework for genetic sequence analysis,” Journal of the Franklin Institute, Vol. 341, No. 1-2, 2004, pp. 89-109. doi:10.1016/j.jfranklin.2003.12.009
- G. Frey and C. J. Michel, “Identification of circular codes in bacterial genomes and their use in a factorization method for retrieving the reading frames of genes,” Computational Biology and Chemistry, Vol. 30, No. 2, 2006, pp. 87-101. doi:10.1016/j.compbiolchem.2005.11.001
- J.-L. Lassez, R. A. Rossi and A. E. Bernal, “Crick’s hypothesis revisited: the existence of a universal coding frame,” Proceedings of the 21st International Conference on Advanced Information Networking and Applications Workshops/Symposia (AINA'07), Niagara Falls, Vol. 2, 21-23 May 2007, pp. 745-751.
- G. Pirillo, “A characterization for a set of trinucleotides to be a circular code,” In: C. Pellegrini, P. Cerrai, P. Freguglia, V. Benci and G. Israel, Eds., Determinism, Holism, and Complexity, Kluwer, Boston, 2003.
- C. J. Michel, G. Pirillo and M. A. Pirillo, “Varieties of Comma-Free Codes,” Computers & Mathematics with Applications, Vol. 55, No. 5, 2008, pp. 989-996. doi:10.1016/j.camwa.2006.12.091
- G. Pirillo, “A hierarchy for circular codes,” Theoretical Informatics and Applications, Vol. 42, No. 4, 2008, pp. 717-728.
- M. V. José, T. Govezensky, J. A. García and J. R. Bobadilla, “On the evolution of the standard genetic code: vestiges of critical scale invariance from the RNA world in current prokaryote genomes,” PLoS One, Vol. 4, No. 2, 2009, p. e4340. doi:10.1371/journal.pone.0004340
- G. Pirillo, “Some remarks on prefix and suffix codes,” Pure Mathematics and Applications, Vol. 19, No. 2-3, 2008, pp. 53-60.
- G. Pirillo, “Non Sharing Border Codes,” Advances in Applied Mathematics, Vol. 3, No. 2, 2010, pp. 215-223.
- L. Bussoli, C. J. Michel and G. Pirillo, “On some forbidden configurations for self-complementary trinucleotide Circular codes,” Journal for Algebra Number Theory Academia, Vol. 2, 2011, pp. 223-232.
- G. Pirillo and M. A. Pirillo, “Growth Function of SelfComplementary Circular Codes,” Biology Forum, Vol. 98, 2005, pp. 97-110.
- C. J. Michel, G. Pirillo and M. A. Pirillo, “A relation between trinucleotide comma-free codes and trinucleotide circular codes,” Theoretical Computer Science, Vol. 401, No. 1-3, 2008, pp. 17-25. doi:10.1016/j.tcs.2008.02.049