
A TMRCA (Time to the Most Recent Common Ancestor) calculator has been developed that can handle up to 10,000 haplotypes simultaneously. It accepts haplotypes in any format within the 111 markers of the FTDNA (Family Tree DNA, a leading company in the systematics of haplotypes) nomenclature, in any combination with respect to the number of their markers, and covers TMRCA values from a few hundred years to millions of years. The calculator shows the TMRCA data calculated separately and simultaneously in the 6-, 12-, 25-, 37-, 67-, and 111-marker formats by the linear method, and for haplotypes of any format, such as 7-, 8-, 9-, 10-, 17-, 19-, 23-marker and any other, by the quadratic method. The calculator also shows the number of mutations (in the whole given dataset of haplotypes), so the TMRCA values can be verified manually, if desired. The calculator automatically makes corrections for back mutations (in the linear method; there is no need for corrections in the quadratic method), and considers multi-marker mutations and zero alleles, counting them correctly as one mutation. The calculator can be set to exclude markers that show an excessive dispersion, which likely indicates “admixtures” that do not belong to the given set of haplotypes. The paper provides a number of examples of TMRCA calculations for datasets of different haplogroups, and shows that the mutation rate constants are the same in different haplogroups. The paper also compares the mutation rate tables by Chandler (2006), Ballantyne et al. (2010), Heinila (2012) and an anonymous investigator (2014) with the mutation rate constants determined and examined in this study. It is shown that the mutation rates in those tables are significantly overestimated, which often leads to unrealistic TMRCAs.

The TMRCA (Time to the Most Recent Common Ancestor) values provide valuable data for DNA genealogy. In one common approach, they are calculated by counting the number of mutations in a given set of haplotypes (all in the same format, such as 12 marker haplotypes, or 17, 25, 37, 67, 111 marker haplotypes, or any other format) from a so-called base haplotype, which is assumed to be the ancestral haplotype for the haplotype set. The total number of mutations is divided by the number of haplotypes in the set and by the mutation rate constant. For example, if a set of one hundred 67 marker haplotypes shows 250 mutations in total, then we have 250/100/0.12 = 20.83 conditional generations (25 years each), which is typically rounded to 21 conditional generations (since a generation is a discrete figure), and obtain 525 ± 60 years to a common ancestor. Here 0.12 (the number of mutations per conditional generation, calibrated for 25 years per generation) is the mutation rate constant for the 67 marker haplotype, and the margin of error is calculated following common rules of statistics (Klyosov, 2009a; Klyosov & Rozhanskii, 2012a). If the figure of 20.83 is not rounded, one obtains a TMRCA of 521 ± 60 years, which is practically the same number within the margin of error. The method described above was coined the linear method. If one does not like 25 years as a conditional generation and wants to employ, say, 30 years, then the mutation rate constant is 0.144 (mutations per conditional generation of 30 years), and the TMRCA is exactly the same: 250/100/0.144 = 17.36 conditional generations, that is 17.36 × 30 = 521 ± 60 years. Obviously, the linear method is applicable only to a set of haplotypes in the same format, since the number of mutations is divided by the number of haplotypes.
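The arithmetic of the linear method can be sketched in a few lines of Python. This is only an illustration of the worked example above, not the Calculator's code: the function name `linear_generations` is hypothetical, and neither the back-mutation correction nor the margin of error is computed here.

```python
def linear_generations(total_mutations, n_haplotypes, rate_per_generation):
    """Uncorrected number of conditional generations by the linear method."""
    return total_mutations / n_haplotypes / rate_per_generation


# Worked example from the text: one hundred 67 marker haplotypes, 250 mutations.
gens_25 = linear_generations(250, 100, 0.12)   # about 20.83 generations of 25 years
years_25 = gens_25 * 25                        # about 521 years

# Rescaling the rate constant to a 30-year generation leaves the date unchanged.
rate_30 = 0.12 * 30 / 25                       # 0.144 mutations per 30 years
gens_30 = linear_generations(250, 100, rate_30)
years_30 = gens_30 * 30

print(round(gens_25, 2), round(years_25), round(gens_30, 2), round(years_30))
```

The sketch makes the point of the paragraph explicit: the generation length cancels out as long as the mutation rate constant is expressed per the same generation length.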

Another approach, coined the quadratic method (Klyosov, 2009a), or ASD (average square distance, Goldstein et al., 1995), can be applied to haplotypes of various formats in the same dataset, and does not require a correction for back mutations. However, it is too tedious for manual calculations, since, as its name indicates, it requires calculating average square distances pairwise between all alleles for each marker in the dataset. There is also a logarithmic method for TMRCA calculations (Klyosov, 2009a), which we will consider very briefly in this paper.
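To show why a per-marker approach tolerates mixed haplotype formats, here is a minimal sketch of a generic pairwise ASD estimate. It is not the Calculator's KKK algorithm. It assumes a symmetric single-step mutation model and a star-like genealogy, under which the expected pairwise squared difference for a marker grows as 2 × (rate) × (generations). The function names and the toy haplotypes are hypothetical; the two rate constants are the per-25-year values for DYS393 and DYS390 quoted later in this paper.

```python
from itertools import combinations


def marker_pairwise_asd(alleles):
    """Average squared difference over all pairs of alleles of one marker."""
    pairs = list(combinations(alleles, 2))
    return sum((a - b) ** 2 for a, b in pairs) / len(pairs)


def asd_generations(haplotypes, rates):
    """Generic ASD estimate of conditional generations (an assumed model).

    haplotypes: list of dicts mapping marker name -> allele; markers may be missing.
    rates: dict mapping marker name -> mutation rate per conditional generation.
    """
    total_asd = 0.0
    total_rate = 0.0
    for marker, rate in rates.items():
        # Use only the haplotypes that actually report this marker.
        alleles = [h[marker] for h in haplotypes if marker in h]
        if len(alleles) < 2:
            continue
        total_asd += marker_pairwise_asd(alleles)
        total_rate += rate
    return total_asd / (2.0 * total_rate)


# Toy data (hypothetical): four haplotypes, one of them lacking DYS393.
rates = {"DYS393": 0.00059, "DYS390": 0.00220}
toy = [
    {"DYS393": 13, "DYS390": 24},
    {"DYS393": 13, "DYS390": 25},
    {"DYS390": 23},
    {"DYS393": 13, "DYS390": 24},
]
toy_generations = asd_generations(toy, rates)
print(round(toy_generations, 1))
```

Because each marker is processed on its own, a haplotype that lacks a marker simply does not contribute to that marker's average, which is the property that lets the quadratic method combine haplotypes of different lengths.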

Generally, a high level of skill is required to carry out TMRCA calculations correctly, as much as this term is applicable to statistical mathematical operations. There are seven major difficulties confronting those who think it is easy. First, the dataset (a series of haplotypes) should be subdivided into separate lineages, or branches, each of which had its own common ancestor; otherwise, a “phantom” common ancestor, which is a product of various branches, will be “dated”. Second, one should know the mutation rate constants for the haplotype dataset chosen (or available) for calculations. For the linear method the cumulative rate constants should be known (that is, for the 12-, 17-, 25-, 37-, 67-, 111-marker panels, or for any other haplotype format). For the quadratic method, in which calculations are conducted marker by marker, the individual mutation rate constants should be known, ideally for each of the 111 markers. Those (both the cumulative and the individual mutation rate constants) are largely unknown in the scientific community, or questionable and highly debated. The name “constants” is employed not because all of them are identical across all the markers, but because the mutation rate constant for any given marker remains the same in different haplogroups and in the course of mutations in the same lineage, and does not depend on the size of the dataset. An analogue is radioactive decay, in the course of which the half-life (hence, the decay rate constant) remains the same and does not depend on the size or weight of the sample.

Third, the generation length should be settled, otherwise the TMRCA values remain uncertain. Assumptions do not help here, such as the typical “assuming a generation time of 30 years” (Poznik et al., 2016, as an example of a recent publication). Fourth, corrections for back mutations should be introduced when the linear (and the logarithmic) method is employed, and they are largely unknown to population geneticists. Fifth, it is not easy to count mutations in haplotypes, since a number of markers are multi-copy, and they mutate in pairs (DYS385, DYS459, YCAII, DYS413) or quadruples (DYS464). Sixth, some alleles reproducibly show zero values, and many users do not know how to count them. Seventh, compared with a few years ago, datasets now often contain many hundreds or thousands of haplotypes, which makes manual counting of mutations practically impossible.

The first four difficulties have been resolved in our studies published earlier (Klyosov, 2009a, 2009b, 2009c; Rozhanskii & Klyosov, 2011; Klyosov et al., 2012, 2012a, 2012b), exemplified with manual calculations. The fifth and sixth difficulties have been generally resolved in the literature, though very few users employ multi-copy markers and zero alleles in TMRCA calculations. In this publication, we resolve those and the last complication by offering an automatic calculation method in the form of a multifunctional calculator with a capacity of up to 10,000 haplotypes in any format from a single marker to the 111 marker haplotypes. Calculations are conducted simultaneously by the linear method in the 6-, 12-, 17-, 25-, 37-, 67-, and 111-marker formats (the FTDNA nomenclature) along with the 22 “slow markers” selected from the 67 marker panel, as well as by the quadratic method in any format up to the 111 marker haplotypes along with the 22 “slow marker” haplotypes. Therefore, at least ten TMRCAs can be obtained simultaneously on the same display, along with margins of error for each of them, and the reliability of the TMRCAs can be compared in real, practical terms.

The Calculator can be downloaded from either of two locations: http://www.anatole-klyosov.com/ (in the section “DNA Genealogy TMRCA Calculator”), or http://dna-academy.ru/kilin-klyosov/ (the first textual link from the bottom). Both downloads are rather slow, due to the size of the file (37 Mb).

The calculator is multifunctional, and runs two methods concurrently, the quadratic and the linear.

The quadratic method shows the TMRCAs in the first two columns, indexed as KKK111 and KKK22. KKK22 is essentially a part of KKK111; however, it is based on the 22 slowest markers, listed below. The KKK22 method is applicable only to the 22 marker haplotypes. For all other marker datasets the KKK22 column should be ignored. KKK111 is applicable to any haplotype format, and haplotypes can be totally assorted with respect to their set of markers and their length. Calculations in the KKK (quadratic) approach are based on the discrete random walk numerical model, known in mathematics.

The linear method is employed for 6, 12, 17, 25, 37, 67, and 111 marker haplotypes, as well as for the slow 22 marker haplotypes. All haplotypes in the dataset should have the same format, since the method employs specific mutation rate constants for specific lengths of the haplotypes, as follows (the number of markers in the haplotype follows the FTDNA nomenclature; the mutation rate constant is given in mutations per conditional generation of 25 years):

| Number of markers | Mutation rate constant |
|---|---|
| 6 | 0.0074 |
| 12 | 0.0200 |
| 17 (with DYS385) | 0.0365 |
| 25 | 0.0460 |
| 37 | 0.0900 |
| 67 | 0.1200 |
| 111 | 0.1980 |
| 22 slow markers | 0.0054 |

In the linear method employing “non-standard” haplotypes, such as 5-, 7-, 8-, 9-, 10-, 17 (with DYS426 and DYS388), 18-, 19-, 23-, 39-, 43-marker, as well as any others, the cumulative mutation rate constant can be calculated by summing up the individual mutation rate constants in the calculator (line 7), or in

(line 13) in the Calculator. In the quadratic method all said haplotype formats can be employed without any restrictions. The mutation rate constants for all the 111 markers are shown in

The calculator handles all seven difficulties named above, some of them directly, others indirectly. Regarding the first item (in the list of difficulties), it is preferable to identify branches (lineages, subclades) before the calculator is employed, for example by composing a haplotype tree, see, for example,

Figures 7-9 for haplogroup J1-M267, compared to

Regarding the second item, the calculator provides all 111 mutation rate constants for automatic and manual (if needed) calculations. Regarding the third item, the calculator does not “assume” a length of generation; it employs the conditional generation of 25 years, since the mutation rate constants are calibrated to this generation length, as explained in the Introduction. Regarding the fourth item, a correction for back mutations is embedded in the calculator for the linear method (as noted above, the correction is not required in the quadratic method). Regarding the fifth and sixth items, the Calculator counts each multi-marker mutation and each zero-allele mutation as a single mutation. Finally, regarding the seventh item, the calculator automatically counts the number of mutations for each panel of haplotypes (one just needs to highlight the sequence of mutations for the given haplotype format and read the cumulative number at the bottom of the computer screen), operates with up to 10 thousand haplotypes, and typically completes the calculations in a few seconds.

If haplotypes are presented in the 12 marker format, the TMRCA will be identical in the windows for 12-, 25-, 37-, 67-, and 111-marker haplotypes. In the 17 marker window the TMRCA will be different and, clearly, incorrect, since the 12 marker haplotype lacks a number of markers of the 17 marker format.

The first two short lines shown in the downloaded calculator are for guidance only; they show where to paste a haplotype dataset. The second column (column B) in the downloaded calculator should be kept empty for present-time haplotypes. Column B is needed when ancestral (base) haplotypes are employed in calculations (for example, when the TMRCA of two ancient base haplotypes is sought). In those cases the dates (calculated, or actual, from excavated haplotypes) should be written into Column B.

In this section some examples of calculations are provided. It should be noted that some discrepancies between KKK (quadratic method) and LM (linear method), and between various haplotype formats in the respective LM columns, namely D (6 marker haplotypes), E (12 marker), F (17 marker), G (25 marker), H (37 marker), I (67 marker) and J (111 marker haplotypes), do not necessarily reflect shortcomings of the methods or mistakes in the individual or cumulative values of the mutation rate constants. Those discrepancies more likely reflect some non-uniformity of the haplotype datasets. Datasets are never ideal, and could not be ideal. Many mutations, particularly slow mutations (assembled in the 22 marker panel), are inherited and not random in any given haplotype dataset. There are always some distortions in datasets; however, their contributions are compensated in more extended haplotypes and in larger haplotype datasets. As one can see, in some extended datasets the fit between KKK and LM across all the columns is fairly consistent.

A haplotype tree for 204 haplotypes of downstream subclades of haplogroup R1b-M269 is shown in

The TMRCA values obtained with the Calculator can be verified using the mutation counter (Abs Deviation line). For all the 111 markers, the sum of the individual mutations in each marker is shown at the bottom of the screen. It says Count: 111, Sum: 6370. For manual calculations, all 204 haplotypes in the 111 marker format show the apparent (observed) number of conditional generations to the common ancestor to be equal to 6370/204/0.198 = 157.7, which should be corrected for back mutations as follows:

where:

and the correction equals 0.326/0.281 = 1.16. Therefore, 157.7 (observed) conditional generations become 157.7 × 1.16 = 183 conditional generations, or 4575 ± 461 years to the common ancestor. The calculator gave 4583 ± 462 years, which is practically the same value. The insignificant difference arises because the Calculator does not round the intermediate numbers. The error margins are calculated using standard statistical approaches (Klyosov, 2009a). In this case the correction is equal to a noticeable figure of 16%, which has accumulated over 4600 years. “Back mutations” in this particular case are those 16% of the mutated alleles which have returned to the initial state, and which we cannot see when counting by the linear method.
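The manual check above can be sketched in Python. Two ingredients are assumptions rather than statements of the authors' published formulas. The first is the closed form λ = (λ_obs/2)(1 + exp(λ_obs)) for the back-mutation correction, chosen here because it reproduces the quoted figures (0.281 observed and 0.326 corrected mutations per marker). The second is a margin of error that combines the counting error with a 10% uncertainty in the rate constant. The function name `corrected_lambda` is hypothetical.

```python
import math


def corrected_lambda(lambda_obs):
    """Assumed back-mutation correction: lambda = (lambda_obs / 2) * (1 + exp(lambda_obs))."""
    return lambda_obs / 2.0 * (1.0 + math.exp(lambda_obs))


# Figures quoted in the text for 204 haplotypes in the 111 marker format.
mutations, haplotypes, markers, rate = 6370, 204, 111, 0.198

lambda_obs = mutations / haplotypes / markers        # observed mutations per marker
factor = corrected_lambda(lambda_obs) / lambda_obs   # correction factor
gens_obs = mutations / haplotypes / rate             # observed conditional generations
years = gens_obs * factor * 25                       # corrected TMRCA in years

# Assumed error model: counting error plus 10% uncertainty in the rate constant.
rel_error = math.sqrt(1.0 / mutations + 0.1 ** 2)
margin = years * rel_error

print(round(lambda_obs, 3), round(factor, 2), round(years), round(margin))
```

Under these assumptions, and without rounding the intermediate values, the sketch lands within a few years of the Calculator output of 4583 ± 462 years quoted above.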

The obtained TMRCA for said 204 individuals is 4600 ± 600 years, which is significantly lower than the 13,500 ybp at which the R1b-M269 subclade was formed, as calculated from the SNP data (https://www.yfull.com/tree/R1b/), and even significantly lower than the formation time of the downstream subclade R1b-M269-L23, which is between 5500 and 7300 ybp for various datasets. It seems that the tree in

A more extended dataset of 596 haplotypes of R1b-M269-L23 in the 67 marker format gave the TMRCA of 4661 ± 468 years, which is practically the same as 4583 ± 462 years, shown above, within the margin of error. When rounded, they gave 4700 ± 500 and 4600 ± 500 years, respectively. Again, it seems that most of those 596 haplotypes belong to downstream L23 subclades. In practical terms, subclade M269 arose in Siberia, and L23 arose apparently to the immediate west of the Ural mountains, however, the dataset represents haplotypes of mainly the Caucasus and Middle East origin (Klyosov, 2012).

A haplotype tree for 968 haplotypes of haplogroup I1-M253 in the 111 marker format is shown in

As

A haplotype tree of 244 haplotypes of haplogroup I2-M438 in the 111 marker format is shown in

The TMRCAs for the 111 and 67 marker haplotypes are equal to 7285 ± 734 and 6986 ± 707 years, respectively, which differ by only 4.0% and are practically the same within the margin of error. The 12- and 17-marker haplotypes produce 7542 ± 802 and 6586 ± 686 years, and KKK111 (quadratic method) results in 6281 ± 489 years. Overall, the TMRCA is around 7000 ± 700 years. KKK22 and LM22, based on slow markers, produce a decreased and an increased TMRCA, respectively, albeit with a large margin of error, which shows that the 22 markers, per se, do not introduce a systematic error. The problem lies in some non-uniformity of the branch, as is seen from

A smaller branch of 52 haplotypes, at the lower right side, has the base haplotype as follows: 13 23 16 10 12 12 11 14 12 13 11 29 – 16 8 9 11 11 25 15 21 30 11 14 14 15 – 11 11 11 21 14 12 18 19 33 34 – 12 10 11 8 16 16 8 12 10 8 11 7 12 21 21 16 11 12 12 14 8 12 22 20 13 13 10 13 11 11 12 11 – 30 14 8 16 11 26 27 18 11 11 10 11 11 9 12 11 10 12 12 30 11 13 22 16 10 12 21 15 19 11 25 16 12 14 26 12 22 18 12 15 15 9 11 11.

The TMRCAs for the 111 and 67 marker haplotypes are equal to 5066 ± 521 and 5098 ± 533 years, respectively, which is practically the same. KKK111 produces 5547 ± 515 years, which is almost equal to LM17, LM25 and LM37, which in turn are equal to 5649 ± 641, 5782 ± 638, and 5168 ± 549 years, within the margin of error.

Overall TMRCA is around 5100 ± 600 years. KKK22 and LM22 produce, based on slow markers, elevated TMRCA, again with a large margin of error, overlapping with all other TMRCA values for this branch.

The third branch, of 35 haplotypes, at the lower left side, has the base haplotype as follows: 13 23 15 10 12 15 11 15 12 14 11 30 – 18 8 9 11 11 26 14 18 29 11 14 14 15 – 10 10 21 21 14 10 17 17 33 35 – 12 10 11 8 16 17 8 11 10 8 12 10 12 21 21 17 10 12 12 15 8 14 26 20 11 14 12 13 10 11 12 11 – 29 15 8 15 11 26 27 19 11 11 12 12 10 9 13 11 10 11 12 30 11 13 22 16 11 10 23 15 20 10 24 18 12 14 26 12 21 18 12 14 18 10 12 11.

The TMRCAs for the 111, 67, and 25 marker haplotypes are equal to 3542 ± 374, 3517 ± 384, and 3584 ± 437 years, respectively, which is practically the same. Other panels gave either elevated figures (KKK111, KKK22, LM12, LM22) or reduced ones (LM6, LM17, LM37), albeit within the margins of error of the other panels. Again, there is no systematic trend in the TMRCAs for any panel; the deviations are due to some heterogeneity of the branches. Overall, the TMRCA is around 3600 ± 400 years.

All three base haplotypes have 108 mutations between them, which gives 108/3/0.198 = 182 → 222 conditional generations (the arrow denotes the correction for back mutations), that is 5550 years. The TMRCA for the whole tree in

A haplotype tree of 739 haplotypes of haplogroup J1-M267 in the 111 marker format is shown in

The TMRCAs for the 111, 67 and 37 marker haplotypes are equal to 8469 ± 858, 8114 ± 830 and 8274 ± 854 years, respectively, which differ by only 4.0% and are practically the same within the margin of error. KKK111 (quadratic method) results in 9819 ± 1105 years, again within the same margin of error. Overall, the TMRCA is around 8800 ± 700 years. The deviations are clearly due to some non-uniformity of the branch, as is seen from

The branch of 417 haplotypes on the right-hand side of the tree is obviously rather young, since the bars representing haplotypes are short compared to the other two branches. The branch represents mainly Arabic haplotypes from rather recent common ancestors. It has the base haplotype as follows: 12 23 14 11 13 19 11 17 11 13 11 30 – 19 8 9 11 11 26 14 20 25 12 14 16 17 – 10 10 22 22 14 14 18 18 32 35 – 11 10 11 8 15 16 8 11 10 8 11 9 12 21 22 18 10 12 12 15 8 12 26 21 14 12 11 13 12 12 12 11 – 34 15 8 15 11 25 27 20 13 12 13 11 13 9 11 11 10 11 11 29 11 13 22 15 11 10 20 15 20 10 24 15 11 15 24 12 21 18 9 15 18 9 11 11.

The TMRCA values in all the ten panels of the Calculator are shown in

The branch of 248 haplotypes on the left-hand side is a remarkable one. It consists of Jewish and Arabic haplotypes, 98 of which contain the “Cohen Modal Haplotype” 12 23 14 10 16 11 in the 6 marker format (DYS 393, 390, 19, 391, 388, 392). None of the 417 haplotypes on the right-hand side contains the CMH “signature”, which, of course, does not belong to Cohens only. Many Arabs have the same “CMH”, which makes it rather the “Abraham Modal Haplotype”, to continue the Biblical line. The remaining 138 haplotypes in the left branch contain a mutated “signature”, in full accord with mutation dynamics. The “half-life” of the 6-marker haplotypes is (ln2)/0.0074 = 94 → 104 conditional generations, that is 2600 years, where 0.0074 is the mutation rate constant for the 6-marker haplotypes, and [ln(N/n)]/k = A is the basic formula of the logarithmic method for TMRCA calculations (Klyosov, 2009), where N is the total number of haplotypes in the dataset, n is the number of non-mutated (ancestral) haplotypes in the dataset, and k is the mutation rate constant for the given haplotype format. Applying the logarithmic formula to the branch on the left, we obtain the TMRCA of the branch equal to [ln(248/98)]/0.0074 = 125 → 145 conditional generations, that is 3625 ± 515 years. It is equal within the margin of error to 3849 ± 388 years, obtained from the 111 marker haplotypes by the linear method of calculations (

The branch representing the “Abraham” branch of the tree in

It is remarkable again, that the TMRCA values are practically the same for all the ten panels in the Calculator, within the margin of error (

All 248 haplotypes in the “Abraham” branch have 6651 mutations from the base haplotype, shown above (the figure shown in the calculator display when all 111 numbers in line 13 are highlighted), which gives for manual calculations the TMRCA of 6651/248/0.198 = 135 → 154 conditional generations, or 3850 ± 390 years. This is virtually identical with the TMRCA of 3849 ± 388 years produced by the Calculator.
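The logarithmic estimate quoted earlier for the same branch, [ln(N/n)]/k, can be checked with a short Python sketch. It computes only the uncorrected number of conditional generations; the subsequent correction for back mutations (94 to 104 and 125 to 145 in the text) is not reproduced. The function name `log_method_generations` is hypothetical.

```python
import math


def log_method_generations(n_total, n_base, rate):
    """Uncorrected conditional generations by the logarithmic method: ln(N/n) / k."""
    return math.log(n_total / n_base) / rate


k6 = 0.0074  # mutation rate constant for the 6 marker haplotypes (from the text)

half_life = math.log(2) / k6                            # generations until half are mutated
gens_left_branch = log_method_generations(248, 98, k6)  # 98 base haplotypes out of 248

print(round(half_life), round(gens_left_branch))
```

The only inputs are the share of unmutated haplotypes and the rate constant, which is why the logarithmic method gives a check that is independent of the mutation count used by the linear method.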

The base haplotypes for the branches on the left and on the right hand side in

A haplotype tree of 275 haplotypes of haplogroup N1c1-M46 in the 111 marker format is shown in

The TMRCA values in all the ten panels of the Calculator, shown in

A haplotype tree of 829 haplotypes of haplogroup R1b-U106 in the 111 marker format is shown in

The TMRCA values in all the ten panels of the Calculator, shown in

The base haplotypes for R1b-M269-… (see above) and R1b-U106 are surprisingly similar; there are only 8 mutations between them. This translates to 8/0.198 = 40 → 42 conditional generations, or 1050 years between them. This date can be interpreted differently depending on whether the R1b-M269… subclades are parent ones with respect to R1b-M269-L23-U106, or they are “parallel”, such as R1b-M269-L23-Z2103.

Hundreds of examples could have been listed here, illustrating usage of the Calculator over the last several years; however, we restrict ourselves to a few more figures for some important subclades which are often considered in the literature. For the 3466 haplotypes of the R1b-L21 subclade, the TMRCAs for 111, 67, 37, 25 and 17 marker haplotypes are equal to 3810 ± 381, 3841 ± 384, 3576 ± 358, 3571 ± 358, and 3679 ± 369 years. Here, as throughout this paper, we give dates with an excessive number of digits, unrealistic for practical purposes, just to show the Calculator outputs. For any practical goals, the figures should be rounded.

For 113 haplotypes of the R1a-Z283 subclade, the TMRCAs for 67, 37, 17 and 12 marker haplotypes are equal to 4503 ± 461, 4898 ± 505, 4549 ± 492, and 4529 ± 513 years. For 24 haplotypes of the same haplogroup in the 111 marker format, KKK111 equals 4281 ± 553 years.

For 754 haplotypes of R1a-M458 subclade, the TMRCAs for 67, 37, 25, 17 and 12 marker haplotypes are equal to 3668 ± 368, 3799 ± 382, 3866 ± 391, 3308 ± 336, and 3833 ± 393 years.

We will give here only two examples which illustrate an appropriate fit of the Calculator data to the documented genealogy, representing two quite different cases. In one, a series, albeit small, of 111 marker haplotypes was available. In another, a set of assorted haplotypes in different formats was provided. Other cases of documented genealogy along with several haplotypes are typically positioned between these two extreme cases. Larger sets of haplotypes coupled with documented genealogy data practically always are non-uniform ones, such as those of the Donald Clan of haplogroup R1a (

“phantom” common ancestor of the dataset of 151 haplotypes with the TMRCA of 1031 ± 110, 1034 ± 113, and 1054 ± 128 years for the 67, 37, and 17 marker haplotypes, respectively. The KKK111 gave the TMRCA of 828 ± 116 years. The discrepancy obviously reflects a non-uniformity of the haplotype dataset.

Let us consider five 111 marker haplotypes of Ashkenazi Jews (the Horowitz rabbinical family) of the R1a-Z93-Z94-YP264 subclade. Their known documented genealogy places their common ancestor at 1507-1572, that is 442 - 507 years ago (https://sites.google.com/site/levitedna/y-dna-analysis/snp-analysis---klyosov-s-comment). The Calculator gives 436 ± 114, 466 ± 148, and 454 ± 167 years ago by the 111, 67, and 37 marker panels, respectively.

Another example concerns Capt. Thomas Osborne, b 1580 in England, who came to Virginia in 1619 (http://freepages.genealogy.rootsweb.ancestry.com/~tlosborne/AusburnSurnameProject/fdnatip.htm), that is 397 years ago. A set of 10 assorted haplotypes in the 12, 25, and 37 marker format was provided in the above link, in which the first 12 markers were identical in all the ten haplotypes, eight 25 marker haplotypes contained four mutations, and four 37 marker haplotypes contained two mutations. Only the quadratic method could have been employed in this case, and the KKK111 panel showed 379 ± 270 years to a common ancestor.

R1a-R1b

R1a-A0

A0-A00

Let us consider the most ancient Homo sapiens Y chromosomal haplogroup known to date, which is haplogroup A00. An approach based on counting SNP mutations provided TMRCA values between 208,000 years (Elhaik et al., 2014) and 235,900 years (https://www.yfull.com/tree/A00/); the earlier value of 338,000 years (Mendez et al., 2013) was dismissed in the literature, since the calculations employed too low mutation rates taken from autosomal data.

R1b-Chimpanzee

It is assumed in the literature that a common ancestor of chimpanzees and modern humans lived about 4 - 5 million years ago. Those assumptions are based on some anthropological data, and direct estimates are absent. We have conducted a search for chimpanzee Y chromosome markers in the European Nucleotide Archive (ENA) database, employing Whole Genome Shotgun (WGS), and in the National Center for Biotechnology Information (NCBI) GenBank and ENA, as described in (Klyosov et al., 2012). We have succeeded in retrieving alleles of 16 chimpanzee Y chromosome markers, as indicated in

The database Y Search contains a different haplotype of chimpanzee, under ID 6RCUU (http://www.ysearch.org/lastname_view.asp?uid=&letter=&lastname=chimpanzee&viewuid=6RCUU&p=0), listed there by Thomas Krahn. The haplotype differs, albeit significantly, from the base R1b-P312 haplotype only in a few markers, and in the rest it is too close to R1b-P312. It resulted in an unrealistically recent TMRCA for chimpanzee and humans, which was about 143,000 years (

Since 2006 a few sets of individual mutation rate constants for 67 and 111 markers have appeared. Among them are Chandler table (2006) for the first 37 markers, then upgraded to 67 markers, Heinila table (Heinila, 2012) for a set of 111 markers (Genealogy-DNA Digest, vol. 9, Issue 232), and an unnamed set of mutation rates known as “estimated for 3565 haplotypes” (Anonymous, 2014). Besides, there are extended father-son studies for intended 111 markers by Ballantyne et al. (2010) and Burgarella et al. (2011), which have many omissions in their series, and, as it will be shown below, the Ballantyne table is practically not applicable for the TMRCA calculations. Since the Burgarella et al. data are similar in kind with the latter, it is not considered here.

Chandler’s table of mutation rates, despite its 10-year history, has barely been used for TMRCA calculations in the literature. It was neglected partly because the so-called “Zhivotovsky mutation rates”, or (a synonym) “population mutation rates”, were published in 2004 (Zhivotovsky et al., 2004) and soon became the only mutation rates accepted by reviewers of academic publications. Besides, it did not reach the scientific community because 1) the scientific community was not ready for individual mutation rate constants, and 2) the mutation rates were set per “generation”, while chronology in history is not measured in “generations”, it is measured in years; however, Chandler did not provide a factor that would allow “generations” to be translated to years. The same essentially applies to the Heinila and the “3565 haplotypes” tables, which were not published in the scientific literature and were known only on the net. It is not very productive to criticize the tables loosely; instead, we will give here some specific examples of what those tables result in.

The Ballantyne table presents one big confusion. Unfortunately, those who employ the table do not realize that its data are not applicable to actual calculations. First, the table has too many omissions. For about 1700 father-son pairs (tested for the intended 111 markers), 24 markers were not tested at all, in 17 additional markers there was no mutation at all, and in 15 additional markers there was only one mutation (per 1700 father-son pairs), which altogether makes 56 markers out of 111 (that is 50% of all) non-functional in terms of the mutation rate constants. On top of that, in 11 markers there were only two mutations, which does not provide any meaningful statistics. Overall, two-thirds of those 111 markers are practically unusable. Some estimated mutation rates are obviously erroneous, probably due to poor statistics, such as the mutation rate for DYS393 reported to be faster than that for DYS390 (0.00211 and 0.00152, respectively, in mutations per generation), while anyone who works with mutations in the Y chromosome knows that the reverse is true (0.00059 and 0.00220, respectively, in mutations per 25 years, see

Ballantyne et al. (2010) were 70 years old fathers. As a result, the Ballantyne and Burgarella tables of mutation rates cannot be employed for meaningful TMRCA calculations.

Let us show it by employing several specific examples, taking a conditional generation for 25 years (if someone wants to take it for 30 years or any other timespan, the data can be easily recalculated, by adjusting the mutation rate constants accordingly).

Example 1. The TMRCA for five 111 marker haplotypes of the Horowitz rabbinical family (see above), whose known documented genealogy places their common ancestor to 1507-1572 AD, that is 442 - 507 years ago. A comparison of the TMRCAs for said four mutation rate tables is shown in

The Ballantyne mutation rate table misses 20 markers out of the 111 markers; hence, it misses the mutations in them. The differences from the Calculator TMRCAs could be explained if a “generation” in the Ballantyne table were taken as 35 to 39 years, rather than the “conditional generation” of 25 years employed for the calibration of mutation rate constants in our study, but how would people who calculate the TMRCAs know it?

The Chandler mutation rate table (data for only 67 markers are available) gives TMRCAs that are only about half of those from the documented genealogy and from the Calculator. This was expected, since the Chandler cumulative mutation rates for the 12, 25, 37, and 67 marker haplotypes (0.0224, 0.0695, 0.182, and 0.224 per generation, respectively) are progressively inflated compared to the Calculator values (0.020, 0.046, 0.090, and 0.120 mutations per 25 years).

The Heinila table also gives TMRCAs that are much lower than both the documented genealogy data and the Calculator data. This again was expected, since the Heinila cumulative mutation rates for the 12, 25, 37, 67, and 111 marker haplotypes (0.0243, 0.0605, 0.132, 0.173, and 0.291 per generation, respectively) are progressively inflated compared to the Calculator values (0.020, 0.046, 0.090, 0.120, and 0.198 mutations per 25 years).

The anonymous “estimated with 3565 haplotypes” set likewise gives TMRCAs that are much lower than both the documented genealogy data and the Calculator data. This again was expected, since the “3565 haplotypes” cumulative mutation rates for the 12, 25, 37, 67, and 111 marker haplotypes (0.021, 0.054, 0.122, 0.158, and 0.255 per generation, respectively) are progressively inflated compared to the Calculator values (0.020, 0.046, 0.090, 0.120, and 0.198 mutations per 25 years).
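The link between an inflated cumulative rate and a shortened TMRCA can be illustrated with a short sketch. It assumes the simple linear method, in which the TMRCA is inversely proportional to the mutation rate constant (the back-mutation correction is ignored), and it uses only the 67 and 111 marker rates quoted above. The dictionary and function names are hypothetical.

```python
# Cumulative mutation rate constants quoted in the text (per generation).
CALCULATOR = {67: 0.120, 111: 0.198}
OTHER_TABLES = {
    "Chandler": {67: 0.224},                       # no 111 marker data
    "Heinila": {67: 0.173, 111: 0.291},
    "Anonymous 3565 haplotypes": {67: 0.158, 111: 0.255},
}


def expected_tmrca_fraction(other_rate, calculator_rate):
    """Fraction of the Calculator TMRCA expected from another rate table.

    Assumes TMRCA = mutations / haplotypes / rate, so that for the same
    mutation count the TMRCA ratio is the inverse ratio of the rates.
    """
    return calculator_rate / other_rate


fractions = {
    name: {fmt: expected_tmrca_fraction(rate, CALCULATOR[fmt])
           for fmt, rate in rates.items()}
    for name, rates in OTHER_TABLES.items()
}

for name, by_format in fractions.items():
    for fmt, fraction in by_format.items():
        print(f"{name}, {fmt} markers: {fraction:.2f} of the Calculator TMRCA")
```

Under this assumption the Chandler 67 marker rate is expected to yield roughly half of the Calculator TMRCA, in line with the Example 1 values of 249 and 466 years for the 67 marker format.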

Example 2. The TMRCAs for 829 haplotypes of haplogroup R1b-U106 in the 111 marker format are shown in

As before, the Chandler mutation rate table (data for only 67 markers are available) gives TMRCAs that are only about half of the Calculator TMRCAs. The reason is explained in the preceding section.

The Heinila table again gives TMRCAs that are much lower than the Calculator TMRCAs.

Source of TMRCA | 111 marker haplotype | 67 marker haplotype | 37 marker haplotype |
---|---|---|---|
Documented genealogy | 442 - 507 | 442 - 507 | 442 - 507 |
Kilin-Klyosov calculator | 436 | 466 | 454 |
Chandler | n.a. | 249 | 225 |
Ballantyne | 317 | 393 | 313 |
Heinila | 297 | 323 | 309 |
Anonymous “3565 haplo” | 338 | 354 | 335 |

The anonymous “estimated with 3565 haplotypes” set again gives TMRCAs that are much lower than the Calculator TMRCAs.

Example 3. The TMRCAs for 3466 haplotypes of haplogroup R1b-L21 in the 111 marker format are shown in

As before, the Chandler mutation rate table (data for only 67 markers are available) gives TMRCAs that are only about half of the Calculator TMRCAs. The reason is explained in the preceding section.

The Heinila table again gives TMRCAs that are significantly lower (33% lower) than the Calculator TMRCAs.

The anonymous “estimated with 3565 haplotypes” set again gives TMRCAs that are significantly lower (22% - 26% lower) than the Calculator TMRCAs.

Example 4. The TMRCAs for 968 haplotypes of haplogroup I1-M253 in the 111 marker format are shown in

As before, the Chandler mutation rate table (data for only 67 markers are available) gives TMRCAs that are only about half of the Calculator TMRCAs. The reason is explained in the preceding sections.

Source of TMRCA | 111 marker haplotype | 67 marker haplotype | 37 marker haplotype |
---|---|---|---|
Kilin-Klyosov calculator | 3584 | 3780 | 3958 |
Chandler | n.a. | 2022 | 1956 |
Ballantyne | 2230 | 2406 | 2944 |
Heinila | 2440 | 2622 | 2691 |
Anonymous “3565 haplo” | 2780 | 2868 | 2919 |

Source of TMRCA | 111 marker haplotype | 67 marker haplotype | 37 marker haplotype |
---|---|---|---|
Kilin-Klyosov calculator | 3810 | 3841 | 3576 |
Chandler | n.a. | 2055 | 1768 |
Ballantyne | 2298 | 2338 | 2462 |
Heinila | 2594 | 2665 | 2431 |
Anonymous “3565 haplo” | 2955 | 2914 | 2637 |

Source of TMRCA | 111 marker haplotype | 67 marker haplotype | 37 marker haplotype |
---|---|---|---|
Kilin-Klyosov calculator | 3686 | 3618 | 3469 |
Chandler | n.a. | 1936 | 1715 |
Ballantyne | 2201 | 2264 | 2582 |
Heinila | 2510 | 2511 | 2358 |
Anonymous “3565 haplo” | 2680 | 2746 | 2558 |

The Heinila table again gives TMRCAs that are significantly lower (23% - 32% lower) than the Calculator TMRCAs.

The anonymous “estimated with 3565 haplotypes” set again gives TMRCAs that are significantly lower (22% - 26% lower) than the Calculator TMRCAs.
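The percentage shortfalls quoted in these examples follow from a simple relative difference. The sketch below shows the arithmetic using the Calculator, Heinila, and anonymous values from one of the comparison tables of this section (the row beginning with 3584 years for the 111 marker format). Which example that table belongs to is not assumed here, and the function name is hypothetical.

```python
def percent_lower(other_tmrca, calculator_tmrca):
    """Percentage by which another table's TMRCA falls below the Calculator's."""
    return 100.0 * (calculator_tmrca - other_tmrca) / calculator_tmrca


# TMRCAs in the 111, 67, and 37 marker formats, taken from one table above.
calculator = {111: 3584, 67: 3780, 37: 3958}
heinila = {111: 2440, 67: 2622, 37: 2691}
anonymous = {111: 2780, 67: 2868, 37: 2919}

heinila_shortfall = {fmt: percent_lower(heinila[fmt], calculator[fmt])
                     for fmt in calculator}
anonymous_shortfall = {fmt: percent_lower(anonymous[fmt], calculator[fmt])
                       for fmt in calculator}

for fmt in calculator:
    print(f"{fmt} markers: Heinila {heinila_shortfall[fmt]:.1f}% lower, "
          f"anonymous {anonymous_shortfall[fmt]:.1f}% lower")
```

For this table the anonymous set falls within the 22% - 26% range quoted in the text, and the Heinila values are about 31% - 32% lower.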

Example 5. The TMRCAs for several pairs of haplotypes descended from ancient common ancestors are shown in

As before, the Chandler mutation rate table (data for only 67 markers are available) gives TMRCAs that are only about half of the Calculator TMRCAs. The reason is explained in the preceding sections.

The Heinila table again gives TMRCAs that are significantly lower (23% - 32% lower) than the Calculator TMRCAs.

The anonymous “estimated with 3565 haplotypes” set again gives TMRCAs that are significantly lower (22% - 26% lower) than the Calculator TMRCAs.

This paper presents a multifunctional automatic TMRCA Calculator for haplotypes of any number of markers within the 111 markers of the FTDNA nomenclature. A number of haplotype datasets were presented, from a few haplotypes of assorted formats (in terms of their markers) to thousands of 111 marker haplotypes, with TMRCAs ranging from a few hundred years to four million years. The Calculator showed appropriate reproducibility and cross-verification between the datasets and with respect to documented genealogy and to SNP-based and anthropological data. After examining the Calculator on hundreds of datasets over several years, the conclusion was that its capacity and reliability are unmatched. It handles multi-marker mutations and zero alleles, and it detects “foreign” haplotypes that do not belong to the dataset (by spotting excessive dispersion of alleles) and switches the respective marker off; the sensitivity of this feature can be regulated.

Attention was also paid to other mutation rate tables, exemplified by the tables of Chandler, Ballantyne et al., Heinila, and an anonymous author whose table has been mentioned more than once in the literature. It was shown that those tables commonly result in significant underestimates of the TMRCAs. The largest deviations resulted from using the Ballantyne et al. table of mutation rates, particularly when the fraction of “slow” markers is rather high. The reason is simple: Ballantyne et al. employed father-son estimates, and typically could not detect even a single mutation in “slow” markers among ~1700 father-son pairs, which puts the mutation rate below 1/1700 = 0.0006 mutations per generation. There are at least 40 such markers in the 111 marker panel. Surprisingly, Ballantyne et al. assigned “material” mutation rates to each of them, and almost all of those rates were significantly overestimated. As a result, for the R1a - R1b pair (whose common ancestor by definition belongs to haplogroup R1, which arose 28,200 ± 2300 years ago as determined using SNP data, https://www.yfull.com/tree/R1/, and for which the Calculator gave 28,000 years), the Ballantyne et al. mutation rates gave 9000 years, which is a totally unrealistic figure. The trend continues across all datasets considered in this work: for A0 (187,000 years by the Calculator) the Ballantyne et al. data gave 21,000 years; for A00 (217,000 years by the Calculator, and between 209,000 and 235,000 years by the genomic data) they gave 41,000 years; and for chimpanzee (4.29 million years by the Calculator) they gave 604,000 years, which is, of course, absurd.

Source of TMRCA | R1a - R1b | R1a - A0 | A0 - A00 | Human-chimp |
---|---|---|---|---|
Kilin-Klyosov calculator | 28,000 | 187,000 | 217,000 | 4,290,000 |
Chandler | 34,000 | 82,000 | 121,000 | 1,930,000 |
Ballantyne | 9000 | 21,000 | 41,000 | 604,000 |
Heinila | 26,000 | 116,000 | 130,000 | 1,808,000 |
Anonymous “3565 haplotypes” | 34,000 | 61,000 | 163,000 | 2,355,000 |
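The detection limit of the father-son approach can be illustrated numerically. The sketch below is an assumption-laden illustration rather than a reconstruction of the Ballantyne et al. analysis: it treats the ~1700 father-son transmissions as independent, takes 1700 as an exact count, and borrows the rate of 0.00009 per generation quoted in this paper for the slow marker DYS426. The function names are hypothetical.

```python
def detection_floor(n_pairs):
    """Rate that corresponds to exactly one observed mutation in n_pairs."""
    return 1.0 / n_pairs


def prob_no_mutation(rate, n_pairs):
    """Probability of seeing zero mutations in n_pairs independent transmissions."""
    return (1.0 - rate) ** n_pairs


N_PAIRS = 1700        # approximate number of father-son pairs (from the text)
SLOW_RATE = 0.00009   # DYS426 rate per generation, as quoted in this paper

floor = detection_floor(N_PAIRS)
p_zero = prob_no_mutation(SLOW_RATE, N_PAIRS)

print(f"smallest rate resolvable from {N_PAIRS} pairs: {floor:.4f} per generation")
print(f"chance of observing no mutation in a marker with rate {SLOW_RATE}: {p_zero:.2f}")
```

Under these assumptions such a slow marker would most often show no mutation at all in a sample of this size, so father-son data can only bound its rate from above and cannot measure it.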

The other three tables of mutation rates turned out to suffer from overestimated mutation rates, for slow markers in particular. In general terms, the reason is clear. Slow markers are sensitive to inherited mutations, which appear in a dataset in multiple copies that are in fact not random but inherited. Consider, as an example, a dataset of 100 haplotypes in which DYS426 (a slow marker, which mutates on average once in 1/0.00009 = 11,111 generations) mutated randomly just once (statistically this is quite possible, particularly if the TMRCA for the dataset equals 111 conditional generations, that is, about 2800 years). The dataset then contains just one mutated DYS426, provided it does not include several members of the same lineage. If, however, the dataset includes 10 members of this particular lineage, each with the same mutation in DYS426, then the dataset contains 10 haplotypes with a mutated DYS426, and the mutation rate for DYS426 would be taken as 10 times higher. To avoid such mistakes, a haplotype tree should be considered for each analyzed dataset; in the tree, such a repeated allele, particularly in a slow marker, would form a separate branch. Apparently, this approach was not applied in compiling the Chandler, Heinila, and other mutation rate tables, and many slow markers received overestimated values. The Heinila table and the anonymous table showed better results than the Chandler table; however, they also deviated quite significantly for some datasets.
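The DYS426 example can be checked with a few lines of arithmetic. The sketch below uses only the figures given in the text (rate 0.00009 per generation, 100 haplotypes, 111 conditional generations, 10 members of one lineage); the function names are hypothetical, and the apparent rate is computed naively by counting every mutated haplotype as an independent mutation, which is exactly the mistake described.

```python
def expected_mutations(rate, haplotypes, generations):
    """Expected number of independent mutations of one marker in the dataset."""
    return rate * haplotypes * generations


def apparent_rate(mutated_haplotypes, haplotypes, generations):
    """Naive rate estimate that counts each mutated haplotype as a new mutation."""
    return mutated_haplotypes / (haplotypes * generations)


RATE_DYS426 = 0.00009   # per conditional generation (from the text)
HAPLOTYPES = 100
GENERATIONS = 111       # about 2800 years at 25 years per generation

expected = expected_mutations(RATE_DYS426, HAPLOTYPES, GENERATIONS)
rate_one_event = apparent_rate(1, HAPLOTYPES, GENERATIONS)
rate_ten_copies = apparent_rate(10, HAPLOTYPES, GENERATIONS)
inflation = rate_ten_copies / rate_one_event

print(f"expected independent mutations: {expected:.3f}")
print(f"rate from one mutated haplotype: {rate_one_event:.5f}")
print(f"rate from ten inherited copies:  {rate_ten_copies:.5f}")
print(f"inflation factor: {inflation:.0f}")
```

One random mutation is indeed the expected count for these figures, while ten inherited copies of it raise the apparent rate tenfold unless the lineage is recognized as a single branch on the haplotype tree.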

Haplotype trees were composed using PHYLIP, the Phylogeny Inference Package program (Felsenstein, 2004); for multiple examples of usage and calculations, see Klyosov (2009a, 2009b, 2009c, 2012), Klyosov and Rozhanskii (2012a, 2012b), and references therein.

The authors express the hope that the long debate over which mutation rates to employ, how to count mutations, and how to calculate the TMRCA in various cases, including complicated ones, is finally over.

The authors are indebted to Susan M. Hedeen for her valuable help in examining the Calculator and with the preparation of the manuscript.