Molecular clock
The molecular clock is a figurative term for a technique that uses the mutation rate of biomolecules to deduce the time in prehistory when two or more life forms diverged. The biomolecular data used for such calculations are usually nucleotide sequences for DNA, RNA, or amino acid sequences for proteins. The benchmarks for determining the mutation rate are often fossil or archaeological dates. The molecular clock was first tested in 1962 on the hemoglobin protein variants of various animals, and is commonly used in molecular evolution to estimate times of speciation or radiation. It is sometimes called a gene clock or an evolutionary clock.
Early discovery and genetic equidistance
The notion of the existence of a so-called "molecular clock" was first attributed to Émile Zuckerkandl and Linus Pauling who, in 1962, noticed that the number of amino acid differences in hemoglobin between different lineages changes roughly linearly with time, as estimated from fossil evidence.[1] They generalized this observation to assert that the rate of evolutionary change of any specified protein was approximately constant over time and over different lineages (known as the molecular clock hypothesis).
The genetic equidistance phenomenon was first noted in 1963 by Emanuel Margoliash, who wrote: "It appears that the number of residue differences between cytochrome c of any two species is mostly conditioned by the time elapsed since the lines of evolution leading to these two species originally diverged. If this is correct, the cytochrome c of all mammals should be equally different from the cytochrome c of all birds. Since fish diverges from the main stem of vertebrate evolution earlier than either birds or mammals, the cytochrome c of both mammals and birds should be equally different from the cytochrome c of fish. Similarly, all vertebrate cytochrome c should be equally different from the yeast protein."[2] For example, the difference between the cytochrome c of a carp and that of a frog, turtle, chicken, rabbit, or horse is a nearly constant 13% to 14%. Similarly, the difference between the cytochrome c of a bacterium and that of yeast, wheat, moth, tuna, pigeon, or horse ranges from 64% to 69%. Together with the work of Émile Zuckerkandl and Linus Pauling, the genetic equidistance result directly led to the formal postulation of the molecular clock hypothesis in the early 1960s.[3]
Similarly, Vincent Sarich and Allan Wilson in 1967 demonstrated that molecular differences among modern Primates in albumin proteins showed that approximately constant rates of change had occurred in all the lineages they assessed.[4] The basic logic of their analysis involved recognizing that if one species lineage had evolved more quickly than a sister species lineage since their common ancestor, then the molecular differences between an outgroup (more distantly related) species and the faster-evolving species should be larger (since more molecular changes would have accumulated on that lineage) than the molecular differences between the outgroup species and the slower-evolving species. This method is known as the relative rate test. Sarich and Wilson's paper reported, for example, that human (Homo sapiens) and chimpanzee (Pan troglodytes) albumin immunological cross-reactions suggested they were about equally different from Ceboidea (New World Monkey) species (within experimental error). This meant that they had both accumulated approximately equal changes in albumin since their shared common ancestor. This pattern was also found for all the primate comparisons they tested. When calibrated with the few well-documented fossil branch points (such as no Primate fossils of modern aspect found before the K-T boundary), this led Sarich and Wilson to argue that the human-chimp divergence probably occurred only ~4–6 million years ago.[5]
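The logic of the relative rate test can be sketched in a few lines of Python. This is a minimal illustration and not Sarich and Wilson's actual procedure: the function names, the tolerance parameter, and the distance values in the usage note are hypothetical and are not taken from their paper.

```python
def relative_rate_difference(d_outgroup_a: float, d_outgroup_b: float) -> float:
    """Return d(outgroup, A) - d(outgroup, B) for two sister lineages A and B.

    A and B have had the same amount of time to change since their common
    ancestor, so under a clock this difference should be close to zero.
    A positive value suggests that lineage A has changed faster than B.
    """
    return d_outgroup_a - d_outgroup_b


def is_clocklike(d_outgroup_a: float, d_outgroup_b: float, tolerance: float) -> bool:
    """True if the two distances agree within the stated (assumed) error."""
    return abs(relative_rate_difference(d_outgroup_a, d_outgroup_b)) <= tolerance
```

With made-up distances, `is_clocklike(0.52, 0.50, tolerance=0.05)` is true, whereas `is_clocklike(0.70, 0.50, tolerance=0.05)` is false and would point to a faster rate on lineage A.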
Relationship with neutral theory
The observation of a clock-like rate of molecular change was originally purely phenomenological. Later, the work of Motoo Kimura[6] developed the neutral theory of molecular evolution, which predicted a molecular clock. Let there be N individuals, and to keep this calculation simple, let the individuals be haploid (i.e. have one copy of each gene). Let the rate of neutral mutations (i.e. mutations with no effect on fitness) in a new individual be μ. The probability that a new mutation will become fixed in the population is then 1/N, since each copy of the gene is as good as any other. Every generation, each individual can have new mutations, so there are μN new neutral mutations in the population as a whole. That means that each generation, μN × 1/N = μ new neutral mutations will become fixed. If most changes seen during molecular evolution are neutral, then fixations in a population will accumulate at a clock-rate that is equal to the rate of neutral mutations in an individual.
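The key step in this argument, that a single new neutral mutation in a haploid population of N individuals becomes fixed with probability 1/N, can be checked by simulation. The sketch below assumes a Wright–Fisher model (non-overlapping generations, each offspring copying a randomly chosen parent), which the text above does not specify; the function name and parameters are illustrative.

```python
import random


def fixation_fraction(n_individuals: int, trials: int, seed: int = 0) -> float:
    """Estimate the probability that one new neutral mutant copy is fixed."""
    rng = random.Random(seed)
    fixed = 0
    for _ in range(trials):
        count = 1  # a single new neutral mutation enters the population
        # Follow the mutation until it is lost (0 copies) or fixed (N copies).
        while 0 < count < n_individuals:
            freq = count / n_individuals
            # Each of the N offspring inherits the mutation with probability freq.
            count = sum(rng.random() < freq for _ in range(n_individuals))
        if count == n_individuals:
            fixed += 1
    return fixed / trials
```

For example, `fixation_fraction(20, 10000)` should return a value close to 1/20. If the per-individual neutral mutation rate is written μ, multiplying this probability by the μN new mutations arising each generation gives a fixation rate of μ, independent of population size.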
Calibration
The molecular clock alone can only say that one time period is twice as long as another: it cannot assign concrete dates. For viral phylogenetics and ancient DNA studies—two areas of evolutionary biology where it is possible to sample sequences over an evolutionary timescale—the dates of the intermediate samples can be used to more precisely calibrate the molecular clock. However, most phylogenies require that the molecular clock be calibrated against independent evidence about dates, such as the fossil record.[7] There are two general methods for calibrating the molecular clock using fossil data: node calibration and tip calibration.[8]
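The arithmetic behind a simple calibration under a strict clock can be sketched as follows. This is an illustration only: it assumes that genetic distance accumulates along both diverging lineages (hence the factor of two), the function names are hypothetical, and the numbers in the usage note are invented rather than taken from any study.

```python
def substitution_rate(distance: float, calibration_age_myr: float) -> float:
    """Per-lineage rate implied by a pair of taxa with a known divergence date.

    distance is the genetic distance between the two taxa (substitutions per
    site); calibration_age_myr is their divergence time from independent
    evidence such as a fossil, in millions of years.
    """
    if calibration_age_myr <= 0:
        raise ValueError("calibration age must be positive")
    return distance / (2.0 * calibration_age_myr)


def divergence_time(distance: float, rate: float) -> float:
    """Divergence time (Myr) for another pair of taxa, assuming the same rate."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return distance / (2.0 * rate)
```

For instance, a calibrated pair that differs by 0.10 substitutions per site and diverged 50 Myr ago implies a rate of about 0.001 per site per Myr per lineage; a second pair differing by 0.04 would then be dated to about 20 Myr. Without the calibration, only the ratio of the two time periods (2.5 to 1) could be stated.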
Node calibration
Sometimes referred to as node dating, node calibration is a method for phylogeny calibration that is done by placing fossil constraints at nodes. A node calibration fossil is the oldest discovered representative of that clade, which is used to constrain its minimum age. Due to the fragmentary nature of the fossil record, the true most recent common ancestor of a clade will likely never be found.[8] In order to account for this in node calibration analyses, a maximum clade age must be estimated. Determining the maximum clade age is challenging because it relies on negative evidence: the absence of older fossils in that clade. There are a number of methods for deriving the maximum clade age using birth-death models, fossil stratigraphic distribution analyses, or taphonomic controls.[9] Alternatively, instead of a maximum and a minimum, a prior probability of the divergence time can be established and used to calibrate the clock. Several prior probability distributions (including normal, lognormal, exponential, gamma, and uniform) can be used to express the probability of the true age of divergence relative to the age of the fossil;[10] however, there are very few methods for estimating the shape and parameters of the probability distribution empirically.[11] The placement of calibration nodes on the tree informs the placement of the unconstrained nodes, giving divergence date estimates across the phylogeny. Historical methods of clock calibration could only make use of a single fossil constraint (non-parametric rate smoothing),[12] while modern analyses (BEAST[13] and r8s[14]) allow for the use of multiple fossils to calibrate the molecular clock. Simulation studies have shown that increasing the number of fossil constraints increases the accuracy of divergence time estimation.[15]
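As an illustration of a calibration prior, the sketch below evaluates a lognormal density that is offset by the age of the oldest fossil, so that node ages younger than the fossil receive zero probability. The offset parameterization, the function name, and the parameter names are assumptions made for this example; they do not reproduce the interface of BEAST, r8s, or any other program.

```python
import math


def offset_lognormal_pdf(node_age: float, fossil_min_age: float,
                         mu: float, sigma: float) -> float:
    """Prior density of a node age, where (node_age - fossil_min_age) is
    lognormally distributed with log-scale mean mu and log-scale s.d. sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    excess = node_age - fossil_min_age
    if excess <= 0:
        return 0.0  # the clade cannot be younger than its oldest fossil
    z = (math.log(excess) - mu) / sigma
    return math.exp(-0.5 * z * z) / (excess * sigma * math.sqrt(2.0 * math.pi))
```

With a hypothetical fossil of 66 Myr, `offset_lognormal_pdf(60.0, 66.0, 1.0, 0.5)` is zero, while ages somewhat older than the fossil receive positive density that tapers off for much older ages. Choosing `mu` and `sigma` remains the hard part, as noted above.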
Tip calibration
Sometimes referred to as tip dating, tip calibration is a method of molecular clock calibration in which fossils are treated as taxa and placed on the tips of the tree. This is achieved by creating a matrix that includes a molecular dataset for the extant taxa along with a morphological dataset for both the extinct and the extant taxa.[9] Unlike node calibration, this method reconstructs the tree topology and places the fossils simultaneously. Molecular and morphological models work together simultaneously, allowing morphology to inform the placement of fossils.[8] Tip calibration makes use of all relevant fossil taxa during clock calibration, rather than relying on only the oldest fossil of each clade. This method does not rely on the interpretation of negative evidence to infer maximum clade ages.[9]
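The combined matrix described above can be pictured as one row per taxon, with molecular characters missing for fossils. The following sketch shows one possible layout; the class, field names, and use of "?" for missing data are illustrative assumptions and do not correspond to any particular file format or program.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Taxon:
    name: str
    age_myr: float                   # 0.0 for extant taxa, fossil age otherwise
    morphology: str                  # coded character states, "?" if unknown
    sequence: Optional[str] = None   # molecular data; None for fossil taxa


def combined_row(taxon: Taxon, sequence_length: int) -> str:
    """Concatenate the molecular and morphological partitions for one taxon,
    filling the molecular partition with "?" when no sequence exists."""
    molecular = taxon.sequence if taxon.sequence is not None else "?" * sequence_length
    if len(molecular) != sequence_length:
        raise ValueError(f"{taxon.name}: unexpected sequence length")
    return molecular + taxon.morphology
```

A hypothetical extant taxon `Taxon("extant_A", 0.0, "0110", "ACGT")` yields the row `ACGT0110`, while a fossil `Taxon("fossil_B", 30.0, "01?1")` yields `????01?1`: the fossil contributes its morphology and its age, but no molecular characters.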
Total evidence dating
This approach to tip calibration goes a step further by simultaneously estimating fossil placement, topology, and the evolutionary timescale. In this method, the age of a fossil can inform its phylogenetic position in addition to morphology. By allowing all aspects of tree reconstruction to occur simultaneously, the risk of biased results is decreased.[8] This approach has been improved upon by pairing it with different models. One current method of molecular clock calibration is total evidence dating paired with the fossilized birth-death (FBD) model and a model of morphological evolution.[16] The FBD model is novel in that it allows for “sampled ancestors,” which are fossil taxa that are the direct ancestor of a living taxon or lineage. This allows fossils to be placed on a branch above an extant organism, rather than being confined to the tips.[17]
Methods
Bayesian methods can provide more appropriate estimates of divergence times, especially if large datasets—such as those yielded by phylogenomics—are employed.[18]
Non-constant rate of molecular clock
Sometimes only a single divergence date can be estimated from fossils, with all other dates inferred from that. Other sets of species have abundant fossils available, allowing the hypothesis of constant divergence rates to be tested. DNA sequences experiencing low levels of negative selection showed divergence rates of 0.7–0.8% per Myr in bacteria, mammals, invertebrates, and plants.[19] In the same study, genomic regions experiencing very high negative or purifying selection (encoding rRNA) were considerably slower (1% per 50 Myr).
In addition to such variation in rate with genomic position, since the early 1990s variation among taxa has proven fertile ground for research too,[20] even over comparatively short periods of evolutionary time (for example mockingbirds[21]). Tube-nosed seabirds have molecular clocks that on average run at half the speed of those of many other birds,[22] possibly due to long generation times, and many turtles have a molecular clock running at one-eighth the speed it does in small mammals, or even slower.[23] Effects of small population size are also likely to confound molecular clock analyses. Researchers such as Francisco J. Ayala have more fundamentally challenged the molecular clock hypothesis.[24][25][26] According to Ayala's 1999 study, five factors combine to limit the application of molecular clock models:
- Changing generation times (If the rate of new mutations depends at least partly on the number of generations rather than the number of years)
- Population size (Genetic drift is stronger in small populations, and so more mutations are effectively neutral)
- Species-specific differences (due to differing metabolism, ecology, evolutionary history, ...)
- Change in function of the protein studied (can be avoided in closely related species by utilizing non-coding DNA sequences or emphasizing silent mutations)
- Changes in the intensity of natural selection.
Molecular clock users have developed workaround solutions using a number of statistical approaches, including maximum likelihood techniques and, later, Bayesian modeling. In particular, models that take into account rate variation across lineages have been proposed in order to obtain better estimates of divergence times. These models are called relaxed molecular clocks[27] because they represent an intermediate position between the 'strict' molecular clock hypothesis and Joseph Felsenstein's many-rates model.[28] They are made possible through MCMC techniques that explore a weighted range of tree topologies and simultaneously estimate parameters of the chosen substitution model. It must be remembered that divergence dates inferred using a molecular clock are based on statistical inference and not on direct evidence.
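The idea of lineage-specific rates can be illustrated by drawing an independent rate for each branch from a shared distribution. The sketch below assumes a lognormal distribution of rates, which is one common choice but is not specified in the text above; it only simulates branch rates and does not perform the MCMC inference used in practice. All names and parameter values are hypothetical.

```python
import math
import random
from typing import List


def draw_branch_rates(n_branches: int, mean_rate: float, sigma: float,
                      seed: int = 0) -> List[float]:
    """Draw one rate per branch from a lognormal with the given mean.

    sigma (> 0) is the s.d. on the log scale; smaller values approach a
    strict clock, larger values allow branches to differ widely.
    """
    rng = random.Random(seed)
    # Shift the log-scale mean so that the expected rate equals mean_rate.
    mu = math.log(mean_rate) - 0.5 * sigma ** 2
    return [rng.lognormvariate(mu, sigma) for _ in range(n_branches)]


def branch_lengths(durations_myr: List[float], rates: List[float]) -> List[float]:
    """Expected substitutions per site on each branch: duration times rate."""
    return [t * r for t, r in zip(durations_myr, rates)]
```

Under a strict clock every branch would share `mean_rate`, so branch lengths would be proportional to time; here two branches of equal duration can have different lengths, which is the variation a relaxed clock model has to estimate.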
The molecular clock runs into particular challenges at very short and very long timescales. At long timescales, the problem is saturation. When enough time has passed, many sites have undergone more than one change, but it is impossible to detect more than one. This means that the observed number of changes is no longer linear with time, but instead flattens out. Even at intermediate genetic distances, with phylogenetic data still sufficient to estimate topology, signal for the overall scale of the tree can be weak under complex likelihood models, leading to highly uncertain molecular clock estimates.[29]
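Saturation can be made concrete with a simple substitution model. The sketch below uses the Jukes–Cantor model for nucleotide sequences, chosen here only for illustration and not mentioned in the text above: it assumes that all substitutions are equally likely, so the observed proportion of differing sites approaches 3/4 however much time passes.

```python
import math


def jc69_expected_difference(distance: float) -> float:
    """Expected proportion of differing sites after `distance` actual
    substitutions per site, under the Jukes-Cantor model."""
    return 0.75 * (1.0 - math.exp(-4.0 * distance / 3.0))


def jc69_distance(p_observed: float) -> float:
    """Estimate the actual substitutions per site from the observed
    proportion of differing sites (undefined at or above 0.75)."""
    if not 0.0 <= p_observed < 0.75:
        raise ValueError("observed difference must be in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p_observed / 3.0)
```

At small distances the observed difference is almost equal to the actual number of substitutions per site, but at large distances it flattens toward 0.75, and small measurement errors in the observed proportion translate into very large errors in the corrected distance.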
At very short time scales, many differences between samples do not represent fixation of different sequences in the different populations. Instead, they represent alternative alleles that were both present as part of a polymorphism in the common ancestor. The inclusion of differences that have not yet become fixed leads to a potentially dramatic inflation of the apparent rate of the molecular clock at very short timescales.[30][31]
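One simple way to see the size of this effect is to add the average age of the ancestral polymorphism to the time since the populations split. The sketch below assumes strictly neutral mutations and a known average coalescence time in the ancestral population; it is a simplified illustration and not the model used in the cited studies. The function and parameter names are hypothetical.

```python
def apparent_rate(true_rate: float, split_time: float,
                  ancestral_coalescence_time: float) -> float:
    """Rate that would be inferred if all observed differences were
    attributed to the time since the populations split.

    Two sampled sequences are separated by the split time plus the time
    their lineages took to coalesce in the ancestral population, and
    differences accumulate along both lineages (factor of two).
    """
    if split_time <= 0:
        raise ValueError("split time must be positive")
    expected_distance = 2.0 * true_rate * (split_time + ancestral_coalescence_time)
    return expected_distance / (2.0 * split_time)
```

When the split time equals the ancestral coalescence time, the apparent rate is twice the true rate; when the split is a thousand times older, the inflation is only about 0.1%.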
Uses
The molecular clock technique is an important tool in molecular systematics, the use of molecular genetics information to determine the correct scientific classification of organisms or to study variation in selective forces. Knowledge of an approximately constant rate of molecular evolution in particular sets of lineages also facilitates establishing the dates of phylogenetic events, including those not documented by fossils, such as the divergence of living taxa and the formation of the phylogenetic tree. In these cases, especially over long stretches of time, the limitations of the molecular clock hypothesis (above) must be considered; such estimates may be off by 50% or more.
References
- Zuckerkandl, E. and Pauling, L.B. (1962). "Molecular disease, evolution, and genic heterogeneity". In Kasha, M.; Pullman, B (eds.). Horizons in Biochemistry. Academic Press, New York. pp. 189–225.
- Margoliash E (October 1963). "Primary Structure and Evolution of Cytochrome C". Proc. Natl. Acad. Sci. U.S.A. 50 (4): 672–9. Bibcode:1963PNAS...50..672M. doi:10.1073/pnas.50.4.672. PMC 221244. PMID 14077496.
- Kumar S (August 2005). "Molecular clocks: four decades of evolution". Nat. Rev. Genet. 6 (8): 654–62. doi:10.1038/nrg1659. PMID 16136655. S2CID 14261833.
- Sarich, V M; Wilson, A C (July 1967). "Rates of albumin evolution in primates". Proceedings of the National Academy of Sciences of the United States of America. 58 (1): 142–148. Bibcode:1967PNAS...58..142S. doi:10.1073/pnas.58.1.142. ISSN 0027-8424. PMC 335609. PMID 4962458.
- Sarich, Vincent M.; Wilson, Allan C. (1967). "Immunological Time Scale for Hominid Evolution". Science. 158 (3805): 1200–1203. Bibcode:1967Sci...158.1200S. doi:10.1126/science.158.3805.1200. JSTOR 1722843. PMID 4964406. S2CID 7349579.
- Kimura, Motoo (1968). "Evolutionary rate at the molecular level". Nature. 217 (5129): 624–626. Bibcode:1968Natur.217..624K. doi:10.1038/217624a0. PMID 5637732. S2CID 4161261.
- Benton, M. J. & Donoghue, P. C. J. (2007). "Paleontological evidence to date the Tree of Life". Molecular Biology & Evolution. 24 (1): 26–53. doi:10.1093/molbev/msl150. PMID 17047029.
- Donoghue, P.C.J. & Ziheng, Y. (2016). "The evolution of methods for establishing evolutionary timescales". Phil. Trans. R. Soc. B. 371 (1): 20160020. doi:10.1098/rstb.2016.0020. PMC 4920342. PMID 27325838.
- O'Reilly, J. E. & Mario D. R. (2015). "Dating Tips for Divergence-Time Estimation" (PDF). Trends in Genetics. 31 (11): 637–650. doi:10.1016/j.tig.2015.08.001. hdl:1983/ba7bbcf4-1d51-4b74-a800-9948edb3bbe6. PMID 26439502.
- Drummond A, Suchard MA, Xie D, Rambaut A (2012). "Bayesian phylogenetics with BEAUti and the BEAST 1.7". Molecular Biology and Evolution. 29 (8): 1969–1973. doi:10.1093/molbev/mss075. PMC 3408070. PMID 22367748.
- Claramunt, S.; Cracraft, J. (2015). "A new time tree reveals Earth history's imprint on the evolution of modern birds". Sci Adv. 1 (11): e1501005. Bibcode:2015SciA....1E1005C. doi:10.1126/sciadv.1501005. PMC 4730849. PMID 26824065.
- Sanderson, M. (1997). "A nonparametric approach to estimating divergence times in the absence of rate constancy" (PDF). Molecular Biology and Evolution. 14 (12): 1218–1231. doi:10.1093/oxfordjournals.molbev.a025731. S2CID 17647010.
- Drummond A, Suchard MA, Xie D, Rambaut A (2012). "Bayesian phylogenetics with BEAUti and the BEAST 1.7". Molecular Biology and Evolution. 29 (8): 1969–1973. doi:10.1093/molbev/mss075. PMC 3408070. PMID 22367748.
- Sanderson, M. (2003). "r8s: inferring absolute rates of molecular evolution and divergence times in the absence of a molecular clock" (PDF). Bioinformatics. 19 (2): 301–302. doi:10.1093/bioinformatics/19.2.301. PMID 12538260.
- Zheng Y. & Wiens J. J. (2015). "Do missing data influence the accuracy of divergence-time estimation with BEAST?". Molecular Phylogenetics and Evolution. 85 (1): 41–49. doi:10.1016/j.ympev.2015.02.002. PMID 25681677.
- Heath, T. A. & Huelsenbeck, J. P. (2014). "The fossilized birth–death process for coherent calibration of divergence-time estimates". PNAS. 111 (29): E2957–E2966. arXiv:1310.2968. Bibcode:2014PNAS..111E2957H. doi:10.1073/pnas.1319091111. PMC 4115571. PMID 25009181.
- Gavryushkina, A.; Heath, T. A.; Ksepka, D. T.; Stadler, T.; Welch, D. & Drummond, A. J. (2016). "Bayesian Total-Evidence Dating Reveals the Recent Crown Radiation of Penguins". Systematic Biology. 66 (1): 1–17. arXiv:1506.04797. doi:10.1093/sysbio/syw060. PMC 5410945. PMID 28173531.
- Dos Reis, M.; Inoue, J.; Hasegawa, M.; Asher, R. J.; Donoghue, P. C. J.; Yang, Z. (2012). "Phylogenomic datasets provide both precision and accuracy in estimating the timescale of placental mammal phylogeny". Proceedings of the Royal Society B: Biological Sciences. 279 (1742): 3491–3500. doi:10.1098/rspb.2012.0683. PMC 3396900. PMID 22628470.
- Ochman H, Wilson AC (1987). "Evolution in bacteria: evidence for a universal substitution rate in cellular genomes". J Mol Evol. 26 (1–2): 74–86. Bibcode:1987JMolE..26...74O. doi:10.1007/BF02111283. PMID 3125340. S2CID 8260277.
- Douzery, E.J.P., Delsuc, F., Stanhope, M.J. and Huchon, D. (2003). "Local molecular clocks in three nuclear genes: divergence times for rodents and other mammals, and incompatibility among fossil calibrations" (PDF). Journal of Molecular Evolution. 57: S201–S213. Bibcode:2003JMolE..57S.201D. CiteSeerX 10.1.1.535.897. doi:10.1007/s00239-003-0028-x. PMID 15008417. S2CID 23887665.
- Hunt, J.S., Bermingham, E., and Ricklefs, R.E. (2001). "Molecular systematics and biogeography of Antillean thrashers, tremblers, and mockingbirds (Aves: Mimidae)". Auk. 118 (1): 35–55. doi:10.1642/0004-8038(2001)118[0035:MSABOA]2.0.CO;2. ISSN 0004-8038.
- Rheindt, F. E. & Austin, J. (2005). "Major analytical and conceptual shortcomings in a recent taxonomic revision of the Procellariiformes – A reply to Penhallurick and Wink (2004)" (PDF). Emu. 105 (2): 181–186. doi:10.1071/MU04039. S2CID 20390465.
- Avise, J.C., Bowen, W., Lamb, T., Meylan, A.B. and Bermingham, E. (1 May 1992). "Mitochondrial DNA Evolution at a Turtle's Pace: Evidence for Low Genetic Variability and Reduced Microevolutionary Rate in the Testudines". Molecular Biology and Evolution. 9 (3): 457–473. doi:10.1093/oxfordjournals.molbev.a040735. PMID 1584014.
- Ayala, F.J. (1999). "Molecular clock mirages". BioEssays. 21 (1): 71–75. doi:10.1002/(SICI)1521-1878(199901)21:1<71::AID-BIES9>3.0.CO;2-B. PMID 10070256. Archived from the original on 16 December 2012.
- Schwartz, J. H. & Maresca, B. (2006). "Do Molecular Clocks Run at All? A Critique of Molecular Systematics". Biological Theory. 1 (4): 357–371. CiteSeerX 10.1.1.534.4502. doi:10.1162/biot.2006.1.4.357. S2CID 28166727.
- Pascual-García, A.; Arenas, M. & Bastolla, U. (2019). "The molecular clock in the evolution of protein structures". Systematic Biology. 68 (6): 987–1002. doi:10.1093/sysbio/syz022. PMID 31111152.
- Drummond, A.J., Ho, S.Y.W., Phillips, M.J. and Rambaut A. (2006). "Relaxed Phylogenetics and Dating with Confidence". PLoS Biology. 4 (5): e88. doi:10.1371/journal.pbio.0040088. PMC 1395354. PMID 16683862.
- Felsenstein, J (2001). "Taking variation of evolutionary rates between sites into account in inferring phylogenies". J Mol Evol. 53 (4–5): 447–55. Bibcode:2001JMolE..53..447F. doi:10.1007/s002390010234. PMID 11675604. S2CID 9791493.
- Marshall, D. C.; et al. (2016). "Inflation of molecular clock rates and dates: molecular phylogenetics, biogeography, and diversification of a global cicada radiation from Australasia (Hemiptera: Cicadidae: Cicadettini)". Systematic Biology. 65 (1): 16–34.
- Ho SY, Phillips MJ, Cooper A, Drummond AJ (2005). "Time dependency of molecular rate estimates and systematic overestimation of recent divergence times". Molecular Biology & Evolution. 22 (7): 1561–1568. doi:10.1093/molbev/msi145. PMID 15814826.
- Peterson GI, Masel J (2009). "Quantitative Prediction of Molecular Clock and Ka/Ks at Short Timescales". Molecular Biology & Evolution. 26 (11): 2595–2603. doi:10.1093/molbev/msp175. PMC 2912466. PMID 19661199.
Further reading
- Morgan, G.J. (1998). "Emile Zuckerkandl, Linus Pauling, and the Molecular Evolutionary Clock, 1959–1965". Journal of the History of Biology. 31 (2): 155–178. doi:10.1023/A:1004394418084. PMID 11620303. S2CID 5660841.
- Zuckerkandl, E.; Pauling, L.B. (1965). "Evolutionary divergence and convergence in proteins". In Bryson, V.; Vogel, H.J. (eds.). Evolving Genes and Proteins. Academic Press, New York. pp. 97–166.
- San Mauro, D.; Agorreta, A. (2010). "Molecular systematics: a synthesis of the common methods and the state of knowledge". Cellular & Molecular Biology Letters. 15 (2): 311–341. doi:10.2478/s11658-010-0010-8. PMC 6275913. PMID 20213503.