Cladograms and stratigraphy


Until recently, the only answer to the question ‘How complete is the fossil record?’ was a qualitative assertion that it was either wonderful, terrible, or all right. In the absence of a time machine, the only way to assess completeness was in a relative way. However, now that new semi-objective tree-making techniques are available to deal with both morphological and molecular data, the shapes of these trees can be compared with stratigraphic data.

  • Comparisons of tree shape vs. stratigraphy offer a variety of methods of assessing the quality of the fossil record and the congruence of competing trees with the known fossil record, on the basis of statistical analyses of large samples of trees.
  • Comparative metrics include methods to assess the match of node order and order of first appearance in the fossil record (Spearman Rank Correlation, SRC), the relative amount of cladistically implied gap to known record (Relative Completeness Index, RCI), the stratigraphic consistency of dating of nodes with nodes lower in the cladogram (Stratigraphic Consistency Index, SCI), and a measure of the relative amount of gap to the minimum possible (Gap Excess Ratio, GER).
  • The SRC technique was introduced for this purpose by Norell and Novacek (1992a), the RCI by Benton (1994), the SCI by Huelsenbeck (1994), and the GER by Wills (1999).
  • Further information can be found in these papers, and in Benton (1994, 1995), Benton and Hitchin (1996), Benton and Simms (1995), Benton and Storrs (1996), Benton et al. (1999, 2000), Hitchin and Benton (1997a, b), Norell and Novacek (1992b), Siddall (1996, 1997), and Smith (1994). Several more recent papers (Wagner 2000; Wagner and Sidor 2000; Angielczyk 2002; Finarelli and Clyde 2002; Fara and Langer 2004; Pol et al. 2004; Angielczyk and Fox 2006) present valuable critiques of the different metrics, highlighting how some are affected by tree size (number of terminals), tree balance, the relative time span involved, and other factors. To some extent, the resampling error bars offered in the Ghosts package take account of such issues.
  • The database presented here has been published in part in Benton and Hitchin (1996), but a further 700 trees have been added since then.
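The verbal definitions of RCI, SCI, and GER in the list above can be written compactly as follows. This is a summary in the notation commonly used in the cited papers, not a quotation from them. The symbols are introduced here for illustration: MIG is the minimum implied gap (the sum of all ghost ranges implied by the cladogram), SRL is the simple range length (the sum of all observed stratigraphic ranges), G_min is the smallest possible total gap for the given first appearances (the difference between the oldest and youngest first appearance), and G_max is the largest possible total gap (the sum of the differences between the oldest first appearance and every other first appearance).

```latex
\mathrm{RCI} = \left(1 - \frac{\mathrm{MIG}}{\mathrm{SRL}}\right) \times 100\%

\mathrm{SCI} = \frac{\text{number of stratigraphically consistent nodes}}
                    {\text{number of internal nodes, excluding the root}}

\mathrm{GER} = 1 - \frac{\mathrm{MIG} - G_{\min}}{G_{\max} - G_{\min}}
```

In the SCI, a node counts as consistent when its oldest first appearance is the same age as, or younger than, that of its sister taxon or sister node. The GER runs from 0 (the worst possible fit for the given ages) to 1 (the best possible fit), whereas the RCI can be negative when the implied gaps exceed the known ranges.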

Testing cladograms with stratigraphy

Phylogenetic trees, whether based on molecular techniques or on cladistic analysis of morphological characters, are essentially independent of stratigraphy. Therefore, it is reasonable to compare the results of stratigraphy and phylogeny: do they agree or not? If they do not agree (lack congruence), then either the fossil record, or the phylogeny, or both are wrong. If the results are broadly congruent, then they are both probably telling the same story, and it can be assumed that that story is the true story of the history of life.

In assessing congruence, a claim is not made for the primacy of tree over stratigraphy, or vice versa. Indeed, there are many uncertainties involved in the construction of any tree, and in the recording of any stratigraphic sequence. It is the question of congruence that is important. So, stratigraphy can be assessed for congruence with trees, and trees can be assessed for congruence with stratigraphy.

One application of these approaches has been to use stratigraphy to test cladograms. In particular, stratocladistics (Fisher, 1992) is a method whereby stratigraphic information is actually incorporated into the tree-finding methods. Stratigraphic data are converted into ‘character’ data, and then combined with apomorphy coding. We do not pursue this technique since we feel that it obscures the real information content of the characters on the one hand, and the stratigraphy on the other. It is impossible to say what the resulting trees mean: they have lost character-based parsimony, and they have also diluted the stratigraphic signal. The technique has been criticised for these reasons, and others:

  • Dilution of both the character-based and the stratigraphic signal.
  • Mixing of inclusive hierarchical data (from characters) with linear data (from stratigraphy) (Smith, 1994).
  • In particular, combining the two kinds of data obscures several specific issues in cladistics: the meaning of ancestors, initial grouping criteria, and asymmetry in rates of morphological evolution (Norell and Novacek, 1997).
  • A stratigraphic overlay on a most-parsimonious tree obscures the falsifiability of the phylogenetic hypothesis, and confuses two philosophical stances (Rieppel, 1997).

Techniques for comparison of cladograms with stratigraphic implications can, however, be informative in examining specific cases. For example, where several most parsimonious trees (MPTs) result from a cladistic analysis, it may be of interest to know which of these best fits current stratigraphic knowledge. The various metrics noted above can readily be calculated for each MPT.

Metrics and software

(1) The Spearman Rank Correlation (SRC) test is a well-established non-parametric test that simply compares the rank order of two series of numbers, in this case the order of first fossil appearances and the order of nodes. The technique has associated measures of confidence, but it is a rather poor estimator of how well trees match stratigraphy: it looks only at rank order, taking no account of the amount of time between specific fossils, and the relative ordering of nodes can be highly interdependent.
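As a rough illustration of the SRC comparison, the sketch below ranks a handful of taxa by clade rank (node order) and by age of first appearance, then correlates the two rankings. The taxon ordering, the ages, and the function names are all hypothetical, and the code is a minimal pure-Python version of the standard Spearman coefficient rather than the procedure used in any of the cited papers.

```python
def average_ranks(values):
    """Rank values from 1 to n (smallest first); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the two rank series."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical data: clade rank (1 = branches off first) and first appearance in Ma.
clade_rank = [1, 2, 3, 4, 5]
first_appearance_ma = [250, 230, 240, 200, 180]

# Older fossils should get rank 1, so the ages are negated before ranking.
rho = spearman(clade_rank, [-age for age in first_appearance_ma])
print(round(rho, 3))
```

Only one pair of taxa is out of order in this toy example, so the coefficient is high even though the example says nothing about how much time separates the fossils, which is the weakness of the SRC noted above.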

(2) The Relative Completeness Index (RCI), Stratigraphic Consistency Index (SCI), and Gap Excess Ratio (GER) are more informative, each assessing a different aspect of tree fit to stratigraphy. All three should be used in parallel. Hitherto, each has been a simple metric, but it is possible to assess a proxy for confidence intervals by means of randomization of the data sets. Methods are outlined more fully in Wills (1999) and Benton et al. (1999, 2000).
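To make the three metrics concrete, here is a minimal sketch that computes them for a small, hypothetical four-taxon cladogram. It assumes the usual definitions (ghost ranges summed into a minimum implied gap, MIG; observed ranges summed into a simple range length, SRL). The tree encoding as nested tuples, the taxon names, the ages, and every function name are illustrative assumptions, not part of Ghosts or strap.

```python
def clade_age(tree, fad):
    """Oldest first appearance (Ma) among the terminals of a subtree."""
    if isinstance(tree, str):
        return fad[tree]
    return max(clade_age(child, fad) for child in tree)


def minimum_implied_gap(tree, fad):
    """Sum of ghost ranges: every child lineage is extended back to its node's age."""
    if isinstance(tree, str):
        return 0.0
    node_age = clade_age(tree, fad)
    gap = 0.0
    for child in tree:
        gap += node_age - clade_age(child, fad)
        gap += minimum_implied_gap(child, fad)
    return gap


def rci(tree, fad, lad):
    """Relative Completeness Index in percent (undefined if all ranges are zero)."""
    srl = sum(fad[t] - lad[t] for t in fad)
    return (1 - minimum_implied_gap(tree, fad) / srl) * 100


def ger(tree, fad):
    """Gap Excess Ratio (undefined if G_max equals G_min)."""
    oldest, youngest = max(fad.values()), min(fad.values())
    g_min = oldest - youngest
    g_max = sum(oldest - age for age in fad.values())
    return 1 - (minimum_implied_gap(tree, fad) - g_min) / (g_max - g_min)


def sci(tree, fad):
    """Stratigraphic Consistency Index over internal nodes other than the root."""
    consistent = total = 0

    def visit(node):
        nonlocal consistent, total
        if isinstance(node, str):
            return
        for i, child in enumerate(node):
            if not isinstance(child, str):
                sisters = [s for j, s in enumerate(node) if j != i]
                sister_age = max(clade_age(s, fad) for s in sisters)
                total += 1
                # Consistent: the node is the same age as, or younger than, its sister.
                consistent += clade_age(child, fad) <= sister_age
            visit(child)

    visit(tree)
    return consistent / total


# Hypothetical pectinate cladogram with first and last appearances in Ma.
tree = ("A", ("B", ("C", "D")))
fad = {"A": 250, "B": 230, "C": 240, "D": 200}
lad = {"A": 200, "B": 180, "C": 190, "D": 150}

print(minimum_implied_gap(tree, fad), rci(tree, fad, lad), ger(tree, fad), sci(tree, fad))
```

For this toy tree the implied gap is 60 Myr against 200 Myr of known range, so RCI is 70%, GER is about 0.67, and SCI is 0.5 (one of the two non-root nodes is older than its sister). The three values respond to different features of the same data, which is why they are best read together.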

(3) Software is available to assess the significance of RCI, SCI, and GER values, by random permutation of the raw data, and tests of the values against means of the generated random distributions. In the papers from the 1990s, we used the software ‘Ghosts’ by Matt Wills, but the routines are now available in the ‘strap’ package for R (StratPhyloCongruence function) by Bell and Lloyd (2014).
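The idea behind such a randomization test can be sketched in a few lines. The example below shuffles first-appearance dates among the terminals of a fixed tree, recomputes the GER each time, and reports how often a shuffled data set fits at least as well as the observed one. It is an illustrative assumption about the general approach, written in Python with hypothetical names and data; it does not reproduce the exact permutation scheme of Ghosts or of strap's StratPhyloCongruence.

```python
import random


def clade_age(tree, fad):
    """Oldest first appearance (Ma) among the terminals of a subtree."""
    if isinstance(tree, str):
        return fad[tree]
    return max(clade_age(child, fad) for child in tree)


def minimum_implied_gap(tree, fad):
    """Sum of ghost ranges implied by the tree for the given first appearances."""
    if isinstance(tree, str):
        return 0.0
    node_age = clade_age(tree, fad)
    return sum(
        node_age - clade_age(child, fad) + minimum_implied_gap(child, fad)
        for child in tree
    )


def ger(tree, fad):
    """Gap Excess Ratio (undefined if G_max equals G_min)."""
    oldest, youngest = max(fad.values()), min(fad.values())
    g_min = oldest - youngest
    g_max = sum(oldest - age for age in fad.values())
    return 1 - (minimum_implied_gap(tree, fad) - g_min) / (g_max - g_min)


def permutation_p_value(tree, fad, n_perm=999, seed=0):
    """Observed GER and the share of shuffled data sets that fit at least as well."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    taxa = sorted(fad)
    ages = [fad[t] for t in taxa]
    observed = ger(tree, fad)
    at_least_as_good = 0
    for _ in range(n_perm):
        shuffled = ages[:]
        rng.shuffle(shuffled)  # reassign the dates to terminals at random
        if ger(tree, dict(zip(taxa, shuffled))) >= observed:
            at_least_as_good += 1
    # Count the observed arrangement itself as one of the permutations.
    return observed, (at_least_as_good + 1) / (n_perm + 1)


# Hypothetical six-taxon pectinate tree whose fossils appear in branching order.
tree = ("A", ("B", ("C", ("D", ("E", "F")))))
fad = {"A": 300, "B": 250, "C": 200, "D": 150, "E": 100, "F": 50}

observed, p = permutation_p_value(tree, fad)
print(observed, p)
```

Because the shuffled data sets keep the same tree and the same set of ages, the resulting distribution reflects tree size and shape, which is one reason resampling helps with the biases raised in the critiques of the raw metrics.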

Wills, M. A. 2007. Fossil ghost ranges are most common in some of the oldest and some of the youngest strata. Proceedings of the Royal Society, Series B 274, 2421-2427.

Wills (2007) showed that ghost ranges are not evenly spaced through time, based on our sample of 1000 cladograms. Ghost ranges are indeed relatively common in some of the oldest strata. Surprisingly, however, ghost ranges are also relatively common in some of the youngest, fossil-rich rocks. This pattern results from the interplay between several complex factors and is not a simple function of the completeness of the fossil record. The Early Palaeozoic record is likely to be less organismically and stratigraphically complete, and its fossils – many of which are invertebrates – may be more difficult to analyse cladistically. The Late Cenozoic is subject to the pull of the Recent, but this accounts only partially for the increased gappiness in the younger strata.

Benton, M. J. 2001. Finding the tree of life: matching phylogenetic trees to the fossil record through the 20th century. Proceedings of the Royal Society, Series B 268, 2123-2130.

The ‘tree of life’ shows how all plants, animals and microbes are related to one another and how they evolved. Many biologists and palaeobiologists spend their time trying to disentangle the millions of branches of that tree, and new methods speeded up the work in the 1990s. But where is it all heading? A comparison of 1000 evolutionary trees published during the twentieth century shows that knowledge is changing. In comparison with our knowledge of the order of fossils in rocks, the predictions of the order of branching in evolutionary trees have only improved slightly, and estimates of the timing of branching have, if anything, become worse.

Benton, M. J., Wills, M. A., and Hitchin, R. 2000. Quality of the fossil record through time. Nature, 403, 534-537.

In this paper, we showed that the fossil record is equally good through time: in certain contexts, the Cambrian fossil record is just as good as the Cenozoic. And yet, any Cambrian fossil locality is obviously much worse than any Cenozoic fossil locality. How can one explain this paradox?

The significance of the fossil record in documenting evolution worried Charles Darwin. He devoted two of the fourteen chapters of his On the origin of species (1859) to the fossil record, and one of them was specifically ‘On the imperfection of the geological record.’ Darwin wrote:

That our palaeontological collections are very imperfect, is admitted by every one. …numbers of our fossil species are known and named from single and often broken specimens, or from a few specimens collected on some one spot. Only a small portion of the surface of the earth has been geologically explored, and no part with sufficient care, as the important discoveries made every year in Europe prove. No organism wholly soft can be preserved. Shells and bones will decay and disappear when left on the bottom of the sea, where sediment is not accumulating.

This rather negative view seems self-evident. And yet, in the to-and-fro of the controversy, no clear resolution has emerged. The protagonists on both sides have to resort to qualitative arguments and wild assertions.

The critics, following Darwin’s rather cautious lead, declare that fossils are only fragmentary remains of whole organisms, that many living plants, animals and microbes are unlikely ever to leave fossil remains of any kind, and that the chances of finding more than 1% of all the species that ever lived are remote.

In answer to this rather bleak view, the fossil apologists claim that, despite all the huge efforts of palaeontologists around the world, we have really not learnt much since 1859. Since Darwin’s day, fossils have been collected intensively from Greenland to Antarctica, and from Australia to Argentina. These have not changed the broad picture of the history of life one whit. We still record in our textbooks, as was done in the 1850s, that marine invertebrates became abundant during the Cambrian period, that fishes became abundant in the Silurian and Devonian, that dinosaurs existed through the Mesozoic, that Mesozoic mammals were small, and only flourished after the demise of the dinosaurs, from the beginning of the Tertiary.

The debate has exploded again in the past five years. Molecular biologists have claimed that the first half of the fossil record of many major groups is missing. As an example, molecular analyses of phylogeny (reconstructions of branching evolutionary patterns based on comparisons of the DNA and RNA of living forms) have projected the point of origin of modern bird and mammal groups deep into the Cretaceous, to a time some 100-120 million years ago when dinosaurs like Iguanodon and Polacanthus stalked the lush forests of southern England. The oldest fossils of modern bird and mammal groups are known, somewhat doubtfully, from the very end of the Cretaceous, and then in abundance from the beginning of the Tertiary, 65 million years ago.

This is a major challenge to palaeontology. The claim is that the first half of the fossil record of these groups is missing. Perhaps the early ancestors of modern birds and mammals were so rare as not to be fossilized, or perhaps they lived in some region that has not yet been excavated. Nonetheless, my mind boggled at the thought of parrots and bats flitting through the steaming Mesozoic jungles, penguins skipping along the shores, and monkeys looking down quizzically at Tyrannosaurus rex. I am convinced, as are many molecular sequencers, that the calibration of the calculations is wrong. And yet the point has been accepted unquestioningly by many. This I find startling.

In our new work, we took advantage of the fact that there are several independent ways of looking at the history of life. The fossil record is one source of information – the order of fossils in the rocks. Independent approaches are now available, cladistics and molecular phylogenies. Cladistics is a set of methods for reconstructing tree-like patterns of relationships among living, or a mix of living and fossil, organisms, using only numerical scoring of their unique morphological characters. There is no recourse to the age of the fossils: the emerging pattern depends solely on records of shared characters. Molecular phylogenies are even more clearly independent of the fossil record. Comparison of the DNA and RNA sequences of living species allows the construction of a tree of relationships.

Cladistic and molecular trees give information on relationships and on the order of branching, and the order of branching points in the trees can be compared with the order of fossils in the rocks. Over the past few years, we have devised a number of metrics for effecting the comparison, and for assessing the significance of the claims for congruence. There is no claim that the fossil record, the cladistic trees, or the molecular trees are the standard of truth against which the other evidence is assessed. Any of the three approaches is equally likely to be wrong. However, we would assert that if any two, or all three, tell the same story, then that is probably the correct story. It is hard to see how the order of the fossils and a molecular tree could be biased in the same erroneous direction since the sources of data on which the results are based are so different.

The group in Bristol working on this problem consisted of Matthew Wills, now at the University of Bath, Becky Hitchin, and myself. Our earlier findings confirmed that the majority of fossil records and trees were congruent, indicating that palaeontological methods and tree-making methods were working, and giving the correct story. In our 2000 study, we assessed 1000 published cladistic and molecular trees from all kinds of organisms – microbes, plants, and animals – and collated the results. For all three metrics, we found no change in the quality of the fossil record through time. In other words, the sample of several hundred trees based on groups that originated in early geological times, the Palaeozoic, gave congruence values which were just as good as those originating in the Mesozoic and Cenozoic.

How can this be? How can we claim that the Palaeozoic fossil record is just as good as the Mesozoic and Cenozoic? This seems nonsensical when any geologist can tell you that the ancient rocks of the Cambrian have been folded, heated, buried, and eroded much more than the more recent rocks of the Jurassic. If you visit a Cambrian fossil locality, on the whole it is hard to collect fossils, and they are often damaged, whereas Jurassic localities yield undamaged shells and other fossils in abundance. The solution to the dilemma depends on the scale of observation. Close up, the Cambrian fossil record is obviously worse than the Jurassic. But step back, and look on a global scale at the broad outlines of evolution, and there is no difference in the quality of documentation.

Up to now, the fossil apologists and the critics have been talking at cross purposes. The apologists have been concentrating on the big picture, and they have been right: the fossil record does give the correct story of the history of life. The critics have also been right, but only at a fine, local scale, and they are wrong to assume that sparse and scrappy fossils in the Cambrian mean that palaeontologists can say nothing about what was going on 500 million years ago.


  • Angielczyk, K. D. (2002) A character-based method for measuring the fit of a cladogram to the fossil record. Systematic Biology 51, 176-191.
  • Angielczyk, K. D. and Fox, D. L. (2006) Exploring new uses for measures of fit of phylogenetic hypotheses to the fossil record. Paleobiology 32, 147-165.
  • Bell, M.A. and Lloyd, G.T. (2014) strap: an R package for plotting phylogenies against stratigraphy and assessing their stratigraphic congruence. Palaeontology 58, 379–389.
  • Benton, M. J. (1994) Palaeontological data, and identifying mass extinctions. Trends in Ecology and Evolution 9, 181-185.
  • Benton, M. J. (1995) Testing the time axis of phylogenies. Philosophical Transactions of the Royal Society, Series B 349, 5-10.
  • Benton, M. J. (2001) Finding the tree of life: matching phylogenetic trees to the fossil record through the 20th century. Proceedings of the Royal Society of London, Series B 268, 2123-2130.
  • Benton, M. J. and Hitchin, R. (1996) Testing the quality of the fossil record by groups and by major habitats. Historical Biology, 12, 111-157.
  • Benton, M. J. and Hitchin, R. (1997) Congruence between phylogenetic and stratigraphic data on the history of life. Proceedings of the Royal Society, Series B 264, 885-890.
  • Benton, M. J., Hitchin, R., and Wills, M. A. (1999) Assessing congruence between cladistic and stratigraphic data. Systematic Biology, 48, 581-596.
  • Benton, M. J., Wills, M. A., and Hitchin, R. (2000) Quality of the fossil record through time. Nature, 403, 534-537.
  • Benton, M. J. and Simms, M. J. (1995) Testing the marine and continental fossil records. Geology, 23, 601-604.
  • Benton, M. J. and Storrs, G. W. (1994) Testing the quality of the fossil record: paleontological knowledge is improving. Geology, 22, 111-114.
  • Benton, M. J. and Storrs, G. W. (1996) Diversity in the past: comparing cladistic phylogenies and stratigraphy. In Aspects of the genesis and maintenance of biological diversity, edited by M. E. Hochberg, J. Clobert, and R. Barbault, pp. 19-40. Oxford: Oxford University Press.
  • Fara, E. and Langer, M. C. (2004) Estimates of phylogeny and biochronology. Revista Brasileira de Paleontologia 7, 301-310.
  • Finarelli, J. A. and Clyde, W. C. (2002) Comparing the gap excess ratio and the retention index of the stratigraphic character. Systematic Biology 51, 166-176.
  • Fisher, D. C. (1992) Stratigraphic parsimony. In MacClade: Analysis of phylogeny and character evolution, edited by W. P. Maddison and D. R. Maddison, pp. 124-129. Sunderland, Mass.: Sinauer Associates.
  • Hitchin, R. and Benton, M. J. (1997a) Congruence between parsimony and stratigraphy: comparisons of three indices. Paleobiology, 23, 20-32.
  • Hitchin, R. and Benton, M. J. (1997b) Stratigraphic indices and tree balance. Systematic Biology, 46, 563-569.
  • Huelsenbeck, J. P. (1994) Comparing the stratigraphic record to estimates of phylogeny. Paleobiology, 20, 470-483.
  • Norell, M. A. and Novacek, M. J. (1992a) The fossil record and evolution: comparing cladistic and paleontologic evidence for vertebrate history. Science, 255, 1690-1693.
  • Norell, M. A. and Novacek, M. J. (1992b) Congruence between superpositional and phylogenetic patterns: Comparing cladistic patterns with fossil records. Cladistics, 8, 319-337.
  • Norell, M. A. and Novacek, M. J. (1997) The ghost dance: a cladistic critique of stratigraphic approaches to paleobiology and phylogeny. Journal of Vertebrate Paleontology, Supplement, 17, 67A.
  • Pol, D., Norell, M. A. and Siddall, M. E. (2004) Measures of stratigraphic fit to phylogeny and their sensitivity to tree size, tree shape, and scale. Cladistics 20, 64-75.
  • Rieppel, O. (1997) Falsificationist versus verificationist approaches to history. Journal of Vertebrate Paleontology, Supplement, 17, 71A.
  • Siddall, M. E. (1996) Stratigraphic consistency and the shape of things. Systematic Biology, 45, 111-115.
  • Siddall, M. E. (1997) Stratigraphic indices in the balance: A reply to Hitchin and Benton. Systematic Biology, 46, 569-573.
  • Smith, A. B. (1994) Systematics and the Fossil Record. Oxford: Blackwell Scientific.
  • Wagner, P. J. (2000) The quality of the fossil record and the accuracy of estimated phylogenies. Systematic Biology 49, 65-86.
  • Wagner, P. J. and Sidor, C. A. (2000) Age Rank/Clade Rank Metrics-Sampling, Taxonomy, and the Meaning of “Stratigraphic Consistency”. Systematic Biology 49, 463-479.
  • Wills, M. A. (1999) Congruence between phylogeny and stratigraphy: randomization tests. Systematic Biology, 48, 559-580.