Data sets and trees used in "Inferring Phylogenies"

The following are most of the data sets and many of the trees that were used as examples in the book. When you click on the download link you may see the data set or tree as text in your browser. If so, then you can save it by using the "Save" or "Save As" choice in the browser's "File" menu. Alternatively, you can mark the text and copy it out of the browser window.

In some cases the data sets are described as simulated on particular trees. The trees will be found below, in the table of trees.

Data sets used in the book

Name: Example data set
Description: This is the simple discrete characters data set used in Table 8.1. It also includes the five species used in Table 1.1.
Download: example.dat

Name: 14-species Primates data set
Description: This is a 14-species data set with mitochondrial sequences. It was selected by Masami Hasegawa from a data set of sequences of 896 nucleotides collected by a number of researchers in Japan. Hasegawa selected sites from the D-loop noncoding region and third positions of adjacent coding sequences, to achieve a set of sites that was as close as possible to having no rate variation. It is used in Figure 19.1, Table 19.1, and Figure 19.2.
Download: primates.dna

Name: 7-species Primates data set
Description: This is a 7-species data set which is a subset of the above 14-species data set. It contains only cow, mouse, and 5 species of great apes. It is used in Table 12.1 and Figure 12.3, and in Figures 21.2 and 21.3.
Download: primates2.dna

Name: 7-species Primates distances
Description: Distances computed from the preceding data set, using the F84 distance with empirical base frequencies and transition/transversion ratio set to 2.0. The file is in the NEXUS format suitable for the program SplitsTree. This is relevant to Table 12.1, Figure 12.3, and the splits table in between them.
Download: primates.dst

Name: Sarich data set
Description: This is a set of immunological distances collected by Vincent Sarich and published in his 1969 paper "Pinniped phylogeny" in Systematic Zoology 18: 416-422. The values have been averaged across the diagonal to symmetrize them. The data set is used on pages 163-168.
Download: sarich.dst

Name: Example of Kimura 2-parameter model
Description: The two sequences that are tabulated in Table 13.4. They were evolved with a Kimura 2-parameter model with branch length 0.2 between them and transition/transversion ratio 2.
Download: k2ptable

Name: 4-species Hadamard data set
Description: This is the 4-species data set simulated on the tree described on page 280, using the Jukes-Cantor model.
Download: hadamard.dna

Name: 4-species Hadamard data set (YR)
Description: This is the same data set, coded as Purine/Pyrimidine (R and Y). When analyzed using only these states, it gives the Hadamard analysis shown on pages 280-281 and in Table 17.4.
Download: hadamard.yr

Name: 6-species Hadamard data set
Description: This is the 6-species data set simulated on the tree shown in Figure 17.1.
Download: hadamard2.dna

Name: 6-species Hadamard data set (YR)
Description: This is the same data set, coded as Purine/Pyrimidine (R and Y). When sites are coded into these states, they have the spectrum shown in that figure.
Download: hadamard2.yr

Name: Two-peak data set
Description: This artificial data set, when analyzed under the Jukes-Cantor model with a molecular clock, gives the profile log-likelihood curves shown in Figure 19.3.
Download: twopeak.dna

Name: Bootstrap example numbers
Description: The histograms of the sample and of the three bootstrap samples drawn from it, for the mixture of normal distributions used to explain the bootstrap on page 336 in Chapter 20.
Download: bootexample

Name: Numbers for KHT test
Description: The log-likelihood values for the 232 sites used in the table in Figure 21.2, and the differences between them given there and in Figure 21.3.
Download: khtloglikes, khtdiffs

Name: Character values and contrasts for example in Comparative Methods chapter
Description: The two-character data set used to plot Figure 25.3, and the contrasts derived from it and plotted in Figure 25.5.
Download: twoclades, contrasts

Name: Fossil example data set
Description: The example data set of present-day and fossil species given in Table 32.1 and used for Figure 32.3. The analysis is described on page 550.
Download: fossil.dat
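
Two of the data sets above are described as coded as Purine/Pyrimidine (R and Y). As a rough illustration of what that coding means, here is a minimal Python sketch that recodes a nucleotide sequence into those two states. The function name recode_ry and the choice to pass other symbols (gaps, ambiguity codes) through unchanged are assumptions made for this example; this is not necessarily how the .yr files were produced.

```python
# Purines (A, G) are coded as R; pyrimidines (C, T, and U) are coded as Y.
RY_CODE = {"A": "R", "G": "R", "C": "Y", "T": "Y", "U": "Y"}


def recode_ry(sequence: str) -> str:
    """Recode a nucleotide sequence into purine/pyrimidine (R/Y) states.

    Assumption for this sketch: any symbol other than A, C, G, T, U
    (for example a gap or an ambiguity code) is passed through unchanged.
    """
    return "".join(RY_CODE.get(base, base) for base in sequence.upper())


if __name__ == "__main__":
    print(recode_ry("ACGTTGCA"))  # prints RYRYYRYR
```

Applying such a function to every sequence in a DNA data set gives a two-state data set of the same length.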

One data set used in the book seems to have been lost, and is therefore not available for download:

Trees used in the book

Here are some of the trees (aside from the small ones with just a few species) mentioned in the book. They are given in the Newick format. Clicking on a link in the right column will cause the tree to be displayed in your browser; from there you can save it by selecting the "Save" or "Save As" option in your browser's "File" menu. Alternatively, you can mark the text and copy it out of the browser window.

(More will be put here soon.)

Page  Description                                             Link to tree
166   UPGMA tree for Sarich carnivore data                    sarichupgma.tre
168   Neighbor-Joining tree for Sarich carnivore data         sarichnj.tre
182   Trees found in split decomposition example              splitdecomp.tre
260   Tree for ancestor reconstruction example                ancestor.tre
282   6-species tree used to simulate Hadamard example        hadtree6.tre
341   Five trees for majority-rule consensus example          majorityrule.tre
366   Two trees for paired sites test example                 pairedsites.tre
396   Tree used to generate Brownian motion example           (to be provided soon)
423   Example of punctuated equilibrium tree                  punctuated.tre
424   Example of 10-species tree from that tree               punctuated10.tre
430   Five-species tree used to simulate threshold character  (to be provided soon)
434   Two-clade tree used to show comparative method problem  compare.tre
458   Nine random coalescent trees                            coalescent.tre
522   Trees for consensus-tree example                        consensus.tre
532   Trees for tree-distance example                         treedistances.tre
566   Birth-death process trees                               bd1.tre, bd2.tre
573   Tree used as example for tree-drawing                   drawing.tre
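
Since the trees are given in the Newick format, a short sketch of how such a string can be read may be useful. The Python code below is a minimal illustration only, and it makes several assumptions: it handles a single tree terminated by a semicolon, with unquoted labels and optional branch lengths, and it does not handle quoted labels or bracketed comments. The function names parse_newick and leaf_names and the example tree are invented for this sketch and are not taken from the files above.

```python
def parse_newick(text):
    """Parse one Newick string into nested dicts with keys
    "name", "length" (None if absent), and "children"."""
    # Unquoted Newick labels cannot contain whitespace, so drop all of it.
    text = "".join(text.split())
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if pos < len(text) and text[pos] == "(":
            pos += 1  # skip "("
            children.append(parse_node())
            while pos < len(text) and text[pos] == ",":
                pos += 1  # skip ","
                children.append(parse_node())
            if pos >= len(text) or text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # skip ")"
        # Optional node label.
        start = pos
        while pos < len(text) and text[pos] not in ",():;":
            pos += 1
        name = text[start:pos]
        # Optional branch length after ":".
        length = None
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",();":
                pos += 1
            length = float(text[start:pos])
        return {"name": name, "length": length, "children": children}

    root = parse_node()
    if pos >= len(text) or text[pos] != ";":
        raise ValueError("a Newick tree must end with ';'")
    return root


def leaf_names(node):
    """Return the tip labels of a parsed tree, in left-to-right order."""
    if not node["children"]:
        return [node["name"]]
    names = []
    for child in node["children"]:
        names.extend(leaf_names(child))
    return names


if __name__ == "__main__":
    # Hypothetical four-species tree, not one of the files listed above.
    tree = parse_newick("((A:0.1,B:0.2):0.05,(C:0.3,D:0.4):0.05);")
    print(leaf_names(tree))  # prints ['A', 'B', 'C', 'D']
```

A file that contains several trees, such as the sets of trees listed above, would need to be split at the semicolons before each tree is passed to a parser like this one.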


Those interested in a list of typos and their corrections for the book should look at the web page here.

A list of reviews of the book that have appeared, and some reactions to them, will be found here.

Joe Felsenstein