Data sets and trees used in "Inferring Phylogenies"

The following are most of the data sets and many of the trees that were used as examples in the book. When you click on the download link you may see the data set or tree as text in your browser. If so, then you can save it by using the "Save" or "Save As" choice in the browser's "File" menu. Alternatively, you can mark the text and copy it out of the browser window.

In some cases the data sets are described as simulated on particular trees. The trees will be found below, in the table of trees.

Data sets used in the book

name	description	link to download it
Example data set	This is the simple discrete-character data set used in Table 8.1. It also includes the five species used in Table 1.1.	example.dat
14-species Primates data set	This is a 14-species data set of mitochondrial sequences. It was selected by Masami Hasegawa from a data set of sequences of 896 nucleotides collected by a number of researchers in Japan. Hasegawa selected sites from the D-loop noncoding region and third positions of adjacent coding sequences to obtain a set of sites that was as close as possible to having no rate variation. It is used in Figure 19.1, Table 19.1, and Figure 19.2.	primates.dna
7-species Primates data set	This is a 7-species subset of the above 14-species data set. It contains only cow, mouse, and 5 species of great apes. It is used in Table 12.1 and Figure 12.3, and in Figures 21.2 and 21.3.	primates2.dna
7-species Primates distances	Distances computed from the preceding data set using the F84 distance, with empirical base frequencies and the transition/transversion ratio set to 2.0. The file is in the NEXUS format suitable for the program Splitstree. It is relevant to Table 12.1, Figure 12.3, and the splits table between them.	primates.dst
Sarich data set	This is a set of immunological distances collected by Vincent Sarich and published in his 1969 paper "Pinniped phylogeny" in Systematic Zoology 22: 416-422. The values have been averaged across the diagonal to make the matrix symmetric. The data set is used on pages 163-168.	sarich.dst
Example of Kimura 2-parameter model	The two sequences tabulated in Table 13.4, evolved under a Kimura 2-parameter model with a branch length of 0.2 between them and a transition/transversion ratio of 2.	k2ptable
4-species Hadamard data set	This is the 4-species data set simulated on the tree described on page 280, using the Jukes-Cantor model.	hadamard.dna
4-species Hadamard data set (YR)	This is the same data set, coded as purine/pyrimidine (R and Y). When analyzed using only these states, it gives the Hadamard analysis shown on pages 280-281 and in Table 17.4.	hadamard.yr
6-species Hadamard data set	This is the 6-species data set simulated on the tree shown in Figure 17.1.	hadamard2.dna
6-species Hadamard data set (YR)	This is the same data set, coded as purine/pyrimidine (R and Y). When sites are coded into these states, they have the spectrum shown in that figure.	hadamard2.yr
Two-peak data set	This artificial data set, when analyzed by the Jukes-Cantor method with a molecular clock, gives the profile log-likelihood curves shown in Figure 19.3.	twopeak.dna
Bootstrap example numbers	The histograms of the sample, and of the three bootstrap samples drawn from it, for the mixture of normal distributions used to explain the bootstrap on page 336 in chapter 20.	bootexample
Numbers for KHT test	The log-likelihood values for the 232 sites used in the table in Figure 21.2, and the differences between them given there and in Figure 21.3.	khtloglikes khtdiffs
Character values and contrasts for example in Comparative Methods chapter	The two-character data set used to plot Figure 25.3, and the contrasts derived from it and plotted in Figure 25.5.	twoclades contrasts
Fossil example data set	The example data set of present-day and fossil species given in Table 32.1 and used for Figure 32.3. The analysis is described on page 550.	fossil.dat

One data set used in the book seems to have been lost, and is therefore not available for download:

The 50-sample, 500-base data used for Figure 27.2.
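The Sarich entry in the table above says the immunological distances were averaged across the diagonal to make the matrix symmetric. As a minimal sketch, assuming the raw matrix is held as a Python list of lists, that step could look like this; the function name `symmetrize` and the 3-by-3 numbers are hypothetical and are not taken from `sarich.dst`.

```python
def symmetrize(d):
    """Return a new matrix whose (i, j) and (j, i) entries both equal the mean of d[i][j] and d[j][i]."""
    n = len(d)
    if any(len(row) != n for row in d):
        raise ValueError("distance matrix must be square")
    return [[(d[i][j] + d[j][i]) / 2.0 for j in range(n)] for i in range(n)]


# Hypothetical asymmetric measurements (NOT the values in sarich.dst).
raw = [
    [0.0, 10.0, 30.0],
    [14.0, 0.0, 20.0],
    [26.0, 24.0, 0.0],
]
print(symmetrize(raw))
# → [[0.0, 12.0, 28.0], [12.0, 0.0, 22.0], [28.0, 22.0, 0.0]]
```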
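The `k2ptable` entry describes two sequences evolved under the Kimura 2-parameter model with branch length 0.2. A sketch of the standard Kimura (1980) distance estimate that one could apply to such a pair is shown below. Here P and Q are the proportions of sites differing by a transition and by a transversion, and d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q). It assumes aligned, gap-free sequences over A, C, G, T. The function name and the 10-site example are hypothetical, and nothing is assumed about the layout of the `k2ptable` file. For sequences generated with branch length 0.2, the estimate should land near 0.2, up to sampling error.

```python
import math

TRANSITION_PAIRS = {frozenset("AG"), frozenset("CT")}  # purine<->purine, pyrimidine<->pyrimidine
VALID = set("ACGT")


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned, gap-free DNA sequences."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be non-empty and of equal length")
    transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in VALID or b not in VALID:
            raise ValueError(f"unexpected character pair {a!r}/{b!r}")
        if a == b:
            continue
        if frozenset((a, b)) in TRANSITION_PAIRS:
            transitions += 1
        else:
            transversions += 1
    n = len(seq1)
    p = transitions / n    # proportion of sites differing by a transition
    q = transversions / n  # proportion of sites differing by a transversion
    # math.log raises ValueError if an argument is <= 0 (sequences too divergent for the formula).
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


# Made-up 10-site example: one A->G transition, so P = 0.1 and Q = 0.
print(round(k2p_distance("ACGTACGTAA", "ACGTACGTAG"), 4))
# → 0.1116
```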
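The YR versions of the Hadamard data sets recode each nucleotide as a purine (R: A or G) or a pyrimidine (Y: C or T). Below is a minimal sketch of that recoding. It is followed by a tally of site patterns in which a pattern and its R/Y complement are counted together, since both describe the same bipartition of the species, which is the kind of quantity a two-state Hadamard spectrum is built from. The function names, the `?` used for non-ACGT characters, and the toy alignment are assumptions, and this is not necessarily how the `.yr` files were produced.

```python
from collections import Counter

# A and G are purines (R); C and T are pyrimidines (Y).
RY_CODE = {"A": "R", "G": "R", "C": "Y", "T": "Y"}
FLIP = {"R": "Y", "Y": "R"}


def to_ry(seq):
    """Recode a DNA sequence into R/Y states; any other character becomes '?' (an assumption)."""
    return "".join(RY_CODE.get(base, "?") for base in seq.upper())


def split_pattern_counts(ry_seqs):
    """Count site patterns across aligned R/Y sequences, merging each pattern with its R<->Y complement."""
    counts = Counter()
    for column in zip(*ry_seqs):
        if column[0] == "Y":
            # Orient every column so the first species is 'R'; both orientations give the same split.
            column = tuple(FLIP.get(c, c) for c in column)
        counts["".join(column)] += 1
    return counts


# Hypothetical 3-species, 4-site alignment.
dna = ["AACT", "AGTT", "GCCA"]
ry = [to_ry(s) for s in dna]
print(ry)
# → ['RRYY', 'RRYY', 'RYYR']
print(split_pattern_counts(ry))
```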
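The bootstrap example involves one sample and three bootstrap samples drawn from it. The essential step is resampling the observed values with replacement while keeping the sample size fixed. In the sketch below, the mixture parameters, sample size, seeds, and function names are made up and are not the ones behind `bootexample`.

```python
import random


def bootstrap_samples(data, n_reps, seed=None):
    """Draw n_reps bootstrap samples, each the same size as data, by sampling with replacement."""
    rng = random.Random(seed)
    return [rng.choices(data, k=len(data)) for _ in range(n_reps)]


def mixture_sample(n, rng):
    """Hypothetical 50:50 mixture of N(0, 1) and N(3, 1); NOT the parameters used in the book."""
    return [rng.gauss(0.0, 1.0) if rng.random() < 0.5 else rng.gauss(3.0, 1.0) for _ in range(n)]


rng = random.Random(1)
sample = mixture_sample(100, rng)
reps = bootstrap_samples(sample, n_reps=3, seed=2)
for i, rep in enumerate(reps, start=1):
    print(f"bootstrap sample {i}: mean = {sum(rep) / len(rep):.3f}")
```

Each bootstrap sample contains only values from the original sample, some repeated and some absent, which is what makes the spread of the bootstrap means a proxy for sampling variation.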
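The `khtloglikes` and `khtdiffs` files hold per-site log-likelihoods and their differences. The KHT (Kishino-Hasegawa-Templeton) test treats sites as independent. It estimates the variance of the total log-likelihood difference as n times the sample variance of the per-site differences and compares the total to its standard error. The sketch below implements that normal-approximation version. The function names and four-site numbers are hypothetical, and whether it matches every detail of the calculation in Figures 21.2 and 21.3 is an assumption.

```python
import math
import statistics


def site_differences(lnl_tree1, lnl_tree2):
    """Per-site differences in log-likelihood between two trees."""
    if len(lnl_tree1) != len(lnl_tree2):
        raise ValueError("both trees must have one log-likelihood per site")
    return [a - b for a, b in zip(lnl_tree1, lnl_tree2)]


def kht_test(diffs):
    """Normal-approximation KHT test on per-site log-likelihood differences.

    Returns (total difference, its estimated standard error, z, two-sided p-value).
    """
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two sites")
    total = sum(diffs)
    # Variance of a sum of n independent site differences: n times their sample variance.
    se = math.sqrt(n * statistics.variance(diffs))
    if se == 0:
        raise ValueError("site differences have zero variance")
    z = total / se
    p = math.erfc(abs(z) / math.sqrt(2))  # P(|Z| >= |z|) for a standard normal Z
    return total, se, z, p


# Hypothetical per-site log-likelihoods for four sites (not the 232 values in khtloglikes).
tree1 = [-5.1, -6.3, -4.8, -7.0]
tree2 = [-5.6, -6.1, -5.1, -7.1]
print(kht_test(site_differences(tree1, tree2)))
```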
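The comparative-methods entry lists contrasts derived from two characters. Assuming these are the standard independent contrasts, computed separately for each character, the recursion can be sketched as follows. At each internal node, the difference between its two (possibly estimated) child values is divided by the square root of the sum of their branch lengths. The node then takes a weighted average of its children's values, and its own branch is lengthened to account for the uncertainty in that average. The nested-tuple tree encoding, the function name, and the toy tree are hypothetical; the book's own tree and values are not reproduced here.

```python
import math


def independent_contrasts(node, out):
    """Standardized independent contrasts on a bifurcating tree with positive branch lengths.

    A node is ("leaf", value, branch_length) or ("node", left, right, branch_length).
    Contrasts are appended to out; returns (node value, adjusted branch length).
    """
    if node[0] == "leaf":
        _, value, v = node
        return value, v
    _, left, right, v = node
    x1, v1 = independent_contrasts(left, out)
    x2, v2 = independent_contrasts(right, out)
    # Difference divided by its standard deviation under Brownian motion.
    out.append((x1 - x2) / math.sqrt(v1 + v2))
    # Inverse-branch-length weighted average of the children, plus the extra length on this branch.
    x = (x1 / v1 + x2 / v2) / (1.0 / v1 + 1.0 / v2)
    return x, v + v1 * v2 / (v1 + v2)


# Hypothetical tree ((A:1, B:1):1, C:2) with character values A = 1, B = 3, C = 4.
tree = ("node",
        ("node", ("leaf", 1.0, 1.0), ("leaf", 3.0, 1.0), 1.0),
        ("leaf", 4.0, 2.0),
        0.0)
contrasts = []
independent_contrasts(tree, contrasts)
print(contrasts)
```

A tree with n tips yields n - 1 contrasts; their signs depend on which child is taken as "left", so plots usually fix a convention such as positive values on one axis.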

Trees used in the book

Here are some of the trees (aside from the small ones with just a few species) mentioned in the book. They are given in the Newick format. Clicking on the links in the right column will display them in your browser; from there you can save them by selecting the Save or Save As option in your browser's File menu. Alternatively, you can mark the text and copy it out of the browser window.

(More will be put here soon).
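A Newick file is plain text. Each tree is a parenthesized, comma-separated list of subtrees ending in a semicolon, and tip labels and `:`-prefixed branch lengths are optional. The sketch below reads such a file into nested `(children, name, branch_length)` tuples and lists the tip names. It assumes no quoted labels, no bracketed comments, and no semicolons other than those ending trees. The function names are hypothetical; established libraries (for example Biopython's `Bio.Phylo`) handle the full format.

```python
def parse_newick(text):
    """Parse a single Newick tree into nested (children, name, branch_length) tuples."""
    s = text.strip()
    pos = 0

    def peek():
        return s[pos] if pos < len(s) else ""

    def skip_ws():
        nonlocal pos
        while peek().isspace():  # "".isspace() is False, so this stops at the end of s
            pos += 1

    def read_until(stops):
        nonlocal pos
        start = pos
        while peek() and peek() not in stops:
            pos += 1
        return s[start:pos].strip()

    def parse_node():
        nonlocal pos
        skip_ws()
        children = []
        if peek() == "(":
            pos += 1
            children.append(parse_node())
            skip_ws()
            while peek() == ",":
                pos += 1
                children.append(parse_node())
                skip_ws()
            if peek() != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1
        name = read_until(",():;")      # optional label (empty for most internal nodes)
        length = None
        if peek() == ":":                # optional branch length
            pos += 1
            length = float(read_until(",();"))
        return (children, name, length)

    tree = parse_node()
    skip_ws()
    if peek() != ";":
        raise ValueError("a Newick tree must end with ';'")
    return tree


def read_trees(text):
    """Parse every ';'-terminated tree in a file's contents."""
    return [parse_newick(chunk + ";") for chunk in text.split(";") if chunk.strip()]


def leaf_names(node):
    """Tip labels of a parsed tree, left to right."""
    children, name, _ = node
    if not children:
        return [name]
    return [n for child in children for n in leaf_names(child)]


text = "((A:0.1,B:0.2):0.05,C:0.3);\n(A,(B,C));\n"
for tree in read_trees(text):
    print(leaf_names(tree))
# → ['A', 'B', 'C'] (printed once for each of the two trees)
```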

Page	Description	link to tree
166	UPGMA tree for Sarich carnivore data	sarichupgma.tre
168	Neighbor-Joining tree for Sarich carnivore data	sarichnj.tre
182	Trees found in split decomposition example	splitdecomp.tre
260	Tree for ancestor reconstruction example	ancestor.tre
282	6-species tree used to simulate Hadamard example	hadtree6.tre
341	Five trees for majority-rule consensus example	majorityrule.tre
366	Two trees for paired-sites test example	pairedsites.tre
396	Tree used to generate Brownian motion example	(to be provided soon)
423	Example of punctuated equilibrium tree	punctuated.tre
424	Example of a 10-species tree from that tree	punctuated10.tre
430	Five-species tree used to simulate threshold character	(to be provided soon)
434	Two-clade tree used to show comparative method problem	compare.tre
458	Nine random coalescent trees	coalescent.tre
522	Trees for consensus-tree example	consensus.tre
532	Trees for tree-distance example	treedistances.tre
566	Birth-death process trees	bd1.tre bd2.tre
573	Tree used as example for tree-drawing	drawing.tre

Those interested in a list of typos and their corrections for the book should look at the web page here.

A list of reviews of the book that have appeared, and some reactions to them, will be found here.

Joe Felsenstein
