Indel Length Distributions and Evolutionary Divergence
Some of you may remember that several years ago Britten (2002) argued that human-chimp divergence was 5%, not ~1.2%. (See this press release for a refresher.) Of course, creationists jumped on this research and began harping that the more scientists looked, the more distant humans and chimps were. This is important to them because the number one rule of creationism is “no matter what, humans are not related to any other living creatures,” which is so difficult to maintain in our age of science and education. Amusingly, humans and chimps are so similar to one another that creationists cannot create a consistent definition of “created kinds” that makes humans special and lumps all the boring animals together.
Britten (2002) derived his 5% divergence metric by considering the lengths of insertions and deletions (indels) along with point substitutions between human and chimp genomes. This is unlike other estimates that just consider the number of point substitutions that have occurred between the two species and find ~1.2% divergence. At the time I commented that these two numbers—1.2% and 5%—could not be compared because they are different metrics. Additionally, Britten’s metric is probably unfairly upweighting the contribution of indels because a single event can add or remove multiple residues at a time.
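To make the difference between the two metrics concrete, here is a toy sketch in Python. It is not Britten's actual procedure; the function name `divergence_metrics`, the gap character, and the example sequences are all made up for illustration. It counts point substitutions over aligned (ungapped) columns, and separately counts every gapped column as a difference, which is roughly what a length-weighted indel metric does.

```python
def divergence_metrics(seq_a, seq_b, gap="-"):
    """Compare two toy divergence metrics on a pairwise alignment.

    Hypothetical helper for illustration only: it assumes both inputs are
    rows of the same alignment, with `gap` marking inserted/deleted sites.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")

    substitutions = 0   # mismatches at columns where both sequences have a base
    aligned_sites = 0   # columns where both sequences have a base
    gap_columns = 0     # columns where exactly one sequence has a gap
    indel_events = 0    # contiguous runs of gaps on the same sequence
    prev_side = None

    for a, b in zip(seq_a, seq_b):
        if a == gap and b == gap:
            continue  # uninformative column
        if a == gap or b == gap:
            side = "a" if a == gap else "b"
            gap_columns += 1
            if side != prev_side:
                indel_events += 1
            prev_side = side
        else:
            prev_side = None
            aligned_sites += 1
            if a != b:
                substitutions += 1

    if aligned_sites == 0:
        raise ValueError("no aligned sites to compare")

    return {
        "substitutions": substitutions,
        "indel_events": indel_events,
        "gap_columns": gap_columns,
        # Traditional metric: substitutions per aligned site.
        "substitution_only": substitutions / aligned_sites,
        # Length-weighted metric: every gapped column counts as a difference.
        "with_indels": (substitutions + gap_columns) / (aligned_sites + gap_columns),
    }


if __name__ == "__main__":
    a = "ACGTACGTACGTACGTACGT"
    b = "ACGAACGT------GTACGT"
    m = divergence_metrics(a, b)
    print(m["substitution_only"])  # 1 substitution / 14 aligned sites, about 7.1%
    print(m["with_indels"])        # (1 + 6) / 20 columns = 35%
```

In this made-up alignment there are two mutational events, one substitution and one deletion, yet the single six-base indel contributes six times as much as the substitution to the length-weighted number. That is the upweighting problem in miniature.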
A recent study of mine, which was not directed at Britten’s work, has found that it is actually worse than that. Simply put, the total length of indels separating humans and chimps is unrelated to the evolutionary divergence between them. This arises because the variance of indel length is “nearly-infinite”, which causes nonconservation of average indel length. Therefore, two pairs of species, equally divergent evolutionarily, can and probably will have very different proportions of nucleotides belonging to indels. One pair might be 5% divergent and the other 1.5% divergent, including indels, without any underlying change in the evolutionary process or time since speciation.
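A small simulation gives a feel for why a heavy-tailed length distribution breaks the link between total indel length and divergence. This is only a sketch under assumptions of my choosing, not the model from the study: indel lengths are drawn from a Pareto distribution with shape `alpha = 1.5` (finite mean, infinite variance) and rounded down to whole bases, and every simulated "species pair" gets exactly the same number of indel events. The function names and default parameters are hypothetical.

```python
import random


def simulate_total_indel_length(n_events, alpha, rng):
    """Sum the lengths of n_events indels with heavy-tailed lengths.

    Assumption: lengths follow a Pareto distribution with shape `alpha`,
    truncated to integers. paretovariate() returns values >= 1, so every
    indel is at least one base long.
    """
    return sum(int(rng.paretovariate(alpha)) for _ in range(n_events))


def simulate_pairs(n_pairs=10, n_events=1000, alpha=1.5, seed=1):
    """Simulate several species pairs that share the same number of events."""
    rng = random.Random(seed)
    return [simulate_total_indel_length(n_events, alpha, rng)
            for _ in range(n_pairs)]


if __name__ == "__main__":
    totals = simulate_pairs()
    for i, total in enumerate(totals, start=1):
        print(f"pair {i}: {total} indel bases from 1000 events")
    print("largest / smallest total:", max(totals) / min(totals))
```

Every pair here has the same underlying process and the same number of events, so any spread in the totals comes purely from a few very long indels landing in one pair and not another. Lowering `alpha` makes the tail heavier and the spread worse. Counting events instead of bases, as substitution-based distances effectively do, does not have this problem.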
The upside is that traditional substitution-based evolutionary distances are unaffected and can still be used to properly estimate the evolutionary divergence between species.