2014-02-06



Humans interbred with an unknown hominin in Europe, then crossed the Bering Sea—say what?

 

by John Timmer

 
When we first looked at the report of the bigfoot genome, it was an odd mixture of things: standard methods and reasonable-looking data thrown in with unusual approaches and data that should have raised warning flags for any biologist. We just couldn't figure out the logic of why certain things were done or the reasoning behind some of the conclusions the authors reached. So, we spent some time working with the reported genome sequences themselves and talked with the woman who helped put the analysis together, Dr. Melba Ketchum. While it didn't answer all of our questions, it gave us a clearer picture of how the work came to be.
The biggest clarification was what the team behind the results considered their scientific reasoning, which explains how they ran past warning signs that they were badly off track. It also indicated what motivated them to push the results into a publication that they knew would cause them grief.

Melba Ketchum and the bigfoot genome
The public face of the bigfoot genome has been Melba Ketchum, a Texas-based forensic scientist. It was Ketchum who first announced that a genome was in the works, and she was the lead author of the paper that eventually described it. That paper became the one and only publication of the online journal De Novo; it's still the only one to appear there.
The paper itself is an odd mix of things. There's a variety of fairly standard molecular techniques mixed in with a bit of folklore and a link to a YouTube video that reportedly shows a sleeping Sasquatch. In some ways, the conclusions of the paper are even odder than the video. They suggest that bigfeet aren't actually an unidentified species of ape as you might have assumed. Instead, the paper claims that bigfeet are hybrids, the product of humans interbreeding with a still unknown species of hominin.



 
As evidence, it presents two genomes that purportedly came from bigfoot samples. The mitochondrial genome, a small loop of DNA that's inherited exclusively from mothers, is human. The nuclear genome, which they've only sequenced a small portion of, is a mix of human and other sequences. Some are closely related, others quite distant.
But my initial analysis suggested that the "genome sequence" was an artifact, the product of a combination of contamination, degradation, and poor assembly methods. And every other biologist I showed it to reached the same conclusion. Ketchum couldn't disagree more. "We've done everything in our power to make sure the paper was absolutely above-board and well done," she told Ars. "I don't know what else we could have done short of spending another few years working on the genome. But all we wanted to do was prove they existed, and I think we did that."
How do you get one group of people who looks at the evidence and sees contamination, while another decides "The data conclusively prove that the Sasquatch exists"? To find out, we went through the paper's data carefully, then talked to Ketchum to understand the reasoning behind the work.

Why they think it was genuine
Fundamentally, the scientific problems with the work seem to go back to the fact that some of the key steps (sample processing and preparation) were done by forensic scientists. As the name implies, forensic science is a science: it shares a heavy focus on evidence and reproducibility with less applied fields. But unlike genetics, for example, forensic science is very goal-oriented. That seems to be what caused the problems here.
Over the decades that DNA has been used as forensic evidence, people in the field have come up with a variety of procedures that have been validated repeatedly. By following those procedures, they know the evidence they generate is likely to hold up in court. And, to an extent, it seems like the people behind the bigfoot genome wanted it to hold up in court.

Many of the samples they had were clumps of hair of various sizes. Hair is a common item in forensic analysis, where people have to identify whether the hair is human, whether it is a possible match for a suspect's, etc. In this case, the team was able to determine that the hair was not human. So far, so good.
In cases where the hair comes attached to its follicle, it's possible to extract DNA from its cells. And that is exactly what the bigfoot team did, using a standard forensic procedure that was meant to remove any other DNA that the hair had picked up in the interim. If everything worked as expected, the only DNA present should be from whatever organism the fur originated from.
And, in Ketchum's view, that's exactly what happened. They worked according to procedure, isolating DNA from the hair follicles and taking precautions to rule out contamination by DNA from anyone that was involved in the work. Because of this, Ketchum is confident that any DNA that came from the samples once belonged to whatever creature deposited the fur in the woods—no matter how confusing the results it produced were. "The mito [mitochondrial DNA results] should have done it," she argued. "It's non-human hair—it's clearly non-human hair—it was washed and prepared forensically, and it gave a human mitochondrial DNA result. That just doesn't happen."
Ketchum was completely adamant that contamination wasn't a possibility. "We had two different forensics labs extract these samples, and they all turned out non-contaminated, because forensics scientists are experts in contamination. We see it regularly, we know how to deal with mixtures, whether it's a mixture or a contaminated sample, and we certainly know how to find it. And these samples were clean."
But note the key phrase two paragraphs up: "if everything worked as expected." Anyone who's done much biology (or presumably, much science in general) knows that everything typically does not work as expected. In fact, things go badly wrong for all sorts of reasons. Sometimes it's obvious they went wrong, sometimes results look pretty reasonable but fall apart on careful examination.
In this case, there was no need for careful examination; the results the team got from the DNA were a mix of warning signs that things weren't right (internally inconsistent information) and things that simply didn't make any sense. But Ketchum believed so strongly in the rigor of the forensic procedures that she went with the results regardless of the problems. In fact, it seemed as if almost everything unusual about the samples was interpreted as a sign that there was something special about them.



Warning signs
Potential problems with the samples were apparent in what were likely the first experiments done with the DNA isolated from them. These were amplifications of specific human DNA sequences using a technique called the polymerase chain reaction, or PCR. By using short DNA sequences that match parts of the human genome, it's possible to start with a single DNA molecule and create many copies of it, which makes it simple to detect its presence. In this case, the PCR reactions targeted sequences that are known to vary in length in the human population—a feature that makes them useful for forensic identification.
If the DNA was human and had not degraded much during its time in the environment, then most of these reactions should produce a clear, human-like signal. The same would be true if, as Ketchum concluded, the samples contained DNA from a close relative of humans (remember, chimps' DNA is over 95 percent identical to ours). If the animal were more distantly related, you might expect some reactions to work and some to fail, with the percentage of failures going up as the degree of relatedness fell. In some cases, you might expect the reactions to produce a PCR product that was the wrong size due to changes in DNA content that occur during evolution.
But you can't necessarily expect the DNA to sit outdoors and remain intact. DNA tends to break into fragments, with the size of the fragments shrinking over time. Depending on how degraded the sample is, you might see more or fewer reactions failing.
What they saw was a chaotic mix of things. As Ketchum herself put it, "We would get these crazy different variants of sequence." Some reactions produced the expected human-sized PCR products. Others produced products with unexpected sizes. Still others produced the sorts of things you'd expect to see if the PCR had failed entirely or there was no DNA present. "We would get these things that were novel in genbank. We would get a lot of failure, and we'd get some that would have regular human sequence," Ketchum said. "We could not account for this, and it was repeatable."
All of which suggested that there was likely to be DNA present that was only distantly related to humans; anything that was from a human or close relative was probably seriously degraded.
In fact, the team did an experiment that suggested this was exactly what they were dealing with: they imaged the DNA using electron microscopy. This revealed what their initial experiments had suggested: shorter fragments of DNA, some of it a single (rather than double) helix. Some strands paired nicely for a stretch, diverged into single-stranded sections, and then paired again with a completely separate molecule. This sort of pattern is what you might see if DNA from some distantly related mammals were present, where the protein-coding sequences would match fairly well, but the intervening sequences would probably be very different.
So all the initial data suggested that the DNA was badly preserved and probably contaminated. That in turn suggests that whatever techniques they used to get DNA from a single, uncontaminated source just weren't sufficient for the samples they were working with. But instead of reaching that conclusion, the bigfoot team had an alternative: their technique worked perfectly fine. It was the sample that was unusual.
The problem is that it simply couldn't be that unusual. The idea is that there was some other primate that was still capable of interbreeding with humans. In the cases where we know this happened (semi-modern humans like Neanderthals and Denisovans), the DNA sequences are so similar that it's quite hard to tell them apart. Here, the team was seeing indications that human DNA was mixed with something that was really quite distant—probably not even one of the great apes.
These were far from the last results that should have told them they were on the wrong track.

Looking suspiciously human
Nevertheless, the authors plowed on. And one of the first things they found was that at least some of the DNA was human. This, as it turned out, was the foundation for their conclusion that the DNA was from a human-primate hybrid.
It's often overlooked that human cells actually have two genomes. One lives in the chromosomes stored in the nucleus, and that's the one we're typically concerned with. But a second resides in our mitochondria, small compartments in the cell that provide most of the cell's ATP. These are the remains of what were once free-living bacteria but took up a symbiotic residence inside the cell billions of years ago; however, they still have a small genome of their own (circular, like bacteria's) with a handful of essential genes on it.
There are a few things that make mitochondrial DNA effective for tracking populations of humans and other species. Because this genome doesn't have a full DNA repair machinery at hand, and because it can't undergo recombination, it tends to pick up mutations far more rapidly than the nuclear genome. That means that even closely related populations are likely to have some differences in their mitochondrial DNA. There are also hundreds of mitochondria in each cell, and each of these may have dozens of copies of the genome. So it's relatively easy to get samples, even from badly degraded and/or contaminated DNA like that found in ancient bones.
So team bigfoot sequenced the mitochondrial genome of several of their samples. And rather than a novel primate sequence that was distantly related to humans, the sequences were human. Which is what you might expect if the species is a hybrid as the authors concluded. What you wouldn't expect is that the sequences would come from multiple humans—from the wrong side of the planet.
All indications are that successful interbreeding between humans and closely related groups like Neanderthals and Denisovans was relatively rare. You'd expect that something that looks like a walking shag carpet would be more distantly related, and that it would be much, much harder to successfully interbreed. This makes the hybrids even rarer. Instead, each sample tested produced a different mitochondrial DNA sequence, which implies the interbreeding had to have taken place many, many times. (And that the hybrids never bred with females of whatever the primate in question was. And that said primate is, apparently, extinct, since none of its mitochondrial DNA showed up.)
Who were these human females that ostensibly did the interbreeding? If you wanted to make a scientifically plausible guess, you'd bet on the mitochondrial DNA lineages that originate in Asia (most likely those branches that expanded into the Americas). Those are the only humans that are likely to have been around until a few hundred years ago. And that's exactly what they didn't find. Instead, most of the sequences originated in the human populations of Europe, with an African sample or two.
And at least one of them was recent—Ketchum described one of the mitochondrial sequences in detail, saying, "about 13000 years ago is when that haplotype came into existence. It was in Spain, basically, where it originated. So the hybridization could not have occurred before that haplotype came into existence." In her view, that put an upper limit on when these sequences made it to North America. "It couldn't have been longer than 13,000 years ago," she told Ars.
On the face of it, there's simply no way to make sense of this—the European and African DNA, the recent time frame for its arrival, the fact that there must have been so many interbreedings.... The obvious interpretation is that the samples were all from humans or contaminated with human DNA, which nicely explains the diversity and modernity of the sequences.
But remember, to Ketchum, that possibility had been ruled out. In the absence of the obvious, her team went with a far less obvious suggestion: sometime during the last glacial period, a diverse group of Europeans and Africans got together and wandered across the vast empty spaces of the Greenland ice sheet and found themselves in North America. "Several of the Smithsonian scientists even wrote a book about it, where they've gone below the Clovis layer and found artifacts that they feel came from [an] area in France," she said. But she wasn't committed to that idea and later suggested that the interbreeding might have taken place in Europe... after which the Sasquatch left to cross the Bering Sea land bridge before the Ice Age ended. "It's feasible they could have crossed the world, basically," she said. "They're very fast."
Ultimately, though, Ketchum indicated these are just technical details. She wasn't especially interested in sorting them out. "We don't know how they got here, we just know they did."

A problem of technique
Most of the problems so far weren't really experimental ones; rather, they were problems with interpretation. It's only when the team went after sequences from the genome that things got a bit strange. A few of their samples appeared to have sufficient DNA to send them for sequencing on one of the current high-throughput sequencing platforms. The quality score assigned to the sequencing runs was good, meaning that they had lots of DNA sequence data to assemble into a genome (although, oddly, the team interpreted this to mean that the sample came from a single individual, which it does not).

The challenge is that the high-throughput machines typically produce short sequences that are about 100 bases long. Even the smallest human chromosome is over 40 million bases long. There are programs that are able to recognize when two of these 100-base-long fragments partly overlap and combine their sequences to create a longer sequence (say 150 bases). By searching for further partial overlaps, the programs can gradually build up longer and longer stretches, sometimes ranging into the millions of base pairs. Although this software will leave gaps where sequences don't exist or show up at multiple places in the genome, it's still the standard way of assembling genomes from short, 100-base-long reads.
For some unfathomable reason, team bigfoot didn't use it. Instead, they took a single human chromosome and got some software to line up as much as it could to that.
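To make the overlap-based approach concrete, here is a toy sketch in Python. It is a simple greedy merge under assumed conditions (error-free reads, exact overlaps, a minimum overlap of three bases), not the algorithm of any particular assembly package, and the function names are made up for illustration.

```python
def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    longest = min(len(left), len(right))
    for size in range(longest, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads: list[str], min_overlap: int = 3) -> list[str]:
    """Repeatedly merge the two sequences that share the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_size, best_pair = 0, None
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i == j:
                    continue
                size = overlap_length(left, right, min_overlap)
                if size > best_size:
                    best_size, best_pair = size, (i, j)
        if best_pair is None:
            return contigs  # nothing left that overlaps: these are the contigs
        i, j = best_pair
        merged = contigs[i] + contigs[j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)]
        contigs.append(merged)


# Three short made-up reads that tile one 15-base sequence.
reads = ["ATGGCGTA", "GCGTACGT", "TACGTTAGC"]
print(greedy_assemble(reads))
```

The point of the sketch is that nothing in it knows what the answer is supposed to look like: reads that share no overlap with anything else simply stay separate, so a second genome in the sample would show up as its own set of contigs rather than being discarded.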
There are a number of serious problems with this approach. You could have an entirely different genome present in the sequences, and the software would ignore most of it. Most of the gene coding regions are highly conserved among mammals, so they'd line up nicely against the human chromosome—in fact, they might be difficult to distinguish from it. But the entire rest of the genome would be ignored by the software. By taking this approach, the authors pretty much guaranteed they'd get something out that looked a lot like a human genome.
The other problem here is that the software will typically treat the human chromosomal sequence as a target that it attempts to recreate. If it can't find a good match, it will stick the best match available where it's needed. Sometimes, the match will be fairly good. Other times, the sequence will be barely related to the template it's supposed to match.
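The bias described above can be illustrated with a toy example. This is a deliberately naive sketch, assuming ungapped matching scored by simple percent identity; it is not the software the team used, and the reference, reads, and function names are invented for illustration. Real aligners are far more sophisticated, but the failure mode is the same in kind.

```python
def best_placement(read: str, reference: str) -> tuple[int, float]:
    """Return (position, identity) of the best ungapped match of read on reference."""
    best_pos, best_identity = 0, -1.0
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        matches = sum(a == b for a, b in zip(read, window))
        identity = matches / len(read)
        if identity > best_identity:
            best_pos, best_identity = pos, identity
    return best_pos, best_identity


def naive_consensus(reads, reference, min_identity=0.0):
    """Place each read at its best match, however poor, and rebuild a sequence."""
    consensus = ["N"] * len(reference)  # N marks positions no read covered
    placements = []
    for read in reads:
        pos, identity = best_placement(read, reference)
        if identity >= min_identity:
            consensus[pos:pos + len(read)] = read
            placements.append((read, pos, identity))
    return "".join(consensus), placements


reference = "ACGTACGGTCAATGCCTAGG"        # stand-in for the template chromosome
reads = ["AATGCCTAGG", "TTTTTTTT"]        # one genuine match, one unrelated read

print(naive_consensus(reads, reference))                    # junk read gets placed
print(naive_consensus(reads, reference, min_identity=0.9))  # junk read is dropped
```

With no identity threshold, the unrelated read is slotted into the template at its least-bad position even though only a quarter of its bases match. With a threshold, it is silently dropped. Either way, the output is shaped by the template rather than by what was actually in the sample.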
Even given all these advantages, the software still couldn't assemble an entire chromosome. Instead, it ended up matching sequences to three different stretches of the chromosome, each a few hundred thousand base pairs long. Remember, the human genome is over three billion base pairs total. This only represents a tiny fraction of it. Given that the quality score provided for the DNA sequencing run was high, this tells us one of two things: either the software was woefully incapable of assembling a genome, even when given a template; or there was very little human DNA there in the first place. As we'll see, it might be a little bit of both.

A hypothetical hybrid
At this point, it's worth stepping back to try to figure out what it would look like if the authors' ideas were correct, and some humans interbred with an unidentified hominin species to produce what are now bigfeet. There are two groups that humans are known to have interbred with: Neanderthals and Denisovans. But, obviously, anything that would have given us a bigfoot must have been quite different from the Neanderthals and Denisovans, which largely looked human. So, we can probably assume that it had diverged from our lineage for longer, but not as long as chimps.
What would the genome of such a hominin look like? Well, for Neanderthals and Denisovans, the genomes mostly look human. If there's a difference between humans and chimps, in most cases, these other groups have the human sequence. Hominin X's genome would be more distantly related. But the chimp genome puts a very strict limit on how different it could be. In terms of large-scale structure, the chimp and human genomes are almost identical; there are only six locations with a major structural difference between the two, with a total of 11 breakpoints. Unless you happen to be looking at one of those, you'd typically see the same genes in the same order. None of the breakpoints happens to be on chromosome 11, which is what the authors were looking at, so this is a non-issue.
Smaller scale insertions and deletions are more common but not that common. Even when you consider them, the human-chimp sequence identity is over 95 percent. If you only focus on the areas of the genome where things line up without major rearrangements, then the identity is 99 percent. So any hominin that we can interbreed with would have a genome that is almost certainly in the area of 97-98 percent identical to our own. Sequences that lined up would be even higher than that.

The first generation of hybrids would have a 50/50 split between these two nearly identical genomes, after which they'd start randomly assorting. Some areas would undoubtedly be favored or disfavored by various forms of natural selection. But about 90 percent of the human genome doesn't seem to be under any selective pressure at all, and most of the remainder of the genome wouldn't be under selective pressure simply because it's identical in the two species. As a result, all but one or two percent of the genome would probably be inherited randomly from one or both of the two species.
Of course, after the first generation, the two genomes would start undergoing recombination, scrambling them at a finer scale. The probability of recombination roughly scales with the length of DNA you have. The basic measure of recombination, the centimorgan, represents a one percent probability that a recombination will occur in a given generation. In humans, a centimorgan is about a million base pairs. So, if you had 50 million base pairs of DNA, you'd have roughly even odds that a recombination would take place in any given generation. In humans, the generation time averages out to be about 29 years; in chimps, it's 25. We'll assume bigfeet are in the neighborhood of 27 years per generation.
If bigfeet got started no more than 13,000 years ago (based on the Spanish mitochondrial DNA, as mentioned above), that means there have been at most about 481 generations since. In half of these, there would be a recombination within our 50 million base pairs, meaning about 241 recombinations. That means, on average, we'd see a recombination every 200,000 base pairs or so.
With that, we know what our genome should look like: stretches of DNA, over 100,000 bases long, that are human, alternating with equally long stretches of something that looks almost human but not quite. In fact, the identity between the two sequences should be strong enough that it would be difficult to say where one ended and the next started with any greater resolution than about 1,000 base pairs. And because there were apparently a number of distinct interbreeding events (again, based on the mitochondrial DNA), no two bigfeet are likely to have the same combinations of human and nonhuman stretches.
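For anyone who wants to check the arithmetic, the back-of-the-envelope estimate above can be written out in a few lines of Python. The inputs are the article's own round numbers (13,000 years, 27 years per generation, one centimorgan per million bases, a 50-million-base region); the function name is just a label for this sketch, and the linear "one percent per centimorgan" scaling is the same simplification used in the text.

```python
def expected_recombination_spacing(years, generation_years, region_bases,
                                   bases_per_centimorgan=1_000_000):
    """Rough spacing between recombination points after a given time span."""
    generations = years / generation_years
    # One centimorgan is treated as a 1% chance of a crossover per generation,
    # scaled linearly with region length (a simplification).
    crossover_probability = (region_bases / bases_per_centimorgan) * 0.01
    expected_crossovers = generations * crossover_probability
    spacing = region_bases / expected_crossovers
    return generations, expected_crossovers, spacing


generations, crossovers, spacing = expected_recombination_spacing(
    years=13_000, generation_years=27, region_bases=50_000_000
)
print(f"{generations:.0f} generations, {crossovers:.0f} crossovers, "
      f"one every {spacing:,.0f} bases")
```

Because 13,000 years is an upper limit, a more recent origin would mean fewer generations and even longer unbroken stretches, which only widens the gap with fragments a few hundred bases long.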

You call that a genome?
This is, of course, nothing at all like what the genome that's been published looks like. The paper itself indicates that regions of clearly human DNA are typically only a few hundred base pairs long. And interspersed with those are equally short pieces of DNA that appear to look little to nothing like the stretch of the human genome that they're supposed to be aligned to. If the genome is viewed as a test of the hybrid hypothesis, then the hypothesis fails. When asked about this, Ketchum just returned to the mitochondrial data. "I know there are ways, like you said, to figure out the nuclear age of things, but the bottom line is it couldn't have been longer than 13,000 years ago."
What actually is this? To find out, I started with the ENSEMBL genome website, which provides a convenient view of a variety of animal genomes. I then selected a large region (about 10,000 bases) from the purported bigfoot genome and used software called BLAST to align it against the human genome. The best match was invariably chromosome 11, which made sense, because that's what the authors used to build their sequence. And as described in the paper, the sequence was a mix of perfect matches to the human sequence along with intervening sequences that the software indicated didn't match.
I then selected each of the intervening sequences that were over 100 base-pairs-long and used the BLAST software hosted by the National Institutes of Health at NCBI. This would test the sequence against any genome that we've tried to sequence, even if the project wasn't complete.
If the hybrid model were correct, and these sequences were derived from another hominin, then they should look largely human. But for the first 10,000 bases, most of them failed to match anything in the databases, even though the search's settings would allow some mismatch. Other sequences came from different locations in the human genome; another matched the giant panda genome (and presumably represents contamination by a bear). Similar things happened in the next 10,000, with a mix of human sequences, one that matched mice and rats, and then a handful of sequences with no match to anything whatsoever. And so it went for another 24,000 bases before I gave up.
Ketchum's team had done the same and found similar results. "We had one weird sequence that we blasted in the genome BLAST, and we got closest to polar bear of all things," she told Ars. "And then we'd turn around and blast [unclear] and get 70 percent rhesus monkey with a bunch of SNPs [single base changes] out. Just weird, weird stuff."
Clearly, the DNA that was sequenced came from a mix of sources, some human, some from other animals you might find in the North American woodlands. (Recently, a researcher who was given a sample of the DNA by Ketchum announced that it was a mix of "opossum and other species," consistent with this analysis.) There was human DNA present as well, but it was either degraded or present in relatively low amounts.
When asked to align this sequence to a human chromosome, the software did the best that it could by picking out the human sequences when and where they were available. When they weren't, it filled the gaps with whatever it could—sometimes human, sometimes not.

A question of motivation
In science, it's usually best to start with the evidence. But when the vast majority of the evidence points to one conclusion, and someone insists on reaching a different one, then it can be worth stepping back and trying to understand what might motivate them to do so. In Ketchum's case, the motivations weren't hard to discern; she offered them up without being prompted, even when the discussion was focused on the science.
This was clearest when Ketchum suggested that North America's bigfeet could have European mitochondrial DNA because interbreeding took place there, after which the hybrids crossed Siberia and into Alaska. As noted above, this seemed possible to her because "They're very fast." What wasn't noted above is that she followed that up with, "I've seen them, that's why I can say that." This was followed by a pretty detailed description of how this came about.

There's groups of people called habituators. They have them living around their property. And they interact with them, but they're highly secretive because one, people think they're crazy when they say they interact with bigfoot—and I prefer Sasquatch by the way, but bigfoot's easier to say. Finally a group of them came by and said "you want to see 'em? we'll take you and show you." And they did. The clan I was around was used to people and they were just very, very easy to be around—they're real curious about us, and they'd come and look at us, and we'd look at them.
With that experience and others that followed (several of which she described), Ketchum says she switched from skepticism to a desire to protect what she had seen. Several groups, including Spike TV, have offered rewards for anyone who could shoot a bigfoot, something Ketchum genuinely seems to be horrified by. "They are a type of human and we want them protected," Ketchum told Ars. "That's been the whole point of this once we realized what we had. And I've known what we had for several years now. Within the first year, we knew that we had them, it was just a matter of accumulating enough proof to satisfy science."
In terms of knowing what she had, Ketchum returned to the forensic evidence, which showed human mitochondrial DNA in a hair sample that had been identified as non-human. "One thing I'm sure of is we've proven they exist. We should have been able to do it with just human mito with non-human hair, thoroughly washed and done by two labs." At a different point, she said, "All we wanted to do with the paper was to prove there was something novel out there that was basically Homo, and the mitochondrial DNA placed it clearly in Homo."
With that clearly established, all the apparently contradictory results simply become points of confusion. When asked about the discrepancy between the young mitochondrial age and the nuclear genome, Ketchum just said it was a mystery. Referring to the apparent age difference, she said, "It would look that way but it's not, that's the problem. I don't know how to rectify that other than they are what they are, and the data is what it is." Later, she suggested that the creatures might simply experience an extremely high rate of mutation.
Ultimately, she saw the collection of contradictions as a sign of her own sincerity. "I'm not sure why they're like they are. I don't think anybody is, and I think that gives people a real problem. But we can't change how the results came out. And I'm not going to lie about them, and I'm not going to try to make them fit a scientific model when it doesn't."
After an hour-long phone conversation, there was no question that Ketchum is sincere in her belief that bigfoot exists, that her data conclusively proves it, and that it's worthy of protection. But, at the same time, it's almost certainly this same sincerity that drove her to look past the clear problems with her proof.
