If you’ve uploaded your DNA to GEDmatch, you may be scratching your head and wondering if your GEDmatch results are accurate. You are bound to see some differences from what your DNA testing company tells you. Both the ethnicity estimates and DNA relative matching will diverge. And not all kits on GEDmatch are real.
So, is GEDmatch accurate and legit?
Comparing Relationship Estimates To Other DNA Sites
The one-to-many comparison tool in GEDmatch shows you your DNA relatives in descending order of calculated centimorgans. One of the last columns shows which DNA testing company each kit came from.
I’ve tested with Ancestry and 23andMe and transferred my Ancestry results to GEDmatch and other sites. It’s not always easy to find your common matches across sites. But thankfully, I have quite a few matches on GEDmatch that use the same or similar display names when they upload their DNA.
So, I picked the top three DNA matches I recognized on 23andMe and compared the total centimorgans reported by both sites.
That’s actually not bad, is it? If you expected the numbers to be bang on, I’ll come to that. But I want to give FamilyTreeDNA a whirl. Three people from my match list appear on both sites:
Uh-oh. Wayne shows almost twice the centimorgans on FamilyTreeDNA.
I’ll point out here that I’m comparing DNA tests on GEDmatch from two different sites. And FamilyTreeDNA is known for estimating higher centimorgans than other DNA sites.
What if I take a look at matches I recognize from Ancestry? In these cases, both kits are coming from the same DNA testing company. How does GEDmatch compare with Ancestry?
The differences aren’t wild, but they’re not insignificant. What’s going on?
Why Does GEDmatch Have Different Relationship Estimates?
All the DNA testing companies have their own proprietary matching algorithms. They have different thresholds for the number of small segments of shared DNA that they include when summing up the total CM measure.
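To make the threshold point concrete, here is a minimal sketch in Python. The segment lengths and the three thresholds are made up for illustration, and the function name is my own. This is not any company's actual matching algorithm, just a demonstration of how one set of shared segments can produce different totals.

```python
# Hypothetical shared segments (in cM) between two kits. Not real data.
segments_cm = [42.5, 18.0, 9.3, 7.4, 6.1, 3.2]


def total_shared_cm(segments, min_segment_cm):
    """Sum only the segments that meet the minimum length threshold."""
    return sum(cm for cm in segments if cm >= min_segment_cm)


# Illustrative thresholds only; real sites choose their own cut-offs.
for threshold in (3.0, 7.0, 8.0):
    total = total_shared_cm(segments_cm, threshold)
    print(f"minimum segment {threshold} cM -> total {total:.1f} cM")
```

The same pair of kits gets a smaller total each time the minimum segment size goes up, because more of the small segments are left out of the sum.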
And there are other differences. Ancestry’s Timber algorithm “throws away” some matching DNA that other companies do not. Each company has tweaked its algorithms in recent years, seeking higher accuracy.
So, the displayed centimorgans (or 23andMe’s percentages) are estimates produced by different calculations. This is why you’ll see different numbers in your match list display.
The key word here is estimates. All the companies’ numbers are estimates.
GEDmatch’s algorithms are respected in the genealogy community. If people found major issues with what GEDmatch was reporting for known relatives, then its reputation would rightly be panned. That is not the case.
GEDmatch Warns You About Potential Inaccuracy
The different DNA testing companies have different methods of analyzing DNA samples. Think of your chromosomes as a length with thousands of tiny pieces of DNA.
For reasons of cost, the consumer DNA companies do not test every single piece. They skip whole sections of DNA. And here’s the kicker: different companies may skip or test different areas on the map. That makes comparing kits from different sites a real challenge.
Take a look at the last column of these two rows from my match list. (I’ve omitted some of the columns.) Its heading is “Overlap”.
Both kits are from 23andMe, whereas my own kit upload was from Ancestry. GEDmatch has highlighted one figure in pink. It’s a much lower number than the one below it.
This number is a count of the number of those tiny pieces of DNA that were compared between your kit and the match. Those tiny pieces are called SNPs (and pronounced as “snips”).
GEDmatch is warning me that the low pink number represents a DNA kit where the testing was markedly different from my Ancestry kit. I explain this in a companion article on Overlap. Suffice it to say here that GEDmatch is warning you that they are less confident about the accuracy of this comparison.
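If you like to see ideas as code, the overlap count can be pictured as the number of SNP identifiers that two kits have in common. This is a rough sketch in Python under my own assumptions. The identifiers are invented, the function name is hypothetical, and I am not claiming this is how GEDmatch computes its Overlap column.

```python
def snp_overlap(kit_a_ids, kit_b_ids):
    """Count the SNP identifiers that were tested in both kits."""
    return len(set(kit_a_ids) & set(kit_b_ids))


# Made-up identifiers standing in for the SNPs each kit tested.
my_kit = {"rs1", "rs2", "rs3", "rs4", "rs5"}
similar_chip_kit = {"rs1", "rs2", "rs3", "rs4", "rs9"}
different_chip_kit = {"rs1", "rs7", "rs8", "rs9"}

print(snp_overlap(my_kit, similar_chip_kit))    # most SNPs are shared
print(snp_overlap(my_kit, different_chip_kit))  # far fewer SNPs are shared
```

The fewer SNPs two kits share, the less data there is behind the comparison, which is why a low overlap number deserves more caution.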
The one number to be wary of is the count of Generations on your match list. This is supposed to show how many generations are between you and your DNA match.
It’s reasonable at the lower levels, but doesn’t have much meaning above a count of four. Lower centimorgan totals are consistent with too many possible relationships to capture in a simple estimate of generations.
Fake Kits Are Probably Legit
My highest matches on GEDmatch are under 50 cM. So you can imagine my excitement when I logged in one day to see a new top match in the high hundreds. My eyes popped wider when I saw I had several more new matches in the mid-hundreds. And all with the same email address. A whole family!
To put this into context: I’m an adopted adult. I figured these must be “new” first and second cousins. I was genuinely pleased to reach out and make contact.
The young man who replied to my email was politely apologetic. These samples weren’t real people. They had been created as part of a research project, and he ruefully told me he’d done something wrong. His synthetic kits were matching to too many people.
I got the impression he was deluged by similar emails to mine. He was true to his word about deleting the kits from GEDmatch.
How Would Anyone Make A Fake Kit For GEDmatch?
You may be wondering how on earth someone could fake a DNA “kit”. Don’t they originate from spit or a cheek swab? But GEDmatch doesn’t test your DNA. It accepts the raw results.
Take a look at a section of my Ancestry raw results. This is just text that can be overwritten. (Well, you should know what you’re doing if you try it).
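To show how ordinary this text is, here is a small Python sketch that reads a raw results file. I am assuming a tab-separated layout with comment lines starting with "#" and a header row of rsid, chromosome, position, allele1, and allele2. Check your own download, because the layout can differ by company and file version. The sample rows are invented, and the function name is my own.

```python
import csv
import io

# Invented sample in an assumed tab-separated layout. Not real data.
SAMPLE_RAW = (
    "#Illustrative raw data sample\n"
    "rsid\tchromosome\tposition\tallele1\tallele2\n"
    "rs0000001\t1\t10000\tA\tG\n"
    "rs0000002\t1\t20000\tC\tC\n"
    "rs0000003\t2\t15000\tT\tT\n"
)


def read_raw_kit(text):
    """Parse raw results text into a list of dicts, skipping comment lines."""
    lines = [
        line for line in io.StringIO(text)
        if line.strip() and not line.startswith("#")
    ]
    return list(csv.DictReader(lines, delimiter="\t"))


rows = read_raw_kit(SAMPLE_RAW)
print(len(rows), "SNPs read")
print(rows[0])
```

Each row is nothing more than a few short strings, which is why software can build or alter a kit without any saliva being involved.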
Kevin Borland develops software to help construct DNA kits. And GEDmatch itself has a tier-one (paid) tool that generates kits – the Lazarus tool.
Why Would Anyone Put A Fake Kit On GEDmatch?
There are perfectly legitimate reasons to manipulate or create raw results like this.
The Lazarus tool is for reconstructing the DNA of a deceased ancestor. It tries to give you the benefits of testing a parent when both parents are deceased. The benefits include higher centimorgan DNA matches than you will get with your own kit.
My young fake cousin (the one whose research went wrong) was conducting genetic research on ethnicity (I think). But there are potentially nefarious reasons. The DNA Geek gives a clear account of a quite technical subject.
Are GEDmatch Ethnicity Estimates Accurate?
GEDmatch has a dizzying variety of admixture projects. It’s up to you to choose the one you deem most accurate for your heritage.
And if you’ve confidently selected “Eurogenes”, the questions don’t stop there. Would the K11 calculation model be the most correct? Or is the K12 model more accurate?
My point is that the ethnicity or admixture reports on GEDmatch are based on differing academic interpretations and calculations of your DNA results. They can’t all be right!
Admixture Academics differ and GEDmatch Users die…quietly inside, then cheer up and return to some proper genealogy research of their DNA matches.
I jest. The admixture reports are interesting in their own right. They’re just not particularly useful for researching our family trees.
But I haven’t answered the question as to their accuracy, so let me give that a stab. Previously, I wrote a guide to Ancestry’s ethnicity estimates. And I asked the same question: how accurate are the results?
I gave several factors that reduce the potential accuracy. Most apply to all the companies. I’ll recap three problems with a focus on GEDmatch.
Modern DNA Samples Reduce Accuracy Of GEDmatch And Other Sites
The admixture projects are based on collections of samples that are hoped to represent particular groups. Ideally, there would be hundreds of samples that were centuries old and could be tied to a region without having to take migration into account.
GEDmatch has the interface of a particularly complex time machine, but it can’t magic up enough ancient samples. It does have a few, by the way. See my articles on archaic matches, and ancient Irish matches.
Many of the GEDmatch projects are based on current DNA with varying degrees of manipulation.
Some Regions Are More Accurate Than Others
The research projects have to collect as many appropriate DNA samples as they can get. The larger the number of samples, the more accurate the estimates.
But Western regions simply have more people purchasing consumer DNA kits.
I believe that some of these projects try to remedy this by providing DNA testing to target areas. But the documentation on GEDmatch is sparse.
Differing Calculation Models
There are significant differences between my ethnicity breakdowns on Ancestry, 23andMe, MyHeritage, and FamilyTreeDNA. I think that this isn’t just because of different reference panels. It’s also because each is using a different calculation model. That’s a guess because the companies don’t publish the detailed calculations.
But each of these DNA testing companies uses one calculation model that it tweaks over time.
GEDmatch takes a different approach. It offers you lots of different calculation models (and very little documentation).
It’s up to you to try to find the most accurate one for your heritage.
Are My Own Ethnicity Estimates Accurate?
For context: my maternal line is Irish and my paternal line is African.
I’ve taken a few of the GEDmatch projects for a whirl. Some, but not all, give a half-and-half split to Europe and Africa. So I judge the continents to be accurate in some projects.
Yes, I’m not judging to a high standard!
The sub-regions vary markedly across different projects. I intend to write more about how to use the admixture reports in another article. For now, I’ll conclude that my GEDmatch admixture reports are broadly accurate at a high level.
More Articles And Tutorials?
This article on GEDmatch Overlap goes into more detail on accuracy than presented here.
If you’re curious about the organization and people behind the website, we have an article on GEDmatch ownership.
And if you’re interested in future articles, we have a weekly newsletter.