If you’ve noticed the Overlap column in the One-To-Many Report on GEDmatch, it’s probably because you have DNA matches with varying shades of pink. GEDmatch is warning you to treat these matches with some caution.
In general, you don’t need to worry about overlap with your higher (closer) matches. The overlap value is more important when you are reviewing more distant matches. This article starts with a simple explanation, and then dives into the technical details with lots of illustrated examples.
What Does Overlap Mean On GEDmatch? A Basic Guide
If you’re not interested in the technical aspects behind Overlap numbers, this section is the 101 guide to using the Overlap information.
We have a separate tutorial on how to use the One To Many report, which goes into the rest of the display. Here, we focus on the overlap information.
GEDmatch uses a color-coded warning system to warn you when it is less confident about its matching calculations.
The sparse documentation on the One-To-Many Report says that the Overlap field is highlighted with a pink or red background.
There actually seem to be three shades of non-white. Judge for yourself: I’ve taken an enlarged screenshot of the Overlap column of four of my matches in descending order of overlap.
The deeper the shade, the stronger the warning. But what are you being warned about?
Warning: Less DNA To Work With
Different DNA testing companies may use different technologies to analyze your DNA sample. And the same company may change how it configures the technology to test your DNA.
GEDmatch accepts DNA results from many different sources. If you and your DNA relative tested with Ancestry in 2020, then GEDmatch is comparing apples with apples. Both kits were tested with the same technology and algorithms.
But if your DNA match tested with 23andMe in 2020, then the underlying technology is quite different. 23andMe isn’t just using different hardware to Ancestry. They are dividing and testing the DNA samples in different places along the chromosomes. This is the key to understanding what overlap means on GEDmatch.
What Overlap Means On GEDmatch
Let’s take a hypothetical example: a chunk of chromosome divided into ten tiny pieces of DNA. Ancestry’s technology tests five positions: 1, 2, 4, 7, and 10.
23andMe also tests five positions (it could have been four or six). But 23andMe hits positions 1, 4, 7, 8, and 10.
Only four of the pieces overlap instead of five. For GEDmatch, that gives you a value of 4 in the Overlap column.
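If it helps to see that counted out, here is a minimal Python sketch of the ten-piece example. It's only a toy: the variable names are mine, and GEDmatch's real calculation isn't published, so treat this as an illustration of the idea rather than how GEDmatch does it.

```python
# Positions tested by each company in the hypothetical ten-piece example.
ancestry_positions = {1, 2, 4, 7, 10}
twenty_three_positions = {1, 4, 7, 8, 10}

# Overlap = positions tested in BOTH kits.
# It says nothing about whether the DNA at those positions matched.
shared_positions = ancestry_positions & twenty_three_positions
overlap = len(shared_positions)

print(sorted(shared_positions))  # [1, 4, 7, 10]
print(overlap)                   # 4
```

Positions 2 and 8 drop out because each was tested by only one of the two companies.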
So, let’s look again at real values from my one-to-many report. The top match had 265,862 tiny pieces of DNA that were tested in both kits. This has nothing to do with whether the DNA was identical across the kits. That measure is in different columns in the report. The overlap is the number of DNA markers (tiny pieces) in one kit that were also tested in the second kit.
Is that a good number? Well, it’s a lot better than the numbers below it.
My kit in this comparison was from Ancestry. And so is the kit with the highest overlap value shown here. The other kits are from 23andMe. It’s clear that the two companies are markedly different in the positions that they target.
You may think – well, at least it’s easy to compare kits from the same company. But 23andMe changed its technology in late 2017. The new hardware tests fewer positions, and at many different places than the old. Comparing a 23andMe kit from 2016 with one from 2020? Apples and oranges, again.
But GEDmatch manages to make those comparisons (as do other companies that accept DNA uploads). How does GEDmatch do it?
A key benefit of GEDmatch is that it attempts to compensate for the differences across the DNA testing companies.
The GEDmatch comparison looks for the pieces of DNA that were included in both tests. Then it uses statistical techniques to make educated guesses – basically filling in the gaps where DNA was examined in one kit but not the other. Imputation, extrapolation…the algorithms are complicated but hopefully the concept is clear.
Of course, this introduces the possibility of error. The fewer pieces of DNA that different technologies have in common, the lower the quality of these guesses, and the darker the shade of pink.
When to Ignore GEDmatch Overlap and When To Take Notice
The one-to-many list for my 23andMe kit on GEDmatch is nearly all pinks and reds. Take a look at my highest match.
Wait, this match is a whopping 3,569 cM. Is it my twin? No, it’s my other DNA kit from Ancestry.
GEDmatch is comparing my DNA results from different test companies. The estimated total centimorgans are very reasonable. But the match is flagged with a warning pink.
Clearly, we don’t run away from all these Overlap warnings.
As a general guideline, focus on the total centimorgans and largest segment. Your higher centimorgan relatives are not going to be false matches due to low overlap values.
As you drop down to more distant DNA matches, you are faced with the usual genetic challenges of working with low cM. The shared DNA may be real (i.e. not a computation error) – but it could be identical due to the random nature of inheritance i.e. identical by chance.
So, these low matches are already prone to be false positives. Compound that with the introduction of comparison issues due to different technologies – and this is where you should take notice of Overlap warnings.
You can use the extra info that GEDmatch is giving you to prioritize your research. If you have two 8 cM matches you are determined to investigate, you may as well start with the match who is less at risk from extrapolation errors.
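If you keep your matches in a spreadsheet or script, that prioritization can be sketched in a few lines of Python. The match names and numbers below are invented purely for illustration, and the `Match` class and `research_order` function are my own hypothetical helpers, not anything GEDmatch provides.

```python
from dataclasses import dataclass


@dataclass
class Match:
    name: str
    total_cm: float
    overlap: int


def research_order(matches):
    """Sort by total cM (highest first); break ties with Overlap (highest first)."""
    return sorted(matches, key=lambda m: (-m.total_cm, -m.overlap))


# Made-up example values, not real report data.
matches = [
    Match("Match A", 8.0, 62_000),
    Match("Match B", 8.0, 180_000),
    Match("Match C", 45.0, 70_000),
]

ordered = research_order(matches)
print([m.name for m in ordered])  # ['Match C', 'Match B', 'Match A']
```

The 45 cM match stays on top despite its low Overlap. The Overlap value only decides the order between the two 8 cM matches.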
Chips And SN(i)PS – A Technical Background To GEDmatch Overlap
I hope the previous section gives you enough to use GEDmatch to best effect. It’s a fair bit more info than GEDmatch provides on the reports! But I also think it’s useful to understand what’s going on under the hood. First, let’s geek out on some hardware.
Four of the top five consumer DNA testing companies use technology from a company called Illumina.
Illumina’s name cracks me up. There are plenty of people who believe that DNA tests are passed on to some nefarious organisation. Telling them that Illuminati are involved – yeah, let’s not go there.
Living DNA started with Illumina but switched to an offering from Thermo Fisher Scientific.
The physical tools that examine DNA are referred to as microarrays – or chips. I’ll use chips in this article.
There was a period of years where Ancestry, 23andMe, MyHeritage, and FamilyTreeDNA all used various versions of Illumina’s OmniExpress chip. But technology vendors march on, and Illumina rolled out a new chip called GSA (Global Screening Array).
Chips Testing SNPs
The DNA chips do not test every tiny piece of DNA in your sample. The cost would be prohibitive at consumer level. Besides that, there’s no need. Most human DNA is the same.
Instead, a DNA chip targets a specific set of positions that are known to vary across human populations. Each position is a single nucleotide polymorphism. Thankfully, there’s an acronym for that: SNP, pronounced “snip”.
But now for the kicker. You’ve heard of different strokes for different folks? Well, there are different SNPs for different chips. In other words, different versions of chips target different sets of SNPs.
Different Chips, Different SNPs
Let’s go back to our example of Ancestry and 23andMe. Both tested five SNPs, but at varying positions.
This isn’t too far away from a comparison between an Ancestry test of 2020 and a 23andMe test that dated from 2016. Both were using the OmniExpress chip at these times. At some point, 23andMe requested a custom version that greatly reduced the number of SNPs tested.
You may see mention of V3 and V4 in relation to 23andMe. That’s referring to their change in usage of the OmniExpress chip. 23andMe customers can see their chip version in the Settings section of the 23andMe website.
Our example could represent the 23andMe V3 version alongside Ancestry’s use of the same chip technology. The overlap is considerable but not complete.
That is not the case with the change from OmniExpress to the new GSA chip. There is about 20% overlap in terms of which SNPs are tested. Yes, I also raised my eyebrows when I first saw that number.
Companies Change Chips (and SNPs)
It’s fair to say that Illumina wants to retire the OmniExpress chip and focus on the new GSA chip. The Counting Chromosomes blog refers to Illumina advising Living DNA that the older chip would be discontinued at some point.
Let’s pause to think of the impact on DNA testing companies that offer DNA matches as a feature.
Suppose Ancestry suddenly switched to the GSA chip, with no corresponding changes to their matching algorithms. Remember that rather low figure of 20% overlap?
New customers would excitedly view their Ancestry DNA match list and see something like this:
Tumbleweed! You can understand why the genetic genealogy companies didn’t all rush at once to embrace the new chip. A switch would require significant effort on their part to produce algorithms to accommodate the differences.
Some of the consumer DNA companies are big, but not big enough to halt their technology provider in its tracks. Illumina’s core business is not genetic genealogy. Their financial focus is on healthcare, and the new chip has improvements in medical analysis.
Who Changed When?
23andMe stands out as the consumer DNA company with a major focus on health reports. And I think they were the first to jump chip from OmniExpress (pun intended). 23andMe switched in the second half of 2017.
But 23andMe wasn’t the first adopter of GSA.
Living DNA, the newest of these DNA companies, didn’t have to worry about upgrading. They launched their DNA testing service on the GSA chip the year before 23andMe moved over.
MyHeritage and Family Tree DNA moved in 2019.
Have you spotted the lone gunslinger holding out in the rundown shack? Or the last drinker refusing to leave the dilapidated bar? Yep, it’s Ancestry.
There’s some speculation out there as to how Ancestry will proceed. Ed Williams’ take on things in 2019 may already be outdated, but it’s both interesting and entertaining.
The International Society of Genetic Genealogy (ISOGG) maintain a SNP comparison chart across the different testing companies. It gives you the total number of SNPs tested for different versions of chips.
I compiled my own chart for working with GEDmatch. My Ancestry upload dates back to 2017, so I use the “Date Compared” column in the One-To-Many Report to estimate which chip version my match is on.
For example, if a MyHeritage match is dated as October 2018, then I figure it’s on the OmniExpress chip.
My chart is a guesstimate. Some companies announced some of the switches, and the rest of the dates are collated from conflicting guesses in forums and blogs. So I don’t guarantee that I’ve got all the details right. I’d be happy to take corrections in the comments section!
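For anyone who prefers a script to a chart, here is a rough Python sketch of the same guesswork. The cutoff days are my assumptions: all I have to go on is “second half of 2017” for 23andMe and “2019” for MyHeritage and FamilyTreeDNA, so I’ve picked the start of those periods. The `guess_chip` function and its table are hypothetical helpers of mine. Living DNA is left out because its chip history is less clear-cut.

```python
from datetime import date

# Approximate dates each company moved to the GSA chip.
# ASSUMPTION: exact days are guesses based on "second half of 2017" and "2019".
GSA_SWITCH = {
    "23andMe": date(2017, 7, 1),
    "MyHeritage": date(2019, 1, 1),
    "FamilyTreeDNA": date(2019, 1, 1),
}


def guess_chip(company, date_compared):
    """Guess a match's chip from its testing company and Date Compared value."""
    if company == "Ancestry":
        return "OmniExpress"  # Ancestry has stayed on OmniExpress
    switch = GSA_SWITCH.get(company)
    if switch is None:
        return "unknown"
    return "GSA" if date_compared >= switch else "OmniExpress"


print(guess_chip("MyHeritage", date(2018, 10, 1)))  # OmniExpress
print(guess_chip("23andMe", date(2020, 1, 1)))      # GSA
```

Bear in mind that the date reflects when the comparison ran, not when the person tested. A kit compared after the cutoff may still have been tested on the older chip, so a “GSA” answer is the weaker of the two guesses.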
The Low Overlap Problem: Enter GEDmatch Genesis
When the DNA companies first switched over to the new GSA chip, their kits could no longer be transferred to GEDmatch.
GEDmatch couldn’t handle working with such low overlap of SNPs between the old and new chips. So, the GEDmatch team created a second database to hold the new kits. And developed new algorithms to process matching across the OmniExpress and GSA chips.
The new system was called Genesis.
If you’re fairly new to GEDmatch, then it may be confusing to see references to GEDmatch versus GEDmatch Genesis. In some ways, you can forget history. GEDmatch has migrated all data into the new database and now maintains a single website.
What Does N/A Mean In the GEDmatch Overlap Column?
I uploaded my Ancestry kit in 2017, well before GEDmatch migrated to Genesis. A lot of my matches in the one-to-many report show N/A in the Overlap column.
And they have a weird name for the Testing Company.
When you see “Migration” in the Testing Company name, then you know that these are pre-Genesis kits that were migrated to the new system. The last letter tells you the actual Testing Company:
- A for Ancestry
- M for 23andMe
- H for MyHeritage
- F for FamilyTreeDNA
N/A means that both kits being compared were uploaded to GEDmatch before the Genesis migration. Therefore, the overlap is so high that GEDmatch considers the number to be irrelevant.
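If you export your match list and want to decode those names automatically, a small Python sketch along these lines would do it. The function names are my own, and the label format is assumed from what I see in my reports (for example “Migration-F2-A”). GEDmatch doesn’t document the format, so other variations may exist.

```python
# Last letter of a "Migration" label -> testing company, per the list above.
MIGRATION_SUFFIX = {
    "A": "Ancestry",
    "M": "23andMe",
    "H": "MyHeritage",
    "F": "FamilyTreeDNA",
}


def decode_testing_company(label):
    """Return (company, is_pre_genesis) for a Testing Company label."""
    label = label.strip()
    if "Migration" not in label:
        return label, False  # post-Genesis kits already show the company
    return MIGRATION_SUFFIX.get(label[-1].upper(), "unknown"), True


def overlap_is_na(kit_a_pre_genesis, kit_b_pre_genesis):
    """Overlap shows N/A only when BOTH kits pre-date the Genesis migration."""
    return kit_a_pre_genesis and kit_b_pre_genesis


print(decode_testing_company("Migration-F2-A"))  # ('Ancestry', True)
print(overlap_is_na(True, False))                # False
```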
Interpreting GEDmatch Overlap In Your Comparison Reports
This section covers different scenarios you are likely to see in the Overlap column. I’ll illustrate what each means with examples.
What you see depends on the chip that tested your DNA kit. I’ve got an Ancestry kit on the older OmniExpress and a 23andMe kit on GSA.
I’ve transferred my DNA to Living DNA, but I haven’t used their testing service. So I don’t have screenshots based on the Affymetrix chip.
The Older Chip – Ancestry and Older Kits From Other Sites
This picture is from my one-to-many report on my Ancestry kit. The example shows my own 23andMe kit above a DNA match who has also tested with Ancestry.
The number of SNPs compared for the two Ancestry kits (i.e. with Mark) is 167,765.
This is more than double the number between Ancestry and the 23andMe GSA chip.
Different Chips, Same Company
These three matches are from the one-to-many report for my 23andMe kit on the later GSA chip.
Notice that the bottom match has a clear Overlap field with a high number of SNPs. This shows that it was tested on the same chip as my own kit.
But the other two kits are also from 23andMe. What’s going on there? Well, they must be from the older OmniExpress chip, to be that low.
One of the kits has a considerably lower Overlap than the other. I’m speculating here, but it’s possible that one kit was from 23andMe’s V4 version of OmniExpress, while the other one was on the prior V3 version. Unfortunately, I can’t tell from the date – it represents when I uploaded my kit and the comparison process completed.
Slightly Different Values, Same Chips, Same Company
Here are two Ancestry kits compared with my own Ancestry kit.
The Overlap values are nice and high, but notice that they are slightly different! As Overlap represents the total number of SNPs that are common per test, shouldn’t the two numbers be the same?
No, you’ll see small differences across tests on the same chip. This can be due to minor errors at localized areas during the testing process of a particular kit.
Pre and Post Genesis Kits
This picture illustrates several aspects that arise from kits that were uploaded to GEDmatch before and after the Genesis conversion. I’m running the report for my 23andMe GSA kit.
Let’s take it from the top.
The first kit shows “Migration-F2-A” in the Testing Company name, so we know it was uploaded from Ancestry prior to Genesis. But it’s not showing “N/A” in the Overlap column. This is because my kit (the other side of the comparison) is post-Genesis. We get an Overlap value when one or both kits were uploaded after the migration.
The second kit is also from Ancestry, but uploaded after the migration. The low overlap numbers are similar because Ancestry has stuck with OmniExpress.
Notice how the 3rd test is in worse shape for Overlap, yet it’s the same testing company as my kit (23andMe)? That shows that the 3rd test must be from when 23andMe was using the older OmniExpress chip. The older 23andMe chip was closer to Ancestry’s chip than it is to the new 23andMe chip.
And finally, we have a MyHeritage kit with a high Overlap value. It must be on the GSA chip!
Using The GEDmatch Overlap Cutoff Threshold
GEDmatch provides a filter to set a threshold for the Overlap value. The drop-down filter is way over on the right of the One-To-Many report page.
I’m far more likely to use the cM or offset filters on this page. But if you want to ignore low Overlap matches, this one is for you!
Other Articles on GEDmatch
You may have noticed there’s not a huge amount of documentation on the GEDmatch website. We’ve got a growing category of GEDmatch articles – you can check out the current list here.
We’ve also got a GEDmatch playlist on our YouTube channel. Two of our more popular videos are on comparing your DNA to Neanderthal and ancient Irish DNA kits on GEDmatch. You’ll find them in the playlist!
5 thoughts on “What Is Overlap On GEDmatch – An Illustrated Guide”
Just one problem. You write “The number of SNPs compared for the two Ancestry kits (i.e. with Mark) is 167,765.”, based on the overlap field. However, my second cousin and I both tested relatively recently on Ancestry, and we both uploaded to GEDmatch. The overlap field shows 211851 for our kits. But when I look at the one to one comparison for the two of us, it says “455469 SNPs used for this comparison”. So I have to say I don’t totally believe the explanation of the overlap field that I have seen here and at family history fanatics on youtube.
Interesting numbers – thanks for posting the details. The GEDmatch calculations are proprietary, and we all can only interpret what’s in front of us.