GEDmatch Superkits – How To Reap The Benefits

GEDmatch Superkits are part of the paid tier on the GEDmatch website. You can combine up to four uploaded kits in your account into one GEDmatch Superkit – all representing DNA of one person.

A GEDmatch Superkit won’t give you a sudden increase in DNA matches. You’re more likely to lose some distant matches at the lower end of your comparison reports.

So, why on earth would you want a Superkit? The main goal is to improve the accuracy of how you match with other GEDmatch users. The Superkit is more likely to reduce false matches that can waste your time and effort when you’re researching at the lower centimorgan ranges.

GEDmatch Superkits

I tested my DNA with kits from Ancestry and 23andMe. When I compare both kits on GEDmatch, the total shared cM is as high as it would be for identical twins. No surprises there. But GEDmatch also flags the match with a pink warning:

The warning tells me that this match is of lower quality than some of my more distant relatives.

We’ve got a full article on what Overlap means on GEDmatch. In a nutshell, Ancestry and 23andMe tested my DNA in different places. GEDmatch could only find a subset of positions that both companies targeted. This reduces the quality of the comparison.

Wouldn’t it be great to eliminate this problem? Mash together my two kits into some kind of new and improved Super Kit to rule (ahem), I mean run, comparison reports?

It may sound like mad science. And very expensive. But GEDmatch says – give us ten bucks and we’ll do it for you.

There are free options for combining kits, but GEDmatch costs $10 for a month – and your GEDmatch Superkit remains available if you drop back down to the free tier.

Why Combine DNA Kits Into A GEDmatch Superkit?

I’ll use a simplified example of the problem of DNA kits analyzed by different machines. Imagine a chunk of DNA divided into ten positions. Ancestry tested six positions while 23andMe tested four. And only two positions overlap when you compare both kits:

These “positions” of DNA are known in genetic parlance as SNPs (pronounced “snips”). Most human DNA is identical across us all, so DNA tests target places that are known to vary. These are the single nucleotide polymorphisms i.e. SNPs.

Our mock example has about 20% overlap between the SNPs tested by Ancestry and 23andMe.
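
The overlap in the mock example is just a set intersection. Here is a minimal Python sketch of that arithmetic. The position numbers are made up for illustration; only the counts (six, four, two shared, ten in total) come from the example above.

```python
# Hypothetical position numbers: the example only fixes the counts
# (Ancestry tested 6 positions, 23andMe tested 4, and 2 are shared).
TOTAL_POSITIONS = 10
ancestry_positions = {1, 2, 3, 4, 7, 8}
twentythree_positions = {3, 4, 9, 10}

# Positions that both companies tested.
shared_positions = ancestry_positions & twentythree_positions

# Overlap as a share of the whole ten-position chunk.
overlap_percent = 100 * len(shared_positions) / TOTAL_POSITIONS

print(sorted(shared_positions))   # the two shared positions
print(f"{overlap_percent:.0f}% overlap")
```

Only the shared positions can be compared directly, which is why a low overlap means a weaker comparison.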

That isn’t too far off the real-life overlap between an Ancestry kit and kits tested in 2020 by 23andMe, MyHeritage, and Family Tree DNA. I mention 2020 because the latter three companies were using different testing hardware compared to Ancestry.

The Problem Of Low Overlap

When you run a One-To-Many comparison report, GEDmatch tells you how many SNPs were targeted by both tests. The total is in the Overlap column. GEDmatch shades low numbers in pink as a warning of the quality of matching.

In our simplified example, the Overlap column would show a value of “2” in a deep shade of pink.

Below is an excerpt from the one-to-many report for my 23andMe kit. There’s a lot of pink in the Overlap column. The low numbers are mostly due to kits from Ancestry, which uses different hardware.

The labs of MyHeritage and 23andMe currently use the same testing hardware. So, the overlap is high for the match in the middle from MyHeritage.

But see how the 23andMe kit has the lowest overlap with my own? 23andMe changed its testing chip in late 2017. I tested after the switch, and this DNA match tested before it.

Fixing The Overlap Problem (In Theory)

Going back to the simplified example, my wish is that the combination process delivers this:

My theoretical Superkit combines each position that was tested by either Ancestry or 23andMe. And maybe a third kit from MyHeritage could plug in the missing middle pieces.
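
That wished-for combination can be sketched as a set union in Python. This illustrates the wish only, not GEDmatch's own process. The position numbers, including the MyHeritage ones, are hypothetical.

```python
# Hypothetical positions in a ten-position chunk of DNA.
ALL_POSITIONS = set(range(1, 11))
ancestry_positions = {1, 2, 3, 4, 7, 8}
twentythree_positions = {3, 4, 9, 10}
myheritage_positions = {4, 5, 6, 9}  # assumed to cover the middle


def combine(*kits):
    """Return every position tested by at least one of the kits."""
    combined = set()
    for kit in kits:
        combined |= kit
    return combined


two_kit = combine(ancestry_positions, twentythree_positions)
three_kit = combine(ancestry_positions, twentythree_positions,
                    myheritage_positions)

# Positions that no kit tested, before and after adding the third kit.
print(sorted(ALL_POSITIONS - two_kit))
print(sorted(ALL_POSITIONS - three_kit))
```

With these made-up numbers, two kits leave the middle positions untested and the third kit fills them in.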

This isn’t actually what GEDmatch does. I’ll come back to the reality in a later section. For now, let’s look at how to create a GEDmatch Superkit.

How To Create A GEDmatch Superkit

The GEDmatch Superkit tool is in the paid tier, at about $10 per month’s access. Creating the Superkit is a two-step process.

  1. Click the Superkit link in the Tier 1 application list on the Home Page.
  2. Choose up to four of your uploaded kits and hit the Generate button.

This is one of the few tools which is restricted to your own kits i.e. you can’t select other people’s Kit Numbers. You also can’t work with your Lazarus or Phased kits.

I combined my Ancestry and 23andMe (v5) kits. The combination process literally took under a second to complete!

The resulting web page tells you that the One-To-One comparisons are available straight away. However, the One-To-Many comparisons take 24-48 hours.

Checking The Processing Status Of Your Superkit

The processing status of your kit is displayed on the Home Page. A green tick mark shows that it has finished processing.

I took the picture below shortly after I generated my Superkit. The circled icon shows that the third kit is in processing mode. (The legend on the Home Page says this icon means “processing”. But what is the symbol? Is it the Vitruvian Man?)

The status changed to the green tick after about 36 hours.

A GEDmatch Superkit Doesn’t Give You More Matches

It’s understandable if you were hoping that a Superkit would give you more matches to work with. But you won’t see a thousand more matches in the One-To-Many report for your Superkit.

In general, a Superkit doesn’t give you more DNA matches than your source kits. It may throw up a handful of new matches at the lower centimorgan level. This implies false negatives in the source kits, i.e. the calculations with less data missed shared DNA that the combined kit picks up.

What’s far more likely is that you will have fewer DNA matches at the lower levels. And that’s not a bad thing.

A GEDmatch Superkit Can Reduce False Positives

When you compare the One-To-Many report of your Superkit with the source kits, some of your higher matches will show a slightly smaller total shared cM. The largest segment may also drop a little.

Some of your DNA matches at second-cousin level may swap places in the ordered list. But there won’t be much difference with your closer relatives.

The benefit is at the lower end of your reports, where DNA matches may disappear completely.

A simplified illustration should help explain this. Let’s take ten adjacent pieces of DNA out of thousands on a chromosome. Your Ancestry kit tested positions 2, 3, 5, 6, 7, and 8 – and has matching DNA with another kit at those positions.

Position 4 was not tested by the Ancestry chip. The other kit was from 23andMe, and tested from positions 2 to 8.

The GEDmatch algorithm performs additional calculations to “fill in the gaps”. It may infer that position 4 is probably a match between the two kits. Now we have a segment of seven centimorgans. Hey presto: a DNA relative that meets the GEDmatch threshold for One-To-Many comparison reports.

Enter the Superkit. The additional source kit has tested position 4. And this extra piece of DNA does not match – giving us a segment of two cMs, a mismatch, and a segment of four cMs.

This falls below the GEDmatch threshold, and the target kit falls out of your One-To-Many comparison lists.
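
The same illustration can be walked through in a few lines of Python. This is a toy model under the article's simplification that each position counts as one centimorgan. The function name and the "assume an untested position matches" rule are stand-ins for illustration, not GEDmatch's actual algorithm.

```python
THRESHOLD_CM = 7  # threshold implied by the simplified example


def matching_segments(positions, calls):
    """Return the lengths of unbroken matching runs.

    `calls` maps a tested position to True (match) or False (mismatch).
    An untested position is assumed to match, as a crude form of gap filling.
    """
    runs, current = [], 0
    for pos in positions:
        if calls.get(pos, True):
            current += 1
        else:
            if current:
                runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


def is_listed(runs):
    """A match is listed if any single segment reaches the threshold."""
    return any(run >= THRESHOLD_CM for run in runs)


positions = range(2, 9)  # positions 2 to 8
ancestry_calls = {p: True for p in (2, 3, 5, 6, 7, 8)}  # position 4 untested
superkit_calls = {**ancestry_calls, 4: False}  # position 4 tested: mismatch

before = matching_segments(positions, ancestry_calls)
after = matching_segments(positions, superkit_calls)
print(before, is_listed(before))
print(after, is_listed(after))
```

The Ancestry kit alone produces one run of seven, which meets the threshold. The Superkit splits it into runs of two and four, and neither is long enough to be listed.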

This is how Superkits reduce the number of small false-matching segments. Your higher matches may show slightly less shared DNA, and some of your lower matches will disappear.

The Main Benefit Of A GEDmatch Superkit

If you have so many high cM matches that you never look at the lower range, then a Superkit isn’t going to do much for your research.

If you spend a bit of time down in the lower regions, then the Superkit can help you focus your research on DNA matches that are less likely to be false.

Marking Kits As Research – Be Kind To Other Users

When you make a Superkit, you now have one more kit that shows up in comparison reports of other GEDmatch users. These (nearly) duplicate kits for one person can be a little confusing – or irritating.

There was a period when the GEDmatch system would automatically mark the source kits as “Research”, but that isn’t happening now. When I created a Superkit in 2020, the status of my other kits did not change.

You can edit the status yourself from the Home page – click on the pencil icon beside the Kit Number.

The Research status can be found toward the end of the page in the “Public Profile” section.

Toggle on the Research option, and this DNA kit will not turn up in other people’s comparison reports.

How Do GEDmatch Superkits Really Work?

Our simplified diagrams present an idealized picture of how the merging of two kits might work. It assumes that GEDmatch uses all the SNPs across the combined kits.

But GEDmatch also targets its own set of SNPs, and leaves some behind. Once again, we get calculation and extrapolation as part of the outcome. And unfortunately, GEDmatch does not allow a download of Superkits – so we can’t actually check (or verify) what exactly is going on.

But if I was sitting on the fence about making a Superkit (and already had multiple tests), I’d be swayed by a series of in-depth articles on Louis Kessler’s Behold Genealogy blog. I’ll touch upon his research here, but if you like figures and raw data – follow my links below.

I mentioned that there are other ways to combine downloaded DNA results. Louis did his own combination of five DNA kits and uploaded the mash-up to GEDmatch. He then compared this kit against one created by GEDmatch. The Superkit results were actually very good, and he recommends using Superkits.

As to the question of how GEDmatch Superkits really work: the manual walkthrough by Louis (the combination article) illustrates the underlying principles.


Margaret O'Brien
