This article is a step-by-step tutorial on using GEDmatch Oracle within an Admixture project. We’ll cover the basic concepts first, and then walk through using and interpreting Oracle results.
The original creator of the Oracle utility described it as a “little fun tool”. Bear in mind that the results are highly speculative. But they are certainly fun – as long as you can understand them!
Before I get to the Oracle, I need to give a simple description of a Calculator. So, let’s start there!
GEDmatch Admixture Calculators
Each calculator in a GEDmatch admixture project shows your ethnicity percentage breakdown against a selection of broad categories. Some example categories are North_European and East_African.
The categories are derived from many small clusters of DNA samples. These clusters are called reference populations, and are assigned labels like “Armenian”, “Maasai”, or “Tuscan”.
Each project creator tries to cluster the samples they collect in a way that strongly represents a distinct ancestral location, community, or ethnicity. The creator picks the labels.
Always bear in mind that there is plenty of subjective interpretation built into a GEDmatch admixture project. This is why you’ll get different results with different projects and calculators.
What Is GEDmatch Oracle?
The GEDmatch Oracle compares your DNA kit to all the reference populations within an Admixture Project. It measures how closely your admixture percentages are aligned with each population.
It’s important to understand that Oracle isn’t measuring shared DNA between your kit and the reference samples. It’s calculating how closely the breakdown of your ethnicity percentages match the breakdown of a reference population across the same categories.
This may be clearer with an example:
Let’s say the Armenian reference population averages out as 29% Mediterranean, 54% West Asian, and 12% Southwest Asian within a project.
In contrast, the Maasai population has negligible percentages for those three categories. Instead, they score highly on three African categories.
You run a GEDmatch calculator that gives your admixture breakdown as 40% Mediterranean, 50% West Asian, and 10% Southwest Asian.
The Oracle compares your percentages to each reference population and gives you a measure of how closely you are aligned. Your “Armenian” comparison is going to rate much closer than your “Maasai” comparison. The rating is known as the distance.
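To make the idea concrete, here is a minimal Python sketch of that comparison. It assumes the distance is a plain Euclidean distance between the two sets of percentages. That is my assumption for illustration, not a confirmed description of the GEDmatch code, so the absolute numbers may not match what GEDmatch reports. The function name `oracle_distance` is made up, and so is the Maasai breakdown (the example above only says that the three listed categories are negligible for them).

```python
import math

def oracle_distance(kit, population):
    """Euclidean distance between two admixture breakdowns (percent per category).

    Assumption: Euclidean distance is used here purely for illustration.
    """
    categories = set(kit) | set(population)
    return math.sqrt(
        sum((kit.get(c, 0.0) - population.get(c, 0.0)) ** 2 for c in categories)
    )

# Percentages taken from the example in the text.
you = {"Mediterranean": 40, "West_Asian": 50, "Southwest_Asian": 10}
armenian = {"Mediterranean": 29, "West_Asian": 54, "Southwest_Asian": 12}

# Hypothetical Maasai breakdown: category name and value are invented.
maasai = {"East_African": 100}

print(round(oracle_distance(you, armenian), 2))  # 11.87
print(oracle_distance(you, armenian) < oracle_distance(you, maasai))  # True
```

The Armenian comparison comes out far lower than the Maasai one, which is the "closer fit" the Oracle is reporting.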
Looking Under The Hood
The GEDmatch Oracle is a version of a utility program first developed by a blogger known as Dienekes Pontikos (a pseudonym). The original details are here.
Dienekes made the utility available as a download to run on a local machine. Other developers ported it for use online. The version on GEDmatch is slightly different to the original software.
GEDmatch Oracle Distance
GEDmatch Oracle distance is a measure of how closely a DNA kit’s admixture percentages are aligned with a reference population in a GEDmatch project. The lower the number, the closer the fit.
If your own DNA sample was also one of the reference populations, your Oracle distance would be zero. In other words, the admixture percentages would be identical. The theoretical maximum distance is 100.
So, what’s a good number? Plenty of GEDmatch users post their results on genealogy forums, and I’ve seen distances in the low single digits.
Bear in mind that the Oracle distance is a mathematical calculation. The numbers should not be confused with a genetic measure like centimorgans.
Oracle Single Population
The first set of results that Oracle gives you is titled “Single Population”.
Behind the scenes, the Oracle has calculated your distance to every reference population within the project. The display lists the top twenty results i.e. the lowest distances. You can usually focus on the first few outcomes for insight.
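The ranking step can be sketched in a few lines of Python. Again, this assumes a Euclidean distance, and the names `distance` and `single_population` are my own. Only the Armenian row uses numbers from this article; `Population_B` and `Population_C` are invented placeholders.

```python
import math

def distance(kit, population):
    # Assumption: Euclidean distance over the percentage breakdowns.
    cats = set(kit) | set(population)
    return math.sqrt(sum((kit.get(c, 0.0) - population.get(c, 0.0)) ** 2 for c in cats))

def single_population(kit, references, top_n=20):
    """Rank every reference population by distance and keep the closest top_n."""
    ranked = sorted((distance(kit, pcts), name) for name, pcts in references.items())
    return [(name, round(d, 2)) for d, name in ranked[:top_n]]

kit = {"Mediterranean": 40, "West_Asian": 50, "Southwest_Asian": 10}
references = {
    "Armenian": {"Mediterranean": 29, "West_Asian": 54, "Southwest_Asian": 12},
    "Population_B": {"Mediterranean": 60, "West_Asian": 30, "Southwest_Asian": 10},  # invented
    "Population_C": {"East_African": 100},  # invented
}

print(single_population(kit, references, top_n=2))
# [('Armenian', 11.87), ('Population_B', 28.28)]
```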
The populations are supposed to represent ancestral DNA from one area, community, or ethnicity. The Oracle calculations may work very well for you if all your grandparents are of the same region or community.
But what if your father is Armenian and your mother is Welsh? Or your paternal line is East African and your maternal line is Irish? The calculations won’t make much sense. I’m the latter example, and my top distances are very high for the Single Population examples.
In cases of dual ethnicity, a more complex calculation may help: the Mixed Mode calculation.
Oracle Mixed Mode Population
The second set of results that Oracle gives you is titled “Mixed Mode Population”. Don’t be put off if these terms are unfamiliar (statistics again). The concept is quite simple.
The Oracle runs through every possible pairing of the populations within the project. For each pair, the utility calculates your distance to the combined pair. After all those calculations, your display shows the top twenty.
Consider a very simple example of four populations: Armenian, British_Isles, Maasai, and Mixed_Slav. The Oracle would calculate your distance to each of the six possible pairings.
In reality, the GEDmatch projects may have over 200 populations – but the numbers are crunched very quickly.
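Here is a rough Python sketch of the pairing idea. I don't know exactly how GEDmatch combines a pair, so this assumes that each pair is blended in whatever proportion brings it closest to your breakdown, and that closeness is a Euclidean distance. The function names are mine. The Armenian percentages come from the earlier example; the other three breakdowns and the kit are invented to keep the arithmetic obvious.

```python
import math
from itertools import combinations

def best_mix(kit, pop_a, pop_b):
    """Return (share of pop_a, distance) for the closest blend of two populations."""
    cats = sorted(set(kit) | set(pop_a) | set(pop_b))
    k = [kit.get(c, 0.0) for c in cats]
    a = [pop_a.get(c, 0.0) for c in cats]
    b = [pop_b.get(c, 0.0) for c in cats]
    diff = [x - y for x, y in zip(a, b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        share = 0.5  # identical populations: any split gives the same blend
    else:
        # Project the kit onto the line between the two populations, then clamp to 0..1.
        share = sum((kv - bv) * d for kv, bv, d in zip(k, b, diff)) / denom
        share = min(1.0, max(0.0, share))
    mix = [share * x + (1 - share) * y for x, y in zip(a, b)]
    dist = math.sqrt(sum((kv - mv) ** 2 for kv, mv in zip(k, mix)))
    return share, dist

def mixed_mode(kit, references, top_n=20):
    """Score every distinct pair of populations and keep the closest top_n."""
    results = []
    for (name_a, pop_a), (name_b, pop_b) in combinations(references.items(), 2):
        share, dist = best_mix(kit, pop_a, pop_b)
        results.append((dist, name_a, share, name_b))
    results.sort()
    return results[:top_n]

references = {
    "Armenian": {"Mediterranean": 29, "West_Asian": 54, "Southwest_Asian": 12},
    "British_Isles": {"North_European": 100},  # invented
    "Maasai": {"East_African": 100},           # invented
    "Mixed_Slav": {"East_European": 100},      # invented
}
kit = {"North_European": 50, "East_African": 50}  # invented dual-heritage kit

pairs = mixed_mode(kit, references)
print(len(pairs))         # 6
print(pairs[0])           # (0.0, 'British_Isles', 0.5, 'Maasai')
print(math.comb(200, 2))  # 19900
```

Four populations give six pairs, and a project with 200 populations gives 19,900 pairs, which is still a small job for a computer.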
The goal is to produce the top combinations that align with your dual ethnicity. I’ll show my results later on – you’ll see that they make more sense for me than the Single Population calculations.
But suppose your grandparents all come from different places? The results aren’t likely to be great for you. Enter “Oracle 4”, a separate application.
This article would be very long if I covered Oracle 4. I’ll leave it for another day.
The rest of this article is a tutorial on how to use GEDmatch Oracle.
A Video Walkthrough
If you prefer a visual walkthrough, this video runs through the same material as our article. But there’s more background information in the article, and I refer back a lot in the video to the rest of the content here.
You can use the timestamps in the video description to jump to the different sections.
How Do You Access the GEDmatch Oracle?
The Oracle utility may be available after you run a project calculator on your chosen DNA kit.
Some project calculators don’t have Oracle available. If you use the same project and calculator as this tutorial, you’ll see both Oracle utilities on the display. If those buttons aren’t there on the calculator you choose, you’ll have to try a different calculator or project.
So, to run the Oracle analysis, you must first launch an admixture project.
(1) Click on the “Admixture (heritage)” link under DNA Applications on the Home Page.
(2) Choose one of the GEDmatch admixture projects
I’ll use the Dodecad project in this article because it was originated by the same person who created the first Oracle utility.
(3) Enter your kit number and choose a calculator
I’ll use the first calculator model that was included in this project.
“Dodecad” comes from the Greek for a group of twelve, and it refers to the 12 broad categories in the project. Don’t worry if that doesn’t mean much to you; we’ll explain when we get there!
The calculator will take up to ten seconds to complete. Don’t refresh your screen; the results will display shortly.
(4) Check that the calculator has one or two Oracle utilities
There are usually one or two command buttons labeled with “Oracle” and/or “Oracle-4”.
These have moved positions at times, but they are currently on the left-hand side of the page. Look underneath the list of populations (which I’ve truncated in the image below from a list of twelve).
How To Arrange Your Browser Pages For The Best Oracle Display
My top tip before proceeding to look at your Oracle results is to give yourself three web pages to work with. This is because the GEDmatch display is not user-friendly.
Before I click on the Oracle button, I like to duplicate the display page twice. As the calculator re-runs each time, this does consume a few more resources from the GEDmatch servers.
But I have no qualms, particularly since the website was purchased by a commercial entity. Hey Verogen: improve the ridiculously poor user experience!
Take these simple steps:
(1) Right-click the browser tab and choose “duplicate” from the menu.
A new tab will open with an error message. The page asks you to “Confirm Form Submission” by reloading the page. Do what they ask!
(2) Reload the browser page by pressing the F5 key on your keyboard.
With my latest laptop, I also have to hold down the fn (function) key simultaneously. And click “Continue” to keep going.
The calculator model will rerun, so this may take up to ten seconds to get the same display that you duplicated.
(3) Duplicate the page one more time.
Now that you’ve got three tabs, keep the first page as a visual breakdown of your results. You’ll use the other two pages to click command buttons and open new displays.
At this point, you’re all set to go. If you can drag these pages to multiple monitors – even better.
Launching The GEDmatch Oracle Utility
This section is on the standard Oracle utility. Oracle-4 is left for a separate article.
Click the “Oracle” button on one of your browser pages. This does not launch a new page, which is why I like to generate the duplicate pages upfront.
I’ll work down from the top and explain what you see on the page, section by section.
The top section is the same for every Oracle display, as it provides links to the original developer and project. I’ll have a full article on the Dodecad project soon.
The Oracle utility was first developed as a utility to download to your local machine. Credit is given to Zack Ajmal for working with the GEDmatch developers to port the utility online.
Sorted Admixture Results
This section simply duplicates the calculator display. It shows your percentages against the broad categories of the project. The difference here is that:
- the results are sorted in descending order
- categories are not shown if you have zero or insignificant alignment
I like to keep the main display open so that I can see the categories that I don’t match. These are listed with a blank entry in the category list.
The next section is where the Oracle information really starts. First, you’ll see the results of Single Population Sharing.
How To Interpret Single Population Sharing
The first thing to know about Single Population Sharing is that the calculations assume that your ancestors are from the same region or community. If your parents are from different regions or ethnicities, this will not work so well for you.
The display sorts the reference populations by how closely you align with their ethnicity breakdown.
Here is a truncated version of my list of 20 entries for Dodecad V3.
The measure of alignment is labeled as “Distance”. We explained this earlier, but I’ll repeat that the lower the value, the more similar your ethnicity breakdown is to the population’s. My top distances are quite high.
In general, you can focus on the top four or five rows in the report. If your top values are single digit, then you can start getting interested. If they are approaching 1, then dive into further investigation!
GEDmatch Oracle Populations And Their Source
So, who exactly are you being measured against?
Each row represents a group of DNA samples within a project. These are known as the reference populations.
Each population is assigned a descriptive label by the project creator. These labels may be obvious – like “German” or “Tuscan”. Don’t be fooled – they may not correspond to modern geographical entities.
Some projects (but not all) name the source of the DNA samples. The sources can be divided into two types:
- Academic genome projects that publish DNA results
- Volunteer DNA samples collected by the project creator
The project as a source
The population labeled as “German” in my list above has a source named “Dodecad”. This is the same name as the project. This shows that the group was made up of DNA results collected by the project creator from volunteer submissions.
I’ll discuss this process further in an article on the Dodecad project. Just be aware that the population could be derived from as few as five samples.
The other reference populations are sourced from academic studies in the public domain. Here’s a rundown of the ones shown in my example.
- Reich refers to genome studies led by Dr. David Reich at Harvard University.
- Henn refers to a 2011 publication by Dr. Brenna Henn of Stanford.
- HapMap refers to an international genome project.
- Xing refers to a genome sampling project led by Dr. Jinchuan Xing.
And that’s just the sources from my small example. If you want to learn more about any source, you can google something like “[name] genome”. Look for an academic publication in the results.
Oracle Population Spreadsheet
You will probably not recognize many (or most) of the populations listed in your Oracle results. And when you do recognize “German” or “Tuscan”, don’t assume that the label corresponds to its modern meaning.
This is where the Oracle Spreadsheet comes into play. You’ll find the button above your ethnicity breakdown on the general display.
The population spreadsheet is specific to the chosen project.
If you are going to use it via the GEDmatch website, a large monitor will be an advantage. Try reducing the browser display to 80% to fit more width on the screen.
I like to copy the display to an external spreadsheet. I have detailed instructions in a later section.
So, once you’ve opened it – what exactly are we looking at?
Header and rows in the Oracle Spreadsheet
The broad ethnicity categories are listed across the top of the spreadsheet. There are twelve in the Dodecad V3 project.
The reference populations are listed down the left of the spreadsheet. These are the small groups of DNA samples that represent a specific area or community.
The spreadsheet represents the percentage breakdown of each reference population against the project categories.
The Oracle Spreadsheet has nothing to do with your DNA kit and Oracle results. It’s purely about project information. Every kit will see the same display.
Using The Spreadsheet To Interpret Your Oracle Results
I’m not going to use my own Single Population results as an example, as they don’t make much sense.
Instead, I’ll take an example from someone with a closer distance to their top populations. This person posted their results on a public forum, so I’ve gratefully borrowed them! They look like this:
| # | Population (Source) | Distance |
| 3 | Argyll (1000 Genomes) | 3.89 |
The first step is to find each population in the Oracle spreadsheet (you just need to work with your top few).
Here’s a gotcha: the populations aren’t listed in complete alphabetical order in the GEDmatch display. There seem to be multiple alphabetized groups. You can use the browser search feature (ctrl-f) to jump to the text.
Another gotcha: there are some populations with very similar names. One may just be a plural of the other. Make sure you find the correct population.
Other GEDmatch display drawbacks
Once you start scrolling or jumping down the website display, you’ll find another major drawback. The header row isn’t fixed, and it’s the key to interpreting your results.
You also can’t filter the populations to the ones that are relevant to your DNA.
For those reasons, I like to copy-and-paste the web display to Microsoft Excel. It’s much easier to work with the data. You can do the same with Google Sheets or OpenOffice Calc.
Working with a “real” spreadsheet
Copying the data to your favorite spreadsheet application is very simple.
- Highlight and select the entire page with ctrl-a (control key and the a key on your keyboard).
- Open a new spreadsheet and paste the results
- Get rid of the four header lines so that the project categories are the top row
- Freeze the header row
The next step is to filter on the top few populations.
- Add a filter to the header row
- Use the filter on the first column to deselect all the populations
- Select the top few populations from your “Single Population Sharing” display
Once you’ve narrowed down your spreadsheet to three or four rows, it’s much easier to see the patterns distributed horizontally across the reference populations.
You can also hide or delete the columns where all the percentages are low.
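If you’d rather script the filtering than click through a spreadsheet, the same steps can be sketched in Python with the standard `csv` module. This assumes you have saved the pasted data as comma-separated text with the categories as the header row. The function name `filter_sheet`, the `Population` column header, the 5% cut-off, and every number in the sample data are invented for illustration; the real project values will differ.

```python
import csv
import io

# Invented stand-in for the pasted Oracle spreadsheet (only a few columns shown).
SHEET = """Population,West_European,East_European,Mediterranean,East_African
German,50,12,15,0
CEU,52,10,14,0
Argyll,55,9,12,0
Maasai,0,0,1,60
"""

def filter_sheet(text, keep, min_pct=5.0):
    """Keep only the chosen populations, then drop categories that are low for all of them."""
    rows = [r for r in csv.DictReader(io.StringIO(text)) if r["Population"] in keep]
    categories = [c for c in rows[0] if c != "Population"] if rows else []
    shown = [c for c in categories if any(float(r[c]) >= min_pct for r in rows)]
    return [{"Population": r["Population"], **{c: float(r[c]) for c in shown}} for r in rows]

for row in filter_sheet(SHEET, {"German", "CEU", "Argyll"}):
    print(row)
```

In this made-up data, the East_African column disappears because none of the three selected populations reaches the cut-off.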
Here’s the outcome with the example results I borrowed from the forum:
Remember, this has nothing to do with your ethnicity breakdown. These are the percentages for each reference population against the same categories that your breakdown was measured against.
The reason why these populations are at the top of your list is that the distribution is most similar to yours.
So, what does it all mean? Well, first you have to identify which ethnicity, community, or region is represented by the three populations. A Google search of “Argyll” tells you that the current geographical meaning is an area in western Scotland. And “German” seems obvious.
But if you’re really interested, you should take a closer look at the sources.
Researching External Population Sources
You’ll look in vain for information on the GEDmatch site.
This is where having the source is useful. The source isn’t in the Oracle spreadsheet, but you’ll see it in your Single Population Sharing results.
Again, this is why I like to have multiple browser pages open! So, German is displayed there as “German (Dodecad)”, CEU as “CEU (HapMap)”, and Argyll as “Argyll (1000 Genomes)”.
A Google search for “CEU HapMap” throws up links to the original source project. You may have to look at a few web pages. But in this case, the CEU group are DNA samples from Utah residents with Northern and Western European ancestry.
Utah residents? Remember, this was an academic project that collected volunteer samples. I haven’t read their methodology, but I expect (hope?) that they sought genealogical evidence to at least the grandparent generation.
Researching Project Population Sources
The first group, “German”, names Dodecad as the source. This is the same name as the project, which means that the samples were collected by the project creator.
So, does “German” correspond to current nation boundaries, or is it referring to a wider Germanic region or peoples? Unfortunately, it’s not as easy as it should be to find information about the sources. You may have to do some creative searching.
If you’re really interested in following up on your results, you should have a wander through the project website. The link to the project website is at the top of the Oracle results page.
Some of these projects are hosted on the aging Blogspot website. There is a search box on Blogspot, but it doesn’t seem 100% reliable.
So, you can also run a site-specific Google search. Type “site:[website URL] [search terms]” into a Google search bar.
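If you find yourself running a lot of these searches, the query can be built with a couple of lines of Python. This is just a convenience sketch: the function name `site_search_url` is mine, and `example.blogspot.com` is a placeholder that you would swap for the address of the project website.

```python
from urllib.parse import urlencode

def site_search_url(site, terms):
    """Build a Google search URL restricted to one website."""
    return "https://www.google.com/search?" + urlencode({"q": f"site:{site} {terms}"})

# Placeholder site: replace with the project website you are researching.
print(site_search_url("example.blogspot.com", "German reference population"))
# https://www.google.com/search?q=site%3Aexample.blogspot.com+German+reference+population
```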
As I noodled around the Dodecad website for information, I came to the conclusion that the project has 11 samples grouped as the “German” reference population. These samples were submitted by individuals who self-reported that their known direct line was German. (I may be wrong, but I wasn’t interested enough to spend hours looking – remember, these were results I borrowed from a stranger).
Biogeography and Anthropology
So, once you’ve figured out what the reference populations represent – what does this mean for you?
It may not mean much if you don’t already know your genealogy back a few generations. However, let’s say our example kit’s great-grandparents are from Northern Europe (I’m making this up, I don’t know the example’s pedigree).
The top three reference populations have quite a similar category breakdown. High numbers for West_European and East_European are probably not a shock. But the fact that West is nearly five times the percentage of East may be of interest.
Just remember that we’re not talking recent generations here. This could represent migrations across Europe from distant eras.
And are you eyeing the Mediterranean percentages with surprise? And what about West Asian?
If you want to delve further, you have a journey into varied and conflicting theories of population migration since the origin of the human species. Good luck!
Mixed Mode Population Sharing
The second part of the Oracle results shows the Mixed Mode Population Sharing.
The opening section of this article explained how the Mixed Mode compares your admixture to every possible pair combination.
This may work well for you if your parents are from different ethnicities or communities. But the usefulness will also depend on how diverse your parents’ admixture might be.
If you think you are firmly of dual ethnicity, then it’s worth taking a good look at the results here. I’m using Dodecad as an example, but you may find other projects align better.
For this section, I’ll use my own Oracle results. My eight maternal great-great-grandparents are from one county in Ireland. My father is East African. These are the top five rows of my mixed-mode results:
Although I’m not showing the full list of top twenty pairings, I can tell you that each combination splits clearly between the two continents of my heritage. There is no pairing of two African groups or two European groups. Oracle will have calculated those pairings, but they must rank below the top twenty.
The Hema population dominates the primary side. My research of the source told me that these were broadly East African.
It’s a plausible result. But I mustn’t conclude that half my heritage is of the Hema People (now mostly Congolese). Our connection could stretch back to common ancestors that precede this community.
The top secondary populations seem to fall within a wide European area. Yet, there’s not an Irish-labeled sample in the full top twenty. Does this invalidate the entire results in my eyes? Maybe not. Argyll is western Scotland, and I’ve long suspected an ancestry that I haven’t yet shown genealogically.
But I’m reaching to fit speculative theories to these speculative calculations. I also know (from reading the project website) that the project creator considers other calculators more appropriate for my heritage.
It’s easy to see why many sceptics view these admixture calculators as genetic astrology.
Oracle Versus The Mainstream Ethnicity Estimates
By mainstream, I’m referring to Ancestry, MyHeritage, 23andMe, FamilyTreeDNA, and LivingDNA. Each company provides ethnicity reports based on your DNA results.
In my article on how to interpret Ancestry ethnicity results, I described how Ancestry pinpoints my Irish heritage to a specific county.
I also reviewed MyHeritage “Genetic Groups”. They hit the bullseye on the same county. This matches my family tree back to the early 19th century.
In contrast, the Oracle paints broad brush strokes that are difficult (if not impossible) to verify independently. That’s because the scope is aimed at historic eras that modern genealogy cannot reach. Many of the admixture projects include hypothetical paleolithic and neolithic population references.
So, if you’re interested in genealogy and your family tree – stick to the mainstream ethnicity reports. They are getting better and more focused as the reference samples increase and the DNA analysis techniques improve.
Another problem is that the GEDmatch admixture projects are stuck in time. They date to between 2012 and about 2015, and it’s impossible to predict whether Verogen (the company that owns GEDmatch) will seek improvements.
However, if you’re interested in anthropology and population migration – the admixture projects are fascinating. Just bear in mind that the different projects provide varied and conflicting results. Have fun, but be cautious about drawing firm conclusions (or any conclusions at all).
This article used one of the GEDmatch projects to illustrate the concepts behind the Oracle.
We have a full article on using the GEDmatch Dodecad Project, which goes into detail on the background of each calculator. It should help you decide which (if any) of the Dodecad calculators are a good fit for your heritage.
We also have an article on the GEDmatch Eurogenes project, which has quite a lot of calculators. We go into most detail on the two recommended calculators.