The GEDmatch HarappaWorld Project (Explained For Beginners)

HarappaWorld is the only admixture project on GEDmatch that has just one calculator. That’s an indication of how focused this project is. Its creator designed HarappaWorld to explore South Asian heritage.

The project includes ancient samples alongside DNA results from project volunteers and academic studies. Bear in mind that all the GEDmatch admixture projects explore older eras of human history.

Origins Of The HarappaWorld Project

HarappaWorld was created by Zack Ajmal, an American of Pakistani descent. Zack based his calculator on the software and methodology of the Dodecad project from Dienekes. He was also impressed with how the Eurogenes project had branched in its own direction.

As well as creating his own project, Zack helped the GEDmatch team to make online versions of the original desktop admixture software. You’ll see the acknowledgment at the top of every Oracle display.

Back in 2011, Zack felt that the academic genome projects were focused on European ancestry. He launched a website asking for volunteer DNA results from people with roots in South Asia or neighboring countries. He named his project after an archaeological site in Punjab: Harappa.

These are the countries he targeted:

  • Afghanistan
  • Bangladesh
  • Bhutan
  • Burma
  • India
  • Iran
  • Maldives
  • Nepal
  • Pakistan
  • Sri Lanka
  • Tibet

Work on the GEDmatch HarappaWorld project dates from 2012 to 2014. As this was a hobby for the project creator, he then moved on to other things.

Bear in mind that more recent discoveries of archaic DNA are not reflected in HarappaWorld. Also, DNA testing was fairly sparse amongst people representing these regions. If you’re of Eurasian heritage, you should also take a look at the Gedrosia project.

Is HarappaWorld Useful For Other World Regions?

Although projects like Dodecad and MDLP started with a small number of targeted regions, they expanded their scope with further calculators. Some of these “wider” calculators were not well received.

HarappaWorld never tried to expand its focus. However, the project creator decided to accept all volunteered DNA samples, regardless of the person’s heritage.

There were some grumbling comments on his blog about this. Zack Ajmal replied that he ran admixture calculations for these volunteers, but he didn’t necessarily include their samples in the project.

Some of the populations shown by the calculator are clearly outside the South Asian region. Be a little skeptical if they show up for you. My West African and East African percentages aren’t at all realistic, but the calculator wasn’t designed for my heritage.

Understanding The HarappaWorld Populations

There are sixteen reference populations within the HarappaWorld calculator. Your admixture results show percentages across these population names. Here’s the list:

  • American
  • Baloch
  • Beringian
  • Caucasian
  • E-African
  • Mediterranean
  • NE-Euro
  • SE-Asian
  • NE-Asian
  • Papuan
  • Pygmy
  • San
  • Siberian
  • S-Indian
  • SW-Asian
  • W-African

It’s important to remember that these names aren’t meant to show where your ancestors are from. They are “nicknames” for the clusters of DNA samples that make up each category.

So, what does NE-Euro mean? And how about NE-Asian? This is when you hit the “Spreadsheet” button above the admixture display.

HarappaWorld Spreadsheet

This will show you the groups of DNA samples that make up each of these populations. We have a separate article that walks you step by step through using the Oracle spreadsheet.

Although the tutorial uses the Dodecad project as an example, the steps apply to all the spreadsheets. I’ll walk through a HarappaWorld example here.

Suppose I want to understand more about the SE-Asian population. It’s the 5th population listed across the top row of the HarappaWorld spreadsheet. I run my eye down that column, looking for the groups on the left with the higher percentages.

Our tutorial shows you how to copy it to a proper spreadsheet, like Excel, which makes it much easier to review. When I get the reference data into Excel, I can put a filter on the column to show only the groups (rows) with a percentage above 70.

The shortlist includes cambodian, dai, and iban as the higher percentages.
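If you prefer a script to Excel, the same kind of filter can be sketched in Python. This is only an illustration: it assumes you’ve saved the reference data as CSV text with a label column called "Population", and the group names and numbers below are made-up placeholders, not real HarappaWorld values.

```python
import csv
import io

# Hypothetical CSV export of the reference spreadsheet.
# The group names and percentages are made-up placeholders.
SAMPLE_CSV = """Population,S-Indian,Baloch,SE-Asian
group-a,50.0,35.0,2.0
group-b,1.0,0.5,85.0
group-c,3.0,0.0,72.5
group-d,10.0,5.0,40.0
"""


def groups_above(csv_text, column, threshold):
    """Return (group, percentage) pairs where `column` is above `threshold`,
    sorted with the highest percentage first."""
    rows = csv.DictReader(io.StringIO(csv_text))
    hits = [
        (row["Population"], float(row[column]))
        for row in rows
        if float(row[column]) > threshold
    ]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Same idea as the Excel filter: keep rows where SE-Asian is above 70.
shortlist = groups_above(SAMPLE_CSV, "SE-Asian", 70)
print(shortlist)
```

With real data, you would read your own saved file instead of SAMPLE_CSV and use whatever column headings your copy of the spreadsheet has.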

It’s important to remember that the spreadsheet is pure reference data and has nothing to do with your DNA. Each row of the spreadsheet shows you the admixture breakdown of the DNA cluster across the 16 populations.

You can now compare how closely your own admixture breakdown aligns with the cambodian, dai, and iban. If they look close across the board, this becomes an interesting insight.
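One way to put a number on “close across the board” is to measure the distance between your percentages and each reference row. The sketch below uses plain Euclidean distance, which is my own choice for illustration and not necessarily what GEDmatch does. The percentages are made-up placeholders, and only three of the 16 populations are shown to keep it short.

```python
import math

# Made-up placeholder percentages, not real results.
my_results = {"S-Indian": 5.0, "Baloch": 3.0, "SE-Asian": 80.0}

# Hypothetical reference rows copied from the spreadsheet.
reference_rows = {
    "group-a": {"S-Indian": 50.0, "Baloch": 35.0, "SE-Asian": 2.0},
    "group-b": {"S-Indian": 1.0, "Baloch": 0.5, "SE-Asian": 85.0},
    "group-c": {"S-Indian": 3.0, "Baloch": 0.0, "SE-Asian": 72.5},
}


def distance(mine, reference):
    """Euclidean distance between two admixture breakdowns.
    A smaller number means the two breakdowns are more alike."""
    return math.sqrt(sum((mine[pop] - reference[pop]) ** 2 for pop in mine))


# Rank the reference groups from closest to furthest.
ranked = sorted(
    reference_rows,
    key=lambda name: distance(my_results, reference_rows[name]),
)

for name in ranked:
    print(name, round(distance(my_results, reference_rows[name]), 2))
```

A small distance only says that two sets of percentages look alike. It doesn’t prove a family connection to that group.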

But keep in mind that we’re dealing with older eras of human history.

Interpreting Your HarappaWorld Oracle Results

The next step is to use the Oracle utilities. The HarappaWorld project has both the Oracle and Oracle-4 utilities.

These may be useful when your grandparents are from different regions. The statistical display can be confusing when you first encounter it. Our article on the GEDmatch Oracle has a step-by-step walkthrough.

It uses the Dodecad project, but the same principles apply to all projects.

Reviewing Other People’s HarappaWorld Results

It can be useful to see other people’s results that are similar to your own, particularly when they describe their heritage. Here’s a lengthy thread on the Anthrogenica forum in which people post their HarappaWorld results.

Unfortunately, plenty of posters give their results without describing their known heritage. But if you scan quickly through the pages, some people go into detail about their grandparents’ origins.

Alternative Calculators For South Asian Heritage

We have a roundup article on which calculators are best for which heritage.

You can check out the section on the best GEDmatch calculator for South Asian ancestry.

HarappaWorld And The Calculator Effect

Zack Ajmal said in his initial blog post in 2011 that his work was inspired by the Dodecad and Eurogenes projects. The two men behind these prior projects had a major dispute the following year.

Each claimed that the other was using flawed methodology.

Davidski of Eurogenes wrote about what he called the Calculator Effect. This flaw implies that the calculators are only useful for the people who volunteered their samples to the project.

Davidski amended his methodology and said that this fixed the issue in his project. He also pointed out that the Dodecad project, and any other project based on it, retained the flaw.

Dienekes of Dodecad rejected these arguments. He claimed that the Dodecad methods were correct and that Eurogenes would now be flawed, as would any other calculator based on Eurogenes’ methods.

This escalated into a flame war. Here’s Davidski’s original argument, and Dienekes’ first rebuttal. With each calling the other an imposter, only double-spiderman can explain it all.

[Image: Spider-Man pointing at Spider-Man]

I don’t actually know whether HarappaWorld followed Dodecad’s methods or those of Eurogenes. My main point is that you shouldn’t take the results too seriously.

Alternative GEDmatch Admixture Projects

Check out our articles on each of the other projects:

More Recent Ethnicity

The GEDmatch admixture projects are aimed at older eras of human history. You should look elsewhere if you’re researching your ethnic heritage within the last few centuries.

If you’ve DNA tested with Ancestry, you can check out our article on reading Ancestry’s ethnicity breakdown. Their features include assigning you to geographic regions based on your DNA matches on the site. MyHeritage have a similar feature called genetic groups.

Margaret O'Brien