This article examines how accurate Ancestry hints are by taking a detailed look at how they work. As well as explaining how Ancestry generates hints, we describe how the system can go wrong.
Many Ancestry customers have a love-hate relationship with the green leaf hints, which can provide invaluable insight or be wildly inaccurate. We’ll explain the scenarios in which they’re more likely to lead you astray.
What Do Hints Mean On Ancestry?
As you create and update entries in your family tree, Ancestry starts looking for potential matching data in its databases. You’ll see hints to this data in three places:
- The leaf icon on a tree entry
- A list of all hints from the Tree menu
- Potential parents of your tree entries inserted as glowing icons
As you can see from the middle graphic, there are four types of hints. You get links to records, images, stories, and to entries in other public trees that Ancestry thinks may be connected to the person in your tree.
Where Do Ancestry Hints Come From?
Have you noticed that a hint often appears as soon as you save a new entry in your tree? The system works in real time, which is an impressive technical feat.
At the time of writing, Ancestry has over thirty-two thousand record collections, containing twenty-four billion records. How can it possibly find instant matches across all that data?
The answer is that it doesn’t. Hints are restricted to a specific subset of record collections. But which ones?
The Top 10 Percent
Ancestry pulls hints from the top 10% of its record collections (also referred to as databases). This 10% figure is often quoted in discussion forums, but is there a recent verifiable source? I’ll go with long-standing Ancestry genealogist Crista Cowan, who tweeted the figure in 2019.
So, the next question is: which databases? Ancestry doesn’t list exactly which record collections are covered by hints. We can guess from experience that the collections cover census records, birth, marriage, and death collections, and of course public family trees.
If you’re a numbers person, then you may be thinking that 10% of 24 billion records isn’t a great slice of the pie. But we’re not talking records here, we’re talking 10% of record collections or databases. Cowan says that these top collections account for “well over two thirds of our records.”
That still leaves a substantial slice of the pie out of reach of hints, so be sure to make active searching a big part of your tree-building strategy.
How Do Ancestry Hints Work?
It’s important to understand that Tree Hints do not result from a computer performing the same kind of searches that you make using the Ancestry Search forms.
Tree Hints are generated from the records, photos and stories that are already attached to other public family trees. Ancestry compares the persons in your tree to entries across its public tree collection with attached sources. If it thinks that another tree has the same person as in yours, it may offer links to the associated records, photos, and stories in that tree.
A senior Ancestry executive put it like this in an interview in 2014: “Hopefully, there is someone out there from one side of your family that already has a lot of information. It is sort of a crowd-sourced model.”
This crowd-sourcing aspect does mean that the quality of hints will vary. If several tree owners enter the same inaccurate facts and sources for your relative, the generated hints will simply be wrong. I’ll address the level of accuracy a little later.
Another aspect to be aware of is that the associated sources must be from within Ancestry collections. Someone may have a very well-sourced tree in which the citation documents were uploaded manually from their personal document storage. Unless they also link to Ancestry records, these tree entries won’t be served up as hints.
Once again, I emphasize the importance of including manual searches when building your tree.
Here Comes The Data Science!
Up to now, I’ve described the hints processing in simple terms: “If it thinks that another tree has the same person as in yours, it may offer links…” You may also see the process described somewhat pejoratively, as with these observations in Ancestry’s own genealogy forum:
“[The] algorithm is simplistic, and largely depends on a record being linked to a profile in a public tree.” (Comment from Ancestry’s forum)
“A hint is simply a notification that the computer has found…someone with vaguely the same name, born around vaguely the same time, and living in vaguely the same area as someone in your tree.” (Comment from Ancestry’s forum)
It’s certainly not simplistic, although it isn’t always right. Ancestry has registered a number of patents covering the data science and technology underpinning the Hints system. I’ll try to give the 101 on a recent patent, to offer some insight into what the system tries to achieve.
The process starts by compiling a list of public trees that contain an entry similar to a person in your own tree. The list accommodates variations in names and dates.
Next, the process runs a set of pairwise comparisons between your tree and each of the trees on the list. Points are awarded for each type of similarity found.
Let’s take an entry of John Swyfte in Tree A (yours) and Jonathan Swift in Tree B (some random user).
Tree B gets a less than perfect score in the first and last name categories. But these are known name variants, so it’s still scoring fairly highly.
Extra points are awarded based on statistics for name uniqueness. So, John/Jonathan get meagre points for occurring so frequently in western censuses, but Swyfte/Swift will do better than a Smythe/Smith comparison.
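A minimal Python sketch can make the name scoring concrete. It is not Ancestry’s code: the variant groups, the name frequencies, the 0.8 credit for a known variant and the log-based rarity bonus are all illustrative assumptions, chosen only to reproduce the ordering described above.

```python
import math
from difflib import SequenceMatcher

# Hypothetical variant groups; a real system would use a far larger table.
NAME_VARIANTS = [
    {"john", "jonathan", "jack"},
    {"swift", "swyfte"},
    {"smith", "smythe"},
]

# Hypothetical relative frequencies in census data, for illustration only.
NAME_FREQUENCY = {
    "john": 0.05, "jonathan": 0.01,
    "smith": 0.01, "smythe": 0.001,
    "swift": 0.0005, "swyfte": 0.0001,
}
DEFAULT_FREQUENCY = 0.001  # assumed frequency for names not in the table


def name_similarity(a: str, b: str) -> float:
    """1.0 for an exact match, 0.8 for a known variant, else a reduced spelling ratio."""
    a, b = a.lower(), b.lower()
    if a == b:
        return 1.0
    if any(a in group and b in group for group in NAME_VARIANTS):
        return 0.8  # less than perfect, but still scoring fairly highly
    return 0.5 * SequenceMatcher(None, a, b).ratio()


def rarity_bonus(a: str, b: str) -> float:
    """Extra points for rare names: -log10 of the more common name's frequency."""
    freq = max(NAME_FREQUENCY.get(a.lower(), DEFAULT_FREQUENCY),
               NAME_FREQUENCY.get(b.lower(), DEFAULT_FREQUENCY))
    return -math.log10(freq)


def name_score(a: str, b: str) -> float:
    """Similarity scaled up by how unusual the names are."""
    return name_similarity(a, b) * (1.0 + rarity_bonus(a, b))


for a, b in [("Swyfte", "Swift"), ("Smythe", "Smith"), ("John", "Jonathan")]:
    print(a, b, round(name_score(a, b), 2))
```

With these made-up frequencies, Swyfte/Swift scores about 3.44, Smythe/Smith 2.4 and John/Jonathan about 1.84. Using a logarithm means each tenfold drop in a name’s frequency adds the same fixed bonus, which is one plausible way to reward rarity.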
Dates and locations get their own scores. Dates in particular are given a wide leeway in comparisons.
Historical migration patterns are also considered when comparing locations. This might explain an avalanche of U.S. city directory hints I received for some of my Irish ancestors who hadn’t set foot outside their native shores.
If the process stopped right there, then it could be described as simplistic. But the next steps were developed through using machine learning to mimic how an expert genealogist would go about answering the same question: how likely is it that these two tree entries are the same person?
For some comparisons, the evaluation is widened to include spouses, parents, and children. So, the mothers of John Swyfte and Jonathan Swift are compared in terms of name, location, and birth and death dates. Each of these maternal comparisons is also allocated a similarity score. Add in other family members, and the patent mentions that up to 400 features may be assessed across the two trees.
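The widening to relatives, and the step where a learned model turns many similarity scores into a single judgement, might look something like the sketch below. Every name in it is hypothetical: the Person class, the tiny variant map, the 20-year date leeway and the equal weights in match_probability stand in for features and weights that a real system would learn from expert-labelled examples.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical variant map: each spelling maps to a canonical form.
CANONICAL = {"jonathan": "john", "swyfte": "swift"}


def canon(name: str) -> str:
    return CANONICAL.get(name.lower(), name.lower())


@dataclass
class Person:
    given: str
    surname: str
    birth_year: Optional[int] = None
    death_year: Optional[int] = None
    place: Optional[str] = None
    relatives: Dict[str, "Person"] = field(default_factory=dict)  # e.g. {"mother": ...}


def year_similarity(a: int, b: int, leeway: int = 20) -> float:
    """Wide leeway: full credit for equal years, fading to zero `leeway` years apart."""
    return max(0.0, 1.0 - abs(a - b) / leeway)


def person_features(a: Person, b: Person, prefix: str = "") -> Dict[str, float]:
    """One feature per attribute that both entries record."""
    feats = {
        prefix + "given": float(canon(a.given) == canon(b.given)),
        prefix + "surname": float(canon(a.surname) == canon(b.surname)),
    }
    for attr in ("birth_year", "death_year"):
        x, y = getattr(a, attr), getattr(b, attr)
        if x is not None and y is not None:
            feats[prefix + attr] = year_similarity(x, y)
    if a.place and b.place:
        feats[prefix + "place"] = float(a.place.lower() == b.place.lower())
    return feats


def tree_features(a: Person, b: Person) -> Dict[str, float]:
    """Compare the two entries, then every relative role present in both trees."""
    feats = person_features(a, b)
    for role in a.relatives.keys() & b.relatives.keys():
        feats.update(person_features(a.relatives[role], b.relatives[role], role + "_"))
    return feats


def match_probability(feats: Dict[str, float]) -> float:
    """Stand-in for a trained model: equal hypothetical weights, squashed to 0..1."""
    z = sum(2.0 * f - 1.0 for f in feats.values())
    return 1.0 / (1.0 + math.exp(-z))


# Made-up entries for illustration.
mary_a = Person("Mary", "Kelly", 1675, place="Dublin")
mary_b = Person("Mary", "Kelly", 1675, place="Dublin")
anne = Person("Anne", "Byrne", 1650, place="Cork")
john = Person("John", "Swyfte", 1700, place="Dublin", relatives={"mother": mary_a})
jonathan = Person("Jonathan", "Swift", 1702, place="Dublin", relatives={"mother": mary_b})
jonathan_other = Person("Jonathan", "Swift", 1702, place="Dublin", relatives={"mother": anne})

print(round(match_probability(tree_features(john, jonathan)), 3))        # mothers agree
print(round(match_probability(tree_features(john, jonathan_other)), 3))  # mothers conflict
```

With these made-up entries, the match probability is close to 1.0 when the mothers agree and falls to about 0.45 when they conflict. Each extra relative adds up to five features, which hints at how the count can climb into the hundreds across a well-populated tree.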
Finally, the patent refers to how the system incorporates feedback into the scores. That’s feedback from you and other Ancestry users. You know the drill: yes, no, maybe.
Suppose the owner of Tree B had gone down the wrong rabbit hole and attached erroneous birth, marriage, and death records for a Jack Swift to the Jonathan Swift entry. If ten irritated Ancestry users hit “No” to reject these record hints, the system should add some negative scores to Tree B.
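One simple way such feedback could feed back into scoring is sketched below, assuming votes are tallied per source tree. The weight and prior values are hypothetical tuning knobs and “Maybe” votes are simply ignored; the patent’s actual mechanism may well differ.

```python
def feedback_adjusted(base_score: float, yes: int, no: int,
                      weight: float = 0.5, prior: int = 5) -> float:
    """Nudge a hint score by net yes/no feedback on the source tree.

    `weight` and `prior` are hypothetical tuning knobs. `prior` damps the effect
    until enough users have voted, so one stray "No" barely moves the score.
    "Maybe" votes are ignored in this sketch.
    """
    return base_score + weight * (yes - no) / (yes + no + prior)


print(round(feedback_adjusted(0.8, yes=0, no=10), 2))  # ten irritated users
print(round(feedback_adjusted(0.8, yes=0, no=1), 2))   # one stray "No"
```

Here ten “No” votes pull a score of 0.8 down by a third, to about 0.47, while a single stray “No” costs only about 0.08. The prior keeps one irritated user from sinking an otherwise good tree.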
At this point, you’re probably wondering: how could such a sophisticated system possibly go wrong?
How Accurate are Ancestry Hints?
Ancestry doesn’t publish statistics for the “Yes”, “No”, “Maybe” feedback, so it’s difficult to assess the general accuracy of hints.
A thread on Quora poses this question, and the answers range from “overwhelmingly inaccurate” to “wrong at least as often as they are correct” to “80% correct” to “extremely reliable”. In other words, mileage varies.
Experienced American genealogists tend to report high levels of accuracy, while emphasising the need for careful evaluation of each hint. Randy Seaver said his hints were 80-90% correct. And more recently, Roberta Estes wrote about her efforts to get more hints in her tree.
I’m certainly not hitting 80% levels, but I suspect that’s because many of my sources for Irish relatives are uploads I’ve acquired from third-party archives. Ancestry is basically blind to these, so they do nothing to nudge the scoring algorithms towards greater accuracy.
My conclusion is that the more you research and add Ancestry sources to your tree, the higher the quality of hints you’ll receive.
Ancestry may fall victim to the negativity effect, where bad experiences outweigh successful encounters. I’ll put my hand up here. A record hint popped up on the profile of a cousin of mine who was born in the 1970s. The record was for a man of the same name who died in Flanders in the First World War. I suppose it’s the context of death that I found so jarring (my relative is alive and well). Otherwise I mightn’t have griped about it to whoever would listen!
So, there’s no doubt that some percentage of hints served up to you will be inaccurate. But why?
Why Are Some Ancestry Hints Inaccurate?
Let’s run through a few reasons, so we can at least understand what’s going on.
Multiple coincidences score too highly
Take the example of my cousin. Not only did the names and birth places match the erroneous record, the name of the young casualty’s father was also a match. Remember, Ancestry is totting up points for each similarity and this record is doing well on both the person and other family members.
But the birth dates were 80 years apart, which should have been a huge negative drag on the overall score. I’ll speculate that Ancestry puts a low weighting on the reliability of dates entered by its users, which is how this hint got through. But there’s no denying that a gap that wide should have dropped this record down the list. I consider that a bug.
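To see how a low date weighting could let this hint through, here is a minimal sketch of additive scoring. The feature names, the birth years 1975 and 1895 (picked only to give an 80-year gap), the weights and the 0.7 display threshold are all hypothetical; the point is the arithmetic, not Ancestry’s real values.

```python
def year_similarity(y1: int, y2: int, leeway: int = 20) -> float:
    """Full credit for equal years, fading linearly to zero `leeway` years apart."""
    return max(0.0, 1.0 - abs(y1 - y2) / leeway)


def hint_score(features: dict, weights: dict) -> float:
    """Weighted average of per-feature similarities, each between 0 and 1."""
    total = sum(weights.values())
    return sum(w * features.get(name, 0.0) for name, w in weights.items()) / total


THRESHOLD = 0.7  # hypothetical cut-off for showing a hint

# Hypothetical comparison: everything matches except an 80-year birth gap.
cousin_vs_casualty = {
    "given": 1.0,
    "surname": 1.0,
    "birthplace": 1.0,
    "father_name": 1.0,
    "birth_year": year_similarity(1975, 1895),  # 80 years apart, so 0.0
}

LOW_DATE_WEIGHT = {"given": 1, "surname": 1, "birthplace": 1, "father_name": 1, "birth_year": 0.5}
HIGH_DATE_WEIGHT = {**LOW_DATE_WEIGHT, "birth_year": 3}

for label, weights in [("low", LOW_DATE_WEIGHT), ("high", HIGH_DATE_WEIGHT)]:
    score = hint_score(cousin_vs_casualty, weights)
    print(label, round(score, 2), "shown" if score >= THRESHOLD else "suppressed")
```

With a date weight of 0.5, the perfect matches on names, birthplace and father’s name give 4.0 out of 4.5, about 0.89, comfortably above the threshold. Raise the date weight to 3 and the same record scores 4 out of 7, about 0.57, and the hint never appears.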
Bugs and glitches
The Hints system is a complex mix of rule-based algorithms and machine learning. My experience is that sometimes hints drop markedly in quality, and then at some point the bad ones go away.
I put that down to the introduction of a bug in the software, which is subsequently fixed.
Not enough feedback
If every tree owner reviewed every hint and provided yes/no/maybe feedback, then the software and models would increasingly self-correct towards a much higher level of accuracy. That’s the essence of machine learning.
So, it’s our own fault? Definitely not.
The alternative is for the system owner to employ as many real-life genealogists as it takes to assess large samples of hints frequently. They would do the yes/no/maybe grunt work that we don’t want to do. There certainly are teams of genealogists involved in this system, but I suspect not enough to reach the level of accuracy that we tree owners would prefer.
The patent for Tree Hints describes evaluating up to 50 features when comparing the family members of the target person to the hint record. That’s splendid, but you may have noticed hints for census records that should have been kicked out quickly. There may be four children with similar names, but another four on the record have no bearing on your tree.
The 2019 patent clearly states that the number of evaluated children is limited to three. I understand that for near-instant hints, the process has to stop somewhere. But it’s worth knowing that even when there is plenty of additional information that could be checked, there are limits to the lengths Ancestry will go to. These limits mean that some records which ought to be rejected for conflicting information slip through.
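The effect of a cap on the number of children can be shown with a short sketch. The cap of three comes from the patent as described above; which three children get compared is an assumption (here, simply the first three listed on the record), and the names are made up.

```python
from typing import List, Optional

MAX_CHILDREN = 3  # the limit the article attributes to the 2019 patent


def children_score(tree_children: List[str], record_children: List[str],
                   cap: Optional[int] = MAX_CHILDREN) -> float:
    """Share of compared record children who also appear in the tree.

    Assumes (hypothetically) that the first `cap` children listed on the record
    are the ones compared. Pass cap=None to compare every child.
    """
    known = {name.lower() for name in tree_children}
    compared = record_children[:cap]
    if not compared:
        return 0.0
    return sum(name.lower() in known for name in compared) / len(compared)


# Made-up household: four children match the tree, four do not.
tree_children = ["Mary", "Patrick", "Bridget", "John"]
census_children = ["Mary", "Patrick", "Bridget", "John",
                   "Thomas", "Ellen", "Michael", "Catherine"]

print(children_score(tree_children, census_children))            # capped at three
print(children_score(tree_children, census_children, cap=None))  # all eight
```

Capped at three, the record scores a perfect 1.0 because the four unrelated children are never looked at. Compare all eight and the score halves to 0.5.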
Records with minimal information
Which brings us to the many types of records that are so limited that wider evaluation is impossible.
I was looking for an example with limited information, and I chanced upon a hint for one of the many priests in my tree: a Charles Collins. I thought how interesting it would be if this hint were correct.
The match is on name and approximate age. Without additional details like birthplace and parental names, I couldn’t possibly accept this into my tree. It’s important always to check the associated image, but all I could glean was that the sentence was 10 days’ hard labour.
So why am I getting such a limited record? Someone else has a tree entry for a Charles Collins with this record, and it’s likely that our two trees have similar parental names for Charles.
Like court records, city directories give little more than a name and current residence. The U.S. school yearbooks give a name and the address of the school!
And, of course, there are those eras in which fathers were recorded on marriage and death records, but not the mothers.
The point is that often your tree entry has far more information than in the record. The algorithms don’t have 50 features to compare, they have about three. Ancestry has two choices: include these records as hints or throw them out as a waste of time. Ancestry chooses to include them.
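The sparse-record problem can be sketched by scoring only the fields that both the tree entry and the record actually contain. The field names and values below are hypothetical placeholders loosely modelled on the Charles Collins example, not real data, and plain equality stands in for the fuzzier comparisons a real system would make.

```python
from typing import Dict, Optional, Tuple


def compare_available(tree_entry: Dict[str, Optional[str]],
                      record: Dict[str, Optional[str]]) -> Tuple[float, int]:
    """Score only the fields both sides record, and report how many that was."""
    shared = [key for key, value in record.items()
              if value is not None and tree_entry.get(key) is not None]
    if not shared:
        return 0.0, 0
    matches = sum(tree_entry[key] == record[key] for key in shared)
    return matches / len(shared), len(shared)


# Hypothetical placeholder values, not real data.
tree_entry = {
    "given": "Charles", "surname": "Collins", "birth_year": "1850",
    "birthplace": "Example Parish", "father": "Example Father",
    "mother": "Example Mother", "occupation": "Priest",
}
prison_record = {
    "given": "Charles", "surname": "Collins", "birth_year": "1850",
    "sentence": "10 days hard labour",  # no counterpart in the tree, so ignored
}

score, compared = compare_available(tree_entry, prison_record)
print(f"score {score:.2f} from {compared} of {len(tree_entry)} tree fields")
```

The record scores a perfect 1.0, but on only 3 of the 7 fields in the tree entry. Reporting how many fields were compared alongside the score is one way to flag matches that rest on very little evidence.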
Assessing useful collections
I’ll finish here with a tip for those less-than-useful records with minimal information.
If you’ve had a run of bad luck reviewing garbage hints to sparse records, it’s tempting to throw up your hands at the whole system. The trick is to keep notes on which collections are less useful than others.
But remember that the amount of recorded information on marriage and death records may differ between jurisdictions within the same country. Parents may be listed in one state and not another. Take note of the actual collection that was disappointing.
Each hint shows the record collection in the top left corner. For that prison record hint for my relative, the collection is “Gloucestershire, England, Prison Records, 1728-1914.”
Now that I know that this particular collection lacks details of parents and birthplace, I’ll probably skip over its hints in future.
But do make allowances for changes in recording over time. At some point in the mid-20th century, the mothers of spouses start appearing on Irish marriage records. A while later, female spouses get space for their occupation.
This is the first of three in-depth articles on Ancestry tree hints. The others are:
Looking for a full guide to building your Ancestry tree?
Check out our e-book on building your family tree with Ancestry.com. It’s available on Amazon now! Content includes:
- Setting up your DNA-linked tree
- Using your tree to find connections with DNA matches
- Best practices for entering names, dates, and locations
- Strategies for getting the most benefit from Hints
- Tips for using powerful Search features
If you would like to watch some short video tutorials that walk through using Ancestry features step-by-step, browse through the DataMiningDNA YouTube channel.