A word cloud is a visual representation of text. The size of each word in the cloud is proportional to its frequency.
When we use word clouds to represent our family tree, we are usually making one from either:
- the surnames in the family tree
- the places in the family tree
In this tutorial, I show three ways to make a word cloud based on last names in your family tree.
The more times a name appears in the text, the larger it will appear in the word cloud.
Here is one of mine:
Haven’t Word Clouds Gone Out Of Fashion?
Word clouds were very popular for a while. You’d often see them used in newspaper articles to back up whatever point the journalist was trying to make.
However, their popularity has decreased in recent years as people have become more aware of their limitations.
In 2011, a New York Times journalist wrote an eye-catching article on how word clouds could be harmful. Here is just one quote:
“Every time I see a word cloud presented as insight, I die a little inside.” - Jacob Harris
The problem he and others have addressed is that word clouds don’t give context.
So, using them to analyze a presidential speech or a complex piece of writing can be misleading.
Word Clouds And Family Trees
But what about names in family trees?
Well, that’s different. We’re not looking for context or some deep meaning. We just want to know what names appear most in the tree.
And word clouds are great for quickly identifying the most common terms in a large data set. But that’s not all they’re good for.
Three Things To Look For In Your Family Tree Word Cloud
Here are the main insights that word clouds can give:
- mistaken variants you introduced when making your tree
- unexpected frequent names
- “missing” names that should be frequent
Variants and misspellings
I’m not talking here about genuine variants of names. There are two very similar names in my word cloud: Gorman and O’Gorman.
Different families genuinely chose these different variants of the same name. It’s not a result of me mistyping some members into my tree.
But as I’ve been reviewing word clouds for clients of our conversion tool (see the details here), I notice that bigger trees often have these crop up as separate names:
- GORMAN, Gorman
- O’BRIEN, O’Brien
The problem is that some individuals have their surnames completely capitalized and some don’t.
I think this comes from an old genealogy convention that pre-dates software and online trees. It was customary to have surnames in all caps to make them easy to read on the printed page.
But that is considered unnecessary with modern online practices.
So, folk who have been working on their tree for years end up with a mix of both conventions.
When I put together our online GEDCOM->spreadsheet conversion tool, I had to decide whether to “correct” the capitalized names and save them in the spreadsheets in the proper case.
But I decided that it would be more useful for people to see what was in the original GEDCOM file.
If the two biggest names in the word cloud are “GORMAN” and “Gorman”, then maybe it’s time to clean up the tree.
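If you'd rather check for this directly than squint at the cloud, here is a minimal sketch that groups surnames differing only by capitalization. The function name find_case_variants and the sample list are my own illustration, not part of the conversion tool.

```python
from collections import Counter, defaultdict

def find_case_variants(surnames):
    """Return groups of surnames that differ only by capitalization."""
    counts = Counter(surnames)
    groups = defaultdict(dict)
    for name, count in counts.items():
        # casefold() maps "GORMAN" and "Gorman" to the same key
        groups[name.casefold()][name] = count
    # keep only the groups that have more than one spelling
    return {key: spellings for key, spellings in groups.items() if len(spellings) > 1}

# Illustrative list, not real tree data
sample = ["GORMAN", "Gorman", "Gorman", "O'BRIEN", "O'Brien", "Kelly"]
for spellings in find_case_variants(sample).values():
    print(spellings)
```

Genuine variants such as Gorman and O'Gorman are left alone, because they differ by more than case.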
Unexpected frequent names
When I made my first word clouds, I scratched my head at some names that I didn’t recognize but that were beating out the others for size.
This can point to families that have been duplicated entirely in your tree.
Duplication is often a consequence of being too accepting of hints on sites like Ancestry or MyHeritage.
Be wary in particular of census hints in Ancestry and family hints in MyHeritage.
If you aren’t careful about what you accept, those sites can “helpfully” insert far more people into your tree than you intended.
In my case, I was surprised to see “____” appear in the word cloud.
Then I remembered that I used to follow a convention of using four underlines when I was missing a surname. If you’re like me, you have married women in your tree whose maiden names you haven’t tracked down.
But after a year of research, I decided that an empty space was just as valid. The word cloud shows me I’ve been inconsistent with this. That’s something I’ll go back and clean up at some point.
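A quick way to see how inconsistent you've been is to count both conventions side by side. This is only a sketch: the function name and the default placeholder of four underscores are assumptions based on my own habit, so change them to match yours.

```python
from collections import Counter

def count_missing_surnames(surnames, placeholder="____"):
    """Count surnames recorded as a placeholder versus left blank."""
    counts = Counter()
    for name in surnames:
        cleaned = (name or "").strip()  # treat None and whitespace as blank
        if cleaned == placeholder:
            counts["placeholder"] += 1
        elif cleaned == "":
            counts["blank"] += 1
    return dict(counts)

# Illustrative list, not real tree data
print(count_missing_surnames(["____", "", "Gorman", "  ", None, "____"]))
```

If both numbers are above zero, the tree has a mix of the two conventions.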
Names that are too small
If there’s a name in your direct line that is displaying as a tiny speck in the word cloud, then you should maybe take a look at that section of your tree.
Did you fall down a rabbit hole of researching one member of an ancestral pair and forget to do the other half?
Time to roll up your sleeves and start digging!
Now that I’ve looked at the insights you can get from family tree word clouds, it’s time to show you how to make one.
Making A Surname List
Before you can generate a word cloud, you need to extract a list of every person in your tree. You don’t even need their first names – just the surnames are fine.
Unfortunately, the online sites like Ancestry, MyHeritage, or FamilySearch don’t give you that kind of list.
FamilyTreeMaker and RootsMagic are software programs that let you export a person list.
I don’t think that the free version of RootsMagic provides this feature. But you can do it with the free Family Tree Builder from MyHeritage.
You can also do it using the free Python script I provided in a separate tutorial. The tutorial assumes no knowledge of programming or Python.
If you follow the guide, you’ll end up with a spreadsheet of everybody in your tree.
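If you're curious what pulling surnames out of a GEDCOM file involves, here is a rough sketch. It is not the script from the other tutorial. It relies on the GEDCOM convention of wrapping the surname in slashes on the NAME line (for example "1 NAME Mary /O'Gorman/"), and the function name and sample lines are my own.

```python
import re

NAME_LINE = re.compile(r"^1\s+NAME\s+(.*)$")
SURNAME = re.compile(r"/([^/]*)/")

def surnames_from_gedcom(lines):
    """Collect the surname from each level-1 NAME line of individual records."""
    surnames = []
    in_individual = False
    for raw in lines:
        line = raw.strip().lstrip("\ufeff")  # drop a byte-order mark if present
        if line.startswith("0 "):
            # a level-0 line starts a new record; only INDI records are people
            in_individual = line.endswith(" INDI")
            continue
        if not in_individual:
            continue
        match = NAME_LINE.match(line)
        if match:
            found = SURNAME.search(match.group(1))
            # an empty string means no surname was recorded
            surnames.append(found.group(1).strip() if found else "")
    return surnames

# Illustrative lines; for a real file, pass the open file object instead
sample = [
    "0 HEAD",
    "0 @I1@ INDI",
    "1 NAME Mary /O'Gorman/",
    "1 SEX F",
    "0 @I2@ INDI",
    "1 NAME Ann //",
    "0 TRLR",
]
print(surnames_from_gedcom(sample))
```

Be aware that a person with more than one NAME line (an alternate spelling, say) is counted once per line in this sketch.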
Now that you’ve got a list or spreadsheet, I’m going to show you three ways to make a word cloud.
The first method continues on from the tutorial that used Python to convert your GEDCOM to a spreadsheet.
The second method uses a free online website that takes any list of words and generates a word cloud.
The third method uses our done-for-you service.
Method 1: A Python Script To Generate A Family Tree Word Cloud
When you followed our GEDCOM conversion tutorial, you set up what’s called a “notebook” on a free Google website known as Google Colab.
You ended up with a spreadsheet file in the sample_data folder of your notebook.
Here are the steps to use that spreadsheet to generate the word cloud.
Step 1: Prepare your notebook
If you deleted the notebook after working through the other tutorial, I’ll just assume that you downloaded your spreadsheet first.
Create a new notebook and upload the spreadsheet into the sample_data folder.
Step 2: add a new code cell to your notebook
Create a new cell.
Step 3: grab a copy of the Python script
You don’t have to provide an email or log in to get the script.
Just click on this link and grab it.
Step 4: copy-and-paste the script into the code cell
Step 5: edit the fourth line to be the name of your spreadsheet
The fourth line tells the Python script how to find your spreadsheet.
It’s probably not named “Family Tree.xlsx”. Change the name to what you see in the sample_data folder.
Step 6: optional change to skip very infrequent names
It’s common to have a lot of names with one occurrence.
This happens when we include female spouses of blood relatives without wanting to also include their parents and pedigree line.
If you find your word cloud very cluttered with tiny names, you can skip any surname that only has one occurrence.
To do so, scroll down to the section of the script marked “OPTIONAL – SKIP LOW FREQUENCY NAMES”.
The current line of code is this:
minimum_count = 0
To skip single-occurrence names, change it to this:
minimum_count = 1
If you have a massive tree, you can prune it even more. You can change the script to only include names with over four occurrences. The code should look like this:
minimum_count = 4
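To make the effect of this setting concrete, here is a small sketch of the filtering idea. It is my reading of the behaviour described above (a name is kept when its count is greater than minimum_count), not a copy of the script, and the function name surname_frequencies is hypothetical.

```python
from collections import Counter

def surname_frequencies(surnames, minimum_count=0):
    """Count surnames and keep those that occur more than minimum_count times."""
    counts = Counter(surnames)
    return {name: n for name, n in counts.items() if n > minimum_count}

# Illustrative list: Gorman x5, Kelly x2, Byrne x1
names = ["Gorman"] * 5 + ["Kelly"] * 2 + ["Byrne"]
print(surname_frequencies(names))                   # keeps everything
print(surname_frequencies(names, minimum_count=1))  # drops single-occurrence names
print(surname_frequencies(names, minimum_count=4))  # keeps names with over four occurrences
```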
Step 7: run the script
Click the run button beside the code cell and wait for a few seconds.
You should see your word cloud appear beneath the cell. This is a small image for you to look at.
But the code has also saved a larger version of the image into the sample_data folder.
It is saved under the same name as the spreadsheet but with the .jpg extension.
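For the curious, this is roughly how a script can derive that image name and save the picture. Treat it as a sketch rather than the actual script: it assumes the third-party wordcloud package, and the function names and image dimensions are my own choices.

```python
from pathlib import Path

def image_path_for(spreadsheet_path):
    """Same folder and name as the spreadsheet, but with a .jpg extension."""
    return Path(spreadsheet_path).with_suffix(".jpg")

def save_word_cloud(frequencies, spreadsheet_path):
    """Render a {surname: count} mapping and save it beside the spreadsheet."""
    # Third-party package (pip install wordcloud). It is imported here so the
    # path helper above still works where the package is not installed.
    from wordcloud import WordCloud

    cloud = WordCloud(width=1600, height=900, background_color="white")
    cloud.generate_from_frequencies(frequencies)
    target = image_path_for(spreadsheet_path)
    cloud.to_file(str(target))
    return target

# Only the file name is worked out here; nothing is rendered
print(image_path_for("sample_data/Family Tree.xlsx"))
```

Calling save_word_cloud({"Gorman": 5, "Kelly": 2}, "sample_data/Family Tree.xlsx") would write the image next to the spreadsheet, provided the folder exists.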
Step 8: Download the word cloud
Highlight the image file in the folder and click on the ellipsis (three dots).
Choose “download” from the dropdown menu.
Method 2: Online Word Clouds Generator
There are several free websites that will generate a word cloud from a file of words.
The one I tried was wordclouds.com. You don’t need to create an account.
Step 1: Prepare the input file
The only tricky part is to prepare your list of names.
The Python script in the previous method assumes that the names are in the second column of a spreadsheet.
However, to get this word clouds generator to work properly, you should give it a single list of names.
If you have a spreadsheet with multiple columns and a heading, follow these steps:
- open it and remove all columns except for the last names
- delete the first row if it’s a heading
- save it as a .txt file
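If you'd rather not edit the spreadsheet by hand, the same steps can be sketched in Python. I'm assuming here that you've saved the spreadsheet as a CSV file with a heading row and the last names in the second column. The function names and sample rows are hypothetical.

```python
import csv

def surname_column(rows, column=1, has_header=True):
    """Pull one column of names out of CSV rows, skipping blank entries."""
    reader = csv.reader(rows)
    if has_header:
        next(reader, None)  # discard the heading row
    names = []
    for row in reader:
        if len(row) > column and row[column].strip():
            names.append(row[column].strip())
    return names

def write_name_list(names, txt_path):
    """Write one name per line to a plain text file."""
    with open(txt_path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(names) + "\n")

# Illustrative rows; for a real file, pass open("my_tree.csv", newline="")
sample = [
    "First Name,Last Name,Birth",
    "Mary,O'Gorman,1850",
    "John,Gorman,1848",
    "Ann,,1870",
]
print(surname_column(sample))
```

Pass the result to write_name_list along with a file name of your choosing, and you have the .txt file for the next step.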
Step 2: load the text file
Go to wordclouds.com and expand the “Word list” menu.
Choose “Extract words from Text”.
You may have thought that the “Import from csv” option was more suitable. But I found that it didn’t work with a single column of names.
Click the browse link and upload your .txt file.
Click the Apply button on the bottom right of this window.
Your word cloud should render in front of you.
Step 3: experiment with the options
You can add a background shape and set different fonts. I don’t bother with these.
But I do like to change the color scheme.
To do so, open the Colors tab and expand the Themes menu. This gives you a list of various color schemes.
There are lots of other options to explore.
The one thing I don’t think this tool can do is let you set a minimum number of occurrences of names.
If your cloud is too cluttered and you’d like this option, take a look at the previous method.
Or read on for the third method, where we give you two versions of the word cloud: one with no restrictions and one limited to names that occur more than once.
Method 3: Done-For-You Service
We offer a service that converts a GEDCOM file into a set of useful spreadsheets. The package includes:
- a spreadsheet of all persons in your tree (names, birth, and death detail)
- word clouds showing surname frequency in your tree
- a pedigree list of your direct ancestors in the GEDCOM
- generated family trees formatted in spreadsheets that print on a single page
You can get full details on the service here along with a demo video that shows every generated spreadsheet.