Do you want to convert a GEDCOM file into a spreadsheet with every person nicely formatted in rows?
This tutorial walks through how to do this for free. You’ll be working with a script or piece of code, but don’t worry if you’re not familiar with programming.
I’ll provide the script in a way that you only need to change one line to tell it where to find your GEDCOM file.
Then follow my instructions to run the amended script and watch as it creates a spreadsheet from your GEDCOM.
If it all seems too daunting, we offer a done-for-you service with lots of extra spreadsheet features.
You can open the spreadsheet in Microsoft Excel or import it into Google Sheets or Apple Numbers. I’ve tested it with all three.
I also expect it will import into the common open-source spreadsheet applications. There are no macros or internal code within the spreadsheet – it’s pure data in rows and columns!
YouTube Version Of This Tutorial
If you prefer a video walkthrough, here you go:
The video follows the same sequence as this article, although I provide a little more background information about Python when I’m talking.
You can use the two formats together as you follow the instructions.
What The Spreadsheet Looks Like
This picture has most of the columns (I used a GEDCOM of fictitious names).
I like to include the parents with each person. Of course, the parents get their own entry in the spreadsheet too.
I find that the extra info helps with my Irish heritage. My spreadsheet is full of very similar names born in the same year and in the same village!
If you’ve ever looked inside a GEDCOM file, you’ll know that each person is assigned a unique identifier that looks like a long number.
The spreadsheet also includes those IDs – but they’re just out of the picture above).
Python – But Really Simple
I often use Excel macros to work with data. But I couldn’t get a macro to cope with converting the GEDCOM data format to a spreadsheet.
Thankfully, I found that other people had done some of the heavy lifting – but not with macros.
Specifically, several Python developers have provided scripts to work with GEDCOM files.
If you don’t know what Python is – it’s a programming language.
But don’t worry if you’ve never encountered Python or have no background with programming.
I walk you through using a Google website to run the script that I give you.
That means you don’t need to install Python or anything else on your local machine.
Want to dive into the water?
If you want to know more on the Python package I’m using, this is the link to the documentation.
If you have some Python experience or would like to learn a bit, then I’ve got a separate tutorial for you.
The other tutorial achieves the same result as this one, but I walk you through getting a Python environment set up on your local machine.
I haven’t finished writing it! But check back here within a day or two. I’ll have a link to it very soon.
Step 1: Get The Python Script
If you’re on our newsletter list, you’ve probably received an email with a link to download the script.
If not (or you can’t find the email), just enter your email address in the form below. Be sure to confirm your email address and you will receive the link to download the script.
You can unsubscribe from our newsletter at any time.
But I will be providing new tutorials on doing fun stuff with your GEDCOM spreadsheet (like making a word cloud). The subscription will let you know when that one is ready.
Step 2: Have A GEDCOM File
You will need to upload your GEDCOM file to a Google-owned website.
I’m assuming that you have one on your local machine.
If your tree is on Ancestry or MyHeritage, here are links to tutorials to download it.
Step 3: Log Into Google Colab
Google Colab is a website and service from Google that lets you run Python scripts on the Google servers.
Follow this link to access the site.
It’s free. But you do need to have a free Google account to access the service.
If you have a Gmail account or use Google Sheets, then you already have one. If you don’t, the site will take you through the process of creating one when you try to sign in.
Click on the “Sign in” link (top right) to access Colab and use your Google account details.
The default Colab page may seem very busy. Ignore everything except the sign-in button at this point.
Step 4: Create A New Notebook
A pop-up screen will appear. Ignore everything and click on the blue “New notebook” link at the bottom right of the screen.
Think of a notebook as a project area for your work.
Step 5: Change The Name Of Your Project (Notebook)
By default, your new notebook is called “Untitled0.ipynb”. You’ll see that at the top left of the screen.
Let’s change that to something more helpful.
This step is optional!
But if you are going to tinker a little more with the script, and possibly create your own versions – then you should give this notebook a descriptive name.
Click on the title and edit it to something like “GedcomToSpreadsheet.ipynb”.
Be careful not to change the extension (the bit after the dot).
Step 6: Install A Gedcom Package
I mentioned that other people have provided Python code to help deal with GEDCOM files.
My companion video goes more into the background of working with packages of Python code, although I keep it at beginner level.
This article would get very long if I give a lot of context. Instead, I’m going to tell you exactly what to type into Google Colab.
Right now, you see a long white rectangular box on the screen. There is a “run” button to the left of the box.
Place your cursor into this box and copy or type this:
!pip install python-gedcom
Yes, this starts with an exclamation mark.
When you have finished typing, click the run button to the left of the box.
The button should change to a spinning icon as the website goes to work for you.
It may take a few seconds to finish, but you should see this:
If you don’t see the above success message, you have most likely got a spelling mistake.
By the way, the “box” is known as a “cell” in these notebooks. I’ll use that term from now on.
Step 7: Install A Spreadsheet Package
We now want a second code cell to type into.
There is a button below the top menu that gives us a new code cell. Click the button labeled “+ Code”.
You get another rectangle. Type or copy this text exactly:
!pip install xlsxwriter
Press the run button to the left of the cell and wait for the code to complete.
Your screen should look like this (check for success under the second cell).
Wait…is that Excel?
You’ve probably spotted the “xlsx” in what we typed.
Don’t worry if you don’t have Microsoft Excel. You don’t need it to get your spreadsheet into Google Sheets (or another spreadsheet application).
Step 8: Upload Your GEDCOM File To The Notebook
The notebook has a files area where we can upload our GEDCOM.
- Click on the files icon on the left of the screen.
It’s kinda small, but I’ve marked it with a big red arrow in this picture:
- Click on the “sample_data” folder
Google Colab is for educational purposes and they provide sample data files to play with.
You can ignore all these. We are going to upload our own.
When you hover your mouse over the “sample_data” line, you will see an ellipsis (three vertical dots) appear to the right.
- Click the ellipsis beside the “sample_data” folder.
- Choose “Upload” from the drop-down menu (it’s the first option).
- Click “OK” on the warning message.
The warning message is telling you that the file you upload will be eventually deleted. For privacy reasons, that’s a good thing.
You don’t get a message telling you that the file has uploaded successfully. Depending on the size of your GEDCOM, it should be fast.
There were six files in the sample_data folder. You should simply see a seventh appear (in alphabetical order). It will be the only file with the “.ged” extension.
Mine is in the picture below.
Step 9: Paste The Python Script Into A Code Box
At this point, you should have the Python script downloaded to somewhere you can access.
- Give yourself a new (third) code cell by clicking on the “+Code” button under the top menu.
- Open the Python script with notepad (on Windows) or another simple text editor.
- Copy the entire contents of the Python script (ctrl-A and ctrl-C will put the contents into your clipboard).
- Put your cursor into the third code cell and paste the script contents.
Google Colab will scroll you down near the bottom end of the script. I have made the last line to read “# LAST LINE”.
When you scroll up to the top, you should see “# FIRST LINE”.
I put in those lines just so that you know that you have the full contents of the script in the code cell.
Step 10: Change One Line Of The Script
I promised that you only had to change one line of code.
That is the sixth line of the file. The picture below has a big red arrow pointing to it:
This line tells Google Colab where to find your GEDCOM script.
Remember, you have already uploaded it to the sample_data folder.
File paths can be tricky so we are going to let Google Colab tell us the exact details to use.
If the file area is collapsed then click on the File icon again to show the files.
Click into the sample_data folder and hover your cursor over your GEDCOM file. You want to see the ellipsis (three dots) appear beside the file.
Click on the ellipsis and choose “Copy path” from the dropdown menu.
Now, replace what’s inside the quotation marks with what you’ve copied. Be sure not to remove the quotation marks.
My GEDCOM file is named “demo.ged” and the amendment looks like this:
input_file = ‘/content/sample_data/demo.ged’
Notice that the slashes are going forward. That’s just the way it is!
Step 11: Run The Script
Now for the moment of truth.
Click the run button for this cell (it’s to the left of the top line of the script).
Scroll down to the end and watch the output window.
You should see some names appearing.
If you see an error message, you’ve most likely made a mistake with copying the file path. Did you erase one of the quotation marks?
One drawback with Google Colab is that it’s not always clear when the script has finished processing.
Take a look at the very bottom of the screen. You should see a tiny green tick and some text saying the code has completed.
But the real proof is the appearance of the spreadsheet in the sample_data folder.
You should see an extra file with the extension “.xlsx” in the folder.
The spreadsheet has the same name as your GEDCOM, but with the xlsx extension. It should be listed right under your GEDCOM file.
Step 12: Download The Spreadsheet
- Hover over the spreadsheet in the sample_data folder and click the ellipsis.
- Choose the “Download” option.
- Use the pop-up window to choose where you save the file.
If you use Microsoft Excel, then you’re all set. Open your new spreadsheet and away you go.
You’ll notice that I haven’t put a header in the spreadsheet. Jump to step 14 and I’ll give you the column headings.
Step 13: Optional – Upload To Google Sheets
To upload a “.xlsx” file to Google Sheets, follow these steps:
- Create a blank spreadsheet
- Expand the File menu and choose “Import”.
- Use the Upload tab and find your spreadsheet.
- Click the “Import data” green button.
You’ll notice that I haven’t put a header in the spreadsheet. Go to the next step for the column headings.
Step 14: Optional – Add A Header
These are the column headings:
- First Name
- Last Name
- Birth Date
- Birth Place
- Death Date
- Death Place
- Father’s Name
- Mother’s Name
- GEDCOM ID for Person
- GEDCOM ID for Father
- GEDCOM ID for Mother
More Python Fun
If you want to run another easy Python script, we’ve got one that generates a word cloud from the spreadsheet you’ve just created.
Check out our tutorial on word clouds for family trees.
Our “Done-For-You” Service
We offer a service that converts your GEDCOM into a set of useful spreadsheets. Aside from the main list of persons, we also provide:
- word clouds showing surname frequency in your tree.
- a pedigree list of your direct ancestors in the GEDCOM.
- generated family trees formatted in spreadsheets that print on a single page.
You can get full details on the service here along with a demo video that shows every generated spreadsheet.
22 thoughts on “How To Convert A GEDCOM File To Spreadsheet (Script Provided)”
Thank you for your thorough instructions. Worked perfectly
great to hear!
Thank you for this. This was so easy to do. When I first read the instructions and listened to your you tube video, I thought to myself there is no way I’m going to do all that. It took less than 2 mins from start to finish. I would have liked the extras but I don’t use PayPal.
p.s. can you pretty please do a leads method one for Mac
It’s a relief to hear that people can follow it!
About the Mac – I don’t have Apple scripting knowledge so I wouldn’t do one for Numbers. But I have been tinkering with a version in Google Sheets which presumably Mac users could access. I’ll update the newsletter if I can get it working!
Thank you for the feedback about PayPal. I’ll probably change how I word that in the offer – because people can pay with all kinds of methods (e.g. credit/debit card) via PayPal. It’s just that using them means that my website doesn’t have to hold customer payment details, which is better for everybody!
Anyway, I’ll be showing how to get the word cloud in future, that should be even easier to run once you’ve used Colab once!
Thank you for your reply. Google sheets leads method would be fantastic xx
I am delighted to see an easy method to parse a GedCom file compared to the method I used to use. Thank you very much.
I tried to add the code to get marriage data but got an error ” NameError: name ‘marriage_date’ is not defined”
In the try section I added:
marriage_date, marriage_place, marriage_sources = element.get_marriage_data()
And the area showing the error is:
96 worksheet.write(person_count, 13, gender)
—> 98 worksheet.write(person_count, 14, marriage_date)
99 worksheet.write(person_count, 15, marriage_place)
It would seem I don’t have the correct syntax for marriage or it is not coded into the parser.
Possibly burial date and place suffers the same issue.
Any suggestions please
What other data does the parser handle?
Hi Eric, you can get the documentation here on PyPi.
If you search for “gedcom” on the PyPi site, you’ll see some alternatives. But I saw a blog post from someone else using the basics of this package, so I ran with it.
Marriages aren’t part of the INDI element, but you can retrieve them by passing it to the parser within the loop. This returns a list for me on a small test:
marriages = gedcom_parser.get_marriages(element)
In contrast, burial and census data are part of the element. So, the call would be like this:
burial_data = element.get_burial_data()
I realize that I explained more about the package and documentation in the video than in this article. I’m about to put an extra paragraph in the article so people can go look at the package (and to make clear that I didn’t write it!)
I’ve tried many variations to get the marriage code to work:
marriage_date, marriage_place = gedcom_parser.get_marriages(element)
Would you explain where it should go and what is the syntax for the date and place.
Not every individual is married and the marriage data is under the FRIN tag in the GedCom.
Thanks for your help
I added a marriage to a tiny sample tree in Ancestry.com and the exported GEDCOM section looks like below i.e. the details are under a MARR tag. I’ll send you an email tomorrow with the sample GEDCOM and the script with the extra lines.
2 DATE 1853
2 PLAC GA
Hi Eric – I sent an email to the address I see behind this comment (it starts with joseph).
Hi Margaret, Great work, thanks. It ran my Ancestry.ged file successfully, but I’m having no success with Myheritage.ged file. Yes, the file path is correct but it is generating a couple of decode errors. Unfortunately I have no experience to interpret these. Am happy for you to process if possible so I have uploaded it. More than happy to pay your very reasonable charge. – thanks.
Interesting about MyHeritage, I haven’t experienced a problem with their ged files yet.
I saw your file come in, I’ll run it through the processing and examine for any structural issues.
I expect I can fix issues. If I can’t, there’ll be no charge.
I’m not able to download your gedcom.py Any suggestions?
I sent a copy of the script to your email. Also I had a few questions in the email to try and trouble-shoot – thanks for alerting me to a problem.
Very interesting, plus excellent demo – thank you!
I was surprised just how quickly the new xlsx file was generated considering I have almost 6,000 entries.
I did at first think it very strange, that the initial results seem almost completely ‘Last Name’ alpha-sorted, but with very few exceptions. Then I realised it was sorted on the 1st ‘pointer’ column. I guess only MyHeritage would understand why only that few seem misplaced?
Thank you for the feedback about the timing. My own GEDCOM has under 1K entries so it’s great to hear results with a much larger file.
The full list is based on stepping through the GEDCOM file, line by line. So the order is based on how MyHeritage prepares and exports the file. So, you’re right – I can’t even speculate!
I have signed up for the newsletter, but I am not getting the python script.
Since there seems to be problems with the newsletter thingy, perhaps you could consider sharing it anyway and letting people sign up if they really want to? Just a thought… 🙂
Would it be possible for you to send me the script?
I sent that to your email address.
Great script, thank you.
Is it possible to parse Burial_Notes at all?
Yes, the python-gedcom package has a function called “get_burial_data”. I haven’t used it myself, but it sounds like what you want.
Is this reversable ie can the file produced be converted back to a gedcom
that at some future date could be uploaded to MyHeritage Findmypast Ancestry etc?
No, not this process. And I haven’t seen a tool out there that can go from spreadsheet to GEDCOM.