If you’ve copied your family tree between different genealogy sites and software, then you’ve probably dealt with a GEDCOM file.
Have you ever wondered how the GEDCOM format came about? This article goes back in history to its origins.
I also take a quick look at why the genealogical data standard seems stuck in time.
1979 – FamilySearch’s Ancestral File
In 1966, The Church of Jesus Christ of Latter-day Saints (LDS) introduced a four-generation family group sheet for members to send their pedigrees to Church headquarters.
Here’s an example form – it’s a typical pedigree sheet.
Over the next decade, awareness of quality issues grew. Different family sheets understandably contained the same ancestors, but with different dates or spelling variations.
The Church sent out a new call for submissions in 1979.
This time, the emphasis was on increased accuracy. Families were asked to collaborate before sending in well-researched sheets to HQ.
The Church’s genealogy department built processing software that merged the information across multiple sheets.
The goal was to produce a single "accurate" record for each individual and a "correct" set of relationships.
Growth of Ancestral File
The merged records were held in a new database called Ancestral File.
By 1990, there were 7 million individuals in the Ancestral File. That grew to an impressive 35.6 million by 1999.
Problems with Ancestral File
It’s fair to say that there was undue confidence in the data quality of submissions and the ability of the early computerized programs to deal with messy incoming data.
Problems were compounded by a decision not to include notes and sources in the Ancestral File.
Where is it now?
You can still search and view the entries in the Ancestral File. Here’s the link on FamilySearch.org if you’re curious.
You’ve probably realized that the contents come with a health warning.
FamilySearch no longer allows the submission of corrections to the records. It should really be treated as a museum piece.
So why did I start with it? Because the database brought about GEDCOM.
Early 1985: GEDCOM 1.0
The genealogy department of the LDS struggled to keep up with the rate of paper submissions.
They had several data-entry programs to get information into the Ancestral File database. But paper forms meant lots of typing.
The LDS wasn’t the only show in town when it came to genealogy software in the 1980s.
There was already plenty of discussion around a common data format that allowed the exchange of family tree information between different systems.
But the LDS was primarily interested in solving the problem for the Ancestral File database.
We can label their first short-lived format as GEDCOM 1.0.
One of the main differences between this early version and the next was the use of two-character tags instead of four.
Here’s an example I mocked up for Teddy Roosevelt.
This is based on a far more extensive example provided by Philip Brown, who wrote perhaps the only program that used the format.
1986/87: PAF 2.0 (Personal Ancestral File), GEDCOM 2.0 and 3.0
PAF was a program developed by the LDS to get family tree data into the Ancestral File. LDS members could get the program for free. It arrived at their homes on multiple floppy disks.
The members keyed in names, dates, notes, and sources into the program.
The first version was released in 1984, but it didn't export data in the GEDCOM format. Instead, it printed out a report that could be mailed to LDS headquarters.
But this didn’t really solve the problem of manual entry into the Ancestral File.
The breakthrough came with the second version in 1987. It used a file format that we can call GEDCOM 2.0. This version introduced the longer, mostly four-character tags that are still in use.
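To make that tag structure concrete, here is a minimal Python sketch of how a single line in the later, GEDCOM 5.5-style layout breaks down: a level number, an optional cross-reference ID such as `@I1@`, a tag, and an optional value. The regular expression, the `GedcomLine` type and the `parse_line` helper are my own illustration, not part of any official tool. The sketch also ignores continuation lines (CONC/CONT) and doesn't validate tag names.

```python
import re
from typing import NamedTuple, Optional

# One GEDCOM line (5.5-style): level, optional @XREF@, tag, optional value.
# Sketch only: no CONC/CONT handling and no validation of tag names.
LINE_RE = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\S+)(?:\s(.*))?$")


class GedcomLine(NamedTuple):
    level: int
    xref: Optional[str]
    tag: str
    value: Optional[str]


def parse_line(line: str) -> GedcomLine:
    """Split one GEDCOM line into level, cross-reference, tag and value."""
    match = LINE_RE.match(line.rstrip("\r\n"))
    if match is None:
        raise ValueError(f"not a GEDCOM line: {line!r}")
    level, xref, tag, value = match.groups()
    return GedcomLine(int(level), xref, tag, value)


# A tiny record for Teddy Roosevelt using four-character tags.
SAMPLE = [
    "0 @I1@ INDI",
    "1 NAME Theodore /Roosevelt/",
    "1 BIRT",
    "2 DATE 27 OCT 1858",
]

if __name__ == "__main__":
    for text in SAMPLE:
        print(parse_line(text))
```

The level numbers are what turn a flat text file into a tree: the DATE line at level 2 belongs to the BIRT event at level 1, which in turn belongs to the INDI record at level 0.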
PAF users could export a GEDCOM file and send it on a floppy disk to headquarters.
With the standardized format and digital form, the genealogy department had much less work importing these files into Ancestral File.
Unfortunately, notes and source information were stripped from the data. Without them, the Ancestral File was always going to be a relative failure (excuse the pun).
But the key here was the GEDCOM 2.0 format. This was also adopted by other programs.
These included the popular FHS (Family History System) developed by Philip Brown. He kept his software up to date as the LDS continued to evolve the GEDCOM format.
For example, GEDCOM 3.0 followed shortly afterward, also in 1987. It improved the way multiple marriages were recorded and linked in the file.
I’m going to skip over GEDCOM 4.0 and some minor releases.
The next notable release was 5.5.
1995/96: GEDCOM 5.5
There were plenty of incremental draft versions of the format between 1987 and 1995.
The mid-nineties saw the widespread acceptance of release 5.5.
Somewhere along the way, Unicode was introduced to the format. This opened the way to non-Western languages (e.g. Chinese). However, uptake by software vendors was limited.
The format extended the address tags to include CITY and multiple address pieces (ADR1 and ADR2).
The 5.5 format also standardized date qualifiers for concepts like "between" (BET ... AND ...) and "about" (ABT). These are keywords inside a DATE value rather than separate tags.
Some genealogy programs were already using these conventions, but 5.5 made them standard.
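As an illustration, here is a small Python sketch of how software might separate those qualifiers from the date itself. It only covers ABT, CAL, EST, BEF, AFT and BET ... AND ... from GEDCOM 5.5. FROM/TO periods, alternative calendars and free-text date phrases are deliberately left out, and the `classify_date` name is hypothetical.

```python
from typing import Optional, Tuple

# Qualifier keywords that can start a GEDCOM 5.5 DATE value (subset only).
APPROXIMATE = {"ABT": "about", "CAL": "calculated", "EST": "estimated"}
OPEN_RANGE = {"BEF": "before", "AFT": "after"}


def classify_date(value: str) -> Tuple[str, str, Optional[str]]:
    """Split a DATE value into (kind, first date, second date or None).

    Sketch only: FROM/TO periods, calendar escapes and date phrases
    are not handled; anything unrecognised is returned as 'plain'.
    """
    text = value.strip()
    keyword, _, rest = text.partition(" ")
    keyword = keyword.upper()
    if keyword in APPROXIMATE:
        return (APPROXIMATE[keyword], rest, None)
    if keyword in OPEN_RANGE:
        return (OPEN_RANGE[keyword], rest, None)
    if keyword == "BET":
        start, sep, end = rest.partition(" AND ")
        if not sep:
            raise ValueError(f"BET without AND: {value!r}")
        return ("between", start, end)
    # No qualifier: an ordinary date such as "27 OCT 1858" or just "1858".
    return ("plain", text, None)
```

Keeping the qualifier separate from the date is what lets a program sort or compare "about 1858" and "between 1850 and 1860" sensibly instead of treating them as opaque text.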
1999: GEDCOM 5.5.1 Draft – Still The Latest Standard
This article is about the early history of GEDCOM. Going up to 1999 counts as early history.
This is when the 5.5.1 draft was released.
One major change in 5.5.1 was the addition of UTF-8, a Unicode encoding that can represent every Unicode character while remaining backward compatible with ASCII. Again, this greatly expands the languages that can be supported.
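In a 5.5.1 file, the encoding is declared in the header with a CHAR line such as `1 CHAR UTF-8`. The Python sketch below reads that declaration and decodes the file only when it is UTF-8 or ASCII. The helper names are hypothetical, and the sketch assumes the header bytes are ASCII-compatible, so it doesn't handle ANSEL (Python has no built-in ANSEL codec) or 16-bit UNICODE files.

```python
# Hypothetical helpers: read the CHAR declaration from a GEDCOM 5.5.1 header
# and decode the file if it is UTF-8 or ASCII.

UTF8_BOM = b"\xef\xbb\xbf"
CODECS = {"UTF-8": "utf-8", "ASCII": "ascii"}  # ANSEL and UNICODE not handled


def declared_charset(data: bytes) -> str:
    """Return the value of the header's CHAR line, e.g. 'UTF-8'."""
    if data.startswith(UTF8_BOM):
        data = data[len(UTF8_BOM):]
    in_head = False
    for raw in data.splitlines():
        # Header lines are plain ASCII; replace anything else rather than fail.
        parts = raw.decode("ascii", errors="replace").strip().split(" ", 2)
        if len(parts) < 2:
            continue
        level, tag = parts[0], parts[1]
        if level == "0":
            if in_head:
                break  # left the HEAD record without finding CHAR
            in_head = tag == "HEAD"
        elif in_head and level == "1" and tag == "CHAR" and len(parts) == 3:
            return parts[2].strip()
    raise ValueError("no CHAR line found in the GEDCOM header")


def read_gedcom_text(data: bytes) -> str:
    """Decode a GEDCOM file using its declared character set."""
    charset = declared_charset(data)
    codec = CODECS.get(charset.upper())
    if codec is None:
        raise ValueError(f"unsupported GEDCOM character set: {charset}")
    # utf-8-sig also strips a leading byte order mark if one is present.
    return data.decode("utf-8-sig" if codec == "utf-8" else codec)


# A hypothetical person with a Chinese name, stored as UTF-8.
SAMPLE = "0 HEAD\n1 CHAR UTF-8\n0 @I1@ INDI\n1 NAME /王/ 小明\n0 TRLR\n".encode("utf-8")

if __name__ == "__main__":
    print(read_gedcom_text(SAMPLE))
```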
You may be surprised to hear that 5.5.1 is still the accepted format across most genealogical software applications.
So, was the genealogy community completely satisfied at this point over twenty years ago? Were all data exchange problems solved with 5.5.1?
Not at all. There have been continual discussions, debates, and proposals by genealogists to try to extend the format.
A reluctant custodian
It’s fair to say that the LDS hasn’t been particularly interested in pushing GEDCOM forward.
At times, they've seemed more inclined to ditch the format in favor of a different direction. For example, they were pursuing something called GEDCOM X in the early 2010s.
Despite the name, “GEDCOM X” wasn’t a twice-as-good version of GEDCOM 5. It was a different format altogether.
But other software providers didn't want to adopt the new format.
And FamilySearch.org doesn’t want to be cut off from the millions of family trees started in Ancestry.com, MyHeritage, Geni, and other popular sites.
1999 falls well short of the halfway point between my (somewhat arbitrary) 1985 starting point and today. Yet GEDCOM history seems stuck there.
Here’s an infographic that summarizes the history: