Centurial and the GEDCOM

Saturday 5 January 2019 by Fouke Boss

A major new feature in versions 1.7 and 1.8 of Centurial is support for GEDCOM. But the specification for GEDCOM is over 20 years old and is not very evidence-based. This blog post documents the design principles for the implementation of GEDCOM support in Centurial.

The GEDCOM format

The first version of GEDCOM was introduced in 1984, and, in some miraculous way, developed into the industry standard for exchanging genealogical information between software applications. The most recent version of the GEDCOM specification, v5.5.1, celebrates its 20th birthday later in 2019. In its basic form, the GEDCOM format is quite simple. As an example, this short snippet is how a person is defined in GEDCOM:

0 @I1@ INDI
1 NAME Vincent Willem /Van Gogh/
2 SURN Van Gogh
2 GIVN Vincent Willem
1 SEX M
1 BIRT
2 DATE 30 MAR 1853
2 PLAC Zundert, The Netherlands
1 DEAT
2 DATE 29 JUL 1890
2 PLAC Auvers-sur-Oise, France
1 BURI
2 DATE 30 JUL 1890
2 PLAC Auvers-sur-Oise, France
1 OCCU painter
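
To make the structure of such a snippet concrete: every line consists of a level number, an optional cross-reference id (like @I1@), a tag, and an optional value. The following is a minimal sketch in Python of how a single line could be split into those parts. It is not Centurial's parser; the names `GedcomLine` and `parse_line` are made up for this illustration, and it ignores details such as continuation lines and escapes.

```python
import re
from typing import NamedTuple, Optional

# level, optional @xref@, tag, optional value (illustrative pattern, not the full grammar)
LINE_RE = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\w+)(?:\s(.*))?$")


class GedcomLine(NamedTuple):
    level: int
    xref: Optional[str]
    tag: str
    value: Optional[str]


def parse_line(text: str) -> GedcomLine:
    """Split one GEDCOM line into level, cross-reference id, tag and value."""
    match = LINE_RE.match(text.rstrip("\r\n"))
    if match is None:
        raise ValueError(f"not a GEDCOM line: {text!r}")
    level, xref, tag, value = match.groups()
    return GedcomLine(int(level), xref, tag, value)


if __name__ == "__main__":
    for raw in ["0 @I1@ INDI", "1 NAME Vincent Willem /Van Gogh/", "1 BIRT"]:
        print(parse_line(raw))
```

The level numbers are what turn the flat list of lines into a hierarchy: a level 2 DATE line belongs to the nearest preceding level 1 line, such as BIRT or DEAT.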

But when I started implementing GEDCOM support for Centurial, I soon found out that working with GEDCOM files involves many, sometimes very intriguing, details. For example, GEDCOM files can be encoded in many different ways. And it turns out there are many versions of the GEDCOM specification, and even more software applications implementing these specifications, each with its own unique interpretation.
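
As an illustration of the encoding issue: a GEDCOM header names its character set on a CHAR line, so a reader has to peek at the raw bytes before it can decode the rest of the file. The sketch below shows one way to do that in Python. It is an assumption-laden example, not how Centurial does it: the function name `declared_charset` is hypothetical, it only looks at the first 4096 bytes, and it will not find the CHAR line in a file stored as UTF-16.

```python
import re
from typing import Optional

# A "1 CHAR <value>" line at the start of the data or after any line break (CR, LF or CRLF).
CHAR_RE = re.compile(rb"(?:^|[\r\n])[ \t]*1[ \t]+CHAR[ \t]+([^\r\n]+)")


def declared_charset(raw: bytes) -> Optional[str]:
    """Return the character set named in the header, or None when there is no CHAR line."""
    if raw.startswith(b"\xef\xbb\xbf"):  # skip a UTF-8 byte order mark
        raw = raw[3:]
    match = CHAR_RE.search(raw[:4096])
    if match is None:
        return None
    return match.group(1).decode("ascii", errors="replace").strip()


if __name__ == "__main__":
    sample = b"0 HEAD\r\n1 CHAR ANSEL\r\n0 TRLR\r\n"
    print(declared_charset(sample))
```

Note that the declared value is only a claim made by the exporting application. Whether the bytes actually match it is a separate question.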

In the end I decided to limit GEDCOM import into Centurial to the specifications of GEDCOM versions 5.5 (1996) and 5.5.1 (1999). First of all, these versions have been around for about 20 years, so most vendors already implement one of the two. Also, there are tools that convert older versions of GEDCOM into version 5.5 or 5.5.1. This way, Centurial will be able to import the research in almost any GEDCOM file.

For export Centurial supports GEDCOM v5.5.1 only.
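
A version restriction like this can be checked before the actual import starts, because the header of a GEDCOM file states its version in a VERS line under the GEDC tag. The sketch below, in Python, shows the idea. The names `gedcom_version`, `is_importable` and `SUPPORTED_IMPORT_VERSIONS` are hypothetical and only serve this example; how Centurial performs the check internally is not described here.

```python
from typing import Iterable, Optional

# The two versions accepted for import, as stated above.
SUPPORTED_IMPORT_VERSIONS = {"5.5", "5.5.1"}


def gedcom_version(lines: Iterable[str]) -> Optional[str]:
    """Return the value of HEAD > GEDC > VERS, or None if it is missing."""
    in_head = in_gedc = False
    for line in lines:
        parts = line.strip().split(maxsplit=2)
        if len(parts) < 2 or not parts[0].isdigit():
            continue  # skip blank or malformed lines
        level, tag = int(parts[0]), parts[1]
        value = parts[2] if len(parts) > 2 else ""
        if level == 0:
            if in_head:
                break  # the header has ended without a version
            in_head = tag == "HEAD"
        elif in_head and level == 1:
            in_gedc = tag == "GEDC"
        elif in_gedc and level == 2 and tag == "VERS":
            return value
    return None


def is_importable(lines: Iterable[str]) -> bool:
    """True when the file declares one of the supported versions."""
    return gedcom_version(lines) in SUPPORTED_IMPORT_VERSIONS


if __name__ == "__main__":
    header = ["0 HEAD", "1 SOUR Demo", "2 VERS 1.0", "1 GEDC", "2 VERS 5.5.1", "0 TRLR"]
    print(gedcom_version(header), is_importable(header))
```

The sketch only accepts a VERS line that sits directly under GEDC, because the header can contain another VERS line under SOUR that holds the version of the exporting application instead.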

The data models of GEDCOM and Centurial differ

GEDCOM defines a data model for genealogical data, and quite a comprehensive one. I soon realised that many genealogical software applications base their data model entirely on the GEDCOM data model. GEDCOM defines a birth date as a property of a person? Then so does the software. GEDCOM defines a WILL event? Then so does the software. GEDCOM makes up an 'about date'? Then so does the software, even when the concept of 'about date' is not clearly defined (as a side note: can someone please tell me if 1 January 1533 is part of 'ABT 1532'? And if so, what about 1 January 1540?).

Some of those software applications really use a one-to-one implementation of GEDCOM. They are perhaps more what one could call a GEDCOM editor than they are genealogy applications. Some of them are very advanced, very easy to use and work beautifully. I’d recommend them to anyone. If you need a GEDCOM editor, that is.

For these GEDCOM editors, it is key that when a project is exported to GEDCOM and then imported again, the resulting project is exactly the same as the original project. And why shouldn’t it be? When the data model is exactly the same as the GEDCOM specification, every detail of the data can be transferred to GEDCOM, and vice versa.

GEDCOM is conclusion-based

But Centurial has a data model of its own. In some areas, like the management of sources and citations, GEDCOM is way less detailed than Centurial, to the point of being conclusion-based in nature. This is especially true when it comes to tracking conclusions back to claims and evidence. Although GEDCOM does allow for source citations to be specified for many entities and properties, it does not do so for all. For example:

  • GEDCOM allows source citations to be specified for the combination of names of an individual only. But what if source 1 contains the correct given names of a person while source 2 contains the correct family name? There is no way to include this nuance in GEDCOM.
  • For some properties, GEDCOM does not allow source citations at all. There is for example no way to specify the evidence for the gender of an individual.
  • And what if source 1 states that persons A and B are partners, while source 2 states that person C is a child of A and B? In GEDCOM, only the conclusion can be transferred that persons A and B are a family with child C. There is no direct way to specify all the different sources for each of these partial relationships.

Goals

GEDCOM is extensible. Some of the software applications that dare to be more detailed than GEDCOM, extend GEDCOM to incorporate all the richness of their data model in their GEDCOM export. That way, when that GEDCOM is then imported again, the resulting project still is exactly the same as the original project.

For Centurial, I decided not to go that way. It is not a goal in itself to ensure that an exported-then-imported project is an exact copy of the original project. Instead I recognised these 3 goals:

  1. When a user has gone to great lengths to add sources and citations to a GEDCOM project, Centurial would like to offer the possibility to import these sources and citations the best it can. Starting from Centurial v1.8, Centurial offers the import as a project feature.
  2. When a user has created a research project in Centurial and wants to import research by another genealogist from a GEDCOM file, Centurial offers the import as a source feature, available from Centurial v1.7. This feature handles all information in the GEDCOM as coming from one single source: the GEDCOM file itself.
  3. A user can export all the information in a Centurial project to GEDCOM, either for migrating to a different genealogical application or for generating reports and diagrams using other software.

Design principles

Based on these goals, I came to the following design principles:

  1. Centurial embraces GEDCOM as a format of its own, with its own pros and cons.
  2. Centurial will support GEDCOM specification v5.5 and v5.5.1 for import.
  3. Centurial will support GEDCOM specification v5.5.1 for export.
  4. It is the responsibility of Centurial to import and export its data model from and to a GEDCOM file as well as it possibly can, even though this can never be done perfectly because the two data models differ.
  5. During import, Centurial is allowed to read application-specific GEDCOM extensions in order to import the data in the best way possible.
  6. During export, Centurial does not write Centurial-specific tags to the GEDCOM file, since only the generic part of the GEDCOM specification is supported by other software applications.
  7. It is not a goal in itself to execute the import-export round trip perfectly. Centurial will, however, try to do this as well as it can.

Implementation of import

Specifying a source for a conclusion is a complicated affair in most software applications, and even if a user has gone to great lengths to do so, it will still not be entirely possible to specify all sources, due to the limitations of the GEDCOM specification mentioned before. Therefore, there will always be information in a GEDCOM file that cannot be attributed to a source.

To remedy this, Centurial will always create a separate source for the imported GEDCOM itself. If a GEDCOM contains n source citations, the imported Centurial file will have (n+1) sources. This additional source will contain any claim for which no source citation is specified.
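
The n+1 rule can be illustrated with a small Python sketch. This is a simplified model under my own assumptions, not Centurial's internal data model: the `Claim` class, the `group_claims_by_source` function and the `GEDCOM-FILE` identifier are all invented for the example. Every claim that carries no source citation ends up under the extra source that represents the GEDCOM file itself.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class Claim:
    text: str
    source_id: Optional[str] = None  # None means the GEDCOM cites no source for this claim


def group_claims_by_source(
    claims: Iterable[Claim], gedcom_source_id: str = "GEDCOM-FILE"
) -> Dict[str, List[Claim]]:
    """Group claims per source; uncited claims go to the source for the GEDCOM itself."""
    # The extra source always exists, even when every claim happens to be cited.
    by_source: Dict[str, List[Claim]] = {gedcom_source_id: []}
    for claim in claims:
        by_source.setdefault(claim.source_id or gedcom_source_id, []).append(claim)
    return by_source


if __name__ == "__main__":
    claims = [
        Claim("born 30 MAR 1853 in Zundert", "@S1@"),
        Claim("died 29 JUL 1890 in Auvers-sur-Oise", "@S2@"),
        Claim("sex is M"),  # GEDCOM cannot cite a source for this
    ]
    for source_id, grouped in group_claims_by_source(claims).items():
        print(source_id, [c.text for c in grouped])
```

With two cited sources in the input, the result holds three sources: the two from the GEDCOM plus the one for the file itself.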

Implementation of export

During export to GEDCOM, Centurial will export both the conclusions and the evidence they were based on. Where applicable, Centurial will specify the source for each value.

For example, if source 1 states the name 'Vincent van Gogh', sources 2 and 3 state 'Vincent Willem van Gogh' and the user concludes that 'Vincent Willem van Gogh' is the most likely full name, then Centurial will export 2 names for this individual. 'Vincent Willem van Gogh' comes first, as it is also the conclusion, with 2 source citations. 'Vincent van Gogh' comes second, with a single source citation. This way, all available data is exported.
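
The ordering in this example can be sketched in a few lines of Python. Again, this is only an illustration under my own assumptions and not the actual exporter: the function `name_lines` is hypothetical, the source ids (@S1@ and so on) are invented, and real citations carry more detail than a bare pointer.

```python
from typing import Dict, List


def name_lines(cited_names: Dict[str, List[str]], conclusion: str) -> List[str]:
    """Emit NAME lines with their source citations, the concluded name first."""
    # sorted() is stable, so the other names keep their original order.
    ordered = sorted(cited_names, key=lambda name: name != conclusion)
    lines: List[str] = []
    for name in ordered:
        lines.append(f"1 NAME {name}")
        lines.extend(f"2 SOUR {source}" for source in cited_names[name])
    return lines


if __name__ == "__main__":
    names = {
        "Vincent /van Gogh/": ["@S1@"],
        "Vincent Willem /van Gogh/": ["@S2@", "@S3@"],
    }
    print("\n".join(name_lines(names, conclusion="Vincent Willem /van Gogh/")))
```

Putting the conclusion first matters because many applications that read a GEDCOM file treat the first NAME of an individual as the preferred one.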

Room for future improvement

The import from and export to GEDCOM in Centurial will always be a best effort, which means there will always be room to improve. Play around with the import/export capabilities and get in contact if you have suggestions for improvement!