A major new feature in versions 1.7 and 1.8 of Centurial is support for GEDCOM. But the GEDCOM specification is over 20 years old and is not very evidence-based. This blog post documents the design principles behind the implementation of GEDCOM support in Centurial.
The first version of GEDCOM was introduced in 1984, and, in some miraculous way, developed into the industry standard for exchanging genealogical information between software applications. The most recent version of the GEDCOM specification, v5.5.1, celebrates its 20th birthday later in 2019. In its basic form, the GEDCOM format is quite simple. As an example, this short snippet is how a person is defined in GEDCOM:
0 @I1@ INDI
1 NAME Vincent Willem /Van Gogh/
2 SURN Van Gogh
2 GIVN Vincent Willem
1 SEX M
1 BIRT
2 DATE 30 MAR 1853
2 PLAC Zundert, The Netherlands
1 DEAT
2 DATE 29 JUL 1890
2 PLAC Auvers-sur-Oise, France
1 BURI
2 DATE 30 JUL 1890
2 PLAC Auvers-sur-Oise, France
1 OCCU painter
But when I started implementing GEDCOM support for Centurial, I soon found out that working with GEDCOM files involves many, sometimes very intriguing, details. For example, GEDCOM files can be encoded in many different ways. And it turns out there are many versions of the GEDCOM specification, and even more software applications implementing these specifications, each with its own unique interpretation.
In the end I decided to limit GEDCOM import into Centurial to the specifications of GEDCOM versions 5.5 (1996) and 5.5.1 (1999). First of all, these versions have been around for almost 20 years, so most vendors already implement one of these two versions. Also, there are tools that convert older versions of GEDCOM into version 5.5 or 5.5.1. This way, Centurial will be able to import almost any research project from a GEDCOM file.
For export Centurial supports GEDCOM v5.5.1 only.
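As an illustration of the import restriction, here is a minimal Python sketch that reads the version from a GEDCOM header and checks it against the two supported versions. This is not Centurial's actual code: the function names `parse_line`, `gedcom_version` and `can_import` are made up for this example, and the sketch assumes the file has already been decoded into text lines.

```python
import re

# Versions accepted for import, as stated in the text above.
SUPPORTED_IMPORT_VERSIONS = {"5.5", "5.5.1"}

# A GEDCOM line: level, optional @xref@, tag, optional value.
LINE_RE = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\w+)(?: (.*))?$")


def parse_line(line):
    """Split one GEDCOM line into (level, xref, tag, value)."""
    text = line.lstrip("\ufeff").rstrip("\r\n")  # drop a byte order mark, if any
    match = LINE_RE.match(text)
    if match is None:
        raise ValueError(f"not a GEDCOM line: {line!r}")
    level, xref, tag, value = match.groups()
    return int(level), xref, tag, value or ""


def gedcom_version(lines):
    """Return the value of HEAD.GEDC.VERS, or None if it is absent."""
    path = []  # tags of the enclosing records, indexed by level
    for line in lines:
        if not line.strip():
            continue
        level, _xref, tag, value = parse_line(line)
        del path[level:]
        path.append(tag)
        if path == ["HEAD", "GEDC", "VERS"]:
            return value.strip()
        if level == 0 and tag != "HEAD":
            break  # the header has ended without a version
    return None


def can_import(lines):
    """True if the header declares one of the supported versions."""
    return gedcom_version(lines) in SUPPORTED_IMPORT_VERSIONS
```

Note that the header can contain a second `VERS` line under `SOUR` (the version of the exporting application), which is why the sketch tracks the full path of tags instead of looking for the first `VERS` it encounters.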
GEDCOM defines a data model for genealogical data, and quite a comprehensive one. I soon realised that many genealogical software applications base their data model entirely on the GEDCOM data model. GEDCOM defines a birth date as a property of a person? Then so does the software. GEDCOM defines a WILL event? Then so does the software. GEDCOM makes up an 'about date'? Then so does the software, even when the concept of 'about date' is not clearly defined (as a side note: can someone please tell me if 1 January 1533 is part of 'ABT 1532'? And if so, what about 1 January 1540?).
Some of those software applications really use a one-to-one implementation of GEDCOM. They are perhaps more what one could call a GEDCOM editor than they are genealogy applications. Some of them are very advanced, very easy to use and work beautifully. I’d recommend them to anyone. If you need a GEDCOM editor, that is.
For these GEDCOM editors, it is key that when a project is exported to GEDCOM and then imported again, the resulting project is exactly the same as the original project. And why shouldn’t it be? When the data model is exactly the same as the GEDCOM specification, every detail of the data can be transferred to GEDCOM, and vice versa.
But Centurial has a data model of its own. In some areas, like the management of sources and citations, GEDCOM is way less detailed than Centurial, to the point of being conclusion-based in nature. This is especially true when it comes to tracking conclusions back to claims and evidence. Although GEDCOM does allow source citations to be specified for many entities and properties, it does not do so for all. For example:
GEDCOM is extensible. Some of the software applications that dare to be more detailed than GEDCOM extend it to incorporate all the richness of their data model in their GEDCOM export. That way, when that GEDCOM is imported again, the resulting project is still exactly the same as the original project.
For Centurial, I decided not to go that way. It is no goal in itself to ensure that an exported-then-imported project is an exact copy of the original project. Instead I recognised these 3 goals:
Based on these goals, I came to the following design principles:
Specifying a source for a conclusion is a complicated affair in most software applications, and even if a user has gone to great lengths to do so, it will still not be entirely possible to specify all sources due to the limitations of the GEDCOM specification mentioned before. Therefore, there will always be information in a GEDCOM that cannot be attributed to a source.
To remedy this, Centurial will always create a separate source for the imported GEDCOM itself. If a GEDCOM contains n source citations, the imported Centurial file will have (n+1) sources. This additional source will contain any claim for which no source citation is specified.
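The (n+1) rule can be sketched in a few lines of Python. The `Claim` and `Source` classes and the `import_sources` function are hypothetical and only serve this example; they are not Centurial's data model. The sketch also assumes that each of the n cited GEDCOM sources maps to exactly one imported source.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    text: str                                        # e.g. "born 30 MAR 1853"
    source_ids: list = field(default_factory=list)   # cited GEDCOM source ids; may be empty


@dataclass
class Source:
    title: str
    claims: list = field(default_factory=list)


def import_sources(gedcom_filename, cited_sources, claims):
    """Build the source list for an import.

    cited_sources maps a GEDCOM source id (e.g. "@S1@") to its title.
    Returns n + 1 sources: one per cited source, plus one for the file itself.
    """
    sources = {sid: Source(title) for sid, title in cited_sources.items()}
    file_source = Source(f"GEDCOM file {gedcom_filename}")
    for claim in claims:
        if claim.source_ids:
            # A claim with citations is attached to every source it cites.
            for sid in claim.source_ids:
                sources[sid].claims.append(claim)
        else:
            # Anything without a citation is attributed to the GEDCOM file.
            file_source.claims.append(claim)
    return list(sources.values()) + [file_source]
```

With two cited sources and one unsourced claim (say, an occupation without a citation), the result holds three sources, and the last one carries only the unsourced claim.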
During export to GEDCOM, Centurial will export both the conclusions and the evidence they were based on. Where applicable, Centurial will specify the source for each value.
For example, if source 1 states the name 'Vincent van Gogh', sources 2 and 3 state 'Vincent Willem van Gogh' and the user concludes that 'Vincent Willem van Gogh' is the most likely full name, then Centurial will export 2 names for this individual: 'Vincent Willem van Gogh' first, as it is also the conclusion, including 2 source citations with it. And 'Vincent van Gogh' second, with a single source citation along with it. This way, all available data is exported.
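The name example above could be produced by something like the following Python sketch. The function `export_names` and the source ids `@S1@` to `@S3@` are assumptions made for this illustration, not Centurial's actual export code. It relies on the fact that GEDCOM 5.5.1 allows source citations (`SOUR`) directly below a `NAME` line.

```python
def export_names(conclusion, evidence):
    """Return GEDCOM lines for all names of one individual.

    evidence maps each stated name to the list of source ids stating it.
    The concluded name is written first; the other names follow in the
    order in which they appear in evidence.
    """
    ordered = [conclusion] + [name for name in evidence if name != conclusion]
    lines = []
    for name in ordered:
        lines.append(f"1 NAME {name}")
        for source_id in evidence.get(name, []):
            lines.append(f"2 SOUR {source_id}")
    return lines


evidence = {
    "Vincent /van Gogh/": ["@S1@"],
    "Vincent Willem /van Gogh/": ["@S2@", "@S3@"],
}
print("\n".join(export_names("Vincent Willem /van Gogh/", evidence)))
# 1 NAME Vincent Willem /van Gogh/
# 2 SOUR @S2@
# 2 SOUR @S3@
# 1 NAME Vincent /van Gogh/
# 2 SOUR @S1@
```

Writing the conclusion first matters because many applications treat the first `NAME` of an individual as the preferred one.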