The Centurial project file format and backup strategies

Friday, December 6, 2019 by Fouke Boss

While designing Centurial, considerable thought went into the way your family data is stored. In this blog post we take a look at the design goals behind that storage, and what they mean for your backup strategy for Centurial projects.

Designing Centurial

Back in 2016, when I first started working on Centurial, quite a lot of time went into designing its data storage. As genealogists, we tend to collect quite a large amount of data: not only the evidence uncovered in all the different sources, but also a huge number of digital scans, photos and website screenshots. This digital body of data is one of the main results of our research and therefore deserved a careful design.

The first choice I had to make was between storing all this data 'in the cloud' as it's called these days, which means on a server of some company, or on the PC or laptop of the Centurial user. Now I tend to quite value my privacy, and I soon decided to go with the local storage concept: with Centurial, genealogical data never leaves the PC or laptop of the user.

After spending some time with several other genealogy software applications, I noticed all these applications store their data in a format that has been dubbed a pile-of-files format. What this means is that for each research project, these applications create a directory on the disk of the user. In this directory, the application then creates many different files and subdirectories, each containing a part of the research: a pile-of-files. This way of storing your research always makes me a little nervous, as in my mind this immediately raises questions like "which of these files should I copy, should I ever decide to move to a new PC or laptop?" and "which of these files should I include in my backup?". Based on these observations, I formulated a second design goal: with Centurial, all research data is contained in a single file.

On the websites of these other genealogy applications, I sometimes came across long pieces of documentation on how to restore a research project in case it had become corrupted. Excuse me!? Shouldn't preventing that be part of the core design of any well-engineered piece of software?! Well, at least I made it a design goal for Centurial: in Centurial, the data is stored reliably and robustly, and migrations should be transparent to the user.

Enter SQLite

It was around this time in the design process that I learned about the existence of SQLite, pronounced (at least by the author of SQLite) as S-Q-L-ite. SQLite is a "C-language library that implements a small, fast, self-contained, high-reliability, full-featured SQL database engine". A bit perplexed that I had never stumbled upon SQLite before in my years as a software developer, I soon became very excited about the possibilities of using SQLite as a storage format for Centurial. Why? Well:

  • SQLite is highly reliable. It does not cause problems, it just works. Its testing standards are among the most rigorous of any software project.
  • SQLite is self-contained. Other database systems require the installation of a huge database management system, usually including Windows Services and whatnot. SQLite does not require such an additional installation, so it can be installed with Centurial itself, without any additional hassle for the user.
  • SQLite is designed for long-term support. The developers of SQLite intend to support SQLite at least until the year 2050, so your Centurial data will be readable until at least 2050 as well. In fact, SQLite is so well designed for long-term support that it has become a storage format recommended by the US Library of Congress.
  • SQLite is the most widely deployed database in the world. It's used in all major operating systems, in browsers and televisions, on phones and many other devices. This means that even if Centurial were to magically disappear from Earth (which it will not), there are many other applications available for extracting your research data from your Centurial project files.
  • SQLite is free software (it is in the public domain), so Centurial users do not have to pay any database license fees.

In the past three years, I've become a great fan of SQLite. It has never given any problems, and it truly does just work. So far, not a single Centurial user has ever reported losing any research data (or perhaps once, a little, but that was my bad...).

Centurial project file

So the Centurial project file is actually an SQLite database file on your local disk with the .cent extension. Every Centurial project file contains all the source details, the information and claims from the sources, the evidence and all the conclusions for that project. Also, all digital scans, images and photos added to the File Explorer of the Source Editor view are contained in this one Centurial project file.
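Because a project file is an ordinary SQLite database, any SQLite client should be able to open it. As a minimal sketch, the Python function below opens a file read-only and lists the tables it contains. The function name list_tables and the example path are hypothetical, and since the internal table layout of a .cent file is not described in this post, the sketch only lists table names without assuming what they are. Work on a copy of your project rather than the original.

```python
import sqlite3
from pathlib import Path


def list_tables(project_path):
    """Return the names of all tables in an SQLite file, without modifying it."""
    # Build a file: URI and request read-only mode so nothing can be written.
    uri = Path(project_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        return [name for (name,) in rows]
    finally:
        conn.close()


if __name__ == "__main__":
    # Hypothetical path; point this at a copy of your own project file.
    for table in list_tables("copy-of-my-family.cent"):
        print(table)
```

The read-only mode is a deliberate choice here: it lets you look inside the file with a third-party tool without any risk of changing your research data.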

You can use the Project dialog to have a look at the storage size of your Centurial projects. For example:

The data storage size indicates the size of the source, information, claim, evidence and conclusion data. The files storage size indicates the total size of all the documents and digital images in your project.

Now the great advantage of having a single project file is that you always know where your project data is. It's not spread out over many different files and directories; no, it is contained in that one, single .cent project file. Do you want to move your research to a different PC or laptop? Install Centurial on your new machine, transfer the single project file using an external hard drive, and you're good to go.

Project file size

As you spend more and more time on your research, the size of your project file will grow ever larger. Centurial project files of over 10GB have been spotted in the wild! Because of SQLite, this size (and much larger) is no problem for Centurial.

However, the size of such a Centurial project might become a problem for the user:

  • Such a large file cannot be moved around easily.
  • How do you create a backup of such a file?

The first challenge is a real concern, but changing the Centurial project file into a pile-of-files would be a false friend. If Centurial stored all project data as a pile-of-files, then yes, individual pieces of data could be moved around more easily. But then the connection between the different pieces of data could easily be lost! If anything, the single Centurial project file makes it very clear to the user that all data in the project is related and should not be partially copied or moved. And a Centurial project file is usually smaller than the same data stored as a pile-of-files would be.

Backups in Centurial

To tackle the second concern, Centurial has been equipped with a dedicated backup system since the very early v1.1 release. The system is easy to activate, and users are encouraged to do so during the initial setup sequence. Open the Settings dialog from the TOOLS menu, and navigate to the 'Backup' tab:

To activate the automated backup system, simply check "Yes, I want to make backups of my Centurial project files.", supply the path of the backup directory, and click 'OK'. After that, for every change you make in any of your Centurial projects, a backup will automatically be created in the given backup directory.

Successful backup strategies for Centurial projects

Now the main concern for a successful backup strategy is to make sure your backups end up in a location outside of your PC or laptop as soon as possible after creation. This way, if your PC or laptop suddenly crashes and burns, or gets stolen, or lost, you will always have your backup to go back to.

The first way to achieve this is by using a cloud storage solution, like Dropbox, Google Drive, OneDrive or any other. These cloud storage solutions usually come with a sync app that automatically syncs the contents of a directory on your local drive with a directory in your cloud storage. The way to go then is to select (a directory within) this local sync directory as the backup path. This way, every time you edit your research, Centurial automatically creates a backup in this directory, and the sync app then copies this backup to your cloud storage. Your changes will be securely copied off of your machine within a minute or so.

Another way, which does not depend on cloud storage, is to use a local network-attached storage device, or NAS for short. A NAS is like an external hard drive that is attached not to your computer but to your network, behaving like a file server. If you select a directory on the NAS as your backup path, backups created by Centurial are written to your NAS immediately.

Part of any successful backup strategy is to never place the Centurial project file itself in the local cloud storage directory or on the NAS. If you did, the complete project file would be copied to your cloud storage or NAS every time you make a change to the project. When the project file is small, this will not be a problem. But a medium-sized project, say 100MB, might start to cause some serious problems when it is copied every 10 seconds or so: at that rate, an hour of editing would transfer roughly 36GB!

Backup strategy summary

To summarize, a successful backup strategy for your Centurial projects consists of two parts:

  1. Store your Centurial projects in a directory on your local drive that is not synced with a cloud storage solution.
  2. Activate the Centurial backup system by specifying a backup directory that is either
    1. synced to a cloud storage solution or
    2. on a network-attached storage.
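If you want to double-check your own setup against these two rules, the idea can be sketched in a few lines of Python. This is not part of Centurial; the function names is_within and check_strategy and all paths are hypothetical, and you have to supply the list of synced (or NAS) directories yourself.

```python
from pathlib import Path


def is_within(path, parent):
    """True if `path` is `parent` itself or lies somewhere below it."""
    path, parent = Path(path).resolve(), Path(parent).resolve()
    return path == parent or parent in path.parents


def check_strategy(project_file, backup_dir, synced_dirs):
    """Return a list of problems; an empty list means both rules are met."""
    problems = []
    # Rule 1: the project file must NOT live in a synced directory.
    if any(is_within(project_file, d) for d in synced_dirs):
        problems.append("project file is inside a synced directory")
    # Rule 2: the backup directory MUST live in a synced directory or on a NAS.
    if not any(is_within(backup_dir, d) for d in synced_dirs):
        problems.append("backup directory is not synced or on a NAS")
    return problems


if __name__ == "__main__":
    # Hypothetical locations, for illustration only.
    print(check_strategy(
        project_file="/home/me/genealogy/family.cent",
        backup_dir="/home/me/Dropbox/centurial-backups",
        synced_dirs=["/home/me/Dropbox"],
    ))
```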

Backups and garbage collection

Centurial backups are optimized for file size, to prevent large amounts of data from being continuously transferred to your cloud storage or NAS. However, as Centurial creates a backup for every change you make to your project files, your backup directory might become too large anyway. That is why Centurial is equipped with a garbage collector for backups. The garbage collector makes sure the number of backups does not grow too large.

The policy of the collector is to keep more recent backups but to start removing backups as they become older. This means that:

  • For the last minute, it keeps at most 1 backup per second.
  • For the last 5 minutes, it keeps at most 1 backup per 10 seconds.
  • For the last 15 minutes, it keeps at most 1 backup per minute.
  • For the last hour, it keeps at most 1 backup per 5 minutes.
  • For the last day, it keeps at most 1 backup per hour.
  • For the last week, it keeps at most 1 backup per 4 hours.
  • For the last month, it keeps at most 1 backup per day.
  • For older backups, it keeps at most 1 backup per month.

The garbage collector always keeps the most recent backup in the given time frame.
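To make the policy concrete, here is a rough Python sketch of such a tiered retention scheme. It is an illustration under stated assumptions, not Centurial's actual implementation: the names TIERS, interval_for and backups_to_keep are hypothetical, a month is approximated as 30 days, and time frames are aligned to fixed slots counted from an arbitrary epoch.

```python
from datetime import datetime, timedelta

# (window, interval): for backups younger than `window`,
# keep at most one backup per `interval`.
TIERS = [
    (timedelta(minutes=1), timedelta(seconds=1)),
    (timedelta(minutes=5), timedelta(seconds=10)),
    (timedelta(minutes=15), timedelta(minutes=1)),
    (timedelta(hours=1), timedelta(minutes=5)),
    (timedelta(days=1), timedelta(hours=1)),
    (timedelta(weeks=1), timedelta(hours=4)),
    (timedelta(days=30), timedelta(days=1)),  # assumption: month = 30 days
]
OLDER_INTERVAL = timedelta(days=30)  # one backup per (approximate) month
EPOCH = datetime(1970, 1, 1)  # arbitrary origin used to number the slots


def interval_for(age):
    """Pick the retention interval that applies to a backup of this age."""
    for window, interval in TIERS:
        if age <= window:
            return interval
    return OLDER_INTERVAL


def backups_to_keep(timestamps, now):
    """Return the timestamps to keep; everything else may be removed."""
    newest_per_slot = {}
    for ts in timestamps:
        interval = interval_for(now - ts)
        # Backups with the same interval and slot number compete for one place.
        slot = (interval, (ts - EPOCH) // interval)
        if slot not in newest_per_slot or ts > newest_per_slot[slot]:
            newest_per_slot[slot] = ts  # the most recent backup in a slot wins
    return sorted(newest_per_slot.values())
```

For example, three backups made within the same five-minute slot about half an hour ago would be reduced to the newest of the three, while two backups made two and three seconds ago would both survive for now.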

Restoring a project

To restore a backup of a Centurial project, select 'Restore a backup' from the QuickStart dialog. In the source step of the wizard, you are asked to select the backup that you would like to restore. You can select the project to restore the latest available version of the project, or you can drill down to a more specific backup.

In the target step, you choose the name of the restored project (defaults to the original project name) and the location of the project.

After that, the summary step shows you the summary of the selected source and target settings.

Click 'Next' to restore the project. After the restore is complete, Centurial will open the restored project.

Be wise... and create backups!

Setting up a successful backup strategy in Centurial takes about 5 minutes, which is time well spent once your PC or laptop decides to kick the bucket. Better safe than sorry!