Correlation Revisited

Friday, February 14, 2020 by Fouke Boss

Correlation, the matching of information in sources to subjects in your research, is probably the single concept that most sets Centurial apart from other genealogy software. In this blog post we take another look at the concept and introduce the new Correlation Panel.

What again is Correlation?

As genealogists, we all get our information from sources. When we find a source, we match the information in that source to the persons in our database and then add the claims in the source to the matching persons. This process of matching the information in a source with the existing persons in our database is what is called correlation.

This process is independent of the software we use to store our research, so correlation is certainly not a unique feature of Centurial. What is unique, however, is that Centurial makes this step so explicit. In most software, when you find a source, you first correlate the information to your research and then add the claims. Centurial rearranges this sequence so that you focus on the source first:

  1. You find a source.
  2. You enter the information and claims from that source into Centurial.
  3. Only then do you correlate the information to the existing persons in your research project.

This way, you are initially much more focused on the information and claims in the source, unbiased by any conclusions that are already in your research project. For example, my research shows that the last name of my great-grandfather was Quaedackers. If I find a new source that states that a Mr Quadackers registered the birth of a daughter, and I immediately recognise that this must be my great-grandfather (correlation), I might miss out on the fact that his last name is spelt a little differently in the new source. On the other hand, if I enter all the information in the source first, and only later try to correlate it to my existing research, I (as well as Centurial) will notice this difference in spelling right away.

Correlation turns Information into Evidence

In our earlier blog, Evidence-based genealogy - What is Evidence?, we argued that the information in a source only becomes evidence once we determine that it is actually correlated to our research. If a source mentions a Mr Quadackers, this is not yet evidence. Only once we believe, based on that information, that it is about the Mr Quaedackers from our research does it become evidence that his name was spelt in two different ways.

[Diagram: Source 1 (Quaedackers) and Source 2 (Quadackers) are both correlated to a single conclusion person with the names Quaedackers and Quadackers.]

This works the other way around as well. Suppose that, after finding more sources, we conclude that Mr Quadackers must actually be an altogether different person from Mr Quaedackers. Then the information on Mr Quadackers is no longer evidence for Mr Quaedackers; instead, it becomes evidence for a person who is, at this point, new to our research.

[Diagram: Source 1 (Quaedackers) and Source 2 (Quadackers) are correlated to two separate conclusion persons, Quaedackers and Quadackers.]
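
To make the two situations above concrete, here is a minimal sketch of the idea that evidence follows from correlation. It is not Centurial's actual data model: the `Information` record, the `conclusions` function and the person identifiers are all hypothetical names chosen for illustration. The assumption is simply that a correlation is a link from a piece of source information to a person, and that the conclusions are derived from those links.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Information:
    """A surname as recorded in one source (hypothetical record type)."""
    source: str
    surname: str


def conclusions(correlation):
    """Group the recorded surnames by the person each record is correlated to."""
    names = defaultdict(list)
    for info, person in correlation.items():
        if info.surname not in names[person]:
            names[person].append(info.surname)
    return dict(names)


info1 = Information("Source 1", "Quaedackers")
info2 = Information("Source 2", "Quadackers")

# Situation 1: both records are correlated to the same person,
# so both spellings are evidence for that one person.
correlation = {info1: "person-A", info2: "person-A"}
same_person = conclusions(correlation)

# Situation 2: the second record is re-correlated to a new person.
# Only the link changes; the recorded information itself is untouched.
correlation[info2] = "person-B"
two_persons = conclusions(correlation)
```

Because the source information is never overwritten, changing a link in this sketch is all it takes to move evidence from one person to another.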

Correlation is not unlike Merging

Most genealogy software allows the user to import and merge a GEDCOM file into their existing research. This is usually a tricky process because in most software it is not possible to unmerge. Once the information in the GEDCOM is merged into the research project, there is no easy way of reverting the merge.

Looking at it like that, one might say that in Centurial every source is merged into the research project independently. Luckily, unmerging in Centurial is easy, making the process far less nerve-wracking.

Auto Correlation

If Centurial left the correlation process entirely to the user, correlation could become quite a tedious exercise. That's why Centurial is equipped with an Auto Correlation algorithm. This algorithm tries its best to match the information in a new source to the existing evidence in the research project.

After you have entered all the source information into Centurial, you start the Auto Correlation algorithm by pressing the Auto Correlate button.

The algorithm exposes the differences between a human and a computer quite nicely. On the one hand, matching Quaedackers to Quadackers is probably easy for a human, whereas the computer needs quite a complicated algorithm to find the same match. On the other hand, the computer can take much more information into account, and do so much more objectively. Sometimes it points out mismatches that a human would not have noticed, and sometimes it comes up with relationships you never realised were there.
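
As an illustration of what "finding the same match" can involve, the sketch below compares surnames with Python's standard `difflib` module. This is not Centurial's Auto Correlation algorithm, which this post does not describe and which weighs much more than a name. The function names and the 0.85 threshold are assumptions made for this example only.

```python
from difflib import SequenceMatcher
from typing import List, Optional


def name_similarity(a: str, b: str) -> float:
    """Return a similarity ratio between 0.0 and 1.0, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(recorded: str, known_names: List[str],
               threshold: float = 0.85) -> Optional[str]:
    """Return the most similar known name, or None if none reaches the threshold."""
    scored = [(name_similarity(recorded, name), name) for name in known_names]
    score, name = max(scored, default=(0.0, None))
    return name if score >= threshold else None


# The spelling from the new source is compared to names already in the project.
match = best_match("Quadackers", ["Boss", "Quaedackers"])
no_match = best_match("Jansen", ["Boss", "Quaedackers"])
```

Here `match` is "Quaedackers" and `no_match` is `None`. A single threshold on a single field is far too crude for real research, which is exactly why a practical algorithm has to consider dates, relationships and other information as well.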

Validating the results of the Auto Correlation algorithm

Although the Auto Correlation algorithm is continuously improved and enhanced, it is certainly not infallible. Every now and then it comes up with a correlation that you would not have made yourself. Sometimes the information in separate sources is just too different for the algorithm to match, or the algorithm weighs the similarities and differences in an unexpected way.

In Centurial, the researcher always has the final say on correlation, and the Auto Correlation algorithm is nothing more than a tool that saves you a lot of time in almost all cases. It is advisable to always validate the results of the algorithm after each Auto Correlation.

The new Correlation Panel

Centurial v1.14 introduces the new Correlation Panel. This panel simplifies the validation of the algorithm results. After Auto Correlation, the new panel is now shown on the right, next to the Network Diagram.

The panel shows the source reference of the source we just auto correlated at the top of the panel (1). Below that, the panel shows to which persons (2, 3) in the project the information in the source is correlated. The person that is currently centered in the Network Diagram is marked with a green background (3). To center any other person in the list, use the button (4).

For each person, the panel then shows the information from all the different sources that mention this person (5). Each line represents the information from a single source. The information from the current source (1) is highlighted. Each line shows the name, gender (as represented by the color of the person rectangle), the birth and death dates, and the partner as entered in that source. When you hold the mouse pointer over the dates in a line, the tooltip displays the source reference of that particular source. Right-clicking a line displays a context menu with a single option, 'View Information', which opens the corresponding Source View.

You can dismiss the Correlation Panel at any time by using button (6). But if you want to validate the results of the Auto Correlation algorithm, button (7) is a far better choice. This option is only available for the person that is centered in the Network Diagram. Use the Network Diagram to determine if you're happy with the correlation. If so, simply press button (7). Centurial will remove the validated person from the list and will automatically center the next person in the list. This way you can conveniently check off all persons. Once you check off the last person in the list, the panel will close.

Please note that checking off a correlation does not change the correlation itself. As the correlation of information to persons can easily be undone, the Auto Correlation algorithm merges the information from the various sources immediately. The researcher is not required to approve every single correlation, and checking off the correlations in the Correlation Panel is optional.
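
The difference between checking off a correlation and changing it can be sketched as follows. This is a simplified model, not Centurial's implementation: the `ReviewList` class is hypothetical, and it assumes that the centered person is always the first one in the list, whereas the real panel lets you center any person.

```python
from collections import deque


class ReviewList:
    """Hypothetical checklist of persons whose correlation awaits review."""

    def __init__(self, persons):
        self._pending = deque(persons)

    @property
    def centered(self):
        """The person currently under review, or None when the list is empty."""
        return self._pending[0] if self._pending else None

    @property
    def is_open(self):
        """The checklist stays open while there are persons left to review."""
        return bool(self._pending)

    def check_off(self):
        """Remove the centered person from the list and return the next one."""
        if self._pending:
            self._pending.popleft()
        return self.centered


# The correlation already exists before any review takes place.
correlation = {"info-1": "person-A", "info-2": "person-B"}
before = dict(correlation)

review = ReviewList(["person-A", "person-B"])
next_person = review.check_off()  # person-A is validated, person-B is next
review.check_off()                # the last person is validated, the list is empty
```

After both persons are checked off, `review.is_open` is `False` while `correlation` still equals `before`: the checklist only tracks what has been reviewed and never touches the links themselves.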

Manually improving a correlation

If you find that the information is not correlated to the correct person, you can always improve the correlation manually. Move the mouse pointer over the information line you want to improve, then press and hold the left mouse button to drag the incorrectly correlated information rectangle freely around the screen. Drop the rectangle on the person you feel is the correct one by releasing the mouse button while the mouse pointer is over that person. You can also create an entirely new person by dropping the information rectangle in the empty space between the persons. Press the Escape key on your keyboard to abort a manual correlation.

After the manual correlation, Centurial will update the Network Diagram and update the appropriate conclusions to reflect the new situation.

Analyzing correlations later on

The new Correlation Panel replaces the previous Correlation dialog, which allowed the researcher to analyze the correlation of a person or multiple persons at any time during the research project, even long after a source is first correlated. The new Correlation Panel still supports this scenario and has some additional features to further improve usability.

There are several ways to start analyzing the correlation of a single person or multiple persons. The View Correlation menu option is available in the Person List, in the Network Diagram, and in the Person Evidence View. Any of these options will open the Correlation Panel, this time without the source information at the top of the panel.

New in v1.14 is the possibility to add more persons to an already open Correlation Panel. Simply select View Correlation from the Person List or the Network Diagram, and the person or persons will be added to the open Correlation Panel.

Persons can be removed from the Correlation Panel by using the button. To remove all persons from the panel, simply close the panel with the cross button (6) in the top right corner.

Manual correlation, of course, works exactly the same in this scenario. Note also that you can easily scroll through the Correlation Panel while dragging an information rectangle by moving the mouse pointer to the top or the bottom of the panel.

Summary

Correlation is an important part of doing genealogy, and Centurial offers many features to support the researcher during the initial correlation and later improvements and analysis.