Manual Correlation

14 May 2021 by Fouke Boss

Correlation is an important step in the Centurial research process, usually done by the Centurial algorithm. In this blog post, we take a look at manual correlation, which lets you improve the results of automated correlation where needed.

Correlation

Correlation is an important step in the Centurial research process, one that we've discussed in earlier blog posts, such as this one about the Correlation Panel and this one on evidence in general. In short, information comes from sources, and so in Centurial you always enter information in the context of a source. In other words, you cannot enter data into Centurial unless you have first specified the details of the source that the information comes from.

Usually, any person in your research is mentioned in multiple sources. Correlation is the step in the research process where we as researchers decide if [person X in source A] is actually the same as [person X in source B]. How do we know it's really the same person X, and not just some other person also named X? Most of the time, we use common sense and logical reasoning. If it's the same person, we expect the name to be the same, or at least very similar; we expect claims like birth dates, birthplaces, residences, and death dates to match; and we expect to see matching partners and children.
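To make these common-sense criteria concrete, here is a small matching check written in Python. It is a hypothetical illustration, not Centurial's actual algorithm: the `PersonRecord` structure, the `could_be_same_person` function, the 0.85 name-similarity threshold and the strict birth-range overlap rule are all assumptions made for this example. Each record describes one person as seen in a single source, and the check passes only if the names are (nearly) the same, the birth dates do not conflict, and any partners or children named in both sources have at least one name in common.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date
from difflib import SequenceMatcher


@dataclass
class PersonRecord:
    """A person as described in a single source (hypothetical structure)."""
    name: str
    birth_earliest: date | None = None  # earliest possible birth date, if known
    birth_latest: date | None = None    # latest possible birth date, if known
    partners: set[str] = field(default_factory=set)
    children: set[str] = field(default_factory=set)


def names_similar(a: str, b: str, threshold: float = 0.85) -> bool:
    """Treat two names as matching if they are identical or nearly so."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def births_compatible(a: PersonRecord, b: PersonRecord) -> bool:
    """Birth ranges must overlap; a missing birth date never conflicts."""
    dates = (a.birth_earliest, a.birth_latest, b.birth_earliest, b.birth_latest)
    if any(d is None for d in dates):
        return True
    return a.birth_earliest <= b.birth_latest and b.birth_earliest <= a.birth_latest


def relatives_compatible(mine: set[str], theirs: set[str]) -> bool:
    """If both sources name relatives, expect at least one name in common."""
    return not (mine and theirs) or bool(mine & theirs)


def could_be_same_person(a: PersonRecord, b: PersonRecord) -> bool:
    """Combine the checks: similar name, compatible birth, shared relatives."""
    return (
        names_similar(a.name, b.name)
        and births_compatible(a, b)
        and relatives_compatible(a.partners, b.partners)
        and relatives_compatible(a.children, b.children)
    )


# The two Theos from the example project discussed below.
theo1 = PersonRecord("Theo van Gogh", date(1822, 2, 2), date(1822, 2, 2),
                     {"Anna Carbentus"}, {"Vincent van Gogh"})
theo2 = PersonRecord("Theo van Gogh", date(1823, 5, 2), date(1824, 5, 1),
                     {"Anna Carbentus"}, {"Vincent van Gogh"})
print(could_be_same_person(theo1, theo2))  # False: the birth ranges do not overlap
```

A strict overlap rule like this one is deliberately simple; a practical algorithm would likely allow some slack in the dates, since people do misstate their ages.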

Correlation is an integral part of our daily genealogy routine. Most genealogy software, however, does not incorporate this step of the research process at all. Centurial is an exception. After entering all information and claims from a source into Centurial, you click the Auto Correlate button to correlate the information in that source to the persons, relationships, and events that are already present in your research project.

Centurial incorporates a reasonably intelligent auto-correlation algorithm that tries to determine the correct correlations automatically, based on the set of conditions outlined above. Often, the algorithm will correlate the information exactly as you would expect. But this is not, and cannot always be, the case.

An example

I've created a Centurial research project (which you can download here) as an example of how the auto-correlation algorithm might not yield the expected results. The project consists of two sources, aptly named Source1 and Source2, both containing information on two parents, Theo van Gogh and Anna Carbentus, and on their child, Vincent van Gogh (yes, the painter):

After you open the Source View for Source1, a single click on the Auto Correlate button is all it takes for Centurial to correlate this first source. Once it has finished, it switches to the Network View, displaying the network diagram (1) and the Correlation Panel (2):

For now, let's zoom in on the network diagram, which directly after correlation looks like this:

Remember that the smaller rectangles represent the information coming from the source that was just correlated, while the larger rectangles represent the information from all other, already correlated sources in the project. As this is the first source to be correlated in this project, there is no other information available and so the larger rectangles are completely empty.

After closing the Correlation Panel (using the button in the top right corner), all information is merged into the larger rectangles:

Things get interesting...

Auto-correlating the second source is a less straightforward affair. Although Source2 contains information about the same 3 people, with their names spelled exactly the same, the end result is a bit messy, as it looks like there are 3 parents and 2 children:

So what happened here?

Let's see. The only person who appears as expected is Anna Carbentus. The larger pink rectangle shows the information from Source1, and Centurial has correctly auto-correlated the information from Source2 to it, which we can tell because the smaller rectangle of Source2 sits on top of the larger rectangle of Source1.

Things did not go so smoothly for the correlation of Anna's partner, father Theo van Gogh. We can see the information from Source1 in the large blue rectangle in the top left corner, but we also see an empty rectangle in the bottom left corner with the information for Theo from Source2 correlated to it. This is the root cause of the mess: Centurial correlated the information for both Annas, but concluded that the Theo from Source1 cannot be the Theo from Source2.

Why did Centurial decide not to merge the two Theos?

Because Centurial decided against correlating the information for Theo, we now actually have two persons named Theo in our project, as the Person List clearly shows:

The Person List also provides a clue as to why Centurial decided not to correlate the two Theos: the Birth column shows that one Theo was born on 2 February 1822, while the other was born around 1823. Closer inspection (using the Analysis View for the birth date) reveals that the second Theo was stated to be "33 years old on 1 May 1857", which means he must have been born between 2 May 1823 and 1 May 1824. Centurial found this mismatch in their birth dates and concluded that the second Theo is probably a different person from the first!
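The date arithmetic behind that range is easy to check by hand, but a short sketch makes the rule explicit. The helper below, `birth_range`, is a hypothetical Python illustration of the reasoning, not code taken from Centurial. If someone was a given age on a given date, the latest possible birth date is the one on which they had just turned that age, and the earliest is the one on which they would turn a year older the very next day.

```python
from __future__ import annotations

from datetime import date, timedelta


def birth_range(age: int, on: date) -> tuple[date, date]:
    """Earliest and latest birth dates for someone who was `age` years old on `on`.

    Simplification: does not handle `on` falling on 29 February.
    """
    # Latest case: the person turned `age` exactly on `on`.
    latest = on.replace(year=on.year - age)
    # Earliest case: the person would turn `age + 1` on the day after `on`.
    earliest = on.replace(year=on.year - age - 1) + timedelta(days=1)
    return earliest, latest


earliest, latest = birth_range(33, date(1857, 5, 1))
print(earliest, latest)  # 1823-05-02 1824-05-01
```

The result matches the range shown in the Analysis View, and 2 February 1822 clearly falls outside it.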

But then what happened to the child, Vincent?

So why is Centurial displaying two (2!) children, when there is clearly only one? Well, let's see. Anna was correlated as expected, so we have one single Anna in our project. The Theos were not correlated, yielding 2 separate persons named Theo. Based on the information from both sources, however, both men were partners of Anna, which is indicated by the 2 double lines connecting Anna with both Theos.

Both sources also state that Anna and Theo had a child together, Vincent. As Centurial saw no reason not to correlate the information about Vincent from the two sources, we end up with 1 child Vincent, with Anna as his mother. But now it has become unclear who the father of the child is: the Theo from Source1 or the Theo from Source2? Centurial does not decide on this, instead leaving the analysis of the situation to us, the researchers.

What Centurial does do is indicate that something fishy is going on here, by displaying the child as a child of both the first and the second Theo. Because of the way the network diagram is rendered, this means Vincent is displayed twice. To indicate that a person is displayed multiple times in the network diagram, Centurial marks each occurrence with an index number:

The ambiguity about the identity of the father also becomes clear if we close the Correlation Panel and make one of the Vincents the active person in the network:

As Vincent is now displayed only once in the network, the index number is removed. Instead, we now have the issue of Vincent having multiple fathers, which Centurial handles by displaying a stack of fathers.

How do we resolve this situation?

So Centurial is indicating in several ways that something worthy of our attention is going on here. But how do we proceed? As we already found out, the root cause of the issue is that there is a mismatch between the birth dates of both Theos. So what could have happened here?

  • One obvious reason could be a clerical error in one of the dates. Perhaps Source1 should have read "2 February 1824" instead of "2 February 1822". Or maybe Theo miscalculated when asked to declare his age (which happens to the best of us, right?), stating 33 instead of 35. Or maybe the Civil Registrar misheard.
  • Or perhaps one of the sources is about two completely different people also named Anna and Theo, also having a child named Vincent.
  • A more far-fetched theory could be that Anna was actually married to 2 different men named Theo, having a child with each of them. Perhaps the first Theo died, and Anna decided to remarry a brother or a cousin of her late husband, who happened to have the same name, Theo.

To be sure of what happened here, the usual solution is to go and find more sources. Perhaps one of these new sources will shed a different light on the situation. However, based on what we know so far, it would not be unreasonable to assume, at least for now, a clerical error or a miscalculation. In that case, both Theos are in fact the same man!

Manual correlation using the Network Diagram

Manual correlation is how we, the users, tell Centurial that we think it is safe to correlate information even though the auto-correlation algorithm decided otherwise.

There are several ways to manually improve the correlation, all of which involve drag & drop. In this particular case, the easiest way is to drag the second Theo's information rectangle in the network diagram and drop it on the first Theo, like so:

As you can see, the whole situation is resolved immediately with this single drag & drop action. The network diagram is looking as expected, and the Person List is correctly showing only three persons:

This type of manual correlation is available directly after you have auto-correlated a source. Once you close the Correlation Panel, it becomes unavailable again. But you can always go back and display the correlation of a source again by using the View Correlation option in the Source View:

Clicking this button once again displays the correlation of this source in the Network View.

Manual correlation using the Correlation Panel

The second method of manual correlation uses the Correlation Panel, which opens beside the network diagram. At the top of the Correlation Panel, Centurial shows the source reference (1) of the source that is currently being displayed. Below that, the panel contains a list of persons for which correlation is being displayed (2). For each person, rectangles (3) are displayed, one for each of the sources that contains information about that particular person.

A useful feature of the Correlation Panel is that you can hand-pick the persons for which it displays the correlation. In other words, it is not limited to the persons of a single source. To add one or more persons to the Correlation Panel, select the View Correlation option from the context menu of a person in either the Person List or the network diagram, like so:

Please note that in both the Person List and the network diagram, you can select multiple persons in one go by holding the Ctrl key while clicking them. Also, every time you use the View Correlation option, the selected persons are added to the Correlation Panel, allowing you to gather all the relevant persons.

To improve a correlation manually using the Correlation Panel, drag an information rectangle from one person and drop it onto the person you want to correlate the information to:

Again, this resolves the situation in the expected way.

Some final thoughts

Before we end today's blog, there are three more thoughts to share:

You can change correlation at any time

An interesting feature of Centurial is that you can change the correlation of information at any time during your research. Even if the information has been correlated in a certain configuration for years, it can still be rearranged, for example when new evidence emerges that suggests a different arrangement. This is why we did not need to hesitate to merge both Theos earlier: if we are mistaken, we can correct the mistake at any time.

You can also correlate information to an altogether new person

If you find that evidence suggests that certain information no longer belongs to a person, but instead must relate to a person that is not already part of your research project, you can simply drag the information rectangle into the open space of the network diagram or the Correlation Panel. This will create an altogether new person based on that information.

Shouldn't the auto-correlation algorithm be improved in this case?

You might argue that in the example used in this blog post, the auto-correlation algorithm should have correlated the two Theos anyway. Both men have the same name, are married to the same partner, have the same child Vincent and differ only slightly in their birth dates. But as I've said before, improving the auto-correlation algorithm is a balancing act, finding the sweet spot between false negatives (failing to correlate information that belongs to the same person) and false positives (correlating information that belongs to different people).

In this particular example, you will find that if the difference in birth dates had been within a year (had Theo stated his age as 34 instead of 35, rather than 33), Centurial would have considered it more likely to be a mistake (people sometimes forget that they have already celebrated their birthday that year, but they rarely get their age wrong by two whole years), and the auto-correlation algorithm would have correlated both Theos. You can try this using the example project!
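The one-year threshold can be illustrated with a small Python sketch. This is an assumption-laden approximation of the behaviour described above, not Centurial's actual rule: the function names `birth_range`, `gap_in_days` and `within_tolerance`, the 365-day tolerance, and the idea of measuring how far the exact birth date falls outside the range implied by the stated age are all hypothetical. It compares Theo's birth date from Source1 with the ages he might have stated in Source2.

```python
from __future__ import annotations

from datetime import date, timedelta


def birth_range(age: int, on: date) -> tuple[date, date]:
    """Earliest and latest birth dates for someone aged `age` on `on` (ignores 29 Feb)."""
    latest = on.replace(year=on.year - age)
    earliest = on.replace(year=on.year - age - 1) + timedelta(days=1)
    return earliest, latest


def gap_in_days(exact: date, earliest: date, latest: date) -> int:
    """How many days an exact birth date falls outside a birth range (0 if inside)."""
    if exact < earliest:
        return (earliest - exact).days
    if exact > latest:
        return (exact - latest).days
    return 0


def within_tolerance(exact: date, age: int, on: date, tolerance_days: int = 365) -> bool:
    """Hypothetical rule: accept a stated age that is off by at most about a year."""
    return gap_in_days(exact, *birth_range(age, on)) <= tolerance_days


born = date(1822, 2, 2)      # Theo's birth date according to Source1
declared = date(1857, 5, 1)  # date on which Source2 gives his age
for stated_age in (35, 34, 33):
    gap = gap_in_days(born, *birth_range(stated_age, declared))
    print(stated_age, gap, within_tolerance(born, stated_age, declared))
# 35 0 True
# 34 89 True
# 33 454 False
```

Under these assumptions, a stated age of 34 leaves a gap of about three months and would be accepted, while the actual statement of 33 leaves a gap of well over a year, which matches the outcome described in the example.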

Summary

Correlation is an integral part of our research process. In Centurial, the researcher always has the final say on how information is correlated. The auto-correlation algorithm is just a tool, meant to lighten an otherwise tedious chore. And if you find that the algorithm has come to the wrong conclusion, you can always correct the situation using manual correlation.