Historical life cycle reconstruction by indexing

Presented by: Gerrit Bloothooft (Universiteit Utrecht)
Monday 12th September 2016
INI Seminar Room 1
Co-authors: Jelte van Boheemen (Utrecht University), Marijn Schraagen (Utrecht University)

Historical information about individuals is usually scattered across many sources, so reconstructing their life cycles requires an integrated use of all available information. Rather than comparing records between pairs of sources, it will be shown to be computationally effective to combine all data in a single table. In such a table, each record summarizes the information that can be deduced about a person who appears in a source event. The idea is to order this table so that consecutive records describe the life cycle events of a unique individual, one individual after another, with each individual having their own ID. To arrive at this ordering, the table has to be filtered and indexed in two ways, depending on the possible roles of an individual: first as ego in focus (at birth, marriage and death), and second as parent at the same life events of children. The results of both indexes (in terms of preliminary record clusters and IDs) should then be combined, and the resulting clusters tested for validity of the life cycle.
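As an illustration only, the two-index idea can be sketched in a few lines of Python. Everything specific here is an assumption made for the sketch and is not taken from the talk: the field names, the invented sample rows, the choice of (name, father, mother) as the ego key and (name, spouse) as the parent key, the union-find step used to merge the preliminary clusters, and the very simple validity test.

```python
from collections import defaultdict

# Hypothetical table: one row per appearance of a person in a source event.
# For role "parent", event and year refer to the child's life event.
TABLE = [
    {"row": 0, "event": "birth", "year": 1815, "role": "ego", "name": "jan jansen",
     "father": "piet jansen", "mother": "maria de vos", "spouse": None},
    {"row": 1, "event": "marriage", "year": 1840, "role": "ego", "name": "jan jansen",
     "father": "piet jansen", "mother": "maria de vos", "spouse": "anna smit"},
    {"row": 2, "event": "birth", "year": 1842, "role": "parent", "name": "jan jansen",
     "father": None, "mother": None, "spouse": "anna smit"},
    {"row": 3, "event": "death", "year": 1880, "role": "ego", "name": "jan jansen",
     "father": "piet jansen", "mother": "maria de vos", "spouse": None},
    {"row": 4, "event": "birth", "year": 1850, "role": "ego", "name": "jan jansen",
     "father": "kees jansen", "mother": "elsje bakker", "spouse": None},
]

def ego_key(r):
    """Assumed index 1: a person as ego, identified together with both parents."""
    if r["role"] == "ego" and r["father"] and r["mother"]:
        return ("ego", r["name"], r["father"], r["mother"])
    return None  # row is filtered out of this index

def couple_key(r):
    """Assumed index 2: a person identified together with a spouse."""
    if r["spouse"]:
        return ("couple", r["name"], r["spouse"])
    return None  # row is filtered out of this index

def reconstruct(table, key_funcs=(ego_key, couple_key)):
    """Sort the table on each index key and merge rows with equal keys."""
    parent = {r["row"]: r["row"] for r in table}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for key_func in key_funcs:
        keyed = sorted((key_func(r), r["row"]) for r in table
                       if key_func(r) is not None)
        # After sorting, rows with equal keys are consecutive.
        for (k1, a), (k2, b) in zip(keyed, keyed[1:]):
            if k1 == k2:
                parent[find(a)] = find(b)

    clusters = defaultdict(list)
    for r in table:
        clusters[find(r["row"])].append(r)
    ids = {}
    for person_id, rows in enumerate(clusters.values(), start=1):
        for r in rows:
            ids[r["row"]] = person_id
    return ids, list(clusters.values())

def valid_life_cycle(rows):
    """Crude check: at most one own birth and death, all events in between."""
    births = [r["year"] for r in rows if r["role"] == "ego" and r["event"] == "birth"]
    deaths = [r["year"] for r in rows if r["role"] == "ego" and r["event"] == "death"]
    if len(births) > 1 or len(deaths) > 1:
        return False
    lo = births[0] if births else float("-inf")
    hi = deaths[0] if deaths else float("inf")
    return all(lo <= r["year"] <= hi for r in rows)

ids, clusters = reconstruct(TABLE)
# Order the table so that consecutive rows form one person's life cycle.
ordered = sorted(TABLE, key=lambda r: (ids[r["row"]], r["year"]))
for r in ordered:
    print(ids[r["row"]], r["year"], r["event"], r["role"])
```

In this toy table, rows 0 to 3 end up under one ID because the marriage record carries both the ego key and the couple key, while row 4 (same name, different parents) gets its own ID. A real validity test would need far more than the year-range check shown here.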

The success of such a procedure depends strongly on the available data and their quality. The Dutch civil registration, introduced by the French in 1811 and now largely digitized, provides near-optimal conditions. Remaining problems of data fuzziness can be circumvented by name standardization (to various levels of name reduction) and by testing different sequences of the available information in records for indexing. Both approaches are only effective when more information is available than is needed to identify an individual uniquely, which often appears to be true of the Dutch civil registration. An example of the procedure will be given for data from the province of Zeeland, and options for applying the method to older data of (much) lower quality and completeness will be discussed. The latter touches upon the limits of historical life cycle reconstruction.
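The two ways of handling fuzziness mentioned above might look roughly as follows in Python. This is a hedged sketch, not the authors' procedure: the three reduction levels, the specific spelling rules, the field names and the list of field sequences are all invented for illustration, and "different sequences" is interpreted here as different subsets of fields used to build an index key.

```python
import re
import unicodedata

def standardize(name, level):
    """Reduce a name; higher levels discard more spelling detail (assumed rules)."""
    s = name.strip().lower()
    if level >= 1:
        # Strip diacritics and anything that is not a plain letter or space.
        s = unicodedata.normalize("NFKD", s)
        s = "".join(c for c in s if not unicodedata.combining(c))
        s = re.sub(r"[^a-z ]", "", s)
    if level >= 2:
        # Illustrative spelling reductions, then collapse repeated characters.
        s = s.replace("ij", "y").replace("ck", "k").replace("ph", "f")
        s = re.sub(r"(.)\1+", r"\1", s)
    return s

# Hypothetical alternative key definitions, from most to least demanding.
KEY_SEQUENCES = [
    ("last", "first", "father", "mother"),
    ("last", "first", "father"),
    ("last", "first", "mother"),
]

def make_key(record, fields, level):
    """Build an index key from the chosen fields at the chosen reduction level."""
    return tuple(standardize(record[f], level) for f in fields)

def first_agreement(a, b, sequences=KEY_SEQUENCES, levels=(0, 1, 2)):
    """Return the first (level, fields) under which two records share a key."""
    for level in levels:
        for fields in sequences:
            if make_key(a, fields, level) == make_key(b, fields, level):
                return level, fields
    return None

# Invented records with spelling variation in the surname and the mother's name.
a = {"first": "Jan", "last": "Janssen", "father": "Piet Janssen", "mother": "Maria de Vos"}
b = {"first": "Jan", "last": "Jansen", "father": "Piet Jansen", "mother": "Marie de Vos"}
result = first_agreement(a, b)
```

Here the two records only agree at the strongest reduction level and only when the mother's name is left out of the key. Dropping a field is safe only when the remaining fields still identify the person uniquely within the table, which is the redundancy condition the paragraph above refers to; that uniqueness check is not part of this sketch.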
