skip to content

Tracking people over time in 19th century Canada: Challenges, Bias and Results

Presented by: 
Luiza Antonie
Tuesday 13th September 2016 - 12:00 to 12:30
INI Seminar Room 1
Co-author: Kris Inwood (University of Guelph)

Linking multiple databases to create longitudinal data is an important research problem with multiple applications. Longitudinal data allows analysts to perform studies that would be unfeasible otherwise. In this talk, I discuss a system we designed to link historical census databases in order to create longitudinal data that allow tracking people over time. Data imprecision in historical census data and the lack of unique personal identifiers make this task a challenging one. We design and employ a record linkage system that incorporates a supervised learning module for classifying pairs of records as matches and non-matches. In addition, we disambiguate ambiguous links by taking into account the family context. We report results on linking four Canadian census collections, from 1871 to 1901, and identify and discuss the impact on precision and bias when family context is employed. We show that our system performs large scale linkage producing high quality links and generat ing sufficient longitudinal data to allow meaningful social science studies. 
The video for this talk should appear here if JavaScript is enabled.
If it doesn't, something may have gone wrong with our embedded player.
We'll get it fixed as soon as possible.
University of Cambridge Research Councils UK
    Clay Mathematics Institute London Mathematical Society NM Rothschild and Sons