Evaluating Data Linkage: Creating longitudinal synthetic data to provide a gold-standard linked dataset

Presented by:
Tom Dalton, University of St Andrews
Thursday 20th October 2016 - 15:30 to 16:30
INI Seminar Room 2
When performing probabilistic data linkage on real-world data, we do not know the true linkage; if we did, there would be no need to link the data. The success of a linkage approach is therefore difficult to evaluate. Often, small hand-linked datasets are used as a ‘gold standard’ against which the linkage approach is evaluated. However, errors in the hand linkage, together with the limited size and number of these datasets, do not allow for robust evaluation. This research focuses on the creation of longitudinal synthetic datasets for the domain of population reconstruction. In this talk I will cover the previous and current models we have created to achieve this, and detail how we: define the desired behaviour in the model so as to avoid clashes between input distributions; verify the statistical correctness of the population; and initialise the model such that the starting population meets the temporal requirements of the desired behaviour. To conclude, I will outline the model’s intended use for linkage evaluation and its other potential uses, and take questions.

