
Statistical Modelling using Linked Data - Issues and Opportunities

Presented by: Ray Chambers
Friday 8th July 2016 - 11:30 to 12:30
INI Seminar Room 1
Probabilistic linkage of multiple data sets is now popular and widespread. Unfortunately, there appears to be little corresponding enthusiasm for adjusting standard methods of statistical analysis when they are applied to these linked data sets, even though there is plenty of evidence from simulation studies that both incorrect links and informative missed links can lead to biased inference. In this presentation I will describe the key issues that need to be addressed when analysing such linked data, and some of the methods that can help. In this context, I will focus in particular on the simple linear regression model as a vehicle for demonstrating how knowledge about the statistical properties of the linkage process, together with summary information about the population distribution of the analysis variables, can be used to correct for (or at least alleviate) these inferential problems. Recent research at the Australian Bureau of Statistics on a potential weighting/imputation approach to implementing these solutions will also be presented.
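The effect of incorrect links on a simple linear regression, and the way knowledge of the linkage process can be used to correct it, can be illustrated with a short simulation. This is a minimal sketch under stated assumptions, not the presenter's method. It assumes an exchangeable linkage error model: each record is correctly linked with a known probability λ (`lam`), and otherwise its y-value comes from one of the other n - 1 records chosen uniformly at random. Informative missed links are not modelled. Under this assumption the expected naive OLS slope equals the true-data slope multiplied by λ - (1 - λ)/(n - 1), so dividing by that factor gives a simple moment-based correction. The function names and simulation settings are hypothetical.

```python
import random


def simulate_linked_data(n, beta0, beta1, lam, seed=0):
    """Simulate (x, y*) pairs where y* comes from a possibly incorrect link.

    Assumption (illustration only): exchangeable linkage errors. Each record is
    correctly linked with probability `lam`; otherwise its y-value is taken
    from one of the other n - 1 records, chosen uniformly at random.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [beta0 + beta1 * xi + rng.gauss(0.0, 1.0) for xi in x]
    y_linked = []
    for i in range(n):
        if rng.random() < lam:
            j = i  # correct link
        else:
            r = rng.randrange(n - 1)  # index among the other n - 1 records
            j = r if r < i else r + 1  # skip record i itself
        y_linked.append(y[j])
    return x, y_linked


def ols_slope(x, y):
    """Ordinary least squares slope of y on x (model with an intercept)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


def corrected_slope(x, y_linked, lam):
    """Moment-based correction of the naive slope under the assumed model.

    Here E[y*_i] = (lam - g) * y_i + g * sum(y), with g = (1 - lam) / (n - 1),
    so the naive slope is attenuated, in expectation, by the factor (lam - g).
    """
    n = len(x)
    g = (1.0 - lam) / (n - 1)
    return ols_slope(x, y_linked) / (lam - g)


# Usage: true slope 2.0, with 80% of links assumed correct.
x, y_star = simulate_linked_data(n=5000, beta0=1.0, beta1=2.0, lam=0.8, seed=42)
naive = ols_slope(x, y_star)  # expected to be attenuated towards 0.8 * 2.0
corrected = corrected_slope(x, y_star, lam=0.8)  # expected to be near 2.0
print(f"naive slope: {naive:.3f}, corrected slope: {corrected:.3f}")
```

This correction assumes that λ is known. In practice λ would have to be estimated, for example from a clerical audit of a sample of links. The simple rescaling also covers only the slope in this special case; more general settings need the full model for the linkage process.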