skip to content
 

Probabilistic Record Linkage and Deduplication after Indexing, Blocking, and Filtering

Presented by: 
Jared Murray Carnegie Mellon University
Date: 
Friday 8th July 2016 - 14:30 to 15:30
Venue: 
INI Seminar Room 1
Abstract: 
When linking two databases (or deduplicating a single database) the number of possible links grows rapidly in the size of the databases under consideration, and in most applications it is necessary to first reduce the number of record pairs that will be compared. Spurred by practical considerations, a range of indexing or blocking methods have been developed for this task. However, methods for inferring linkage structure that account for indexing, blocking, and filtering steps have not seen commensurate development. I review the implications of indexing, blocking and filtering, focusing primarily on the popular Fellegi-Sunter framework and proposing a new model to account for particular forms of indexing and filtering. 
The video for this talk should appear here if JavaScript is enabled.
If it doesn't, something may have gone wrong with our embedded player.
We'll get it fixed as soon as possible.
University of Cambridge Research Councils UK
    Clay Mathematics Institute London Mathematical Society NM Rothschild and Sons