
Computational Methods for Linking Sets of National Files

Presented by: Bill Winkler, U.S. Census Bureau
Monday 12th September 2016 - 13:30 to 14:30
INI Seminar Room 1
A combination of faster hardware and new computational algorithms makes it possible to link two or more national files far faster than the methods of a decade earlier, provided the files carry suitable quasi-identifying information such as name, address, date of birth and other fields that do not uniquely identify a person. The methods (Winkler, Yancey, and Porter 2010) were used during the 2010 U.S. Decennial Census to match roughly 10^17 pairs (300 million x 300 million) in less than 30 hours using 40 CPUs of an SGI machine (with 2006 Itanium chips). The methods are 50 times as fast as the PSwoosh parallel software (Kawai et al. 2006) from Stanford University, and about 10 times as fast as recent parallel software that applies new methods of load balancing (Rahm and Kolb 2013, Yan et al. 2013, Karapiperis and Verykios 2014). This talk will describe how the software bypasses the need for system sorts and provides highly optimized search, retrieval and comparison for the narrow range of situations needed for record linkage.
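
The abstract does not say how the software avoids system sorts, so the following is only a minimal sketch of one generic way to do it, not the method of Winkler, Yancey, and Porter: index one file in memory by a blocking key, then look up each record of the second file in that index and compare it only against records in the same block. The field names (surname, first_name, dob, address, id), the blocking key and the agreement threshold are all hypothetical choices made for illustration.

```python
from collections import defaultdict

# Hypothetical field names; real files would use their own layout.
COMPARE_FIELDS = ("surname", "first_name", "dob", "address")


def blocking_key(record):
    # Assumed blocking key: surname initial plus year of birth (dob as YYYY-MM-DD).
    return (record["surname"][:1].upper(), record["dob"][:4])


def build_index(records):
    # Group records by blocking key in a hash table; no sort is needed.
    index = defaultdict(list)
    for rec in records:
        index[blocking_key(rec)].append(rec)
    return index


def agreement_score(a, b):
    # Count exact field agreements (a stand-in for a real comparison function).
    return sum(a[f] == b[f] for f in COMPARE_FIELDS)


def link(file_a, file_b, threshold=3):
    # Search the index for each record of file_b, retrieve its block,
    # and compare only within that block.
    index = build_index(file_a)
    matches = []
    for rec_b in file_b:
        for rec_a in index.get(blocking_key(rec_b), []):
            if agreement_score(rec_a, rec_b) >= threshold:
                matches.append((rec_a["id"], rec_b["id"]))
    return matches
```

With this layout the number of comparisons depends on block sizes rather than on the full 300 million x 300 million cross product, which is the usual motivation for blocking in record linkage. A production system would replace the exact-agreement count with string comparators and a probabilistic match score.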
