Probabilistic anonymisation of microdatasets and models for analysis

Presented by: 
Harvey Goldstein
Friday 8th July 2016 - 10:00 to 11:00
INI Seminar Room 1
The general idea is to add random noise with known properties to some or all variables in a released dataset, typically following linkage. The setting is one in which the values of some identifier variables for individuals of interest are also available to an external ‘attacker’ who wishes to identify those individuals in order to interrogate their records in the dataset. The noise is tuned to achieve any given degree of anonymity, so that the attacker cannot identify individuals by linking patterns based on the values of such variables. The noise so generated can then be ‘removed’ at the analysis stage since its characteristics are known, which requires the linking agency to disclose these characteristics. This leads to consistent parameter estimates, although with some loss of efficiency, and the data themselves are not degraded by any form of coarsening such as grouping.
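As a rough illustration of how noise with known characteristics can be ‘removed’ at the analysis stage, the sketch below adds zero-mean Gaussian noise of disclosed variance to a single predictor and then corrects a regression slope by a simple method-of-moments adjustment. This is not the method presented in the talk; the simulated data, the function names (`simulate`, `naive_slope`, `corrected_slope`) and all parameter values are assumptions made for the example.

```python
import numpy as np


def simulate(n=50000, beta=2.0, noise_sd=1.0, seed=0):
    """Simulate y = 1 + beta * x + error and release x with added noise.

    All values here are hypothetical and chosen only for illustration.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 1.0, n)                      # true predictor (kept private)
    y = 1.0 + beta * x + rng.normal(0.0, 0.5, n)     # outcome
    x_released = x + rng.normal(0.0, noise_sd, n)    # anonymised predictor
    return x_released, y


def naive_slope(x, y):
    """Ordinary least-squares slope, ignoring the added noise."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)


def corrected_slope(x, y, noise_var):
    """Slope after subtracting the disclosed noise variance from var(x).

    Assumes the noise is independent of the true predictor and the outcome.
    """
    return np.cov(x, y)[0, 1] / (np.var(x, ddof=1) - noise_var)


if __name__ == "__main__":
    x_rel, y = simulate()
    print("naive slope:    ", naive_slope(x_rel, y))
    print("corrected slope:", corrected_slope(x_rel, y, noise_var=1.0))
```

Under these assumptions the naive slope is attenuated towards zero (roughly half the true value here, since the noise variance equals the predictor variance), while the corrected slope is consistent for the true value of 2 but more variable, which mirrors the loss of efficiency mentioned in the abstract.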