
Sanitization for sequential data

Presented by: Grigorios Loukides
Thursday 29th September 2016 - 15:30 to 16:30
INI Seminar Room 2
Organizations disseminate sequential data to support applications in domains ranging from marketing to healthcare. Such data are typically modeled as a collection of sequences, or as a series of time-stamped events, and are mined by data recipients aiming to discover actionable knowledge. However, mining sequential data may expose sensitive patterns that leak confidential knowledge or lead to intrusive inferences about groups of individuals. In this talk, I will review this problem and present two approaches that prevent such exposure while retaining the usefulness of the data in mining tasks. The first approach applies to a collection of sequences and sanitizes sensitive patterns by permuting their events. The selected permutations avoid changes to the set of frequent non-sensitive patterns and to the ordering information of the sequences. The second approach applies to a series of time-stamped events and sanitizes sensitive events by deleting them from carefully selected time points. The deletion of events is guided by a model that captures changes to the probability distribution of events across the sequence.
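To make the first idea concrete, the following is a minimal toy sketch of hiding a sensitive pattern by reordering events. It is not the algorithm presented in the talk: the function names, the single-swap heuristic, and the `max_support` threshold are all assumptions made for illustration, and the sketch does not try to preserve frequent non-sensitive patterns, which the approach described above does.

```python
def contains(seq, pattern):
    """True if `pattern` occurs in `seq` as a (possibly non-contiguous) subsequence."""
    it = iter(seq)
    return all(event in it for event in pattern)


def support(sequences, pattern):
    """Number of sequences that contain `pattern`."""
    return sum(contains(s, pattern) for s in sequences)


def sanitize_by_swaps(sequences, sensitive, max_support):
    """Toy heuristic (an assumption, not the talk's method): swap two events
    in supporting sequences until the sensitive pattern's support is at most
    `max_support`. Nearby positions are tried first so the original ordering
    is disturbed as little as possible. A sequence that no single swap can
    fix is left unchanged."""
    result = [list(s) for s in sequences]
    for seq in result:
        if support(result, sensitive) <= max_support:
            break
        if not contains(seq, sensitive):
            continue
        n = len(seq)
        # Candidate swaps ordered by distance between the two positions.
        candidates = [(i, i + d) for d in range(1, n) for i in range(n - d)]
        for i, j in candidates:
            seq[i], seq[j] = seq[j], seq[i]
            if not contains(seq, sensitive):
                break  # this swap hides the pattern; keep it
            seq[i], seq[j] = seq[j], seq[i]  # undo and try the next swap
    return result


sequences = [["a", "b", "c"], ["a", "c", "b"], ["b", "a", "c"], ["a", "b", "d"]]
sensitive = ["a", "b"]
sanitized = sanitize_by_swaps(sequences, sensitive, max_support=1)
print(support(sequences, sensitive), support(sanitized, sensitive))  # 3 1
```

Each sanitized sequence keeps exactly the same events as the original and only their order changes, which is what distinguishes a permutation-based approach from the deletion-based approach used for time-stamped event series.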