Presented by:
Paul Burton
Date:
Thursday 8th December 2016 - 14:15 to 15:00
Venue:
INI Seminar Room 1
Abstract:
Research in modern biomedicine and social science often requires
sample sizes so large that they can only be achieved through a pooled
co-analysis of data from several studies. But the pooling of information from
individuals in a central database that may be queried by researchers raises
important governance questions and can be controversial. These reflect
important societal and professional concerns about privacy, confidentiality and
intellectual property. DataSHIELD provides a novel technological solution that
circumvents some of the most basic challenges in facilitating the access of
researchers and other healthcare professionals to individual-level data.
Commands are sent from a central analysis computer (AC) to several data
computers (DCs) that store the data to be co-analysed. Each DC is located at
one of the studies contributing data to the analysis. The data sets are
analysed simultaneously and in parallel, each at its own DC. The separate parallelized analyses are
linked by non-disclosive summary statistics and commands that are transmitted
back and forth between the DCs and the AC. Technical implementation of
DataSHIELD employs a specially modified R statistical environment linked to an
Opal database deployed behind the computer firewall of each DC. Analysis is
then controlled through a standard R environment at the AC. DataSHIELD is most
often configured to carry out an analysis, typically fully efficient, that is mathematically
equivalent to placing all data from all studies in one central database and
analysing them all together (with centre-effects, of course, where required).
Alternatively, it can be set up for study-level meta-analysis: estimates and
standard errors are derived independently from each study and then combined by
random-effects meta-analysis at the AC. DataSHIELD is being developed as a
flexible, easily extensible, open-source way to provide secure data access to a
single study or data repository, as well as in settings involving several studies.
Although the talk will focus on
the version of DataSHIELD that represents our current standard implementation,
it will also explore some of our recent thinking in relation to issues such as vertically
partitioned (record linkage) data, textual data and non-disclosive graphical
visualisation.
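
The claim that an analysis linked only by summary statistics can be mathematically equivalent to analysing the pooled data can be illustrated with a minimal sketch. This is not DataSHIELD code (DataSHIELD itself runs in R) and it performs no disclosure checks; the names Summary, dc_summarise and ac_fit, and the toy data, are hypothetical and chosen only for illustration. Each DC reduces its own records to a handful of sums for a simple linear regression, and the AC adds those sums and solves for the coefficients.

```python
from dataclasses import dataclass


@dataclass
class Summary:
    """Aggregate sums one data computer (DC) sends back; no individual rows."""
    n: int
    sx: float
    sy: float
    sxx: float
    sxy: float


def dc_summarise(x, y):
    """Runs at a DC: reduce local individual-level data to aggregate sums."""
    return Summary(
        n=len(x),
        sx=sum(x),
        sy=sum(y),
        sxx=sum(a * a for a in x),
        sxy=sum(a * b for a, b in zip(x, y)),
    )


def ac_fit(summaries):
    """Runs at the AC: add the per-DC sums, then fit y = intercept + slope * x."""
    n = sum(s.n for s in summaries)
    sx = sum(s.sx for s in summaries)
    sy = sum(s.sy for s in summaries)
    sxx = sum(s.sxx for s in summaries)
    sxy = sum(s.sxy for s in summaries)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope


# Hypothetical toy data held by two studies; in practice it stays at each DC.
study_a = ([1.0, 2.0, 3.0], [2.1, 3.9, 6.2])
study_b = ([4.0, 5.0, 6.0, 7.0], [8.1, 9.8, 12.1, 14.0])

# Co-analysis: only the Summary objects travel to the AC.
federated = ac_fit([dc_summarise(*study_a), dc_summarise(*study_b)])

# Reference: the same fit with all records placed in one central data set.
pooled_x = study_a[0] + study_b[0]
pooled_y = study_a[1] + study_b[1]
pooled = ac_fit([dc_summarise(pooled_x, pooled_y)])

print(federated, pooled)
```

The two fits agree up to floating-point rounding because the sums over the pooled data are exactly the sums of the per-study sums. Models without closed-form solutions would need several rounds of such exchanges between the AC and the DCs, which this sketch does not attempt.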