SAFE Outputs

The SAFE Outputs principle is concerned with the dissemination profile of the outputs that an individual or organisation generates from the data supplied to them as part of their data access. In most cases the outputs will be a publication, report or similar. However, even if the derived data is not made public in the first instance, it is advisable to assume that the data could be made public in due course. Any data asset removed from a SAFE Setting must be evaluated under a clear, transparent

Principles

0. SAFE Outputs: Only non-disclosive output data is subject to release from a TRE
1. Individuals are able to apply for an extract of their derived data and/or developed code from a TRE (Roles: RE, DE)
2. Trusted Research Environment providers must implement processes and systems to assess and decide on data release applications, including providing an appeals process (Roles: DE)
3. Trusted Research Environment providers should provide open, clear documentation of their output checking process (Roles: DE, TA)
4. Trusted Research Environments must provide, where possible, automated solutions for output checking and release of data via an established airlock process (Roles: DE, TA)
5. Trusted Research Environments should explore opportunities to harmonise and collaborate with other TRE Airlock managers to coordinate the output checking process and output release (Roles: TA)
6. Trusted Research Environments should have the facility to provide a 'safe archive' (Roles: TA)

Requirements

| ID | Description | Requirement Type | Roles |
| --- | --- | --- | --- |
| OUTPUT-01 | Individuals must be able to apply for a data or code release from a TRE, including information on dissemination channels | Functional | RE, DE |
| OUTPUT-02 | TRE providers must implement repeatable and timely processes and systems to assess and decide on data release applications in a consistent manner, including decision provenance, an appeals process, and support for individuals to undertake output checking themselves under supervision | Non-Functional | DE |
| OUTPUT-03 | TRE providers must provide open and clear documentation of their statistical disclosure control policies, including the assessment criteria | Functional | DE |
| OUTPUT-04 | TRE providers must provide automated solutions (Airlock), where possible, to assess and decide data release applications and, where possible, coordinate the transfer of output data to a location specified by the individual | Functional | DE |
| OUTPUT-05 | TRE Airlock managers should aim to harmonise and coordinate output checking and data release management processes with other TRE Airlock managers | Non-Functional | DE |
| OUTPUT-06 | TRE providers and individuals must ensure that staff and individuals receive appropriate training so that individuals are able to produce outputs that require minimal effort to check | Non-Functional | TA |
| OUTPUT-07 | TRE providers must provide a mechanism to archive an entire project workspace for a determined duration | Functional | TA, DE |

Interoperable Standards & Specifications

This section is a work in progress

There are currently no established standards for statistical disclosure control policies and output checking processes across TRE providers. There are two main approaches to assessing the disclosure risk of output data from a TRE: rules-based and principles-based.

Rules-based approaches have many implementations, as detailed below, and use simple deterministic heuristics (thresholding, rounding, etc.) to accept or reject outputs. Some go as far as detecting personally identifiable information and obfuscating or rejecting the affected records. Rules-based approaches tend to be conservative: they prioritise preventing disclosure through blanket rules over preserving the utility of the output.
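As an illustration of the kind of deterministic heuristic described above, the following is a minimal Python sketch of a threshold check and a rounding rule for a frequency table. The function names, the minimum cell count of 10, the rounding base of 5 and the choice to allow zero cells are all assumptions made for this example; they are not taken from any particular TRE's policy.

```python
from dataclasses import dataclass


@dataclass
class RuleResult:
    accepted: bool
    reasons: list[str]


def round_to_base(count: int, base: int = 5) -> int:
    """Round a count to the nearest multiple of `base` (ties round up)."""
    return base * ((count + base // 2) // base)


def check_frequency_table(cells: dict[str, int], min_count: int = 10) -> RuleResult:
    """Reject a table if any non-zero cell is below the minimum count.

    Assumption: zero cells are allowed. Some policies treat them as
    disclosive too, in which case the condition would need changing.
    """
    reasons = [
        f"cell '{label}' has count {n}, below threshold {min_count}"
        for label, n in cells.items()
        if 0 < n < min_count
    ]
    return RuleResult(accepted=not reasons, reasons=reasons)


if __name__ == "__main__":
    table = {"age 18-29": 42, "age 30-39": 3, "age 40-49": 17}
    result = check_frequency_table(table)
    print(result.accepted, result.reasons)
    print({label: round_to_base(n) for label, n in table.items()})
```

A check like this is cheap to automate, but it knows nothing about the context of the project, which is why it errs on the side of rejection.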

The Statistical Disclosure Control Handbook REF outlines some of the principles-based output checking approaches undertaken by many TRE providers and data custodians. Principles-based output checking uses contextual information about the dataset and project to balance the disclosure risk against the utility of the output data. This approach is very flexible, so it is typically carried out manually and therefore takes longer.

The two approaches are not mutually exclusive, and TRE Airlock managers typically use a hybrid of both.
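One way a hybrid process might be organised is for automated rules to settle the clear-cut cases and to refer borderline outputs to a human output checker, who applies principles-based judgement. The sketch below is hypothetical: the `triage` function, the `Decision` values and both thresholds are assumptions chosen for illustration, not an established TRE procedure.

```python
from enum import Enum


class Decision(Enum):
    RELEASE = "release"
    REJECT = "reject"
    MANUAL_REVIEW = "manual_review"


def triage(smallest_cell_count: int, hard_floor: int = 5, safe_threshold: int = 10) -> Decision:
    """Route an output based on its smallest non-zero cell count.

    Below `hard_floor`: rejected automatically by rule.
    At or above `safe_threshold`: released automatically by rule.
    In between: referred to a human checker for contextual assessment.
    Both thresholds are illustrative assumptions.
    """
    if smallest_cell_count < hard_floor:
        return Decision.REJECT
    if smallest_cell_count >= safe_threshold:
        return Decision.RELEASE
    return Decision.MANUAL_REVIEW


if __name__ == "__main__":
    for count in (3, 7, 12):
        print(count, triage(count).value)
```

The design choice here is that manual effort is spent only where the rules cannot decide with confidence.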

Modular Software & Services

This section is a work in progress

There are a number of non-standardised software tools and services that use rules-based heuristics to minimise the disclosure risk of output data as much as possible. These range from open source data anonymisation tools, e.g. ARX Deidentification Tool or Amnesia, to software and services that use machine learning, e.g. AWS Macie, or bounded or unbounded differential privacy to perturb the output data, e.g. DiffLib and Cantabular.
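To make the idea of perturbing output data concrete, here is a minimal sketch of the Laplace mechanism, a standard building block of differential privacy, applied to a single count. It is written against the Python standard library only and does not reflect how any of the tools named above are implemented. The function names and the assumed sensitivity of 1 for a single counting query are choices made for this example, and a production system would need a vetted library rather than this floating-point sketch.

```python
import random
from typing import Optional


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw from Laplace(0, scale) as the difference of two exponential draws."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def noisy_count(true_count: int, epsilon: float, rng: Optional[random.Random] = None) -> float:
    """Return the count plus Laplace noise with scale sensitivity / epsilon.

    Assumption: sensitivity is 1, i.e. one individual changes the count
    by at most 1. Smaller epsilon gives more noise and more privacy.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(2024)  # fixed seed only to make the demo repeatable
    for eps in (0.1, 1.0, 10.0):
        print(eps, round(noisy_count(100, eps, rng), 2))
```

Unlike thresholding, this approach releases every count but trades accuracy for privacy through the choice of `epsilon`.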

There is also a requirement for TREs to provide standardised mechanisms to trigger and manage the Airlock process in general. We are aware of a few ad-hoc implementations via email, shared folders and web APIs, but none are standardised across TREs.
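Since no standard exists, the following is only a hypothetical sketch of what a machine-readable Airlock release request might look like, with status transitions that cover review, appeal and release, and a history that records who decided what and when (decision provenance). Every name here, including `ReleaseRequest`, the status values and the fields, is an assumption for illustration rather than an existing TRE interface.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Allowed status transitions (hypothetical workflow).
ALLOWED = {
    "submitted": {"under_review"},
    "under_review": {"approved", "rejected"},
    "rejected": {"appealed"},
    "appealed": {"under_review"},
    "approved": {"released"},
    "released": set(),
}


@dataclass
class ReleaseRequest:
    request_id: str
    requester: str
    files: list[str]
    destination: str  # location specified by the individual
    status: str = "submitted"
    history: list[tuple[str, str, str, str]] = field(default_factory=list)

    def transition(self, new_status: str, actor: str, note: str = "") -> None:
        """Move to `new_status` if allowed, recording who did it and when."""
        if new_status not in ALLOWED[self.status]:
            raise ValueError(f"cannot go from {self.status} to {new_status}")
        timestamp = datetime.now(timezone.utc).isoformat()
        self.history.append((timestamp, actor, f"{self.status}->{new_status}", note))
        self.status = new_status


if __name__ == "__main__":
    req = ReleaseRequest("req-001", "researcher-a", ["table1.csv"], "project-share")
    req.transition("under_review", "checker-b")
    req.transition("approved", "checker-b", "all cells above threshold")
    req.transition("released", "airlock-service")
    for entry in req.history:
        print(entry)
```

A shared request format along these lines could be carried over email, shared folders or a web API alike, which is where harmonisation between TREs would start.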

Extensible Use Cases

This section is a work in progress
