SAFE Outputs¶
The SAFE Outputs principle concerns the dissemination profile of outputs generated from data that an individual or organisation has been given access to. In most cases the output will be a publication, report or similar. However, even if the derived data is not made public in the first instance, it is advisable to assume that it could be made public in due course. Any data asset removed from a SAFE Setting must be evaluated under a clear, transparent
Principles¶
0. | SAFE Outputs: Only non-disclosive output data is subject to release from a TRE | Role |
---|---|---|
1. | Individuals are able to apply for an extract of their derived data and/or developed code from a TRE | RE, DE |
2. | Trusted Research Environment providers must implement processes and systems to assess and decide on the data release application, including providing an appeals process | DE |
3. | Trusted Research Environment providers should provide open, clear documentation of their output checking process | DE, TA |
4. | Trusted Research Environments must provide, where possible, automated solutions for output checking and release of data via an established airlock process | DE, TA |
5. | Trusted Research Environments should explore opportunities to harmonise and collaborate with other TRE Airlock managers to coordinate output checking and output release | TA |
6. | Trusted Research Environments should have the facility to provide a ‘safe archive’ | TA |
Requirements¶
ID | Description | Requirement Type | Roles |
---|---|---|---|
OUTPUT-01 | Individuals MUST be able to apply for a data or code release from a TRE, including information on dissemination channels | Functional | RE, DE |
OUTPUT-02 | TRE providers must implement repeatable and timely processes and systems to assess and decide on the data release applications in a consistent manner, including decision provenance & appeals process and support to individuals to undertake output checking themselves with supervision. | Non-Functional | DE |
OUTPUT-03 | TRE providers must provide open and clear documentation of the statistical disclosure control policies including the assessment criteria | Functional | DE |
OUTPUT-04 | TRE Providers must provide automated solutions (Airlock), where possible, to assess and decide data release applications and where possible coordinate the transfer of output data to a location specified by the individual | Functional | DE |
OUTPUT-05 | TRE Airlock managers should aim to harmonise and coordinate output checking and data release management processes with other TRE Airlock managers | Non-Functional | DE |
OUTPUT-06 | TRE providers and individuals must ensure that staff and individuals receive appropriate training, so that individuals are able to produce outputs that require minimal effort to check | Non-Functional | TA |
OUTPUT-07 | TRE providers must provide a mechanism to archive an entire project workspace for a determined duration | Functional | TA, DE |
Interoperable Standards & Specifications¶
This section is a work in progress
Currently there are no established standards for statistical disclosure control policies and output checking processes across TRE providers. There are two main approaches to assessing the disclosure risk of output data from a TRE: rules-based and principles-based.
Rules-based approaches have many implementations, as detailed below, and use simple deterministic heuristics (thresholding, rounding, etc.) to accept or reject outputs. Some go as far as detecting personally identifiable information and obfuscating or rejecting records. Rules-based approaches tend to be conservative: they prioritise preventing disclosure through blanket rules over the utility of the output.
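As an illustration of the thresholding and rounding heuristics mentioned above, the following is a minimal Python sketch of a rules-based check on a frequency table. The threshold of 10, the rounding base of 5, and all names (`check_frequency_table`, `round_to_base`, `CheckResult`) are assumptions made for this example; they are not taken from any TRE's policy or from an existing tool.

```python
from dataclasses import dataclass

# Hypothetical policy values; each TRE sets its own in its SDC policy.
MIN_CELL_COUNT = 10
ROUNDING_BASE = 5


@dataclass
class CheckResult:
    accepted: bool
    reasons: list[str]


def round_to_base(value: int, base: int = ROUNDING_BASE) -> int:
    """Round a non-negative count to the nearest multiple of `base`."""
    return base * ((value + base // 2) // base)


def check_frequency_table(
    cells: dict[str, int], min_count: int = MIN_CELL_COUNT
) -> CheckResult:
    """Reject the table if any cell count falls below the threshold."""
    reasons = [
        f"cell '{label}' has count {count}, below threshold {min_count}"
        for label, count in cells.items()
        if count < min_count
    ]
    return CheckResult(accepted=not reasons, reasons=reasons)


# Example: one small cell causes the whole table to be rejected.
result = check_frequency_table({"age 18-30": 25, "age 31-50": 3})
```

The check is deterministic and ignores context, which is what makes it cheap to automate and also what makes it conservative: a table is rejected on a single small cell regardless of how useful or how risky the output is in practice.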
The Statistical Disclosure Control Handbook REF outlines some of the principles-based output checking approaches undertaken by many TRE providers and data custodians. Principles-based output checking uses contextual information about the dataset and project to balance the disclosure risk against the utility of the output data. This approach is very flexible, and as such is typically undertaken manually and hence takes longer.
The two approaches are not mutually exclusive, and TRE Airlock managers typically use a hybrid of both.
Modular Software & Services¶
This section is a work in progress
There are a number of non-standard software tools and services that use rules-based heuristics to minimise the disclosure risk of output data. These range from open source data anonymisation tools, e.g. ARX Deidentification Tool or Amnesia, to software and services that use machine learning, e.g. AWS Macie, and {un}bounded differential privacy to perturb the output data, e.g. DiffLib and Cantabular.
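To make the idea of perturbing output data concrete, here is a minimal sketch of the Laplace mechanism applied to a single count, using only the Python standard library. It does not reproduce the API or behaviour of any tool named above; the function names, the default sensitivity of 1 for a single count, and the example epsilon are assumptions for illustration.

```python
import random
from typing import Optional


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw from Laplace(0, scale) as the difference of two exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def noisy_count(
    true_count: int,
    epsilon: float,
    sensitivity: float = 1.0,  # assumed: one person changes a count by at most 1
    rng: Optional[random.Random] = None,
) -> float:
    """Return the count plus Laplace noise with scale sensitivity / epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    return true_count + laplace_noise(sensitivity / epsilon, rng)


# Example: a smaller epsilon means more noise and stronger privacy.
released = noisy_count(true_count=120, epsilon=0.5)
```

A smaller `epsilon` gives a larger noise scale, so the trade-off between disclosure risk and utility becomes an explicit parameter rather than a fixed rule.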
There is also a requirement for TREs to provide standardised mechanisms to trigger and manage the Airlock process in general. We are aware of a few ad-hoc implementations via email, shared folders and web APIs, but none are standardised across TREs.
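Since no standard exists, the following is only a hypothetical sketch of how an Airlock release request might be modelled: a small state machine that records who made each decision (decision provenance) and allows a rejected request to be appealed. The states, the allowed transitions, and every name in the code are assumptions for illustration, not a description of any existing TRE implementation.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    SUBMITTED = "submitted"
    UNDER_REVIEW = "under_review"
    APPROVED = "approved"
    REJECTED = "rejected"
    APPEALED = "appealed"
    RELEASED = "released"


# Hypothetical lifecycle: which states may follow each state.
ALLOWED = {
    State.SUBMITTED: {State.UNDER_REVIEW},
    State.UNDER_REVIEW: {State.APPROVED, State.REJECTED},
    State.REJECTED: {State.APPEALED},
    State.APPEALED: {State.UNDER_REVIEW},
    State.APPROVED: {State.RELEASED},
    State.RELEASED: set(),
}


@dataclass
class ReleaseRequest:
    requester: str
    files: list[str]
    dissemination_channel: str  # e.g. "journal article"
    state: State = State.SUBMITTED
    # Each entry: (from_state, to_state, actor, note).
    history: list[tuple[State, State, str, str]] = field(default_factory=list)

    def transition(self, new_state: State, actor: str, note: str = "") -> None:
        """Move to `new_state` if allowed, recording who did it and why."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"cannot move from {self.state.value} to {new_state.value}"
            )
        self.history.append((self.state, new_state, actor, note))
        self.state = new_state


# Example: a request that is rejected and then appealed.
request = ReleaseRequest("researcher-1", ["table1.csv"], "journal article")
request.transition(State.UNDER_REVIEW, actor="checker-1")
request.transition(State.REJECTED, actor="checker-1", note="small cell counts")
request.transition(State.APPEALED, actor="researcher-1", note="counts are rounded")
```

Keeping the transition table as data rather than scattered conditionals would make it easier for TREs to compare and align their processes, should a shared lifecycle ever be agreed.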
Extensible Use Cases¶
This section is a work in progress