Version 2.0

required

Required metadata needed for the GWDM

gatewayId

Associated identifier (number) that is the BigInt key in our SQL database for the dataset version associated with this metadata

title guidance is_list required type
Gateway Identifier False True ['str']

gatewayPid

A unique persistent identifier for the metadata version. This is a 128-bit unique identifier, written as 32 hexadecimal digits separated by hyphens

title guidance is_list required type
Gateway Persistent Identifier False True ['str']
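
As a rough illustration of the format described above, the sketch below uses Python's standard `uuid` module to test whether a string has the hyphen-separated, 32-hexadecimal-digit form. The helper name `is_valid_gateway_pid` is hypothetical and not part of any GWDM tooling, and insisting on the hyphenated form is an assumption about how strictly the Gateway validates.

```python
import uuid


def is_valid_gateway_pid(value: str) -> bool:
    """Return True if value is a canonical hyphenated UUID string (hypothetical helper)."""
    try:
        parsed = uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        return False
    # uuid.UUID also accepts unhyphenated or braced input, so compare against
    # the canonical 8-4-4-4-12 form to insist on hyphen-separated digits.
    return str(parsed) == value.lower()
```

For example, `is_valid_gateway_pid("841f7da2-b018-41f6-b4ae-2e0aadab6561")` passes, while the same digits without hyphens are rejected.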

issued

Datetime stamp of when this metadata version was initially issued

title guidance is_list required type
Metadata Issued Datetime False True ['datetime']

modified

Datetime stamp of when this metadata was last modified

title guidance is_list required type
Last Modified Datetime False True ['datetime']

revisions

A list of persistent identifiers and version numbers for previous versions of metadata for this dataset

version

Version number used for previous version of this dataset

title guidance is_list required type
revision version False True ['str']

Examples:

  • 6.0.0

url

A URL referencing the record of a previous version of this dataset

title guidance is_list required type
revision url False True ["Url[{'anyOf': [{'format': 'uri', 'minLength': 1, 'type': 'string'}, {'type': 'null'}]}]", 'null']

Examples:

  • https://api.service.nhs.uk/health-research-data-catalogue/datasetrevisions/841f7da2-b018-41f6-b4ae-2e0aadab6561

version

Dataset metadata version

title guidance is_list required type
Dataset Version False True ['str']

Examples:

  • 1.1.0
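
To make the shape of this section concrete, here is a minimal sketch of a `required` block as a Python dictionary, with a check that the fields marked as required in the tables above are present. The nesting of `revisions` as a list of `version`/`url` objects is an assumption based on its description, and the values marked hypothetical are invented for illustration.

```python
from datetime import datetime, timezone

# Fields whose tables above show required = True.
REQUIRED_KEYS = ("gatewayId", "gatewayPid", "issued", "modified", "version")

required = {
    "gatewayId": "1234",  # hypothetical value; typed as a string in the table above
    "gatewayPid": "841f7da2-b018-41f6-b4ae-2e0aadab6561",  # illustrative UUID
    "issued": datetime(2024, 1, 1, tzinfo=timezone.utc).isoformat(),    # hypothetical
    "modified": datetime(2024, 6, 1, tzinfo=timezone.utc).isoformat(),  # hypothetical
    "revisions": [  # assumed structure: one object per previous version
        {
            "version": "1.0.0",  # hypothetical previous version
            "url": "https://api.service.nhs.uk/health-research-data-catalogue/"
                   "datasetrevisions/841f7da2-b018-41f6-b4ae-2e0aadab6561",
        }
    ],
    "version": "1.1.0",
}


def missing_required(block: dict) -> list:
    """List the required keys that are absent from a `required` block."""
    return [key for key in REQUIRED_KEYS if key not in block]
```

Calling `missing_required(required)` on the dictionary above returns an empty list.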

summary

Summary of metadata describing key pieces of information.

title

The main title of the dataset

title guidance is_list required type
Title - The title should provide a short description of the dataset and be unique across the gateway.
- If your title is not unique, please add a prefix with your organisation name or identifier to differentiate it from other datasets within the Gateway.
- If an acronym is widely used for the dataset name, please add it in brackets () at the end of the title.
- Good titles should summarise the content of the dataset and if relevant, the region the dataset covers.
- Example: North West London COVID-19 Patient Level Situation Report
False True ["TwoHundredFiftyFiveCharacters[{'maxLength': 255, 'minLength': 2, 'type': 'string'}]"]

Examples:

  • Publications that mention HDR-UK (or any variant thereof) in Acknowledgements or Author Affiliations

shortTitle

A shorter descriptive title of the dataset

title guidance is_list required type
Short Title False False ["ShortTitle[{'anyOf': [{'maxLength': 100, 'minLength': 2, 'type': 'string'}, {'type': 'null'}]}]", 'null']

Examples:

  • ONS 2011 Census Wales (CENW)

doiName

DOI associated with this dataset

title guidance is_list required type
DOI Name - Please note: This is not the DOI of the publication(s) associated with the dataset.
- All HDR UK registered datasets should either have a DOI or be working towards obtaining one.
- If a DOI is available, please provide the DOI.
- What happens if I do not have a DOI?: Contact your academic organisation to find out if there is an existing relationship with a DOI provider. If that is not available, sites such as figshare offer free services to mint a DOI for your dataset. Subsequent versions of the Metadata Exchange will provide a DOI minting service.
False False ["Doi[{'anyOf': [{'pattern': '^10.\\d{4,9}/[-._;()/:a-zA-Z0-9]+$', 'type': 'string'}, {'type': 'null'}]}]", 'null']

Examples:

  • 10.1093/ije/dyx196
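
The pattern in the `type` column above can be tried out directly. The following is a minimal sketch using Python's `re` module; the function name `looks_like_doi_name` is hypothetical, and the check only tests the shape of the string, not whether the DOI is registered.

```python
import re

# Pattern copied from the schema's type column. Note that the unescaped "."
# after "10" matches any single character, not only a literal dot.
DOI_PATTERN = re.compile(r"^10.\d{4,9}/[-._;()/:a-zA-Z0-9]+$")


def looks_like_doi_name(value: str) -> bool:
    """Return True if value matches the schema's DOI name pattern."""
    return DOI_PATTERN.match(value) is not None
```

Under this pattern the bare name `10.1093/ije/dyx196` matches, whereas a full resolver URL such as `https://doi.org/10.1093/ije/dyx196` does not.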

abstract

Longer abstract detailing the dataset.

title guidance is_list required type
Abstract - The abstract should provide a clear and brief descriptive signpost for researchers who are searching for data that may be relevant to their research.
- The abstract should allow the reader to determine the scope of the data collection and accurately summarise its content.
- Effective abstracts should avoid long sentences and abbreviations where possible.
- Note: Researchers will view Titles and the first line of Abstracts (list view) when searching for datasets and choosing whether to explore their content further.
- Abstracts should be different from the full description for a dataset.
- Example: CPRD Aurum contains primary care data contributed by General Practitioner (GP) practices using EMIS Web® including patient registration information and all care events that GPs have chosen to record as part of their usual medical practice.
False True ["LongAbstractText[{'anyOf': [{'maxLength': 5000, 'minLength': 5, 'type': 'string'}, {'type': 'null'}]}]"]

Examples:

  • COVID-19 Key Worker Testing Results data is required by NHS Digital to support COVID-19 requests for linkage, analysis and dissemination to other organisations who require the data in a timely manner.

keywords

Comma-separated keywords associated with this dataset.

title guidance is_list required type
Keywords - Please provide relevant and specific keywords that can improve the search engine optimization of your dataset.
- Please enter one keyword at a time and click Add New Field to add further keywords.
- Text from the title is automatically included in the search, so there is no need to include it in the keywords.
- Include words that researchers may use in their searches.
False False ["CommaSeparatedValues[{'anyOf': [{'pattern': '([^,]+)', 'type': 'string'}, {'type': 'null'}]}]", 'null']

Examples:

  • Preprints,Papers,HDR UK
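
Several fields in this schema share the `CommaSeparatedValues` type. As a sketch of how a consumer might turn such a value into a list, assuming that surrounding whitespace and empty items should be dropped (the schema itself does not say so), one could write the hypothetical helper below.

```python
from typing import List, Optional


def split_comma_separated(raw: Optional[str]) -> List[str]:
    """Split a comma-separated field into trimmed, non-empty items."""
    if raw is None:
        return []
    return [item.strip() for item in raw.split(",") if item.strip()]
```

Applied to the example above, `split_comma_separated("Preprints,Papers,HDR UK")` gives the three keywords as separate list entries.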

controlledKeywords

Keywords that have been filtered and limited

title guidance is_list required type
Controlled Keywords False False ["CommaSeparatedValues[{'anyOf': [{'pattern': '([^,]+)', 'type': 'string'}, {'type': 'null'}]}]", 'null']

contactPoint

Email address of a person who can be the main contact point for this dataset

title guidance is_list required type
Contact Point Organisations are expected to provide a dedicated email address associated with the data access request process. If no contact point is provided in this field, it will default to the team's support email provided in the team's settings.
Note: An employee's email address can only be provided on a temporary basis and if one is provided, you must obtain explicit consent for this purpose.
False False ['EmailStr', 'null']

Examples:

  • SAILDatabank@swansea.ac.uk

datasetType

What type of dataset is this?

title guidance is_list required type
Dataset type False False ["CommaSeparatedValues[{'anyOf': [{'pattern': '([^,]+)', 'type': 'string'}, {'type': 'null'}]}]", 'null']

description

Longer description of the dataset in detail

title guidance is_list required type
Description False False ["LongDescription[{'anyOf': [{'maxLength': 50000, 'minLength': 2, 'type': 'string'}, {'type': 'null'}]}]", 'null']

Examples:

  • Publications that mention HDR-UK (or any variant thereof) in Acknowledgements or Author Affiliations

    This will include:
    - Papers
    - COVID-19 Papers
    - COVID-19 Preprint

publisher

Link to details about the publisher of this dataset

name

Name of the organisation

title guidance is_list required type
Organisation Name False False ['Name[{}]', 'null']

gatewayId

Identifier on the gateway

title guidance is_list required type
Organisation Gateway Identifier False False ['str', 'null']

rorId

The Research Organization Registry (ROR) for the organisation, if applicable

title guidance is_list required type
Research Organization Registry Identifier False False ['str', 'null']

populationSize

Summary population size of the cohort

title guidance is_list required type
Population size This number informs a filter for Researchers to differentiate dataset search results based on the number of people in the dataset, and does not pull from the Observations fields. The filter also allows for Researchers to search datasets which have no population size reported, but will not pull any population size captured in the Observations section. False False ['int', 'null']

datasetSubType

Placeholder for dataset sub-type

title guidance is_list required type
Dataset Sub-type False False ["CommaSeparatedValues[{'anyOf': [{'pattern': '([^,]+)', 'type': 'string'}, {'type': 'null'}]}]", 'null']

inPipeline

Indicate whether this dataset is currently available for Researchers to request access.

title guidance is_list required type
Dataset pipeline status False False ["Pipeline['Available','Not available']", 'null']

coverage

This information includes attributes for geographical and temporal coverage, cohort details etc. to enable a deeper understanding of the dataset content so that researchers can make decisions about the relevance of the underlying data.

spatial

The geographical area covered by the dataset. It is recommended that links are to entries in a well-maintained gazetteer such as https://www.geonames.org/ or https://what3words.com/daring.lion.race.

title guidance is_list required type
Geographic Coverage - The geographical area covered by the dataset.
- Please provide a valid location.
- For locations in the UK, this location should conform to ONS standards.
- For locations in other countries we use ISO 3166-1 & ISO 3166-2.
False False ["CommaSeparatedValues[{'anyOf': [{'pattern': '([^,]+)', 'type': 'string'}, {'type': 'null'}]}]", 'null']

Examples:

  • https://www.geonames.org/2635167/united-kingdom-of-great-britain-and-northern-ireland.html

pathway

Please indicate if the dataset is representative of the patient pathway and any limitations the dataset may have with respect to pathway coverage. This could include if the dataset is from a single speciality or area, a single tier of care, linked across two tiers (e.g. primary and secondary care), or an integrated care record covering the whole patient pathway.

title guidance is_list required type
Pathway - Please indicate if the dataset is representative of the patient pathway and any limitations the dataset may have with respect to pathway coverage.
- This could include if the dataset is from a single speciality or area, a single tier of care, linked across two tiers (e.g. primary and secondary care), or an integrated care record covering the whole patient pathway.
False False ["LongDescription[{'anyOf': [{'maxLength': 50000, 'minLength': 2, 'type': 'string'}, {'type': 'null'}]}]", 'null']

followUp

If known, what is the typical time span that a patient appears in the dataset (follow up period)

title guidance is_list required type
Followup If known, please indicate the typical time span that a patient appears in the dataset (follow up period).
- 0 - 6 MONTHS: Data typically available for a patient over a 0-6 month period.
- 6 - 12 MONTHS: Data typically available for a patient over a 6-12 month period.
- 1 - 10 YEARS: Data typically available for a patient over a 1-10 year period.
- > 10 YEARS: Data typically available for a patient for over a 10 year period.
- CONTINUOUS: Data for patients is being regularly added to and updated.
- UNKNOWN: Timespan is Unknown.
- OTHER: Data available for a patient over another time period.
False False ["FollowupV2['0 - 6 Months','6 - 12 Months','1 - 10 Years','> 10 Years','Unknown','Continuous','Other',null]", 'null']

typicalAgeRange

Please indicate the age range in whole years of participants in the dataset. Please provide range in the following format '[min age] – [max age]' where both the minimum and maximum are whole numbers (integers).

title guidance is_list required type
Age Range False False ["AgeRange[{'anyOf': [{'pattern': 'Not Known

Examples:

  • 18-90
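
The '[min age] – [max age]' format can be parsed with a short regular expression. The sketch below is an assumption-laden illustration: the schema's own pattern is truncated on this page, so the helper `parse_age_range` (a hypothetical name) simply accepts two whole numbers separated by a hyphen or an en dash and returns nothing for any other value.

```python
import re
from typing import Optional, Tuple

# Accept either a hyphen or an en dash between the two integers (an assumption;
# the schema's own pattern is truncated in this page).
_AGE_RANGE = re.compile(r"^\s*(\d+)\s*[-\u2013]\s*(\d+)\s*$")


def parse_age_range(value: str) -> Optional[Tuple[int, int]]:
    """Return (min_age, max_age), or None if value is not a well-formed range."""
    match = _AGE_RANGE.match(value)
    if match is None:
        return None
    low, high = int(match.group(1)), int(match.group(2))
    # Reject ranges where the minimum exceeds the maximum.
    return (low, high) if low <= high else None
```

With this helper, `parse_age_range("18-90")` yields the pair of integers 18 and 90.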

datasetCompleteness

The URL where a Researcher can learn more about the completeness of the dataset.

title guidance is_list required type
Dataset coverage/completeness/quality If your organisation has a publicly available site which contains information on the completeness of a dataset, add that URL here.
Example: https://bhfdatasciencecentre.org/dashboard/
False False ["Url[{'anyOf': [{'format': 'uri', 'minLength': 1, 'type': 'string'}, {'type': 'null'}]}]", 'null']

Examples:

  • https://bhfdatasciencecentre.org/dashboard/

provenance

Provenance information allows researchers to understand data within the context of its origins and can be an indicator of quality, authenticity and timeliness.

origin

None

purpose

Please indicate the purpose(s) for which the dataset was collected.

title guidance is_list required type
Purpose - Research cohort: Data collected for a defined group of people.
- Study: Data collected for a specific research study.
- Disease registry: Data collected as part of a disease registry.
- Trial: Data collected as part of a clinical trial.
- Care: Data collected as part of routine clinical care.
- Audit: Data collected as part of an audit programme.
- Administrative: Data collected for administrative and management information purposes.
- Financial: Data collected either for payments or for billing.
- Statutory: Data collected in compliance with statutory requirements.
- Other: Data collected for another purpose.
False False ["CommaSeparatedValues[{'anyOf': [{'pattern': '([^,]+)', 'type': 'string'}, {'type': 'null'}]}]", 'null']

source

Please indicate the source of the data extraction

title guidance is_list required type
Source - EPR: Data has been extracted from an Electronic Patient Record.
- Electronic survey: Data has been extracted from electronic surveys.
- LIMS: Data has been extracted from a laboratory information management system.
- Paper-based: Data has been extracted from paper forms.
- Free text NLP: Data has been extracted from unstructured free text using natural language processing.
- Machine generated: Data has been machine generated e.g. imaging.
- Other: Data has been extracted by other means.
False False ["CommaSeparatedValues[{'anyOf': [{'pattern': '([^,]+)', 'type': 'string'}, {'type': 'null'}]}]", 'null']

collectionSituation

Please indicate the setting(s) where data was collected. Multiple settings may be provided

title guidance is_list required type
Collection Situation Setting False False ["CommaSeparatedValues[{'anyOf': [{'pattern': '([^,]+)', 'type': 'string'}, {'type': 'null'}]}]", 'null']

imageContrast

Indicate whether usage of imaging contrast is captured within the dataset.

title guidance is_list required type
Image contrast If any contrast media or contrast agents were used in creating the images within the dataset and the contrast is known, mark 'Yes'. If this information is not known or not captured, indicate 'Not stated'. If there was no contrast used in the images, mark 'No'. False False ["Ternary['Yes','No','Not stated']", 'null']

temporal

None

startDate

The start of the time period that the dataset provides coverage for. If there are multiple cohorts in the dataset with varying start dates, please provide the earliest date and use the description or the media attribute to provide more information.

title guidance is_list required type
Start Date - The start of the time period that the dataset provides coverage for.
- If there are multiple cohorts in the dataset with varying start dates, please provide the earliest date and use the description or the media attribute to provide more information.
False False ['date', 'datetime', 'null']

endDate

The end of the time period that the dataset provides coverage for. If the dataset is “Continuous” and has no known end date, please state continuous. If there are multiple cohorts in the dataset with varying end dates, please provide the latest date and use the description or the media attribute to provide more information.

title guidance is_list required type
End Date - The end of the time period that the dataset provides coverage for.
- If the dataset is “Continuous” and has no known end date, please leave blank.
- If there are multiple cohorts in the dataset with varying end dates, please provide the latest date.
False False ['date', 'datetime', 'null']

timeLag

Please indicate the typical time-lag between an event and the data for that event appearing in the dataset

title guidance is_list required type
Time Lag Please indicate the typical time-lag between an event and the data for that event appearing in the dataset.
- Less than 1 week: Typical time lag of less than a week.
- 1-2 weeks: Typical time-lag of one to two weeks.
- 2-4 weeks: Typical time-lag of two to four weeks.
- 1-2 months: Typical time-lag of one to two months.
- 2-6 months: Typical time-lag of two to six months.
- More than 6 months: Typical time-lag of more than six months.
- Variable: Variable time-lag.
- Not applicable: Not Applicable i.e. static dataset.
- Other: Other time-lag.
False True ["TimeLagV2['Less than 1 week','1-2 weeks','2-4 weeks','1-2 months','2-6 months','More than 6 months','Variable','Not applicable','Other']"]

accrualPeriodicity

Please indicate the frequency of distribution release. If a dataset is distributed regularly please choose a distribution release periodicity from the constrained list and indicate the next release date. When the release date becomes historical, a new release date will be calculated based on the publishing periodicity. If a dataset has been published and will remain static please indicate that it is static and indicate when it was released. If a dataset is released on an irregular basis or “on-demand” please indicate that it is Irregular and leave release date as null. If a dataset can be published in real-time or near-real-time please indicate that it is continuous and leave release date as null. Notes: see https://www.dublincore.org/specifications/dublin-core/collection-description/frequency/

title guidance is_list required type
Periodicity False True ["PeriodicityV2['Static','Irregular','Continuous','Biennial','Annual','Biannual','Quarterly','Bimonthly','Monthly','Biweekly','Weekly','Twice a week','Daily','Other',null]"]

distributionReleaseDate

Date of the latest release of the dataset. If this is a regular release i.e. quarterly, or this is a static dataset please complete this alongside Periodicity. If this is Irregular or Continuously released please leave this blank. Notes: Periodicity and release date will be used to determine when the next release is expected. E.g. if the release date is documented as 01/01/2020 and it is now 20/04/2020 and there is a quarterly release schedule, the latest release will be calculated as 01/04/2020.

title guidance is_list required type
Release Date - Please indicate the frequency the dataset is published.
- If a dataset is published regularly please choose a publishing periodicity from the constrained list and indicate the next release date.
- When the release date becomes historical, a new release date will be calculated based on the publishing periodicity.
- If a dataset has been published and will remain static please indicate that it is static and indicate when it was released.
- If a dataset is released on an irregular basis or “on-demand” please indicate that it is Irregular and leave release date as null.
- If a dataset can be published in real-time or near-real-time please indicate that it is continuous and leave release date as null.
- Notes: see https://www.dublincore.org/specifications/dublin-core/collection-description/frequency/
False False ['date', 'datetime', 'null']
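
The worked example in the description (a release date of 01/01/2020, a current date of 20/04/2020 and a quarterly schedule giving 01/04/2020) can be reproduced with a small calculation. This is only a sketch of one way to do it, not the Gateway's actual implementation: the dates are read as DD/MM/YYYY, the names `PERIOD_MONTHS`, `add_months` and `latest_release` are hypothetical, and only three month-based periodicities are covered.

```python
import calendar
from datetime import date

# Months per period for a hypothetical subset of the periodicity values.
PERIOD_MONTHS = {"Monthly": 1, "Quarterly": 3, "Annual": 12}


def add_months(d: date, months: int) -> date:
    """Shift d by a number of months, clamping the day to the month's length."""
    index = d.year * 12 + (d.month - 1) + months
    year, month = divmod(index, 12)
    month += 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def latest_release(release: date, periodicity: str, today: date) -> date:
    """Roll the documented release date forward to the most recent release."""
    step = PERIOD_MONTHS[periodicity]
    periods = 0
    # Step from the original date each time so day clamping does not accumulate.
    while add_months(release, (periods + 1) * step) <= today:
        periods += 1
    return add_months(release, periods * step)
```

For the figures in the description, `latest_release(date(2020, 1, 1), "Quarterly", date(2020, 4, 20))` returns 1 April 2020.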

accessibility

Accessibility information allows researchers to understand access, usage, limitations, formats, standards and linkage or interoperability with toolsets.

usage

This section includes information about how the data can be used and how it is currently being used

dataUseLimitation

Please provide an indication of the consent permissions for datasets and/or materials, relating to the purposes for which datasets and/or material might be removed, stored or used. NOTE: we have extended the DUO to include a value for NO LINKAGE

title guidance is_list required type
Data Use Limitation Please provide an indication of the consent permissions for datasets and/or materials, relating to the purposes for which datasets and/or material might be removed, stored or used.
- General research use: This data use limitation indicates that use is allowed for general research use for any research purpose.
- Genetic studies only: This data use limitation indicates that use is limited to genetic studies only (i.e., no phenotype-only research).
- No general methods research: This data use limitation indicates that use includes methods development research (e.g., development of software or algorithms) only within the bounds of other use limitations.
- No restriction: This data use limitation indicates there is no restriction on use.
- Research-specific restrictions: This data use limitation indicates that use is limited to studies of a certain research type.
- Research use only: This data use limitation indicates that use is limited to research purposes (e.g., does not include its use in clinical care).
- No linkage: This data use limitation indicates there is a restriction on linking to any other datasets
False False ["CommaSeparatedValues[{'anyOf': [{'pattern': '([^,]+)', 'type': 'string'}, {'type': 'null'}]}]", 'null']

dataUseRequirement

Please indicate if there are any additional conditions set for use; multiple requirements may be provided. Please ensure that these restrictions are documented in access rights information.

title guidance is_list required type
Data Use Requirements - Please indicate if there are any additional conditions set for use; multiple requirements may be provided.
- Please ensure that these restrictions are documented in access rights information.
- Collaboration required: This requirement indicates that the requestor must either agree to join a research consortium or collaborate with the primary study investigator(s).
- Ethics approval required: This requirement indicates that the requestor must provide documentation of local institutional review board (IRB)/ ethics review board (ERB) approval.
- Geographical restrictions: This requirement indicates that use is limited to within a specific geographic region.
- Institution-specific restrictions: This requirement indicates that use is limited to use within an approved institution.
- Not for profit use: This requirement indicates that use of the data is limited to not-for-profit organizations and not-for-profit use, non-commercial use.
- Project-specific restrictions: This requirement indicates that use is limited to use within an approved project.
- Publication moratorium: This requirement indicates that requestor agrees not to publish results of studies until a specific date.
- Publication required: This requirement indicates that requestor agrees to make results of studies using the data available to the larger scientific community.
- Return to database or resource: This requirement indicates that the requestor must return derived/enriched data to the database/resource.
- Time limit on use: This requirement indicates that use is approved for a specific number of months.
- User-specific restriction: This requirement indicates that use is limited to use by approved users.
False False ["CommaSeparatedValues[{'anyOf': [{'pattern': '([^,]+)', 'type': 'string'}, {'type': 'null'}]}]", 'null']

resourceCreator

Please provide the text that you would like included as part of any citation that credits this dataset. This is typically just the name of the publisher. No employee details should be provided.

Examples:

  • National Services Scotland

name

Name of the organisation

title guidance is_list required type
Organisation Name False False ['Name[{}]', 'null']

gatewayId

Identifier on the gateway

title guidance is_list required type
Organisation Gateway Identifier False False ['str', 'null']

rorId

The Research Organization Registry (ROR) for the organisation, if applicable

title guidance is_list required type
Research Organization Registry Identifier False False ['str', 'null']

access

This section includes information about data access

accessRights

Please provide details for the data access rights

title guidance is_list required type
Access Rights - The URL of a webpage where the data access request process and/or guidance is provided. If there is more than one access process i.e. industry vs academic please provide both separated by a comma.
- If such a resource or the underlying process doesn’t exist, please provide “In Progress”, until both the process and the documentation are ready.
False False ["CommaSeparatedValues[{'anyOf': [{'pattern': '([^,]+)', 'type': 'string'}, {'type': 'null'}]}]", 'null']

accessService

Please provide a brief description of the data access services that are available including: environment that is currently available to researchers; additional consultancy and services; any indication of costs associated. If no environment is currently available, please indicate the current plans and timelines when and how data will be made available to researchers. Note: This value will be used as default access environment for all datasets submitted by the organisation. However, there will be the opportunity to overwrite this value for each dataset.

title guidance is_list required type
Access Service Please provide a brief description of the data access services that are available including:
- environment that is currently available to researchers
- additional consultancy and services
- any indication of costs associated

If no environment is currently available, please indicate the current plans and timelines when and how data will be made available to researchers.
Note: This value will be used as default access environment for all datasets submitted by the organisation. However, there will be the opportunity to overwrite this value for each dataset.
False False ["LongDescription[{'anyOf': [{'maxLength': 50000, 'minLength': 2, 'type': 'string'}, {'type': 'null'}]}]", 'null']

Examples:

  • https://cnfl.extge.co.uk/display/GERE/Research+Environment+User+Guide

accessRequestCost

Please provide link(s) to a webpage detailing the commercial model for processing data access requests for the organisation (if available). Definition: Indication of commercial model or cost (in GBP) for processing each data access request by the data custodian.

title guidance is_list required type
Organisation Access Request Cost This information should cover the costs and/or services available to different audiences (i.e. academic, commercial, non-UK, etc.). This can be in the form of text or a URL. False False ["LongDescription[{'anyOf': [{'maxLength': 50000, 'minLength': 2, 'type': 'string'}, {'type': 'null'}]}]", 'null']

deliveryLeadTime

Please provide an indication of the typical processing times based on the types of requests typically received.

title guidance is_list required type
Access Request Duration - Less than 1 week: Access request process typically processed in less than a week.
- 1-2 weeks: Access request process typically processed in one to two weeks.
- 2-4 weeks: Access request process typically processed in two to four weeks.
- 1-2 months: Access request process typically processed in one to two months.
- 2-6 months: Access request process typically processed in two to six months.
- More than 6 months: Access request process typically processed in more than six months.
- Variable: Access request lead time is variable.
- Not applicable: Access request process duration is not applicable.
- Other: If the typical timeframe does not fit into the broad ranges i.e. lightweight application vs linked data application, please choose “Other” and indicate the typical timeframe within the description for the dataset.
False False ["DeliveryLeadTimeV2['Less than 1 week','1-2 weeks','2-4 weeks','1-2 months','2-6 months','More than 6 months','Variable','Not applicable','Other']", 'null']

jurisdiction

Please use country code from ISO 3166-1 country codes and the associated ISO 3166-2 for regions, cities, states etc. for the country/state under whose laws the data subjects' data is collected, processed and stored.

title guidance is_list required type
Jurisdiction A full list of country codes can be found here (alpha-2 column): https://www.iso.org/obp/ui/#search/code/ False False ["CommaSeparatedValues[{'anyOf': [{'pattern': '([^,]+)', 'type': 'string'}, {'type': 'null'}]}]", 'null']
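
As a rough aid for checking jurisdiction values before submission, the sketch below tests only the shape of each code: two capital letters for an ISO 3166-1 alpha-2 country code, optionally followed by a hyphen and up to three letters or digits for an ISO 3166-2 subdivision. It does not confirm that a code is actually assigned, and the function name `check_jurisdictions` is hypothetical.

```python
import re
from typing import Dict

# Shape only: two capital letters, optionally followed by a hyphen and one to
# three letters or digits (the ISO 3166-2 subdivision part).
_ISO_3166_SHAPE = re.compile(r"^[A-Z]{2}(-[A-Z0-9]{1,3})?$")


def check_jurisdictions(raw: str) -> Dict[str, bool]:
    """Map each comma-separated code to whether it has an ISO 3166 shape."""
    codes = [code.strip() for code in raw.split(",") if code.strip()]
    return {code: _ISO_3166_SHAPE.match(code) is not None for code in codes}
```

A value such as `"GB, GB-ENG"` passes for both entries, while a country name written out in full is flagged.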

dataController

Data Controller means a person/entity who (either alone or jointly or in common with other persons/entities) determines the purposes for which, and the way in which, any Data Subject data, specifically personal data, is to be processed.

title guidance is_list required type
Data Controller - Data Controller means a person/entity who (either alone or jointly or in common with other persons/entities) determines the purposes for which, and the way in which, any Data Subject data, specifically personal data, is to be processed.
- Notes: For most organisations this will be the same as the Data Custodian of the dataset. If this is not the case, please indicate that there is a different controller.
- If there is a different controller please complete the Data Processor attribute to indicate if the Data Custodian is a Processor rather than the Data Controller.
- In some cases, there may be multiple Data Controllers i.e. GP data. If this is the case, please indicate the fact in a free-text field and describe the data sharing arrangement or a link to it, so that this can be understood by research users.
- Example: NHS England
False False ["LongDescription[{'anyOf': [{'maxLength': 50000, 'minLength': 2, 'type': 'string'}, {'type': 'null'}]}]", 'null']

Examples:

  • NHS England

dataProcessor

A Data Processor, in relation to any Data Subject data, specifically personal data, means any person/entity (other than an employee of the data controller) who processes the data on behalf of the data controller.

title guidance is_list required type
Data Processor A Data Processor, in relation to any Data Subject data, specifically personal data, means any person/entity (other than an employee of the data controller) who processes the data on behalf of the data controller.
- Notes: Required to complete if the Data Custodian is the Data Processor rather than the Data Controller.
- If the Publisher is also the Data Controller please provide “Not Applicable”.
- Examples: Not Applicable, SAIL
False False ["LongDescription[{'anyOf': [{'maxLength': 50000, 'minLength': 2, 'type': 'string'}, {'type': 'null'}]}]", 'null']

Examples:

  • Not Applicable
  • SAIL

accessServiceCategory

Where access to the data comes from: TRE/SDE, direct access, open access, varies based on project.

title guidance is_list required type
Access/governance requirements Select the category which best matches how a Researcher will access the dataset, if approved for access. If the access method changes based on the data required for the project (e.g. the dataset can be shared via secure email if the extract is fully anonymised, but must be accessed via a TRE/SDE if the extract is only pseudonymised) then select 'varies based on project'. False False ["CommaSeparatedValues[{'anyOf': [{'pattern': '([^,]+)', 'type': 'string'}, {'type': 'null'}]}]", 'null']

Examples:

  • TRE/SDE

formatAndStandards

This section includes technical attributes for language, vocabularies, sizes etc. and gives researchers facts about the underlying data in the dataset and how to process it.

vocabularyEncodingSchemes

Code value of the ontology vocabulary encoding

title guidance is_list required type
Controlled Vocabulary False False ["CommaSeparatedValues[{'anyOf': [{'pattern': '([^,]+)', 'type': 'string'}, {'type': 'null'}]}]", 'null']

Examples:

  • OPCS4,NHS NATIONAL CODES,ICD10,OTHER

conformsTo

What the vocabulary conforms to.

title guidance is_list required type
Conforms To - List standardised data models that the dataset has been stored in or transformed to, such as OMOP or FHIR.
- If the data is only available in a local format, please make that explicit. If you are using a standard that has not been included in the list, please use “other” and contact the support desk to ask for an addition.
- HL7 FHIR: https://www.hl7.org/fhir/.
- HL7 V2: https://www.hl7.org/implement/standards/product_section.cfm?section=13.
- HL7 CDA: https://www.hl7.org/implement/standards/product_section.cfm?section=10.
- HL7 CCOW: https://www.hl7.org/implement/standards/product_section.cfm?section=16.
- DICOM: https://www.dicomstandard.org/.
- I2B2: https://www.i2b2.org/.
- IHE: https://www.ihe.net/resources/profiles/.
- OMOP: https://www.ohdsi.org/data-standardization/the-common-data-model/.
- openEHR: https://www.openehr.org/.
- Sentinel: https://www.sentinelinitiative.org/sentinel/data/distributed-database-common-data-model.
- PCORnet: https://pcornet.org/data-driven-common-model/.
- CDISC: https://www.cdisc.org/standards/data-exchange/odm.
- Local: In-house developed data model.
- Other: Other standardised data model.
- NHS Data Dictionary: https://www.datadictionary.nhs.uk/.
- NHS Scotland Data Dictionary: https://www.ndc.scot.nhs.uk/Data-Dictionary/.
- NHS Wales Data Dictionary: https://www.datadictionary.wales.nhs.uk/.
False False ["CommaSeparatedValues[{'anyOf': [{'pattern': '([^,]+)', 'type': 'string'}, {'type': 'null'}]}]", 'null']

Examples:

  • LOCAL,NHS DATA DICTIONARY

languages

Language code(s) of the language(s) in which the dataset metadata and underlying data are made available.

title guidance is_list required type
Language Code(s) False False ["CommaSeparatedValues[{'anyOf': [{'pattern': '([^,]+)', 'type': 'string'}, {'type': 'null'}]}]", 'null']

Examples:

  • en

formats

Format(s) the dataset can be made available in

title guidance is_list required type
Dataset Format False False ["CommaSeparatedValues[{'anyOf': [{'pattern': '([^,]+)', 'type': 'string'}, {'type': 'null'}]}]", 'null']

Examples:

  • CSV,JSON,SQL database table

linkage

Metadata for various linkages with datasets and other gateway entities

isGeneratedUsing

??

title guidance is_list required type
Is Generated Using False False ["CommaSeparatedValues[{'anyOf': [{'pattern': '([^,]+)', 'type': 'string'}, {'type': 'null'}]}]", 'null']

associatedMedia

Any media associated with the Gateway Organisation using a valid URI for the content. This is an opportunity to provide additional context that could be useful for researchers wanting to understand more about the dataset and its relevance to their research question

title guidance is_list required type
Associated Media False False ["CommaSeparatedValues[{'anyOf': [{'pattern': '([^,]+)', 'type': 'string'}, {'type': 'null'}]}]", 'null']

Examples:

  • https://popdatasci.swan.ac.uk/centres-of-excellence/sail/,https://www.youtube.com/watch?v=ZK9-Jw3uVkw,https://saildatabank.com/,https://saildatabank.com/about-us/

dataUses

??

title guidance is_list required type
Data Uses False False ["CommaSeparatedValues[{'anyOf': [{'pattern': '([^,]+)', 'type': 'string'}, {'type': 'null'}]}]", 'null']

isReferenceIn

The keystone paper associated with the dataset. Also include a list of known citations, if available; these should be links to existing resources where the dataset has been used or referenced.

title guidance is_list required type
Is Reference in False False ["CommaSeparatedValues[{'anyOf': [{'pattern': '([^,]+)', 'type': 'string'}, {'type': 'null'}]}]", 'null']

tools

URL of any analysis tools or models that have been created for this dataset and are available for further use

title guidance is_list required type
Tools False False ["CommaSeparatedValues[{'anyOf': [{'pattern': '([^,]+)', 'type': 'string'}, {'type': 'null'}]}]", 'null']

Examples:

  • https://conceptlibrary.saildatabank.com/

datasetLinkage

Dataset Linkage copied over from

isDerivedFrom

Indicate if derived datasets or predefined extracts are available and the type of derivation available. Note: single or multiple dimensions can be provided as a derived extract alongside the dataset.

Examples:

  • Data will be minimised as appropriate relative to the data access application
pid

None

title guidance is_list required type
Persistent identifier of a dataset False False ["OneHundredFiftyCharacters[{'maxLength': 150, 'minLength': 2, 'type': 'string'}]", 'null']
title

None

title guidance is_list required type
Title of a dataset False False ["OneHundredFiftyCharacters[{'maxLength': 150, 'minLength': 2, 'type': 'string'}]", 'null']
url

None

title guidance is_list required type
Url of a dataset False False ["Url[{'anyOf': [{'format': 'uri', 'minLength': 1, 'type': 'string'}, {'type': 'null'}]}]", 'null']

isPartOf

If the dataset is part of a group or family

Examples:

  • UKCRC Tissue Directory and Coordination Centre
pid

None

title guidance is_list required type
Persistent identifier of a dataset False False ["OneHundredFiftyCharacters[{'maxLength': 150, 'minLength': 2, 'type': 'string'}]", 'null']
title

None

title guidance is_list required type
Title of a dataset False False ["OneHundredFiftyCharacters[{'maxLength': 150, 'minLength': 2, 'type': 'string'}]", 'null']
url

None

title guidance is_list required type
Url of a dataset False False ["Url[{'anyOf': [{'format': 'uri', 'minLength': 1, 'type': 'string'}, {'type': 'null'}]}]", 'null']

linkedDatasets

Links to other datasets.

Examples:

  • Yes. To any SAIL dataset & reference data.,ALL
pid

None

title guidance is_list required type
Persistent identifier of a dataset False False ["OneHundredFiftyCharacters[{'maxLength': 150, 'minLength': 2, 'type': 'string'}]", 'null']
title

None

title guidance is_list required type
Title of a dataset False False ["OneHundredFiftyCharacters[{'maxLength': 150, 'minLength': 2, 'type': 'string'}]", 'null']
url

None

title guidance is_list required type
Url of a dataset False False ["Url[{'anyOf': [{'format': 'uri', 'minLength': 1, 'type': 'string'}, {'type': 'null'}]}]", 'null']

isMemberOf

Dataset is a member of XXX(?)

pid

None

title guidance is_list required type
Persistent identifier of a dataset False False ["OneHundredFiftyCharacters[{'maxLength': 150, 'minLength': 2, 'type': 'string'}]", 'null']
title

None

title guidance is_list required type
Title of a dataset False False ["OneHundredFiftyCharacters[{'maxLength': 150, 'minLength': 2, 'type': 'string'}]", 'null']
url

None

title guidance is_list required type
Url of a dataset False False ["Url[{'anyOf': [{'format': 'uri', 'minLength': 1, 'type': 'string'}, {'type': 'null'}]}]", 'null']
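
Each datasetLinkage entry above carries the same three optional fields: pid, title and url. The following is a minimal sketch of checking one entry against the length limits shown in the tables. The function name `check_linkage_entry` and the sample pid and url are hypothetical, and requiring a scheme and host for the url is an assumption, since the schema only states format 'uri'.

```python
from urllib.parse import urlparse


def check_linkage_entry(entry):
    """Return a list of problems found in a {pid, title, url} linkage entry."""
    problems = []
    for field in ("pid", "title"):
        value = entry.get(field)
        # Both fields may be null; when present they must be 2 to 150 characters.
        if value is not None and not (2 <= len(value) <= 150):
            problems.append(f"{field}: length must be between 2 and 150")
    url = entry.get("url")
    if url is not None:
        parts = urlparse(url)
        # Assumption for this sketch: an acceptable URI has a scheme and a host.
        if not (parts.scheme and parts.netloc):
            problems.append("url: expected an absolute URI")
    return problems


entry = {
    "pid": "hypothetical-pid-0001",  # illustrative value only
    "title": "UKCRC Tissue Directory and Coordination Centre",
    "url": "https://example.org/datasets/1",  # illustrative value only
}
problems = check_linkage_entry(entry)
```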

investigations

Please provide the keystone paper associated with the dataset.

title guidance is_list required type
Investigations False False ["CommaSeparatedValues[{'anyOf': [{'pattern': '([^,]+)', 'type': 'string'}, {'type': 'null'}]}]", 'null']

Examples:

  • https://digital.nhs.uk/services/data-access-request-service-dars/register-of-approved-data-releases

Links to locations of information and/or raw downloads of synthetic data associated with this dataset

title guidance is_list required type
Synthetic Data Web Links False False ["CommaSeparatedValues[{'anyOf': [{'pattern': '([^,]+)', 'type': 'string'}, {'type': 'null'}]}]", 'null']

publicationAboutDataset

DOIs for publications which describe the dataset.

title guidance is_list required type
Publication about the dataset True False ["Doi[{'anyOf': [{'pattern': '^10.\\d{4,9}/[-._;()/:a-zA-Z0-9]+$', 'type': 'string'}, {'type': 'null'}]}]"]

publicationUsingDataset

DOIs for publications which use the dataset for analysis.

title guidance is_list required type
Publication using the dataset True False ["Doi[{'anyOf': [{'pattern': '^10.\\d{4,9}/[-._;()/:a-zA-Z0-9]+$', 'type': 'string'}, {'type': 'null'}]}]"]
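
Both publication fields use the Doi type, whose pattern is shown in the tables above. The following is a minimal sketch of testing a string against that pattern. The function name `is_schema_doi` is hypothetical, and the sample DOI is only an illustration. Note that the pattern expects a bare DOI, so a full https://doi.org/ link does not match.

```python
import re

# Pattern as published in the Doi type. The first "." is unescaped there,
# so it matches any single character, not only a literal dot.
DOI_PATTERN = re.compile(r"^10.\d{4,9}/[-._;()/:a-zA-Z0-9]+$")


def is_schema_doi(value):
    """Return True when the string matches the Doi pattern from the schema."""
    return DOI_PATTERN.match(value) is not None


bare = is_schema_doi("10.1000/182")
as_link = is_schema_doi("https://doi.org/10.1000/182")
```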

observations

Multiple observations about the dataset may be provided, and users are expected to provide at least one observation (1..*). We will be supporting the schema.org observation model (https://schema.org/Observation) with default values. Users will be encouraged to provide their own statistical populations as the project progresses.

Example:

Statistical Population 1
type: StatisticalPopulation
populationType: Persons
numConstraints: 0

Statistical Population 2
type: StatisticalPopulation
populationType: Events
numConstraints: 0

Statistical Population 3
type: StatisticalPopulation
populationType: Findings
numConstraints: 0

Observation
typeOf: Observation
observedNode: Statistical Population 1
measuredProperty: count
measuredValue: 32937
observationDate: “2017”

observedNode

Please select one of the following statistical populations for your observation

title guidance is_list required type
Statistical Population - Persons: Unique persons recorded in the dataset
- Events: Unique events such as procedures and prescriptions within the dataset
- Findings: Unique findings included in the dataset such as diagnoses
- Number of scans per modality: Unique scans for a specified imaging method modality (e.g. 12 x-rays)
False True ["StatisticalPopulationConstrainedV2['Persons','Events','Findings','Number of scans per modality']"]

Examples:

  • PERSONS

measuredValue

Please provide the population size associated with the population type of the dataset, e.g. 1000 people in a study, or 87 images (MRI) of the knee. Usage note: used with Statistical Population, which specifies the type of the population in the dataset.

title guidance is_list required type
Measured Value An integer value size of the measured property, such as ‘1000’ for 1000 people in the study or ‘87’ for 87 MRI scans in the dataset. False True ['int']

disambiguatingDescription

If SNOMED CT term does not provide sufficient detail, please provide a description that disambiguates the population type.

title guidance is_list required type
Disambiguating Description If required please provide additional details that help distinguish between similar measured properties within your dataset, for example this is useful when SNOMED CT terms do not provide sufficient detail to distinguish between parts of the dataset population. False False ["AbstractText[{'anyOf': [{'maxLength': 500, 'minLength': 5, 'type': 'string'}, {'type': 'null'}]}]", 'null']

observationDate

Please provide the date that the observation was made. Some datasets may be continuously updated and the number of records will change regularly, so the observation date provides users with the date that the analysis or query was run to generate the particular observation. Multiple observations can be made, e.g. an observation of cumulative COVID positive cases by specimen on 1/1/2021 could be 2M, and a new observation on 8/1/2021 could be 2.1M.

title guidance is_list required type
Observation Date Provide the date, or datetime that the observation was made. Multiple observations of the same property can be provided, for example an observation of cumulative COVID positive cases by specimen on the 1/1/2021 with a measuredValue of 2000000, and a second observation entry on 8/2/2021 recording a measuredValue of 3100000. False True ['date', 'datetime']

measuredProperty

Initially this will be defaulted to "COUNT"

title guidance is_list required type
Measured Property Descriptive term for the observation property measured. False True ['MeasuredProperty[{}]']
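
The observation fields above can be pictured as one record per measurement. The following is a minimal sketch, not the gateway's own model: the class name `Observation`, the exact-case comparison of population names, and the choice of "Persons" for the sample records are assumptions. The two sample values mirror the cumulative-count example in the observationDate description, reading the dates as day/month/year.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Allowed values copied from StatisticalPopulationConstrainedV2.
STATISTICAL_POPULATIONS = (
    "Persons",
    "Events",
    "Findings",
    "Number of scans per modality",
)


@dataclass
class Observation:
    observedNode: str
    measuredValue: int
    observationDate: date
    measuredProperty: str = "COUNT"  # described above as the initial default
    disambiguatingDescription: Optional[str] = None

    def __post_init__(self):
        # Assumption: names are compared exactly as listed in the enum.
        if self.observedNode not in STATISTICAL_POPULATIONS:
            raise ValueError(f"unknown statistical population: {self.observedNode!r}")
        # bool is a subclass of int in Python, so it is rejected explicitly.
        if isinstance(self.measuredValue, bool) or not isinstance(self.measuredValue, int):
            raise ValueError("measuredValue must be an integer")


# Two observations of the same property taken a week apart.
first = Observation("Persons", 2000000, date(2021, 1, 1))
second = Observation("Persons", 2100000, date(2021, 1, 8))
increase = second.measuredValue - first.measuredValue
```

Keeping each observation as its own dated record lets a continuously updated dataset report how its size changes over time.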

structuralMetadata

Descriptions of all tables and data elements that can be included in the dataset

name

The name of a table in a dataset.

title guidance is_list required type
Table Name False False ['str', 'null']

description

A description of a table in a dataset.

title guidance is_list required type
Table Description False False ['str', 'null']

columns

A list of columns contained within a table in a dataset.

name

The name of a column in a table.

title guidance is_list required type
Column Name False True ['Name[{}]']

dataType

The data type of a column in a table.

title guidance is_list required type
Column Name False True ['str']

description

A description of a column in a table.

title guidance is_list required type
Column Description False False ['str', 'null']

sensitive

A True or False value, indicating if the field is sensitive or not

title guidance is_list required type
Sensitive False True ['bool']

values

Values in a dataset

name

Unique value in a column.

title guidance is_list required type
Value Name False True ['Name[{}]']
description

A description of a unique value in a column.

title guidance is_list required type
Value Description False False ['str', 'null']
frequency

The frequency of occurrence of a value in a column

title guidance is_list required type
Value Frequency False False ['int', 'null']
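
The structuralMetadata fields above describe tables, their columns and the values seen in each column. The following is a minimal sketch of one table entry and a check for the fields marked as required. It assumes that values are nested under their column; the table, column and value contents are invented for illustration, and the function name `missing_required` is hypothetical.

```python
# Fields marked required=True in the tables above.
REQUIRED_COLUMN_FIELDS = ("name", "dataType", "sensitive")
REQUIRED_VALUE_FIELDS = ("name",)


def missing_required(table):
    """Return dotted paths of required fields that are absent from a table entry."""
    missing = []
    for c_index, column in enumerate(table.get("columns", [])):
        for field in REQUIRED_COLUMN_FIELDS:
            # "is None" so that sensitive=False still counts as provided.
            if column.get(field) is None:
                missing.append(f"columns[{c_index}].{field}")
        for v_index, value in enumerate(column.get("values", [])):
            for field in REQUIRED_VALUE_FIELDS:
                if value.get(field) is None:
                    missing.append(f"columns[{c_index}].values[{v_index}].{field}")
    return missing


# Illustrative entry only; none of these names come from a real dataset.
table = {
    "name": "admissions",
    "description": "One row per hospital admission",
    "columns": [
        {
            "name": "sex",
            "dataType": "varchar",
            "description": "Recorded sex of the patient",
            "sensitive": False,
            "values": [
                {"name": "F", "frequency": 10},
                {"description": "Value with no name"},
            ],
        },
        {"name": "date_of_birth", "sensitive": True},
    ],
}
gaps = missing_required(table)
```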

tissuesSampleCollection

Metadata for tissue samples

id

ID of the tissue sample collection

title guidance is_list required type
ID False False ["CommaSeparatedValues[{'anyOf': [{'pattern': '([^,]+)', 'type': 'string'}, {'type': 'null'}]}]", 'null']

dataCategories

The type of data that is associated with the samples in the study. Can be several values (MIABIS-2.0-13).

title guidance is_list required type
Data Categories False False ["CommaSeparatedValues[{'anyOf': [{'pattern': '([^,]+)', 'type': 'string'}, {'type': 'null'}]}]", 'null']

materialType

The biospecimen saved from a biological entity for propagation e.g. testing, diagnostics, treatment or research purposes. Can be several values (MIABIS-2.0-14).

title guidance is_list required type
Material Type False False ["CommaSeparatedValues[{'anyOf': [{'pattern': '([^,]+)', 'type': 'string'}, {'type': 'null'}]}]", 'null']

accessConditions

Access conditions for the tissue sample collection

title guidance is_list required type
Access Conditions False False ["CommaSeparatedValues[{'anyOf': [{'pattern': '([^,]+)', 'type': 'string'}, {'type': 'null'}]}]", 'null']

collectionType

The type of the sample collection. Can be several values (MIABIS-2.0-16).

title guidance is_list required type
Collection Type False False ["CommaSeparatedValues[{'anyOf': [{'pattern': '([^,]+)', 'type': 'string'}, {'type': 'null'}]}]", 'null']

disease

Disease associated with the tissue sample collection

title guidance is_list required type
Disease False False ["CommaSeparatedValues[{'anyOf': [{'pattern': '([^,]+)', 'type': 'string'}, {'type': 'null'}]}]", 'null']

storageTemperature

Storage temperature of the tissue sample collection

title guidance is_list required type
Storage Temperature False False ["CommaSeparatedValues[{'anyOf': [{'pattern': '([^,]+)', 'type': 'string'}, {'type': 'null'}]}]", 'null']

sampleAgeRange

Age range of the tissue sample collection

title guidance is_list required type
Sample Age Range False False ["CommaSeparatedValues[{'anyOf': [{'pattern': '([^,]+)', 'type': 'string'}, {'type': 'null'}]}]", 'null']

tissueSampleMetadata

Metadata related to the tissue sample

id

ID of the tissue sample metadata

title guidance is_list required type
Metadata ID False False ['str', 'null']

sampleDonor

Information about the sample donor

id

ID of the sample donor

title guidance is_list required type
Donor ID False False ['str', 'null']
sex

Sex of the sample donor

title guidance is_list required type
Donor Sex False False ['str', 'null']
birthDate

Date of birth of the sample donor

title guidance is_list required type
Donor birth date False False ['date', 'datetime', 'null']
dataCategories

Data categories related to the sample donor

title guidance is_list required type
Donor Data Categories False False ["CommaSeparatedValues[{'anyOf': [{'pattern': '([^,]+)', 'type': 'string'}, {'type': 'null'}]}]", 'null']

sampleType

Type of the tissue sample

title guidance is_list required type
Sample Type False False ["CommaSeparatedValues[{'anyOf': [{'pattern': '([^,]+)', 'type': 'string'}, {'type': 'null'}]}]", 'null']

storageTemperature

Storage temperature of the tissue sample

title guidance is_list required type
Storage Temperature False False ['str', 'null']

creationDate

Date when the tissue sample metadata was created

title guidance is_list required type
Creation Date False False ['date', 'datetime', 'null']

anatomicalSiteOntologyCode

Ontology code for the anatomical site; this code must match the ICD-O-3 format

title guidance is_list required type
Anatomical Site Ontology Code False False ["CommaSeparatedValues[{'anyOf': [{'pattern': '([^,]+)', 'type': 'string'}, {'type': 'null'}]}]", 'null']

anatomicalSiteOntologyDescription

Ontology description for the anatomical site

title guidance is_list required type
Anatomical Site Ontology Description False False ["CommaSeparatedValues[{'anyOf': [{'pattern': '([^,]+)', 'type': 'string'}, {'type': 'null'}]}]", 'null']

anatomicalSiteFreeText

Free text describing the anatomical site

title guidance is_list required type
Anatomical Site Free Text False False ["CommaSeparatedValues[{'anyOf': [{'pattern': '([^,]+)', 'type': 'string'}, {'type': 'null'}]}]", 'null']

sampleContentDiagnosis

Diagnosis related to the sample content

title guidance is_list required type
Sample Content Diagnosis False False ["CommaSeparatedValues[{'anyOf': [{'pattern': '([^,]+)', 'type': 'string'}, {'type': 'null'}]}]", 'null']

useRestrictions

Restrictions on the use of the tissue sample

title guidance is_list required type
Use Restrictions False False ["CommaSeparatedValues[{'anyOf': [{'pattern': '([^,]+)', 'type': 'string'}, {'type': 'null'}]}]", 'null']

demographicFrequency

An object containing demographic frequency data categorised by age, ethnicity, and disease attributes.

age

Array of age bins and their corresponding counts.

bin

None

title guidance is_list required type
Age bin False True ["AgeEnum['0-6 days','7-27 days','1-11 months','1-4 years','5-9 years','10-14 years','15-19 years','20-24 years','25-29 years','30-34 years','35-39 years','40-44 years','45-49 years','50-54 years','55-59 years','60-64 years','65-69 years','70-74 years','75-79 years','80-84 years','85-89 years','90-94 years','95-99 years','100+ years']"]

count

None

title guidance is_list required type
Age count False True ['int']

ethnicity

Array of ethnicity bins and their corresponding counts.

bin

None

title guidance is_list required type
Ethnicity bin False True ["EthnicityEnum['White - British','White - Irish','White - Any other White background','Mixed - White and Black Caribbean','Mixed - White and Black African','Mixed - White and Asian','Mixed - Any other mixed background','Asian or Asian British - Indian','Asian or Asian British - Pakistani','Asian or Asian British - Bangladeshi','Asian or Asian British - Any other Asian background','Black or Black British - Caribbean','Black or Black British - African','Black or Black British - Any other Black background','Other Ethnic Groups - Chinese','Other Ethnic Groups - Any other ethnic group','Not stated','Not known']"]

count

None

title guidance is_list required type
Ethnicity count False True ['int']

disease

Array of diseases and their corresponding counts.

diseaseCode

None

title guidance is_list required type
Disease code False True ['str', 'int']

diseaseCodeVocabulary

None

title guidance is_list required type
Disease code vocabulary False True ["DiseaseCodeEnum['ICD10','SNOMED CT','MeSH']"]

count

None

title guidance is_list required type
Disease count False True ['int']
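
The age array above pairs each bin label with a count. The following is a minimal sketch of building that array from ages given in whole years. The function names are hypothetical, and ages under one year are rejected because the day and month bins cannot be derived from a whole number of years.

```python
from collections import Counter

# Bin labels copied from AgeEnum in the table above.
AGE_BINS = (
    "0-6 days", "7-27 days", "1-11 months", "1-4 years", "5-9 years",
    "10-14 years", "15-19 years", "20-24 years", "25-29 years", "30-34 years",
    "35-39 years", "40-44 years", "45-49 years", "50-54 years", "55-59 years",
    "60-64 years", "65-69 years", "70-74 years", "75-79 years", "80-84 years",
    "85-89 years", "90-94 years", "95-99 years", "100+ years",
)


def age_bin_for_years(age_years):
    """Map an age in whole years (1 or more) to its AgeEnum label."""
    if age_years < 1:
        raise ValueError("ages under one year need day or month precision")
    if age_years >= 100:
        return "100+ years"
    if age_years <= 4:
        return "1-4 years"
    lower = (age_years // 5) * 5
    return f"{lower}-{lower + 4} years"


def age_frequency(ages_in_years):
    """Build the list of {bin, count} entries, in AgeEnum order, skipping empty bins."""
    counts = Counter(age_bin_for_years(age) for age in ages_in_years)
    return [{"bin": label, "count": counts[label]} for label in AGE_BINS if counts[label]]


frequency = age_frequency([3, 7, 8, 42, 101])
```

The ethnicity and disease arrays follow the same shape, with their own bin or code fields alongside a count.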

omics

Omics

assay

The specific 'omics assay that generated the dataset.

title guidance is_list required type
Omics assay The specific 'omics assay that generated the dataset. If the assay used to generate your dataset is not listed, please contact the gateway team by submitting an enquiry. False False ["Assay['NMR spectroscopy','Mass-spectrometry','Whole genome sequencing','Exome sequencing','Genotyping by array','Transcriptome profiling by high-throughput sequencing','Transcriptome profiling by array','Amplicon sequencing','Methylation binding domain sequencing','Methylation profiling by high-throughput sequencing','Genomic variant calling','Chromatin accessibility profiling by high-throughput sequencing','Histone modification profiling by high-throughput sequencing','Chromatin immunoprecipitation sequencing','Whole genome shotgun sequencing','Whole transcriptome sequencing','Targeted mutation analysis']", 'null']

platform

The specific technology or infrastructure used to perform the assay. If the omics platform used to create your dataset is not listed, please select 'Other'; a member of the gateway team will contact you to add the appropriate term(s) both to your record and to the metadata schema on your behalf.

title guidance is_list required type
Omics Platform The specific technology or infrastructure used to perform the assay. If the omics platform used to create your dataset is not listed, please select 'Other'; a member of the gateway team will contact you to add the appropriate term(s) both to your record and to the metadata schema on your behalf. False False ["Platform['Other','NMR Nightingale','Metabolon','Biocrates','Illumina','Oxford Nanopore','454','Hi-C','HiFi']", 'null']
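
The guidance for platform asks for 'Other' when the platform is not in the list. The following is a minimal sketch of applying that rule before submitting a record. The function name `choose_platform` is hypothetical, and the unlisted platform name used below is only an illustrative input.

```python
# Allowed values copied from the Platform type in the table above.
PLATFORMS = (
    "Other", "NMR Nightingale", "Metabolon", "Biocrates", "Illumina",
    "Oxford Nanopore", "454", "Hi-C", "HiFi",
)


def choose_platform(name):
    """Return the name if it is a listed platform, 'Other' if not, None if absent."""
    if name is None:
        return None  # the field is optional and may be null
    return name if name in PLATFORMS else "Other"


listed = choose_platform("Illumina")
unlisted = choose_platform("Some unlisted platform")
```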