HDR UK Datasets Integration

Introduction

The Gateway Metadata Integration (GMI) service enables data custodians to automate metadata transfer to the Gateway by configuring specific API endpoints. This technical guide provides instructions on how to set up and manage the GMI self-service on the Gateway. It also covers common pitfalls and error codes encountered during the integration testing process.

How to set up a GMI process

The following diagram (Fig 1) illustrates the steps involved in setting up a Gateway Metadata Integration process on the Gateway:

Fig 1: Gateway Metadata Integration process

Step 1: Sign in to the Gateway

Sign in to the Gateway using your preferred sign-in route and make sure you have a Team set up on the Gateway. If you need assistance with this step, contact the HDR UK technology team via the link below.

https://www.healthdatagateway.org/about/contact-us

Step 2: Access the Gateway GMI service

The GMI service is designed to enable data custodians to maintain their datasets and integrations independently. If you have the necessary permissions (Team Administrator or Developer), you can access the service by following these steps:

  • Go to Team Management > Integrations > Integration.
  • Click on “Create new Integration” to initiate the configuration (Fig 2).

Fig 2: Create a new Integration

Step 3: Create a new integration (integration configuration)

When creating a new integration, the following information needs to be provided:

  • Integration Type: Choose one of the available options (NOTE: only ‘datasets’ is currently available; in future this may include the data use register, tools, etc.)
  • Authentication Type: Select one of the authentication methods - API_Key, Bearer, or No_Auth.
    • API_Key: Provides a simple way for APIs to verify the systems accessing them.
    • Bearer: The service's script supports a static bearer token. It is strongly recommended to use HTTPS at all times to ensure security; if HTTPS is not available, it is advisable not to use the service, to prevent potential exploitation.
    • No_Auth: Choose this option when no authentication is required to access your catalogue.
  • Synchronisation Time: Specify the time at which the synchronisation process starts pulling data each day.
  • Base URL: Enter the main domain name of the API.
  • Datasets Endpoint: Provide the URL for listing all datasets available in the metadata catalogue.
  • Dataset Endpoint: Specify the URL that returns the latest version of a dataset's metadata from the data custodian's server. Please fill in this field manually to avoid assumptions being made during the process.
  • Authorisation Token: (If API_Key or Bearer is selected in "Authentication Type" above.) Enter the API key or token generated by the data custodian on their data server.
  • Notification Contacts: Add the relevant individuals for receiving notifications.

Once all the required fields are filled, click on “Save configuration” to store the information on the Gateway (Fig 3). The next step is to run a test to ensure the API connection works without any errors.

Fig 3: Integration configuration form
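
To illustrate how these fields fit together, the sketch below shows one possible set of configuration values as a Python dictionary. The field names, URLs, and contact details are hypothetical placeholders rather than the Gateway's actual schema; in practice the values are entered through the form shown in Fig 3.

```python
# Illustrative only: hypothetical values for the integration configuration form.
example_integration_config = {
    "integration_type": "datasets",        # only 'datasets' is currently available
    "auth_type": "API_Key",                # one of API_Key, Bearer, No_Auth
    "synchronisation_time": "02:00",       # time the daily pull starts
    "base_url": "https://metadata.example-custodian.org",    # main domain of the API
    "datasets_endpoint": "/api/v1/datasets",                  # lists all available datasets
    "dataset_endpoint": "/api/v1/datasets/{id}",              # latest metadata for one dataset
    "auth_token": "<API key generated on the custodian's server>",
    "notification_contacts": ["data.manager@example-custodian.org"],
}
```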

Step 4: Integration testing

The integration test covers two areas:

  • Testing the connection to the server as per the defined server details.
  • Verifying the given credentials for the authentication type provided.

If any of the above tests fail, an error message will be returned. If there are no errors, you can now enable the configuration, and the integration will go live (Fig 4).

**Note**: Configuration can only be enabled after a successful test.
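
As a rough illustration of what the test covers, the sketch below attempts a connection to the configured datasets endpoint and sends the configured credentials, reporting the outcome in terms of the error codes listed in Table 1. It is not the GMI implementation; the header names and the `config` structure reuse the hypothetical fields from the configuration sketch above.

```python
import requests

def test_integration(config: dict) -> tuple[bool, str]:
    """Sketch of the two GMI test areas: server connection and credential check."""
    headers = {}
    if config["auth_type"] == "API_Key":
        headers["apikey"] = config["auth_token"]        # header name is an assumption
    elif config["auth_type"] == "Bearer":
        headers["Authorization"] = f"Bearer {config['auth_token']}"
    # No_Auth: no credentials are sent

    try:
        response = requests.get(
            config["base_url"] + config["datasets_endpoint"],
            headers=headers,
            timeout=30,
        )
    except requests.RequestException as exc:
        return False, f"Connection failed: {exc}"

    if response.status_code == 200:
        return True, "Test Successful"
    # Any other status (400, 401, 403, 404, 500, 501, 503) is reported as unsuccessful
    return False, f"Test Unsuccessful (HTTP {response.status_code})"
```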

Fig 4: Integration testing

Error Handling

If during normal operation the server changes or datasets are moved elsewhere, the integration may become invalid and GMI will disable it. In such cases, the synchronisation of datasets will cease, and you will receive a notification. To re-enable the integration, you will need to follow the configuration process again.

Error Codes

The GMI service utilises a list of error codes (Table 1). These error codes help in identifying and handling specific issues encountered during the integration testing process.

| Error code | Message           | Status                |
|------------|-------------------|-----------------------|
| HTTP 200   | Test Successful   | Success               |
| HTTP 400   | Test Unsuccessful | Bad Request           |
| HTTP 401   | Test Unsuccessful | Unauthorized          |
| HTTP 403   | Test Unsuccessful | Forbidden             |
| HTTP 404   | Test Unsuccessful | Not Found             |
| HTTP 500   | Test Unsuccessful | Internal Server Error |
| HTTP 501   | Test Unsuccessful | Not Implemented       |
| HTTP 503   | Test Unsuccessful | Service Unavailable   |

Table 1: GMI integration test error codes

Step 5: Manage integrations

Clicking on “Manage Integrations” displays a list of enabled and disabled integrations. This page provides an overview and allows for easy management and monitoring of the integrations (Fig 5).

Fig 5: Manage integrations

Custodian Datasets Endpoint

The HDR UK custodian specification has been developed to support interoperability. It provides a clear set of standards that custodians can follow to share metadata in a consistent format and to meet the minimum requirements for sharing metadata within the community.

The Interface Diagram below (Fig 6) shows how the Gateway integration ingestion script handles and processes metadata catalogues:

Fig 6: How the integration script processes metadata catalogues

The Gateway first contacts the /datasets endpoint you provide and interprets the response. It then compares the returned information with the existing records in the Gateway database. Based on the comparison, a decision will be made for each dataset on how the metadata will be handled. There are generally three scenarios:

1. New Dataset

If a new dataset is detected by the ingestion script, its metadata will be retrieved, stored in the Gateway database, and made visible on the Gateway.

2. Updated Dataset

The Gateway ingestion script determines if a dataset has changed since the last synchronisation. It specifically compares the ID of the dataset and the version that was last provided with the current version. The script does not check for a newer version number but rather a different version number. This accounts for cases where a dataset may be reverted to a previous version. Updates to datasets are automatically made live on the Gateway, and the previous version of the dataset will be archived following existing Gateway processes.

3. Deleted Dataset

The ingestion script can detect datasets that have been removed from the custodian metadata catalogue. If a dataset ID is no longer found in the /datasets endpoint, it will be considered a deleted dataset. A "deleted" dataset will be archived on the Gateway, along with all previous versions, and will no longer be visible on the Gateway following existing processes.
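
A minimal sketch of this three-way decision is shown below, assuming that each entry returned by the /datasets endpoint carries an identifier and a version. The field names `id` and `version` are illustrative assumptions; the actual response schema is defined by the HDR UK custodian specification.

```python
def classify_datasets(remote: list[dict], stored: dict[str, str]) -> dict[str, list[str]]:
    """Compare the custodian's /datasets listing with what the Gateway already holds.

    remote: entries from the /datasets endpoint, e.g. {"id": "...", "version": "..."}
    stored: mapping of dataset id -> version last synchronised to the Gateway
    """
    decisions = {"new": [], "updated": [], "deleted": []}

    remote_ids = set()
    for entry in remote:
        remote_ids.add(entry["id"])
        if entry["id"] not in stored:
            decisions["new"].append(entry["id"])          # scenario 1: new dataset
        elif entry["version"] != stored[entry["id"]]:
            # Scenario 2: a *different* version, not necessarily a newer one,
            # so a dataset reverted to a previous version is also treated as an update.
            decisions["updated"].append(entry["id"])

    # Scenario 3: ids no longer listed by the custodian are archived as deleted
    decisions["deleted"] = [ds_id for ds_id in stored if ds_id not in remote_ids]
    return decisions
```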