Using Genomic Data to Support Cancer Research

16 Jun

Written By N

GENOMICS ENGLAND

Genomics England (GEL) is a company set up and owned by the Department of Health and Social Care to conduct genomic research. It began as the vehicle for delivering the UK Government's plan to sequence 100,000 whole genomes and to incorporate genomic medicine into routine care in the NHS.

They build and maintain a library of whole genome sequences of people and tumours, which researchers study to find the mechanisms of rare diseases and cancer, improve diagnoses and treatment for patients, and help the NHS embed genomics into routine healthcare.

The challenge

Genomics England (GEL), the NHS and the Genomic Medicine Service (GMS) share a vision to maximise the clinical utility of genomics, driving adoption and use so that everyone can benefit from it. As part of this vision, the Clinical Genomics Platform (CGP) aims to give healthcare teams, both within GEL and across the NHS, self-service access to genomic and clinical data.

However, genome sequences, like all medical data, are highly sensitive personal information. Enabling life-saving medical research on this data must be balanced against patient privacy, and patient data can only be used within the scope of the consent each patient has given.

Genomics England wanted to find out whether it would be practical to use Beacon V2, a genomic discovery protocol, to give selected research teams controlled access to their genomic database, allowing relevant users across the NHS to query the genomic variant data held by GEL.

Genetic variants that aid in diagnosing rare conditions are often unique, and interpreting them benefits from comparing data across patients with similar conditions. However, sharing this type of data securely and efficiently is difficult, hindered by manual, localised processes across 7 regional Genomic Laboratory Hubs.

Beacon V2 allows researchers to query a genomic database with questions like “Do you have any patients with this particular disease I’m studying who also have a particular genetic polymorphism I think might be connected?” Depending on the level of access granted to that researcher, the database might reply “Yes”, “Yes, we have 53”, or “Yes, and here is a list of their pseudonymised participant IDs”. Even a simple “Yes” might be all a researcher needs to then request more information or apply to run a specific trial with those patients.
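The three kinds of answer above correspond roughly to what Beacon V2 calls response granularity, and a server may answer more coarsely than a client asks for. The Python sketch below illustrates that idea only: the level names follow Beacon V2's `boolean`, `count` and `record` granularities, and the `exists`/`numTotalResults` fields are modelled loosely on a Beacon response summary, but `shape_response` and the per-researcher `permitted` level are hypothetical, not GEL's implementation.

```python
from typing import Any

# Hypothetical sketch: level names mirror Beacon V2 granularities, ordered
# from least to most revealing. The function and the "permitted" access
# level per researcher are assumptions for illustration.
GRANULARITY_ORDER = ["boolean", "count", "record"]


def shape_response(
    matching_ids: list[str], requested: str, permitted: str
) -> dict[str, Any]:
    """Answer at the lower of the requested and permitted granularity."""
    level = min(
        GRANULARITY_ORDER.index(requested), GRANULARITY_ORDER.index(permitted)
    )
    granularity = GRANULARITY_ORDER[level]
    # Every level reveals whether any matching individual exists ("Yes").
    response: dict[str, Any] = {
        "granularity": granularity,
        "exists": len(matching_ids) > 0,
    }
    if granularity in ("count", "record"):
        # "Yes, we have 53"
        response["numTotalResults"] = len(matching_ids)
    if granularity == "record":
        # "Yes, and here is a list": only pseudonymised IDs are returned.
        response["results"] = [{"id": pid} for pid in matching_ids]
    return response
```

For example, `shape_response(["p1", "p2"], "record", "count")` returns a count but no IDs, because the researcher's permitted level caps the answer.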

What we did

Register Dynamics supplied an independent research and development team of two senior engineers supported by a junior engineer. Because the work was investigative, we took an Agile approach so that the requirements could evolve as the team learnt more.

The team investigated how GEL’s data processing pipelines produced phenotypic data (“This patient has this diagnosis”) and genotypic data (“This patient has these base pairs in their DNA” or “This patient has these amino acids in the protein produced by a particular gene”), and studied the Beacon V2 protocol.
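To make the distinction concrete, the sketch below models the two kinds of data as separate record types linked by a participant ID, and answers the example question from the challenge by intersecting the two sets of participants. The record types, field names and `matching_individuals` helper are all hypothetical and do not reflect the schema of GEL’s pipelines; it also covers only base-level variants, not amino-acid changes.

```python
from dataclasses import dataclass

# Hypothetical record types: field names are illustrative only.


@dataclass(frozen=True)
class Phenotype:
    participant_id: str
    diagnosis: str  # a disease term, e.g. an ontology or ICD code


@dataclass(frozen=True)
class Variant:
    participant_id: str
    chromosome: str
    position: int  # 1-based genomic position
    ref: str  # reference base(s)
    alt: str  # observed (alternate) base(s)


def matching_individuals(
    phenotypes: list[Phenotype],
    variants: list[Variant],
    diagnosis: str,
    chromosome: str,
    position: int,
    alt: str,
) -> list[str]:
    """Participants who have the diagnosis AND carry the given variant."""
    with_diagnosis = {p.participant_id for p in phenotypes if p.diagnosis == diagnosis}
    with_variant = {
        v.participant_id
        for v in variants
        if (v.chromosome, v.position, v.alt) == (chromosome, position, alt)
    }
    # Join on participant ID: both criteria must hold for the same person.
    return sorted(with_diagnosis & with_variant)
```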

Beacon V2 is a very flexible protocol, adaptable to a wide variety of genomic search applications, so we first established how to express GEL’s search requirements within it: specifically, as what Beacon calls an “individuals” search endpoint, which returns information about individuals matching the search criteria, as opposed to information about specific biosamples, genes, or other objects of interest.
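As a rough illustration of what such a search looks like on the wire, the sketch below builds a JSON body for a POST to an individuals endpoint, combining phenotypic filters with a genomic criterion. It is modelled loosely on the Beacon V2 request schema (`meta`, `query`, `requestParameters`, `filters`, `requestedGranularity`); the helper name and the example filter ID are hypothetical, and parameter shapes (for example whether `start` is a list) should be checked against the specification.

```python
import json
from typing import Any

# Hypothetical helper; key names modelled loosely on Beacon V2 requests.


def individuals_query(
    filter_ids: list[str],
    assembly: str,
    reference_name: str,
    start: int,
    reference_bases: str,
    alternate_bases: str,
    granularity: str = "count",
) -> dict[str, Any]:
    return {
        "meta": {"apiVersion": "2.0"},
        "query": {
            # Genomic criterion: a specific single-base change.
            "requestParameters": {
                "assemblyId": assembly,
                "referenceName": reference_name,
                "start": [start],
                "referenceBases": reference_bases,
                "alternateBases": alternate_bases,
            },
            # Phenotypic criteria, e.g. ontology terms for a diagnosis.
            "filters": [{"id": fid} for fid in filter_ids],
            "requestedGranularity": granularity,
        },
    }


body = individuals_query(["EXAMPLE:0001"], "GRCh38", "17", 43045712, "A", "G")
print(json.dumps(body, indent=2))
```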

With this established, we built a proof-of-concept (PoC) of the core Beacon V2 search interface as an AWS Lambda function within GEL’s Amazon Web Services development account. We chose this approach to fit in with existing technology in use at GEL, and for easy access to the underlying data via AWS Athena.
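A minimal sketch of that shape, assuming a single hypothetical `variants` table: a Lambda-style `handler` runs a parameterised Athena query through boto3 (`start_query_execution` with `ExecutionParameters`, then polling `get_query_execution` and reading `get_query_results`). The database, table and column names, S3 output location and event fields are placeholders, not GEL’s; the Athena client is passed into `count_carriers` so the logic can be exercised without AWS.

```python
import re
import time
from typing import Any

# Placeholder names: not GEL's real database, schema or bucket.
DATABASE = "beacon_poc"
OUTPUT_LOCATION = "s3://example-bucket/athena-results/"
SQL = (
    "SELECT COUNT(DISTINCT participant_id) FROM variants "
    "WHERE chromosome = ? AND position = ? AND alt = ?"
)
_SAFE = re.compile(r"^[0-9A-Za-z_.]+$")


def count_carriers(athena: Any, chromosome: str, position: int, alt: str) -> int:
    """Count distinct participants carrying `alt` at a given position."""
    # String parameters are wrapped in single quotes, so reject anything
    # that could break out of them.
    if not (_SAFE.match(chromosome) and _SAFE.match(alt)):
        raise ValueError("unexpected characters in query parameters")
    qid = athena.start_query_execution(
        QueryString=SQL,
        QueryExecutionContext={"Database": DATABASE},
        ResultConfiguration={"OutputLocation": OUTPUT_LOCATION},
        # Substituted for the "?" placeholders, in order.
        ExecutionParameters=[f"'{chromosome}'", str(position), f"'{alt}'"],
    )["QueryExecutionId"]
    # Poll until the query reaches a terminal state.
    while True:
        status = athena.get_query_execution(QueryExecutionId=qid)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(0.5)
    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {qid} finished as {state}")
    rows = athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]
    # For SELECT queries the first row holds column names; the count follows.
    return int(rows[1]["Data"][0]["VarCharValue"])


def handler(event: dict, context: Any) -> dict:
    """Lambda entry point; the event field names are assumptions."""
    import boto3  # bundled with the AWS Lambda Python runtime

    athena = boto3.client("athena")
    count = count_carriers(
        athena, event["chromosome"], int(event["position"]), event["alt"]
    )
    return {"count": count}
```

Polling keeps the sketch short; a production service would also bound the total wait so a slow query cannot exhaust the Lambda timeout.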

Beacon V2 provides several different ways to express genomic queries, and we implemented each in turn, finishing with the powerful but complicated HGVS notation used in Beacon “genomicAlleleShortForm” queries. While work on the Beacon V2 API server progressed, we also built a web frontend to demonstrate its capabilities to non-programmers, adding support for each type of genomic query as it became available from the API server.
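To give a flavour of why the HGVS form takes more work, the sketch below handles only its simplest case, a single-base genomic substitution such as `NC_000017.11:g.43045712A>G` (a RefSeq chromosome accession, a `g.` genomic position, then reference and alternate bases). The helper is hypothetical; real HGVS also covers deletions, insertions, duplications, ranges and more, and mapping the accession to a chromosome name or converting coordinate conventions is left out.

```python
import re

# Hypothetical helper: single-base genomic substitutions only.
_SUBSTITUTION = re.compile(
    r"^(?P<accession>N[CG]_\d+\.\d+):g\."
    r"(?P<position>\d+)(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


def parse_hgvs_substitution(hgvs: str) -> tuple[str, int, str, str]:
    """Split a genomic substitution into (accession, position, ref, alt)."""
    match = _SUBSTITUTION.match(hgvs)
    if match is None:
        raise ValueError(f"unsupported HGVS expression: {hgvs!r}")
    return (
        match["accession"],
        int(match["position"]),  # HGVS positions are 1-based
        match["ref"],
        match["alt"],
    )
```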

For safety, this proof-of-concept was built on a sample dataset of fabricated patient data, but we also ran a series of performance tests against a much larger dataset of real patient data to establish how the prototype would scale to the entire dataset. These tests revealed that the way the import pipeline structured its data forced Athena to read most of the data for every query, causing unacceptable performance and costs for a very large database. We prototyped a way to restructure the data, and in tests found that it let Athena narrow the data needed for typical queries down to a reasonable amount that would not increase as the database grew.
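One common way to let Athena skip data is to partition it so that a query’s WHERE clause can rule out whole directories of files. The sketch below illustrates that general technique, using Hive-style `key=value` paths keyed on chromosome and a fixed-size position bin; the bin size and key layout are assumptions, and this is not necessarily the restructuring GEL adopted.

```python
# Hypothetical partition layout: chromosome plus a fixed-size position bin.
BIN_SIZE = 1_000_000  # bases per partition bin


def partition_key(chromosome: str, position: int) -> str:
    """Partition path fragment for the row holding this position."""
    return f"chromosome={chromosome}/bin={position // BIN_SIZE}"


def partitions_for_range(chromosome: str, start: int, end: int) -> list[str]:
    """All partitions a query over positions [start, end] must read."""
    return [
        f"chromosome={chromosome}/bin={b}"
        for b in range(start // BIN_SIZE, end // BIN_SIZE + 1)
    ]
```

For pruning to apply, the table must be declared as partitioned on these columns and each query must filter on them (here, on the computed `bin` as well as the position), so the client or server has to derive the bin from the requested position.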

The result

We quickly demonstrated that it was possible to implement a Beacon V2 search interface to the data held by GEL, and produced a written report detailing the process and the steps required to build a full Beacon V2 server suitable for production access. Our performance analysis was also adopted by another team within GEL that runs similar queries, which used it as the basis for improving their own performance.

The success of our prototype led GEL to immediately start a project to publish their cancer genomic data via Beacon V2. This production service will enable cancer researchers to securely query GEL’s databases for associations between specific mutations in tumours and types of cancer, and then initiate trials of new therapies.
