Putting the trust in data trusts

A report by Register Dynamics • 14 April 2019

Data trusts have been shown to offer many and varied potential benefits: they can increase collaboration, make data use more transparent and accountable, and unlock the potential economic value of data by making it more available. The independence and legally binding responsibilities of a data trust’s trustees let them balance economic incentives with social responsibilities and bring about data sharing that delivers on the trust’s purposes and works for everyone: data users, data providers and society as a whole.

We do not get these benefits for free: the extra governance of a data trust comes at a cost. For data trusts to succeed, their benefits must not only outweigh their costs but be compelling enough that data trusts significantly outperform other methods of sharing data. We believe that to offer these compelling advantages, trustees need strong control and enforcement powers, strong enough that misusing the data held in trust is prohibitively difficult.

When data sharing goes wrong

We don’t have to look far to see examples of data sharing going badly wrong. “Cambridge Analytica” has become a household name associated with misuse of personal information and breach of public trust. The UK Government also got into hot water over the sharing of NHS health data with the Home Office for immigration enforcement purposes in the Windrush scandal.

Having a contract or agreement between data providers and data users has frequently been shown to be insufficient. In the Cambridge Analytica example, the contract terms were completely ignored and personal data was misused. Facebook’s defence was that the parties involved had acted outside of the terms of the data share and Facebook was therefore not at fault. The lack of any hard, technical evidence may deprive courts of useful information and has made it difficult for regulators, politicians, journalists and the general public to understand what happened.

This does little to reassure individuals: without proactive enforcement of its terms, it is difficult to see how Facebook could prevent this behaviour from happening again.

Traditionally, our most valuable assets have been financial, and severe penalties for mismanaging them have been enforced. Over the last 20 years, data has become almost as valuable. The best technique society has developed for holding people to account whilst they look after our most valuable assets is auditing.

Management of financial assets is subject to the most intense scrutiny: auditing is an expert profession in which a high level of trust is placed. As data continues to become more valuable, we will need a means to audit its use, a way to actually check how data is used and whether that matches what we expect. Data subjects need data sharing to be audited if they are to have confidence that any misuse of their data can be stopped quickly and its impact understood.

Reputation is everything

Facebook’s reputation suffered greatly from the negative publicity despite its claims of innocence. Organisations planning similar data sharing are understandably concerned about their own reputation and corporate risk. Their response has been to expand their assurance processes, which are time-consuming and expensive. If they could audit their sharing partners and their own employees and know categorically that no one was acting improperly, and could show that this was the case, the associated assurance burden could be dramatically lower, saving them time and effort. Showing this could mean publishing some information publicly or being vouched for by an auditor.

Reputation usually comes from someone vouching that a person has conducted themselves properly in the past. In the case of high value financial transactions, the person doing the vouching is usually an external auditor. The auditor needs to be convinced that the conduct was proper even though they were not there to witness it first-hand. As the auditor is a professional who is expert in financial matters, when they vouch for a person they bolster that person’s reputation in a meaningful way.

The financial records of the person or institution being audited, along with supporting records from other institutions such as customers, suppliers and banks, provide the hard evidence for the audit. If the data-use records of a data trust can be made robust and complete enough to be auditable, then external audit can provide similar reputational benefits to data providers and data users. This will make data trusts, and similar collaborative approaches such as data cooperatives, the vehicle of choice for data sharing. Over time, those who choose not to use them will miss out on these reputational benefits and will look as though they have questionable behaviour to hide.

In addition, trustees need to be able to build their own reputation and show that they are fulfilling the trust’s purpose. The risk that data providers or data users in the trust could misbehave invisibly is a reputational risk for the trustees. Being able to audit activity on the data held in trust is the most effective way to reduce this risk.

Making audit practical

Audit brings benefits to everyone (beneficiaries, trustees and the general public), but no one will adopt it unless it is practical. At first glance, it is difficult to imagine an audit process that would not require as much effort as the original data use itself. These benefits can only be unlocked if auditing represents a good return on the time and effort involved.

Enabling reliable and repeatable audit to be automated (and therefore cost-effective) is exactly the end goal of audit technology. The vision is to allow trustees or specialist auditors to have a complete view of the who, what and, most importantly, why of data use – something that would otherwise be too burdensome to be realistic.
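
Once data use is recorded in a structured way, the repetitive part of an audit (comparing what happened against what was agreed) becomes a mechanical check. The Python sketch below is illustrative only: the `UseRecord` type, its who/what/why fields and the `PERMITTED_PURPOSES` table are hypothetical, not drawn from any of the case studies, and the sketch takes for granted that the log is complete and trustworthy, which is the genuinely hard part.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UseRecord:
    who: str   # the data user that used the data
    what: str  # the dataset that was used
    why: str   # the purpose the data user declared for this use


# Hypothetical sharing terms: the purposes each (data user, dataset)
# pair has been approved for by the trustees.
PERMITTED_PURPOSES = {
    ("council-a", "housing-register"): {"service-planning"},
    ("charity-b", "housing-register"): {"service-planning", "research"},
}


def find_violations(records):
    """Return the records whose declared purpose was not approved for that user and dataset."""
    return [
        r for r in records
        if r.why not in PERMITTED_PURPOSES.get((r.who, r.what), set())
    ]


log = [
    UseRecord("council-a", "housing-register", "service-planning"),
    UseRecord("charity-b", "housing-register", "research"),
    UseRecord("council-a", "housing-register", "marketing"),
]
for violation in find_violations(log):
    print("flag for review:", violation)
```

Anything flagged still needs human judgement; the automation only narrows down where a trustee or auditor should look.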

The field is still developing, and we are not yet near a general solution or a mature set of methods. External audit of data use has not been widespread to date, and data trusts present some of the first real-world requirements for it. As a result, there has not yet been significant demand to move the technology out of costly research and development and into commodity products.

However, a lot of progress has been made through custom-designed solutions to specific problems and in identifying patterns that can be reused. We examine some of these solutions as case studies. Our purpose is to show that strong auditing is possible today and to encourage others to build tools that reduce its cost further. With better audit unlocked for more use cases, data trusts can fulfil their potential as a powerful model that allows organisations to make better use of their data whilst ensuring the appropriate use and confidentiality of the sensitive data involved.

Case studies

Certificate Transparency and DeepMind Health

With their work on Certificate Transparency, Google engineers Emilia Kasper, Adam Langley and Ben Laurie have shown that it is possible to link rigid technical trust with fuzzy social trust. Their solution does this without requiring trust in new parties or changes to existing power structures, making audit pragmatic and easy to adopt.

Aquae: Personal data for cross-Government services

As part of the Personal Data Exchange programme run by the Government Digital Service, the UK Government developed a new model of personal data sharing called Aquae (Attributes, Questions, Answers and Eligibility). It allows public sector services to reuse personal data from around government whilst maintaining the privacy of data subjects, allowing data controllers to retain control of their data, minimise what is shared and audit when and how data was used.
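
To illustrate the "questions and answers" idea behind data minimisation, here is a hypothetical Python sketch in which a data controller answers a yes/no eligibility question instead of handing over the underlying attribute, and records who asked and why. The names, the age-based question and the log format are our own assumptions; this is not Aquae's actual protocol or interface.

```python
from datetime import date

# Hypothetical attribute held by one data controller: dates of birth.
DATES_OF_BIRTH = {"person-123": date(1950, 6, 1)}

# Record of every question answered: who asked, about whom, what, and why.
audit_log = []


def age_on(dob, day):
    """Age in whole years on a given day."""
    return day.year - dob.year - ((day.month, day.day) < (dob.month, dob.day))


def is_at_least(person_id, years, day, asker, purpose):
    """Answer a yes/no eligibility question instead of releasing the date of birth."""
    answer = age_on(DATES_OF_BIRTH[person_id], day) >= years
    audit_log.append({
        "asker": asker,
        "subject": person_id,
        "question": f"aged {years} or over on {day.isoformat()}?",
        "purpose": purpose,
    })
    return answer


eligible = is_at_least("person-123", 65, date(2019, 4, 14),
                       asker="travel-pass-service", purpose="concessionary travel")
print(eligible)  # True, yet the service never sees the date of birth
print(audit_log[-1])
```

The asking service learns only the answer it needs, while the controller keeps a record of every question it has answered, which is what makes later audit possible.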

The 5 Safes Framework

When a data trust contains arbitrary data undergoing arbitrary analysis, it is not always feasible to set up data access mechanisms that are easy to audit, or to perform that audit automatically. The 5 Safes Framework proposes “Safe Settings”, where the analysis is brought to the data rather than following the traditional practice of downloading data and analysing it locally. This allows the analysis environment to be secured and its results to be audited at the point they leave the control of the trust.
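
One common way to audit results at the point they leave a safe setting is to check them against disclosure rules. The sketch below assumes a simple minimum-count rule for frequency tables (non-zero counts below a threshold are withheld) and logs every release decision. The threshold, the function names and the log format are illustrative assumptions rather than part of the 5 Safes Framework itself, and real output checking often combines automated rules with review by trained staff.

```python
# Assumed threshold: an illustrative value, not one prescribed by the 5 Safes Framework.
MIN_CELL_COUNT = 10


class OutputWithheld(Exception):
    """Raised when a result fails the disclosure check."""


def small_cells(table):
    """Return the categories whose non-zero count is below the threshold."""
    return sorted(k for k, count in table.items() if 0 < count < MIN_CELL_COUNT)


def release(table, released_by, log):
    """Check a frequency table before it leaves the safe setting, logging the decision either way."""
    blocked = small_cells(table)
    log.append({"released_by": released_by, "categories": sorted(table), "blocked": blocked})
    if blocked:
        raise OutputWithheld(f"small counts in {blocked}")
    return table


checks = []
release({"north": 120, "south": 45}, "analyst-1", checks)  # passes the check
try:
    release({"north": 120, "east": 3}, "analyst-1", checks)
except OutputWithheld as err:
    print("withheld:", err)
```

Because every release attempt is logged, including the blocked ones, the trust ends up with an auditable record of exactly what left its control and who requested it.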

Technology

Merkle tree-based logging

Certificate Transparency, DeepMind’s Verifiable Data Audit and GDS’s Aquae all rely on logging technology that is append-only and can support “proofs” and “redaction”. But what is this miracle technology? How does it store things in a way that supports strong auditing? And what does it really mean to have a “proof” or a “redacted” item? The answer is a _Merkle tree_.
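
The core idea can be sketched in a few lines of Python. This minimal sketch follows the hashing scheme of RFC 6962, the Certificate Transparency specification (SHA-256, with distinct prefixes for leaves and interior nodes). It is not the code used by any of the systems above, the function names are our own, and it omits the consistency proofs a real log also needs to show that it has only ever been appended to. It shows how an auditor holding only the published root can check that one entry is in the log using a handful of sibling hashes rather than the whole log.

```python
import hashlib


def leaf_hash(entry: bytes) -> bytes:
    # RFC 6962 prefixes leaves with 0x00 ...
    return hashlib.sha256(b"\x00" + entry).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    # ... and interior nodes with 0x01, so a leaf can never pass for a node.
    return hashlib.sha256(b"\x01" + left + right).digest()


def split_point(n):
    """Largest power of two strictly less than n (for n >= 2)."""
    k = 1
    while k * 2 < n:
        k *= 2
    return k


def tree_root(entries):
    """Merkle tree hash of the whole log."""
    if not entries:
        return hashlib.sha256(b"").digest()
    if len(entries) == 1:
        return leaf_hash(entries[0])
    k = split_point(len(entries))
    return node_hash(tree_root(entries[:k]), tree_root(entries[k:]))


def inclusion_proof(index, entries):
    """Sibling hashes for entries[index], ordered from the leaf up to the root."""
    if len(entries) <= 1:
        return []
    k = split_point(len(entries))
    if index < k:
        return inclusion_proof(index, entries[:k]) + [tree_root(entries[k:])]
    return inclusion_proof(index - k, entries[k:]) + [tree_root(entries[:k])]


def root_from_proof(leaf, index, size, proof):
    """Recompute the root from one leaf hash and its proof, without the other entries."""
    if size == 1:
        if proof:
            raise ValueError("proof too long")
        return leaf
    if not proof:
        raise ValueError("proof too short")
    k = split_point(size)
    *lower, sibling = proof  # the last hash is the sibling nearest the root
    if index < k:
        return node_hash(root_from_proof(leaf, index, k, lower), sibling)
    return node_hash(sibling, root_from_proof(leaf, index - k, size - k, lower))


def verify_inclusion(entry, index, size, proof, root):
    """True if `entry` sits at position `index` in a log of `size` entries with this root."""
    if not 0 <= index < size:
        return False
    try:
        return root_from_proof(leaf_hash(entry), index, size, proof) == root
    except ValueError:
        return False


log = [b"council-a read housing-register: service-planning",
       b"charity-b read housing-register: research",
       b"council-a read housing-register: marketing",
       b"charity-b read school-census: research",
       b"council-a read school-census: service-planning"]
published_root = tree_root(log)
proof = inclusion_proof(2, log)
print(verify_inclusion(log[2], 2, len(log), proof, published_root))            # True
print(verify_inclusion(b"tampered entry", 2, len(log), proof, published_root))  # False
```

Because verification starts from the leaf hash rather than the entry itself, one way to support redaction is to publish only an entry's hash: the root still checks out while the content stays hidden. Short, predictable entries would need salting first, since they could otherwise be guessed from their hash.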

The ledger: realising the benefits of external audit

Allowing data held in trust to be externally audited provides the hard evidence needed to enforce contracts and delivers reputational benefits that can improve the data sharing ecosystem. But how does external auditing actually work, what information does it require, and how can it be applied to data?

Acknowledgements

This report was produced as part of a data trusts project funded by the UK Government’s Office for Artificial Intelligence and Innovate UK. It builds on research from the ODI’s Innovation programme funded by Innovate UK. The views in this report are those of the authors.

We would like to thank Rachel Wilson, Olivier Thereaux, Celia Killen and Peter Wells for their feedback and support.

This report is licensed under Creative Commons Attribution-ShareAlike 4.0 International.