The UK's coronavirus contact tracing app has been kicked into the long grass, with the government now saying it isn't a priority and may not be ready until winter. The app, which has so far cost nearly £12 million, was supposed to be a key part of plans to identify and isolate anyone who had come into contact with someone reporting COVID-19 symptoms.

If the app does finally appear, it will now be based on a Google and Apple system, which means it won't store information in a central database. Centralised storage had been the plan for the original government-developed system, and it was this that worried privacy researchers, including myself. But even if the app never gets off the ground, that shouldn't distract us from seeking more insight into what the government and a few companies with strong political connections are still doing with our data.
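To make the distinction concrete, here is a minimal Python sketch of the decentralised idea behind the Google and Apple approach: phones broadcast short-lived random identifiers, and matching against the keys of infected users happens on each device rather than on a government server. The key derivation below is a simplified assumption for illustration, not the real Exposure Notifications specification.

```python
import hashlib
import os

def rolling_ids(daily_key: bytes, intervals: int = 144) -> list:
    """Derive short-lived broadcast identifiers from a daily key.

    Simplified for illustration: the real protocol uses HKDF and AES,
    not a plain hash, and rotates identifiers on a fixed schedule.
    """
    return [hashlib.sha256(daily_key + i.to_bytes(2, "big")).digest()[:16]
            for i in range(intervals)]

# Each phone generates its own daily key and remembers identifiers it overhears.
alice_daily_key = os.urandom(16)
bob_overheard = set(rolling_ids(alice_daily_key)[:3])  # Bob's phone logged 3 beacons

# If Alice tests positive, only her daily key is published. Every phone then
# re-derives her identifiers locally and checks them against its own log;
# no central server ever learns who met whom.
published_keys = [alice_daily_key]
exposed = any(rid in bob_overheard
              for key in published_keys
              for rid in rolling_ids(key))
print("Possible exposure:", exposed)  # -> Possible exposure: True
```

The point is that the contact graph never leaves the phones; each device answers the "was I exposed?" question for itself.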

I was one of nearly 200 UK information security and privacy academics who published a joint letter in April asking the government's digital health agency, NHSX, key questions about its plans for the app. At the time there was no data protection impact assessment (DPIA); even the data privacy watchdog, the Information Commissioner's Office (ICO), hadn't seen one.

There was no publicly available information on how the app would work or keep the data secure, and it was not clear that it would work at all. There was also no justification for the choice of a centralised data matching model, which is intrinsically riskier for privacy than a decentralised one.

We received answers to some of these questions soon after: an unsatisfactory DPIA, code for the app but not for the server, and a security analysis that included some justifications for centralised processing.

One of the purposes of the app was centralised planning for the COVID-19 response. In parallel, NHSX has been developing a "data dashboard" to manage all the data it is collecting for this purpose. The NHS website lists 59 sources of such data, several of which include records about individual patients, such as the Emergency Care Data Set.

Initially, Matthew Gould of NHSX claimed "all the data in the data store is anonymous". But that unlikely claim was corrected later with an acknowledgement that some data would be pseudonymous, meaning that combining it with other data could allow patients to be identified.
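To see why pseudonymous data is weaker than truly anonymous data, consider a small, entirely hypothetical Python sketch: the NHS number is replaced with a token, but the quasi-identifiers left in the record can be joined against an auxiliary dataset to put a name back on a diagnosis. Every field and value here is invented for illustration.

```python
import hashlib

def pseudonym(nhs_number: str) -> str:
    # Replace the direct identifier with a stable token.
    # (Simplified: a real system would at least use a salted or keyed hash.)
    return hashlib.sha256(nhs_number.encode()).hexdigest()[:12]

# A pseudonymised health record: the NHS number is gone, but
# quasi-identifiers (postcode, birth year) remain.
health_records = [
    {"pid": pseudonym("943-476-5919"), "postcode": "CB2 1TN",
     "birth_year": 1957, "diagnosis": "COVID-19"},
]

# Auxiliary data an attacker might already hold (electoral roll, a leak, ...).
auxiliary = [
    {"name": "J. Smith", "postcode": "CB2 1TN", "birth_year": 1957},
]

# Joining on the quasi-identifiers re-attaches a name to a medical record.
for h in health_records:
    for a in auxiliary:
        if (h["postcode"], h["birth_year"]) == (a["postcode"], a["birth_year"]):
            print(a["name"], "->", h["diagnosis"])  # J. Smith -> COVID-19
```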

More worrying was NHSX's choice of partners for this project. The data was to be stored on a platform developed by US company Palantir, which was originally funded by the CIA and counts numerous US government agencies as its customers. These include the FBI and the National Security Agency, the agency responsible for the secret internet surveillance programme revealed by Edward Snowden.

Palantir's initial contract with the NHS, which reportedly didn't go to competitive tender, in line with protocols introduced for the pandemic, charged a symbolic £1 for 45 engineers over three months. But it wasn't made clear how else the company would benefit. Palantir's UK operation is led by Louis Mosley, reportedly a former Tory activist.

The other contracted company, Faculty, has even stronger links to the government via Boris Johnson's chief adviser, Dominic Cummings, who gave it a key role in the Vote Leave campaign (under the firm's old name of ASI). The firm's director, Marc Warner, has also attended meetings of the government's scientific advisory committee, SAGE.

The inhabitants of the internet cobbled all this together into a nice conspiracy theory, which might be summarised as "the app is giving all our data to Dom's mates". This can be seen all over social media, for example in the responses to a popular tweet about our letter.

But while it appears the app is off the table, or at least that England and Wales will get a more privacy-respecting one run by internet giants, there's still reason to be concerned about NHSX's use of patient data and how it's being shared with private firms. Palantir's original contract was published under legal pressure, but its renewed contract has not been. In particular, we do not know whether NHSX is paying Palantir properly this time.

It is also now clearer that there is a lot we are not being told: the government has published a DPIA for how the data is combined and stored, but not for how it is then used for planning, possibly including through AI. The DPIA assesses Palantir's role only in data storage, yet the firm's original contract also mentions "data analytics" and "support tracking, surveillance, and reporting", none of which is covered in the document. Nor does it mention Faculty, which says it is working on data dashboards and modelling as part of its contract with NHSX.

Consultation with stakeholders and external experts is recommended for DPIAs, but none was done here. Even branches of the NHS in charge of health data handling, such as NHS Digital, do not appear to have been consulted.

Missing information

A DPIA should examine how the rights and freedoms of the people whose data is collected might be affected and ask: "What could possibly go wrong?" When you construct a large database including individual medical data, there are many possibilities for it to be used beyond its original function and for abuse, bias and unexpected harmful side-effects. Unfortunately, this DPIA recognises only low-level risks and their technical and organisational mitigations.

Overall, that leaves us in a position where we do not know what Palantir, Faculty and others are doing with NHS medical data. We do not know whether the risks of abuse of the data have been properly recognised and mitigated. But we do know that this kind of database is not protected against access by intelligence services.

A full DPIA for NHSX's COVID-19 data operation might help. A more comprehensive solution would be a law governing these pandemic-specific data programmes, but the proposal for one by the parliamentary Joint Committee on Human Rights has been rejected by the government. So for now, there's plenty still to worry about.

Provided by The Conversation