Interpreting data

A story broke earlier this week about the lockdown in Leicester (I was interviewed on this on BBC Radio 5, BBC Radio WM Birmingham, and BBC World Service on Tuesday).

It turns out that Public Health England has two streams of case reporting: Pillar 1, broadly based on testing done in hospitals, and Pillar 2, based on testing done commercially. However, the two sources had not been combined in official reports (see this Financial Times article for an explanation). And the two data sources tell completely different stories:

From COVID-19_activity_Leicester_Final-report_010720_v3.pdf

Pillar 1 shows the epidemic to be following the (official) national trend, with the number of cases currently low and slowly declining since the ‘peak’ in April. Pillar 2 shows a completely different picture, with the epidemic out of control since early June.

This is the case not only in Leicester but in the whole of the UK, as exemplified by people under 19 years old, for whom Pillar 1 shows a nice decline in numbers and Pillar 2 shows an epidemic that is very much ongoing:

From COVID-19_activity_Leicester_Final-report_010720_v3.pdf

Now, to be fair, we all know that there are problems with interpreting data based on testing (see here for my take on it). But we at least hope that testing is done representatively, i.e. that its results reflect the actual trends, even if not the actual numbers (it is estimated that for every reported case, there are 3-10 unobserved cases).

What is worrying here is that, in the case of the Leicester data, Pillar 2 results show a different trend to Pillar 1. This puts a question mark over any modelling that uses Pillar 1 data alone to predict the future of the outbreak. In particular, for those under 19 years old, the reproductive number R would be well below 1 for Pillar 1, but much closer to 1 for the combined Pillar 1 and Pillar 2 data.
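To see why leaving out one reporting stream matters, here is a minimal sketch in Python. The weekly counts are invented purely for illustration and are not taken from the Public Health England report, and the ratio of consecutive weekly counts is only a crude stand-in for the trend, not an estimate of R (a proper estimate also needs the generation interval).

```python
# Hypothetical weekly case counts, invented purely for illustration.
# They are NOT the Leicester or UK figures.
pillar1 = [40, 32, 26, 20]   # declining stream
pillar2 = [10, 16, 21, 25]   # growing stream


def weekly_ratios(counts):
    """Ratio of each week's count to the previous week's count.

    Values below 1 mean the counts are shrinking, values above 1 mean
    they are growing. This is a crude trend indicator, not R itself.
    """
    return [later / earlier for earlier, later in zip(counts, counts[1:])]


# Add the two streams week by week.
combined = [a + b for a, b in zip(pillar1, pillar2)]

print("Pillar 1 only:", [round(r, 2) for r in weekly_ratios(pillar1)])
print("Combined:     ", [round(r, 2) for r in weekly_ratios(combined)])
```

With these made-up numbers the Pillar 1 ratios sit around 0.8, while the combined ratios sit just below 1, so the same outbreak looks comfortably under control or only marginally so, depending on which data are used.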

This has immense public health consequences as R is used to decide whether our control strategy is working or not, and the low values have been used to justify the reopening of the country.

It also raises questions about the way in which the data are reported. Maybe it is a stretch of the imagination, but there might be a temptation for some people in authority to manipulate testing for their own political goals:

Our testing is so much bigger and more advanced than any other country (we have done a great job on this!) that it shows more cases. Without testing, or weak testing, we would be showing almost no cases.

Twitter, see also here.

In addition to the Pillar 1 and Pillar 2 story, I made another interesting data-related observation this week.

Somebody commented on my article in The Conversation questioning what I had said about the lockdown effectiveness:

You state that “Other countries, including the UK, achieved significant progress in arresting the spread of the disease.” I fail to see how you can include the UK in that sentence. The UK is in 5th place at 298,466 cases in the listing by country, which includes an increase of 1,229 in the last 24 hours. So far, 41,969 have died, only 1,293 have recovered, and the number of active cases is still very high at 255,204 (Ref. at 11:15 PDT on June 16, 2020).

Anonymous expert request, The Conversation

Without going into the merits of either my statement or the comment on the effectiveness of the UK response, I want to concentrate on the data quoted in the comment above. Covidly is a web site that collates the latest data on the COVID-19 pandemic. The data are presented in a nice and comprehensive way and so are easy to access.

They are also easy to misinterpret.

So what about the UK data? What caught my attention was the statement that only 1,293 individuals had recovered and that the number of active cases was 255,204. These two numbers simply did not make sense. The total number of cases in the UK was 298,466 at the time of writing, including individuals who became ill in February. Surely, by June, three months after they became ill, they would have either recovered or died. Different estimates exist for the infectious period, but it is of the order of days, up to a week. Yet a quick look at the graph at Covidly suggests that only a trickle of people are recovering from COVID-19 in the UK.

A screenshot from Covidly

Or, rather, the UK simply stopped reporting recoveries from COVID-19. A quick look at the Johns Hopkins University spreadsheets shows that the UK stopped reporting recoveries in early April, and so the later entries in that database are 0. Note that the graph below shows the cumulative number of recovered; the values after 12th April should not have been entered as 0, but as NA (not available). To interpret them as 0 is simply wrong.
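A cumulative count can never go down, so a drop back to 0 is a tell-tale sign of missing reporting rather than of real data. The sketch below shows one possible way to flag such entries as missing. The function name `mask_unreported` and the example numbers are my own assumptions for illustration; they are not the Johns Hopkins data or anyone's actual cleaning procedure.

```python
from typing import List, Optional


def mask_unreported(cumulative: List[int]) -> List[Optional[int]]:
    """Replace impossible drops in a cumulative series with None (i.e. NA).

    A cumulative total cannot fall below a value it has already reached,
    so any such entry is treated as 'not available' instead of a number.
    """
    cleaned: List[Optional[int]] = []
    running_max = 0
    for value in cumulative:
        if value < running_max:
            cleaned.append(None)      # impossible drop: mark as missing
        else:
            cleaned.append(value)
            running_max = value
    return cleaned


# Made-up cumulative 'recovered' series in which reporting stops
# and zeros are entered afterwards.
example = [0, 5, 20, 60, 0, 0, 0]
print(mask_unreported(example))  # [0, 5, 20, 60, None, None, None]
```

Marking these entries as missing stops them from being plotted, or subtracted from the total, as if nobody had recovered.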

Plot based on Johns Hopkins University data.

However, Covidly tries to collect data from multiple sources. This is a noble task, as it aims to fill gaps in reporting. But in this case it only adds to the confusion, as it draws on other, equally unreliable, sources.

To be fair to Covidly, the FAQ section says:

The vast majority of countries have some sort of reporting accuracy issue, including data for confirmed cases and deaths. The goal of this tool is to provide an easy way to visualize the data without judging the validity of the nation providing the data.

It is just that not many people read the FAQs.

By the way, a quick calculation shows that the number of active cases in the Covidly database is the difference between the total number of cases and the sum of deaths and recovered cases. If the number of recovered cases is under-reported, the remaining cases are (mis)labelled as active.
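The quick calculation can be reproduced with the figures quoted in the comment above. This is only a check of the arithmetic, using variable names of my own choosing, and not a description of how Covidly actually computes its numbers internally.

```python
# Figures quoted in the comment (Covidly, 16 June 2020).
total_cases = 298_466
deaths = 41_969
recovered = 1_293

# Active cases taken as whatever is left after deaths and recoveries.
active = total_cases - deaths - recovered
print(active)  # 255204, matching the quoted number of active cases
```

Since the result matches the quoted 255,204 exactly, any recovery that goes unreported ends up counted as an active case.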