False positives and false negatives

There is now a lot of interest in testing for coronavirus. The ‘test-and-trace’ strategy in the UK has been hugely controversial, with many complaints about people missing out on tests. But there is also a lot of confusion about how reliable testing is. Does it overestimate the number of cases? Does it give people the wrong information about their status? Are there people who are told they are not infected but actually go about spreading the disease? Conversely, how many people are wrongly told they have the disease and, as a result, have their lives turned upside down unnecessarily?

There are at least three broad types of tests for the coronavirus. The first uses swabs from people’s throats and noses to detect the virus itself; these tests are based on a molecular technique called PCR. PCR detects viral RNA using “amplification”: small amounts of RNA that would be difficult to detect are copied in the lab until they reach the detection level. This test tells us who is currently infected.

The second is based on blood tests and detects the presence of antibodies – molecules that our body uses to fight off the virus. This test tells us how many people have been exposed to the virus in the past. The third group looks for symptoms, such as lung infection, and concentrates on people who had a particularly bad case of coronavirus infection.

This blog is about statistics, not molecular biology, so it suffices to say that all these tests are pretty accurate and are continuously being evaluated and improved. At the same time, none of them is 100% accurate, so we need to look at the consequences of potential misreadings.

There are two kinds of errors associated with testing. Firstly, the test might produce a negative result when a person is in fact infected. In other words, a person who is infected with the virus is told “all clear”. There are many possible reasons for this. The swab test, for example, can only pick up the virus during a certain period following infection – not too early and not too late. Also, it is not a pleasant test – the swab needs to go far into the throat or nose. If it is not inserted properly (particularly if you are doing it yourself!), it will not pick up the viral particles.

In statistics, such an error is called a “false negative” result.

Secondly, the test can produce a positive result when a person is not infected. Perhaps they went to get tested because they had a fever and a cough, but these were caused by an “ordinary” common cold. However, the swab was for some reason contaminated with viral RNA, or there was a problem with the procedure. This is usually less likely, but not impossible, as discussed recently in The New York Times.

We call it a “false positive” result.

It is important to stress at this point that these problems do not in any way undercut the importance of testing. However, we do need to understand the limitations and think about ways to deal with them.

Let’s look at some numbers. Suppose that on a certain day 200,000 people were tested and 3,000 positive results were returned. These numbers are not too dissimilar to what is happening in the UK at present (18th September 2020: 233,199 tests were carried out, with 3,395 positives).

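|          | Infected              | Not infected          | Total   |
|----------|-----------------------|-----------------------|---------|
| Positive | ?                     | False positives: ?    | 3,000   |
| Negative | False negatives: ?    | ?                     | 197,000 |
| Total    | ?                     | ?                     | 200,000 |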
Our first go at the table.

We are now asking how many of the people who went to get tested were “true” positives – not how many people in the whole population carried the virus, which is a different question.

To calculate this, we also need an idea of how good the tests are, i.e. how likely “false positives” and “false negatives” are. We do not currently have firm estimates, but we know that “false positives” are not very likely, while “false negatives” could be more probable. Let’s assume first that the odds for both “false positives” and “false negatives” are 1:999. In other words, the (conditional) probability that the test produces a wrong result is 0.1% in each case – very low indeed.
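One way to see how these odds turn into equations, sketched here with the letters used in the equations at the end of the post (x for infected people who test positive, z for infected people who test negative, y for “false positives” and v for uninfected people who test negative): odds of a:b for a wrong result correspond to a probability of a/(a+b), so for “false negatives”

    \[ \frac{z}{x+z} = \frac{1}{1+999} = 0.1\% \quad\Longleftrightarrow\quad z : x = 1 : 999 \quad\Longleftrightarrow\quad x = 999\,z \]

and, in the same way, y : v = 1 : 999 gives v = 999 y for “false positives”.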

Then, a simple calculation shows that:

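|          | Infected              | Not infected          | Total   |
|----------|-----------------------|-----------------------|---------|
| Positive | 2,803                 | False positives: 197  | 3,000   |
| Negative | False negatives: 3    | 196,997               | 197,000 |
| Total    | 2,806                 | 197,194               | 200,000 |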
Table for 1:999 odds for both “false positives” and “false negatives”.

What does this mean? Firstly, the number of people “really” infected (2,806) is similar to, but slightly lower than, the 3,000 reported – good so far. Secondly, the number of “false negatives” is very low (3/2,806 = 0.1%). However, there are quite a few “false positives” – in fact, about 6.6% (197/3,000) of all positive tests are “false”.

We have made quite optimistic assumptions about the accuracy of the test, particularly for “false negatives”. Changing the odds to 2:8 just for these entries (there is some evidence that the best rate at which we can detect the virus is about 80%, if tested 3-4 days after infection and 1-2 days after symptoms appear), we get:

|          | Infected              | Not infected          | Total   |
|----------|-----------------------|-----------------------|---------|
| Positive | 2,803                 | False positives: 197  | 3,000   |
| Negative | False negatives: 701  | 196,299               | 197,000 |
| Total    | 3,504                 | 196,496               | 200,000 |
Table for 1:999 odds for “false positives” and 2:8 for “false negatives”.

The number of “false positives” stays the same (roughly, as we are doing some rounding off), but the number of “false negatives” shoots up. We now expect more “true” cases (3,504) than were detected, and one in five of them (701) to be “false negatives”.

Why do we bother? The consequences of these two “false” test results are rather different. If a person is told they are COVID-positive, they will need to self-isolate. It might also mean that their family, friends or contacts will need to isolate as well. Or, if this happens at a school or a care home, then a whole class or all the residents of the home will be affected, albeit for a relatively short time. If this is based on a wrong diagnosis, it can be a nuisance and can lead to economic and social hardship, but there are hardly any epidemiological consequences.

In fact, if there are any, they seem to be positive rather than negative. Most people who go to get tested have some symptoms or have been in contact with someone who has symptoms. Flu symptoms are similar to those of COVID-19, so self-isolation might break flu transmission as well. Maybe this is one reason why Australia and New Zealand have apparently been recording fewer influenza cases in their winter season (June-September 2020) than in other years (masks and social distancing contribute as well).

The consequences of “false negatives” are more serious, as a person who carries the coronavirus will get a green light to carry on with normal life. This could lead to further infections and, if the person is involved in a superspreader event, could result in a large outbreak.

In this post I have only looked at the efficiency of testing, thinking exclusively about people who decided, or were told, to get tested. Thus, I have really only looked at the PCR/swab tests.


There is a whole group of people who are infected – and infectious – but who for one reason or another are not tested. Some of them will self-isolate if they have symptoms. But many will simply carry on, unwittingly – perhaps sometimes carelessly – spreading the virus. In the first wave (February-June) we think there were about 8 such asymptomatic and untested people for every positive tested case. It is believed that with better testing this multiplier is now 2-3. As with the “false negatives” discussed above, it is very important to “catch” as many of them as possible and reduce the potential for transmission. But this is a story for another blog post.


For the mathematically curious, these are the equations behind the first table (odds of 1:999 for both kinds of error):

    \[ \begin{array}{lcl} x+y & = & 3000 \cr z+v & = & 197000 \cr x & = & 999 z \cr v & = & 999 y \end{array} \]

where

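|          | Infected              | Not infected          | Total   |
|----------|-----------------------|-----------------------|---------|
| Positive | x                     | False positives: y    | 3,000   |
| Negative | False negatives: z    | v                     | 197,000 |
| Total    | x+z                   | y+v                   | 200,000 |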
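For readers who want to check the arithmetic, here is a minimal Python sketch that solves the four equations in closed form. The function name solve_table and the way the odds are passed in are my own, hypothetical choices. Writing the “false negative” odds z : x as 1 : r and the “false positive” odds y : v as 1 : s, we have x = r z and v = s y; substituting these into the two row totals gives z = (sP - N)/(rs - 1), where P = 3,000 and N = 197,000. Odds of 1:999 give r = 999, while odds of 2:8 give r = 4, i.e. x = 4z. Exact fractions are used so that nothing is rounded until the values are printed to one decimal place.

```python
from fractions import Fraction


def solve_table(positives, negatives, fn_odds, fp_odds):
    """Solve the 2x2 testing table from the reported totals and assumed odds.

    positives, negatives: numbers of positive and negative test results.
    fn_odds = (a, b): false negatives : true positives = a : b, i.e. z : x
    fp_odds = (c, d): false positives : true negatives = c : d, i.e. y : v
    Returns exact fractions (x, y, z, v), named as in the table above.
    """
    r = Fraction(fn_odds[1], fn_odds[0])  # x = r * z
    s = Fraction(fp_odds[1], fp_odds[0])  # v = s * y
    # Substitute x = r*z and v = s*y into x + y = P and z + v = N, then solve for z.
    z = (s * positives - negatives) / (r * s - 1)
    x = r * z
    y = positives - x
    v = negatives - z
    return x, y, z, v


scenarios = [
    ("1:999 for both errors", (1, 999), (1, 999)),
    ("2:8 for false negatives, 1:999 for false positives", (2, 8), (1, 999)),
]

for label, fn_odds, fp_odds in scenarios:
    x, y, z, v = solve_table(3000, 197000, fn_odds, fp_odds)
    print(label)
    print(f"  true positives {float(x):,.1f}, false positives {float(y):,.1f}")
    print(f"  false negatives {float(z):,.1f}, true negatives {float(v):,.1f}")
    print(f"  infected in total {float(x + z):,.1f}")
    # Share of all positive results that come from people who are not infected.
    print(f"  share of positive results that are false {float(y / (x + y)):.1%}")
```

Trying other assumptions, say a test that produces more “false positives”, only requires changing the odds tuples in scenarios.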

Plant Health Week – day 2

A couple of years ago I was writing a paper on how modelling can be used to address the epidemiological aspects of One Health. I was looking for examples of outbreaks of plant pests and pathogens that were linked to catastrophic changes in human health. It was then that I became aware of the Bengal Famine of 1942-43.

In 1942 in Bengal, a province of what was then British India, a fungal disease, Brown Spot (caused by Cochliobolus miyabeanus), was spreading through rice fields. The impact of the disease was intensified by tropical storms on 16-17 October, which spread the fungal spores widely while also killing 14,500 people and destroying fields and rice paddies.

When rice plants were attacked by the fungus, brown patches and discolouration appeared on leaves and stems, and the plants started to die.

Brown spot patches
http://www.knowledgebank.irri.org/training/fact-sheets/pest-management/diseases/item/brown-spot

The resulting devastation caused estimated rice yield losses of up to 91%.

Mass starvation followed, with a resulting decrease in resistance to disease. Meanwhile, the weather also created conditions conducive to mosquito breeding, leading to an outbreak of malaria. While the first wave of deaths (winter 1942) was largely caused by starvation, the second wave (1943-44) was dominated by human disease, with malaria, cholera and smallpox thriving in an already weakened population. As a result, an estimated 2-3 million people died out of a population of 60 million.

Bengal Famine
For details of copyright see https://commons.wikimedia.org/w/index.php?curid=58209087

Even today, rice seedling mortality of up to 60% caused by Brown Spot is recorded in some countries, and crop yields can be reduced by up to 40%. Prevention is based on disease-free seed and resistant varieties, as well as practices such as less dense planting and keeping weeds down. Soil and plant pesticides are also used to fight the disease.

There is an ongoing dispute about the origin and course of the Bengal Famine. The context was the middle of World War 2, with the threat of a Japanese invasion of British India. But the British and local governments, as well as merchants, have been accused of gross mismanagement of food supplies, and thus of either causing the famine or failing to alleviate it and prevent the deaths of so many people.

“Though administrative failures were immediately responsible for this human suffering, the principal cause of the short crop production of 1942 was the [plant] epidemic … nothing as devastating … has been recorded in plant pathological literature”.

Padmanabhan SY. The Great Bengal Famine. Annu. Rev. Phytopathol. 1973; 11(1): 11-24. doi:10.1146/annurev.py.11.090173.000303

Without going deeply into the political and social causes of and mechanisms for the Famine, the events of 1942-43 show how a plant disease outbreak can become a tipping point and trigger massive suffering.

On a personal note, here is me in India, working with colleagues on plant pest detection and control:

Own library.

Plant Health Week – day 1

So far my blog has been dominated by coronavirus and its spread and control, but there are other topics that are almost as important for our well-being. This week we are celebrating #PlantHealthWeek, part of the International Year of Plant Health 2020 #IYPH2020. To mark this occasion I hope to write a post each day, to give you, my readers, an idea of how important plants – and plant pests – are to our lives, and to talk about some of the data and statistics.

Back in the early 1990s, I was a young post-doc in Cambridge who had just started working in the Department of Plant Sciences. Our Common Room (for those not familiar with the University of Cambridge, this is a room where we gathered for our regular tea and coffee breaks) was decorated with portraits of all the Botany Professors from the 19th century onwards. One painting showed a rather imposing-looking man with a really big moustache, Harry Marshall Ward (1854-1906; Professor of Botany 1895-1906).

In the mid-1800s coffee was grown in Ceylon (modern Sri Lanka), and the British plantation owners planted a monoculture of coffee trees on almost every available piece of land. In the 1860s a disease, coffee rust, was spreading through the plantations, killing the trees. The leaves developed yellow patches, the plants were unable to produce food, and the trees started to die. Production dropped from 45,000 tons in 1870 to 2,300 tons in 1889.

Mycologists Michael Joseph Berkeley and Christopher Edmund Broome discovered that the disease was caused by a fungus, and they named it Hemileia vastatrix – the first part of the name reflecting the half-smooth shape of its “spores”, i.e. the little “seeds” by which the fungus spreads, and “vastatrix” meaning a “she-destroyer”.

http://www.nordbeans.cz/media/galleries/12/6032535876_9e3aaedae2_b-720x480.jpg

Ward came to Ceylon in 1880 to help find a cure for the coffee rust. He recommended avoiding monoculture by growing different types of coffee. He also pioneered “agroforestry” – growing coffee under trees to stop spores being carried by the wind. Growing coffee trees, which are quite small, together with tall trees is now common practice, as this helps to maintain the right microclimate and shields the coffee from strong sun, as well as providing additional income from timber.

Ward did not manage to solve Ceylon’s coffee rust problem, and the coffee industry there was completely devastated. To find a replacement, plantation owners started growing tea, which then became the famous Ceylon tea. The English tradition of drinking tea apparently has its origin at that time.

The coffee rust did not stop there and, over the years, spread throughout the world. There is a constant race between breeding new resistant coffee varieties and the pathogen overcoming their resistance, and between finding new pesticides and the fungus finding ways to resist them. Coffee rust, known in Spanish as la roya, is a huge problem across the globe, causing many economic and social tragedies.

By the way, I still have not figured out how Professor Ward managed to drink or eat with his moustache…