False positives and false negatives

There is now a lot of interest in testing for coronavirus. The ‘test-and-trace’ strategy in the UK has been hugely controversial, with many complaints from people unable to get tests. But there is also a lot of confusion about how reliable testing is. Does it overestimate the number of cases? Does it give people the wrong information about their status? Are there people who are told they are not infected but who actually go about spreading the disease? Conversely, how many people are wrongly told they have the disease and, as a result, have their lives turned upside down unnecessarily?

There are at least three broad types of test for the coronavirus. The first uses swabs from people’s throats and noses to detect the virus itself; these tests are based on a molecular technique called PCR (polymerase chain reaction). PCR detects viral RNA using “amplification”: small amounts of RNA that would be difficult to detect are copied many times in a lab procedure until they reach the detection level. This test tells us who is currently infected.

The second is based on blood tests and detects antibodies – molecules that our bodies use to fight off the virus. This test tells us how many people have been exposed to the virus in the past. The third group looks for symptoms such as lung infection and concentrates on people who have had a particularly severe case of coronavirus infection.

This blog is about statistics, not molecular biology, so suffice it to say that all the tests are fairly accurate and are continually being evaluated and improved. At the same time, none of them is 100% accurate, so we need to look at the consequences of potential misreadings.

There are two kinds of error associated with testing. Firstly, the test might produce a negative result when a person is in fact infected. In other words, a person who is really infected with the virus is told “all clear”. There are many possible reasons for this. The swab test, for example, can only pick up the virus during a certain period following infection – not too early and not too late. Also, it is not a pleasant test: the swab needs to go far into the throat or nose. If it is not inserted properly (particularly if you are doing it yourself!), it may not pick up the viral particles.

In statistics, such an error is called a “false negative” result.

Secondly, the test can produce a positive result when a person is not infected. Perhaps they went to get tested because they had a fever and a cough, which were in fact caused by an “ordinary” common cold, but the swab was for some reason contaminated with viral RNA, or there was a problem with the procedure. This is usually less likely, but not impossible, as discussed recently in The New York Times.

We call it a “false positive” result.

It is important to stress at this point that these problems do not in any way undercut the importance of testing. However, we do need to understand the limitations and think about ways to deal with them.

Let’s look at some numbers. Suppose that on a certain day 200,000 people were tested and 3,000 positive results were returned. These numbers are not too different from what is happening in the UK at present (18th September 2020: 233,199 tests were carried out, with 3,395 positives).

| | Infected | Not infected | Total |
|---|---|---|---|
| Positive | ? | False positives: ? | 3,000 |
| Negative | False negatives: ? | ? | 197,000 |
| Total | ? | ? | 200,000 |
Our first go at the table.

We are now asking how many of the people who went to get tested were “really” infected – not how many people in the whole population carried the virus, which is a different question.

To calculate this, we also need an idea of how good the tests are, and so of how likely “false positives” and “false negatives” are. We do not yet have firm estimates, but we know that “false positives” are not very likely, whereas “false negatives” could be more probable. Let’s first assume that the odds for both “false positives” and “false negatives” are 1:999, that is, one wrong result for every 999 correct ones. In other words, the (conditional) probability of the test producing a wrong result, given the person’s true status, is 0.1% in each case – very low indeed.
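Converting odds into a probability is a single division: odds of wrong:right correspond to a probability of wrong / (wrong + right). A minimal sketch (the function name `odds_to_probability` is made up for this illustration), applied to the 1:999 odds here and to the 2:8 odds used later in the post:

```python
from fractions import Fraction


def odds_to_probability(wrong, right):
    """Probability of a wrong result for odds of wrong:right.

    Odds of 1:999 mean one wrong result for every 999 correct ones,
    so the probability is 1 / (1 + 999) = 1/1000.
    """
    return Fraction(wrong, wrong + right)


# The two sets of odds that appear in this post.
for wrong, right in [(1, 999), (2, 8)]:
    p = odds_to_probability(wrong, right)
    print(f"odds {wrong}:{right} -> probability of a wrong result {float(p):.1%}")
```

This prints 0.1% for 1:999 and 20.0% for 2:8. Using `Fraction` keeps the result exact, so 1:999 gives exactly 1/1000 rather than a rounded float.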

Then, a simple calculation shows that:

| | Infected | Not infected | Total |
|---|---|---|---|
| Positive | 2,803 | False positives: 197 | 3,000 |
| Negative | False negatives: 3 | 196,997 | 197,000 |
| Total | 2,806 | 197,194 | 200,000 |
Table for 1:999 odds for both “false positives” and “false negatives”.
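The “simple calculation” behind this table amounts to solving the four linear equations listed at the end of the post for the four unknown cells. A minimal sketch, assuming the odds are expressed as “correct results per wrong result” (999 for 1:999); `solve_table` and its parameter names are hypothetical, chosen for this illustration:

```python
def solve_table(positives, negatives, tp_per_fn, tn_per_fp):
    """Fill in the 2x2 table from the totals and the assumed error odds.

    positives -- number of positive results (x + y)
    negatives -- number of negative results (z + v)
    tp_per_fn -- true positives per false negative, e.g. 999 (x = a*z)
    tn_per_fp -- true negatives per false positive, e.g. 999 (v = b*y)
    Returns (x, y, z, v) = (true pos., false pos., false neg., true neg.).
    """
    a, b = tp_per_fn, tn_per_fp
    # Substituting z = negatives - b*y and x = a*z into x + y = positives
    # leaves a single equation for y, the number of false positives.
    y = (a * negatives - positives) / (a * b - 1)
    x = positives - y   # true positives
    z = x / a           # false negatives
    v = b * y           # true negatives
    return x, y, z, v


x, y, z, v = solve_table(3000, 197000, 999, 999)
print(f"Positive: {x:,.0f} true, {y:,.0f} false")
print(f"Negative: {z:,.0f} false, {v:,.0f} true")
print(f"Infected: {x + z:,.0f}, not infected: {y + v:,.0f}")
```

Once rounded, the printed cells (2,803, 197, 3 and 196,997, with 2,806 infected) match the table. Solving in closed form avoids any dependency on a linear-algebra library.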

What does this mean? Firstly, the number of people “really” infected (2,806) is similar to, but slightly lower than, the 3,000 reported – good so far. Secondly, the number of “false negatives” is very low (3/2,806 ≈ 0.1%). However, there are quite a few “false positives”: in fact, about 6.6% (197/3,000) of all positive tests are “false”.
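Both proportions come straight from the table counts. A minimal sketch with a hypothetical helper `error_shares`; the second share is one minus what is usually called the positive predictive value (the proportion of positive results that are true positives):

```python
def error_shares(tp, fp, fn):
    """Return (missed, false_alarms) for a 2x2 test-result table.

    missed       -- share of infected people told "all clear": fn / (tp + fn)
    false_alarms -- share of positive results that are false: fp / (tp + fp),
                    i.e. 1 minus the positive predictive value
    """
    return fn / (tp + fn), fp / (tp + fp)


# Rounded counts taken from the 1:999 table.
missed, false_alarms = error_shares(tp=2803, fp=197, fn=3)
print(f"false negatives among the infected:     {missed:.1%}")
print(f"false positives among positive results: {false_alarms:.1%}")
```

This prints 0.1% and 6.6%. The point to notice is that a 0.1% error rate among the non-infected still produces a noticeable share of false alarms, because the non-infected vastly outnumber the infected among the people tested.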

We have made quite optimistic assumptions about the accuracy of the test, particularly for “false negatives”. Changing the odds to 2:8 for these entries only, that is, a 20% chance of missing an infected person (there is some evidence that the best rate at which we can detect the virus is about 80%, when people are tested 3-4 days after infection and 1-2 days after symptoms appear), we get:

| | Infected | Not infected | Total |
|---|---|---|---|
| Positive | 2,804 | False positives: 196 | 3,000 |
| Negative | False negatives: 701 | 196,299 | 197,000 |
| Total | 3,505 | 196,495 | 200,000 |
Table for 1:999 odds for “false positives” and 2:8 for “false negatives”.

The number of “false positives” stays almost the same (196 rather than 197), but the number of “false negatives” shoots up. We now expect more “true” cases (3,505) than were detected, and a sizeable proportion of them (701, or 20%) are “false negatives”.
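To see how quickly the missed cases grow as the test misses more infections, the same four equations can be solved for several false-negative odds while the false-positive odds stay at 1:999. A sketch under the post’s assumptions (3,000 positive and 197,000 negative results); `solve_table` is a hypothetical helper written for this illustration, and the odds other than 1:999 and 2:8 are arbitrary values picked to show the trend:

```python
def solve_table(positives, negatives, tp_per_fn, tn_per_fp):
    """Solve x + y = positives, z + v = negatives, x = a*z, v = b*y.

    a = tp_per_fn (true positives per false negative),
    b = tn_per_fp (true negatives per false positive).
    Returns (true pos., false pos., false neg., true neg.).
    """
    a, b = tp_per_fn, tn_per_fp
    y = (a * negatives - positives) / (a * b - 1)  # false positives
    x = positives - y                              # true positives
    return x, y, x / a, b * y


# False-negative odds as (wrong, right); false-positive odds stay at 1:999.
for wrong, right in [(1, 999), (1, 9), (2, 8), (3, 7)]:
    x, y, z, v = solve_table(3000, 197000, right / wrong, 999)
    print(f"FN odds {wrong}:{right}  infected {x + z:7,.0f}  "
          f"false negatives {z:5,.0f}  false positives {y:4,.0f}")
```

Whatever the totals, odds of 2:8 mean that one in five infected people is missed, so false negatives always make up 20% of the “Infected” column, while the false positives barely move.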

Why does this matter? The consequences of the two kinds of “false” result are rather different. If a person is told they are COVID-positive, they will need to self-isolate. It might also mean that their family, friends or other contacts need to isolate as well. If this happens at a school or a care home, a whole class or the residents of the home will be affected, albeit for a relatively short time. If this is based on a wrong diagnosis, it can be a nuisance and can lead to economic and social hardship, but it has hardly any epidemiological consequences.

In fact, if there are any such consequences, they seem to be positive rather than negative. Most people who get tested have some symptoms or have been in contact with someone who has. Flu symptoms are similar to those of COVID-19, so self-isolation might break flu transmission as well. Maybe this is one reason why Australia and New Zealand have apparently been recording fewer influenza cases in their winter season (June-September 2020) than in other years (masks and social distancing contribute as well).

The consequences of “false negatives” are more serious, as a person who carries the coronavirus gets a green light to carry on with normal life. This could lead to further infections and, if that person is involved in a superspreader event, could result in a large outbreak.

In this post I have looked only at the accuracy of testing, and only at people who decided, or were told, to get tested. Thus, I have really only looked at the PCR/swab tests.


There is a whole group of people who are infected – and infectious – but who for one reason or another are not tested. Some of them will self-isolate if they have symptoms. But many will simply carry on, unwittingly – perhaps sometimes carelessly – spreading the virus. In the first wave (February-June) we think there were about 8 such asymptomatic, untested people for every positive tested case. With better testing, this multiplier is now believed to be 2-3. As with the “false negatives” discussed above, it is very important to “catch” as many of them as possible and reduce the potential for transmission. But that is a story for another blog post.


For the mathematically curious, here is the set of equations behind the first table (odds of 1:999 for both kinds of error):

    \[ \begin{array}{lcl} x+y & = & 3000 \cr z+v & = & 197000 \cr x & = & 999 z \cr v & = & 999 y \end{array} \]

where

| | Infected | Not infected | Total |
|---|---|---|---|
| Positive | x | False positives: y | 3,000 |
| Negative | False negatives: z | v | 197,000 |
| Total | x+z | y+v | 200,000 |
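For readers who want the general case, the system can be solved for any odds. In this sketch, P and N (the numbers of positive and negative results, 3,000 and 197,000 here), a (true positives per false negative) and b (true negatives per false positive) are symbols introduced for illustration; the equations above are the special case a = b = 999. Substituting z = N - b y into x = a z gives x = a(N - b y), and putting this into x + y = P leaves a single equation for y:

\[ \begin{array}{lcl} y & = & \frac{aN - P}{ab - 1} \cr x & = & P - y \cr z & = & x / a \cr v & = & b\,y \end{array} \]

With a = b = 999 this gives y = 196,800,000 / 998,000 ≈ 197, as in the first table. Odds of 2:8 for false negatives correspond to a = 8/2 = 4, with b still 999.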

Plant Health Week – day 2

A couple of years ago I was writing a paper on how modelling can be used to address the epidemiological aspects of One Health. I was looking for examples of outbreaks of plant pests and pathogens that were linked to catastrophic changes in human health. It was then that I became aware of the Bengal Famine of 1942-43.

In 1942, in Bengal, then a province of British India, Brown Spot, a fungal disease caused by Cochliobolus miyabeanus, was spreading through rice fields. The impact of the disease was intensified by tropical storms on 16-17 October, which distributed the fungal spores widely while also killing 14,500 people and destroying fields and rice paddies.

When rice plants were attacked by the fungus, brown patches and discolouration appeared on leaves and stems, and the plants started to die.

Brown spot patches
http://www.knowledgebank.irri.org/training/fact-sheets/pest-management/diseases/item/brown-spot

The resulting devastation caused estimated rice yield losses of up to 91%.

Mass starvation followed, with a resulting decrease in resistance to disease. Meanwhile, the weather also created conditions conducive to mosquito breeding, leading to an outbreak of malaria. While the first wave of deaths (winter 1942) was largely caused by starvation, the second wave (1943-44) was dominated by disease, with malaria, cholera and smallpox thriving in an already weakened population. As a result, an estimated 2-3 million people died out of a population of 60 million.

Bengal Famine
For details of copyright see https://commons.wikimedia.org/w/index.php?curid=58209087

Even today, rice seedling mortality rates of up to 60% caused by Brown Spot are recorded in some countries, and crop yields can be reduced by up to 40%. Prevention is based on disease-free seed and resistant varieties, as well as practices such as less dense planting and weed control. Pesticides applied to the soil and to plants are also used to fight the disease.

There is an ongoing dispute about the origin and course of the Bengal Famine. The context was the middle of World War 2, with the threat of a Japanese invasion of British India. But the British and local governments, as well as merchants, have been accused of gross mismanagement of food supplies, and thus of either causing the famine and the deaths of so many people or failing to alleviate them.

“Though administrative failures were immediately responsible for this human suffering, the principal cause of the short crop production of 1942 was the [plant] epidemic … nothing as devastating … has been recorded in plant pathological literature”.

Padmanabhan SY. The Great Bengal Famine. Annu. Rev. Phytopathol. 1973;11(1):11-24. doi:10.1146/annurev.py.11.090173.000303

Even without going deeply into the political and social causes and mechanisms of the Famine, we can see from the events of 1942-43 how a plant disease outbreak can become a tipping point and trigger massive suffering.

On a personal note, here is me in India, working with colleagues on plant pest detection and control:

Own library.