Positive and negative

I have been doing a lot of data analysis and modelling recently, drawing lots of figures like the ones in my previous post or below. I enjoy my work (although it can be very frustrating at times) and I enjoy maths, statistics and numbers (I know, I am a sad person). So, I have actually been enjoying working on coronavirus data.

Until something hit me yesterday. The UK Prime Minister is in intensive care with the virus and there was some uncertainty about whether he was on a ventilator. I do not agree with him on a political or moral plane, but I suddenly had a face for one of my numbers. More than that, I suddenly realised that according to the numbers I had been looking at, he has a 50% chance of dying if he ends up on a ventilator. I had this very strong thought that I desperately want him to survive.

I think this is a sobering thought. It is easy to hide behind figures and numbers and lose their meaning. But the work on epidemics means that each figure or number has a meaning. And the meaning is of life or death.

On a positive note, there seems to be continuing evidence that Italy and Spain are slowing down, Austria and possibly Denmark and Czechia are thinking of relaxing the regulations, and even the UK and the US are not growing as fast as they were a few days ago. Possibly, the actions are making a dent.

These plots are slightly different from last time, as they show new cases and deaths every day. As before, the start is in the bottom left corner and the end (meaning yesterday) is where the point is. If the line goes up, it means there are more and more new cases every day. If the line is horizontal, the total numbers are still increasing, but steadily (by the same number every day). If the line goes down, we are still seeing new cases or deaths, but the daily increase is getting smaller.
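To make the three situations concrete, here is a minimal sketch of how daily new cases are obtained from a running total. The numbers are invented purely for illustration and the function name `daily_new` is my own; this is not the code behind the plots.

```python
def daily_new(cumulative):
    """Turn a running total into daily new counts (difference between consecutive days)."""
    return [today - yesterday for yesterday, today in zip(cumulative, cumulative[1:])]

# Hypothetical running total of cases, invented for illustration only
cumulative = [100, 150, 250, 400, 550, 700, 800, 850]

new_cases = daily_new(cumulative)
print(new_cases)  # [50, 100, 150, 150, 150, 100, 50]
```

The daily numbers first rise, then stay level, then fall, while the running total keeps growing throughout. That is why a line going down in these plots does not mean the epidemic is over.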

So a ‘peak’ (like the top of a hill) in this plot is not a real ‘peak’ in cases but is a good indication that we are on the right trajectory. The ‘top of the hill’ will come later but it is difficult to say how far away that is.

Note how long it took for China to lower the numbers. This possibly supports one of our scenarios in a recent article in The Conversation, ‘Four graphs that show how the coronavirus pandemic could now unfold’.

Cases are a better indication of what happens now; death records are delayed by a week or more but are more reliable.

Some graphs

This might be a bit technical and long… You have been warned…

I have been working on repeating the analysis done by the Financial Times on plotting cumulative cases and deaths from COVID-19, to illustrate what has been happening in different countries over the last month or so. The FT analysis is free to view but not to republish, and I also wanted to do some more analysis of my own.

I am using global and freely available data from Johns Hopkins University. The data are what they are – there are many caveats, reporting rates change and differ between countries, and there is clearly massive underreporting. This is particularly difficult for deaths, as countries use very different definitions of what a death from COVID-19 means – nobody dies directly from the virus, but from complications following the infection. Keeping this in mind, let’s look at some countries (more countries are shown in the FT graphs; if your favourite one is missing here, please let me know and I can add it).

The first plot shows the cumulative number of reported cases (i.e. all cases up to that day) since the 100-case threshold was reached. In other words, we shift the notifications so that ‘day 0’ corresponds to the actual date when the country reached 100 cases. Some countries will have reached this level earlier (China before 22nd January, when the records start) and some later; we are simply tracing the growth of the epidemic from the same starting point. The advantage is that this will hopefully allow us to discover a common pattern: if the UK follows Italy but is, say, two weeks behind, the two curves will look very similar on our graph.
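The shift to a common ‘day 0’ can be sketched in a few lines. This is only an illustration under my own assumptions: the two series are invented, `align_to_threshold` is a name I made up, and I count the first day with at least 100 cases as day 0.

```python
def align_to_threshold(cumulative, threshold=100):
    """Return the series starting on the first day the count reaches the threshold."""
    for day, count in enumerate(cumulative):
        if count >= threshold:  # assumption: "reached" means at least the threshold
            return cumulative[day:]
    return []  # the threshold was never reached

# Hypothetical cumulative counts by calendar day, invented for illustration only
country_a = [20, 60, 120, 240, 480, 960]
country_b = [0, 0, 0, 30, 110, 230, 450]

aligned_a = align_to_threshold(country_a)  # [120, 240, 480, 960]
aligned_b = align_to_threshold(country_b)  # [110, 230, 450]
print(aligned_a, aligned_b)
```

Country B crosses the threshold two calendar days after country A, but once both series start at their own day 0 they can be compared point by point.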

I will do one more thing and use a logarithmic scale on the vertical axis. This has to do with the fact that when diseases like COVID-19 spread, they tend to grow a bit like compound interest in your bank account. The growth gets faster and faster: the more cases we have on a particular day, the more new cases will come the next day, in proportion. It is useful to think in terms of the time it takes to double the number of cases, and they can double every day, every two days, three days, a week, two weeks, etc. Of course, if the cases double every day, they grow much faster than if they double every week. The advantage of a logarithmic scale is that if the cases double at a steady rate, the graph looks like a straight line, and the shorter the doubling time, the steeper the line. This behaviour is therefore easy to spot in the data.
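For those who like to see the arithmetic, here is a small sketch of both ideas: estimating a doubling time from two counts, and checking that steady doubling gives equal steps on a logarithmic scale. It assumes steady exponential growth between the two counts; the numbers and the function name `doubling_time` are mine, for illustration only.

```python
import math

def doubling_time(count_start, count_end, days):
    """Days needed to double, assuming steady exponential growth between the two counts."""
    return days * math.log(2) / math.log(count_end / count_start)

# Hypothetical counts that double every day, invented for illustration only
counts = [100, 200, 400, 800, 1600]

# On a log scale each day adds the same amount (about 0.301, i.e. log10 of 2),
# which is why steady doubling appears as a straight line
log_counts = [math.log10(c) for c in counts]
steps = [later - earlier for earlier, later in zip(log_counts, log_counts[1:])]
print(steps)

# Going from 100 to 800 cases in 9 days is three doublings, so about 3 days each
print(doubling_time(100, 800, 9))
```

Real data never double this neatly, so an estimate like this is only a rough summary of the slope over the chosen period.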

So, let’s have a look at the first set of graphs, showing the number of cumulative cases since the 100th case.

I grouped the countries roughly according to their general patterns:

Cumulative cases since 100th case by 31st March 2020. Broken straight lines show exponential growth with doubling times of one day, three days and a week.
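The broken reference lines mentioned in the caption can be generated with a simple formula: starting from 100 cases, a doubling time of T days gives 100 × 2^(day/T) cases on a given day. The sketch below is my own reconstruction of that idea, not the plotting code itself, and `reference_line` is a made-up name.

```python
def reference_line(start, doubling_days, n_days):
    """Counts on days 0..n_days for steady growth with the given doubling time."""
    return [start * 2 ** (day / doubling_days) for day in range(n_days + 1)]

# Where each reference line ends up after three weeks, starting from 100 cases
for label, doubling_days in [("1 day", 1), ("3 days", 3), ("1 week", 7)]:
    line = reference_line(100, doubling_days, 21)
    print(label, round(line[-1]))
```

After three weeks the weekly-doubling line has reached 800 cases, the three-day line 12,800 and the daily line over 200 million, which shows how much the doubling time matters.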

Each step line in the figure above is the record for a single country, starting at the bottom left on the day it recorded more than 100 cases and ending at the top right on 31st March. Thus, China’s record (black line in the first graph) starts at over 500 cases on 22nd January and ends with 82 279 cases on 31st March. South Korea (red) reached 100 cases later than 22nd January, so its epidemic has lasted for fewer days and the red line is shorter than China’s. Malta (light blue) only recorded 100 cases a few days ago, so its line is very short. The end of every line corresponds to 31st March.

The first of these graphs includes countries where control measures seem to work to some extent. China, shown in black, started with very steep growth, doubling the number of cases every day, but quickly managed to slow down until the cases stopped growing (this is the horizontal line extending to the right). South Korea (red) is similar, as it implemented successful control measures quite early; it differs from China in that the number of cases is still growing, although quite slowly. Japan (green), Singapore (blue) and Malta (light blue) are examples of countries where, for very different reasons, the spread is slow, with the number of cases doubling every week or more. Both South Korea and Singapore used testing to slow down the progress of the disease, and people in Japan may have been protected by the BCG vaccine.

The second group of countries are classed here as ‘Western European countries’, although I am sure there will be others that have a very similar trajectory. The number of cases roughly doubles every three days or faster, although there is some evidence of a slowdown (Italy: black, Spain: red, France: green, Germany: light blue), except in the UK (dark blue line), which still largely follows a straight line. Interestingly, Scotland (black dots) appears to progress more slowly than the rest of the UK.

The third graph shows some Central/Eastern European countries, mainly because I am interested in how Poland (red line) fares. The overall numbers and rate are somewhat smaller than in the previous graph and the trajectory seems to slow down earlier. Maybe this is because they started the lockdown earlier.

The final graph is a slightly mixed bag. There is the US (black line), where the growth is faster than for the UK and other Western European countries and does not show much slowing down. Brazil (red) initially had a similar rate of spread but is perhaps now slowing down. Finally, in India (blue, again because I am interested in how the virus spreads there) the spread is relatively slow, but is picking up now.

Cumulative deaths since 100th case by 31st March 2020. Broken straight lines show exponential growth with doubling times of one day, three days and a week.

Finally, death records show a similar story, although with not much sign of stopping except for China (death records start at 10 rather than 100). This could be because deaths only occur several days or even weeks after the case is recorded. The death record is, therefore, a bit like the record of the cases some time ago, and so it is still too early to see the slowing down of the cases reflected in the deaths. On the other hand, we believe that the death record is a more reliable reflection of the disease progress, as case reporting is notoriously difficult to interpret and liable to change as testing frequency changes.

So what do we conclude from these graphs? There is some cautious optimism as some countries have managed to stop or slow down the spread, and we are starting to see a bit of slowing down in Italy. The worrying side is the US, where the growth is still fast and exponential and the number of deaths is rising very fast. I am also pleased that Scotland seems to be relatively protected – is that because of our diet of haggis and whisky?

Update and some (hopefully interesting) links

Apologies for being silent on this blog for several days. Firstly, there seem to be so many things to write about that it is hard to make a choice. I will try to pick up the threads soon.

Secondly, I have been busy writing popular science articles elsewhere. In one of them, we say:

(…) we should also be concerned about whether or not we will be living with the virus for a long time. Will we be able to eradicate COVID-19, as we did with Sars? Or will we need to learn to live with it like we do with the common cold?

You can read the full article on The Conversation; it is co-authored by Rowland Kao from the University of Edinburgh.

I have also written a popular article on epidemiological modelling for a weekly Polish magazine, PAUZA, published by the Polish Academy of Learning. If you read Polish, you will be able to access it next week as it goes through the publishing procedure.