hoakley June 27, 2020 General, Life, Macs, Technology

Does Covid-19 spread that rapidly? Figures from England, New York City and Brazil examined

Received wisdom is that, when a brand-new virus starts infecting a completely susceptible population, the number of cases grows exponentially. This is expressed in models of spread in which cases don’t grow linearly with time, but double repeatedly like grains of rice on the squares of a chess board: before you get to the sixty-fourth square, you have more rice than there is in the whole world.

[Chart: modelled exponential growth in cases]

Exponential growth is a terrifying proposition. In this chart, I suppose that every fifteen days, each freshly infected case passes the disease on to three uninfected people. A single case on day 0 grows to 177,147 on day 165. What appears to be an insignificant problem for the first two months then rapidly becomes overwhelming – and that’s what infection rates in countries like Brazil, in the new resurgence in many US states, or in any ‘second wave’ could do, at least until a large enough proportion of the population has become immune to slow the growth.
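The arithmetic behind that chart is easy to reproduce. This is a minimal Python sketch of the simple generational model just described, assuming each case infects exactly three others after exactly 15 days and nothing ever slows the process; the function name `fresh_cases` is my own, hypothetical choice.

```python
def fresh_cases(day, r0=3, serial_interval=15, initial=1):
    """Fresh cases in the generation current on `day`, under a simple
    generational model: every `serial_interval` days each case infects
    `r0` new people, and nothing ever slows the growth."""
    generations = day // serial_interval  # completed generations so far
    return initial * r0 ** generations

for day in (0, 30, 60, 90, 120, 150, 165):
    print(day, fresh_cases(day))
```

The loop prints only 81 fresh cases on day 60, but 177,147 on day 165: 3 raised to the 11th power, after eleven 15-day generations.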

What’s even more horrifying about that chart is that the figures I use for its growth – new infections every 15 days (the serial interval or generation time), and 3 new cases for each existing one (the basic reproduction number, R₀) – are similar to those estimated for Covid-19. They also explain why governments that aren’t too concerned when there are only a couple of hundred cases have already lost control of what will happen in the coming weeks.
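Those two figures can also be expressed as a daily growth rate and a doubling time. The sketch below assumes every case has exactly the same generation time T, in which case R₀ = e^(rT), so the exponential growth rate is r = ln(R₀)/T and the doubling time is ln 2 / r. Real generation times are spread out, which changes that relationship, so treat this as an illustration only; the function names are hypothetical.

```python
import math

def growth_rate(r0, generation_time):
    """Daily exponential growth rate r, assuming a fixed generation time T,
    so that R0 = exp(r * T)."""
    return math.log(r0) / generation_time

def doubling_time(r0, generation_time):
    """Days for case numbers to double at growth rate r."""
    return math.log(2) / growth_rate(r0, generation_time)

r = growth_rate(3, 15)
print(f"r = {r:.4f} per day, doubling every {doubling_time(3, 15):.1f} days")
```

With R₀ = 3 and T = 15 days, that gives r ≈ 0.073 per day and case numbers doubling about every 9.5 days.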

Careful analysis of many real pandemics (see Yan and Chowell, for example) has shown that exponential growth occurs during the early phase of major outbreaks, but sub-exponential growth then becomes more usual. This article is an armchair epidemiological exploration using just the tools on my Mac listed at the end.
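One common way of expressing sub-exponential growth is the generalised-growth model associated with Chowell and colleagues, dC/dt = r·C^p, where C is the cumulative number of cases. I’m assuming that model here purely as an illustration; it isn’t the method used in this article. With p = 1 growth is exponential; with 0 ≤ p < 1 it is sub-exponential, and p = ½ gives a quadratic in time. This sketch uses the closed-form solutions, and the parameter values are arbitrary.

```python
import math

def generalized_growth(t, c0, r, p):
    """Cumulative cases C(t) solving dC/dt = r * C**p with C(0) = c0.

    p = 1 gives exponential growth; 0 <= p < 1 gives sub-exponential
    (polynomial) growth, and p = 0.5 gives a quadratic in t.
    """
    if p == 1:
        return c0 * math.exp(r * t)
    # Separating variables: C**(1 - p) = c0**(1 - p) + (1 - p) * r * t
    return (c0 ** (1 - p) + (1 - p) * r * t) ** (1 / (1 - p))

for p in (1.0, 0.75, 0.5):
    print(p, [round(generalized_growth(t, 10, 0.3, p)) for t in (0, 20, 40, 60)])
```

Even a modest step down from p = 1 slows growth enormously over a couple of months, and a cumulative total that is well fitted by a quadratic, as Brazil’s turns out to be, corresponds to p = ½ in this model.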

Data

As I explained previously, I’m very fussy about the data I use. Here I’m less concerned with getting the largest numbers than with having them attributed as closely as possible to the actual day on which test swabs were taken, rather than when the result was officially reported. This rules out the data normally reported daily through official channels up to WHO, which is more concerned with accounting for total numbers, and is often ‘corrected’ when previous errors have been discovered, sometimes even for political reasons.

I use the data for fresh cases diagnosed in England (not the whole UK) provided by Public Health England, and equivalent figures for New York City published by NYC Health. I also wanted some for Brazil, where diagnosis and data collection are more fraught. The best that I have found are those collected by Wesley Cota and colleagues on GitHub.

These three parts of the world are of course very different.

England is a component of the UK with a population of around 56 million. Although small numbers of cases of Covid-19 were reported earlier, the epidemic didn’t really get going here until around 1 March. National lockdown was put in force over the period 21-24 March: although many sources claim it didn’t take effect until 24 March, some of its most valuable restrictions were applied on the 21st. Those measures appear to have been respected by the great majority of the population, and have been progressively eased during May and June, but many restrictions remain.

New York City is a much smaller and more densely populated area, with a population of around 8.4 million. First cases were reported around 1 March, and lockdown seems to have been applied from 22 March. My particular interest here is in the rate of growth in infections, which appears to have been extremely rapid during late March.

Brazil is a very large country with a population of 212.5 million, many of whom live in overcrowded conditions in its large cities. Compared to England and New York City, Covid-19 appears to have arrived later, with first significant numbers from 18 March. Currently, it’s reporting the largest numbers of fresh cases for any country, which may result from political conflict over lockdown measures and overcrowding. If any country exemplifies rapid and almost uncontrolled spread of Covid-19, it should be Brazil.

Time course of spread

For each dataset I have prepared a similar chart showing the number of Covid-19 cases confirmed each day, plotted as raw data with a trend curve fitted using LOESS, as described before.
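For readers who want to experiment, here is a minimal LOESS smoother in plain Python. It is an illustrative sketch, not the implementation in DataGraph or Igor Pro: around each point it fits a weighted straight line using the tricube kernel over the nearest fraction `frac` of the data, and it omits the robustness iterations that full LOESS usually adds. The function names and the value of `frac` are my own assumptions, and the example data are synthetic.

```python
import math

def tricube(u):
    """Tricube weight: (1 - |u|**3)**3 inside |u| < 1, zero outside."""
    u = abs(u)
    return (1 - u ** 3) ** 3 if u < 1 else 0.0

def loess(xs, ys, frac=0.3):
    """Local linear LOESS smoother, without robustness iterations.

    For each x0, the bandwidth h is the distance to its q-th nearest
    point (q = frac * n, rounded up); points are weighted by
    tricube((x - x0) / h) and a weighted straight line is fitted.
    The smoothed value is that line evaluated at x0.
    """
    n = len(xs)
    q = max(2, min(n, math.ceil(frac * n)))
    fitted = []
    for x0 in xs:
        # Fallback of 1.0 only if the q nearest points all coincide with x0
        h = sorted(abs(x - x0) for x in xs)[q - 1] or 1.0
        w = [tricube((x - x0) / h) for x in xs]
        sw = sum(w)  # at least 1, since x0 itself has weight 1
        xbar = sum(wi * x for wi, x in zip(w, xs)) / sw
        ybar = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - xbar) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - xbar) * (y - ybar) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(ybar + slope * (x0 - xbar))
    return fitted

# Synthetic daily counts: a rising trend with a weekend dip every 7 days.
days = list(range(28))
cases = [100 + 10 * d - (30 if d % 7 in (5, 6) else 0) for d in days]
trend = loess(days, cases, frac=0.5)
print([round(t) for t in trend])
```

A larger `frac` smooths more heavily and flattens the weekly cycle further; a smaller one follows the raw points more closely.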

[Chart: fresh cases per day, England]

In England, case numbers rose exponentially from around 1 March until they exceeded 1,000 after 18 March. They then rose linearly until levelling off to a peak at around 1-10 April, two weeks after lockdown measures took effect. Since then they have been declining more slowly than the original rate of growth, although the effect of recent relaxation in lockdown measures has yet to be seen. Peak death figures, with a maximum of 1,173 deaths on 21 April, lagged the peak in fresh cases by a further two weeks.
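Telling the exponential phase from the linear one is easier on a logarithmic scale, where exponential growth becomes a straight line. As a sketch, assuming nothing about how the chart above was produced, the hypothetical helper `log_linear_fit` below estimates a daily growth rate and doubling time by least squares on log-transformed daily counts. The data here are synthetic.

```python
import math

def log_linear_fit(days, cases):
    """Least-squares fit of ln(cases) = a + r * day.

    Returns the daily growth rate r and the doubling time ln(2) / r.
    Only meaningful over a stretch that really is exponential and
    contains no zero counts.
    """
    ys = [math.log(c) for c in cases]
    n = len(days)
    xbar = sum(days) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in days)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(days, ys))
    r = sxy / sxx
    return r, math.log(2) / r

# Synthetic example: 10 cases on day 0, doubling exactly every 3 days.
days = list(range(15))
cases = [10 * 2 ** (d / 3) for d in days]
rate, doubling = log_linear_fit(days, cases)
print(f"r = {rate:.3f} per day, doubling time = {doubling:.1f} days")
```

Applied to successive windows of a real series, a doubling time that keeps lengthening is one sign that growth has stopped being exponential.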

[Chart: fresh cases per day, New York City]

The pattern in fresh cases for New York City appears very similar to that in England. Here, the peak in case numbers is between 30 March and 5 April, but with a proportionately lower maximum number of daily deaths of 589 much earlier, on 7 April.

[Chart: fresh cases per day, Brazil]

As might be expected, the scatter of raw data points for Brazil is much greater, and increases with time. At present, this shows the early phase of exponential growth, following which there may be a linear phase from the middle of May. However, confidence is low, and it’s possible that the rate of growth is still accelerating. There’s also no evidence of case numbers reaching any peak yet.

Cumulative totals

One classical way of demonstrating the underlying mathematics of epidemic growth is to examine charts of the cumulative total of cases with time. Because of the large numbers involved, this obscures much of the detail seen in the previous charts, particularly during the falling daily figures after the peak, but it reveals the nature of spread. Sometimes, these charts are shown with a logarithmic scale for the Y axis; I avoid that here, retaining a linear scale.

[Chart: cumulative cases, England]

For England, this is a distinctively S-shaped (sigmoid) curve, although it isn’t symmetrical: it rose more rapidly than it levelled off at the top. Indeed, the top of the curve continues to rise slowly and isn’t yet horizontal. This form of curve is also known as the logistic function, and is examined in detail on Wikipedia. It’s relatively straightforward to read its coefficients:

L, the maximum value, which for convenience I take as 161,000, representing around 3 cases per thousand of the English population.
k, the logistic growth rate: the maximum rate of rise is about 3,450 cases per day, which corresponds to k ≈ 4 × 3,450 / 161,000 ≈ 0.086 per day.
x₀, the midpoint, occurs 43 days after the start of significant case numbers on 1 March.
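In its standard form the logistic function is f(t) = L / (1 + exp(-k(t - t₀))), where k is a rate per day and the steepest slope, at the midpoint t₀, equals kL/4. This sketch is my own illustration rather than a fit: it takes the England values read from the chart, back-calculates k from a maximum rate of rise of about 3,450 cases per day, and checks the curve at its midpoint. The helper names are hypothetical.

```python
import math

def logistic(t, L, k, t0):
    """Standard logistic function: L / (1 + exp(-k * (t - t0)))."""
    return L / (1 + math.exp(-k * (t - t0)))

def logistic_slope(t, L, k, t0):
    """Derivative of the logistic, k * f * (1 - f / L); largest at t = t0."""
    f = logistic(t, L, k, t0)
    return k * f * (1 - f / L)

# England values read from the chart (t counted in days from 1 March).
L_ENG = 161_000
MAX_RISE = 3_450  # cases per day, the steepest part of the cumulative curve
T0_ENG = 43
k_eng = 4 * MAX_RISE / L_ENG  # because the maximum slope is k * L / 4

print(f"k = {k_eng:.3f} per day")
print(f"cumulative at midpoint: {logistic(T0_ENG, L_ENG, k_eng, T0_ENG):,.0f}")
print(f"slope at midpoint: {logistic_slope(T0_ENG, L_ENG, k_eng, T0_ENG):,.0f} cases/day")
```

Because the real curve is asymmetric, this symmetric form is only a rough guide. The same calculation with L = 210,000, a maximum rise of 5,083 cases per day and t₀ = 30 gives k ≈ 0.097 per day for New York City.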

[Chart: cumulative cases, New York City]

For New York City, this is another sigmoid curve similar to England’s, but with different coefficients:

L, the maximum value, is 210,000, or around 25 per thousand population.
k, the logistic growth rate: the maximum rate of rise is 5,083 cases per day, which corresponds to k ≈ 4 × 5,083 / 210,000 ≈ 0.097 per day.
x₀, the midpoint, occurs 30 days after the start of cases on 1 March.

[Chart: cumulative cases, Brazil]

Brazil shows no sign of reaching the top of any sigmoid curve, despite attaining a total of 1.2 million cases. Although this is now sub-exponential growth, the total has just doubled in the last 20 days, which is truly alarming. Mathematically, the curve shown here is reasonably well fitted by a quadratic equation, in which the x² (squared) term is significant.
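A quadratic fit of that kind can be done by ordinary least squares on a + b·t + c·t². The sketch below solves the 3×3 normal equations directly, so it needs only the standard library; the data are synthetic, and I’m not suggesting this is how the curve in the chart was produced. The function name `fit_quadratic` is hypothetical.

```python
def fit_quadratic(ts, ys):
    """Ordinary least-squares fit of y = a + b*t + c*t**2; returns (a, b, c).

    Builds the 3x3 normal equations and solves them by Gaussian
    elimination with partial pivoting (numpy.polyfit would be the
    usual choice for real work).
    """
    # s[k] = sum of t**k, v[k] = sum of y * t**k
    s = [sum(t ** k for t in ts) for k in range(5)]
    v = [sum(y * t ** k for t, y in zip(ts, ys)) for k in range(3)]
    # Augmented matrix [A | v] with A[i][j] = s[i + j]
    m = [[float(s[i + j]) for j in range(3)] + [float(v[i])] for i in range(3)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, 3):
            factor = m[row][col] / m[col][col]
            for j in range(col, 4):
                m[row][j] -= factor * m[col][j]
    coef = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):  # back substitution
        coef[i] = (m[i][3] - sum(m[i][j] * coef[j] for j in range(i + 1, 3))) / m[i][i]
    return tuple(coef)

# Synthetic cumulative totals that are exactly quadratic in time.
days = list(range(0, 100, 5))
cumulative = [1_000 + 50 * t + 120 * t ** 2 for t in days]
a, b, c = fit_quadratic(days, cumulative)
print(f"a = {a:.1f}, b = {b:.2f}, c = {c:.3f}")
```

With exactly quadratic input the fit should recover a ≈ 1,000, b ≈ 50 and c ≈ 120. With real, noisy counts you would also need the residuals and the standard error of c before calling the t² term significant; this sketch doesn’t compute those.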

Tentative conclusions

There’s no doubt that Covid-19 spreads very quickly, and the explosive rise in fresh cases in each of these three areas is forceful evidence that we need to do as much as we can to control that spread. But in practice, even (probably) in Brazil, cases don’t rise exponentially as some simple models would predict. The reality is that, like most other infectious diseases, its spread is sub-exponential.

There are many good reasons for this. Although the application and effectiveness of lockdown measures in these three places differ significantly, as does the density of their populations, initial exponential growth in the number of infections should settle to a more linear rate once the infection becomes more common, even while the number infected is still well below the level at which the shrinking pool of susceptible people starts to limit further spread.

For those modelling the spread of Covid-19, or trying to predict it, these observations demonstrate some of the limitations and pitfalls in many simple models. They also show some of the differences between areas.

Figures for England include large cities like London, in which infection spread earlier and more rapidly, more similar to that in New York City. Rises in numbers of cases in less densely populated areas occurred later and more gradually. UK strategy for healthcare provision anticipated that, with dedicated temporary hospitals being opened in time for the peak. Planning for New York City seems to have been more ad hoc. In spite of that, and the fact that there is still no curative treatment for Covid-19, death rates have been very different.

I have deliberately avoided looking at deaths in any detail, for many reasons. Most significantly, as the death rate from Covid-19 is strongly related to age and differs markedly by ethnic group too, it’s meaningless to interpret death data, or even worse to attempt any comparisons, without taking those into account. Although the concept of excess deaths is attractive, unless the underlying data are clean, robust and comparable, it too can quickly become misleading.

Finally, I hope these figures and charts give an indication of how much more analysis is going to be required before any of this starts making sense. Those who rush to draw conclusions and use them to construct political criticism could so easily look foolish when better analyses are performed in the months and years to come. Don’t try counting your chickens before you know whether they’re even birds!

Reference

Ping Yan and Gerardo Chowell (2019) Quantitative Methods for Investigating Infectious Disease Outbreaks, Springer. ISBN 978-3-030-21922-2.

Software used includes DataGraph from the App Store, and Igor Pro from WaveMetrics.

18 Comments

1

Sam on June 27, 2020 at 9:42 am

An interesting analysis. Though I fear that despite your care to look at clean data, the amount of undertesting at the onset of outbreaks in many countries may skew the early data. One can imagine that if testing is ramped up faster than the spread itself, more cases are caught and the growth rate appears larger than it really is. This may be more pronounced in countries like Germany, which ramped up testing and tracing only after the virus had already spread to large parts of the country. The effect on the data may be different for England, which effectively stopped testing and tracing close to the onset of the lockdown.

- 2
  
  hoakley on June 27, 2020 at 10:41 am
  
  Thank you.
  I have considered the state of data in other posts here. In fact in the case of the data I’m using for England, there’s little evidence that it has been constrained by undertesting, as it’s largely (possibly entirely) Pillar 1 data, and represents patients in NHS hospitals in the main.
  However, even if you assume undertesting in the early weeks or months, this can’t transform the curve into an exponential, can it? It would actually have the opposite effect, and make the cumulative curves more symmetrical sigmoids. As you write, this may underestimate how sub-exponential the growth in cases really was.
  Sadly, I don’t think we’re ever going to get better data on occurrence, although the death data will clearly be the subject of extensive re-analysis over the coming months and years. So this is probably as good as we’ll get.
  Howard.
  
  - 3
    
    Sam on June 27, 2020 at 12:28 pm
    
    In this case the Pillar 1 data for England should indeed be comparable throughout. And yes, potential undertesting at the onset would indeed reinforce your point and lead to a slower onset and more symmetrical sigmoids.
    
4

Kitsy Stratton on June 27, 2020 at 10:01 am

You’re amazing. The breadth and depth of your knowledge astounds me. And you give it so freely, without complication.

I’m blessed.

I love you, Howard.

- 5
  
  hoakley on June 27, 2020 at 10:42 am
  
  Thank you, Kitsy.
  Howard.
  
6

Meighan Maley on June 27, 2020 at 1:10 pm

Since the federal government in the US is currently run by a ship of fools, crooks, and science-deniers, many in the US turn to the Johns Hopkins website (https://coronavirus.jhu.edu/map.html) for data at a time when even those who run the CDC and the federal emergency management system are in Trump’s pocket. What do you think of the Hopkins analysis? Thank you.

- 7
  
  hoakley on June 27, 2020 at 10:46 pm
  
  Thank you.
  The Johns Hopkins charts are superb – as are several others. But they are based on generally available data, which has to cover a wide range of countries. What I have done here, for England and New York City, is try to obtain the most accurate data, particularly with respect to the date of diagnosis. That doesn’t work so well for most other locations, but I wanted the detail to analyse. I’ve also spent several days exploring the data and deciding how best to present it. Websites like Johns Hopkins don’t have such a luxury, and aren’t asking such specific questions. Interestingly, we both use the same source for Brazil.
  Next week I’m going to look at the residuals in the figures after trend has been removed, to see if there is anything other than noise and the hebdomadal cycle.
  Howard.
  
  - 8
    
    Meighan Maley on June 27, 2020 at 10:56 pm
    
    I certainly don’t have an understanding of data analysis that approaches yours but it’s a comfort to know that there are people who do while we all wait for a vaccine. Best wishes.
    
    - 9
      
      hoakley on June 27, 2020 at 10:57 pm
      
      Thank you, Meighan. Stay safe!
      Howard.
      
10

G.J. Parker on June 28, 2020 at 6:04 pm

I tend to plot number of new cases today as a function of new cases yesterday, typically for the past three weeks. Data is noisy, but simply counting the number of times the points are above or below the y=x line gives you an idea if it’s spreading or not.

Fitting a logistic function (or really any curve) to raw data is fraught w/ errors and can be significantly affected by the quality of the data you’re trying to fit.

Not sure I understand your criticism on death rates. Yes, different demographics have different case mortality and so to use it to compare countries w/ different demographics is bad (i.e. Brazil has a much younger population than either England or U.S.), but within a country it is useful.

btw, https://rt.live adjusted their algorithm for Rt… seems much more reasonable now.

- 11
  
  hoakley on June 28, 2020 at 9:13 pm
  
  Thank you.
  Are you seriously suggesting that the data on those sigmoid curves are “fraught with errors”?
  My interest here, as I hope I’d made clear, is not whether Covid-19 is spreading, but the rate of growth in cases during large epidemics. In particular, whether this conforms to many popular models as being exponential. I’m not sure of any other method which answers that question, but I’m open to suggestions.
  There’s a lot more to risk of death from Covid-19 than mere age. Several studies have now associated high death rates with social deprivation, gender, with different ethnic groups, and there’s also the strong association with certain chronic diseases. Before you can make any comparisons, even for parts of a single city, you have to take those into account. That immediately makes the task a great deal tougher – even standardising for different age distributions requires detailed demographic information which often isn’t available or is only projected.
  Finally, if you accept that in these real world examples, growth in rate of infections isn’t exponential but follows quite different and sub-exponential patterns, what does Rt mean? Most authorities seem to be switching from Rt to using simple rates of change – as you’re estimating.
  Apropos of which, how do you deal with the hebdomadal cycle in case numbers? One of the great things about fitting functions is that you can examine your residuals, including looking for cyclical effects in them, as I showed in my previous article.
  Howard.
  
  - 12
    
    G.J. Parker on June 29, 2020 at 12:47 am
    
    Didn’t mean to offend.
    
    But, yes, I think fitting to a logistic function is in error (nor can I tell how robust the fits are). So what if x0 is 30 days? Isn’t that more a function of society’s response than intrinsic to the virus itself? The logistic function is a solution of dN/dt = r N (1 - N/k), which is used to describe population dynamics. What is the reason for it to apply here? Are you trying to compare how effective the societal responses were in NYC, England and Brazil? Then it might be useful. In other words, what does it mean that Brazil’s data fits a quadratic?
    
    Only the initial stage of a pandemic is exponential. Since there is a finite number of people, the curve *must* become sub-exponential. That said, even if 10x more people have been infected compared to official accounts, the vast majority of people are still susceptible, so growth should still be close to exponential, as simple models suggest. Again, I would argue that these sigmoid fits show the initial (intrinsic, if you will) behavior of the virus and transmission, but the intermediate phase and rollover (yet to be seen in Brazil) are due to societies’ response. I expect we’ll see proof of this in U.S. data soon.
    
    I agree mortality is strongly dependent on many factors. However, those factors should be time-invariant over periods of weeks or months. Therefore, is it not valid to look at death time histories for a given location for a temporal comparison? I’m personally more concerned about temporal variations at a location. Sorry for that assumption. Otherwise, yes, I agree with you.
    
    Rt is just the time-varying version of R0: how many secondary infections arise from one primary infection. R0 is intrinsic to the virus according to simple models. Rt allows for time variations of R0 due to seasons, societal response, etc. Again, it’s just an estimate of the growth/decay of infections. That is, whether we or the virus is in control.
    
    Thanks for the new word (hebdomadal)! I don’t. I don’t do any smoothing, filtering or fitting of the raw data. It’s a simple visualization of the temporal trend (connecting points with arrows helps). You can see the trends and predict them by simply counting as you ‘walk’ along the path. The nice thing is that it’s just a curl | grep | awk | gnuplot to generate the graphs.
    
    - 13
      
      hoakley on June 29, 2020 at 10:31 pm
      
      Thanks – no offence taken, just a robust defence!
      I’m sorry, I didn’t state explicitly what should have been evident from the sigmoid curves: I didn’t try fitting logistic functions to them, as they’re asymmetric, so they would have been a poor fit. The curves are fitted using LOESS, which isn’t (simply) parametric, and the coefficients are simply read from the LOESS fits and original data.
      I notice that you appear to be proceeding from model to data, which is the converse of what I have learned to do in any experimental or observational science. You obtain your data, explore them thoroughly, and from those develop potential models. If these curves in real life are sigmoid, then if your models don’t produce sigmoid curves there’s something wrong with them. I know that’s a very traditional approach, but it has worked for a long time, particularly in biological sciences.
      If you look at the point at which the curves for England and New York cease to resemble exponential curves, it’s long before any significant effects on the size of the susceptible population. In the case of England, that’s around 40,000 cases in a population of around 56 million; in New York City it’s around 50,000 in a population of around 8.4 million. No, those curves aren’t exponential in the slightest, they’re sigmoidal. As are many if not most such curves (see the reference provided). If that’s the case, then the whole concept of R0 and Rt needs to be called into question – as it has been by many epidemiologists working on these and other data for Covid-19. Only politicians are left quoting “R numbers” which they use to support their decisions.
      I’m fascinated that you explain the ‘transition’ from that early phase in terms of “societies’ response”. Because if you look carefully at the dates, and allow for the fact that changes in behaviour such as lockdown take around 2 weeks before they’re reflected in the fresh case figures, you’re claiming that society in England did something to slow transmission well before the lockdown started on 21 March, and similarly for New York City around 10 March. I wasn’t in New York at the time, but I was in England, and I don’t recall anything of significance changing to prevent spread, apart from things like handwashing. The pubs, restaurants, etc., were all still thriving, and I think over that period there were some major mass events – for which the UK government has been criticised since. Far from trying to stop spread, if anything actions should have promoted it. Yet 2 weeks later, growth in new infections slowed. Or perhaps it was always on a similar trajectory anyway?
      Your suppositions about mortality also fly in the face of the pattern of mortality in the UK. Early mortality was among the older population at large, particularly those with pre-existing medical conditions, and in deprived areas in cities. Later, when Covid-19 affected care homes badly, the pattern changed, and relatively few younger people seem to have died. There also seems to be a secular trend, which has been reported in other studies, in which age-adjusted mortality has declined since April. Of all the things we can observe about Covid-19, patterns of mortality are by far the most complex, and are also greatly delayed from the start of infection. The UK still has quite high daily deaths because many of those dying now have been ventilated for several weeks, having sustained their original infections way back in April. Looking at mortality is going to be a slow and painstaking process over the coming years, I fear. It has also been complicated by (necessarily) simplistic approaches to cause of death – death certificates contain a complex web of causes, and unravelling each can be difficult. They’re not as crisp as these figures suggest, nor do excess deaths tell the full story, as they don’t establish why more people died than usual.
      I’m starting now to separate out the different components involved in the original figures – trend (as previously shown), cyclical effects which occur at hebdomadal and other frequencies, and residuals. This is fairly classic time series analysis. So more to come yet.
      Howard.
      
  - 14
    
    G.J. Parker on July 1, 2020 at 11:04 pm
    
    Great- and good defense.
    
    You are perceptive. I do write first-principles simulations and compare them to experimental results. If you have a valid simulation, then it’s easy to back out what is the physical cause of some experimental result. Better, it can predict future experimental results or show how to improve something. I admit I’m completely biased towards this approach (the inverse approach is fine, but I’m usually paid to find a physical causation, not a correlation). Alas, here, there is no set of physical equations I can write down and solve and compare to actual data. You can start with simple models (SIR, etc.) and then start putting in time variation in the ‘constants’ or add extra terms, but finding a justification (other than ‘it fits’) is difficult. And any conclusion seems equally difficult.
    
    It’s easy to take, say, SIR and input a temporal variation on beta (or R0) to get the infection curve to become linear. Not sure what that means. To argue that people only change their behavior *after* the government says something is questionable. I modified my behavior long before my local, state or federal government told me to do something. I’m in no way unique. My sister-in-law’s family started their ‘stay-in-place’ a month before they were told. I recall being in San Francisco the day before Valentine’s Day (Feb 13) and perhaps 1/3 of the people were wearing masks on the street (bit surprised at the time). Local government was the first to act, but it wasn’t until early March. This could (not saying it does) explain the transition between exponential and linear. Or it could be more subtle. Say your significant other gets sick. Odds are you won’t be going to a tavern, sporting event, the beach, etc., so you’ve removed yourself from the pool, at least temporarily. If you believe the underlying equations are non-linear, little changes could cause big effects. The simple models have some truth (there are physical phenomena that are truly exponential) but certainly not the full truth.
    
    Politicians trying to justify a bad policy by using bad math is nothing new. At least on this side of the pond. I can’t find a free copy of your reference currently. R0/Rt may very well not accurately describe the future, but it gives an idea of the current state of affairs. For example, I’m not going to Las Vegas anytime soon mostly because the Bayesian estimate of Rt is the highest in the U.S. That’s useful information, no? Agree it doesn’t tell me what it will be in a month from now. In other words, give me *some* metric that tells me if there’s a problem right now and quantify how bad it is.
    
    Collecting elderly (who may have multiple health problems) at fixed locations and then underpaying and understaffing the people providing care is not a great policy when a pandemic hits even if it does maximize profits. It’s not surprising the virus (and resulting deaths) found such places. Here, they are not moving elderly out of such facilities, nor are they doing much for the homeless or rural people who don’t have easy access to healthcare (i.e. simple lack of oxygen leading to unnecessary death).
    
    What I am most concerned with is what happened during the 1918 flu. The first wave was typical: old and very young got infected and it wasn’t too remarkable (in infection rate or mortality). The second wave was much more deadly to ‘healthy’ and ‘young’ people. Supposedly this was because of WWI. When soldiers (typically young and healthy) got severely sick (more deadly strain), they were put on trains and confined at hospitals. Natural selection then simply promoted a more deadly variant for the second wave. Looking at Florida, Texas and Brazil today (majority young and healthy), I hope history will not repeat itself.
    
    Look forward to your time analysis.
    
    Thanks for the discussion. Hope it was useful. I found it interesting, so thank you.
    
    - 15
      
      hoakley on July 2, 2020 at 9:04 pm
      
      Thank you – we agree on a great deal, particularly the shocking problems with care of the elderly.
      Howard.
      
16

Bob on June 30, 2020 at 11:08 am

Thanks Howard. This is interesting. I’ve been tracking the numbers from worldometers since the beginning of March and the total number of infections shows as being very linear. Of course, given the size of that number globally at this point, it’s a ten-thousand-foot view. I’d have to dig into the day-to-day values to find any exponential blip. What I have found more intuitive is the rate of change, which I track on a 4-day rolling average. I mentioned previously that I use linear least squares on intervals of data to provide slope values as trends. This tracks nicely (if such a phrase can be used under these circumstances) with happenings in the US, where the slope was just slightly negative in May (number of new infections slowly going down) and turning positive in June as states began to open up. Globally, the slope has been much worse in June. Brazil probably contributes much of the global rate increase.

- 17
  
  hoakley on June 30, 2020 at 9:11 pm
  
  Thank you.
  Yes, trends in the US and Brazil are alarming at present. Let’s hope they level off again very soon – that does appear to be happening now in some US states which have shown recent rapid rises.
  Howard.
  
18

Jules bercy on August 7, 2020 at 4:15 am

https://thecurious605531859.wordpress.com/europe-covid-19-live/