License: GNU General Public License v3.0


covid19_inference_forecast's Introduction

Bayesian inference and forecast of COVID-19

Current code development takes place in the new repository.
The research article is available on arXiv and is in press at Science. In addition, we published technical notes that answer some common questions.
Here, we keep updating figures and provide the original code for the research article. To get started, see SIR_Germany_3scenarios_with_sine_weekend.ipynb, which generates Fig. 3 of the research article, and scripts/paper200429/, the directory containing all scripts used for the article. The notebook runs, for example, in Google Colab. It requires PyMC3 >= 3.7.
Documentation is available for this repo as well as the new repo.
Please take notice of our disclaimer.

Daily updated scenarios

With a second wave hitting Germany and a lockdown ordered for the 2nd of November, case numbers are more interesting than ever before. In the following, we show a list of daily updated figures using different data sources and the same modeling approach as in our paper.

Additionally, we developed a lightweight website that shows the age-dependent incidence across different regions in Germany (updated every 6 hours). The website can be accessed here and was created for our aerzteblatt paper.

Scenario using weekly changepoints and reporting date data (OWD)

Daily updated cases by reporting date are retrieved from Our World in Data.

Scenario using weekly changepoints and nowcasting data (RKI)

We recommend using the plot based on reporting date above. The nowcast imputation step of the RKI does not take into account varying reporting delays or weekly variations in the percentage of people who report the onset of symptoms. Both effects lead to biases in recent case numbers (see also twitter:StefFun).

Daily updated nowcasting data is available from the Robert Koch Institute, but is delayed by four days to one week.

What-if scenarios lockdown

General

We show three different scenarios starting today (see the date of the latest update): one with a very low reproduction number (R=0.7; green), a second in which the underlying reproduction number stays at the current estimate (dark blue), and a third with a reproduction number of R=1.3 (light blue). Please take the numbers as a rough orientation only.

Legacy

Data until 2nd February 2021

Around Christmas and New Year, one can clearly see that reporting was not working as effectively as before or afterwards. Our model is able to bridge this behaviour if enough data before and afterwards is given.

Data until 25th December 2020

Instead of further, tougher restrictions around the 14th of December, we proposed a lockdown right after Christmas, on the 25th of December.

Data until 2nd November 2020

With a second lockdown (light) present in Germany from the 2nd November onwards, we try to model three different lockdown scenarios.

Until Nov. 1, the R-value was about 1.3. In order for the case numbers to decrease rapidly, it must be lowered to R=0.7.

We extrapolate three scenarios: (Red) The lockdown on November 2 has no effect. (Orange) A mild lockdown, with a resulting reproduction number of 1.0 and (green) a strict lockdown with a resulting reproduction number of 0.7, thus as effective as in spring. The current development (blue data points) suggests that the lockdown is "mild", i.e. not sufficient to reduce the case numbers. To reduce the case numbers, not only does an R below 1 need to be achieved, but the R must be well below 1, for example at 0.7 as in spring. Otherwise, the case numbers decrease only very slowly.
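The difference between "below 1" and "well below 1" can be made concrete with a rough calculation. The sketch below assumes a constant R and a generation time of 4 days; the generation time is an illustrative assumption, not a value taken from the text above. Under these assumptions, daily cases scale as R^(t/g), so the halving time follows directly.

```python
import math

GENERATION_TIME_DAYS = 4.0  # assumed for illustration only


def days_to_halve(r, generation_time=GENERATION_TIME_DAYS):
    """Days until daily cases halve under a constant reproduction number R < 1."""
    if not 0 < r < 1:
        raise ValueError("cases only shrink for 0 < R < 1")
    # cases(t) = cases(0) * r ** (t / generation_time); solve for a factor of 0.5
    return generation_time * math.log(0.5) / math.log(r)


for r in (0.7, 0.9, 0.95):
    print(f"R = {r}: cases halve in about {days_to_halve(r):.0f} days")
```

With these assumed numbers, halving takes roughly a week at R=0.7 but several weeks at R=0.9, which is why an R only slightly below 1 reduces case numbers so slowly.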

The discrepancy between data (blue) and prediction (yellow) may be due to (a) the application of stricter test criteria (there are not enough tests for all suspect cases), and (b) the fact that the change in behavior was implemented before November 1. The lockdown was probably already partially implemented before it was prescribed.

Modeling forecast scenarios in Germany (updated figures of the paper)

Our aim is to quantify the effects of intervention policies on the spread of COVID-19. To that end, we built a Bayesian SIR model in which we can incorporate our prior knowledge of the time points of governmental policy changes. While the first two change points were not sufficient to switch from growth of novel cases to a decline, the third change point (the strict contact ban initiated around March 23) brought this crucial reversal. Now, on the one hand, a number of stores have been opened and policies have been loosened, which may lead to increased spreading (increased $\lambda^\ast$). On the other hand, masks are now widely used and contact tracing might start to show an effect, both of which may reduce the spread of the virus (decreased $\lambda^\ast$). We will only start to see the joint effects of the novel governmental policies and collective behavior with a delay of 2-3 weeks. Therefore, we show alternative future scenarios here.
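To illustrate what a change point means for the spreading rate, here is a minimal sketch of a time-dependent rate that moves between plateaus. The linear ramp, the function name `spreading_rate`, and all numerical values are assumptions chosen for illustration; they are not taken from the repository, where the change-point times and rates are inferred from data.

```python
import numpy as np


def spreading_rate(t, rates, change_days, durations):
    """Piecewise spreading rate with linear ramps between plateaus.

    `rates` has one more entry than `change_days`; change point i is centred
    on change_days[i] and takes durations[i] days to complete.
    """
    t = np.asarray(t, dtype=float)
    lam = np.full_like(t, rates[0])
    for old, new, day, dur in zip(rates[:-1], rates[1:], change_days, durations):
        # ramp rises from 0 to 1 over the interval [day - dur/2, day + dur/2]
        ramp = np.clip((t - day) / dur + 0.5, 0.0, 1.0)
        lam += (new - old) * ramp
    return lam


# Hypothetical example: three change points, days counted from an arbitrary start
days = np.arange(41)
lam_t = spreading_rate(days, rates=[0.4, 0.2, 0.125, 0.06],
                       change_days=[9, 16, 23], durations=[3, 3, 3])
```

Each change point lowers the plateau value; in a Bayesian setting the entries of `rates`, `change_days`, and `durations` would be random variables with priors rather than fixed numbers.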

Alternative forecast scenarios, projecting the relaxation of restrictions on May 11 2020

If the effective growth rate stays on the current (all-time low) value, new cases will further decrease (green). A low number of new daily cases might bring a full control of the spread within reach (see our position paper by the four German research associations; Endorsement; Position paper).
If the relaxation of restrictions causes an increase in effective growth rate above zero, the daily new reported cases will increase again (red).

The current scenarios are based on the model that incorporates weekly reporting modulation (fewer cases reported on weekends).
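A weekly reporting modulation can be sketched as a periodic factor applied to the modelled new cases. The functional form below (an absolute sine with a 7-day period), the function name, and the default amplitude are assumptions for illustration and may differ from the form used in the notebooks.

```python
import numpy as np


def weekly_reporting_factor(t, amplitude=0.3, phase=0.0):
    """Fraction of cases reported on day t, between 1 - amplitude and 1."""
    t = np.asarray(t, dtype=float)
    # |sin(pi * t / 7)| has a period of 7 days, so there is one dip per week
    dip = 1.0 - np.abs(np.sin(np.pi * t / 7.0 - 0.5 * phase))
    return 1.0 - amplitude * dip


# Hypothetical example: constant true incidence, modulated reported incidence
days = np.arange(14)
true_new_cases = np.full(14, 1000.0)
reported = true_new_cases * weekly_reporting_factor(days)
```

Fitting the amplitude and phase together with the epidemic parameters lets the model separate the weekend dip from genuine changes in the spreading rate.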

Scenario focus on three change points

Scenario assuming three change points with a weekly modulation of reported cases

What if

What if the growth had continued with fewer change points?

We fitted the four scenarios to the number of new cases up to March 18th, March 25th, April 1st and April 7th, respectively.

This figure was used widely in German media, including TV, to illustrate the magnitude of the different change points.

covid19_inference_forecast's Issues

function filter_one_country() returns wrong results for China, France etc.

First of all: Thank you very much for your commitment and for the release of the covid19_inference_forecast source code via Github. Good job indeed!

In the past few days I have also read in and processed data from Johns Hopkins University for my own analyses. I noticed the following problem: There are (at least) three constellations concerning the columns 'Province/State' and 'Country/Region':

(1) empty/null cell for 'Province/State', a single row for the 'Country/Region' (e.g. Germany)
(2) multiple rows for one 'Country/Region' (e.g. China), all with entries for 'Province/State' (inside the main country)
(3) same as (2) above + an additional row for the main country (e.g. France) with an empty/null cell for 'Province/State'; here the other rows (with names for 'Province/State') contain external territories.

That means your function filter_one_country() needs some further improvements:

(A) look for the country with empty/null content in the 'Province/State' column
(this should work for countries like Germany (1) and France (3))
(B) otherwise, filter all rows for the selected country and calculate the sum of all numbers (cases, deaths, ...)

The following code works for me (only as a suggestion - I'm a Python/Pandas newbie):

    y = d[(d['Province/State'].isnull()) & (d['Country/Region']==country)]
    if len(y)==1:
        y = y.values[0][x0:]
    elif len(y)==0:
        y = d[d['Country/Region']==country].sum().values[x0:]
    else:
        print('ERROR: country = ' + country)

Happy Easter!
André

(Computational Analytics Group @ Fraunhofer IIS Dresden)
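The suggestion above can be sketched as a self-contained function and tried on toy data covering the three constellations. The function name `country_series`, the parameter `first_date_col`, and all numbers in the toy table are hypothetical; the column layout (two label columns, Lat, Long, then one column per date) is assumed to match the JHU files.

```python
import pandas as pd


def country_series(d, country, first_date_col=4):
    """Return one series of cumulative counts for `country` from JHU-style wide data."""
    rows = d[d["Country/Region"] == country]
    if rows.empty:
        raise KeyError(f"no rows for country {country!r}")
    main = rows[rows["Province/State"].isnull()]
    if len(main) == 1:
        # constellations (1) and (3): a dedicated row for the main country exists
        return main.iloc[0, first_date_col:].astype(float)
    if len(main) == 0:
        # constellation (2): only provinces are listed, so add them up
        return rows.iloc[:, first_date_col:].sum().astype(float)
    raise ValueError(f"ambiguous rows for country {country!r}")


# Toy table with made-up numbers, one country per constellation
toy = pd.DataFrame(
    {
        "Province/State": [None, "Hubei", "Beijing", None, "Reunion"],
        "Country/Region": ["Germany", "China", "China", "France", "France"],
        "Lat": [51.0, 30.9, 40.2, 46.2, -21.1],
        "Long": [9.0, 112.3, 116.4, 2.2, 55.5],
        "3/1/20": [130, 100, 10, 130, 1],
        "3/2/20": [159, 110, 12, 191, 2],
    }
)

germany = country_series(toy, "Germany")
china = country_series(toy, "China")
france = country_series(toy, "France")
```

Raising an exception in the ambiguous case, instead of printing a message, keeps a wrong result from silently flowing into the model.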

figure B (what if scenario, Nov 13, 2021) range of y axis

Please fix the ymax in the (very interesting) diagram B of your latest what-if scenario. Thank you, André

Lambda parameter in the SIR system

Hello. Having studied your article and the corresponding code, I have questions about the λ parameter. The article says that λ is a spreading rate (infection rate, FIG. 1 in the article). Let's look at the code now:

def SIR_model(λ, μ, S_begin, I_begin, N):
    new_I_0 = tt.zeros_like(I_begin)
    def next_day(λ, S_t, I_t, _):
        new_I_t = λ/N*I_t*S_t
        S_t = S_t - new_I_t
        I_t = I_t + new_I_t - μ * I_t
        return S_t, I_t, new_I_t
    outputs , _  = theano.scan(fn=next_day, sequences=[λ], 
                               outputs_info=[S_begin, I_begin, new_I_0])
    S_all, I_all, new_I_all = outputs
    return S_all, I_all, new_I_all

# fraction of people that are newly infected each day
λ = pm.Lognormal("λ", mu=np.log(0.4), sigma=0.5)

This code corresponds to a system of ODEs

$\frac{dS}{dt} = -\frac{\lambda IS}{N}$

$\frac{dI}{dt} = \frac{\lambda IS}{N} - \mu I$

$\frac{dR}{dt} = \mu I$

The λ parameter here, according to the literature, is not infection rate (0<λ<1), but effective contact rate (λ>0)!

If λ is infection rate (Murray, p.320) or transmission rate (per capita), the system of equations should look like this:

$\frac{dS}{dt} = -{\lambda IS}$

$\frac{dI}{dt} = {\lambda IS} - \mu I$

$\frac{dR}{dt} = \mu I$

Maybe I'm wrong, but in my opinion it greatly affects the interpretation of the model results.

This is also important when calculating R0. The basic reproduction number R0 is calculated as infection_rate*N/recovery_rate = effective_contact_rate/recovery_rate. In the article, R0 is calculated as infection_rate/recovery_rate (0.41/0.12=3.4).
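The question can be checked numerically with a plain NumPy version of the daily update from the code quoted above. This is a sketch for illustration: λ and μ are the values quoted in this issue, while the population size and the initial number of infected are assumptions.

```python
import numpy as np


def sir_discrete(lam, mu, s0, i0, n, days):
    """Iterate the same daily update as next_day above, with a constant lam."""
    s, i = float(s0), float(i0)
    new_cases = np.empty(days)
    infected = np.empty(days)
    for t in range(days):
        new_i = lam / n * i * s  # S/N is a dimensionless fraction, so lam is a rate per day
        s = s - new_i
        i = i + new_i - mu * i
        new_cases[t] = new_i
        infected[t] = i
    return new_cases, infected


lam, mu = 0.41, 0.12  # values quoted in this issue
n = 83e6              # assumed population size, for illustration only
new_cases, infected = sir_discrete(lam, mu, s0=n - 100, i0=100, n=n, days=10)

growth = infected[1] / infected[0]  # close to 1 + lam - mu while S is nearly N
r0 = lam / mu                       # ratio under the lam/N * I * S convention
```

In this convention λ already absorbs the factor N, so the ratio λ/μ evaluates to about 3.4, the number quoted from the article; with the convention λIS (no division by N) the same quantity would be written λN/μ.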

Windows 10 multicore workaround

Hi,
Thanks for doing this great job!
I am trying to run this computation on an MS Windows 10 system.
If I use more than one core, my Jupyter notebook kernel crashes.
It helps to restrict the number of cores to one, e.g.
trace = pm.sample(model=model, init='advi', cores=1)
Best regards,
Rainer, MPIDR
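One way to apply this workaround without editing the notebook on every platform is to pick the core count from the operating system. The helper name `sampler_cores` is hypothetical, and the usage line assumes that `pm` is PyMC3 and that `model` has already been built.

```python
import platform


def sampler_cores(requested=4):
    """Return 1 on Windows, where multi-core sampling reportedly crashed, else `requested`."""
    return 1 if platform.system() == "Windows" else requested


# Hypothetical usage, assuming `pm` is PyMC3 and `model` is already defined:
# trace = pm.sample(model=model, init="advi", cores=sampler_cores())
```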

More condensed data preprocessing

Hi there,

Really cool model!
I'm working to understand & run your model and found it a bit easier to preprocess the JHU data with a bit of pandas:

First, to reformat the awkwardly formatted original JHU data to a DataFrame with multi-index:

import datetime

import pandas


def _jhu_to_iso(fp_csv: str) -> pandas.DataFrame:
    """Convert Johns Hopkins University dataset to nicely formatted DataFrame.

    Drops Lat/Long columns and reformats to a multi-index of (country, state).
    """
    df = pandas.read_csv(fp_csv, sep=',')
    # change columns & index
    df = df.drop(columns=['Lat', 'Long']).rename(columns={
        'Province/State': 'state',
        'Country/Region': 'country'
    })
    df = df.set_index(['country', 'state'])
    # datetime columns
    df.columns = [datetime.datetime.strptime(d, '%m/%d/%y') for d in df.columns]
    return df

Then filtering by country/state is much easier:

country = 'Germany'
state = None

# load & transform
df_confirmed = _jhu_to_iso(fp_confirmed) # <-- filepath or URL to original CSV
df_deaths = _jhu_to_iso(fp_deaths)
df_recovered = _jhu_to_iso(fp_recovered)

# filter
df = pandas.DataFrame(columns=['date', 'confirmed', 'deaths', 'recovered']).set_index('date')
df['confirmed'] = df_confirmed.loc[(country, state)]
df['deaths'] = df_deaths.loc[(country, state)]
df['recovered'] = df_recovered.loc[(country, state)]
df.index.name = 'date'

With datetime objects in the DataFrame index, one can slice directly with the date:

date_data_begin = datetime.datetime(2020, 3, 1)
date_data_end = df.index[-1]
df.loc[date_data_begin:date_data_end, 'confirmed'].values

cheers
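The date-based slicing described above can be tried without downloading anything by feeding a small in-memory CSV through the same reshaping steps. The numbers in `TOY_CSV` are illustrative, and selecting rows through the `country` index level is a choice made for this sketch so that the empty state label does not need to be matched.

```python
import datetime
import io

import pandas

# Tiny stand-in for the JHU file, with illustrative numbers
TOY_CSV = """Province/State,Country/Region,Lat,Long,2/28/20,2/29/20,3/1/20,3/2/20,3/3/20
,Germany,51.0,9.0,48,79,130,159,196
"""

df = pandas.read_csv(io.StringIO(TOY_CSV))
df = df.drop(columns=["Lat", "Long"]).rename(
    columns={"Province/State": "state", "Country/Region": "country"}
)
df = df.set_index(["country", "state"])
df.columns = [datetime.datetime.strptime(d, "%m/%d/%y") for d in df.columns]

# Keep all rows of the country and add them up (a single row in this toy case)
is_germany = df.index.get_level_values("country") == "Germany"
confirmed = df[is_germany].sum()

# Slice by date, then turn cumulative counts into daily new cases
date_data_begin = datetime.datetime(2020, 3, 1)
cumulative = confirmed.loc[date_data_begin:]
new_cases = cumulative.diff().dropna()
```

The daily differences in `new_cases` are the quantity that is typically compared with the model's new infections, whereas the JHU files store cumulative totals.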
