Topic: How can we predict the number of cases accurately?


How to predict the number of cases accurately about covid19-sir HOT 16 CLOSED

lisphilar commented on June 16, 2024

How to predict the number of cases accurately

from covid19-sir.

Comments (16)

lisphilar commented on June 16, 2024 2

Because the COVID-19 crisis is ongoing and measures are being taken (and should be taken), it is difficult to find y=f(t), where t is time and y is a parameter value.
However, it may be possible to find y=f(measures) by:

estimation of parameters of all phases in many countries
gathering information about measures, including lockdown, new medicines and physical distancing
learning the relationship of measures and parameter values
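
A minimal sketch of the last step, assuming the measures have already been turned into numeric scores and that a plain linear model is good enough as a first try. The scores, the parameter values and the function name `predict_parameter` are all made up for illustration; they do not come from any dataset mentioned in this thread.

```python
import numpy as np

# Hypothetical toy data: one row per (country, phase).
# Columns are made-up measure scores: lockdown level, physical distancing level.
measures = np.array([
    [0.0, 0.0],
    [1.0, 0.0],
    [1.0, 1.0],
    [2.0, 1.0],
    [2.0, 2.0],
])
# Made-up parameter values (for example an infection rate) for the same rows.
parameter = np.array([0.30, 0.24, 0.20, 0.14, 0.10])

# Ordinary least squares: parameter ~ intercept + coefficients . measures
design = np.column_stack([np.ones(len(measures)), measures])
coef, *_ = np.linalg.lstsq(design, parameter, rcond=None)


def predict_parameter(scores):
    """Predict the parameter value for a vector of measure scores."""
    return coef[0] + np.dot(coef[1:], scores)


# Example: strict lockdown (2) with no physical distancing (0)
predicted = predict_parameter(np.array([2.0, 0.0]))
```

A linear model is only a starting point here; the real relationship between measures and parameters may well be non-linear or delayed in time.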

What do you think?


ilyasst commented on June 16, 2024 1

I read this pre-print today that might be useful here: https://www.medrxiv.org/node/79983.external-links.html
A repo hosts the updated dataset (daily): https://github.com/amel-github/covid19-interventionmeasures
There is also a website to explore the dataset: http://covid19-interventions.com/

I do not know which dataset is the best, but just in case this one can be useful.


joydisette commented on June 16, 2024 1

Hi,
I found this dataset from Oxford that gathers the measures taken by each country and already calculates a score. The data is updated daily (with delays for some countries, depending on when they make the information public and on language barriers). They provide an API, and the dataset is also available as time series: https://covidtracker.bsg.ox.ac.uk/
Here is the related GitHub repo: https://github.com/OxCGRT/covid-policy-tracker/blob/master/documentation/codebook.md
Do you think this can be useful?


lisphilar commented on June 16, 2024 1

Dear @joydisette ,
Thank you for participating in this discussion!
The OxCGRT dataset you mentioned is very useful because it has a reliable scoring method.
I think "data/OxCGRT_latest.csv" is more suitable as raw data than the API because we need data for all countries.

After a quick look at the documentation (Methodology for calculating indices) and the CSV file, I found the following useful columns:

GovernmentResponseIndexForDisplay
StringencyIndexForDisplay
ContainmentHealthIndexForDisplay
EconomicSupportIndexForDisplay

Can we predict the parameter values of SIR-F (or another) model with the four index values?

Because the SIR-F parameters and the index values are both phase-dependent, and the start date of SIR-F is not the same as that of the index values, the prediction methods will be complicated.
This needs further discussion.
Thank you.
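
One possible way to deal with the date mismatch, sketched under assumptions: average each index over the date range of each SIR-F phase, so that every phase gets one index value per column. The phase dates, the index values and the helper name `mean_index_per_phase` are hypothetical; only the column name comes from the CSV file discussed above.

```python
import pandas as pd

# Hypothetical daily index values for one country (made-up numbers).
index_df = pd.DataFrame(
    {
        "Date": pd.date_range("2020-03-01", periods=10, freq="D"),
        "StringencyIndexForDisplay": [10, 10, 20, 20, 40, 40, 60, 60, 80, 80],
    }
)

# Hypothetical SIR-F phases: (name, start date, end date), both ends inclusive.
phases = [
    ("1st", "2020-03-03", "2020-03-06"),
    ("2nd", "2020-03-07", "2020-03-10"),
]


def mean_index_per_phase(df, phases, column):
    """Average an index column over the date range of each phase."""
    rows = []
    for name, start, end in phases:
        mask = df["Date"].between(pd.Timestamp(start), pd.Timestamp(end))
        rows.append({"Phase": name, column: df.loc[mask, column].mean()})
    return pd.DataFrame(rows).set_index("Phase")


phase_means = mean_index_per_phase(index_df, phases, "StringencyIndexForDisplay")
```

Days before the first phase are simply ignored in this sketch, which is one way to handle index values that start earlier than the model.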


lisphilar commented on June 16, 2024

For learning the relationship between measures and parameters, the tslearn Python package may be useful. With tslearn, we can cluster time series data (Rt or parameters of SIR-like models for each country) and find patterns of time evolution.

For each pattern, we will try to create y=f(effective date of measures).

Using natural language processing, we will need to prepare a dataset that has the effective dates and the levels of measures (lockdown etc.).
As an example, the Kaggle dataset "COVID-19 containment and mitigation measures" is updated daily.


lisphilar commented on June 16, 2024

Dear @ilyasst ,
Thank you for providing the links!
I will create a Python class to analyse the data. Users will clone the GitHub repository and get information from the dataset with the class.

For each country, we can categorize the measures with Measure_L1. Then, we will calculate a score using Measure_L2 and Measure_L3 of Master_list_CCCSL_v2_ordered.csv.

For example, scores in the "Case identification, contact tracing and related measures" category will be:

L2	L3	Score
Activate case notification	Covid-19 as a notifiable disease	1
Airport health check	Health certificate requested	2
Airport health check	Health declaration	3

In the "Environmental measures" L1 category:

L2	L3	Score
Approval of new biocidal product	None	1
Environmental cleaning and disinfection	Airplanes	2
Environmental cleaning and disinfection	Airports, ports and borders	3

I could not find any mention of this in the preprint, but (L2, L3) pairs with larger index numbers seem to be more enhanced measures. Is this scoring system reliable?
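
A minimal sketch of this order-based scoring, assuming each row of the master list is a unique (L2, L3) pair and that the row order within an L1 category is what defines the score. The small DataFrame below only copies the six example rows from the tables above; it is not the real file.

```python
import pandas as pd

case_l1 = "Case identification, contact tracing and related measures"
env_l1 = "Environmental measures"

# Excerpt built by hand from the example tables (not read from the CSV).
master = pd.DataFrame(
    {
        "Measure_L1": [case_l1] * 3 + [env_l1] * 3,
        "Measure_L2": [
            "Activate case notification",
            "Airport health check",
            "Airport health check",
            "Approval of new biocidal product",
            "Environmental cleaning and disinfection",
            "Environmental cleaning and disinfection",
        ],
        "Measure_L3": [
            "Covid-19 as a notifiable disease",
            "Health certificate requested",
            "Health declaration",
            "None",
            "Airplanes",
            "Airports, ports and borders",
        ],
    }
)

# Score = 1-based position of each (L2, L3) pair within its L1 category.
master["Score"] = master.groupby("Measure_L1").cumcount() + 1
```

Whether row order really reflects how strict a measure is remains the open question; this only shows how the score would be computed if it does.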


joydisette commented on June 16, 2024

@lisphilar Since you think this can be relevant, I will look more closely and get back to you ASAP. I think we should be able to make the index dates and the SIR-F model dates match, since in our implementation we calculate the model's parameters for every new change in the curve. Cc @ilyasst, could you comment on this?


lisphilar commented on June 16, 2024

A data cleaning class was added to the issue4 branch.


lisphilar commented on June 16, 2024

I merged the branch, but this issue #3 continues.


lisphilar commented on June 16, 2024

Dear @ilyasst and @joydisette ,
How can we estimate the number of change points of parameters?
CovsirPhy and my Kaggle notebook use S-R trend analysis to find change points (exponential trend analysis is no longer used), and I can find the change points when the number of change points is given by the user.
However, it is difficult to determine that number automatically.

S-R trend analysis:
For SIR-like models, log10(Susceptible) = - a * Recovered + b with constant values (a, b). This is derived from the ODEs. When the model parameters change, the slope changes.
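
A quick numerical check of this relationship, as a sketch only: it simulates the basic SIR model (not SIR-F) with forward Euler steps and fits a straight line to log10(Susceptible) vs. Recovered. The population size, rates and step size are assumed values for illustration.

```python
import numpy as np

# Assumed values for illustration only.
N = 1_000_000        # total population
beta = 0.2           # infection rate [1/day]
gamma = 0.1          # recovery rate [1/day]
dt, days = 0.01, 100

S, I, R = N - 10.0, 10.0, 0.0
S_hist, R_hist = [], []
for _ in range(int(days / dt)):
    # Forward Euler step of dS/dt = -beta*S*I/N, dI/dt = beta*S*I/N - gamma*I, dR/dt = gamma*I
    new_infected = beta * S * I / N * dt
    new_recovered = gamma * I * dt
    S, I, R = S - new_infected, I + new_infected - new_recovered, R + new_recovered
    S_hist.append(S)
    R_hist.append(R)

# Straight-line fit: log10(S) = slope * R + intercept
slope, intercept = np.polyfit(R_hist, np.log10(S_hist), deg=1)

# For the basic SIR model the ODEs give a = beta / (N * gamma * ln 10)
expected_slope = -beta / (N * gamma) / np.log(10)
```

With constant beta and gamma the fitted slope should agree closely with `expected_slope`; a change of parameters partway through the simulation would show up as a kink in the line.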

Thank you.


ilyasst commented on June 16, 2024

Dear @lisphilar ,

If I understand our problem correctly, our goal is to detect n trend changes for each parameter of an SIR-like model. Currently, trend change detection is done on log10(Susceptible) using prophet, which requires the number of changepoints and the kind of trend (linear, log, exp, ...) as inputs. Thus, we are looking for a tool or method that can be used to detect the n changepoints of a time series.

How about this tool https://pypi.org/project/trendet/ ?

Do you think we could use it to detect the trend changes ?


lisphilar commented on June 16, 2024

Dear @ilyasst ,
Thank you for your comment and link.
Yes, the problem is how to detect the number of trend changes (n_points) of the parameters.
I used fbprophet in the older versions of the Kaggle notebook, but I'm using Optuna in the following steps in covsirphy.analysis.sr_change.ChangeFinder class now.

1. Plot log10(Susceptible) vs. Recovered (n_points=0)
2. Because the regression line does not fit, n_points+=1
3. The Optuna package suggests n_points change points
4. scipy.optimize.curve_fit performs curve fitting with f(x)=A exp(-Bx) for each phase (between change points)
5. Calculate the weighted average of the RMSLE scores
6. Repeat steps 3.-5. to find a better combination of change points
7. Repeat steps 2.-6. to find a better value of n_points
8. Remove the 0th phase (from the first date of the dataset to the first change point) from the dataset
   - This is because the number of cases is low and it is difficult to calculate the parameters of the model in the 0th phase
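
The curve fitting and weighted RMSLE steps can be sketched as follows. This is not the actual ChangeFinder code: the function names, the use of phase sizes as weights and the toy data are assumptions, and the search over candidate change points (done by Optuna in the steps above) is left out.

```python
import numpy as np
from scipy.optimize import curve_fit


def exp_decay(x, a, b):
    """f(x) = A exp(-Bx)"""
    return a * np.exp(-b * x)


def weighted_rmsle(recovered, susceptible, change_points):
    """Fit each phase separately and return the size-weighted mean RMSLE.

    change_points are array indices at which a new phase starts.
    Each phase needs at least two points and positive Susceptible values.
    """
    bounds = [0, *sorted(change_points), len(recovered)]
    total = 0.0
    for start, end in zip(bounds[:-1], bounds[1:]):
        x, y = recovered[start:end], susceptible[start:end]
        # Initial guess from a straight-line fit of log(y) vs. x
        slope, intercept = np.polyfit(x, np.log(y), deg=1)
        popt, _ = curve_fit(exp_decay, x, y, p0=(np.exp(intercept), -slope))
        pred = exp_decay(x, *popt)
        rmsle = np.sqrt(np.mean((np.log1p(pred) - np.log1p(y)) ** 2))
        total += rmsle * len(x)
    return total / len(recovered)


# Toy data with one true change point at index 20 (made-up numbers).
x = np.arange(40, dtype=float)
y = np.where(x < 20, 1000 * np.exp(-0.01 * x), 1000 * np.exp(0.8) * np.exp(-0.05 * x))

score_with_cp = weighted_rmsle(x, y, [20])
score_without_cp = weighted_rmsle(x, y, [])
```

On this toy data the score with the true change point is much lower than the score without any change point, which is the signal the search over candidate change points relies on.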

For Italy, I assumed n_points=4 as shown in the next figure.

I tried to develop an automated ChangeFinder, but step 2. (finding the best value of n_points) is a challenge. When n_points is equal to the number of days in the dataset, the RMSLE score is the best. However, this is clearly not suitable for phase-dependent analysis.

I read the source code of trendet package. Trendet is for stock time series data and seems to detect change points based on plus/minus signs of . I think it is difficult to use this algorithm because Susceptible decreases monotonically and Recovered increases monotonically...

Thank you for discussion.


ilyasst commented on June 16, 2024

I have tried to look around for a solution to this problem. I believe that the solutions proposed here by R. Killick are relevant to our problem. More specifically, the Pruned Exact Linear Time (PELT) algorithm appears to be the most interesting one as it specifically addresses the changepoint problem and aims at being fast and accurate.

The ruptures package (https://github.com/deepcharles/ruptures) offers an implementation of PELT. I tried playing around with ruptures and the Italy Susceptible / Recovered dataset. Since the ruptures package expects a time series (signal), I tried using it on both the Recovered and Susceptible data, then on the Susceptible data only, and finally on the ratio Susceptible/Recovered.
Here is the code I used (note that it might be possible to improve the results by better adjusting the parameters):

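import matplotlib.pyplot as plt
import pandas as pd
import ruptures as rpt

# Columns used below: Date, Susceptible_actual, Recovered
data_ = pd.read_csv("italy_SR_data.csv")
signals = [
    data_.drop(columns=["Date"]),                      # Susceptible and Recovered
    data_.drop(columns=["Date", "Recovered"]),         # Susceptible only
    data_["Susceptible_actual"] / data_["Recovered"],  # ratio Susceptible / Recovered
]

for data_SR in signals:
    # ruptures works on NumPy arrays, so convert once and reuse the array
    signal = data_SR.to_numpy()
    # detection: PELT with an RBF kernel cost
    algo = rpt.Pelt(model="rbf", jump=2, min_size=6).fit(signal)
    result = algo.predict(pen=0.5)
    # display the signal with the detected change points
    rpt.display(signal, result)
    plt.show()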

For both the Recovered and Susceptible data:

Note that I did not specify any breakpoints; the algorithm "found" them itself. The x axis shows time (days), starting on the day the first recovery was added to the dataset (extracted from the Scenario class), which is 2020-02-22 for Italy. The red and blue background colors show the different phases (separated by the changepoints) detected using PELT.

When considering the Susceptible data only:

And for Susceptible / Recovered:

Since the first part of this one is hard to read, here is the same plot with the first 10 days removed:

Ruptures seems to be a possible solution for our problem. I could not investigate it further for now. A few things could be improved by investigating the PELT algorithm and the ruptures package more closely: the parameters used in the script above, the kind of cost function used (although I could not find specific guidance about which one should be used), the kind of data we use to find the changepoints (it would be possible to feed the "signal" of Susceptible as a function of Recovered, as you are currently doing, by interpolating the dataset to get a single Susceptible value per Recovered value for example), ...

There are a few other solutions we can look into if this one is eventually not usable (most of them seem to rely on PELT):

Cheers,
ilyass


lisphilar commented on June 16, 2024

Dear @ilyasst ,
Thank you very much for the information!
I agree with your idea, and the ruptures package is very useful for our issue.

As you mentioned, "interpolating the dataset to get a single Susceptible value per Recovered value for example" will be necessary, because we need to use Recovered as x and Susceptible as y.

Susceptible depends on Recovered in S-R trend analysis as follows.
$\frac{\mathrm{d}S}{\mathrm{d}R}=-\frac{\beta}{N\gamma}S$
This means
$S(R)=Ne^{-\frac{\beta R}{N\gamma}}$
and
$\log(S(R))=\log(N)-\frac{\beta R}{N\gamma}$
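
For reference, the intermediate steps can be written out as follows, assuming the basic SIR equations for S and R and the initial condition S = N at R = 0. The last line converts the natural logarithm to log10, which gives the constants of the straight line log10(S) = -aR + b used in S-R trend analysis.

```latex
\begin{align}
\frac{\mathrm{d}S}{\mathrm{d}t} &= -\frac{\beta}{N} S I, \qquad
\frac{\mathrm{d}R}{\mathrm{d}t} = \gamma I \\
\frac{\mathrm{d}S}{\mathrm{d}R}
  &= \frac{\mathrm{d}S/\mathrm{d}t}{\mathrm{d}R/\mathrm{d}t}
   = -\frac{\beta}{N\gamma} S \\
\log_{10} S(R) &= \log_{10} N - \frac{\beta}{N\gamma \ln 10} R,
\qquad a = \frac{\beta}{N\gamma \ln 10}, \quad b = \log_{10} N
\end{align}
```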

Planned steps:

Convert (R, S) = (1, 1000), (2, 996), (5, 980),... to (R, S) = (1, 1000), (2, 996), (3, NA), (4, NA), (5, 980),...
Fill in NAs (the way to fill in NAs needs discussion)
Find change points (Recovered value) with ruptures
Convert Recovered value to Date

I think filling with a spline curve, pandas.Series.interpolate(method="spline", order=2), is effective, but I need your ideas.
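
A minimal sketch of the conversion, filling and date-mapping steps, using the three example points above. The dates and the helper name `recovered_to_date` are made up, the change point detection itself is left out, and spline interpolation in pandas needs SciPy to be installed.

```python
import pandas as pd

# Example points from the planned steps; the dates are hypothetical.
records = pd.DataFrame(
    {
        "Date": pd.to_datetime(["2020-03-01", "2020-03-02", "2020-03-03"]),
        "Recovered": [1, 2, 5],
        "Susceptible": [1000.0, 996.0, 980.0],
    }
)

# One row per integer Recovered value; unobserved values become NaN.
sr = records.set_index("Recovered")["Susceptible"]
sr_full = sr.reindex(range(sr.index.min(), sr.index.max() + 1))

# Fill the NaN values with a second-order spline over the Recovered axis.
sr_filled = sr_full.interpolate(method="spline", order=2)


def recovered_to_date(records, recovered_value):
    """Return the first date on which Recovered reached the given value."""
    reached = records.loc[records["Recovered"] >= recovered_value, "Date"]
    return reached.iloc[0]


# A change point found at Recovered = 3 maps to the first date with Recovered >= 3.
change_date = recovered_to_date(records, 3)
```

Observed values are kept as they are; only the gaps at Recovered = 3 and 4 are filled, so the choice of interpolation method affects only those rows.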
Thank you always.


lisphilar commented on June 16, 2024

Dear @ilyasst ,
I created "issue3" branch for this issue.
Please help me with editing the covsirphy.analysis.sr_change.ChangeFinder class. The run() method will detect the change points and the show() method will show the figure.

This is just a draft. Please use the "issue3" branch for discussion and pull requests.

Test code will be saved in the "tests/test_change_finder.py" file, and we can run the tests with the pipenv run pytest -v --durations=0 command.

Thank you always for your cooperation.


lisphilar commented on June 16, 2024

The following issues will be discussed in new pages.

Keep track of the parameter values/reproductive number of all countries with simple code
Find the relationship between reproductive number and measures automatically
