Comments (16)
Because the COVID-19 crisis is ongoing and measures are being taken (or should be taken), it is difficult to find y=f(t), where t is time and y is a parameter value.
However, it may be possible to find y=f(measures) by
- estimating the parameters of all phases in many countries
- gathering information about measures, including lockdowns, new medicines and physical distancing
- learning the relationship between measures and parameter values
What do you think?
from covid19-sir.
I read this pre-print today that might be useful here: https://www.medrxiv.org/node/79983.external-links.html
A repo hosts the updated dataset (daily): https://github.com/amel-github/covid19-interventionmeasures
There is also a website to explore the dataset: http://covid19-interventions.com/
I do not know which dataset is the best, but just in case this one can be useful.
from covid19-sir.
Hi,
I found this dataset from Oxford that gathers the measures taken by each country and already calculates a score. The data is updated daily (with delays for some countries, depending on when they make the information public and on language barriers). They provide an API, and the dataset is also available as time series. https://covidtracker.bsg.ox.ac.uk/
Here is the related GitHub repo: https://github.com/OxCGRT/covid-policy-tracker/blob/master/documentation/codebook.md
Do you think this can be useful?
from covid19-sir.
Dear @joydisette ,
Thank you for participating in this discussion!
The OxCGRT dataset you mentioned is very useful because it has a reliable scoring method.
I think "data/OxCGRT_latest.csv" is more suitable as the raw data than the API because we need the data of all countries.
With a quick look at the documents (Methodology for calculating indices) and the CSV file, I found the following useful columns:
GovernmentResponseIndexForDisplay
StringencyIndexForDisplay
ContainmentHealthIndexForDisplay
EconomicSupportIndexForDisplay
Can we predict the parameter values of SIR-F (or another) model with the four index values?
Because both the SIR-F parameters and the index values are phase-dependent, and the start date of SIR-F is not the same as that of the index values, the prediction method will be complicated.
This needs further discussion.
Thank you.
from covid19-sir.
For learning the relationship between measures and parameters, the tslearn Python package may be useful. With tslearn, we can cluster time series data (Rt or the parameters of SIR-like models for each country) and find patterns of time evolution.
For each pattern, we will try to create y=f(effective date of measures).
Using natural language processing, we will need to prepare a dataset that has the effective dates and the level of the measures (lockdown etc.).
As an example, the Kaggle dataset "COVID-19 containment and mitigation measures" is updated daily.
from covid19-sir.
Dear @ilyasst ,
Thank you for providing the links!
I will create a Python class to analyse the data. Users will clone the GitHub repository and get information from the dataset with the class.
For each country, we can categorize the measures with Measure_L1. Then, we will calculate a score using Measure_L2 and Measure_L3 of Master_list_CCCSL_v2_ordered.csv.
For example, scores in "Case identification, contact tracing and related measures" category will be
L2 | L3 | Score |
---|---|---|
Activate case notification | Covid-19 as a notifiable disease | 1 |
Airport health check | Health certificate requested | 2 |
Airport health check | Health declaration | 3 |
In the "Environmental measures" L1 category,
L2 | L3 | Score |
---|---|---|
Approval of new biocidal product | None | 1 |
Environmental cleaning and disinfection | Airplanes | 2 |
Environmental cleaning and disinfection | Airports, ports and borders | 3 |
I could not find this mentioned in the preprint, but (L2, L3) pairs with larger index numbers seem to be more enhanced measures. Is this scoring system reliable?
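A minimal sketch of the scoring idea above, assuming the score is simply the position of each (Measure_L2, Measure_L3) pair within its Measure_L1 category after sorting alphabetically. That ordering is an assumption which happens to reproduce the example rows; the rows below are only the ones quoted in the tables, and the real file is not read here.

```python
import pandas as pd

# Example rows copied from the two tables above (not the full dataset)
measures = pd.DataFrame(
    [
        ("Case identification, contact tracing and related measures",
         "Airport health check", "Health declaration"),
        ("Case identification, contact tracing and related measures",
         "Activate case notification", "Covid-19 as a notifiable disease"),
        ("Case identification, contact tracing and related measures",
         "Airport health check", "Health certificate requested"),
        ("Environmental measures",
         "Environmental cleaning and disinfection", "Airports, ports and borders"),
        ("Environmental measures",
         "Approval of new biocidal product", "None"),
        ("Environmental measures",
         "Environmental cleaning and disinfection", "Airplanes"),
    ],
    columns=["Measure_L1", "Measure_L2", "Measure_L3"],
)

# Unique (L1, L2, L3) combinations, sorted so that the row order defines the score
pairs = (
    measures.drop_duplicates()
    .sort_values(["Measure_L1", "Measure_L2", "Measure_L3"])
    .reset_index(drop=True)
)
# Score = 1-based position of the (L2, L3) pair inside its L1 category
pairs["Score"] = pairs.groupby("Measure_L1").cumcount() + 1
print(pairs[["Measure_L2", "Measure_L3", "Score"]])
```

Whether a larger position really means a stronger measure is exactly the open question, so the resulting score should be treated as ordinal at best.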
from covid19-sir.
@lisphilar Since you think this can be relevant, I will look more closely and get back to you ASAP. I think we should be able to make the index dates and the SIR-F model dates match since, in our implementation, we calculate the model's parameters for every new change in the curve. Cc @ilyasst could you comment on this?
from covid19-sir.
Data cleaning class was added to issue4 branch.
from covid19-sir.
I merged the branch, but this issue #3 continues.
from covid19-sir.
Dear @ilyasst and @joydisette ,
How can we estimate the number of change points of parameters?
CovsirPhy and my Kaggle notebook are using S-R trend analysis to find change points (exponential trend analysis is no longer used), and I can predict the change points when the number of change points is defined by the user.
However, it is difficult to determine the number automatically.
S-R trend analysis:
For SIR-like models, log10(Susceptible) = - a * Recovered + b with constant values (a, b). This is derived from the ODEs. When the model parameters change, the slope will change.
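To illustrate the linear relationship, here is a small sketch using the plain SIR model (chosen for simplicity, not the SIR-F model). Dividing dS/dt = -beta * S * I / N by dR/dt = gamma * I gives dS/dR = -(beta / (gamma * N)) * S, so log10(S) is a straight line in R with a = beta / (gamma * N * ln 10). The parameter values and the function name `simulate_sir` are hypothetical and chosen only for the demonstration.

```python
import numpy as np


def simulate_sir(beta, gamma, population, infected0, days, steps_per_day=100):
    """Integrate the SIR ODEs with forward Euler and return daily (S, I, R)."""
    dt = 1.0 / steps_per_day
    s, i, r = float(population - infected0), float(infected0), 0.0
    records = []
    for step in range(days * steps_per_day):
        if step % steps_per_day == 0:
            records.append((s, i, r))  # keep one record per day
        new_infected = beta * s * i / population * dt
        new_recovered = gamma * i * dt
        s, i, r = s - new_infected, i + new_infected - new_recovered, r + new_recovered
    return np.array(records)


# Hypothetical values, for illustration only
beta, gamma, population = 0.3, 0.1, 1_000_000
records = simulate_sir(beta, gamma, population, infected0=100, days=120)
susceptible, recovered = records[:, 0], records[:, 2]

# Straight-line fit of log10(S) against R: log10(S) = -a * R + b
slope, intercept = np.polyfit(recovered, np.log10(susceptible), deg=1)
a_fitted = -slope
a_expected = beta / (gamma * population * np.log(10))
print(f"a (fitted) = {a_fitted:.3e}, a (from parameters) = {a_expected:.3e}")
```

With constant parameters the fitted slope agrees with the value computed from beta and gamma, which is why a change of slope in the S-R plane can be read as a change of parameters.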
Thank you.
from covid19-sir.
Dear @lisphilar ,
If I understand our problem correctly, our goal is to detect n trend changes for each parameter of an SIR-like model. Currently, the trend change detection is done with prophet on log10(Susceptible), which requires the number of changepoints and the kind of trend (linear, log, exp, ...) as inputs. Thus, we are looking for a tool or method that can detect the n changepoints of a time series.
How about this tool: https://pypi.org/project/trendet/ ?
Do you think we could use it to detect the trend changes?
from covid19-sir.
Dear @ilyasst ,
Thank you for your comment and link.
Yes, the problem is how to detect the number of trend changes (n_points) of the parameters.
I used fbprophet in the older versions of the Kaggle notebook, but I'm now using Optuna with the following steps in the covsirphy.analysis.sr_change.ChangeFinder class.
1. Plot log10(Susceptible) vs. Recovered (n_points=0)
2. Because the regression line does not fit, n_points+=1
3. The Optuna package suggests n_points change points
4. scipy.optimize.curve_fit performs curve fitting with f(x)=A exp(-Bx) for each phase (between change points)
5. Calculate the weighted average of the RMSLE scores
6. Repeat steps 3-5 to find a better combination of change points
7. Repeat steps 2-6 to find a better value of n_points
8. Remove the 0th phase (from the first date of the dataset to the first change point) from the dataset
   - This is because the number of cases is low and it is difficult to calculate the parameters of the model in the 0th phase
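The scoring part of the steps above can be sketched as follows. This is not the ChangeFinder implementation: an exhaustive search over candidate change points stands in for Optuna, a linear least-squares fit of ln(S) stands in for scipy.optimize.curve_fit, and all function names and the piecewise-exponential test data are hypothetical.

```python
from itertools import combinations

import numpy as np


def fit_phase(recovered, susceptible):
    """Fit S = A * exp(-B * R) on one phase through a linear fit of ln(S)."""
    slope, intercept = np.polyfit(recovered, np.log(susceptible), deg=1)
    return np.exp(intercept + slope * recovered)


def rmsle(actual, predicted):
    """Root mean squared logarithmic error."""
    return float(np.sqrt(np.mean((np.log1p(predicted) - np.log1p(actual)) ** 2)))


def weighted_rmsle(recovered, susceptible, change_points):
    """Average of the per-phase RMSLE scores, weighted by phase length."""
    edges = [0, *change_points, len(recovered)]
    total = 0.0
    for start, end in zip(edges[:-1], edges[1:]):
        fitted = fit_phase(recovered[start:end], susceptible[start:end])
        total += rmsle(susceptible[start:end], fitted) * (end - start)
    return total / len(recovered)


def best_change_points(recovered, susceptible, n_points, min_size=3):
    """Try every combination of n_points indices and keep the best score."""
    best = None
    for points in combinations(range(min_size, len(recovered) - min_size + 1), n_points):
        edges = [0, *points, len(recovered)]
        if any(end - start < min_size for start, end in zip(edges[:-1], edges[1:])):
            continue  # skip phases that are too short to fit
        score = weighted_rmsle(recovered, susceptible, points)
        if best is None or score < best[0]:
            best = (score, points)
    return best


# Made-up S-R curve whose slope changes at index 20
recovered = np.arange(40) * 50.0
knot = recovered[20]
susceptible = np.where(
    recovered < knot,
    1e6 * np.exp(-1e-4 * recovered),
    1e6 * np.exp(-1e-4 * knot) * np.exp(-4e-4 * (recovered - knot)),
)
score0, _ = best_change_points(recovered, susceptible, n_points=0)
score1, points1 = best_change_points(recovered, susceptible, n_points=1)
print(score0, score1, points1)
```

The score can only improve as n_points grows, so the score alone cannot select n_points; some penalty or stopping rule is needed on top of it.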
For Italy, I assumed n_points=4, as shown in the next figure.
I tried to develop an automated ChangeFinder, but step 2 (finding the best value of n_points) is a challenge. When n_points is equal to the number of days in the dataset, the RMSLE score is the best. However, this is certainly not suitable for phase-dependent analysis.
I read the source code of trendet package. Trendet is for stock time series data and seems to detect change points based on plus/minus signs of . I think it is difficult to use this algorithm because Susceptible decreases monotonically and Recovered increases monotonically...
Thank you for discussion.
from covid19-sir.
I have tried to look around for a solution to this problem. I believe that the solutions proposed here by R. Killick are relevant to our problem. More specifically, the Pruned Exact Linear Time (PELT) algorithm appears to be the most interesting one as it specifically addresses the changepoint problem and aims at being fast and accurate.
The ruptures package (https://github.com/deepcharles/ruptures) offers an implementation of PELT. I tried playing around with ruptures and the Italy Susceptible / Recovered dataset. Since the ruptures package expects a time series (signal), I tried using it on both the Recovered and Susceptible data, then on the Susceptible data only, and finally on the ratio Susceptible / Recovered.
Here is the code I have used (note that it might be possible to improve the results by better adjusting the parameters):
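    import matplotlib.pyplot as plt
    import pandas as pd
    import ruptures as rpt

    # S-R data for Italy; the columns used below are
    # "Date", "Recovered" and "Susceptible_actual"
    data_ = pd.read_csv("italy_SR_data.csv")
    signals = [
        data_.drop(columns=["Date"]),                      # Recovered and Susceptible
        data_.drop(columns=["Date", "Recovered"]),         # Susceptible only
        data_["Susceptible_actual"] / data_["Recovered"],  # ratio Susceptible / Recovered
    ]
    for data_SR in signals:
        # ruptures works on NumPy arrays, so convert once and reuse
        signal = data_SR.to_numpy()
        # detection: PELT with an RBF kernel cost function
        algo = rpt.Pelt(model="rbf", jump=2, min_size=6).fit(signal)
        result = algo.predict(pen=0.5)
        # display: the background color alternates between detected phases
        rpt.display(signal, result)
        plt.show()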
For both the Recovered and Susceptible data:
Note that I did not specify any breakpoints; the algorithm "found" them itself. The x axis shows time (days), starting on the day the first recovery was added to the dataset (extracted from the Scenario class), which is 2020-02-22 for Italy. The red and blue background colors show the different phases (separated by changepoints) detected using PELT.
When considering the Susceptible data only:
And for Susceptible / Recovered:
Since the first part of this one is hard to read, here is the same plot with the first 10 days removed:
Ruptures seems to be a possible solution for our problem. I could not investigate it further for now. A few things could be improved by investigating the PELT algorithm and the ruptures package more closely: the parameters used in the script above, the kind of cost function used (although I could not find specific guidance about which one should be used), the kind of data we use to find the changepoints (it would be possible to feed the "signal" of Susceptible as a function of Recovered, as you are currently doing, by interpolating the dataset to get a single Susceptible value per Recovered value, for example), ...
There are a few other solutions we can look into if this one is eventually not usable (most of them seem to rely on PELT):
- https://pypi.org/project/changefinder
- https://pypi.org/project/change-finder/
- https://github.com/ruipgil/changepy
Cheers,
ilyass
from covid19-sir.
Dear @ilyasst ,
Thank you very much for the information!
I agree with your idea, and the ruptures package is very useful for our issue.
As you mentioned, "interpolating the dataset to get a single Susceptible value per Recovered value for example" will be necessary, because we need to use Recovered as x and Susceptible as y.
Susceptible depends on Recovered in S-R trend analysis: log10(Susceptible) = - a * Recovered + b with constant values (a, b).
Planned steps:
- Convert (R, S) = (1, 1000), (2, 996), (5, 980),... to (R, S) = (1, 1000), (2, 996), (3, NA), (4, NA), (5, 980),...
- Fill in the NAs (the way to fill in the NAs needs discussion)
- Find the change points (Recovered values) with ruptures
- Convert the Recovered values to dates
I think filling with a spline curve, pandas.Series.interpolate(method="spline", order=2), is effective, but I need your ideas.
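A minimal pandas sketch of the conversion steps, assuming a dataframe with "Date", "Recovered" and "Susceptible" columns. The function names and the dates are hypothetical; the three (R, S) pairs are the ones from the example above. Linear interpolation is used as the default only because it needs no extra dependency; the spline variant can be requested with method="spline", order=2, which requires SciPy.

```python
import numpy as np
import pandas as pd


def to_regular_grid(df, method="linear", **kwargs):
    """Put Susceptible on every integer Recovered value and fill the gaps."""
    series = (
        df.drop_duplicates("Recovered", keep="last")
        .set_index("Recovered")["Susceptible"]
    )
    grid = np.arange(series.index.min(), series.index.max() + 1)
    # reindex() inserts NA for missing Recovered values, interpolate() fills them
    return series.reindex(grid).interpolate(method=method, **kwargs)


def recovered_to_date(df, recovered_value):
    """Return the first date on which Recovered reached the given value."""
    position = int(np.searchsorted(df["Recovered"].to_numpy(), recovered_value, side="left"))
    return df["Date"].iloc[min(position, len(df) - 1)]


# (R, S) pairs from the example above; the dates are made up for illustration
df = pd.DataFrame(
    {
        "Date": pd.date_range("2020-02-22", periods=3),
        "Recovered": [1, 2, 5],
        "Susceptible": [1000, 996, 980],
    }
)
regular = to_regular_grid(df)
print(regular)
print(recovered_to_date(df, 4))
```

The values of `regular` (as a NumPy array) would then be the signal passed to the change point detection, and each detected Recovered value would be mapped back with `recovered_to_date`.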
Thank you always.
from covid19-sir.
Dear @ilyasst ,
I created "issue3" branch for this issue.
Please help me with editing the covsirphy.analysis.sr_change.ChangeFinder class. The run() method will detect the change points and the show() method will show the figure.
This is just a draft. Please use the "issue3" branch for discussion and pull requests.
Test code will be saved in the "tests/test_change_finder.py" file, and we can run the tests with the command pipenv run pytest -v --durations=0
Thank you always for your cooperation.
from covid19-sir.
The following issues will be discussed in new pages.
- Keep track of the parameter values/reproduction number of all countries with simple code
- Find the relationship between the reproduction number and measures automatically
from covid19-sir.