lisphilar / covid19-sir

CovsirPhy: Python library for COVID-19 analysis with phase-dependent SIR-derived ODE models.

Home Page: https://lisphilar.github.io/covid19-sir/

License: Apache License 2.0

Python 68.43% Makefile 0.81% Jupyter Notebook 30.77%

covid19 epidemiology python covid covid-19 coronavirus epidemic-simulations epidemic-model data-science analysis

covid19-sir's Introduction

👨‍💼 Works as a data analyst in the clinical data science field and 👨‍💻 develops Python projects, including the CovsirPhy library, as a hobby.
Previously a clinical research associate.

Clinical data scientist in Japan. Python library developer.
Molecular biology (2012-2018, M.Eng.), bioinformatics (2014-, self-taught), pharmaceutical clinical trials (2018-2023, CRA), medical data analysis (2024-).
#CovsirPhy

Category	Item	Information
Info	Keyword	Molecular biology, Neurochemistry, Bioinformatics, Clinical trials, Clinical data science, Python library development, Japanese history
	Language	Japanese, Python, English
	Where	Japan
	Favorite book	What Is Life? by Schrödinger
Job	2018/4-2023/12	CRA / Clinical research associate
	2024/1-current	Data analyst in clinical data science
Academic	Degree	Bachelor of Engineering (Life Science Program) Master of Engineering (Chemical and Energy Engineering)
	College	College of Engineering Science, YOKOHAMA National University, Japan.
	Graduate school	Graduate school of Engineering, YOKOHAMA National University, Japan.
	Subject of bachelor/master's thesis	Protein-protein interaction site analysis to develop a drug for central nervous system injuries
Tool	Python	From 2014
	R	Learned from 2012 to 2014, from 2023
	Editor	Visual Studio Code
	Note taking	Obsidian, Google Colaboratory
	OS	Windows / Windows subsystem for Linux
	Learning	Data science with Python (self-study) / Coursera / edX

Lisphilar: Life science + philosophy of science + molecular biology

Products

I have published the following libraries and notebooks. Please collaborate with me for development!

COVID-19 data analysis

GitHub/PyPI: CovsirPhy: Python library for COVID-19 data analysis with phase-dependent SIR-derived ODE models
Kaggle Notebook: COVID-19 data with SIR model
Dataset: COVID-19 dataset in Japan

Scenario files of Sengokushi (Japanese history, written in Japanese)

GitHub: 豊徳二重公儀の統合

covid19-sir's Issues

Interface to create example datasets easily

Is your feature request related to a problem? Please describe.
Interface to create example datasets is necessary to provide tools to develop new ODE models.

Describe the solution you'd like

JHUData class includes the methods of PhaseData, SRData and ODEData
Create ExampleData that is a sub-class of JHUData
ExampleData produces example datasets with pre-set/applied parameters and models

ImportError when pip install with Kaggle Notebook

Summary:
In Kaggle Notebook, the following installation causes ImportError (cannot import name 'ModelBase' from covsirphy.ode.mbase)

!pip install covsirphy

CovsirPhy version 2.5.2

Environment:
Python 3.8, pipenv, WSL.

Need change of ODE simulation system from non-dim to dimensional system

As mentioned in #4 (comment) and #1 (comment), we need to change the ODE simulation system from a non-dimensional system to a dimensional system.

Advantage of non-dimensional system:

we can estimate the parameter values efficiently and compare the parameter values
we can easily compare the parameter values with those of other countries

Disadvantage of non-dimensional system (may be the root cause of issue #1):

subtraction in differential equations reduces the accuracy of numerical simulation
it is difficult to determine the minimum value of dydt
dimensionalization of simulated values increases error with actual values
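
The conversion between the two systems can be sketched as below. This is a minimal illustration, assuming the convention used elsewhere in these issues (x = Susceptible/Population, and so on). The function names to_nondim and to_dim are hypothetical and not part of CovsirPhy. The rounding in to_dim is the step where dimensionalization can introduce error against the actual integer counts.

```python
def to_nondim(counts, population):
    """Convert case counts to fractions of the total population."""
    return {name: value / population for name, value in counts.items()}


def to_dim(fractions, population):
    """Convert fractions back to integer case counts (rounding may add error)."""
    return {name: round(value * population) for name, value in fractions.items()}


population = 1_000_000
counts = {"Susceptible": 999_000, "Infected": 1_000, "Recovered": 0, "Fatal": 0}

fractions = to_nondim(counts, population)
restored = to_dim(fractions, population)
print(fractions)
print(restored == counts)
```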

USA scenario analysis: not show line in S-R trend analysis

Summary:
With USA data, the following code does not show the fitting line in S-R trend analysis.

import covsirphy as cs
# Dataset preparation
data_loader = cs.DataLoader("input")
jhu_data = data_loader.jhu()
population_data = data_loader.population()
scenario = cs.Scenario(jhu_data, population_data, country="US")
scenario.trend()

Environment:
Python 3.8, pipenv, WSL.

Add demonstration and the outputs of quick usage

Add the following documents.

GIF file of demonstration
Quickest usage: Jupyter notebook, included in GitHub pages
Quick usage: Jupyter notebook, included in GitHub pages

JHUData.subset(country="US") causes KeyError in Kaggle

Summary:
JHUData.subset(country="US") causes KeyError with Kaggle datasets.

CovsirPhy version 2.5.2

Related classes:

covsirphy.JHUData

Codes and outputs:
(Local environment with Kaggle API)

import covsirphy as cs
data_loader = cs.DataLoader("input")
jhu_data = cs.JHUData("/kaggle/input/novel-corona-virus-2019-dataset/covid_19_data.csv")
population_data = cs.PopulationData(
    "/kaggle/input/covid19-global-forecasting-locations-population/locations_population.csv"
)
scenario = cs.Scenario(jhu_data, population_data, country="US")
scenario.records()

This causes KeyError as follows.

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-170-0205c0c93a9f> in <module>
      1 usa_scenario = cs.Scenario(jhu_data, pop_data, "US")
----> 2 usa_scenario.records().tail()

/opt/conda/lib/python3.6/site-packages/covsirphy/analysis/scenario.py in records(self, show_figure, filename)
     95             Records with Recovered > 0 will be selected.
     96         """
---> 97         df = self.jhu_data.subset(country=self.country, province=self.province)
     98         if not show_figure:
     99             return df

/opt/conda/lib/python3.6/site-packages/covsirphy/cleaning/jhu_data.py in subset(self, country, province, start_date, end_date, population)
    251         # Subset with area
    252         df = self._subset_area(
--> 253             country, province=province, population=population
    254         )
    255         # Subset with Start/end date

/opt/conda/lib/python3.6/site-packages/covsirphy/cleaning/jhu_data.py in _subset_area(self, country, province, population)
    206                 return df.loc[df[self.R] > 0, :]
    207             raise KeyError(
--> 208                 f"Records of {province} in {country} were not registered.")
    209         # Province was not selected and COVID-19 Data Hub dataset
    210         c_level_set = set(

KeyError: 'Records of - in US were not registered.'

Environment:
Python 3.8, pipenv, WSL.

Estimator of ODE parameters does not perform parallel jobs

Summary:
Estimator.run(n_jobs=-1) does not run jobs in parallel. This may be caused by the threading method of the Optuna package.

CovsirPhy version 2.3.2

Related classes:

covsirphy.Estimator
covsirphy.Scenario

Environment:
Python 3.8, pipenv, WSL.

Failed in addition of past phase manually

Dear Rakesh,
Thank you for your feedback.
I changed the code to fix #110 in GitHub, but a new error occurred as you mentioned. I will fix this issue today and update the package in PyPI.

By the way, please fill in the "Summary", "Codes and outputs" and other sections of the issue template. I think this will be more useful for you.

Dear Lisphilar,
Please let me know how to divide the phases in scenario analysis according to my own needs. Kindly help me if possible.

With regards,
Rakesh

Originally posted by @SM-ins in #116 (comment)

Fixing Bug: ParserError with Population class

Hi,

pop_data = cs.Population(
"../input/world-population/API_EN.POP.DNST_DS2_en_csv_v2.csv"
)
pop_data.cleaned().tail()

I am getting the error below when running the above code. Please help if possible.

ParserError: Error tokenizing data. C error: Expected 3 fields in line 5, saw 62

With regards,
Rakesh

Revise stdout of parameter estimation

Summary:
Stdout of parameter estimation in scenario analysis could be revised.

CovsirPhy version 2.4.1

Related classes:

covsirphy.Scenario

Codes and outputs:

import covsirphy as cs
# Dataset preparation
data_loader = cs.DataLoader("input")
jhu_data = data_loader.jhu()
population_data = data_loader.population()
scenario.trend(set_phases=True)
scenario.estimate(cs.SIRF)

This code returns
10th phase with SIR-F model finished 67 in 1 min 3 sec. etc.
10th phase with SIR-F model finished 67 trials in 1 min 3 sec. is better.

Environment:
Python 3.8, pipenv, WSL.
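
The proposed wording of the progress message could be produced as sketched below. This is only an illustration: ordinal() and finished_message() are hypothetical helpers, not functions of CovsirPhy, and the elapsed time is assumed to be available in seconds.

```python
def ordinal(number):
    """Return the number with its English ordinal suffix, e.g. 10 -> '10th'."""
    if 10 <= number % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(number % 10, "th")
    return f"{number}{suffix}"


def finished_message(phase_number, model_name, n_trials, seconds):
    """Build the message, stating explicitly that the count refers to trials."""
    minutes, rest = divmod(int(seconds), 60)
    return (
        f"{ordinal(phase_number)} phase with {model_name} model "
        f"finished {n_trials} trials in {minutes} min {rest} sec."
    )


message = finished_message(10, "SIR-F", 67, 63)
print(message)
```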

Change default value of Estimator.run(timeout_iteration) to 5 seconds

Summary:
Because parameter estimation completes within 5 seconds in some phases, the default value of timeout_iteration of Estimator.run() can be 5 seconds.

CovsirPhy version 2.4.1

Related classes:

covsirphy.Estimator

Codes and outputs:

import covsirphy as cs
# Dataset preparation
data_loader = cs.DataLoader("input")
jhu_data = data_loader.jhu()
population_data = data_loader.population()
scenario = cs.Scenario(jhu_data, population_data, "Japan")
scenario.trend()
scenario.estimate(cs.SIRF)

Environment:
Python 3.8, pipenv, WSL.

Select a country with ISO3 code

Is your feature request related to a problem? Please describe.
In version 2.4, we can specify countries only with country name in JHUData and Scenario class.

Describe the solution you'd like
For standard users, create a method of CleaningBase class to convert ISO3 code to country name.
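
Such a conversion could look like the sketch below. The mapping table and the function name ensure_country_name are assumptions made for illustration; a real implementation would need a complete ISO 3166-1 alpha-3 table. The name "US" follows the country="US" usage seen in other issues.

```python
# Small illustrative mapping (assumption); a real table would cover all ISO3 codes
ISO3_TO_NAME = {"JPN": "Japan", "ITA": "Italy", "IND": "India", "USA": "US"}


def ensure_country_name(country):
    """Return the country name for a known ISO3 code; pass other values through."""
    if len(country) == 3 and country.isupper() and country in ISO3_TO_NAME:
        return ISO3_TO_NAME[country]
    return country


print(ensure_country_name("JPN"))
print(ensure_country_name("Japan"))
```

Passing unknown values through unchanged keeps existing code that already uses country names working.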

ModuleNotFoundError: No module named 'better_exceptions' in installation

Summary:
When installing CovsirPhy with the pip command, "ModuleNotFoundError: No module named 'better_exceptions'" occurs and the package cannot be installed.
This error was mentioned as a comment of Kaggle notebook.

CovsirPhy version 2.2.2

Codes and outputs:

pip install git+https://github.com/lisphilar/covid19-sir#egg=covsirphy

This code causes the following error.

Collecting covsirphy from git+https://github.com/lisphilar/covid19-sir#egg=covsirphy
  Cloning https://github.com/lisphilar/covid19-sir to /tmp/pip-build-y0_kp_8r/covsirphy
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-build-y0_kp_8r/covsirphy/setup.py", line 3, in <module>
        setup()
...
        return _bootstrap._gcd_import(name[level:], package, level)
      File "<frozen importlib._bootstrap>", line 994, in _gcd_import
      File "<frozen importlib._bootstrap>", line 971, in _find_and_load
      File "<frozen importlib._bootstrap>", line 941, in _find_and_load_unlocked
      File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
      File "<frozen importlib._bootstrap>", line 994, in _gcd_import
      File "<frozen importlib._bootstrap>", line 971, in _find_and_load
      File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
      File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
      File "<frozen importlib._bootstrap_external>", line 678, in exec_module
      File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
      File "/tmp/pip-build-y0_kp_8r/covsirphy/covsirphy/__init__.py", line 6, in <module>
        import better_exceptions
    ModuleNotFoundError: No module named 'better_exceptions'

    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-y0_kp_8r/covsirphy/

Environment:
Python 3.6.4, pip, WSL.

Long-term ODE simulation shows negative number of cases

Summary:
The number of cases must be a non-negative integer. However, long-term ODE simulation shows negative numbers of cases.

Version 2.0.1

Related classes:

covsirphy.analysis.simulator.ODESimulator
covsirphy.ode.sirf.SIRF

Code:

import covsirphy as cs
# Settings
eg_population = 1_000_000
eg_tau = 1440
step_n = 1000  # Step number of simulation
param_dict = {"theta": 0.002, "kappa": 0.005, "rho": 0.2, "sigma": 0.075}
y0_dict = {"x": 0.999, "y": 0.001, "z": 0, "w": 0}
# Simulation
simulator = cs.ODESimulator(country="Example", province="Example-1")
simulator.add(
    model=cs.SIRF, step_n=step_n, population=eg_population,
    param_dict=param_dict, y0_dict=y0_dict
)
simulator.run()
# Non-dimensional
nondim_df = simulator.non_dim()

Output:
nondim_df is a dataframe that shows the predicted values of the non-dimensional ODE model.
(t: time step, x: Susceptible/Population, y: Infected/Population, z: Recovered/Population, w: Fatal/Population for SIR-F model.)
x, y, z and w are fractions of the population and must be non-negative, but some x values are negative.

Frequency:
Always

Environment:
Python 3.8, pipenv, WSL
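
A sanity check for this kind of problem can be sketched with a standalone integrator. The code below does not reproduce the reported bug and does not use CovsirPhy. It assumes the usual non-dimensional form of the SIR-F equations and swaps in a fixed-step classical Runge-Kutta method written in plain Python, then verifies that every variable stays non-negative and that x + y + z + w stays at 1. All function names are hypothetical.

```python
def sirf_derivatives(state, theta, kappa, rho, sigma):
    """Right-hand side of the non-dimensional SIR-F model (assumed form)."""
    x, y, z, w = state
    dx = -rho * x * y
    dy = rho * (1 - theta) * x * y - (sigma + kappa) * y
    dz = sigma * y
    dw = rho * theta * x * y + kappa * y
    return (dx, dy, dz, dw)


def rk4_step(state, h, **params):
    """Advance the state by one step of size h (classical Runge-Kutta)."""
    def shifted(base, slope, factor):
        return tuple(b + factor * s for b, s in zip(base, slope))

    k1 = sirf_derivatives(state, **params)
    k2 = sirf_derivatives(shifted(state, k1, h / 2), **params)
    k3 = sirf_derivatives(shifted(state, k2, h / 2), **params)
    k4 = sirf_derivatives(shifted(state, k3, h), **params)
    return tuple(
        s + h / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(y0, step_n, h=1.0, **params):
    """Return the trajectory as a list of (x, y, z, w) tuples."""
    trajectory = [tuple(y0)]
    for _ in range(step_n):
        trajectory.append(rk4_step(trajectory[-1], h, **params))
    return trajectory


# Parameter values and initial values taken from the report above
params = {"theta": 0.002, "kappa": 0.005, "rho": 0.2, "sigma": 0.075}
trajectory = simulate((0.999, 0.001, 0.0, 0.0), step_n=1000, **params)

# Checks a simulator could run before returning results
lowest = min(min(state) for state in trajectory)
largest_drift = max(abs(sum(state) - 1.0) for state in trajectory)
print(f"lowest value: {lowest:.3e}, largest drift from 1: {largest_drift:.3e}")
```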

Show parameter values and OxCGRT scores in the same dataframe

Is your feature request related to a problem? Please describe.
As mentioned in #3, it is useful to show the parameter values and OxCGRT scores in a dataframe. This new method will be used for learning the relationship of parameter values and OxCGRT scores.

Describe the solution you'd like

Create cs.Scenario(..., country="country name") instance
Perform S-R trend analysis and find change points and set phases with cs.Scenario.trend()
Calculate parameter values of phases with cs.Scenario.estimate(cs.SIRF)
Calculate parameter values of each day using this new method
Combine with OxCGRT data using this new method and create a dataframe
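
The last two steps can be sketched with the standard library only. The phase table and the daily scores below are hypothetical values for illustration, and daily_parameters() and combine() are hypothetical names, not CovsirPhy methods. Each phase's parameter values are repeated for every day of the phase and then joined with the scores by date.

```python
from datetime import date, timedelta

# Hypothetical phase table: start date, end date and estimated parameters
phases = [
    {"start": date(2020, 4, 1), "end": date(2020, 4, 3), "rho": 0.20, "sigma": 0.07},
    {"start": date(2020, 4, 4), "end": date(2020, 4, 5), "rho": 0.12, "sigma": 0.08},
]
# Hypothetical daily scores keyed by date
scores = {
    date(2020, 4, 1): 40.0,
    date(2020, 4, 2): 40.0,
    date(2020, 4, 3): 55.0,
    date(2020, 4, 4): 55.0,
    date(2020, 4, 5): 70.0,
}


def daily_parameters(phases):
    """Repeat each phase's parameter values for every day of the phase."""
    rows = {}
    for phase in phases:
        values = {k: v for k, v in phase.items() if k not in ("start", "end")}
        day = phase["start"]
        while day <= phase["end"]:
            rows[day] = dict(values)
            day += timedelta(days=1)
    return rows


def combine(phases, scores):
    """Join daily parameter values with the scores on the date."""
    params = daily_parameters(phases)
    return [
        {"Date": day, **params[day], "Score": scores[day]}
        for day in sorted(params)
        if day in scores
    ]


table = combine(phases, scores)
for row in table:
    print(row)
```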

(with Kaggle API) KeyError for covsirphy.Population.value(country="JPN")

Summary:
KeyError was raised when covsirphy.Population.value(country="JPN") was called on datasets downloaded with "input.py" (Kaggle API).

CovsirPhy version 2.3.0

Related classes:

covsirphy.cleaning.population.Population

Codes and outputs:

import covsirphy as cs
data_loader = cs.DataLoader("input")
population_data = data_loader.population()
population_data.value("JPN")

This code returns KeyError: 'JPN is not registered. Please use ISO3 code, like JPN.'

Environment:
Python 3.8, pipenv, WSL.

Citation was set mistakenly for local datasets

Summary:
DataLoader.jhu() etc. must set the citations when retrieving from remote servers. However, citation was set when using local files downloaded from Kaggle API.

CovsirPhy version 2.4.1

Related classes:

covsirphy.DataLoader

Codes and outputs:

import covsirphy as cs
# Dataset preparation
data_loader = cs.DataLoader("input")
jhu_data = data_loader.jhu(local_file="covid_19_data.csv")

Environment:
Python 3.8, pipenv, WSL.

How to predict the number of cases accurately

Topic:
How can we predict the number of cases accurately?

Note:
With version 2.0.0, we perform the following steps.

Split time-series data to some phases using S-R trend analysis
Estimate the parameter values of an ODE model using data of each phase
Predict parameter values of future phases using values of the last phase
Simulate the ODE model with predicted parameter values

Please share your ideas to update the steps/create a new approach.

Unnecessary optimization is done for fixed parameters

Summary:
Unnecessary optimization is done for fixed parameters in hyperparameter optimization of math models. This bug was mentioned by prbocca on a Kaggle notebook.

CovsirPhy version 2.1.1

Related classes:

covsirphy.phase.estimator.Estimator

Current:

p_dict.update( { k: trial.suggest_uniform(k, *v) for (k, v) in model_param_dict.items() } )

Proposed:

p_dict.update( { k: trial.suggest_uniform(k, *v) for (k, v) in model_param_dict.items() if k not in self.fixed_dict } )
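
The effect of the filter can be checked without Optuna by using a stub. In the sketch below, StubTrial is a hypothetical stand-in that only records which parameter names were suggested; suggest_parameters() is likewise a hypothetical helper that wraps the dictionary update.

```python
class StubTrial:
    """Hypothetical stand-in for an Optuna trial; records suggested names."""

    def __init__(self):
        self.suggested = []

    def suggest_uniform(self, name, low, high):
        self.suggested.append(name)
        return (low + high) / 2  # midpoint instead of real sampling


def suggest_parameters(trial, model_param_dict, fixed_dict):
    """Suggest values only for parameters that are not fixed."""
    p_dict = dict(fixed_dict)
    p_dict.update(
        {
            k: trial.suggest_uniform(k, *v)
            for (k, v) in model_param_dict.items()
            if k not in fixed_dict
        }
    )
    return p_dict


trial = StubTrial()
ranges = {"theta": (0, 1), "kappa": (0, 1), "rho": (0, 1), "sigma": (0, 1)}
p_dict = suggest_parameters(trial, ranges, fixed_dict={"theta": 0.002})
print(p_dict)
print(trial.suggested)
```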

Documentation of the detail of usage

Is your feature request related to a problem? Please describe.
Quick usage is in README.md and example codes are in example directory.
However, it is difficult to get detailed information about this package.

Describe the solution you'd like
Create GitHub Pages with Sphinx to document the details of this package.

Figure of S-R trend analysis: 10th phase converted to 0Initial in legend

Summary:
In the figure of S-R trend analysis, the 10th phase was labeled as the "0Initial" phase.

CovsirPhy version 2.4.1

Related classes:

covsirphy.ChangeFinder
covsirphy.Trend

Codes and outputs:

import covsirphy as cs
data_loader = cs.DataLoader("input")
jhu_data = data_loader.jhu()
population_data = data_loader.population()
scenario = cs.Scenario(jhu_data, population_data, country="Japan")
scenario.trend()

Environment:
Python 3.8, pipenv, WSL.

ImportError of CovsirPhy: cannot import name 'ModelBase'

Summary:
When importing CovsirPhy in Kaggle Notebook, ImportError occurred.

CovsirPhy version 2.5.1

Environment:
Python 3.8, pipenv, WSL.

Codes:

!pip install covsirphy
import covsirphy as cs

Error:

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-3-a646b9deb9e0> in <module>
----> 1 import covsirphy as cs
      2 cs.get_version()

/opt/conda/lib/python3.6/site-packages/covsirphy/__init__.py in <module>
     10     better_exceptions_installed = False
     11 from covsirphy.__version__ import __version__
---> 12 from covsirphy.analysis import ODESimulator, ChangeFinder
     13 from covsirphy.analysis import PhaseSeries, Scenario
     14 from covsirphy.cleaning import Term, CleaningBase, DataLoader

/opt/conda/lib/python3.6/site-packages/covsirphy/analysis/__init__.py in <module>
     13 
     14 for m in modules:
---> 15     m_imported = import_module(f"{__name__}.{m.stem}")
     16     for (k, v) in m_imported.__dict__.items():
     17         if not k.startswith("__"):

/opt/conda/lib/python3.6/importlib/__init__.py in import_module(name, package)
    124                 break
    125             level += 1
--> 126     return _bootstrap._gcd_import(name[level:], package, level)
    127 
    128 

/opt/conda/lib/python3.6/site-packages/covsirphy/analysis/scenario.py in <module>
     13 import numpy as np
     14 import pandas as pd
---> 15 from covsirphy.ode import ModelBase
     16 from covsirphy.cleaning import JHUData, PopulationData, Term
     17 from covsirphy.phase import Estimator

/opt/conda/lib/python3.6/site-packages/covsirphy/ode/__init__.py in <module>
     13 
     14 for m in modules:
---> 15     m_imported = import_module(f"{__name__}.{m.stem}")
     16     for (k, v) in m_imported.__dict__.items():
     17         if not k.startswith("__"):

/opt/conda/lib/python3.6/importlib/__init__.py in import_module(name, package)
    124                 break
    125             level += 1
--> 126     return _bootstrap._gcd_import(name[level:], package, level)
    127 
    128 

/opt/conda/lib/python3.6/site-packages/covsirphy/ode/sirfv.py in <module>
      3 
      4 import numpy as np
----> 5 from covsirphy.ode.mbase import ModelBase
      6 
      7 

/opt/conda/lib/python3.6/site-packages/covsirphy/ode/mbase.py in <module>
      3 
      4 import numpy as np
----> 5 from covsirphy.ode.mbasecom import ModelBaseCommon
      6 
      7 

/opt/conda/lib/python3.6/site-packages/covsirphy/ode/mbasecom.py in <module>
      2 # -*- coding: utf-8 -*-
      3 
----> 4 from covsirphy.cleaning.term import Term
      5 
      6 

/opt/conda/lib/python3.6/site-packages/covsirphy/cleaning/__init__.py in <module>
     13 
     14 for m in modules:
---> 15     m_imported = import_module(f"{__name__}.{m.stem}")
     16     for (k, v) in m_imported.__dict__.items():
     17         if not k.startswith("__"):

/opt/conda/lib/python3.6/importlib/__init__.py in import_module(name, package)
    124                 break
    125             level += 1
--> 126     return _bootstrap._gcd_import(name[level:], package, level)
    127 
    128 

/opt/conda/lib/python3.6/site-packages/covsirphy/cleaning/example_data.py in <module>
      4 import pandas as pd
      5 from covsirphy.cleaning.jhu_data import JHUData
----> 6 from covsirphy.analysis.simulator import ODESimulator
      7 from covsirphy.ode.mbase import ModelBase
      8 

/opt/conda/lib/python3.6/site-packages/covsirphy/analysis/simulator.py in <module>
      7 from scipy.integrate import solve_ivp
      8 from covsirphy.cleaning.term import Term
----> 9 from covsirphy.ode.mbase import ModelBase
     10 
     11 

ImportError: cannot import name 'ModelBase'

When the import was run a second time, it completed successfully.

OSError when trying update input folder in Kaggle

Dear Rakesh(@SM-ins),
Thank you for your feedback!

cs.Population cannot read the CSV files downloaded directly from the World Bank. We need the DataLoader class to use them.
Please run the following code with the latest version (>2.3.0).
data_loader = cs.DataLoader("../input")
pop_data = data_loader.population()
pop_data.cleaned().tail()
The type of pop_data is cs.Population, and "locations_population.csv" will be saved in the "../input" directory. This file differs from "API_EN.POP.DNST_DS2_en_csv_v2.csv", but it is well organized.

Best Regards,
Lisphilar

Dear Lisphilar,
Thank you so much for the response.

I tried your code:

data_loader = cs.DataLoader("../input")

pop_data = data_loader.population()
pop_data.cleaned().tail()

but now I am getting a new error:
OSError: [Errno 30] Read-only file system: '/kaggle/input/locations_population.csv'

With regards,
Rakesh

Originally posted by @SM-ins in #42 (comment)

Change of error description: dataset does not have expected columns

I will change the error statement.
Before:
ParserError: Error tokenizing data. C error: Expected 3 fields in line 5, saw 62
After:
KeyError: Raw data of PopulationData must have Country.Region, Province.State, Population but not included.

Originally posted by @lisphilar in #42 (comment)
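
A column check of this kind could be sketched as follows. The helper names check_columns() and read_header() are hypothetical and the CSV contents are made-up samples; only the expected column names and the style of the message come from the issue above.

```python
import csv
import io

EXPECTED_COLUMNS = ["Country.Region", "Province.State", "Population"]


def read_header(text):
    """Return the first row of CSV text as a list of column names."""
    return next(csv.reader(io.StringIO(text)))


def check_columns(header, expected=EXPECTED_COLUMNS, name="PopulationData"):
    """Raise KeyError with a readable message if expected columns are missing."""
    missing = [col for col in expected if col not in header]
    if missing:
        raise KeyError(
            f"Raw data of {name} must have {', '.join(expected)}, "
            f"but {', '.join(missing)} not included."
        )


# Made-up sample files for illustration
good = "Country.Region,Province.State,Population\nJapan,-,1\n"
bad = "Country Name,Country Code,2019\nJapan,JPN,1\n"

check_columns(read_header(good))  # passes silently
caught = None
try:
    check_columns(read_header(bad))
except KeyError as error:
    caught = error
print(caught)
```

Checking the header before parsing the whole file lets the user see which columns are missing instead of a low-level tokenizing error.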

Speed-up of ODESimulator using numba.njit

Summary:
Scenario.estimate() is time-consuming and uses ODESimulator many times. To accelerate ODESimulator, consider using the numba package.

CovsirPhy version 2.4.1

Related classes:

covsirphy.ODESimulator
covsirphy.Scenario

Codes and outputs:

import covsirphy as cs
# Dataset preparation
data_loader = cs.DataLoader("input")
jhu_data = data_loader.jhu()
population_data = data_loader.population()
scenario = cs.Scenario(jhu_data, population_data, "Japan")
scenario.trend()
scenario.estimate(cs.SIRF)

scenario.estimate(cs.SIRF) takes 3-5 minutes.

Environment:
Python 3.8, pipenv, WSL.

How to replace JHU data with country-wise data (India) and error in cleaning country level datasets

Dear @lisphilar ,

While running the following code, I am getting the error shown below:

ind_data = cs.CountryData("/kaggle/input/covid19-in-india/covid_19_india.csv",Country="India")
ind_data.set_variables(
date="Date", confirmed="Positive", fatal="Fatal", recovered="Discharged", province=None
)
ind_data.cleaned().tail()

TypeError: __init__() got an unexpected keyword argument 'Country'

I just replaced Japan with India and changed the file path, but I still get the above error. Could you please help?

With regards,
Rakesh

population_data.value() returns total value of all records in one area

Summary:
population_data.value() returns total value of all records in one area, not the last value.

CovsirPhy version 2.4.1

Related classes:

covsirphy.PopulationData

Codes and outputs:

import covsirphy as cs
# Dataset preparation
data_loader = cs.DataLoader("input")
jhu_data = data_loader.jhu()
population_data = data_loader.population()
population_data.value("Italy")

This code returns 21279409008, which is greater than the population of Italy.

Environment:
Python 3.8, pipenv, WSL.

Cleaned dataset of country-specific data is empty

Summary:
cs.CountryData.cleaned() needs to return a non-empty dataframe, but returns an empty dataframe.

Version 2.1.0

Related classes:

covsirphy.cleaning.country_data.CountryData

Code:

import covsirphy as cs
jpn_data = cs.CountryData("input/covid_jpn_total.csv", country="Japan")
jpn_data.set_variables(
    date="Date", confirmed="Positive", fatal="Fatal", recovered="Discharged"
)
print(jpn_data.cleaned())

This code returns an empty dataframe.

Environment:
Python 3.8, pipenv, WSL

Add example dataset to this repository

Is your feature request related to a problem? Please describe.
To try this package, it is necessary to prepare a dataset in advance. This sometimes prevents new users from using this package.

Describe the solution you'd like
Include Japanese dataset to this package.
Kaggle: COVID-19 dataset in Japan is maintained by me and this can be included in this repository.

PopulationData.value(): add "date" argument

Is your feature request related to a problem? Please describe.
Population values may change in the near future and COVID-19 Data Hub includes the population values for each date. "Date" argument will be useful for phase-dependent analysis.

Describe the solution you'd like
Add a "date" argument to PopulationData.value() so that this method returns the value on that date.
The default value of date will be None, and None means the last date.
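
The requested behaviour can be sketched with plain Python. The records below are made-up values for illustration, and population_value() is a hypothetical function, not the actual PopulationData.value() implementation. With date=None the value of the last registered date is returned.

```python
import datetime

# Made-up records for illustration: (country, date, population), unsorted
records = [
    ("Italy", datetime.date(2020, 7, 2), 1_100),
    ("Italy", datetime.date(2020, 7, 1), 1_000),
    ("Japan", datetime.date(2020, 7, 1), 2_000),
]


def population_value(records, country, date=None):
    """Return the population on the given date, or on the last date if None."""
    by_date = {day: value for (name, day, value) in records if name == country}
    if not by_date:
        raise KeyError(f"{country} is not registered.")
    day = max(by_date) if date is None else date
    if day not in by_date:
        raise KeyError(f"No population value of {country} on {day}.")
    return by_date[day]


latest = population_value(records, "Italy")
earlier = population_value(records, "Italy", date=datetime.date(2020, 7, 1))
print(latest, earlier)
```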

Error in find change points with CovsirPhy 2.5.4-alpha

Dear Lisphilar,
Thank you for the updates. Since you have made a single scenario phase for India, we are unable to predict future data. I am now getting the following error when running the prediction model.

ind_scenario.clear()
ind_scenario.add_phase(days=7)
ind_scenario.simulate().tail(7).style.background_gradient(axis=0)

NameError: Initial value of Susceptible must be specified in @y0_dict.

Please check it if possible.

Thank you.

Make input.sh compatible with all OSes (re-write input.sh using python)

Is your feature request related to a problem? Please describe.
Currently, input.sh works for Ubuntu (it might work on MacOS if SVN is available but I did not test it), however it can definitely not be used for Windows.

Describe the solution you'd like
input.sh could be written in python which would make it possible to execute it using any OS as long as the python environment is properly setup.

Set random seed of hyperparameter optimization

Is your feature request related to a problem? Please describe.
For reproducibility, CovsirPhy needs to set random seed of hyperparameter optimization.

Describe the solution you'd like
Add argument seed to covsirphy.ChangeFinder.run() and covsirphy.Estimator.run(), and set the seed to Optuna package.

Describe alternatives you've considered
Repeat optimization until the results become constant.
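
Why a seed gives reproducible results can be shown with a small stand-in. The sketch below uses a plain random search built on the standard library's random module instead of Optuna, so it illustrates the idea rather than the package's actual code; random_search() and the toy objective are hypothetical. In Optuna itself the seed would be passed to the sampler (for example, TPESampler accepts a seed argument).

```python
import random


def random_search(objective, bounds, n_trials, seed=None):
    """Minimise objective by uniform random sampling; seed fixes the sequence."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        params = {k: rng.uniform(low, high) for k, (low, high) in bounds.items()}
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def objective(params):
    """Toy objective with its minimum at rho=0.2, sigma=0.075."""
    return (params["rho"] - 0.2) ** 2 + (params["sigma"] - 0.075) ** 2


bounds = {"rho": (0, 1), "sigma": (0, 1)}
first = random_search(objective, bounds, n_trials=100, seed=0)
second = random_search(objective, bounds, n_trials=100, seed=0)
print(first == second)
```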

Keep track parameter values/reproductive number of all countries

Is your feature request related to a problem? Please describe.
Simple code is needed to keep track of the parameter values and reproduction number of all countries.

Describe the solution you'd like

Method of JHUData to get the country list.
Create a class to track parameter values of all countries.

[New] dataset of Population pyramid

What dataset you need?
Population pyramid dataset needs to include the population values per ages in each country.

How will you use the dataset for your analysis?
As mentioned in #53 and my Kaggle Notebook, population pyramid data is useful to analyse beta/rho parameter of SIR-F model.

Some phases have small number of days (<=2) suggested by ChangeFinder

Hi Lisphilar,
I am getting the following error while running the code below:

ind_scenario.trend()

ValueError: @end_date must be over 23Jun2020.

Please help me out if possible...

With regards,
Rakesh

India scenario analysis: Description of KeyError when Scenario.param_history()

Dear @lisphilar ,

Initially the code, i.e. _ = ind_scenario.param_history(targets=["rho", "sigma"]).T, was working, but now it throws the following error. Please help me if possible.

KeyError: '@targets must be selected in Population, Rt.'

With regards,
Rakesh

Updating of dataset in Kaggle

Dear @lisphilar ,

Please help me update the dates in the following scenarios. I would like to add dates of my own choosing.

1)ind_scenario = cs.Scenario(jhu_data, pop_data, "India")
ind_scenario.records().tail()

2)ind_scenario.trend()

With regards,
Rakesh

India scenario analysis: records from 10Jun2020 was not included in analysis

Dear Lisphilar,

Is it possible to divide the phases of India from 25th March to the current date, i.e. until 17th July, in scenario analysis? Please help me if possible.

With regards,
Rakesh

Last date of ODE simulation does not match end date of the last phase

Summary:
Scenario.simulate() needs to simulate until the end date of last phase. However, the last date of the dataframe returned by Scenario.simulate() does not match the end date of the last phase.

Version 2.0.2

Related classes:

covsirphy.analysis.scenario.Scenario
covsirphy.analysis.simulator.ODESimulator

Code/output 1:

import covsirphy as cs
# Read dataset
jhu_data = cs.JHUData("input/covid_19_data.csv")
pop_data = cs.Population("input/locations_population.csv")
# Set phase
ita_scenario = cs.Scenario(jhu_data, pop_data, country="Italy")
ita_scenario.trend(n_points=4, set_phases=True)
ita_scenario.add_phase(end_date="31Dec2020")
# Show the end date of the last phase
print(ita_scenario.get("End", phase="last"))

This returns "31Dec2020"

Code/output 2:

# Hyper parameter estimation
ita_scenario.estimate(cs.SIRF)
# Simulation
pred_df = ita_scenario.simulate()
print(pred_df.loc[pred_df.index[-1], "Date"])

This returns "27Sep2020" etc.

Environment:
Python 3.8, pipenv, WSL

TypeError of scenario.param_history(show_box_plot=False)

Summary:
scenario.param_history(show_box_plot=False)

CovsirPhy version 2.4.1

Related classes:

covsirphy.Scenario

Codes and outputs:

import covsirphy as cs
data_loader = cs.DataLoader("input")
jhu_data = data_loader.jhu()
population_data = data_loader.population()
scenario = cs.Scenario(jhu_data, population_data, country="Japan")
scenario.trend()
scenario.estimate(cs.SIRF)
scenario.param_history(targets=["Rt"], divide_by_first=False, show_box_plot=False)

This code raises TypeError: line_plot() got an unexpected keyword argument 'show_figure'

Environment:
Python 3.8, pipenv, WSL.

Change data source: the number of cases, JHU to COVID-19 Data Hub

We are currently using the JHU dataset with cs.DataLoader.jhu() and cs.JHUData().
However, this dataset has critical errors (e.g. Italy: Confirmed=241184 and Recovered=11811 on 03Jul2020, Recovered << Confirmed) and the errors may not be corrected. So, we need to change to a data source that is actively maintained.

I found COVID-19 Data Hub, which has a Python interface.
We can retrieve the datasets as follows.

pip install covid19dh

import covid19dh
# Country level
country_df = covid19dh.covid19(country=None, level=1, verbose=False)
# For some countries, province-level data is included
province_df = covid19dh.covid19(country=None, level=2, verbose=False)
# List of citation
covid19dh.cite(country_df)

OxCGRT data and population values are included in this dataset.

In the next version, I will try to change the data source.
Thank you.

Low accuracy of parameter estimation for SIR-FV and SEWIR-F model

Summary:
Accuracy of parameter optimization is high for SIR, SIR-D, SIR-F model (RMSLE scores are about 0.1), but that is low for SIR-FV and SEWIR-F model (RMSLE scores are about 30).

CovsirPhy version 2.2.1

Related classes:

covsirphy.SIRFV
covsirphy.SEWIRF

Codes:
Codes are in example/sirfv_model.py and example/sewirf_model.py

Environment:
Python 3.8, pipenv, WSL.
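
For reference, RMSLE (root mean squared logarithmic error) can be computed as in the sketch below. This is the textbook definition applied to a single series; how CovsirPhy combines several variables into one score is not shown in this issue, and the sample numbers are made up.

```python
import math


def rmsle(actual, predicted):
    """Root mean squared logarithmic error of two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("Sequences must be non-empty and of equal length.")
    squared = [
        (math.log1p(p) - math.log1p(a)) ** 2 for a, p in zip(actual, predicted)
    ]
    return math.sqrt(sum(squared) / len(squared))


# Made-up sample values: predictions that are 10% too high
actual = [100, 200, 400]
perfect_score = rmsle(actual, actual)
overestimate_score = rmsle(actual, [110, 220, 440])
print(perfect_score, overestimate_score)
```

Because the error is measured on a logarithmic scale, a score near 0.1 corresponds to predictions off by roughly 10%, while a score near 30 means the simulated and actual values differ by many orders of magnitude.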

How to use template data: The number of days go out

Dear Lisphilar(@lisphilar ),

I am getting a KeyError for "others" in @marcoferrante's estimation code. Please help me out if possible.

With regards,
Rakesh

PopulationData.update() add population values without init

Summary:
If a population value has already been registered, PopulationData.update() does not register the new value correctly.

CovsirPhy version 2.4.1

Related classes:

covsirphy.PopulationData

Codes and outputs:

import covsirphy as cs
# Dataset preparation
data_loader = cs.DataLoader("input")
jhu_data = data_loader.jhu()
population_data = data_loader.population()
# Update population value
population_data.update(126_180_643, country="Japan")
population_data.value("Japan")

This code does not return 126180643.

Environment:
Python 3.8, pipenv, WSL.

End date of Scenario.simulate() does not match the end date of a phase

Summary:
End date of Scenario.simulate() does not match the end date of a phase.

CovsirPhy version 2.4.2

Related classes:

covsirphy.Scenario

Codes and outputs:

import covsirphy as cs
# Dataset preparation
data_loader = cs.DataLoader("input")
jhu_data = data_loader.jhu()
population_data = data_loader.population()
scenario = cs.Scenario(jhu_data, population_data, country="Italy")
scenario.trend()
scenario.estimate(cs.SIRF)
scenario.add_phase(end_date="01Jan2020")
scenario.summary()
scenario.simulate()

The last date of summary was 01Jan2020, but the last date of simulated records was 22Dec2020.

Environment:
Python 3.8, pipenv, WSL.

DataLoader failed in saving CSV files

Summary:
DataLoader failed to save the dataset retrieved from COVID-19 Data Hub.

CovsirPhy version 2.4.0

Related classes:

covsirphy.DataLoader

Codes and outputs:

import covsirphy as cs
data_loader = cs.DataLoader("input")
jhu_data = data_loader.jhu()

This code should save a CSV file in "input" directory, but failed.

Environment:
Python 3.8, pipenv, WSL.

Automatic downloading of dataset: total population

Is your feature request related to a problem? Please describe.
As mentioned in #26, this package needs to include a data loader which enables us to download the datasets automatically. In this issue, the dataset about "total population" will be discussed.

Describe the solution you'd like

Find/create a dataset about "total population of each country"
Create a Python class for automatic downloading

Dataset design

Downloading the dataset does NOT request API keys, including Kaggle API keys.
The dataset must include country names and population values.
The dataset should include province names because CovsirPhy.Population uses province field.

Dear Rakesh(@SM-ins),

Dear Rakesh(@SM-ins),
Thank you for your feedback!

cs.Population cannot read the CSV files downloaded directly from the World Bank. We need the DataLoader class to use them.
Please run the following code with the latest version (>2.3.0).

data_loader = cs.DataLoader("../input")
pop_data = data_loader.population()
pop_data.cleaned().tail()

The type of pop_data is cs.Population, and "locations_population.csv" will be saved in the "../input" directory. This file differs from "API_EN.POP.DNST_DS2_en_csv_v2.csv", but it is well organized.

Best Regards,
Lisphilar

Originally posted by @lisphilar in #42 (comment)
