hrossman / pymsm

License: MIT License

pymsm's Introduction

Multistate competing risk models in Python

Read the Docs
For details, read the JOSS paper

Hagai Rossman, Ayya Keshet, Malka Gorfine 2022

PyMSM is a Python package for fitting competing risks and multistate models, with a simple API that allows user-defined models, predictions at the single-sample or population level, and statistical summaries and figures.

Features include:

Fit a Competing risks Multistate model based on survival analysis (time-to-event) models.
Deals with right censoring, competing events, recurrent events, left truncation, and time-dependent covariates.
Run Monte Carlo simulations for paths emitted by the trained model and extract various summary statistics and plots.
Load or configure a pre-defined model and run path simulations.
Modularity and compatibility for different time-to-event models such as Survival Forests and other custom models.

Installation

pip install pymsm

Requires Python >=3.8.

Alternatively, to work with the latest development version, you can install it directly from GitHub:

Clone the repository to REPO_FOLDER (choose your own location).
Go to the location of the repository: cd $REPO_FOLDER (note: not the pymsm folder, but the one above it).
Run pip install -e pymsm. This installs the package in editable mode for your Python interpreter.

Quick example

# Load data (See Rotterdam example for full details)
from pymsm.datasets import prep_rotterdam
dataset, states_labels = prep_rotterdam()

# Define terminal states
terminal_states = [3]

# Init MultiStateModel
from pymsm.multi_state_competing_risks_model import MultiStateModel
multi_state_model = MultiStateModel(dataset, terminal_states)

# Fit model to data
multi_state_model.fit()

# Run Monte Carlo simulation and sample paths
mcs = multi_state_model.run_monte_carlo_simulation(
    sample_covariates=dataset[0].covariates.values,
    origin_state=1,
    current_time=0,
    max_transitions=2,
    n_random_samples=10,
    print_paths=True,
)

    stateDiagram-v2
    s1 : (1) Primary surgery
    s2 : (2) Disease recurrence
    s3 : (3) Death
    s1 --> s2: 1518 
    s1 --> s3: 195 
    s2 --> s3: 1077
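
The numbers on the arrows read naturally as transition counts, although the README does not say so explicitly. Assuming that reading, and assuming each path is simply an ordered list of visited states, such counts could be tallied as in the minimal sketch below. The toy paths and the helper `count_transitions` are made up for illustration; they are not the Rotterdam data and not part of PyMSM.

```python
from collections import Counter

# Made-up paths: ordered lists of visited states
# (1 = primary surgery, 2 = disease recurrence, 3 = death).
toy_paths = [
    [1, 2, 3],
    [1, 3],
    [1, 2],
    [1],  # no transition observed
]


def count_transitions(paths):
    """Count how often each (origin, target) pair occurs across all paths."""
    counts = Counter()
    for states in paths:
        # Pair every state with its successor: [1, 2, 3] -> (1, 2), (2, 3)
        counts.update(zip(states, states[1:]))
    return counts


transition_counts = count_transitions(toy_paths)
for (origin, target), n in sorted(transition_counts.items()):
    print(f"s{origin} --> s{target}: {n}")
```

A path with a single state contributes nothing to the counts, which is how a sample with no observed transition would show up under this representation.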

Full examples

Background and Motivation

Multi-state data are common and can describe trajectories in diverse health applications, such as a patient's progression through disease stages or a patient's path through different hospitalization states. When faced with such data, a researcher or clinician might seek to characterize the possible transitions between states and their occurrence probabilities, or to predict the trajectory of future patients, all conditioned on various baseline and time-varying individual covariates. By fitting a multi-state model, we can learn the hazard for each specific transition, which can later be used to predict future paths. Paths can be predicted at the single-patient level, for example predicting how long a cancer patient will remain relapse-free given their current health status, or with what probability a patient will end a trajectory at each of the possible states; and at the population level, for example predicting how many patients who arrive at the emergency room will need to be admitted, given their covariates.

Capabilities

PyMSM is a Python package for fitting multi-state models, with a simple API which allows user-defined models, predictions at a single or population sample level, and statistical summaries and figures. Features of this software include:

Fitting a competing risks multistate model based on various types of survival analysis (time-to-event) models, such as Cox proportional hazards models or machine learning models, while taking into account right censoring, competing events, recurrent events, left truncation, and time-dependent covariates.
Running Monte Carlo simulations (computed in parallel) for paths emitted by the trained model and extracting various summary statistics and plots.
Loading or configuring a pre-defined model and generating simulated data in the form of random paths from the model parameters, which can be highly useful as a research tool.
Modularity and compatibility with different time-to-event models, such as Survival Forests and other custom ML models provided by the user.

The package is designed to allow modular usage by both experienced researchers and non-expert users. In addition to fitting a multi-state model to a given dataset, PyMSM allows the user to simulate trajectories from a predefined model, thus creating a multi-state dataset. This can be a valuable research tool, both for sharing sensitive simulated individual data and for any downstream task that needs individual trajectories.
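
To make the idea of simulating paths from a predefined model concrete, here is a minimal sketch using only the standard library. It is not PyMSM's implementation: it assumes constant (exponential) transition hazards with made-up values and no covariates, and it draws one latent time per competing transition, letting the earliest one win. `HAZARDS`, `TERMINAL_STATES` and `simulate_path` are hypothetical names.

```python
import random

# Hypothetical constant hazards per unit time: HAZARDS[origin][target].
HAZARDS = {
    1: {2: 0.02, 3: 0.005},  # from state 1: recurrence or death compete
    2: {3: 0.03},            # from state 2: only death is possible
}
TERMINAL_STATES = {3}


def simulate_path(origin_state, max_transitions, rng):
    """Simulate one path; returns (visited states, time spent in each left state)."""
    states = [origin_state]
    sojourn_times = []
    current = origin_state
    for _ in range(max_transitions):
        if current in TERMINAL_STATES:
            break
        # Draw a latent exponential time for each competing transition.
        draws = {
            target: rng.expovariate(rate)
            for target, rate in HAZARDS[current].items()
        }
        # The transition with the earliest latent time is the one that happens.
        target = min(draws, key=draws.get)
        sojourn_times.append(draws[target])
        states.append(target)
        current = target
    return states, sojourn_times


rng = random.Random(0)  # fixed seed for reproducibility
paths = [simulate_path(origin_state=1, max_transitions=2, rng=rng) for _ in range(1000)]

# Share of simulated paths that pass through state 2 before ending.
share_via_recurrence = sum(2 in states for states, _ in paths) / len(paths)
print(f"Share of paths visiting state 2: {share_via_recurrence:.2f}")
```

Under these assumed hazards the probability of moving from state 1 to state 2 first is 0.02 / (0.02 + 0.005) = 0.8, so the printed share should land near that value.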

Citation

If you found this library useful in academic research, please cite:

@article{Rossman2022,
  doi = {10.21105/joss.04566},
  url = {https://doi.org/10.21105/joss.04566},
  year = {2022},
  author = {Hagai Rossman and Ayya Keshet and Malka Gorfine},
  title = {PyMSM: Python package for Competing Risks and Multi-State models for Survival Data},
  journal = {Journal of Open Source Software}
}

Also consider starring the project on GitHub

This project is based on methods first introduced by the authors of Roimi et al. 2021.
Original R code by Jonathan Somer, Asaf Ben Arie, Rom Gutman, Uri Shalit & Malka Gorfine available here. Also see Rossman & Meir et al. 2021 for an application of this model to COVID-19 hospitalization data.

pymsm's Issues

Broken links in README

Hi! I am one of the reviewers for your JOSS paper. I will be adding comments as issues in the next hours/days.

I tried following the links to the full examples on the README but they are dead.

Docs and onboarding material

Some of these are easy tasks, some are just suggestions to provide better docs.

The github action badges on the README.md point to the .svg image, but should point to the action (ex: https://github.com/hrossman/pymsm/actions/workflows/tests.yml instead of https://github.com/hrossman/pymsm/actions/workflows/tests.yml/badge.svg)
Running the code here gave an error at the end (a lifelines convergence error; can these be caught and something more useful reported back to the user?)
In this intro, can you describe what a survival MSM is and why one might want to use it? How does it fit into the survival analysis toolkit?
Maybe a quick explanation or interpretation of the plot on this page. How should I read it?
In examining a model, what should I, the naive reader, be looking for? (BTW I like the color scheme you picked for the stackplot)
The full examples are nice. I think you could add more context about what the dataset is describing, how you are modelling it / choices you make, and conclusions you (the experts) draw from it.
Some broken latex on this page

Overall, I like your docs! I would like more emphasis on educating newbies. Y'all are experts, teach me!

Paper feedback

typo in Costume -> Custom
How does it compare to software outside of Python, too?

Community guidelines

From the checklist:

Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Typically project authors will include a CONTRIBUTING.md with this information.

Definition of time_entry_to_origin and time_transition_to_target

Dear Sir/Madam,

I am confused by the definition of time_entry_to_origin and time_transition_to_target in some of your examples, especially time_transition_to_target. It seems that time_entry_to_origin can be the time difference between target state and baseline.

Is time_transition_to_target the time difference between target state and origin state OR the time difference between target state and baseline?

Unordered paths in Monte Carlo simulations

Dear all,

Thanks for making this package available.

While running a Monte Carlo simulation with model.run_monte_carlo_simulation, I found that the output trajectories are either unsorted or the times represent the time between events.

As an example:

    simulated_paths = model.run_monte_carlo_simulation(
        sample_covariates=covariates,
        origin_state=0,
        current_time=0,
        n_random_samples=100,
        max_transitions=3,
        n_jobs=5,
        print_paths=True,
    )

the outcome is

States: [3, 4, 6]
Transition times: [28.000029122914018, 912.999970877086, 1451.0]
States: [3, 6]
Transition times: [32.000013949386066, 2359.999986050614]
States: [3, 4, 1]
Transition times: [26.000044015249372, 425.9999559847506, 366.0]

In the last of the paths reported above, the last value of time_at_each_state is lower than the preceding one ($366<425$).
Does this imply:

the output of the simulation in path.states are unsorted and need to be sorted according to path.time_at_each_state;
the times in path.time_at_each_state represent the time between states, hence we need to compute the cumulative time ourselves.

Finally, I have doubts about the structure of the input data for PathObject. For states, I read:

States visited (encoded as positive integers, 0 is saved for censoring), in the order visited. Defaults to None.

hence I suppose that time_at_each_state should also be ordered but maybe it should be a time difference between states?
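
If the second reading is the right one, that is, each entry of time_at_each_state is the time spent in a state rather than a clock time (the issue text alone does not settle this), then cumulative transition times follow from a running sum. A minimal sketch on values rounded from the third printed path; `cumulative_times` is a hypothetical helper, not a pymsm function.

```python
from itertools import accumulate


def cumulative_times(time_at_each_state, start_time=0.0):
    """Turn per-state durations into elapsed times measured from start_time."""
    return [start_time + t for t in accumulate(time_at_each_state)]


# Rounded from the third printed path (States: [3, 4, 1]).
durations = [26.0, 426.0, 366.0]
elapsed = cumulative_times(durations)
print(elapsed)
```

Under this assumption the elapsed times are increasing by construction, so a third entry smaller than the second would no longer indicate an ordering problem.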

Automated tests

👋 Hi folks, I'm the reviewer for your JOSS submission. I'll be making issues in this repo for my feedback.

Currently, github actions will only run automated tests (CI) on pull requests. I would suggest making this on pull_request and push. Contributors will push to the main branch, and you want CI to run for these commits, too.
There are installation instructions for installing from PyPI, but what about installing locally? I git pulled the repo, ran pip3 install -e ., and then py.test to test. That can be documented in a section of the README or the docs.
tests in test_msm_examples.py and test_sim.py should assert some output. There should be a reason you are running a test - assert that.
This can be fixed (found running py.test on my machine, py3.9.6)

src/pymsm/datasets/__init__.py:319
  /Users/camerondavidson-pilon/code/pymsm/src/pymsm/datasets/__init__.py:319: DeprecationWarning: invalid escape sequence \R
    1: "Discharged\Recovered",

If possible, try to fix the lifelines warnings that show up. Not always possible, I know.
stepfunc seems like an important function, but has lots of potential edge cases. This would be a good function to test as well.
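
Two of the points above can be sketched in a few lines: tests that assert on output, and the invalid escape sequence. The signature of the real stepfunc is not shown in this thread, so `step_value` below is a hypothetical stand-in for a right-continuous step function, used only to illustrate edge-case assertions. The raw-string check relies on standard Python behaviour: `r"Discharged\Recovered"` has the same value as the original literal but does not trigger the warning.

```python
from bisect import bisect_right


def step_value(xs, ys, t, default=0.0):
    """Right-continuous step function: value of ys at the last x <= t.

    Hypothetical stand-in; xs must be sorted in increasing order.
    """
    i = bisect_right(xs, t)
    return default if i == 0 else ys[i - 1]


def test_step_value_edge_cases():
    xs, ys = [1.0, 2.0, 4.0], [0.1, 0.3, 0.6]
    assert step_value(xs, ys, 0.5) == 0.0   # before the first jump
    assert step_value(xs, ys, 1.0) == 0.1   # exactly on a jump
    assert step_value(xs, ys, 3.0) == 0.3   # between two jumps
    assert step_value(xs, ys, 10.0) == 0.6  # after the last jump
    assert step_value([], [], 1.0) == 0.0   # empty function


def test_raw_string_keeps_backslash():
    # A raw string holds the same characters as the explicitly escaped form.
    assert r"Discharged\Recovered" == "Discharged\\Recovered"


# pytest would collect these automatically; calling them directly also works.
test_step_value_edge_cases()
test_raw_string_keeps_backslash()
```

Whether the label was meant to contain a backslash at all (rather than, say, a forward slash) is a question for the authors; the raw string only preserves the current value.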

Error message

I have prepared a pandas dataframe according to the documentation https://hrossman.github.io/pymsm/usage/Preparing_a_dataset/#preparing-a-dataset-for-multistate-modeling-with-pymsm but I receive this error message when I try to fit the model:

# Init MultistateModel
from pymsm.multi_state_competing_risks_model import MultiStateModel
multi_state_model = MultiStateModel(
    dataset=dfML,
    covariate_names=covariate_cols,
    terminal_states=terminal_states)

AttributeError Traceback (most recent call last)
Cell In[64], line 3
1 # Init MultistateModel
2 from pymsm.multi_state_competing_risks_model import MultiStateModel
----> 3 multi_state_model = MultiStateModel(
4 dataset=dfML,
5 covariate_names=covariate_cols,
6 terminal_states=terminal_states)

File ~\AppData\Roaming\Python\Python311\site-packages\pymsm\multi_state_competing_risks_model.py:127, in MultiStateModel.__init__(self, dataset, terminal_states, update_covariates_fn, covariate_names, event_specific_fitter, competing_risk_data_format, state_labels, trim_transitions_threshold)
125 self._trim_transitions()
126 else:
--> 127 self._assert_valid_input()

File ~\AppData\Roaming\Python\Python311\site-packages\pymsm\multi_state_competing_risks_model.py:156, in MultiStateModel._assert_valid_input(self)
154 # Check the number of times is either equal or one less than the number of states
155 for obj in self.dataset:
--> 156 n_states = len(obj.states)
157 n_times = len(obj.time_at_each_state)
158 assert n_states == n_times or n_states == n_times + 1

AttributeError: 'str' object has no attribute 'states'
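
A likely reading of this traceback: the check loops over the dataset and expects each element to expose states and time_at_each_state, but iterating over a pandas DataFrame yields its column labels, which are strings. The sketch below reproduces that failure without pandas, using a plain dict as a stand-in (a dict also iterates over its keys). `PathStandIn` and `assert_valid_input` are hypothetical and only mirror the lines quoted in the traceback.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PathStandIn:
    """Hypothetical stand-in exposing only the two attributes the check reads."""
    states: List[int]
    time_at_each_state: List[float]


def assert_valid_input(dataset):
    """Mirror of the check quoted in the traceback (lines 155-158)."""
    for obj in dataset:
        n_states = len(obj.states)
        n_times = len(obj.time_at_each_state)
        assert n_states == n_times or n_states == n_times + 1


# A list of path-like objects passes the check.
paths = [PathStandIn(states=[1, 2, 3], time_at_each_state=[10.0, 5.0])]
assert_valid_input(paths)

# Stand-in for a DataFrame: iteration yields the column labels (strings).
table_like = {"time_entry_to_origin": [0.0], "time_transition_to_target": [10.0]}
try:
    assert_valid_input(table_like)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
```

If dfML is already in the tabular layout described on the linked documentation page, the competing_risk_data_format argument visible in the constructor signature above may be the relevant switch; that is an assumption to verify against the pymsm documentation.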