rmnldwg / lyscripts

Scripts that are used in the pipelines of the lycosystem

Home Page: https://lyscripts.readthedocs.io

License: MIT License

Python 100.00%

cli machine-learning python scripts

lyscripts's Introduction

Hey, I'm Roman 👋

🔭 Working on probabilistic models to predict how cancer spreads
👯 Interested in collaborating on datasets of lymphatic progression patterns in head & neck cancer
💬 Always happy to hear feedback on our interactive Lymphatic Progression eXplorer (LyProX)

📚🔍 Research fields

I am a PostDoc in the medical physics research group of Prof. Jan Unkelbach at the University of Zurich and the University Hospital Zurich.

In our main project, we try to model the risk for metastases in the lymph system of patients with squamous cell carcinomas in the head & neck region. You can read more on that in an excellent paper by a PostDoc in our group: Pouymayou et al. You can also check out our code for the lymph model, a Python package to learn and compute this risk of lymphatic metastases using Bayesian networks (the paper mentioned above) and, as a new addition, hidden Markov models (Ludwig et al).

Another project deals with optimal fractionation schemes. Fractionation is the splitting of a prescribed dose of radiation, designed to kill cancer cells in a tumor, into multiple sessions to allow the healthy parts of the body to recover better. Innovative technologies like the MR-LinAc at our institution enable us to tackle this problem with reinforcement learning.

🔭 Topics I'm interested in

probabilistic models
interpretable machine learning methods
statistical learning theory

and also (though not necessarily research-related)

🌌 (theoretical) astrophysics (I did my master's in this group)
web development
open source

Thanks a lot for reading 😃

📫 In case you want to reach me: [email protected]

lyscripts's People

larstwi julianbro

lyscripts's Issues

make predict scripts more modular

The two scripts prevalence and risk are pretty messy functions and should be cleaned up.

plot function `draw` overwrites some `hist_kwargs`

Because the hist_kwargs are updated in the wrong order, histogram settings such as the alpha value cannot globally be changed from their hardcoded values.
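
A minimal sketch of the intended merge order, assuming the defaults live in a plain dictionary. The name `DEFAULT_HIST_KWARGS`, its values and the helper `merge_hist_kwargs` are hypothetical and not taken from the lyscripts code base. The point is only that the defaults are applied first and the user's settings last:

```python
# Hypothetical defaults; the actual hardcoded values in lyscripts may differ.
DEFAULT_HIST_KWARGS = {"density": True, "histtype": "stepfilled", "alpha": 0.7}


def merge_hist_kwargs(user_kwargs=None):
    """Return histogram settings in which user-supplied values win over defaults."""
    merged = DEFAULT_HIST_KWARGS.copy()  # start from the defaults ...
    merged.update(user_kwargs or {})     # ... then let the caller override them
    return merged


print(merge_hist_kwargs({"alpha": 0.3}))
```

Updating in the opposite order (user settings first, defaults last) is what makes values such as alpha impossible to change.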

add command to preprocess raw data into LyProX format

When we receive raw data from our collaborators, it needs to be preprocessed before we can upload it to LyProX. A dedicated, general script would be useful for that: one that only needs some column mapping instructions to convert any given raw dataset.

colliding modalities

Both the enhance script and the sample script (and maybe some more) access the key modalities in the params.yaml file. However, they use it for different purposes: The former combines all defined modalities into "consensus" diagnoses, while the sampling program uses all defined modalities for inference.

This clash needs to be resolved. An idea would be to use different lists of modalities for the different scripts or have it provided to the scripts via an optional argument.

refactor scripts into library

I would like to refactor the scripts as they are into libraries of atomic, reusable functions that I can then again combine into versatile and declarative scripts.

Compatibility with lymph v1.0

Make this package compatible with version 1.0 of lymph-model.

This will represent a major version change!

add precompute commands

For speedier computation of risks and prevalences, it can make sense to precompute prior and posterior state distributions. The new lymph API allows that via the methods state_dist() and posterior_state_dist(). Thus, this package should take advantage of that.

`enhance` command not deterministic

The output of the lyscripts data enhance command is not fully deterministic: The order of the columns varies from run to run. As far as I can tell, the content remains the same. Nonetheless, this is annoying and should be fixed.

wrapped function in `rich` context

When a function that is wrapped in the report_state decorator gets called inside a report.status context, a rich.errors.LiveError gets raised. It complains about two "live displays" running at the same time.

So, I should make sure that all such wrapped functions stand on their own and not inside rich contexts.

prevalence raises error about midext when using `Unilateral`

When running the prevalence prediction with the Unilateral model, an error is raised that says the information about the midline extension is missing.

logical OR consensus seems wrong

Comparing the results of the logical OR consensus computed by lyscripts to what LyProX does, I see quite a few differences.

Create docs by version

Right now, the documentation only ever exists for the latest version. But it would be helpful to look at the docs of earlier versions, too. Maybe I can adapt the respective GitHub action to enable that.

See here for some ideas.

implement logging

It would be great to have the ability to log progress and intermediate results in a file. I think this should be quite straightforward with the use of rich, but extending this would also be nice.

make sampling deterministic

Use numpy's seed function before starting the sampling rounds, so that providing the same seed value reproduces the same sampling round.

I am not sure this will work, as I think I have tried this before.

wrong indent length in nested markdown docs

The utility function generate_markdown_docs in the lyproxify.py file uses three spaces as depth of indentation for nested lists. This is wrong, it should be four spaces.
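
To illustrate the fix, here is a small sketch of a renderer that indents nested list items by four spaces per level. The function `render_nested_list` is a hypothetical stand-in, not the actual generate_markdown_docs implementation:

```python
def render_nested_list(items, depth=0, indent="    "):
    """Render nested Python lists as markdown bullets, four spaces per level."""
    lines = []
    for item in items:
        if isinstance(item, list):
            # A nested list is rendered one level deeper than its parent.
            lines.extend(render_nested_list(item, depth + 1, indent))
        else:
            lines.append(f"{indent * depth}- {item}")
    return lines


print("\n".join(render_nested_list(["patient", ["age", "sex"], "tumor"])))
```

Keeping the indent as a parameter means the depth of indentation is defined in exactly one place.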

use PyPI version of `rich_argparse`

Since I have started working on this, the package rich_argparse has been published on PyPI. I should use this and maybe add some custom syntax highlighting.

Filter command

It could be useful to have a command that filters datasets based on some common features. E.g., filter based on tumor location, subsite, T-category, ...

Exporting histograms & plots to HDF5

Right now lyscripts contains utilities and commands to compute predicted and observed distributions over prevalences and risks. It also defines functions to plot these computed values, e.g. when the computations have been stored as HDF5 files. What is missing are methods to export these plots so that they can be re-read later. This is sort of the missing link to efficiently use lyscripts as a library for computing and plotting histograms over risks and prevalences.

`enhance` fails when data is already enhanced

When an input file already contains e.g. the column max_llh and one wants to (re)compute max_llh, an exception is raised when concatenating.

convergence sampling does not thin

When sampling until convergence, the script does not thin out the chain by e.g. keeping only every fifth sample. This leads to repeated values when a new proposal for a walker gets rejected multiple times and hence decreases the statistical power of subsequently computed values.

Ideally, the thin_by parameter that is read from the parameter YAML file and used in the TI procedure should also apply to the convergence sampling procedure.
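
As a rough sketch of what thinning means here, assuming the chain is available as a plain sequence of samples (the helper `thin_chain` is hypothetical and only borrows the name of the thin_by parameter):

```python
def thin_chain(chain, thin_by=1):
    """Keep only every `thin_by`-th step of a chain of samples."""
    if thin_by < 1:
        raise ValueError("thin_by must be a positive integer")
    return chain[::thin_by]


# Rejected proposals show up as repeated values in the raw chain.
chain = [0.1, 0.1, 0.1, 0.4, 0.4, 0.2, 0.2, 0.2, 0.5, 0.5]
print(thin_chain(chain, thin_by=5))
```

Applying the same helper (or the same parameter) in both the TI and the convergence procedure would keep the two consistent.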

sampling is uninformative

There are several issues with the information provided by the sampling script:

During burn-in, the progress display implies that it knows how long the sampling will take, but it doesn't. It would be better if it conveyed to the user that it samples until convergence.
There's always a warning about not finding the attribute _random in the ConvenienceSampler. I think this is due to a recent update of the emcee package.
When performing a thermodynamic integration, it only reports the value of the beta parameter. Providing acceptance rates, or maybe even the mean and standard deviation of the sampled parameters, would also be super helpful.

update type hints to Python 3.10

I am still using the old syntax for type hints. I can replace the typing type hints with the built-in version of Python 3.10 now.

split functionality into subcommands

The argparse package provides functionality for sub-commands, which I would like to implement, so that I could do stuff like

python -m lyscripts thermoint --args

and stuff like that. Essentially similar to how git has many subcommands.
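
A minimal sketch of how this could look with argparse. The subcommand names, the options --steps and --seed and the placeholder functions are assumptions made up for this example. Each subparser stores its entry point via set_defaults, so the top-level main only has to call it:

```python
import argparse


def run_thermoint(args):
    """Placeholder for the thermodynamic integration entry point."""
    return f"thermoint with {args.steps} steps"


def run_sample(args):
    """Placeholder for the sampling entry point."""
    return f"sample with seed {args.seed}"


def build_parser():
    """Create the top-level parser and register one subparser per command."""
    parser = argparse.ArgumentParser(prog="lyscripts")
    subparsers = parser.add_subparsers(dest="command", required=True)

    thermoint = subparsers.add_parser("thermoint", help="thermodynamic integration")
    thermoint.add_argument("--steps", type=int, default=10)  # hypothetical option
    thermoint.set_defaults(run_main=run_thermoint)

    sample = subparsers.add_parser("sample", help="draw samples")
    sample.add_argument("--seed", type=int, default=42)  # hypothetical option
    sample.set_defaults(run_main=run_sample)

    return parser


def main(argv=None):
    """Parse the arguments and dispatch to the chosen subcommand."""
    args = build_parser().parse_args(argv)
    return args.run_main(args)


print(main(["thermoint", "--steps", "3"]))
```

Passing argv explicitly keeps main testable without touching sys.argv.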

use generators over samples

Instead of complicated custom enumerators inside functions that compute likelihoods, prevalences and risks, I could simply implement them as generators. In this way, I could set up progress bars and whatnot outside the function generating these values.
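
A sketch of the idea, with hypothetical names throughout: `compute_prevalences` stands in for any of the computing functions, `predict` for the model call, and `with_progress` for whatever progress bar gets wrapped around the generator from the outside:

```python
def compute_prevalences(samples, predict):
    """Lazily yield one computed value per sample instead of looping internally."""
    for sample in samples:
        yield predict(sample)


def with_progress(iterable, total, report=print):
    """Hypothetical stand-in for a progress bar around any iterable."""
    for i, value in enumerate(iterable, start=1):
        report(f"{i}/{total}")
        yield value


samples = [1, 2, 3]
values = compute_prevalences(samples, predict=lambda s: s * s)
print(list(with_progress(values, total=len(samples))))
```

The computing function no longer needs to know anything about progress reporting. The caller decides whether and how to display it.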

Traceback (most recent call last):
  File "/opt/anaconda3/envs/lynforigin/bin/lyscripts", line 8, in <module>
    sys.exit(main())
  File "/opt/anaconda3/envs/lynforigin/lib/python3.8/site-packages/lyscripts/__init__.py", line 123, in main
    args.run_main(args)
  File "/opt/anaconda3/envs/lynforigin/lib/python3.8/site-packages/lyscripts/plot/histograms.py", line 124, in main
    draw(
  File "/opt/anaconda3/envs/lynforigin/lib/python3.8/site-packages/lyscripts/plot/utils.py", line 270, in draw
    axes.hist(content.values, **tmp_hist_kwargs)
  File "/opt/anaconda3/envs/lynforigin/lib/python3.8/site-packages/matplotlib/__init__.py", line 1446, in inner
    return func(ax, *map(sanitize_sequence, args), **kwargs)
  File "/opt/anaconda3/envs/lynforigin/lib/python3.8/site-packages/matplotlib/axes/_axes.py", line 6944, in hist
    p._internal_update(kwargs)
  File "/opt/anaconda3/envs/lynforigin/lib/python3.8/site-packages/matplotlib/artist.py", line 1223, in _internal_update
    return self._update_props(
  File "/opt/anaconda3/envs/lynforigin/lib/python3.8/site-packages/matplotlib/artist.py", line 1197, in _update_props
    raise AttributeError(
AttributeError: Polygon.set() got an unexpected keyword argument 'kwargs'

Used command: lyscripts plot histograms models/prevalences.hdf5 plots/hist_prev_ipsiI.png --names ipsiI/early ipsiI/late

Lyscripts version: 0.7.2
Lymph version: 0.4.3

Used files: used_files.zip

Workaround: Use lyscripts version 0.5.11 to plot the histograms.