libpetab-python's People

Contributors

cthoyt, dantongwang, dependabot[bot], dilpath, dweindl, elbaraim, erikadudki, fbergmann, ffroehlich, janhasenauer, jvanhoefer, larafuhrmann, lcontento, leonardschmiester, loosc, m-philipps, merktsimon, paulflang, pauljonasjost, paulstapor, plakrisenko, yannikschaelte

libpetab-python's Issues

Simulation is not plotted correctly using visualization specification

Hey,

I am using PEtab in the develop branch on a WSL.

I tried to plot some simulations and experimental data using the visualization specification.
These are the files I used:
issue.zip

and here is the code:

from petab.visualize import plot_data_and_simulation
import matplotlib.pyplot as plt


ax = plot_data_and_simulation(data_file_path,
                              condition_file_path,
                              visualization_file_path,
                              simulation_file_path)

It returns ([screenshot: wrong]):

The simulations are not correctly plotted.

If lines 138 and 139 in PEtab/petab/visualize/plot_data_and_simulation.py are commented out, the simulations are plotted correctly ([screenshot: correct]).

I am not sure what lines 138 and 139 are for, but at least for my model they seem to be problematic, although I had no problems re-plotting the Isensee and Fujita test models locally.

Errorbars not directly plotted

Good morning,

I am using PEtab in the develop branch on a WSL.

I tried to plot the experimental data with error bars using the visualization specification.
These are the files I used:
issue.zip

and here is the code:

from petab.visualize import plot_data_and_simulation
import matplotlib.pyplot as plt


ax = plot_data_and_simulation(data_file_path,
                              condition_file_path,
                              visualization_file_path,
                              simulation_file_path)

The noise for each data point is given directly in the measurement table, so I chose plotTypeData = provided in the visualization specification file. Still, no error bars appear ([screenshot: wrong]).

To plot the error bars, plotted_noise='provided' has to be passed to plot_data_and_simulation ([screenshot: correct]).

Error bars for e.g. plotTypeData = MeanAndSD are plotted directly, without any additional argument having to be set. It would be nice to implement this for provided as well, or otherwise to mention in the example notebook/documentation that the plotted_noise argument needs to be set.

Add: Allow plotting by observableID & simCondID at the same time

In some cases it could be useful to plot e.g. a single observable for only a few conditions.
Currently, an error is raised: "Plotting without visualization specification file and datasetId can be performed via grouping by simulation conditions OR observables, but not both. Stopping."

Cleanup visualization docstrings

In petab/visualize/*:

  • Ensure things look fine with sphinx (e.g. cd doc; make html; firefox build/html/index.html)
  • Add typehints and remove types from docstrings
  • Document arguments
  • Document return types

Visualization error if datasetId is not given but yValues is: yValues need to be sorted alphabetically

If datasetId is not given in the visualization-specification file but yValues is, the yValues need to be sorted alphabetically; otherwise the following error is thrown:

Traceback (most recent call last):
  File "/home/erika/Documents/Python/PEtab_my_files/visu_test.py", line 37, in <module>
    simulation_file_path
  File "/home/erika/Documents/env-petabpypesto37/lib/python3.7/site-packages/petab/visualize/plot_data_and_simulation.py", line 136, in plot_data_and_simulation
    plotted_noise)
  File "/home/erika/Documents/env-petabpypesto37/lib/python3.7/site-packages/petab/visualize/helper_functions.py", line 572, in create_or_update_vis_spec
    vis_spec = expand_vis_spec_settings(vis_spec, columns_dict)
  File "/home/erika/Documents/env-petabpypesto37/lib/python3.7/site-packages/petab/visualize/helper_functions.py", line 487, in expand_vis_spec_settings
    vis_spec[select_conditions].loc[:, column].values[0])
IndexError: index 0 is out of bounds for axis 0 with size 0

Process finished with exit code 1

This is because

obs_uni = list(np.unique(exp_data[OBSERVABLE_ID]))

(helper_functions.py, line 446)
sorts observables alphabetically.
It would be good if it were not mandatory for yValues to be sorted alphabetically.
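The underlying behaviour is easy to demonstrate: np.unique always returns its result sorted, regardless of input order, which is why the order of yValues currently has to match.

```python
import numpy as np

# np.unique returns a sorted array, regardless of input order
observables = ["obs_c", "obs_a", "obs_b"]
obs_uni = list(np.unique(observables))
print(obs_uni)  # ['obs_a', 'obs_b', 'obs_c']
```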

Should unspecified optional strings be the empty string or NaN?

At the moment, there can be NaNs (after pd.read_csv) in optional PEtab string columns, such as observableNames, that, if interpreted as a string, are converted to the string literal 'nan'.

>>> import numpy as np
>>> str(np.nan)
'nan'

An issue can occur in the AMICI plotting functions. This issue can be fixed by replacing

elif model.getObservableNames()[iy] != '':

with

elif model.getObservableNames()[iy] not in ['', 'nan']:

to correctly identify unspecified observable names. However, testing for the string 'nan' seems unintuitive, and this fix might cause another issue if an observable is named 'nan'.

Here's a solution that could be implemented in PEtab and might resolve the issue in AMICI.

$ cat test_str.csv
observableId    observableName
a_id    a_name
b_id    
>>> import pandas as pd
>>> df1 = pd.read_csv('test_str.csv', sep='\t')
>>> df2 = pd.read_csv('test_str.csv', sep='\t')
>>> df2['observableName'] = df2['observableName'].fillna('')
>>> df1
  observableId observableName
0         a_id         a_name
1         b_id            NaN
>>> df2
  observableId observableName
0         a_id         a_name
1         b_id               
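An alternative to fillna after reading would be to disable the NaN conversion at read time; a minimal sketch (using an inline table rather than a file) with keep_default_na=False, which makes empty cells come back as empty strings:

```python
import io

import pandas as pd

tsv = "observableId\tobservableName\na_id\ta_name\nb_id\t\n"

# keep_default_na=False leaves empty cells as '' instead of NaN
df = pd.read_csv(io.StringIO(tsv), sep="\t", keep_default_na=False)
print(df["observableName"].tolist())  # ['a_name', '']
```

Note that keep_default_na=False also stops literals like 'NA' or 'NaN' from being parsed as missing in other columns, so fillna('') on specific string columns may be the safer choice.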

pandas 1.2.x requirement?

I just tried installing petab in the Colab environment; unfortunately that failed, since Colab only seems to work with pandas up to 1.1.5. Any chance we could lower our requirement here?

Collecting petab
  Downloading petab-0.1.20-py3-none-any.whl (84 kB)
...
Collecting pandas>=1.2.0
  Downloading pandas-1.3.3-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (11.3 MB)
     |████████████████████████████████| 11.3 MB 12.6 MB/s 
...
Installing collected packages: pandas, colorama, petab
  Attempting uninstall: pandas
    Found existing installation: pandas 1.1.5
    Uninstalling pandas-1.1.5:
      Successfully uninstalled pandas-1.1.5
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
google-colab 1.0.0 requires pandas~=1.1.0; python_version >= "3.0", but you have pandas 1.3.3 which is incompatible.
Successfully installed colorama-0.4.4 pandas-1.3.3 petab-0.1.20
WARNING: The following packages were previously imported in this runtime:
  [pandas]
You must restart the runtime in order to use newly installed versions.

encoding issue while writing out experiment

If, after reading the Sneyd_PNAS2002 model from the benchmark collection, I write it out again using the petab library, I get the following exception:

self = <encodings.cp1252.IncrementalEncoder object at 0x000001DBE8E097B8>
input = 'Ca_dose_response__1\t10 μM IP_3,  0.1 μM Ca^{2+}\t10.0\t0.1\r\r\n'
final = False

    def encode(self, input, final=False):
>       return codecs.charmap_encode(input,self.errors,encoding_table)[0]
E       UnicodeEncodeError: 'charmap' codec can't encode character '\u03bc' in position 23: character maps to <undefined>

cp1252.py:19: UnicodeEncodeError

It turns out the unit contains non-ASCII characters. Since PEtab had no problems reading the file, the writers should probably pass a UTF-8 encoding when writing out the tables, rather than just:

conditions.py:55: in write_condition_df
    df.to_csv(fh, sep='\t', index=True)
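A sketch of the suggested fix, with hypothetical data: pass an explicit UTF-8 encoding to to_csv (or open the file handle with encoding='utf-8') so non-ASCII condition names survive on platforms whose default codec is cp1252.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame(
    {"conditionName": ["10 \u03bcM IP_3"]},
    index=pd.Index(["Ca_dose_response__1"], name="conditionId"))

path = os.path.join(tempfile.mkdtemp(), "conditions.tsv")
# An explicit encoding avoids UnicodeEncodeError on cp1252-default platforms
df.to_csv(path, sep="\t", index=True, encoding="utf-8")

round_trip = pd.read_csv(path, sep="\t", index_col=0, encoding="utf-8")
print(round_trip.loc["Ca_dose_response__1", "conditionName"])  # 10 μM IP_3
```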

Add: plot STD if the noiseParameters column has values and no replicates are available

E.g. the Chen model specifies noiseParameters in the measurement data, and no replicate measurements are available, so the standard deviation cannot be calculated from replicates.
If no measurement replicates are available but the noiseParameters column has values, these values should be plotted as the standard deviation.
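The fallback could be sketched like this (hypothetical data; this assumes the noiseParameters column holds numeric standard deviations rather than parameter names):

```python
import pandas as pd

measurement_df = pd.DataFrame({
    "observableId": ["obs_a", "obs_a"],
    "time": [0.0, 1.0],
    "measurement": [1.2, 2.5],
    "noiseParameters": [0.1, 0.3],  # numeric SDs, no replicates
})

grouped = measurement_df.groupby(["observableId", "time"])["measurement"]
n_replicates = grouped.transform("size")

# Use the replicate SD where replicates exist, otherwise fall back to
# the values given in the noiseParameters column
sd = grouped.transform("std")
sd = sd.where(n_replicates > 1,
              pd.to_numeric(measurement_df["noiseParameters"],
                            errors="coerce"))
print(sd.tolist())  # [0.1, 0.3]
```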

writer functions do not perform sanity checks + use misleading options for writing tsv

When having dedicated writer functions such as petab.conditions.write_condition_df, I would actually expect them to perform some kind of sanity check on the output. Pretty bare wrappers around pd.to_csv don't seem too helpful, especially since the writers use index=True, which is not consistent with the provided spec and, since the readers don't use index_col=0, leaves an 'Unnamed: 0' column in the imported DataFrame.
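A minimal reproduction of the round-trip problem described above:

```python
import io

import pandas as pd

# Unnamed index, as produced when the index name is lost
df = pd.DataFrame({"conditionName": ["c1"]}, index=["cond1"])

buf = io.StringIO()
df.to_csv(buf, sep="\t", index=True)   # what the writers currently do
buf.seek(0)

df_back = pd.read_csv(buf, sep="\t")   # readers do not pass index_col=0
print(df_back.columns.tolist())        # ['Unnamed: 0', 'conditionName']
```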

Missing validity check for simulationConditionId.

Which problem would you like to address? Please describe.
petab.lint.lint_problem does not check whether simulationConditionIds in measurement_df are valid condition Ids.

Describe the solution you would like
petab.lint.lint_problem flags condition ids in measurement_df['simulationConditionId'] that are not defined in the condition table.

Describe alternatives you have considered
Actually specifying valid simulationConditionIds

Additional context
Add any other context about the request here.
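The requested check boils down to a set difference between the two tables; a sketch with hypothetical contents:

```python
import pandas as pd

condition_df = pd.DataFrame(
    {"conditionName": ["a", "b"]},
    index=pd.Index(["cond_a", "cond_b"], name="conditionId"))

measurement_df = pd.DataFrame(
    {"simulationConditionId": ["cond_a", "cond_typo"]})

# Flag simulationConditionIds not defined in the condition table
undefined = (set(measurement_df["simulationConditionId"])
             - set(condition_df.index))
print(undefined)  # {'cond_typo'}
```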

add case in visualization routine: in measurementData file, if same simulationConditionId, but differing preequilibrationId etc.

In file get_data_to_plot.py, see ll. 92-102.
TODO: The following case is not handled yet: if entries in the measurement file (preequilibrationConditionId, time, observableParameters, noiseParameters, observableTransformation, noiseDistribution) differ, the data should be split into separate groups.
Currently, the code searches for the group of rows sharing a unique simulationConditionId. E.g. rows 0, 6, 12, 18 share the same simulationConditionId; the code then checks whether the other column entries are the same (currently they are), takes the intersection of those rows with the matching columns (here again rows 0, 6, 12, 18), and continues.
If the other columns differ at some point, say rows 12 and 18 have different noiseParameters than rows 0 and 6, the current code would take rows 0 and 6 and silently drop rows 12 and 18.
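The grouping described above amounts to grouping by every distinguishing column, not only simulationConditionId; a sketch with hypothetical data:

```python
import pandas as pd

measurement_df = pd.DataFrame({
    "simulationConditionId": ["c1", "c1", "c1", "c1"],
    "noiseParameters": ["sigma1", "sigma1", "sigma2", "sigma2"],
    "measurement": [1.0, 2.0, 3.0, 4.0],
})

# Grouping by all distinguishing columns keeps rows with differing
# noiseParameters in separate groups instead of dropping them
group_cols = ["simulationConditionId", "noiseParameters"]
groups = {key: idx.tolist()
          for key, idx in measurement_df.groupby(group_cols).groups.items()}
print(groups)
# {('c1', 'sigma1'): [0, 1], ('c1', 'sigma2'): [2, 3]}
```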

Implement validation of PEtab visualization files

  • Implement consistency checks
    • plotId
    • [plotName]
    • plotTypeSimulation
    • plotTypeData
    • datasetId
    • [xValues]
    • [xOffset]
    • [xLabel]
    • [xScale]
    • [yValues]
    • [yOffset]
    • [yLabel]
    • [yScale]
    • [legendEntry]
  • Add as command line option to petablint

see also #1
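The column-level part of such a validator could look like this (a sketch; check_vis_spec_columns is a hypothetical helper, with column names taken from the list above):

```python
import pandas as pd

REQUIRED = {"plotId"}
OPTIONAL = {"plotName", "plotTypeSimulation", "plotTypeData", "datasetId",
            "xValues", "xOffset", "xLabel", "xScale",
            "yValues", "yOffset", "yLabel", "yScale", "legendEntry"}

def check_vis_spec_columns(vis_df: pd.DataFrame) -> list:
    """Return a list of human-readable problems (empty means OK)."""
    problems = [f"Missing required column: {col}"
                for col in sorted(REQUIRED - set(vis_df.columns))]
    problems += [f"Unknown column: {col}"
                 for col in sorted(set(vis_df.columns) - REQUIRED - OPTIONAL)]
    return problems

vis_df = pd.DataFrame(columns=["plotId", "typo_column"])
print(check_vis_spec_columns(vis_df))  # ['Unknown column: typo_column']
```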

observables but not species supported in `noiseFormula`

The documentation states that the noiseFormula in the observable table can be specified like so:
noiseParameter1_observable_pErk + noiseParameter2_observable_pErk*pErk

However, when I use a species in the noise formula, I get the following error:

TypeError                                 Traceback (most recent call last)
/media/sf_DPhil_Project/Project07_Parameter Fitting/PEtab/petab/calculate.py in evaluate_noise_formula(measurement, noise_formulas, parameter_df, simulation)
    187     try:
--> 188         noise_value = float(noise_value)
    189     except TypeError:

~/venvs/std/lib/python3.8/site-packages/sympy/core/expr.py in __float__(self)
    324             raise TypeError("can't convert complex to float")
--> 325         raise TypeError("can't convert expression to float")
    326 

TypeError: can't convert expression to float

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-11-8dc41b43303c> in <module>
     34 problem = core.DisFitProblem(PETAB_YAML)
     35 problem.write_jl_file()
---> 36 problem.optimize()
     37 problem.plot_results('c0', path='plot.pdf')
     38 problem.write_results()

/media/sf_DPhil_Project/Project07_Parameter Fitting/df_software/DisFit/DisFit/core.py in optimize(self)
    386         print(self.petab_problem.parameter_df)
    387 
--> 388         self._results['fval'] = -petab.calculate_llh(self.petab_problem.measurement_df.loc[:, cols],
    389             pd.concat([self.petab_problem.simulation_df.rename(columns={'measurement': 'simulation'}), ndf], axis=1),
    390             self.petab_problem.observable_df,

/media/sf_DPhil_Project/Project07_Parameter Fitting/PEtab/petab/calculate.py in calculate_llh(measurement_dfs, simulation_dfs, observable_dfs, parameter_dfs)
    272     for (measurement_df, simulation_df, observable_df, parameter_df) in zip(
    273             measurement_dfs, simulation_dfs, observable_dfs, parameter_dfs):
--> 274         _llh = calculate_llh_for_table(
    275             measurement_df, simulation_df, observable_df, parameter_df)
    276         llhs.append(_llh)

/media/sf_DPhil_Project/Project07_Parameter Fitting/PEtab/petab/calculate.py in calculate_llh_for_table(measurement_df, simulation_df, observable_df, parameter_df)
    314 
    315         # get noise standard deviation
--> 316         noise_value = evaluate_noise_formula(
    317             row, noise_formulas, parameter_df, petab.scale(simulation, scale))
    318 

/media/sf_DPhil_Project/Project07_Parameter Fitting/PEtab/petab/calculate.py in evaluate_noise_formula(measurement, noise_formulas, parameter_df, simulation)
    188         noise_value = float(noise_value)
    189     except TypeError:
--> 190         raise TypeError(
    191             f"Cannot replace all parameters in noise formula {noise_value} "
    192             f"for observable {observable_id}.")

TypeError: Cannot replace all parameters in noise formula 0.1*A + 0.5 for observable obs_a.

When I replace the species with an observable everything works fine.
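The failure can be reproduced with sympy alone: float() raises while a free symbol remains in the expression, and substituting the simulated species value (hypothetical here) resolves it:

```python
import sympy as sp

A = sp.Symbol("A")
noise_formula = sp.sympify("0.1*A + 0.5")

# float() fails while the species symbol A is still unresolved
try:
    float(noise_formula)
except TypeError as err:
    print(err)

# Substituting the simulated species value makes the expression numeric
noise_value = float(noise_formula.subs(A, 2.0))
print(noise_value)  # 0.7
```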

Logic of petab.visualize.helper_functions.get_vis_spec_dependent_columns_dict

So I am currently running into the following issue:

Traceback (most recent call last):
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/pydevd.py", line 1434, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "pretrain_per_sample.py", line 70, in <module>
    petab.visualize.plot_data_and_simulation(
  File "/Users/ffroehlich/Documents/HMS/mechanismEncoder/venv/lib/python3.8/site-packages/petab/visualize/plot_data_and_simulation.py", line 128, in plot_data_and_simulation
    exp_data, vis_spec = create_or_update_vis_spec(exp_data,
  File "/Users/ffroehlich/Documents/HMS/mechanismEncoder/venv/lib/python3.8/site-packages/petab/visualize/helper_functions.py", line 572, in create_or_update_vis_spec
    vis_spec = expand_vis_spec_settings(vis_spec, columns_dict)
  File "/Users/ffroehlich/Documents/HMS/mechanismEncoder/venv/lib/python3.8/site-packages/petab/visualize/helper_functions.py", line 487, in expand_vis_spec_settings
    vis_spec[select_conditions].loc[:, column].values[0])
IndexError: index 0 is out of bounds for axis 0 with size 0

I explicitly specify PLOT_ID but no DATASET_ID in my visualization table (if I do specify DATASET_ID, which is not specified in my measurements_df, petab will just show empty plots, ... bummer).

This means that I end up with https://github.com/PEtab-dev/PEtab/blob/c3d51d6ce98a533c686243dd9ba163276785fc44/petab/visualize/helper_functions.py#L558, which makes no effort at all to respect the previously defined plot_id_list, but instead creates a new set of plot ids of the form plot{i}. Those PLOT_ID of course don't match what's in my vis_spec, which means that the select_conditions in https://github.com/PEtab-dev/PEtab/blob/c3d51d6ce98a533c686243dd9ba163276785fc44/petab/visualize/helper_functions.py#L484 will never match anything.

I am at a loss as to how this is conceptually supposed to work, so I would appreciate it if anybody could enlighten me so that I can fix this issue.

visualization, multiple yValues in one plot

The documentation isn't very explicit about whether this should be only a single value or whether it can be multiple values (and, if so, in which format they should be passed).

plotting petab problem with simulations by observables does not work

Which problem would you like to address? Please describe.
Plotting a petab.Problem together with a simulation_df by observable_ids does not work properly: the simulation is plotted as constant zero.

plot_petab_problem(
    petab_problem,
    sim_data=simulation_df.rename(columns={"measurement": "simulation"}),
    observable_id_list=[["abs_pSCTF"]],
)

leads to:

[screenshot: simulation plotted as constant zero]

Describe the solution you would like
Simulations should be plotted by their values.

petab.visualize.* changes matplotlib rcParams

... this is at least a major inconvenience, as it prevents users from using their own styles, and should therefore be changed.

Currently affects petab/visualize/plotter.py and petab/visualize/helper_functions.py.
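A non-invasive alternative would be to scope any style changes with matplotlib.pyplot.rc_context, so user settings are restored on exit; a minimal sketch:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt

before = plt.rcParams["lines.linewidth"]

# rcParams changes inside rc_context do not leak into the user's session
with plt.rc_context({"lines.linewidth": 5.0}):
    fig, ax = plt.subplots()
    ax.plot([0, 1], [0, 1])
    plt.close(fig)

print(plt.rcParams["lines.linewidth"] == before)  # True
```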

Visualization: if column 'XValues' not provided and multiple conditions should be plotted automatically

If the column xValues is not provided in the visualization table, it should be inferred:

see conversation in PEtab-dev/PEtab#283.

Visualization: cropped error bars

It often happens (in line plots) that the error bars are not completely visible. It seems the y-limits are set based on the measurement or simulation values, ignoring the error bars. This should be changed. It happens at both the lower and the upper end.
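A sketch of a possible fix, with hypothetical data: derive the y-limits from measurement ± error rather than from the measurements alone.

```python
import numpy as np

y = np.array([1.0, 4.0, 2.5])      # measurements
yerr = np.array([0.5, 1.5, 0.2])   # error-bar half-lengths

# Include the error bars when computing the axis limits
lower, upper = (y - yerr).min(), (y + yerr).max()
margin = 0.05 * (upper - lower)
ylim = (float(lower - margin), float(upper + margin))
print(ylim)  # (0.25, 5.75)
```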

Implement validation of CompositeProblem

  • All basic checks via lint_problem
  • basic parameter table checks
  • Special checks related to parameter table dependence on multiple model and measurement and condition files

Visualization: plotTypeData 'replicate' not working

Which problem would you like to address? Please describe.
I try to plot individual replicates of the data.

  1. If I have the column plotTypeData set to replicate in the visualization table, the data is still plotted as MeanAndSTD. You manually have to add plotted_noise='replicate':

plot_data_and_simulation(data_file_path,
                         condition_file_path,
                         visualization_file_path,
                         simulation_file_path,
                         plotted_noise='replicate')

  2. Doing that, an error arises:
Traceback (most recent call last):
  File "/home/erika/Documents/Python/PEtab_my_files/Boehm_visu_test.py", line 21, in <module>
    simulation_file_path, plotted_noise='replicate'
  File "env-petabpypesto37/lib/python3.7/site-packages/petab/visualize/plot_data_and_simulation.py", line 165, in plot_data_and_simulation
    exp_conditions, sim_data)
  File "env-petabpypesto37/lib/python3.7/site-packages/petab/visualize/helper_functions.py", line 765, in handle_dataset_plot
    plot_lowlevel(plot_spec, ax, conditions, measurement_to_plot, plot_sim)
  File "env-petabpypesto37/lib/python3.7/site-packages/petab/visualize/plotting_config.py", line 93, in plot_lowlevel
    conditions[conditions.index.values],
AttributeError: 'numpy.ndarray' object has no attribute 'index'

So conditions seems to have the wrong data type; here it is a numpy array with the time points that should be plotted on the x-axis.

Describe the solution you would like

  1. It would be nice if setting plotTypeData to replicate in the visualization table were enough, without additionally having to pass plotted_noise='replicate' to plot_data_and_simulation. That would be easier for beginners to understand.
  2. Apart from addressing the error, it would be nice if a single data point could also be plotted as a replicate, so that it can be shown with the marker 'x'. For now it has to be plotted with MeanAndSTD, which uses the marker '.', leading to invisible overlap between data points and simulation points ('o').
    [Screenshot from 2021-02-03 19-37-54]

single objectivePriorParameters are cast to numpy.float64

When all objectivePriorParameters entries are either empty or set to a single value, the respective column will be read with dtype numpy.float64, which may cause problems down the line, as the spec says this should be a string. This is not an issue currently, since all currently available priors require two parameters, but it may lead to problems if that changes.
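A workaround at read time could be to force the column to string dtype; a sketch with an inline table:

```python
import io

import pandas as pd

tsv = "parameterId\tobjectivePriorParameters\np1\t0.5\np2\t0.5\n"

df_default = pd.read_csv(io.StringIO(tsv), sep="\t")
print(df_default["objectivePriorParameters"].dtype)  # float64

# Forcing str dtype preserves the spec-mandated string representation
df_str = pd.read_csv(io.StringIO(tsv), sep="\t",
                     dtype={"objectivePriorParameters": str})
print(df_str["objectivePriorParameters"].tolist())  # ['0.5', '0.5']
```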

datasetId not optional for visualization table

Trying to use petab.visualize.plot_data_and_simulation without the column datasetId in the visualization table and without datasetId and preequilibrationConditionId produces the following error (file paths edited), while petablint does not report any error. Attached is a working example as well as the fix. One could make datasetId and preequilibrationConditionId mandatory.
example.zip
@elbaraim

  File "...AMICI_PEtab_simulation.py", line 62, in <module>
    ax = plot_data_and_simulation(exp_data=dir_measurments,
  File "...petab/visualize/plot_data_and_simulation.py", line 128, in plot_data_and_simulation
    exp_data, vis_spec = create_or_update_vis_spec(exp_data,
  File "...petab/visualize/helper_functions.py", line 572, in create_or_update_vis_spec
    vis_spec = expand_vis_spec_settings(vis_spec, columns_dict)
  File   "...petab/visualize/helper_functions.py", line 487, in expand_vis_spec_settings
    vis_spec[select_conditions].loc[:, column].values[0])
IndexError: index 0 is out of bounds for axis 0 with size 0

Python API for linter

A one-line Python API for the linter would be nice, basically calling petablint::main with arguments.
