ssps's People

Contributors

agitter, dpmerrell

ssps's Issues

Add test case for expected my_predictions.csv

The first test case we create can be an integration test for my_predictions.csv. If we add the expected output to tests/expected/my_predictions.csv or somewhere similar, we can compare the version created during continuous integration against the version stored in the repository.

Our SINGE test cases use the csvdiff Python program to compare a predicted and reference edge list. It can tolerate floating point differences due to stochastic algorithms. We could use that approach here as well.
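As a rough illustration of the tolerance-based comparison described above, here is a minimal standard-library sketch. It is not csvdiff itself. It assumes a headerless three-column layout (node1, node2, score), and the function names and the tolerance value are hypothetical; the real my_predictions.csv layout may differ.

```python
import csv
import io
import math


def read_edges(handle):
    """Read rows of (node1, node2, score) into a dict keyed by the edge."""
    edges = {}
    for row in csv.reader(handle):
        if not row:
            continue
        node1, node2, score = row
        edges[(node1, node2)] = float(score)
    return edges


def edges_match(expected, observed, abs_tol=1e-2):
    """Return True when both edge sets agree and every score is within abs_tol."""
    if expected.keys() != observed.keys():
        return False
    return all(
        math.isclose(expected[edge], observed[edge], abs_tol=abs_tol)
        for edge in expected
    )


# Hypothetical file contents standing in for the stored and CI-generated files.
expected = read_edges(io.StringIO("A,B,0.90\nB,C,0.10\n"))
observed = read_edges(io.StringIO("A,B,0.905\nB,C,0.098\n"))
print(edges_match(expected, observed))
```

In a real test the two StringIO objects would be replaced by open file handles, and the tolerance would need to be chosen based on how much the scores vary between runs.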

Installation and usability issues Oct 2022

@evangorstein reported that the conda instructions in the readme give an UnsatisfiableError at the line conda install -c bioconda -c conda-forge snakemake. I was not able to reproduce this on my system, but I am not surprised because the current instructions' behavior could be quite dependent on how Python and Anaconda are configured locally.

There are a few ways to improve our current setup:

  1. Installing all dependencies in one step so that conda can avoid conflicts
  2. Pinning versions of essential packages
  3. Switching to snakemake-minimal instead of snakemake

For 1), I propose this new command that would create a new environment and install the required packages all at once:

conda create -n ssps -c conda-forge pandas matplotlib numpy bioconda::snakemake-minimal

@evangorstein does that work on your system? If it does, I'll update the readme.

For 2), I don't think we need to make updates now. The conda environment is primarily used to run the Snakemake workflow and generate figures. It should not be too dependent on specific package versions. In case we need to pin versions later, here are the versions known to work together and work with our code in my local environment:

conda list output
# Name                    Version                   Build  Channel
appdirs                   1.4.3                      py_1    conda-forge
attrs                     19.3.0                     py_0    conda-forge
brotlipy                  0.7.0           py37h4ab8f01_1000    conda-forge
ca-certificates           2020.4.5.1           hecc5488_0    conda-forge
certifi                   2020.4.5.1       py37hc8dfbb8_0    conda-forge
cffi                      1.14.0           py37ha419a9e_0    conda-forge
chardet                   3.0.4           py37hc8dfbb8_1006    conda-forge
configargparse            1.2.3              pyh9f0ad1d_0    conda-forge
cryptography              2.9.2            py37h26f1ce3_0    conda-forge
datrie                    0.8.2            py37h8055547_0    conda-forge
decorator                 4.4.2                      py_0    conda-forge
docutils                  0.16             py37hc8dfbb8_1    conda-forge
gitdb                     4.0.5                      py_0    conda-forge
gitpython                 3.1.2                      py_0    conda-forge
idna                      2.9                        py_1    conda-forge
importlib-metadata        1.6.0            py37hc8dfbb8_0    conda-forge
importlib_metadata        1.6.0                         0    conda-forge
intel-openmp              2020.0                      166
ipython_genutils          0.2.0                      py_1    conda-forge
jsonschema                3.2.0            py37hc8dfbb8_1    conda-forge
jupyter_core              4.6.3            py37hc8dfbb8_1    conda-forge
libblas                   3.8.0                    15_mkl    conda-forge
libcblas                  3.8.0                    15_mkl    conda-forge
liblapack                 3.8.0                    15_mkl    conda-forge
mkl                       2020.0                      166
nbformat                  5.0.6                      py_0    conda-forge
numpy                     1.18.4           py37hae9e721_0    conda-forge
openssl                   1.1.1g               he774522_0    conda-forge
pandas                    1.0.3            py37h3bbf574_1    conda-forge
pip                       20.1               pyh9f0ad1d_0    conda-forge
psutil                    5.7.0            py37h8055547_1    conda-forge
pycparser                 2.20                       py_0    conda-forge
pyopenssl                 19.1.0                     py_1    conda-forge
pyrsistent                0.16.0           py37h8055547_0    conda-forge
pysocks                   1.7.1            py37hc8dfbb8_1    conda-forge
python                    3.7.6           h60c2a47_5_cpython    conda-forge
python-dateutil           2.8.1                      py_0    conda-forge
python_abi                3.7                     1_cp37m    conda-forge
pytz                      2020.1             pyh9f0ad1d_0    conda-forge
pywin32                   227              py37hfa6e2cd_0    conda-forge
pyyaml                    5.3.1            py37h8055547_0    conda-forge
ratelimiter               1.2.0                 py37_1000    conda-forge
requests                  2.23.0             pyh8c360ce_2    conda-forge
setuptools                46.3.1           py37hc8dfbb8_0    conda-forge
six                       1.14.0                     py_1    conda-forge
smmap                     3.0.4              pyh9f0ad1d_0    conda-forge
snakemake-minimal         5.17.0                     py_0    bioconda
sqlite                    3.30.1               hfa6e2cd_0    conda-forge
toposort                  1.5                        py_3    conda-forge
traitlets                 4.3.3            py37hc8dfbb8_1    conda-forge
urllib3                   1.25.9                     py_0    conda-forge
vc                        14.1                 h869be7e_1    conda-forge
vs2015_runtime            14.16.27012          h30e32a0_2    conda-forge
wheel                     0.34.2                     py_1    conda-forge
win_inet_pton             1.1.0                    py37_0    conda-forge
wincertstore              0.2                   py37_1003    conda-forge
wrapt                     1.12.1           py37h8055547_1    conda-forge
yaml                      0.2.4                he774522_0    conda-forge
zipp                      3.1.0                      py_0    conda-forge

For 3), I suggest we switch to snakemake-minimal to avoid the large number of dependencies that are required to support cloud computing with Snakemake. We can make installing the snakemake package an optional second step along with cookiecutter.

Native parallelism

SSPS should be capable of generating multiple chains in parallel without relying on external wrappers (e.g., Snakemake).

This could probably be done via Julia's @threads macro.

Reorganize Julia code into a package

Right now the Julia code resides in a directory with an environment defined by *.toml files.

It would be good if we began organizing the code into a proper Julia package.

Tracking SSPS versions

We should add a version to the SSPS codebase. This will help report which version of the code was used in manuscripts and track changes between releases.

I haven't looked into best practices for Julia package versions.

Clarifying installation instructions

I followed the installation instructions on a Windows 10 machine with Git Bash (an environment known to cause issues for some software). I ran into a few Julia dependency problems and have other install-related comments.

julia> Pkg.instantiate()
ERROR: `BSON` is a direct dependency, but does not appear in the manifest. If you intend `BSON` to be a direct dependency, run `Pkg.resolve()` to populate the manifest. Otherwise, remove `BSON` with `Pkg.rm("BSON")`. Finally, run `Pkg.instantiate()` again.

Following those instructions:

julia> Pkg.resolve()
   Cloning default registries into `C:\Users\agitter\.julia`
   Cloning registry from "https://github.com/JuliaRegistries/General.git"
     Added registry `General` to `C:\Users\agitter\.julia\registries\General`
 Resolving package versions...
ERROR: path C:\Users\agitter\.julia\packages\Gen\eQpFO for package Gen no longer exists. Remove the package or `develop` it at a new path

If these could be Windows-related errors, I'll try on a Linux machine instead.

We can also update $ cd graph-ppl/julia-project to $ cd ssps/julia-project (and the 2 other readme instances of graph-ppl).

Because the standalone SSPS run still uses Snakemake, should we insert an optional new step 3 between the current steps 2 and 3 of "Running SSPS"? It would say: "If you don't have the Python and Snakemake dependencies, see below for installation instructions."

SSPS output format

For one-off SSPS runs, we need to convert the output JSON files to a tab-delimited file with one row for each edge.

e.g.,
node1 (tab) node2 (tab) score
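A conversion along these lines might look like the sketch below. The JSON schema is an assumption: this issue does not describe the actual SSPS output structure, so the "edges" key and the triple layout are hypothetical placeholders to be replaced with the real fields.

```python
import csv
import io
import json


def json_to_tsv(json_text, out_handle):
    """Write one tab-delimited row per edge: node1, node2, score."""
    data = json.loads(json_text)
    writer = csv.writer(out_handle, delimiter="\t", lineterminator="\n")
    # "edges" is a hypothetical key; adjust to the actual SSPS JSON schema.
    for node1, node2, score in data["edges"]:
        writer.writerow([node1, node2, score])


# Hypothetical input standing in for an SSPS output file.
example = '{"edges": [["A", "B", 0.9], ["B", "C", 0.1]]}'
buffer = io.StringIO()
json_to_tsv(example, buffer)
print(buffer.getvalue(), end="")
```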

Upgrade Julia & package versions

Julia has gone from v1.2 to v1.4 in the months since starting this project.

Many of the packages have also released updates in that time.

It's probably best to update the Julia version and *.toml files.

Compiling SSPS binaries

Making this issue for documentation purposes.

It would be nice to distribute SSPS as a binary. Users wouldn't need to fiddle with Julia dependencies at all.

Windows compatibility

On Windows 10, I was unable to install snakemake through conda. No appropriate Windows package versions were available in the bioconda and conda-forge channels for all of the dependencies. Switching to snakemake-minimal worked and suffices for local execution. pandas must be explicitly installed in that conda environment as well because it is no longer a dependency.

I should confirm that conda install -c bioconda -c conda-forge snakemake-minimal pandas creates a minimal working environment for SSPS. Then, we can update the readme.

After #8, I no longer have errors with Pkg.instantiate(). The example in run_ssps appears to run correctly and gives the expected output.

Failing Travis CI build

The latest Travis CI job failed. Part of the error message is:

rule run_mcmc:
    input: ../SSPS/ssps_wrapper.jl, example_timeseries.csv, example_prior.csv
    output: temp/3.json
    jobid: 6
    wildcards: chain=3
    resources: runtime=70, threads=1, mem_mb=2000
Invoking SSPS on input files:
	example_timeseries.csv
	example_prior.csv
Sampling for 60.0 seconds. (Or 100000 iterations.)
ERROR: LoadError: MethodError: no method matching ##lambda_vec_proposal#450(::Gen.GFProposeState, ::getfield(Main.SSPS, Symbol("##StaticIRTrace_vertex_lambda_dbn_model#445")), ::Int64, ::Float64)
Closest candidates are:
  ##lambda_vec_proposal#450(::Any, ::Any, ::Int64, !Matched::Int64) at /home/travis/build/gitter-lab/ssps/SSPS/src/dbn_proposals.jl:246

Comparing the last successful job with the failed job, there are some differences in installed packages:

$ diff sorted_success.txt sorted_failure.txt
1,2c1,3
<  Installed ArgParse ──────────────────── v1.1.0
<  Installed BinDeps ───────────────────── v1.0.1
---
>  Installed ArgParse ──────────────────── v1.1.1
>  Installed Arpack ────────────────────── v0.3.2
>  Installed BinDeps ───────────────────── v1.0.2
5,6c6,7
<  Installed CSV ───────────────────────── v0.7.7
<  Installed CategoricalArrays ─────────── v0.8.1
---
>  Installed CSV ───────────────────────── v0.8.2
>  Installed CategoricalArrays ─────────── v0.8.3
9,11c10,12
<  Installed Compat ────────────────────── v2.2.0
<  Installed DataAPI ───────────────────── v1.3.0
<  Installed DataFrames ────────────────── v0.21.6
---
>  Installed Compat ────────────────────── v2.2.1
>  Installed DataAPI ───────────────────── v1.4.0
>  Installed DataFrames ────────────────── v0.21.8
14,18c15,19
<  Installed DiffResults ───────────────── v1.0.2
<  Installed DiffRules ─────────────────── v1.0.1
<  Installed Distributions ─────────────── v0.23.9
<  Installed FillArrays ────────────────── v0.9.2
<  Installed ForwardDiff ───────────────── v0.10.12
---
>  Installed DiffResults ───────────────── v1.0.3
>  Installed DiffRules ─────────────────── v1.0.2
>  Installed Distributions ─────────────── v0.22.6
>  Installed FillArrays ────────────────── v0.8.14
>  Installed ForwardDiff ───────────────── v0.10.14
22c23
<  Installed Gen ───────────────────────── v0.3.5
---
>  Installed Gen ───────────────────────── v0.4.1
26,27c27,28
<  Installed JSON ──────────────────────── v0.21.0
<  Installed LRUCache ──────────────────── v1.1.0
---
>  Installed JSON ──────────────────────── v0.21.1
>  Installed LRUCache ──────────────────── v1.2.0
29c30
<  Installed LibExpat ──────────────────── v0.5.0
---
>  Installed LibExpat ──────────────────── v0.5.0
31,35c32,36
<  Installed MacroTools ────────────────── v0.5.5
<  Installed Missings ──────────────────── v0.4.3
<  Installed NaNMath ───────────────────── v0.3.4
<  Installed OrderedCollections ────────── v1.3.0
<  Installed PDMats ────────────────────── v0.10.0
---
>  Installed MacroTools ────────────────── v0.5.6
>  Installed Missings ──────────────────── v0.4.4
>  Installed NaNMath ───────────────────── v0.3.5
>  Installed OrderedCollections ────────── v1.3.2
>  Installed PDMats ────────────────────── v0.9.12
37c38,39
<  Installed Parsers ───────────────────── v1.0.10
---
>  Installed Parameters ────────────────── v0.12.1
>  Installed Parsers ───────────────────── v1.0.15
39c41
<  Installed QuadGK ────────────────────── v2.4.0
---
>  Installed QuadGK ────────────────────── v2.4.1
41c43
<  Installed ReverseDiff ───────────────── v1.4.2
---
>  Installed ReverseDiff ───────────────── v1.5.0
43c45
<  Installed SentinelArrays ────────────── v1.2.10
---
>  Installed SentinelArrays ────────────── v1.2.16
46,48c48,51
<  Installed StaticArrays ──────────────── v0.12.4
<  Installed StatsBase ─────────────────── v0.33.0
<  Installed StatsFuns ─────────────────── v0.9.5
---
>  Installed StaticArrays ──────────────── v1.0.1
>  Installed StatsBase ─────────────────── v0.32.2
>  Installed StatsFuns ─────────────────── v0.9.6
>  Installed StructTypes ───────────────── v1.2.1
50c53
<  Installed Tables ────────────────────── v1.0.5
---
>  Installed Tables ────────────────────── v1.2.2
52a56
>  Installed UnPack ────────────────────── v1.0.2

Allow SSPS to "resume" sampling

SSPS is fundamentally an MCMC procedure. It generates samples.

Sometimes it doesn't generate enough samples, but we don't know this until after the fact (i.e., we compute convergence diagnostics and see that it hasn't converged).

It would be nice if SSPS could resume sampling on existing Markov chains.
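To illustrate what resuming would require, the toy sketch below checkpoints both the current chain position and the random number generator state, then continues from the checkpoint. It is a generic random-walk Metropolis sampler on a standard normal target, not the SSPS sampler; all names are hypothetical, and the pickle round trip stands in for a checkpoint file on disk.

```python
import math
import pickle
import random


def sample(state, n_steps):
    """Advance a random-walk Metropolis chain targeting a standard normal."""
    rng = random.Random()
    rng.setstate(state["rng"])
    x = state["x"]
    samples = []
    for _ in range(n_steps):
        proposal = x + rng.uniform(-1.0, 1.0)
        # Accept with probability min(1, p(proposal) / p(x)).
        log_ratio = 0.5 * (x * x - proposal * proposal)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = proposal
        samples.append(x)
    return samples, {"x": x, "rng": rng.getstate()}


def new_state(seed):
    """Initial chain state: starting position plus seeded generator state."""
    return {"x": 0.0, "rng": random.Random(seed).getstate()}


# Run 100 steps, checkpoint, then resume for 100 more.
first, checkpoint = sample(new_state(seed=1), 100)
restored = pickle.loads(pickle.dumps(checkpoint))  # stand-in for a file on disk
second, _ = sample(restored, 100)

# An uninterrupted 200-step run produces the same chain.
full, _ = sample(new_state(seed=1), 200)
print(first + second == full)
```

Saving the generator state is what makes the resumed chain identical to an uninterrupted one. Saving only the last sample would still give a valid continuation, just not a reproducible one.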
