
dcdfg's People

Contributors

romain-lopez


dcdfg's Issues

MAE computation

Hi DCD-FG devs, I really admire your work on this project. I've been briefly trying it out over the past couple of days. From run_perturbseq_linear.py, I can see how to train it and how to compute log likelihoods on held-out expression. But how do I predict expression? Is there a built-in method to maximize eq. 8 with respect to most entries of X, while holding constant a few entries of X (those under intervention), and the graph structure, and the coefficients? EDIT: Or is there a way to sample non-intervened entries of X? I assume this would be needed to compute the MAE. Thanks!
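For what it's worth, the imputation described above can be framed as gradient ascent on the log-likelihood over the free entries of X only, with the gradient zeroed at the intervened entries. Below is a minimal numpy sketch on a toy linear-Gaussian SEM — `impute_free_entries` and the toy likelihood are hypothetical stand-ins, not DCD-FG's actual model or API:

```python
import numpy as np

def impute_free_entries(x_init, W, fixed_mask, lr=0.05, steps=500):
    """Maximize a toy linear-SEM Gaussian log-likelihood,
    log p(x) proportional to -0.5 * ||(I - W) x||^2,
    over the entries of x where fixed_mask is False, holding
    the entries where fixed_mask is True (interventions) fixed.
    """
    d = len(x_init)
    A = np.eye(d) - W
    x = x_init.astype(float).copy()
    for _ in range(steps):
        grad = -A.T @ (A @ x)   # gradient of the toy log-likelihood
        grad[fixed_mask] = 0.0  # never move intervened entries
        x += lr * grad
    return x
```

For example, with a graph encoding x0 ← 0.5·x1 and x1 clamped at 2.0 by an intervention, the free entry x0 converges to the conditional mode 1.0.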

Fig 5 repro attempt: logging details and augmented Lagrangian params

Hi DCD-FG devs! With a coworker, I am trying to reproduce parts of fig. 5 from the DCD-FG paper. Here are our results from the interferon condition.

[Figure: bar plot of per-regime test-set negative log likelihood contributions, wide spread; DCD-FG (green, bottom) beats but overlaps the linear baselines]

The relative performance matches fig. 5:

[Figure: bar plot of per-regime test-set negative log likelihood contributions, narrow spread; DCD-FG (red, top) wins cleanly]

But our median NLL is lower, and the spread much higher, than in fig. 5. We want to check two details:

  • Is each dot in fig. 5 the negative log likelihood for a single cell in the test data, an average of single-cell NLLs within an interventional condition, or an average over all test-set cells for one of many data splits? We are currently logging the second (average within condition).
  • We hit NaNs during mlplr training. Three different strategies have let us finish training:
    1. cap the mlplr training at 200 epochs,
    2. uncomment the lines affecting the augmented Lagrangian parameter updates (model L150 through L181), or
    3. switch to double precision.

My colleague's example figure (above) caps mlplr training at 200 epochs, but we will likely switch to option 2 (uncommenting) as our default. What would you recommend?
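A note on strategy 3: double precision tends to help here because float32 saturates much earlier than float64, so large intermediate quantities (e.g. exponentials inside a likelihood) can overflow to inf and then propagate NaNs through training. A minimal numpy illustration of the mechanism, unrelated to the DCD-FG code itself:

```python
import numpy as np

# exp(90) is about 1.2e39, which exceeds the float32 max (~3.4e38)
with np.errstate(over="ignore"):
    f32 = np.exp(np.float32(90.0))  # overflows to inf in float32
f64 = np.exp(np.float64(90.0))      # finite in float64
```

Once a value like `f32` becomes inf, downstream subtractions (inf - inf) produce the NaNs observed during training.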

Thank you very much!

Some installation notes

Hi! I'm not sure where to document this, but I've been trying to run this codebase, and some installation issues + fixes I encountered were:

  1. Python 3.9 appears to be required to install the pinned version of PyTorch; Python 3.10 throws an error (at least when pip-installing the requirements file).
  2. wandb pulled in protobuf 4.x, which breaks under Python 3.9. The error message suggested downgrading to protobuf 3.20.x, which fixed the issue.

Running the code:

  1. You must pass --folder when running python make_lowrank_dataset.py; the command fails without arguments.
  2. run_gaussian uses data/simulated as the root and appends your specified data_dir to it. In contrast, run_perturbseq_linear uses data_dir as the root and data_path as the dataset "name."
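To make note 2 concrete, here is a sketch of the two path-building conventions as I understand them — the variable names mirror the flags, but the directory values are placeholders, not real datasets:

```python
from pathlib import PurePosixPath

# run_gaussian: your data_dir is appended under the fixed root
root = PurePosixPath("data/simulated")
data_dir = "my_experiment"                 # placeholder value
gaussian_path = root / data_dir            # data/simulated/my_experiment

# run_perturbseq_linear: data_dir is itself the root,
# and data_path is the dataset "name" appended to it
data_path = "my_dataset"                   # placeholder value
perturbseq_path = PurePosixPath(data_dir) / data_path  # my_experiment/my_dataset
```

So the same --data_dir value points at two different levels of the directory tree depending on which script you run.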

Will update this if I come up with more notes :)
