
dcdfg's People

Contributors

romain-lopez


dcdfg's Issues

MAE computation

Hi DCD-FG devs, I really admire your work on this project. I've been briefly trying it out over the past couple of days. From run_perturbseq_linear.py, I can see how to train it and how to compute log likelihoods on held-out expression. But how do I predict expression? Is there a built-in method to maximize eq. 8 with respect to most entries of X, while holding constant a few entries of X (those under intervention), and the graph structure, and the coefficients? EDIT: Or is there a way to sample non-intervened entries of X? I assume this would be needed to compute the MAE. Thanks!
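For what it's worth, the imputation described above can be framed as gradient ascent on the log-likelihood over the free entries of X only, with the gradient zeroed at the intervened entries. Below is a minimal numpy sketch on a toy linear-Gaussian SEM — `impute_free_entries` and the toy likelihood are hypothetical stand-ins, not DCD-FG's actual model or API:

```python
import numpy as np

def impute_free_entries(x_init, W, fixed_mask, lr=0.05, steps=500):
    """Maximize a toy linear-SEM Gaussian log-likelihood,
    log p(x) proportional to -0.5 * ||(I - W) x||^2,
    over the entries of x where fixed_mask is False, holding
    the entries where fixed_mask is True (interventions) fixed.
    """
    d = len(x_init)
    A = np.eye(d) - W
    x = x_init.astype(float).copy()
    for _ in range(steps):
        grad = -A.T @ (A @ x)   # gradient of the toy log-likelihood
        grad[fixed_mask] = 0.0  # never move intervened entries
        x += lr * grad
    return x
```

For example, with a graph encoding x0 ← 0.5·x1 and x1 clamped at 2.0 by an intervention, the free entry x0 converges to the conditional mode 1.0.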

Fig 5 repro attempt: logging details and augmented Lagrangian params

Hi DCD-FG devs! With a coworker, I am trying to reproduce parts of fig. 5 from the DCD-FG paper. Here are our results from the interferon condition.

[Figure: bar plot of per-regime test-set negative log likelihood contributions, wide spread; DCD-FG (green, bottom) beats but overlaps the linear baselines]

The relative performance matches fig. 5:

[Figure: bar plot of per-regime test-set negative log likelihood contributions, narrow spread; DCD-FG (red, top) wins cleanly]

But our median NLL is lower, and the spread much higher, than in fig. 5. We want to check two details:

  • Is each dot in fig. 5 the negative log likelihood for a single cell in the test data, an average of single-cell NLLs within an interventional condition, or an average over all test-set cells for one of many data splits? We are currently logging the second (average within condition).
  • We hit NaNs during mlplr training. Three different strategies have let us finish training:
    1. cap the mlplr training at 200 epochs,
    2. uncomment the lines affecting the augmented Lagrangian parameter updates (model L150 through L181), or
    3. switch to double precision.

My colleague's example figure (above) caps mlplr training at 200 epochs, but we will likely switch to option 2 (uncommenting) as our default. What would you recommend?
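A note on strategy 3: double precision tends to help here because float32 saturates much earlier than float64, so large intermediate quantities (e.g. exponentials inside a likelihood) can overflow to inf and then propagate NaNs through training. A minimal numpy illustration of the mechanism, unrelated to the DCD-FG code itself:

```python
import numpy as np

# exp(90) is about 1.2e39, which exceeds the float32 max (~3.4e38)
with np.errstate(over="ignore"):
    f32 = np.exp(np.float32(90.0))  # overflows to inf in float32
f64 = np.exp(np.float64(90.0))      # finite in float64
```

Once a value like `f32` becomes inf, downstream subtractions (inf - inf) produce the NaNs observed during training.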

Thank you very much!

Some installation notes

Hi! I'm not sure where to document this, but I've been trying to run this codebase, and some installation issues + fixes I encountered were:

  1. Python 3.9 appears to be required to install the pinned version of PyTorch; Python 3.10 throws an error (at least when pip-installing the requirements file).
  2. wandb pulled in protobuf 4.x, which breaks under Python 3.9. The error message suggested downgrading to protobuf 3.20.x, which fixed the issue.

Running the code:

  1. You must pass --folder when running python make_lowrank_dataset.py; the command fails without arguments.
  2. run_gaussian uses data/simulated as the root and appends your specified data_dir to it. In contrast, run_perturbseq_linear uses data_dir as the root and data_path as the dataset "name."
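To make note 2 concrete, here is a sketch of the two path-building conventions as I understand them — the variable names mirror the flags, but the directory values are placeholders, not real datasets:

```python
from pathlib import PurePosixPath

# run_gaussian: your data_dir is appended under the fixed root
root = PurePosixPath("data/simulated")
data_dir = "my_experiment"                 # placeholder value
gaussian_path = root / data_dir            # data/simulated/my_experiment

# run_perturbseq_linear: data_dir is itself the root,
# and data_path is the dataset "name" appended to it
data_path = "my_dataset"                   # placeholder value
perturbseq_path = PurePosixPath(data_dir) / data_path  # my_experiment/my_dataset
```

So the same --data_dir value points at two different levels of the directory tree depending on which script you run.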

Will update this if I come up with more notes :)
