
dce's Introduction

dce


Compute differential causal effects on (biological) networks. Check out our vignettes for more information.

Publication: https://academic.oup.com/bioinformatics/advance-article-abstract/doi/10.1093/bioinformatics/btab847/6470558

Installation

Install the latest stable version from Bioconductor:

BiocManager::install("dce")

Install the latest development version from GitHub:

remotes::install_github("cbg-ethz/dce")

Project structure

  • .: R package
  • inst/scripts/: Snakemake workflows for all investigations in publication
    • crispr_benchmark: Real-life data validation
    • gtex_validation: Deconfounding validation
    • ovarian_cancer: How does Ovarian Cancer dysregulate pathways?
    • synthetic_benchmark: Synthetic data validation
    • tcga_pipeline: Compute effects for loads of data from TCGA

Development notes

  • Check package locally:
    • Rscript -e "lintr::lint_package()"
    • Rscript -e "devtools::test()"
    • Rscript -e "devtools::check(error_on = 'warning')"
    • R CMD BiocCheck
  • Documentation
    • Build locally: Rscript -e "pkgdown::build_site()"
    • Deploy: Rscript -e "pkgdown::deploy_to_branch(new_process = FALSE)"
  • Bioconductor
    • The bioc branch stores changes specific to Bioconductor releases
    • Update workflow (after git remote add upstream git@git.bioconductor.org:packages/dce.git):
      • git checkout bioc
      • git merge master
      • git push upstream bioc:master

dce's People

Contributors

dcevid, kpj, martinfxp


dce's Issues

Open issues:

Meeting Notes:

  1. better understand difference between separate and joint models (they are actually equivalent: https://books.google.ch/books?id=zyjWBgAAQBAJ&pg=PA137&lpg=PA137&dq=regression+for+to+separate+data+sets+indicator&source=bl&ots=OZiF7M0ShS&sig=ACfU3U3f5mni4Zj7-xY-RdvMsw8eVssHTQ&hl=de&sa=X&ved=2ahUKEwjahYC16e7oAhXM-KQKHR6fCdcQ6AEwAXoECAsQLw#v=onepage&q=regression%20for%20to%20separate%20data%20sets%20indicator&f=false)
  2. try likelihood ratio test model (for single edge)
    • with delta vs without delta (with only one delta?)
  3. log link function may be a better idea
  4. try partial correlation with NB assumption
  5. benchmark: set 90% of ground truth DCEs to 0 (makes setting more biologically relevant, maybe AUC can be used)
  6. where to get DAGs from (KEGG, ...)
  7. simulations: sampling beta and subtracting minimum biases beta (?)
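
Items 1 and 2 can be illustrated with a minimal sketch on toy Gaussian data. This is an assumption made for brevity, since the notes are about count data, and all variable names and effect sizes are made up. The joint model with an interaction term yields the same delta as two separate regressions, and a likelihood ratio test compares the model with delta against the one without it.

```r
set.seed(1)
n <- 200
d <- data.frame(
  group = rep(c(0, 1), each = n),  # 0 = control, 1 = perturbed (toy labels)
  x = rnorm(2 * n, mean = 5)
)
# Toy edge x -> y whose strength differs between groups (true delta = 0.8).
d$y <- 1 + 0.5 * d$x + 0.8 * d$group * d$x + rnorm(2 * n)

# Separate models: one regression per condition.
fit_ctrl <- lm(y ~ x, data = d[d$group == 0, ])
fit_pert <- lm(y ~ x, data = d[d$group == 1, ])
delta_separate <- unname(coef(fit_pert)["x"] - coef(fit_ctrl)["x"])

# Joint model: the interaction coefficient x:group plays the role of delta.
fit_joint <- lm(y ~ x * group, data = d)
delta_joint <- unname(coef(fit_joint)["x:group"])

# Likelihood ratio test: model with delta vs. model without delta.
fit_null <- lm(y ~ x + group, data = d)
lr_stat <- as.numeric(2 * (logLik(fit_joint) - logLik(fit_null)))
p_value <- pchisq(lr_stat, df = 1, lower.tail = FALSE)

print(c(separate = delta_separate, joint = delta_joint, p = p_value))
```

The point estimates of delta agree, but the standard errors need not, because the joint model pools the residual variance across both conditions.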

bug in generating graph?

graph1 <- create_random_DAG(n=p, prob=.05, lB=1)
Warning message:
In runif(length(negedges), min = lB[1], max = lB[2]) : NAs produced

The documentation says that lB is the lower bound and uB is the upper bound. I suppose you want to use those rather than lB[1] and lB[2].
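
A minimal sketch of why the warning appears and what the suggested fix could look like, assuming scalar bounds as the documentation describes. The helper `sample_edge_weights` is a hypothetical stand-in, not the package's actual code.

```r
# Indexing a length-one vector past its end yields NA, so
# runif(..., max = lB[2]) receives max = NA and produces NaNs with a warning.
lB <- 1
lB[2]  # NA

# Hypothetical stand-in for the edge-weight step, using lB and uB directly.
sample_edge_weights <- function(n_edges, lB = -1, uB = 1) {
  stopifnot(length(lB) == 1, length(uB) == 1, lB <= uB)
  runif(n_edges, min = lB, max = uB)
}

set.seed(1)
weights <- sample_edge_weights(5, lB = 1, uB = 2)
```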

Real pathways are not always DAGs

How to deal with this problem?

Ideas:

  • transform the pathway into a DAG (how?) and decide how to validate the result
  • adapt the method (e.g., dynamic Bayesian networks?)
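
One possible take on the first idea, sketched under the assumption that the pathway is given as an adjacency matrix with adj[i, j] != 0 for an edge i -> j: run a depth-first search and drop every edge that points back to a node on the current search path. The function name is hypothetical, and which edges get removed depends on the node order, so the validation question remains open.

```r
# Hypothetical helper: remove the back edges found by a depth-first search.
remove_back_edges <- function(adj) {
  n <- nrow(adj)
  state <- rep(0L, n)  # 0 = unvisited, 1 = on the current DFS path, 2 = finished
  visit <- function(v) {
    state[v] <<- 1L
    for (w in which(adj[v, ] != 0)) {
      if (state[w] == 1L) {
        adj[v, w] <<- 0  # back edge (or self-loop) closes a cycle: drop it
      } else if (state[w] == 0L) {
        visit(w)
      }
    }
    state[v] <<- 2L
  }
  for (v in seq_len(n)) {
    if (state[v] == 0L) visit(v)
  }
  adj
}

# Toy pathway with the cycle A -> B -> C -> A.
nodes <- c("A", "B", "C")
pathway <- matrix(0, 3, 3, dimnames = list(nodes, nodes))
pathway["A", "B"] <- 1
pathway["B", "C"] <- 1
pathway["C", "A"] <- 1

dag <- remove_back_edges(pathway)
```

Starting the search at A, the edge C -> A is the one that gets removed here; a different node order would break the cycle at a different edge.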

Random DAG creation

Is the somewhat complicated DAG creation in dce::create_random_DAG more or less similar to this much simpler implementation:

node_num <- 10
edge_prob <- .9
eff_min <- .2
eff_max <- 1.4

tmp <- matrix(rbinom(node_num * node_num, 1, edge_prob), node_num, node_num)
tmp[lower.tri(tmp)] <- 0
tmp[tmp != 0] <- runif(sum(tmp != 0), min = eff_min, max = eff_max)

dce::plot_network(tmp)

`glm.nb` throws warnings and errors when data is naughty

The following code generates warnings and an error if the value flip is applied.

A <- rnbinom(100, size=100, mu=1000)
B <- rnbinom(100, size=100, mu=0.1*A)

# value flip
A[1] <- 20
B[1] <- 2000

MASS::glm.nb(B ~ A, link="identity")

The error is "no valid set of coefficients has been found: please supply starting values".

A subset of the warnings:

1: In log(y/mu) : NaNs produced
2: step size truncated due to divergence
3: In log(y/mu) : NaNs produced
4: step size truncated due to divergence
5: glm.fit: algorithm did not converge
6: In log(pmax(1, y)/mu) : NaNs produced
7: In log((y + .Theta)/(mu + .Theta)) : NaNs produced
[..]
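
Until the cause is understood, a fit like this can at least be wrapped so that a benchmark run does not abort. The following is a minimal sketch with a hypothetical helper, `fit_quietly`, which collects warnings and returns the error message instead of stopping. It does not fix the fit itself, and whether the call fails for a given seed is not guaranteed.

```r
# Hypothetical helper: run a fitting function, collect its warnings, and
# return the error message (if any) instead of stopping.
fit_quietly <- function(fit_fun) {
  warnings <- character(0)
  result <- withCallingHandlers(
    tryCatch(fit_fun(), error = function(e) e),
    warning = function(w) {
      warnings <<- c(warnings, conditionMessage(w))
      invokeRestart("muffleWarning")
    }
  )
  failed <- inherits(result, "error")
  list(
    fit = if (failed) NULL else result,
    error = if (failed) conditionMessage(result) else NULL,
    warnings = warnings
  )
}

set.seed(42)
A <- rnbinom(100, size = 100, mu = 1000)
B <- rnbinom(100, size = 100, mu = 0.1 * A)
A[1] <- 20    # value flip from the report above
B[1] <- 2000

if (requireNamespace("MASS", quietly = TRUE)) {
  res <- fit_quietly(function() MASS::glm.nb(B ~ A, link = "identity"))
  if (is.null(res$fit)) message("fit failed: ", res$error)
  message(length(res$warnings), " warning(s) captured")
}
```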

How to benchmark perturbed ground truth?

Ground truth:
A -> B -> C

Perturbed:
A -> B -> C; A -> C

dce(A,C) is the same in both settings. However, the ground truth would not compute a dce(A,C), since it has no edge A -> C. Is dce(A,C) a false positive or not?
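
One possible convention, sketched under the assumptions of a linear model and of a DCE defined as the difference in total causal effects between two conditions: compute the ground truth for every node pair, not only for edges. The pair (A, C) then has a reference value even without an edge A -> C. The edge weights and the helper `total_effects` are made up for illustration.

```r
# Total causal effects in a linear model on a DAG: the sum over all directed
# paths of the product of edge weights, i.e. (I - W)^-1 - I with
# w[i, j] = direct effect of node i on node j.
total_effects <- function(w) {
  p <- nrow(w)
  te <- solve(diag(p) - w) - diag(p)
  dimnames(te) <- dimnames(w)
  te
}

nodes <- c("A", "B", "C")
wild_type <- matrix(0, 3, 3, dimnames = list(nodes, nodes))
wild_type["A", "B"] <- 0.5
wild_type["B", "C"] <- 2

mutant <- wild_type
mutant["B", "C"] <- 1  # the effect of B on C is weakened in the second condition

# Ground-truth DCE for every ordered node pair.
dce_truth <- total_effects(mutant) - total_effects(wild_type)
dce_truth["A", "C"]  # -0.5 (0.5 * 1 - 0.5 * 2), although there is no edge A -> C
```

Under this convention an estimated dce(A,C) would be compared against -0.5 instead of being counted as a false positive.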

Simulating negative binomial read counts on DAGs

Idea 1

beta > 0 describes the relative change. 0.5 corresponds to halving and 2 to doubling the expression levels.
This is problematic because it requires a transformation of causal effects which is non-trivial (but possibly somehow doable?).

Idea 2

beta can be both positive and negative. Counts are propagated by multiplying beta with mean-standardized counts and adding noise.
This is problematic because standardizing might introduce artefacts and can lead to mu < 0 (which yields NaN counts).

beta <- -1.2

set.seed(42)
A.nb <- rnbinom(1000, size=10, mu=10)

B.nb <- beta * A.nb + rnbinom(1000, size=10, mu=10) # leads to negative counts
B.nb <- rnbinom(1000, size=10, mu=mean(A.nb) + beta * A.nb) # leads to negative mu, thus NA counts
B.nb <- rnbinom(1000, size=10, mu=10) + beta * scale(A.nb, scale=FALSE) # leads to negative counts
B.nb <- rnbinom(1000, size=10, mu=mean(A.nb) + beta * scale(A.nb, scale=FALSE)) # leads to negative mu, thus NA counts

MASS::glm.nb(B.nb ~ A.nb, link="identity")

Idea 3

Use a mean function for mu of rnbinom. This requires an appropriate link function during the regression.

beta <- -1.2

set.seed(42)
A.nb <- rnbinom(1000, size=10, mu=10)

B.nb <- rnbinom(1000, size=10, mu=exp(log(10) + beta * (A.nb - mean(A.nb)))) # link function keeps mu positive, exp can lead to extreme values

MASS::glm.nb(B.nb ~ A.nb, link="log")
glm(B.nb ~ A.nb, family=MASS::negative.binomial(theta=10, link="log"))
glm2::glm2(B.nb ~ A.nb, family=MASS::negative.binomial(theta=10, link="log"))

Real pathways are incomplete

Real (e.g. KEGG) pathways may be missing true nodes and edges. Alternatively, they might have false ones.

We should investigate their effect using benchmarks.
The choice of valid adjustment set might influence this.
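
A benchmark along these lines needs perturbed copies of a known graph. The sketch below assumes a binary, upper-triangular adjacency matrix (nodes in topological order), so that added edges cannot create cycles. The function name and its parameters are hypothetical.

```r
# Hypothetical helper: delete some true edges and add some false ones.
perturb_dag <- function(adj, n_remove = 1, n_add = 1) {
  upper <- upper.tri(adj)
  present <- which(adj != 0 & upper)
  absent <- which(adj == 0 & upper)
  stopifnot(n_remove <= length(present), n_add <= length(absent))
  # Index via sample.int to avoid sample()'s special case for length-one input.
  drop <- present[sample.int(length(present), n_remove)]
  add <- absent[sample.int(length(absent), n_add)]
  adj[drop] <- 0
  adj[add] <- 1
  adj
}

set.seed(3)
truth <- matrix(0, 5, 5)
truth[upper.tri(truth)] <- c(1, 0, 1, 1, 0, 0, 1, 0, 1, 1)  # 6 true edges

noisy <- perturb_dag(truth, n_remove = 2, n_add = 2)
```

Missing nodes could be emulated analogously by deleting a row and the matching column before running the method.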

How to control the noise in (large) graphs?

If theta and mu are constant for all genes in rnbinom, the effective theta of the counts propagated through the causal effects in the graph will mostly decrease. We can control for that by setting a lower mu and/or a higher theta for downstream genes.
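
The effect can be checked on a single edge A -> B with a minimal simulation. It assumes that the mean of B is beta * A and uses a method-of-moments estimate of theta (from var = mu + mu^2 / theta); all parameter values are made up.

```r
set.seed(42)
n <- 1e5
mu <- 100
theta <- 10
beta <- 0.5

# Method-of-moments estimate of the dispersion: var = m + m^2 / theta.
moment_theta <- function(counts) {
  m <- mean(counts)
  m^2 / (var(counts) - m)
}

A <- rnbinom(n, size = theta, mu = mu)
B <- rnbinom(n, size = theta, mu = beta * A)  # same theta, mean driven by A

theta_a <- moment_theta(A)
theta_b <- moment_theta(B)
print(c(upstream = theta_a, downstream = theta_b))
```

The downstream estimate comes out clearly below the upstream one, because B inherits the variability of A on top of its own sampling noise.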

How to test for enrichment of a pathway?

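I.e., produce a p-value for a pathway P based on its DCEs.

Idea:
Run a permutation test: permute the node labels of the data, recompute the differential causal effects, and compare the observed statistic against the permuted ones. Which statistic should be used: the absolute mean, the absolute maximum, the absolute sum of the DCEs, or something else?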
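
A minimal sketch of such a label-permutation test on toy Gaussian data. The statistic used here, the sum of absolute slope differences over the pathway edges, is only a stand-in for a DCE-based statistic, and all function names are hypothetical.

```r
# Stand-in statistic: sum over edges of |slope in treatment - slope in control|.
edge_statistic <- function(x_ctrl, x_trt, edges) {
  diffs <- apply(edges, 1, function(e) {
    slope_ctrl <- coef(lm(x_ctrl[, e[2]] ~ x_ctrl[, e[1]]))[2]
    slope_trt <- coef(lm(x_trt[, e[2]] ~ x_trt[, e[1]]))[2]
    slope_trt - slope_ctrl
  })
  sum(abs(diffs))
}

# Permute the node labels (same permutation in both conditions) and recompute.
permutation_pvalue <- function(x_ctrl, x_trt, edges, n_perm = 200) {
  observed <- edge_statistic(x_ctrl, x_trt, edges)
  permuted <- vapply(seq_len(n_perm), function(i) {
    perm <- sample(ncol(x_ctrl))
    p_ctrl <- x_ctrl
    p_trt <- x_trt
    colnames(p_ctrl) <- colnames(x_ctrl)[perm]
    colnames(p_trt) <- colnames(x_trt)[perm]
    edge_statistic(p_ctrl, p_trt, edges)
  }, numeric(1))
  # The +1 terms count the observed labelling as one of the permutations.
  (1 + sum(permuted >= observed)) / (1 + n_perm)
}

set.seed(7)
n <- 100
genes <- c("A", "B", "C", "D", "E")
simulate <- function(slope) {
  x <- matrix(rnorm(n * length(genes)), n, length(genes),
              dimnames = list(NULL, genes))
  x[, "B"] <- slope * x[, "A"] + rnorm(n)  # only the edge A -> B carries signal
  x
}
x_ctrl <- simulate(slope = 1)
x_trt <- simulate(slope = 3)
edges <- matrix(c("A", "B"), ncol = 2)  # pathway P with the single edge A -> B

p_value <- permutation_pvalue(x_ctrl, x_trt, edges)
```

With only a handful of nodes the number of distinct labellings is small, which limits how small the p-value can get; real pathways embedded in a full expression matrix do not have this problem.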

How to generate synthetic expression profiles

A few approaches from literature

Working with DE vs non-DE genes

Working with gene expression values

Other simulation papers
