The peakpeeker from stjude-biohackathon

peakpeeker's Issues

Use async to keep the app responsive while peak calling runs

Software that hangs while it waits for some intensive process to finish is annoying. Asynchronous (async) programming is a way around that: instead of blocking, a given bit of code immediately hands back a placeholder, a promise, that gets fulfilled with the output once the code is done running.

So, for example, if we're running a peak caller and don't want everything to freeze up for the 15-30 seconds until it's done, we can have R run the caller in an additional process and hand back a promise that says, "Hey, we'll show this when it's done, but you can keep doing whatever with the UI in the meantime."

I don't actually know if this will be necessary, but I suspect it might be, especially when more than a few instances are running at once. It's another thing I've never used in R, but hey, we're here to learn.
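Below is a minimal sketch of how this could look. It assumes shiny >= 1.8.1 (for `ExtendedTask`) plus the promises and future packages.

One catch worth knowing: a plain future/promise returned from a render function only frees the R process for *other* sessions. The current session still waits for the promise before it updates. `ExtendedTask` is Shiny's tool for keeping the same session responsive.

`slow_peak_call()` is a hypothetical stand-in that just sleeps, and the default region is a placeholder.

```r
library(shiny)
library(promises)
library(future)

# Run futures in separate background R processes
plan(multisession)

# Hypothetical stand-in for a real peak-calling run (just sleeps)
slow_peak_call <- function(region) {
  Sys.sleep(5)
  paste("Peaks called for", region)
}

ui <- fluidPage(
  textInput("region", "Region", "chr1:1-100000"),  # placeholder region
  actionButton("run", "Call peaks"),
  textOutput("status"),
  verbatimTextOutput("result"),
  # Try moving this slider while the task runs: the session stays responsive
  sliderInput("other", "Still responsive?", min = 0, max = 10, value = 5)
)

server <- function(input, output, session) {
  # ExtendedTask wraps a function that returns a promise;
  # future_promise() runs the call in a background worker
  caller_task <- ExtendedTask$new(function(region) {
    future_promise(slow_peak_call(region))
  })

  # Kick off the task without blocking this session
  observeEvent(input$run, caller_task$invoke(input$region))

  # "initial", "running", "success" or "error"
  output$status <- renderText(caller_task$status())
  output$result <- renderText(caller_task$result())
}

# Assign instead of printing; runApp(app) launches it
app <- shinyApp(ui, server)
```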

Determine app layout and rough it out with placeholders.

Most likely something like this, as suggested by @jake-steele.

A fairly simple two-pane layout, using JBrowseR to show tracks. One JBrowseR session will always be shown to display the input/enrichment signal for a region with genes.

Each peak caller instance will also get its own JBrowseR session to show its peak output in.
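Here is a rough placeholder version of that two-pane layout in plain Shiny. The left pane holds the always-visible signal browser, and the right pane is an empty container that peak caller boxes could be inserted into later.

The IDs `signal_browser` and `caller_panes` are hypothetical. The `verbatimTextOutput()` placeholder is where JBrowseR's htmlwidget output (`JBrowseROutput()` / `renderJBrowseR()`) would presumably go once it's wired up.

```r
library(shiny)

ui <- fluidPage(
  titlePanel("peakPeekeR (layout sketch)"),
  fluidRow(
    # Left pane: the always-visible input/enrichment signal browser
    column(
      width = 6,
      h4("Signal / input"),
      verbatimTextOutput("signal_browser")  # placeholder for a JBrowseR widget
    ),
    # Right pane: empty container; one box per peak caller instance goes here
    column(
      width = 6,
      h4("Peak callers"),
      div(id = "caller_panes")
    )
  )
)

server <- function(input, output, session) {
  output$signal_browser <- renderText(
    "JBrowseR session for input/enrichment signal goes here"
  )
}

app <- shinyApp(ui, server)
```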

Add PeakRanger callers

PeakRanger includes three different peak callers, so it may need different modules; not sure how much overlap there is.

Add MACS module.

Write Shiny module for the chosen OG peak caller

This is likely the bit the entire project mostly hinges on. Shiny modules should let us create new...instances, I guess, of a peak caller pretty easily, without duplicating code. Being able to say "just pop a new doohickey in" rather than hard-coding everything will be pretty nice and make adding more peak callers much easier.

I've never done this, so we'll see how it goes.
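Here is a minimal sketch of the module pattern, with hypothetical names (`macs2UI()`, `macs2Server()`) and only two placeholder inputs. Every input ID is wrapped in `NS(id)`, and `moduleServer()` scopes the server logic to that same namespace. As a result, two instances built from the same code don't collide.

```r
library(shiny)

# UI half of a hypothetical MACS2 module: all IDs go through ns()
macs2UI <- function(id) {
  ns <- NS(id)
  wellPanel(
    h4("MACS2"),
    numericInput(ns("qvalue"), "q-value cutoff", value = 0.05, min = 0, max = 1),
    checkboxInput(ns("broad"), "Broad peaks", value = FALSE),
    actionButton(ns("run"), "Call peaks"),
    verbatimTextOutput(ns("summary"))
  )
}

# Server half: input/output here only see this instance's namespace
macs2Server <- function(id) {
  moduleServer(id, function(input, output, session) {
    # Snapshot the settings each time this instance's button is clicked
    settings <- eventReactive(input$run, {
      list(qvalue = input$qvalue, broad = input$broad)
    })
    output$summary <- renderPrint(settings())
    settings  # returned so the parent app can use it
  })
}

# Two independent instances from the same code
ui <- fluidPage(macs2UI("caller1"), macs2UI("caller2"))
server <- function(input, output, session) {
  macs2Server("caller1")
  macs2Server("caller2")
}
app <- shinyApp(ui, server)
```

The module's reactive return value is how the parent app would eventually pick up each instance's settings (or results).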

Determine which peakcaller to start with and which to add later

There are a whole bunch of peak callers, some specialized for different assays. Using something generic/common like MACS2 is likely a good first attempt.

Other options include MACS 1.4.3, PeakRanger, SICER, and SICER2. The SICERs are meant more for broad peaks, while PeakRanger includes a few different modes/callers for different purposes, which is useful. It was also used by modENCODE, so it has some traction there.

Regardless, start with one, we can try to add others later or if we have time. Comments and additions welcome here.

Add post-call score/p-val/FDR filters

As appropriate for each peak caller.
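For callers that write narrowPeak output (the ENCODE BED6+4 format, e.g. MACS2), these filters can be applied to the peak table after the fact. Below is a base-R sketch assuming narrowPeak input.

The pValue and qValue columns are already -log10 transformed and use -1 for "not available". For that reason, cutoffs are given as raw p/q values and converted before comparing. `read_narrowpeak()`, `passes_cutoff()` and `filter_peaks()` are hypothetical helpers.

```r
# Column layout of the ENCODE narrowPeak (BED6+4) format
narrowpeak_cols <- c("chrom", "start", "end", "name", "score", "strand",
                     "signalValue", "pValue", "qValue", "peak")

# Hypothetical reader for a narrowPeak file
read_narrowpeak <- function(path) {
  read.table(path, sep = "\t", header = FALSE, col.names = narrowpeak_cols,
             colClasses = c("character", "integer", "integer", "character",
                            "numeric", "character", "numeric", "numeric",
                            "numeric", "integer"))
}

# TRUE where a -log10-scaled column passes `cutoff` (a raw p/q value).
# A cutoff of 1 means "no filter", which also keeps -1 ("not available") rows.
passes_cutoff <- function(neg_log10_values, cutoff) {
  if (cutoff >= 1) return(rep(TRUE, length(neg_log10_values)))
  neg_log10_values >= -log10(cutoff)
}

# Hypothetical post-call filter on score, p-value and q-value
filter_peaks <- function(peaks, min_score = 0, max_p = 1, max_q = 1) {
  keep <- peaks$score >= min_score &
    passes_cutoff(peaks$pValue, max_p) &
    passes_cutoff(peaks$qValue, max_q)
  peaks[keep, , drop = FALSE]
}
```

Each peak caller writes its own format (SICER2 and SEACR don't produce narrowPeak), so each one would likely need its own reader feeding into a shared filter like this.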

Implement button to add new instance of peak caller.

By default, no peak caller boxes will be shown. We'll need to add a button that does a few things:

Get the peak caller chosen from a selectInput dropdown menu containing all the options (will only have one to start).
Add a new Shiny module for the selected peak caller, making sure temp stuff is named and kept track of appropriately. See the insertUI docs for how this can be done...probably.

Probably more. This is one of the more challenging aspects of the project.
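One way this could work is sketched below. The button reads the `selectInput()`, builds a unique ID from a per-session counter, inserts the module UI with `insertUI()`, and starts the matching module server under the same ID.

The `callers` registry, the `placeholderUI()`/`placeholderServer()` module, and IDs like `macs2_1` are all hypothetical. The same unique ID could also serve as the `pattern` for that instance's `tempfile()` names, which would keep its temp files trackable.

```r
library(shiny)

# Hypothetical placeholder module standing in for a real caller module
placeholderUI <- function(id, label) {
  ns <- NS(id)
  wellPanel(id = ns("box"), h4(label), verbatimTextOutput(ns("info")))
}
placeholderServer <- function(id, label) {
  moduleServer(id, function(input, output, session) {
    output$info <- renderText(paste(label, "instance, namespace", session$ns("")))
  })
}

# Hypothetical registry: caller name -> its module UI and server functions
callers <- list(MACS2 = list(ui = placeholderUI, server = placeholderServer))

ui <- fluidPage(
  selectInput("caller", "Peak caller", choices = names(callers)),
  actionButton("add_caller", "Add peak caller"),
  div(id = "caller_panes")  # new instances are appended inside this div
)

server <- function(input, output, session) {
  n_added <- 0  # per-session counter used to build unique module IDs
  observeEvent(input$add_caller, {
    n_added <<- n_added + 1
    id <- paste0(tolower(input$caller), "_", n_added)  # e.g. "macs2_1"
    caller <- callers[[input$caller]]
    insertUI(selector = "#caller_panes", where = "beforeEnd",
             ui = caller$ui(id, input$caller))
    caller$server(id, input$caller)
  })
}

app <- shinyApp(ui, server)
```

Removing an instance could later use `removeUI()` with the `#<id>-box` selector.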

Update README/vignette with basic usage info and GIFs of the package in use.

Add basic usage directions.
Add logo.
Add hackathon blurb.
Add GIFs of peakPeekeR in action.

De-uglify MACS2 module UI.

Better organize inputs, make sections collapsible, swap to pretty checkboxes, avoid splitLayout, etc.
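Here is a sketch of what a tidier module UI might look like:

- core inputs go side by side using the Bootstrap grid (`fluidRow()`/`column()`) rather than `splitLayout()`;
- rarely changed options sit in a native HTML `<details>` collapsible section;
- checkboxes come from shinyWidgets' `prettyCheckbox()`.

It assumes shinyWidgets is installed. The function name and the chosen MACS2 options are illustrative only.

```r
library(shiny)
library(shinyWidgets)

# Hypothetical, tidier MACS2 module UI
macs2UI <- function(id) {
  ns <- NS(id)
  wellPanel(
    h4("MACS2"),
    # Core settings side by side via the grid instead of splitLayout()
    fluidRow(
      column(6, numericInput(ns("qvalue"), "q-value", value = 0.05, min = 0, max = 1)),
      column(6, prettyCheckbox(ns("broad"), "Broad peaks", value = FALSE,
                               status = "primary"))
    ),
    # Rarely changed options tucked into a collapsible section (no extra JS needed)
    tags$details(
      tags$summary("Advanced options"),
      prettyCheckbox(ns("nomodel"), "Skip model building (--nomodel)", value = FALSE),
      numericInput(ns("extsize"), "Extension size (--extsize)", value = 200, min = 1)
    )
  )
}
```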

Implement and test using basilisk to install peakcaller envs.

I think something like the following will work:

```r
#' @importFrom basilisk BasiliskEnvironment
# pkgname must be the name of the R package that owns these environments.
env_macs2 <- BasiliskEnvironment("env_macs2", pkgname = "ClientPackage",
                                 packages = c("macs2==2.2.7.1", "python==3.8"),
                                 channels = c("bioconda", "conda-forge"))

env_macs <- BasiliskEnvironment("env_macs", pkgname = "ClientPackage",
                                packages = c("macs==1.4.3", "python==2.7"),
                                channels = c("bioconda", "conda-forge"))
```

Then we can call the tools with system() or system2() inside each environment, e.g.:

```r
#' Test function
#'
#' Does nothing but check that we can run command-line tools from different
#' conda environments by asking each peak caller for its version.
#'
#' @return A named list with the version output of MACS (\code{macs}) and MACS2 (\code{macs2}).
#' @export
#' @importFrom basilisk basiliskStart basiliskRun basiliskStop
test <- function() {
  # Start a basilisk process in the MACS2 environment and run the CLI inside it
  cl <- basiliskStart(env_macs2)
  macs2.v <- basiliskRun(cl, function() {
    # Capture stderr too: some tools print their version there
    system2("macs2", args = "--version", stdout = TRUE, stderr = TRUE)
  })
  basiliskStop(cl)

  # Same again for the MACS 1.4.3 (Python 2.7) environment
  cl <- basiliskStart(env_macs)
  macs.v <- basiliskRun(cl, function() {
    system2("macs", args = "--version", stdout = TRUE, stderr = TRUE)
  })
  basiliskStop(cl)

  list(macs = macs.v, macs2 = macs2.v)
}
```

Note that we'll have to nest everything in basiliskRun(). I don't think this will work on Windows, and it depends on the peak callers being available in conda/bioconda. I made sure to add MACS 1.4.3, and most of the others are already available for at least Linux and OSX. PeakRanger is only available for Linux from conda; this may be something I can fix on the conda side after the fact.
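An actual MACS2 call could be nested in basiliskRun() the same way as the version check. Here is a sketch with two assumptions:

- the `env_macs2` definition from the issue, repeated so the snippet stands alone (with `pkgname` still a placeholder);
- MACS2's usual `callpeak` flags (`-t`, `-c`, `-f`, `-n`, `--outdir`) and its `NAME_peaks.narrowPeak` / `NAME_peaks.broadPeak` output naming.

`run_macs2()` is a hypothetical name. Arguments are passed through basiliskRun()'s `...`, so the function works whether basilisk runs it in a separate process or the current one.

```r
library(basilisk)

# Same environment definition as in the issue; pkgname is a placeholder
env_macs2 <- BasiliskEnvironment("env_macs2", pkgname = "ClientPackage",
                                 packages = c("macs2==2.2.7.1", "python==3.8"),
                                 channels = c("bioconda", "conda-forge"))

# Hypothetical wrapper: run `macs2 callpeak` inside the conda env and return
# the log plus the path of the peak file MACS2 is expected to write
run_macs2 <- function(treatment, control, outdir, name, extra_args = character(0)) {
  cl <- basiliskStart(env_macs2)
  on.exit(basiliskStop(cl), add = TRUE)
  basiliskRun(cl, function(treatment, control, outdir, name, extra_args) {
    args <- c("callpeak", "-t", shQuote(treatment), "-c", shQuote(control),
              "-f", "BAM", "-n", shQuote(name), "--outdir", shQuote(outdir),
              extra_args)
    # MACS2 logs progress to stderr, so capture both streams
    log <- system2("macs2", args = args, stdout = TRUE, stderr = TRUE)
    suffix <- if ("--broad" %in% extra_args) "_peaks.broadPeak" else "_peaks.narrowPeak"
    list(log = log, peaks = file.path(outdir, paste0(name, suffix)))
  }, treatment = treatment, control = control, outdir = outdir,
     name = name, extra_args = extra_args)
}
```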

Determine which parameters matter/should be added for peak callers of interest

Many of the peak callers mentioned in #2 have a ton of parameters. Some matter, some don't. Making note of which ones are worth making available to the user and which should just use default settings will be important.

Pretty easy to adjust later, so we can throw the obvious ones up at first and go from there.

MACS2
PeakRanger - Actually has 3 different peak callers in it (ranger, ccat, bcp).
MACS - Gonna have to check the command line for this one; can't even find the manual online anymore.
SICER2 - This actually contains another variant of their peakcaller called RECOGNICER too.
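One way to make "expose a few, default the rest" concrete is a per-caller whitelist that maps UI parameter names to command-line flags. Anything not on the list simply isn't passed, so the caller's own default applies.

Here is a base-R sketch. The MACS2 flags listed are real `callpeak` options, but which ones to expose is only a guess, and `macs2_exposed` and `params_to_args()` are hypothetical names. (In MACS2, setting `-p` makes it ignore `-q`, which the UI may want to reflect.)

```r
# Hypothetical whitelist of MACS2 `callpeak` options exposed in the UI
macs2_exposed <- c(qvalue = "-q", pvalue = "-p", broad = "--broad",
                   nomodel = "--nomodel", extsize = "--extsize",
                   gsize = "-g", keep_dup = "--keep-dup")

# Turn a named list of UI values into a system2()-style argument vector
params_to_args <- function(params, flag_map) {
  unknown <- setdiff(names(params), names(flag_map))
  if (length(unknown) > 0) {
    stop("Unexposed parameter(s): ", paste(unknown, collapse = ", "))
  }
  args <- character(0)
  for (nm in names(params)) {
    value <- params[[nm]]
    if (is.null(value) || identical(value, FALSE)) next    # unset or switch off
    if (isTRUE(value)) {
      args <- c(args, flag_map[[nm]])                       # boolean switch
    } else {
      args <- c(args, flag_map[[nm]], as.character(value))  # flag followed by value
    }
  }
  args
}
```

Usage: `params_to_args(list(qvalue = 0.01, broad = TRUE, nomodel = FALSE, extsize = 147), macs2_exposed)` gives `c("-q", "0.01", "--broad", "--extsize", "147")`.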

Given sorted, indexed input and signal BAMs, subset them to the specified region and store each as a temp file.

We want stuff to run fast on a specific region, so we need to figure out how to read in, subset, and save reads for the specified region to a temp file that is easy to track and clean up.

This can likely be achieved with Rsamtools and tempdir() and tempfile().

Gotta test how long this takes for an average BAM for ~250k to make sure startup isn't insane.
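With Rsamtools, `filterBam()` plus `ScanBamParam(which = ...)` copies only the reads overlapping a region into a new BAM, and `tempfile()` keeps that output under the session's `tempdir()`. Below is a sketch assuming a sorted BAM with a `.bai` index next to it.

`subset_bam()` and its `prefix` are hypothetical. Timing a call with `system.time()` is one easy way to check startup cost on a real BAM.

```r
library(Rsamtools)
library(GenomicRanges)

# Hypothetical helper: copy the reads overlapping one region of a sorted,
# indexed BAM into a new, indexed BAM under tempdir()
subset_bam <- function(bam, chrom, start, end, prefix = "peakpeeker_") {
  region <- GRanges(chrom, IRanges(start, end))
  out <- tempfile(pattern = prefix, fileext = ".bam")
  # filterBam() writes only the records selected by `param`; with
  # indexDestination = TRUE it also indexes the output. Returns the output path.
  filterBam(bam, destination = out,
            param = ScanBamParam(which = region),
            indexDestination = TRUE)
}

# Example with the small BAM that ships with Rsamtools
ex <- system.file("extdata", "ex1.bam", package = "Rsamtools")
timing <- system.time(sub <- subset_bam(ex, "seq1", 1, 500))
print(timing[["elapsed"]])
```

For cleanup, one option is to collect these paths per session and `unlink()` them (plus their `.bai` files) in `session$onSessionEnded()`.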

Add SEACR caller

SEACR.

https://github.com/FredHutch/SEACR

Add SPAN peakcaller

SPAN.

Come up with a real snazzy name and logo

For them popularity points. Image is half the battle.

Add SICER2 module.

Add JAMM caller

JAMM.

Jeremy Jamm.

Find good toy data (ChIP, ATAC, CUTNRUN) and test region

Obviously we need data to test on, ideally both sharp (TF) and broad (histone mark) data from the various assays.

Preferably, also find a region with decent peaks across these marks/assays (GAPDH, MYC?).

Actually do peak calling

Two options here. Either create a new peakcalling function for each caller, or use a generic one that does the initial calling and just have caller-specific clean-up/conversion/whatever functions to get the results in an appropriate format for viewing in the browser.

Doesn't really matter which we choose first, but probably cleaner to put it all in the module code as described in #7 eventually.

Do peak calling.
Convert BED, narrowPeak, etc. output to GFF. See #9.
Have browser catch output and display it. Probably need an observer for this.
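The format conversion step can be sketched in base R, assuming the peaks are already in a data frame with BED-style columns (`chrom`, `start`, `end`, `name`, `score`, `strand`), e.g. from a narrowPeak reader. BED is 0-based and half-open while GFF3 is 1-based and closed, so the start shifts by one and the end stays put.

`peaks_to_gff3()` is a hypothetical helper. The default `type = "region"` is just a generic Sequence Ontology term. Peak names containing `;`, `=`, `&` or `,` would need URL-escaping in the attributes column.

```r
# Hypothetical converter: BED-like peaks (0-based, half-open) -> GFF3 lines
# (1-based, closed)
peaks_to_gff3 <- function(peaks, source = "MACS2", type = "region") {
  body <- paste(peaks$chrom,                 # seqid
                source,                      # source
                type,                        # type (SO term)
                peaks$start + 1L,            # start: shift to 1-based
                peaks$end,                   # end: unchanged
                peaks$score,                 # score
                peaks$strand,                # strand ("." if unstranded)
                ".",                         # phase: not applicable
                paste0("ID=", peaks$name, ";Name=", peaks$name),
                sep = "\t")
  c("##gff-version 3", body)
}
```

The resulting lines could be written with `writeLines()` to a temp `.gff3` file for the instance's JBrowseR session to load.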
