tsrchitect's Issues

mergeSampleData behavior and speed for TSSs

mergeSampleData seems to take a long time to run, and for TSSs it appears not to add together TSSs that overlap.

TSS merging could be made much faster, and its behavior closer to what is expected, by using the dplyr package.

library(dplyr)

# Stack both samples, then sum the tag counts of TSSs at the same position
merged <- rbind(sample.1, sample.2) %>%
   group_by(seq, TSS, strand) %>%
   summarize(nTAGs = sum(nTAGs)) %>%
   arrange(seq, TSS) %>%
   as.data.frame()

Here sample.1 and sample.2 are two hypothetical data frames stored in the @tssCountData slot. The output is a position-sorted data frame in which the tag counts of any overlapping TSSs are summed. It should also take only a few seconds to run.

[suggestion] Add outputDir argument to processTSS and determineTSR

It would be handy to have an explicit argument, such as outputDir, to specify where the text files are written when writeTable = TRUE in either processTSS or determineTSR. That is often more convenient than changing the working directory before each command.

[suggestion] bedgraph file for TSSs

After filtering, TSRchitect provides you a list of TSSs and the number of tags for each position. It would be great to have a convenience function that can turn this information into a bedgraph for visualization in a genome viewer. It should perhaps output a separate bedgraph for both the + and - strand. I've attached example files below from the latest STRIPE-seq run with yeast.

TSS_bedgraphs.zip
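
To make the request concrete, here is a rough sketch of what such a convenience function could look like. It is not part of TSRchitect: the function name writeTSSBedgraph and the "_plus"/"_minus" file suffixes are made up for illustration, and it assumes the TSS data frame has the columns seq, TSS, strand and nTAGs (as in the merge example in the first issue), with TSS as a 1-based position.

```r
# Hypothetical helper, not a TSRchitect function.
# Assumes `tss` has columns seq, TSS (1-based position), strand, nTAGs.
writeTSSBedgraph <- function(tss, prefix) {
  for (s in c("+", "-")) {
    rows <- tss[tss$strand == s, ]
    rows <- rows[order(rows$seq, rows$TSS), ]
    bg <- data.frame(
      chrom = rows$seq,
      start = as.integer(rows$TSS) - 1L,  # bedGraph starts are 0-based
      end   = as.integer(rows$TSS),       # ends are exclusive
      value = rows$nTAGs
    )
    suffix <- if (s == "+") "plus" else "minus"
    out <- paste0(prefix, "_", suffix, ".bedgraph")
    # Track line first, then the tab-delimited records
    writeLines(paste0('track type=bedGraph name="', basename(out), '"'), out)
    write.table(bg, out, append = TRUE, sep = "\t", quote = FALSE,
                row.names = FALSE, col.names = FALSE)
  }
  invisible(NULL)
}
```

Calling writeTSSBedgraph(tss, "sample1") would write sample1_plus.bedgraph and sample1_minus.bedgraph to the working directory, one file per strand.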

[suggestion] bed file of TSRs.

The generated list of TSRs already has the information required to output a bed file for TSR visualization and downstream analysis (such as motif finding). It would be handy to have a function to generate it for you, instead of having to do it by hand every time. You could also perhaps include the score column to shade TSR "strength" in a genome viewer, such as scaling nTAGs between 0-1000.

I've included an example from the list of TSRs Taylor generated for the latest STRIPE-seq run.
TSRs_scored.bed.zip
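
For reference, this is roughly what I do by hand, written as a small function. It is only a sketch and not a TSRchitect function: the name tsrToBed and the "TSR_n" feature names are made up, and it assumes the TSR table is a data frame with columns seq, start, end, strand and nTAGs, with 1-based start coordinates. The score is nTAGs scaled so that the strongest TSR gets 1000; other scalings would work just as well.

```r
# Hypothetical helper, not a TSRchitect function.
# Assumes `tsr` has columns seq, start (1-based), end, strand, nTAGs.
tsrToBed <- function(tsr, file) {
  # Scale tag counts so the strongest TSR scores 1000 (BED scores run 0-1000)
  score <- as.integer(round(1000 * tsr$nTAGs / max(tsr$nTAGs)))
  bed <- data.frame(
    chrom  = tsr$seq,
    start  = as.integer(tsr$start) - 1L,  # BED starts are 0-based
    end    = as.integer(tsr$end),
    name   = paste0("TSR_", seq_len(nrow(tsr))),
    score  = score,
    strand = tsr$strand
  )
  write.table(bed, file, sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = FALSE)
  invisible(bed)
}
```

The result is a six-column BED file, so a genome viewer can use the score column to shade each TSR.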

Many of the TSR attributes are undocumented.

The latest TSR output has a few categories describing the attributes of a TSR. The following don't appear to have documentation explaining what they represent, and how they are derived.

  • tsrPeak
  • tsrTrq
  • tsrMSI

maintenance

The last accepted pull request led to a successful build of our Singularity container, but this was not the easiest of fixes. ngsutils has not been maintained for a few years (in its original package), and getting it to compile needed a peculiar fix: the code is compiled in a Python virtual environment, but as of their latest commit the required Cython is not installed into that environment, which causes fatal compilation errors.

I suggest that we 1) freeze the current Singularity recipe/container with appropriate version labeling; and 2) consider building a new container with updated third-party software.

processTSS using large amounts of memory

This report is for the latest GitHub version of TSRchitect.

processTSS seems to be using a large amount of memory. I had 6 fairly small BAM files loaded into a TSRchitect object. When I went to run processTSS using 4 cores I got the following error from the (carbonate) resource manager.

=>> PBS: job killed: vmem 101773303808 exceeded limit 34359738368
Error in serialize(data, node$con, xdr = FALSE) :
  Java called System.exit(143) requesting R to quit - trying to recover

I had given myself 32GB of memory for my interactive session, but it looks like TSRchitect went up to about 100GB before I got booted.

BAM file sizes

-rw-r--r-- 1 rpolicas biol  57M Mar 24 19:25 S288C_diamide_1_Aligned.out_cleaned.bam
-rw-r--r-- 1 rpolicas biol  59M Mar 24 19:25 S288C_diamide_2_Aligned.out_cleaned.bam
-rw-r--r-- 1 rpolicas biol  47M Mar 24 19:25 S288C_diamide_3_Aligned.out_cleaned.bam
-rw-r--r-- 1 rpolicas biol  76M Mar 24 19:25 S288C_WT_1_Aligned.out_cleaned.bam
-rw-r--r-- 1 rpolicas biol  86M Mar 24 19:25 S288C_WT_2_Aligned.out_cleaned.bam
-rw-r--r-- 1 rpolicas biol  76M Mar 24 19:25 S288C_WT_3_Aligned.out_cleaned.bam

Session Info

> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-conda_cos6-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server 7.5 (Maipo)

Matrix products: default
BLAS/LAPACK: /gpfs/home/r/p/rpolicas/Carbonate/.conda/envs/tsrchitect-dev/lib/R/lib/libRblas.so

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] TSRchitect_1.8.9

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.1                    compiler_3.5.1
 [3] BiocManager_1.30.4            later_0.8.0
 [5] GenomeInfoDb_1.18.1           XVector_0.22.0
 [7] AnnotationHub_2.14.2          bitops_1.0-6
 [9] tools_3.5.1                   zlibbioc_1.28.0
[11] digest_0.6.18                 bit_1.1-14
[13] lattice_0.20-38               RSQLite_2.1.1
[15] memoise_1.1.0                 Matrix_1.2-16
[17] DelayedArray_0.8.0            shiny_1.2.0
[19] DBI_1.0.0                     yaml_2.2.0
[21] parallel_3.5.1                rJava_0.9-10
[23] GenomeInfoDbData_1.2.0        rtracklayer_1.42.1
[25] httr_1.4.0                    gtools_3.8.1
[27] XLConnectJars_0.2-15          Biostrings_2.50.2
[29] S4Vectors_0.20.1              IRanges_2.16.0
[31] grid_3.5.1                    stats4_3.5.1
[33] bit64_0.9-7                   Biobase_2.42.0
[35] R6_2.4.0                      AnnotationDbi_1.44.0
[37] XML_3.98-1.19                 BiocParallel_1.16.2
[39] blob_1.1.1                    magrittr_1.5
[41] Rsamtools_1.34.0              matrixStats_0.54.0
[43] promises_1.0.1                htmltools_0.3.6
[45] BiocGenerics_0.28.0           GenomicRanges_1.34.0
[47] GenomicAlignments_1.18.1      XLConnect_0.2-15
[49] SummarizedExperiment_1.12.0   mime_0.6
[51] interactiveDisplayBase_1.20.0 xtable_1.8-3
[53] httpuv_1.5.0                  RCurl_1.95-4.12

[suggestion] Sample sheet to input experiment information.

At the moment, information about the experiments is entered manually into loadTSSobj. This can be somewhat inconvenient if you are processing a large number of files. Also, as stated in the documentation, you need to be careful to match the order of the entries with the order of the BAM files.

A few packages let you load sample information from a separate tab-delimited file or data.frame object. This ensures that the sample information matches the associated BAM file, and makes it convenient to process a large number of files at the same time.

A good example of this is the DiffBind package.
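
As a rough illustration of the idea (not a proposal for the actual interface), the sketch below reads a tab-delimited sample sheet and reorders it to match a vector of BAM paths, so the order no longer has to be kept in sync by hand. The function name readSampleSheet and the column names file, sample and replicate are assumptions made up for this example.

```r
# Hypothetical helper, not a TSRchitect function.
# Assumes a tab-delimited sheet with a header row and the columns
# file (BAM file name), sample and replicate.
readSampleSheet <- function(sheetFile, bamFiles) {
  sheet <- read.delim(sheetFile, stringsAsFactors = FALSE)
  # Look up each BAM file in the sheet by file name
  idx <- match(basename(bamFiles), sheet$file)
  if (anyNA(idx)) {
    stop("No sample sheet entry for: ",
         paste(basename(bamFiles)[is.na(idx)], collapse = ", "))
  }
  # Return the sheet rows in the same order as bamFiles
  sheet[idx, , drop = FALSE]
}
```

The sample and replicate columns of the returned data frame are then in the same order as the BAM files and could be passed on from there. A BAM file that is missing from the sheet stops with an error instead of silently shifting the labels.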
