brendelgroup / tsrchitect

Promoter identification from diverse types of large-scale TSS profiling data

License: GNU General Public License v3.0

R 100.00%

tsrchitect's People

Contributors

Stargazers

Watchers

Forkers

rtraborn vpbrendel rpolicastro cganote

tsrchitect's Issues

mergeSampleData behavior and speed for TSSs

mergeSampleData takes a long time to run, and when merging TSSs it does not appear to sum the tag counts of TSSs that overlap across samples.

Using the dplyr library for TSS merging would make this step much faster and would give the expected behavior:

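library(dplyr)

# Stack both samples, then sum tag counts for TSSs sharing seq, position, and strand
merged <- rbind(sample.1, sample.2) %>%
   group_by(seq, TSS, strand) %>%
   summarize(nTAGs = sum(nTAGs)) %>%
   arrange(seq, TSS) %>%
   as.data.frame()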

Here sample.1 and sample.2 are two hypothetical data frames stored in the @tssCountData slot. The result is a position-sorted data frame in which the tag counts of any overlapping TSSs are summed. It should take only a few seconds to run.

[suggestion] Add outputDir argument to processTSS and determineTSR

Would be handy to have an explicit argument, such as outputDir, to specify where you want the text files to be written if writeTable = TRUE in either processTSS or determineTSR. It's often a little more convenient than having to change the working directory for each of the commands.

[suggestion] bedgraph file for TSSs

After filtering, TSRchitect provides you a list of TSSs and the number of tags for each position. It would be great to have a convenience function that can turn this information into a bedgraph for visualization in a genome viewer. It should perhaps output separate bedgraphs for the + and - strands. I've attached example files below from the latest STRIPE-seq run with yeast.

TSS_bedgraphs.zip
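To make the request concrete, here is a minimal sketch of the conversion, written in plain Python rather than as TSRchitect code. The function name `tss_to_bedgraph` and the record layout (seq, TSS, strand, nTAGs) are hypothetical, and it assumes the TSS positions are 1-based; bedGraph intervals are 0-based and half-open, so each TSS becomes the interval [TSS - 1, TSS).

```python
from collections import defaultdict


def tss_to_bedgraph(rows):
    """Turn (seq, TSS, strand, nTAGs) records into bedGraph lines per strand.

    Assumption: TSS is a 1-based coordinate. bedGraph uses 0-based,
    half-open intervals, so a single position p is written as p-1 .. p.
    """
    lines = defaultdict(list)
    # Sort by sequence name, then position, so each track is position-sorted.
    for seq, tss, strand, ntags in sorted(rows, key=lambda r: (r[0], r[1])):
        lines[strand].append(f"{seq}\t{tss - 1}\t{tss}\t{ntags}")
    return dict(lines)


# Hypothetical example records, not taken from the attached files.
rows = [("chrI", 150, "+", 12), ("chrI", 100, "+", 3), ("chrI", 120, "-", 7)]
tracks = tss_to_bedgraph(rows)
```

Each value in `tracks` could then be written to its own file, one for the + strand and one for the - strand.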

[suggestion] bed file of TSRs.

The generated list of TSRs already has the information required to output a bed file for TSR visualization and downstream analysis (such as motif finding). It would be handy to have a function to generate it for you, instead of having to do it by hand every time. You could also perhaps fill in the score column to shade TSRs by "strength" in a genome viewer, for example by scaling nTAGs to the range 0-1000.

I've included an example from the list of TSRs Taylor generated for the latest STRIPE-seq run.
TSRs_scored.bed.zip
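As an illustration of the scoring idea, the following is a minimal sketch in plain Python, not TSRchitect code. The function name `tsrs_to_bed`, the record layout (seq, start, end, strand, nTAGs), and the `tsr_N` naming are hypothetical. It assumes 1-based inclusive TSR coordinates and uses min-max scaling, which is only one possible way to map nTAGs onto the 0-1000 BED score range.

```python
def tsrs_to_bed(tsrs):
    """Turn (seq, start, end, strand, nTAGs) records into 6-column BED lines.

    Assumptions: start/end are 1-based and inclusive (BED is 0-based,
    half-open), and the score is a min-max scaling of nTAGs to 0-1000.
    """
    counts = [t[4] for t in tsrs]
    lo, hi = min(counts), max(counts)
    span = hi - lo
    lines = []
    for i, (seq, start, end, strand, ntags) in enumerate(tsrs, start=1):
        # If every TSR has the same count, give them all the top score.
        score = 1000 if span == 0 else round(1000 * (ntags - lo) / span)
        lines.append(f"{seq}\t{start - 1}\t{end}\ttsr_{i}\t{score}\t{strand}")
    return lines


# Hypothetical example records, not taken from the attached file.
tsrs = [
    ("chrI", 101, 120, "+", 50),
    ("chrI", 301, 310, "-", 10),
    ("chrII", 11, 40, "+", 30),
]
bed = tsrs_to_bed(tsrs)
```

Min-max scaling sends the weakest TSR to a score of 0, so a log or rank-based scaling may be preferable if the tag counts are heavily skewed.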

Demo produces unexpected number of slots in @tssCountDataMerged and @tsrDataMerged

After testing demo-RAMPAGEp.R on the most recent codebase using the IRBB7 data, I found that the resulting tssExperiment object contains 3 (instead of the expected 2) slots in @tssCountDataMerged and @tsrDataMerged. I have not isolated the error yet, but am checking mergeSampleData.R first.

Many of the TSR attributes are undocumented.

The latest TSR output has a few categories describing the attributes of a TSR. The following don't appear to have documentation explaining what they represent, and how they are derived.

tsrPeak
tsrTrq
tsrMSI

maintenance

The last accepted pull request led to a successful build of our Singularity container, but the fix was not easy. ngsutils has not been maintained for a few years (in its original package), and getting it to compile required a peculiar workaround: the code is compiled in a Python virtual environment, but as of its latest commit the required Cython is not installed into that environment, which causes fatal compilation errors.

I suggest that we 1) freeze the current Singularity recipe/container with appropriate version labeling; and 2) consider building a new container with updated third-party software.

processTSS using large amounts of memory

This report is for the latest GitHub version of TSRchitect.

processTSS seems to be using a large amount of memory. I had 6 fairly small BAM files loaded into a TSRchitect object. When I went to run processTSS using 4 cores I got the following error from the (carbonate) resource manager.

=>> PBS: job killed: vmem 101773303808 exceeded limit 34359738368
Error in serialize(data, node$con, xdr = FALSE) :
  Java called System.exit(143) requesting R to quit - trying to recover

I had given myself 32GB of memory for my interactive session, but it looks like TSRchitect went up to about 100GB before I got booted.

Bam file sizes

-rw-r--r-- 1 rpolicas biol  57M Mar 24 19:25 S288C_diamide_1_Aligned.out_cleaned.bam
-rw-r--r-- 1 rpolicas biol  59M Mar 24 19:25 S288C_diamide_2_Aligned.out_cleaned.bam
-rw-r--r-- 1 rpolicas biol  47M Mar 24 19:25 S288C_diamide_3_Aligned.out_cleaned.bam
-rw-r--r-- 1 rpolicas biol  76M Mar 24 19:25 S288C_WT_1_Aligned.out_cleaned.bam
-rw-r--r-- 1 rpolicas biol  86M Mar 24 19:25 S288C_WT_2_Aligned.out_cleaned.bam
-rw-r--r-- 1 rpolicas biol  76M Mar 24 19:25 S288C_WT_3_Aligned.out_cleaned.bam

Session Info

> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-conda_cos6-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server 7.5 (Maipo)

Matrix products: default
BLAS/LAPACK: /gpfs/home/r/p/rpolicas/Carbonate/.conda/envs/tsrchitect-dev/lib/R/lib/libRblas.so

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] TSRchitect_1.8.9

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.1                    compiler_3.5.1
 [3] BiocManager_1.30.4            later_0.8.0
 [5] GenomeInfoDb_1.18.1           XVector_0.22.0
 [7] AnnotationHub_2.14.2          bitops_1.0-6
 [9] tools_3.5.1                   zlibbioc_1.28.0
[11] digest_0.6.18                 bit_1.1-14
[13] lattice_0.20-38               RSQLite_2.1.1
[15] memoise_1.1.0                 Matrix_1.2-16
[17] DelayedArray_0.8.0            shiny_1.2.0
[19] DBI_1.0.0                     yaml_2.2.0
[21] parallel_3.5.1                rJava_0.9-10
[23] GenomeInfoDbData_1.2.0        rtracklayer_1.42.1
[25] httr_1.4.0                    gtools_3.8.1
[27] XLConnectJars_0.2-15          Biostrings_2.50.2
[29] S4Vectors_0.20.1              IRanges_2.16.0
[31] grid_3.5.1                    stats4_3.5.1
[33] bit64_0.9-7                   Biobase_2.42.0
[35] R6_2.4.0                      AnnotationDbi_1.44.0
[37] XML_3.98-1.19                 BiocParallel_1.16.2
[39] blob_1.1.1                    magrittr_1.5
[41] Rsamtools_1.34.0              matrixStats_0.54.0
[43] promises_1.0.1                htmltools_0.3.6
[45] BiocGenerics_0.28.0           GenomicRanges_1.34.0
[47] GenomicAlignments_1.18.1      XLConnect_0.2-15
[49] SummarizedExperiment_1.12.0   mime_0.6
[51] interactiveDisplayBase_1.20.0 xtable_1.8-3
[53] httpuv_1.5.0                  RCurl_1.95-4.12

Allow sample sheet to also be a data.frame for input.

Right now you only allow a file input for the sample sheet. It would be useful to also allow a data frame as input. This would make it a lot easier to incorporate TSRchitect into magrittr pipes.

[suggestion] Sample sheet to input experiment information.

At the moment, information about the experiments is entered manually into loadTSSobj. This can be somewhat inconvenient if you are processing a large number of files. Also, as stated in the documentation, you need to be careful to match the order with the bam files.

A few packages let you load sample information from a separate tab-delimited file or data.frame object. This ensures that the sample information matches the associated bam file, and makes it convenient to process a large number of files at the same time.

A good example of this is the DiffBind package.
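The idea can be sketched in a few lines of plain Python (not TSRchitect or DiffBind code). The column names `sample`, `replicate`, and `file`, the file names, and the function name `read_sample_sheet` are all hypothetical. The point is only that each bam file carries its own metadata in the same row, so nothing depends on argument order.

```python
import csv
import io

# Hypothetical tab-delimited sample sheet: one row per bam file.
SHEET = (
    "sample\treplicate\tfile\n"
    "WT\t1\twt_1.bam\n"
    "WT\t2\twt_2.bam\n"
    "diamide\t1\tdiamide_1.bam\n"
)


def read_sample_sheet(handle):
    """Read a tab-delimited sample sheet into a list of row dictionaries."""
    return list(csv.DictReader(handle, delimiter="\t"))


samples = read_sample_sheet(io.StringIO(SHEET))

# Each file is tied to its own sample name, independent of row order.
bam_to_sample = {row["file"]: row["sample"] for row in samples}
```

Passing an open file handle instead of the `io.StringIO` object would read the same layout from disk.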

Known issue with assignment of chromosome name to tsrData output

TR to identify and fix this error, which is either in tsrToDF or just 'upstream'.
