bioconductor / genomicfiles

Distributed computing by file or by range

Home Page: https://bioconductor.org/packages/GenomicFiles

R 100.00%

core-package bioconductor-package

genomicfiles's Introduction

GenomicFiles

Distributed computing by file or by range

genomicfiles's Issues

reduceByRanges summarize behavior with vector-returning MAP

Suppose MAP returns a numeric vector of length 2 and summarize=TRUE.

The resulting assay looks like this:

> assay(rr2)
                [,1]      [,2]      [,3]
ENSG00000111424 Numeric,2 Numeric,2 Numeric,2
ENSG00000172216 Numeric,2 Numeric,2 Numeric,2
ENSG00000124731 Numeric,2 Numeric,2 Numeric,2

This is quite useful: the [i,j] element is a 2-vector. But it might be more
useful to have a pair of flat numeric matrices in the assay slot, one per
component of the vector. Do you think that would take a lot of work?
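
For illustration, here is a minimal base-R sketch of the reshaping I have in mind. It does not use GenomicFiles at all: `lm` is a made-up stand-in for assay(rr2), and `flatten_component` is a hypothetical helper name, not an existing function in the package.

```r
# Stand-in for assay(rr2): a 3 x 3 list-matrix whose cells are numeric 2-vectors.
# The values are invented purely for the example.
genes <- c("ENSG00000111424", "ENSG00000172216", "ENSG00000124731")
cells <- lapply(1:9, function(i) c(i, i * 10))
lm <- matrix(cells, nrow = 3, dimnames = list(genes, NULL))

# Hypothetical helper: pull the k-th component out of every cell and
# restore the original dimensions and dimnames.
flatten_component <- function(m, k) {
    out <- vapply(m, function(cell) cell[[k]], numeric(1))
    dim(out) <- dim(m)
    dimnames(out) <- dimnames(m)
    out
}

first  <- flatten_component(lm, 1)  # flat numeric matrix of first components
second <- flatten_component(lm, 2)  # flat numeric matrix of second components
```

The two results could then be stored as two separate assays, assuming every cell has the same length.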

c(gf, gf) fails

After running example(GenomicFiles):

> gf
GenomicFiles object with 3 ranges and 8 files: 
files: ERR127306_chr14.bam, ERR127307_chr14.bam, ..., ERR127304_chr14.bam, ERR127305_chr14.bam 
detail: use files(), rowRanges(), colData(), ... 
> c(gf, gf)
Error in validObject(ans) : 
  invalid class "GenomicFiles" object: 'length(files(object))' must equal 'nrow(colData(object))'
> sessionInfo()
R Under development (unstable) (2020-03-17 r77988)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.6

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] RNAseqData.HNRNPC.bam.chr14_0.25.0 GenomicFiles_1.23.1               
 [3] rtracklayer_1.47.0                 Rsamtools_2.3.5                   
 [5] Biostrings_2.55.6                  XVector_0.27.1                    
 [7] SummarizedExperiment_1.17.3        DelayedArray_0.13.7               
 [9] BiocParallel_1.21.2                matrixStats_0.56.0                
[11] Biobase_2.47.3                     GenomicRanges_1.39.2              
[13] GenomeInfoDb_1.23.13               IRanges_2.21.5                    
[15] S4Vectors_0.25.13                  BiocGenerics_0.33.0               
[17] rmarkdown_2.1                     

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.4               lattice_0.20-40          prettyunits_1.1.1       
 [4] assertthat_0.2.1         digest_0.6.25            BiocFileCache_1.11.4    
 [7] R6_2.4.1                 RSQLite_2.2.0            evaluate_0.14           
[10] httr_1.4.1               pillar_1.4.3             zlibbioc_1.33.1         
[13] rlang_0.4.5              GenomicFeatures_1.39.6   progress_1.2.2          
[16] curl_4.3                 blob_1.2.1               Matrix_1.2-18           
[19] startup_0.14.0           stringr_1.4.0            RCurl_1.98-1.1          
[22] bit_1.1-15.2             biomaRt_2.43.3           compiler_4.0.0          
[25] xfun_0.12                pkgconfig_2.0.3          askpass_1.1             
[28] htmltools_0.4.0          tidyselect_1.0.0         openssl_1.4.1           
[31] tibble_2.1.3             GenomeInfoDbData_1.2.2   codetools_0.2-16        
[34] XML_3.99-0.3             crayon_1.3.4             dplyr_0.8.5             
[37] dbplyr_1.4.2             rappdirs_0.3.1           GenomicAlignments_1.23.1
[40] bitops_1.0-6             grid_4.0.0               DBI_1.1.0               
[43] magrittr_1.5             stringi_1.4.6            vctrs_0.2.4             
[46] tools_4.0.0              bit64_0.9-7              BSgenome_1.55.3         
[49] glue_1.3.2               purrr_0.3.3              hms_0.5.3               
[52] AnnotationDbi_1.49.1     memoise_1.1.0            knitr_1.28              
[55] VariantAnnotation_1.33.0
>

'init' not passed on to `bpiterate` in .reduceByYield_iterate

Hello everyone,

I was trying to use reduceByYield(..., init = DF, iterate = TRUE, parallel = TRUE), but the init argument does not seem to be passed on to the downstream reduce step. I can reproduce the problem by adapting an example from the documentation. The setup below is identical to that example:

suppressPackageStartupMessages({
    library(Rsamtools)
    library(GenomicFiles)
})

fl <- system.file(package="Rsamtools", "extdata", "ex1.bam")
bf <- BamFile(fl, yieldSize=500)

YIELD <- function(X, ...) {
    flag = scanBamFlag(isUnmappedQuery=FALSE)
    param = ScanBamParam(flag=flag, what="seq")
    scanBam(X, param=param, ...)[[1]][['seq']]
}
MAP <- function(value, ...) {
    requireNamespace("Biostrings", quietly=TRUE)
    Biostrings::alphabetFrequency(value, collapse=TRUE)
}
REDUCE <- `+`

Suppose we then want to offset every count by 100 and try to do this through the init parameter.

init <- alphabetFrequency(DNAStringSet())
init <- setNames(rep(100, ncol(init)), colnames(init))
print(init)
#>   A   C   G   T   M   R   W   S   Y   K   V   H   D   B   N   -   +   . 
#> 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100

When we do this, the output is identical to what we would get without setting the init argument at all.

outcome <- reduceByYield(bf, YIELD, MAP, REDUCE, parallel=TRUE, init = init)

print(outcome)
#>     A     C     G     T     M     R     W     S     Y     K     V     H     D 
#> 39904 23195 20477 31681     0     0     0     0     0     0     0     0     0 
#>     B     N     -     +     . 
#>     0    29     0     0     0

The following is the outcome I had expected, and is also the outcome when setting parallel = FALSE.

print(outcome + 100)
#>     A     C     G     T     M     R     W     S     Y     K     V     H     D 
#> 40004 23295 20577 31781   100   100   100   100   100   100   100   100   100 
#>     B     N     -     +     . 
#>   100   129   100   100   100
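
To make the expectation concrete, here is a small base-R sketch of the semantics I assumed. It uses base::Reduce as a stand-in for the reduction step, not the GenomicFiles or BiocParallel internals, and the per-chunk counts are invented rather than taken from ex1.bam.

```r
# Hypothetical per-chunk MAP results (made-up counts for two letters only).
chunks <- list(c(A = 5, C = 2), c(A = 1, C = 4), c(A = 3, C = 0))
init <- c(A = 100, C = 100)

# Folding without an initial value: just the sum of the chunks.
without_init <- Reduce(`+`, chunks)

# Folding with an initial value: init is combined with the first chunk first.
with_init <- Reduce(`+`, chunks, init)

print(without_init)
print(with_init)

# The two results differ by exactly the init offset.
stopifnot(all(with_init - without_init == 100))
```

With parallel=TRUE I get the equivalent of without_init, while parallel=FALSE gives the equivalent of with_init.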

I think the line below does not pass the init parameter on to bpiterate, but I don't know whether that is intended.

GenomicFiles/R/reduceByYield.R

Line 16 in f17056c

result <- bpiterate(ITER, FUN=MAP, REDUCE=REDUCE, ...)

I assumed this is a bug because changing the parallel parameter should not affect the outcome, yet it does, so I am reporting it here.

Thanks for reading!
