
genomicfiles's Introduction

GenomicFiles

Distributed computing by file or by range

genomicfiles's People

Contributors

dtenenba, hpages, jwokaty, link-ny, liubuntu, lshep, mtmorgan, nturaga, sonali-bioc, sonali8434, vjcitn, vobencha


genomicfiles's Issues

reduceByRange summarize behavior with vector-returning MAP

Suppose MAP returns a 2-vector and summarize=TRUE.

The resulting assay looks like

> assay(rr2)
                [,1]      [,2]      [,3]
ENSG00000111424 Numeric,2 Numeric,2 Numeric,2
ENSG00000172216 Numeric,2 Numeric,2 Numeric,2
ENSG00000124731 Numeric,2 Numeric,2 Numeric,2

which is quite useful: each [i,j] element is a 2-vector. But it might be more
useful to have a pair of flat matrices in the assay slot, one per element of
the vector. Do you think that would take a lot of work?
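
For illustration only, here is a minimal base R sketch of the reshaping being asked about. It assumes the assay is an ordinary list-matrix whose cells are numeric 2-vectors. The object `m` is a hypothetical stand-in for assay(rr2), and `flatten_cell` is a made-up helper, not a GenomicFiles function.

```r
# Hypothetical stand-in for assay(rr2): a 3 x 3 list-matrix
# in which every cell holds a numeric 2-vector.
cells <- lapply(1:9, function(i) c(i, i * 10))
m <- matrix(
    cells, nrow = 3,
    dimnames = list(
        c("ENSG00000111424", "ENSG00000172216", "ENSG00000124731"),
        NULL
    )
)

# Pull element k out of every cell, keeping the matrix shape and dimnames.
# (flatten_cell is an illustrative helper, not part of any package.)
flatten_cell <- function(m, k) {
    out <- vapply(m, function(cell) cell[[k]], numeric(1))
    dim(out) <- dim(m)
    dimnames(out) <- dimnames(m)
    out
}

first  <- flatten_cell(m, 1)  # flat numeric matrix of first elements
second <- flatten_cell(m, 2)  # flat numeric matrix of second elements
```

If this matches the intended layout, the two matrices could in principle be stored as two named assays instead of one list-matrix. Whether that fits the package's summarize code path is not something this sketch addresses.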

c(gf, gf) fails

After running example(GenomicFiles):

> gf
GenomicFiles object with 3 ranges and 8 files: 
files: ERR127306_chr14.bam, ERR127307_chr14.bam, ..., ERR127304_chr14.bam, ERR127305_chr14.bam 
detail: use files(), rowRanges(), colData(), ... 
> c(gf, gf)
Error in validObject(ans) : 
  invalid class "GenomicFiles" object: 'length(files(object))' must equal 'nrow(colData(object))'
> sessionInfo()
R Under development (unstable) (2020-03-17 r77988)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.6

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] RNAseqData.HNRNPC.bam.chr14_0.25.0 GenomicFiles_1.23.1               
 [3] rtracklayer_1.47.0                 Rsamtools_2.3.5                   
 [5] Biostrings_2.55.6                  XVector_0.27.1                    
 [7] SummarizedExperiment_1.17.3        DelayedArray_0.13.7               
 [9] BiocParallel_1.21.2                matrixStats_0.56.0                
[11] Biobase_2.47.3                     GenomicRanges_1.39.2              
[13] GenomeInfoDb_1.23.13               IRanges_2.21.5                    
[15] S4Vectors_0.25.13                  BiocGenerics_0.33.0               
[17] rmarkdown_2.1                     

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.4               lattice_0.20-40          prettyunits_1.1.1       
 [4] assertthat_0.2.1         digest_0.6.25            BiocFileCache_1.11.4    
 [7] R6_2.4.1                 RSQLite_2.2.0            evaluate_0.14           
[10] httr_1.4.1               pillar_1.4.3             zlibbioc_1.33.1         
[13] rlang_0.4.5              GenomicFeatures_1.39.6   progress_1.2.2          
[16] curl_4.3                 blob_1.2.1               Matrix_1.2-18           
[19] startup_0.14.0           stringr_1.4.0            RCurl_1.98-1.1          
[22] bit_1.1-15.2             biomaRt_2.43.3           compiler_4.0.0          
[25] xfun_0.12                pkgconfig_2.0.3          askpass_1.1             
[28] htmltools_0.4.0          tidyselect_1.0.0         openssl_1.4.1           
[31] tibble_2.1.3             GenomeInfoDbData_1.2.2   codetools_0.2-16        
[34] XML_3.99-0.3             crayon_1.3.4             dplyr_0.8.5             
[37] dbplyr_1.4.2             rappdirs_0.3.1           GenomicAlignments_1.23.1
[40] bitops_1.0-6             grid_4.0.0               DBI_1.1.0               
[43] magrittr_1.5             stringi_1.4.6            vctrs_0.2.4             
[46] tools_4.0.0              bit64_0.9-7              BSgenome_1.55.3         
[49] glue_1.3.2               purrr_0.3.3              hms_0.5.3               
[52] AnnotationDbi_1.49.1     memoise_1.1.0            knitr_1.28              
[55] VariantAnnotation_1.33.0
> 

'init' not passed on to `bpiterate` in .reduceByYield_iterate

Hello everyone,

I was trying to use reduceByYield(..., init = DF, iterate = TRUE, parallel = TRUE), but it didn't seem to pass the init argument on to the downstream reduce function. I can show the problem by adapting an example from the documentation. The setup below is identical to that example:

suppressPackageStartupMessages({
    library(Rsamtools)
    library(GenomicFiles)
})

fl <- system.file(package="Rsamtools", "extdata", "ex1.bam")
bf <- BamFile(fl, yieldSize=500)

YIELD <- function(X, ...) {
    flag = scanBamFlag(isUnmappedQuery=FALSE)
    param = ScanBamParam(flag=flag, what="seq")
    scanBam(X, param=param, ...)[[1]][['seq']]
}
MAP <- function(value, ...) {
    requireNamespace("Biostrings", quietly=TRUE)
    Biostrings::alphabetFrequency(value, collapse=TRUE)
}
REDUCE <- `+`

Now suppose we want to offset every count by +100 and try to do this through the init parameter.

init <- alphabetFrequency(DNAStringSet())
init <- setNames(rep(100, ncol(init)), colnames(init))
print(init)
#>   A   C   G   T   M   R   W   S   Y   K   V   H   D   B   N   -   +   . 
#> 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100

When we do this, the output is identical to the output we'd get if we had not set the init argument.

outcome <- reduceByYield(bf, YIELD, MAP, REDUCE, parallel=TRUE, init = init)

print(outcome)
#>     A     C     G     T     M     R     W     S     Y     K     V     H     D 
#> 39904 23195 20477 31681     0     0     0     0     0     0     0     0     0 
#>     B     N     -     +     . 
#>     0    29     0     0     0

The following is the outcome I had expected, and is also the outcome when setting parallel = FALSE.

print(outcome + 100)
#>     A     C     G     T     M     R     W     S     Y     K     V     H     D 
#> 40004 23295 20577 31781   100   100   100   100   100   100   100   100   100 
#>     B     N     -     +     . 
#>   100   129   100   100   100

I think that the line below doesn't pass the init parameter on to bpiterate, but I don't know whether this is intended.

result <- bpiterate(ITER, FUN=MAP, REDUCE=REDUCE, ...)

I assumed this is a bug because changing the parallel parameter shouldn't affect the outcome, but it does, so I'm reporting it here.
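
To make the expected semantics concrete, here is a small base R sketch. It does not use GenomicFiles or BiocParallel. It relies only on base R's Reduce() to show how a reduction with a seed value differs from one without it. The `chunks` list is a toy stand-in for the per-yield MAP results, and the final step is only a possible workaround, not a confirmed fix.

```r
# Toy per-chunk MAP results standing in for alphabetFrequency() output.
chunks <- list(c(A = 3, C = 1), c(A = 2, C = 4), c(A = 5, C = 0))
init <- c(A = 100, C = 100)
REDUCE <- `+`

# Reduction seeded with init: the behavior reported for parallel = FALSE.
with_init <- Reduce(REDUCE, chunks, init)

# Reduction with init dropped: the behavior reported for parallel = TRUE.
without_init <- Reduce(REDUCE, chunks)

# Possible workaround: fold init in once, after the reduction has finished.
patched <- REDUCE(init, without_init)

print(with_init)
print(without_init)
print(patched)
```

For a commutative and associative REDUCE such as `+`, applying init once after the parallel call should give the same result as seeding the reduction with it. This may not hold for other REDUCE functions.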

Thanks for reading!
