Distributed computing by file or by range
bioconductor / genomicfiles
Home Page: https://bioconductor.org/packages/GenomicFiles
Suppose MAP returns a 2-vector and summarize=TRUE.
The resulting assay looks like this:
> assay(rr2)
                [,1]      [,2]      [,3]
ENSG00000111424 Numeric,2 Numeric,2 Numeric,2
ENSG00000172216 Numeric,2 Numeric,2 Numeric,2
ENSG00000124731 Numeric,2 Numeric,2 Numeric,2
This is quite useful: the [i,j] element is a 2-vector. But it might be more
useful to have a pair of flat matrices in the assay slot. Do you think that would
take a lot of work?
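For illustration, the reshaping asked about above can be sketched in base R. This is only a sketch under assumptions: the list-matrix `m` is made-up data standing in for assay(rr2), and `split_assay` is a hypothetical helper, not part of GenomicFiles.

```r
# Made-up stand-in for assay(rr2): a 3 x 3 list-matrix whose cells
# each hold a numeric 2-vector (values are invented for illustration).
cells <- lapply(1:9, function(i) c(i, i * 10))
genes <- c("ENSG00000111424", "ENSG00000172216", "ENSG00000124731")
m <- matrix(cells, nrow = 3, dimnames = list(genes, NULL))

# Hypothetical helper: pull the k-th component out of every cell and
# return an ordinary numeric matrix with the same shape and dimnames.
split_assay <- function(m, k) {
  out <- vapply(m, function(x) x[[k]], numeric(1))
  dim(out) <- dim(m)
  dimnames(out) <- dimnames(m)
  out
}

first  <- split_assay(m, 1)  # flat matrix of first components
second <- split_assay(m, 2)  # flat matrix of second components
```

The two flat matrices could then be stored as separate assays of the same object, although that step is not shown here.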
After running example(GenomicFiles), combining the object with itself fails:
> gf
GenomicFiles object with 3 ranges and 8 files:
files: ERR127306_chr14.bam, ERR127307_chr14.bam, ..., ERR127304_chr14.bam, ERR127305_chr14.bam
detail: use files(), rowRanges(), colData(), ...
> c(gf, gf)
Error in validObject(ans) :
invalid class "GenomicFiles" object: 'length(files(object))' must equal 'nrow(colData(object))'
> sessionInfo()
R Under development (unstable) (2020-03-17 r77988)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.6
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] RNAseqData.HNRNPC.bam.chr14_0.25.0 GenomicFiles_1.23.1
[3] rtracklayer_1.47.0 Rsamtools_2.3.5
[5] Biostrings_2.55.6 XVector_0.27.1
[7] SummarizedExperiment_1.17.3 DelayedArray_0.13.7
[9] BiocParallel_1.21.2 matrixStats_0.56.0
[11] Biobase_2.47.3 GenomicRanges_1.39.2
[13] GenomeInfoDb_1.23.13 IRanges_2.21.5
[15] S4Vectors_0.25.13 BiocGenerics_0.33.0
[17] rmarkdown_2.1
loaded via a namespace (and not attached):
[1] Rcpp_1.0.4 lattice_0.20-40 prettyunits_1.1.1
[4] assertthat_0.2.1 digest_0.6.25 BiocFileCache_1.11.4
[7] R6_2.4.1 RSQLite_2.2.0 evaluate_0.14
[10] httr_1.4.1 pillar_1.4.3 zlibbioc_1.33.1
[13] rlang_0.4.5 GenomicFeatures_1.39.6 progress_1.2.2
[16] curl_4.3 blob_1.2.1 Matrix_1.2-18
[19] startup_0.14.0 stringr_1.4.0 RCurl_1.98-1.1
[22] bit_1.1-15.2 biomaRt_2.43.3 compiler_4.0.0
[25] xfun_0.12 pkgconfig_2.0.3 askpass_1.1
[28] htmltools_0.4.0 tidyselect_1.0.0 openssl_1.4.1
[31] tibble_2.1.3 GenomeInfoDbData_1.2.2 codetools_0.2-16
[34] XML_3.99-0.3 crayon_1.3.4 dplyr_0.8.5
[37] dbplyr_1.4.2 rappdirs_0.3.1 GenomicAlignments_1.23.1
[40] bitops_1.0-6 grid_4.0.0 DBI_1.1.0
[43] magrittr_1.5 stringi_1.4.6 vctrs_0.2.4
[46] tools_4.0.0 bit64_0.9-7 BSgenome_1.55.3
[49] glue_1.3.2 purrr_0.3.3 hms_0.5.3
[52] AnnotationDbi_1.49.1 memoise_1.1.0 knitr_1.28
[55] VariantAnnotation_1.33.0
>
Hello everyone,
I was trying to use reduceByYield(..., init = DF, iterate = TRUE, parallel = TRUE), but it did not seem to pass the init argument on to the downstream reduce function. Adapting an example from the documentation, I can show the problem as follows. The code below is identical to the documentation example:
suppressPackageStartupMessages({
    library(Rsamtools)
    library(GenomicFiles)
})
fl <- system.file(package="Rsamtools", "extdata", "ex1.bam")
bf <- BamFile(fl, yieldSize=500)
YIELD <- function(X, ...) {
    flag = scanBamFlag(isUnmappedQuery=FALSE)
    param = ScanBamParam(flag=flag, what="seq")
    scanBam(X, param=param, ...)[[1]][['seq']]
}
MAP <- function(value, ...) {
    requireNamespace("Biostrings", quietly=TRUE)
    Biostrings::alphabetFrequency(value, collapse=TRUE)
}
REDUCE <- `+`
Now suppose we want to offset every count by +100 and try to do this through the init parameter.
init <- alphabetFrequency(DNAStringSet())
init <- setNames(rep(100, ncol(init)), colnames(init))
print(init)
#>   A   C   G   T   M   R   W   S   Y   K   V   H   D   B   N   -   +   .
#> 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100
When we do this, the output is identical to the output we would get if we had not set the init argument.
outcome <- reduceByYield(bf, YIELD, MAP, REDUCE, parallel=TRUE, init = init)
print(outcome)
#>     A     C     G     T     M     R     W     S     Y     K     V     H     D
#> 39904 23195 20477 31681     0     0     0     0     0     0     0     0     0
#>     B     N     -     +     .
#>     0    29     0     0     0
The following is the outcome I had expected, and it is also the outcome when setting parallel = FALSE.
print(outcome + 100)
#>     A     C     G     T     M     R     W     S     Y     K     V     H     D
#> 40004 23295 20577 31781   100   100   100   100   100   100   100   100   100
#>     B     N     -     +     .
#>   100   129   100   100   100
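To make the expected semantics concrete, here is a minimal base R sketch of a sequential yield/map/reduce loop that honours init. It is an assumption-laden illustration, not the GenomicFiles implementation: `fold_by_yield` and the toy `chunks` are hypothetical names invented for this example.

```r
# Hypothetical helper, NOT GenomicFiles code: fold MAP results over a
# list of chunks, seeding the accumulator with `init` when supplied.
fold_by_yield <- function(chunks, MAP, REDUCE, init) {
  acc <- if (missing(init)) NULL else init
  for (chunk in chunks) {
    value <- MAP(chunk)
    # Without a seed, the first mapped value becomes the accumulator.
    acc <- if (is.null(acc)) value else REDUCE(acc, value)
  }
  acc
}

# Toy data standing in for successive yields from a file.
chunks <- list(1:3, 4:6, 7:9)

without_init <- fold_by_yield(chunks, sum, `+`)
with_init    <- fold_by_yield(chunks, sum, `+`, init = 100)
```

In this sketch the seeded result differs from the unseeded one by exactly the value of init, which is the behaviour I expected from both the serial and the parallel code paths.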
I think the line referenced below does not pass the init parameter on to bpiterate, but I don't know whether this is intended.
GenomicFiles/R/reduceByYield.R, line 16 in f17056c
I assumed this is a bug because I thought changing the parallel parameter shouldn't affect the outcome, but it does, so I am reporting it here.
Thanks for reading!