recount's Introduction

recount

Explore and download data from the recount project, available at the recount2 website. Using the recount package you can download RangedSummarizedExperiment objects at the gene, exon, or exon-exon junction level, the raw counts, the phenotype metadata used, the URLs to the sample coverage bigWig files, or the mean coverage bigWig file for a particular study. The RangedSummarizedExperiment objects can be used by different packages for performing differential expression analysis. Using derfinder you can perform annotation-agnostic differential expression analyses with the data from the recount project.

Documentation

For more information about recount check the vignettes through Bioconductor or at the documentation website.

Installation instructions

Get the latest stable R release from CRAN. Then install recount from Bioconductor using the following code:

if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}

BiocManager::install("recount")

Citation

Below is the citation output from using citation('recount') in R. Please run this yourself to check for any updates on how to cite recount.

print(citation("recount"), bibtex = TRUE)
#> To cite package 'recount' in publications use:
#> 
#>   Collado-Torres L, Nellore A, Kammers K, Ellis SE, Taub MA, Hansen KD,
#>   Jaffe AE, Langmead B, Leek JT (2017). "Reproducible RNA-seq analysis
#>   using recount2." _Nature Biotechnology_. doi:10.1038/nbt.3838
#>   <https://doi.org/10.1038/nbt.3838>,
#>   <http://www.nature.com/nbt/journal/v35/n4/full/nbt.3838.html>.
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Article{,
#>     title = {Reproducible RNA-seq analysis using recount2},
#>     author = {Leonardo Collado-Torres and Abhinav Nellore and Kai Kammers and Shannon E. Ellis and Margaret A. Taub and Kasper D. Hansen and Andrew E. Jaffe and Ben Langmead and Jeffrey T. Leek},
#>     year = {2017},
#>     journal = {Nature Biotechnology},
#>     doi = {10.1038/nbt.3838},
#>     url = {http://www.nature.com/nbt/journal/v35/n4/full/nbt.3838.html},
#>   }
#> 
#>   Collado-Torres L, Nellore A, Jaffe AE (2017). "recount workflow:
#>   Accessing over 70,000 human RNA-seq samples with Bioconductor
#>   [version 1; referees: 1 approved, 2 approved with reservations]."
#>   _F1000Research_. doi:10.12688/f1000research.12223.1
#>   <https://doi.org/10.12688/f1000research.12223.1>,
#>   <https://f1000research.com/articles/6-1558/v1>.
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Article{,
#>     title = {recount workflow: Accessing over 70,000 human RNA-seq samples with Bioconductor [version 1; referees: 1 approved, 2 approved with reservations]},
#>     author = {Leonardo Collado-Torres and Abhinav Nellore and Andrew E. Jaffe},
#>     year = {2017},
#>     journal = {F1000Research},
#>     doi = {10.12688/f1000research.12223.1},
#>     url = {https://f1000research.com/articles/6-1558/v1},
#>   }
#> 
#>   Ellis SE, Collado-Torres L, Jaffe AE, Leek JT (2018). "Improving the
#>   value of public RNA-seq expression data by phenotype prediction."
#>   _Nucl. Acids Res._. doi:10.1093/nar/gky102
#>   <https://doi.org/10.1093/nar/gky102>,
#>   <https://doi.org/10.1093/nar/gky102>.
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Article{,
#>     title = {Improving the value of public RNA-seq expression data by phenotype prediction},
#>     author = {Shannon E. Ellis and Leonardo Collado-Torres and Andrew E. Jaffe and Jeffrey T. Leek},
#>     year = {2018},
#>     journal = {Nucl. Acids Res.},
#>     doi = {10.1093/nar/gky102},
#>     url = {https://doi.org/10.1093/nar/gky102},
#>   }
#> 
#>   Collado-Torres L, Nellore A, Kammers K, Ellis SE, Taub MA, Hansen KD,
#>   Jaffe AE, Langmead B, Leek JT (2023). _Explore and download data from
#>   the recount project_. doi:10.18129/B9.bioc.recount
#>   <https://doi.org/10.18129/B9.bioc.recount>,
#>   https://github.com/leekgroup/recount - R package version 1.27.0,
#>   <http://www.bioconductor.org/packages/recount>.
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Manual{,
#>     title = {Explore and download data from the recount project},
#>     author = {Leonardo Collado-Torres and Abhinav Nellore and Kai Kammers and Shannon E. Ellis and Margaret A. Taub and Kasper D. Hansen and Andrew E. Jaffe and Ben Langmead and Jeffrey T. Leek},
#>     year = {2023},
#>     url = {http://www.bioconductor.org/packages/recount},
#>     note = {https://github.com/leekgroup/recount - R package version 1.27.0},
#>     doi = {10.18129/B9.bioc.recount},
#>   }
#> 
#>   Frazee AC, Langmead B, Leek JT (2011). "ReCount: A multi-experiment
#>   resource of analysis-ready RNA-seq gene count datasets." _BMC
#>   Bioinformatics_. doi:10.1186/1471-2105-12-449
#>   <https://doi.org/10.1186/1471-2105-12-449>,
#>   <https://doi.org/10.1186/1471-2105-12-449>.
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Article{,
#>     title = {ReCount: A multi-experiment resource of analysis-ready RNA-seq gene count datasets},
#>     author = {Alyssa C. Frazee and Ben Langmead and Jeffrey T. Leek},
#>     year = {2011},
#>     journal = {BMC Bioinformatics},
#>     doi = {10.1186/1471-2105-12-449},
#>     url = {https://doi.org/10.1186/1471-2105-12-449},
#>   }
#> 
#>   Razmara A, Ellis SE, Sokolowski DJ, Davis S, Wilson MD, Leek JT,
#>   Jaffe AE, Collado-Torres L (2019). "recount-brain: a curated
#>   repository of human brain RNA-seq datasets metadata." _bioRxiv_.
#>   doi:10.1101/618025 <https://doi.org/10.1101/618025>,
#>   <https://doi.org/10.1101/618025>.
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Article{,
#>     title = {recount-brain: a curated repository of human brain RNA-seq datasets metadata},
#>     author = {Ashkaun Razmara and Shannon E. Ellis and Dustin J. Sokolowski and Sean Davis and Michael D. Wilson and Jeffrey T. Leek and Andrew E. Jaffe and Leonardo Collado-Torres},
#>     year = {2019},
#>     journal = {bioRxiv},
#>     doi = {10.1101/618025},
#>     url = {https://doi.org/10.1101/618025},
#>   }
#> 
#>   Imada E, Sanchez DF, Collado-Torres L, Wilks C, Matam T, Dinalankara
#>   W, Stupnikov A, Lobo-Pereira F, Yip C, Yasuzawa K, Kondo N, Itoh M,
#>   Suzuki H, Kasukawa T, Hon CC, de Hoon MJ, Shin JW, Carninci P, Jaffe
#>   AE, Leek JT, Favorov A, Franco GR, Langmead B, Marchionni L (2020).
#>   "Recounting the FANTOM CAGE–Associated Transcriptome." _Genome
#>   Research_. doi:10.1101/gr.254656.119
#>   <https://doi.org/10.1101/gr.254656.119>,
#>   <https://doi.org/10.1101/gr.254656.119>.
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Article{,
#>     title = {Recounting the FANTOM CAGE–Associated Transcriptome},
#>     author = {Eddie-Luidy Imada and Diego Fernando Sanchez and Leonardo Collado-Torres and Christopher Wilks and Tejasvi Matam and Wikum Dinalankara and Aleksey Stupnikov and Francisco Lobo-Pereira and Chi-Wai Yip and Kayoko Yasuzawa and Naoto Kondo and Masayoshi Itoh and Harukazu Suzuki and Takeya Kasukawa and Chung Chau Hon and Michiel JL {de Hoon} and Jay W Shin and Piero Carninci and Andrew E. Jaffe and Jeffrey T. Leek and Alexander Favorov and Glória R Franco and Ben Langmead and Luigi Marchionni},
#>     year = {2020},
#>     journal = {Genome Research},
#>     doi = {10.1101/gr.254656.119},
#>     url = {https://doi.org/10.1101/gr.254656.119},
#>   }

Please note that recount was only made possible thanks to many other R and bioinformatics software authors, who are cited in the vignettes and/or the paper(s) describing this package.

Code of Conduct

Please note that the recount project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

Development tools

For more details, check the dev directory.

This package was developed using biocthis.


recount's People

Contributors

hpages, jwokaty, lcolladotor, nturaga, vobencha

recount's Issues

Remove package installation code

Hi Leonardo, @lcolladotor
Please remove package installation code from recount. Assume that the package is installed.
If said package is in the Suggests fields and it is required for a function, check using requireNamespace, then stop if the package is not available, and ask the user to install it first. Do not install the package for the user.

recount/R/utils.R

Lines 14 to 32 in 8da982b

.load_install <- function(pkg, quietly = TRUE) {
    attemptName <- requireNamespace(pkg, quietly = quietly)
    if(!attemptName) {
        attemptInstall <- tryCatch(BiocManager::install(pkg,
            suppressUpdates = quietly),
            warning = function(w) 'failed')
        if(attemptInstall == 'failed') stop(paste('Failed to install', pkg))
        attemptName <- requireNamespace(pkg, quietly = quietly)
    }
    if(attemptName) {
        if(quietly) {
            suppressPackageStartupMessages(library(package = pkg,
                character.only = TRUE))
        } else {
            library(package = pkg, character.only = TRUE)
        }
    }
    return(invisible(NULL))
}
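
The check-then-stop behavior requested above could look roughly like the following. This is a minimal sketch, not the package's actual replacement code: the helper name `.require_suggested` and the wording of the error message are assumptions made for illustration.

```r
# Hypothetical helper: verify that a Suggests package is available,
# and stop with an instruction instead of installing it for the user.
.require_suggested <- function(pkg) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
        stop(
            "Package '", pkg, "' is required for this function. ",
            "Please install it first, for example with ",
            "BiocManager::install('", pkg, "').",
            call. = FALSE
        )
    }
    invisible(TRUE)
}

# A package that ships with R passes the check silently
.require_suggested("stats")
```

A function that depends on a suggested package would call the helper once at the top and then use `pkg::fun()` directly, so nothing is installed or attached behind the user's back.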

Best,
Marcel

Issue downloading the recount package with my version of R

Hi there,

I have read the instructions for downloading the recount R package, and it says first to upgrade R to version 3.3.0. At the CRAN website, I recently upgraded my version of R to 3.3.1; however, it seems that biocLite is unable to download recount under this newer version. Am I doing something wrong?

Here is the error:

biocLite('recount')
BioC_mirror: https://bioconductor.org
Using Bioconductor 3.3 (BiocInstaller 1.22.3), R 3.3.1 (2016-06-21).
Installing package(s) ‘recount’
Warning message:
package ‘recount’ is not available (for R version 3.3.1)

Thanks for your help in advance.

package ‘recount’ is not available for Bioconductor version '3.19'

Hello,

Thanks in advance!

I'm having issues trying to install recount in R (Ubuntu 22.04 host) that I've not experienced before. Below is the terminal output and session info;

> if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
Bioconductor version 3.19 (BiocManager 1.30.22), R 4.4.0 (2024-04-24)

> BiocManager::install("recount")
'getOption("repos")' replaces Bioconductor standard repositories, see
'help("repositories", package = "BiocManager")' for details.
Replacement repositories:
    CRAN: https://cloud.r-project.org
Bioconductor version 3.19 (BiocManager 1.30.22), R 4.4.0 (2024-04-24)
Installing package(s) 'recount'
Warning message:
package ‘recount’ is not available for Bioconductor version '3.19'

A version of this package for your version of R might be available elsewhere,
see the ideas at
https://cran.r-project.org/doc/manuals/r-patched/R-admin.html#Installing-packages 

> sessionInfo()
R version 4.4.0 (2024-04-24)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 22.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0 
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0

locale:
 [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_AU.UTF-8        LC_COLLATE=en_AU.UTF-8    
 [5] LC_MONETARY=en_AU.UTF-8    LC_MESSAGES=en_AU.UTF-8   
 [7] LC_PAPER=en_AU.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C       

time zone: Australia/Adelaide
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] BiocManager_1.30.22

loaded via a namespace (and not attached):
[1] compiler_4.4.0 tools_4.4.0   

Package recount3 installs without issues, as do many other packages; however, I still require recount.

As can be seen in the error, it has to do with the very recent release of the latest version of R and BiocManager.

I am looking for any ideas about how I can overcome this incompatibility issue.

rse_gene/rse_exon have too many counts by several orders of magnitude?

Hi all,

Not sure if this is a problem with the project I decided to work with (SRP058740) or something more general. First, let me state that according to the SRA website, I would expect around 30-60 million counts per sample (for instance, https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR2040575 has 73M spots). For this project, I downloaded the rse_tx, rse_gene and rse_exon Rdata files (both directly from the website and using the recount Bioconductor package, with the same result) and loaded them in R.

colSums(assays(rse_gene)$counts)["SRR2040575"]/1e6
SRR2040575
13788.46
colSums(assays(rse_exon)$counts)["SRR2040575"]/1e6
SRR2040575
13788.46
colSums(assays(rse_tx)$fragments, na.rm=TRUE)["SRR2040575"]/1e6
SRR2040575
63.74166

Only the last one makes any sense. Am I missing something obvious?
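
One possible reading of these numbers, offered as an assumption rather than a confirmed diagnosis: the gene and exon counts in recount2 are sums of base-level coverage, so every read contributes roughly its length in bases rather than 1. The ratio of the two totals above is about 216, which is the order of magnitude one would expect from paired reads of around 100 bp. The toy example below uses entirely hypothetical numbers to show how coverage sums relate to read counts; it is not the code used by recount, whose `scale_counts()` function is intended for this kind of rescaling.

```r
# Hypothetical sample: three genes, reads of 100 bp
read_length <- 100
reads_per_gene <- c(geneA = 500, geneB = 1200, geneC = 300)

# Coverage-sum counts: each read adds one unit at every base it covers
coverage_counts <- reads_per_gene * read_length

# Total coverage for the sample (the "area under the coverage curve")
auc <- sum(coverage_counts)

# Rescale to a target library size, here the true number of reads
target_size <- sum(reads_per_gene)
scaled <- round(coverage_counts / auc * target_size)

print(coverage_counts)
print(scaled)
```

With the target size set to the real number of reads, the scaled values recover the per-gene read counts, which is why the unscaled totals look inflated by roughly the read length.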

Output of sessionInfo() below.

Thanks,

Cei

R version 3.6.2 (2019-12-12)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Catalina 10.15.4

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] recount_1.12.1 SummarizedExperiment_1.16.1 DelayedArray_0.12.2 BiocParallel_1.20.1 matrixStats_0.56.0
[6] Biobase_2.46.0 GenomicRanges_1.38.0 GenomeInfoDb_1.22.1 IRanges_2.20.2 S4Vectors_0.24.3
[11] BiocGenerics_0.32.0

loaded via a namespace (and not attached):
[1] colorspace_1.4-1 qvalue_2.18.0 htmlTable_1.13.3 XVector_0.26.0 base64enc_0.1-3
[6] rstudioapi_0.11 bit64_0.9-7 AnnotationDbi_1.48.0 xml2_1.2.5 codetools_0.2-16
[11] splines_3.6.2 knitr_1.28 Formula_1.2-3 jsonlite_1.6.1 Rsamtools_2.2.3
[16] cluster_2.1.0 dbplyr_1.4.2 png_0.1-7 rentrez_1.2.2 readr_1.3.1
[21] compiler_3.6.2 httr_1.4.1 backports_1.1.5 assertthat_0.2.1 Matrix_1.2-18
[26] limma_3.42.2 acepack_1.4.1 htmltools_0.4.0 prettyunits_1.1.1 tools_3.6.2
[31] gtable_0.3.0 glue_1.3.2 GenomeInfoDbData_1.2.2 reshape2_1.4.3 dplyr_0.8.5
[36] rappdirs_0.3.1 doRNG_1.8.2 Rcpp_1.0.4 bumphunter_1.28.0 vctrs_0.2.4
[41] Biostrings_2.54.0 rtracklayer_1.46.0 iterators_1.0.12 xfun_0.12 stringr_1.4.0
[46] lifecycle_0.2.0 rngtools_1.5 XML_3.99-0.3 zlibbioc_1.32.0 scales_1.1.0
[51] BSgenome_1.54.0 VariantAnnotation_1.32.0 hms_0.5.3 GEOquery_2.54.1 derfinderHelper_1.20.0
[56] RColorBrewer_1.1-2 curl_4.3 memoise_1.1.0 gridExtra_2.3 ggplot2_3.3.0
[61] downloader_0.4 biomaRt_2.42.1 rpart_4.1-15 latticeExtra_0.6-29 stringi_1.4.6
[66] RSQLite_2.2.0 foreach_1.5.0 checkmate_2.0.0 GenomicFeatures_1.38.2 rlang_0.4.5
[71] pkgconfig_2.0.3 GenomicFiles_1.22.0 bitops_1.0-6 lattice_0.20-40 purrr_0.3.3
[76] GenomicAlignments_1.22.1 htmlwidgets_1.5.1 bit_1.1-15.2 tidyselect_1.0.0 plyr_1.8.6
[81] magrittr_1.5 R6_2.4.1 Hmisc_4.4-0 DBI_1.1.0 pillar_1.4.3
[86] foreign_0.8-76 survival_3.1-11 RCurl_1.98-1.1 nnet_7.3-13 tibble_2.1.3
[91] crayon_1.3.4 derfinder_1.20.0 BiocFileCache_1.10.2 jpeg_0.1-8.1 progress_1.2.2
[96] locfit_1.5-9.4 grid_3.6.2 data.table_1.12.8 blob_1.2.1 digest_0.6.25
[101] tidyr_1.0.2 openssl_1.4.1 munsell_0.5.0 askpass_1.1

expressed_regions() is not working with the new IDIES and AWS bigWig file locations

Hi,

Currently in BioC release (3.16) and devel (3.17), recount is failing. That's because neither the new IDIES location nor AWS are allowing us to read the BigWig files from the web. I manually edited a local clone of recount to try with the IDIES location.

You can test this on AWS (through duffel) with:

regions <- expressed_regions("SRP002001", "chrY", cutoff = 5)

from

regions <- expressed_regions("SRP002001", "chrY", cutoff = 5)
.

This is the type of warning we get:

2023-02-20 12:51:10 loadCoverage: loading BigWig file http://sciserver.org/public-data/recount2/data/SRP002001/bw/mean_SRP002001.bw
In addition: Warning messages:
1: In seqinfo(con) :
  No openssl available in netConnectHttps for sciserver.org : 443
2: In seqinfo(con) :
  No openssl available in netConnectHttps for sciserver.org : 443
3: In seqinfo(con) :
  No openssl available in netConnectHttps for sciserver.org : 443
> traceback()
8: stop(conditionMessage(output))
7: FUN(X[[i]], ...)
6: lapply(as.list(X), match.fun(FUN), ...)
5: lapply(as.list(X), match.fun(FUN), ...)
4: lapply(bList, .loadCoverageBigWig, range = which, chr = chr, 
       verbose = verbose)
3: lapply(bList, .loadCoverageBigWig, range = which, chr = chr, 
       verbose = verbose)
2: derfinder::loadCoverage(files = meanFile, chr = chr, chrlen = chrlen) at expressed_regions.R#121
1: expressed_regions("SRP002001", "chrY", cutoff = 5)
2023-02-20 12:36:04 loadCoverage: loading BigWig file http://duffel.rail.bio/recount/SRP002001/bw/mean_SRP002001.bw
In addition: Warning messages:
1: In seqinfo(con) :
  No openssl available in netConnectHttps for recount-opendata.s3.amazonaws.com : 443
2: In seqinfo(con) :
  No openssl available in netConnectHttps for recount-opendata.s3.amazonaws.com : 443
3: In seqinfo(con) :
  No openssl available in netConnectHttps for recount-opendata.s3.amazonaws.com : 443

I'm not sure what to do @nellore @ChristopherWilks.

I can try to provide a smaller test, digging into .loadCoverageBigWig() https://github.com/lcolladotor/derfinder/blob/5c1cbd412c5787bf2d2d778977e38dd6ae64976d/R/loadCoverage.R#L384 and well, ultimately rtracklayer.

Best,
Leo

BiocFileCache integration

Have you considered integrating with BiocFileCache to locally cache downloads, e.g., from recount::download_study()? I just found myself re-downloading the same study (in the same directory) for the n+1 time (my fault for not running code before thinking).

This would require a bit of work but might save on downloads (and perhaps make for a good student coding project?).
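
The core idea can be sketched in base R: only download when the target file is not already on disk. The helper `cached_download` and the example URL are hypothetical and not part of recount; a real integration would more likely rely on BiocFileCache, for example its `bfcrpath()` function, which also tracks the files in a persistent cache.

```r
# Hypothetical helper (not part of recount): skip the download when the
# target file already exists locally.
cached_download <- function(url, destfile,
                            download_fun = utils::download.file) {
    if (!file.exists(destfile)) {
        dir.create(dirname(destfile), recursive = TRUE, showWarnings = FALSE)
        download_fun(url, destfile)
    }
    invisible(destfile)
}

# Stand-in downloader so the sketch runs offline; it counts its calls
calls <- 0
fake_download <- function(url, destfile) {
    calls <<- calls + 1
    writeLines("placeholder", destfile)
}

target <- file.path(tempdir(), "cache_demo", "rse_gene.Rdata")
unlink(dirname(target), recursive = TRUE)

# The first call downloads; the second finds the file and does nothing
cached_download("https://example.org/rse_gene.Rdata", target, fake_download)
cached_download("https://example.org/rse_gene.Rdata", target, fake_download)
print(calls)
```

Checking only for file existence is the simplest possible policy; it does not detect partial or outdated files, which is one reason a dedicated cache would be preferable.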

How to get annotation for exon?

Hi, so there is an option to download rse-exon; however, I don't understand how the annotation works. For example, here is what KRAS, which only has 6 exons, looks like.
edit: specifically I'm interested in reconstructing from version two, if this is even possible?

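library(recount)
library(SummarizedExperiment)

download_study("DRP001219", type = "rse-exon", outdir = "./data/")
load(file.path("./data", "rse_exon.Rdata"))
count <- assays(rse_exon)$counts
# Keep only the rows whose name matches the KRAS Ensembl gene ID
count[grepl("ENSG00000133703", rownames(count)), , drop = FALSE]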

                   DRR014113
ENSG00000133703.11     78483
ENSG00000133703.11      4095
ENSG00000133703.11      5518
ENSG00000133703.11       388
ENSG00000133703.11      2350
ENSG00000133703.11      5462
ENSG00000133703.11      1972
ENSG00000133703.11      2725
ENSG00000133703.11       235
ENSG00000133703.11      1108
ENSG00000133703.11      2495
ENSG00000133703.11         0
ENSG00000133703.11         0

So how would I know which row is exon 1, 2 or 3? There is an object called recount_exons, but that only gives the coordinates.
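
If only coordinates are available, one way to label the rows is to number them in transcription order using the strand. The sketch below uses a plain data frame with hypothetical coordinates for a made-up minus-strand gene; it assumes the rows are non-overlapping exonic segments, which need not correspond one-to-one to the annotated exons of any single transcript. That would explain seeing more rows than exons, though this is an assumption here.

```r
# Hypothetical coordinates for four exonic segments of one gene
parts <- data.frame(
    gene_id = rep("GENE1", 4),
    start = c(100, 400, 900, 1500),
    end = c(200, 550, 1000, 1700),
    strand = "-",
    stringsAsFactors = FALSE
)

# On the minus strand, transcription runs from high to low coordinates
minus <- all(parts$strand == "-")
ord <- order(parts$start, decreasing = minus)

# Number the segments along the direction of transcription
parts$part_number <- NA_integer_
parts$part_number[ord] <- seq_along(ord)
parts$label <- paste0(parts$gene_id, "_part", parts$part_number)

print(parts)
```

With a real object, the coordinates would come from `rowRanges(rse_exon)` rather than a hand-built data frame.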

[BUG] Severe Slowdown due to rowRanges 'symbol' column in RSE datasets

Hello,

I am usually a happy camper with recount2 and very thankful for all the work you all have put into this tool! That being said, I have noticed a bug recently in which using the RangedSummarizedExperiment objects from recount in R causes severe slowdowns to occur. I have tested this on multiple machines now and find this only happens with the RSE objects from recount and not with typical seq datasets that I analyze.

After a little while of digging into the objects, I noticed that the slowdown appears to be caused by the rowRanges of the recount2 objects, specifically the symbol column. Please see my example here:

library(recount)
library(DESeq2)
library(SummarizedExperiment)
rse_gene <- rse_gene_SRP009615

# Timer without ranges
system.time({
  rse <- SummarizedExperiment(
    assays =  assays(rse_gene),
    colData = colData(rse_gene), 
    # rowRanges = rowRanges(rse_gene)
  )
  rse$condition <- gsub(res$title, pattern = ".+ targeting ([a-zA-Z0-9]+) gene.+", replacement = "sh\\1")
  dds <- DESeqDataSet(rse, design = ~condition)
  dds <- DESeq(dds)
})

# Timer with ranges
system.time({
  rse <- SummarizedExperiment(
    assays =  assays(rse_gene),
    colData = colData(rse_gene), 
    rowRanges = rowRanges(rse_gene)
  )
  rse$condition <- gsub(res$title, pattern = ".+ targeting ([a-zA-Z0-9]+) gene.+", replacement = "sh\\1")
  dds <- DESeqDataSet(rse, design = ~condition)
  dds <- DESeq(dds)
})

# Timer with ranges but NULL symbol col
system.time({
  rse <- SummarizedExperiment(
    assays =  assays(rse_gene),
    colData = colData(rse_gene), 
    rowRanges = rowRanges(rse_gene)
  )
  rse$condition <- gsub(res$title, pattern = ".+ targeting ([a-zA-Z0-9]+) gene.+", replacement = "sh\\1")
  rse@rowRanges$symbol <- NULL
  dds <- DESeqDataSet(rse, design = ~condition)
  dds <- DESeq(dds)
})

And here is the output of running this in the console:

> system.time({
+   rse <- SummarizedExperiment(
+     assays =  assays(rse_gene),
+     colData = colData(rse_gene), 
+     # rowRanges = rowRanges(rse_gene)
+   )
+   rse$condition <- gsub(res$title, pattern = ".+ targeting ([a-zA-Z0-9]+) gene.+", replacement = "sh\\1")
+   dds <- DESeqDataSet(rse, design = ~condition)
+   dds <- DESeq(dds)
+ })
converting counts to integer mode
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
   user  system elapsed 
 30.018   0.040  30.070 
Warning message:
In DESeqDataSet(rse, design = ~condition) :
  some variables in design formula are characters, converting to factors
> system.time({
+   rse <- SummarizedExperiment(
+     assays =  assays(rse_gene),
+     colData = colData(rse_gene), 
+     rowRanges = rowRanges(rse_gene)
+   )
+   rse$condition <- gsub(res$title, pattern = ".+ targeting ([a-zA-Z0-9]+) gene.+", replacement = "sh\\1")
+   dds <- DESeqDataSet(rse, design = ~condition)
+   dds <- DESeq(dds)
+ })
converting counts to integer mode
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
   user  system elapsed 
745.284   0.810 746.438 
Warning message:
In DESeqDataSet(rse, design = ~condition) :
  some variables in design formula are characters, converting to factors
> # Timer with ranges but NULL symbol col
> system.time({
+   rse <- SummarizedExperiment(
+     assays =  assays(rse_gene),
+     colData = colData(rse_gene), 
+     rowRanges = rowRanges(rse_gene)
+   )
+   rse$condition <- gsub(res$title, pattern = ".+ targeting ([a-zA-Z0-9]+) gene.+", replacement = "sh\\1")
+   rse@rowRanges$symbol <- NULL
+   dds <- DESeqDataSet(rse, design = ~condition)
+   dds <- DESeq(dds)
+ })
converting counts to integer mode
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
   user  system elapsed 
 29.840   0.028  29.890 
Warning message:
In DESeqDataSet(rse, design = ~condition) :
  some variables in design formula are characters, converting to factors

As you can see, with the rowRanges included, the code took ~25x longer to finish running. However, simply NULLing the symbol column of the rowRanges was able to prevent this slowdown. I think the issue is that the symbol column is a CharacterList, which may perform inefficiently for these applications. Anyways, hopefully this helps -- thanks again for all the work you and your team do!!
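
As an alternative to dropping the column, a list-like symbol column could be flattened to a plain character vector so the gene symbols are kept. The sketch below uses a base R list as a stand-in for the CharacterList, with hypothetical values; whether a flat column avoids the slowdown was not tested in this issue. On a real object, removing the column through the accessor, `rowData(rse)$symbol <- NULL`, is a gentler route than writing to the `@rowRanges` slot directly.

```r
# Stand-in for a CharacterList: one character vector per gene,
# with zero, one, or several symbols (hypothetical values)
symbol_list <- list(
    g1 = "KRAS",
    g2 = c("A", "B"),
    g3 = character(0)
)

# Collapse each element to a single string; empty entries become NA
symbol_flat <- vapply(
    symbol_list,
    function(x) if (length(x) > 0) paste(x, collapse = ";") else NA_character_,
    character(1)
)

print(symbol_flat)
```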

Session info:

> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] recount_1.18.0              forcats_0.5.1               stringr_1.4.0               dplyr_1.0.7                
 [5] purrr_0.3.4                 readr_1.4.0                 tidyr_1.1.3                 tibble_3.1.2               
 [9] ggplot2_3.3.5               tidyverse_1.3.1             DESeq2_1.32.0               SummarizedExperiment_1.22.0
[13] MatrixGenerics_1.4.0        matrixStats_0.59.0          EnsDb.Hsapiens.v86_2.99.0   ensembldb_2.16.2           
[17] AnnotationFilter_1.16.0     GenomicFeatures_1.44.0      AnnotationDbi_1.54.1        Biobase_2.52.0             
[21] GenomicRanges_1.44.0        GenomeInfoDb_1.28.1         IRanges_2.26.0              S4Vectors_0.30.0           
[25] BiocGenerics_0.38.0         tximport_1.20.0            

loaded via a namespace (and not attached):
  [1] readxl_1.3.1             backports_1.2.1          Hmisc_4.5-0              BiocFileCache_2.0.0     
  [5] plyr_1.8.6               lazyeval_0.2.2           splines_4.1.0            BiocParallel_1.26.1     
  [9] digest_0.6.27            foreach_1.5.1            htmltools_0.5.1.1        fansi_0.5.0             
 [13] magrittr_2.0.1           checkmate_2.0.0          memoise_2.0.0            BSgenome_1.60.0         
 [17] cluster_2.1.2            limma_3.48.1             Biostrings_2.60.1        annotate_1.70.0         
 [21] modelr_0.1.8             prettyunits_1.1.1        jpeg_0.1-8.1             colorspace_2.0-2        
 [25] blob_1.2.1               rvest_1.0.0              rappdirs_0.3.3           haven_2.4.1             
 [29] xfun_0.24                crayon_1.4.1             RCurl_1.98-1.3           jsonlite_1.7.2          
 [33] genefilter_1.74.0        GEOquery_2.60.0          iterators_1.0.13         survival_3.2-11         
 [37] VariantAnnotation_1.38.0 glue_1.4.2               gtable_0.3.0             zlibbioc_1.38.0         
 [41] XVector_0.32.0           DelayedArray_0.18.0      rentrez_1.2.3            scales_1.1.1            
 [45] rngtools_1.5             DBI_1.1.1                derfinderHelper_1.26.0   derfinder_1.26.0        
 [49] Rcpp_1.0.7               xtable_1.8-4             progress_1.2.2           htmlTable_2.2.1         
 [53] bumphunter_1.34.0        foreign_0.8-81           bit_4.0.4                Formula_1.2-4           
 [57] htmlwidgets_1.5.3        httr_1.4.2               RColorBrewer_1.1-2       ellipsis_0.3.2          
 [61] pkgconfig_2.0.3          XML_3.99-0.6             farver_2.1.0             nnet_7.3-16             
 [65] dbplyr_2.1.1             locfit_1.5-9.4           utf8_1.2.1               reshape2_1.4.4          
 [69] tidyselect_1.1.1         labeling_0.4.2           rlang_0.4.11             munsell_0.5.0           
 [73] cellranger_1.1.0         tools_4.1.0              cachem_1.0.5             downloader_0.4          
 [77] cli_3.0.0                generics_0.1.0           RSQLite_2.2.7            broom_0.7.8             
 [81] fastmap_1.1.0            yaml_2.2.1               knitr_1.33               bit64_4.0.5             
 [85] fs_1.5.0                 KEGGREST_1.32.0          doRNG_1.8.2              xml2_1.3.2              
 [89] biomaRt_2.48.2           compiler_4.1.0           rstudioapi_0.13          filelock_1.0.2          
 [93] curl_4.3.2               png_0.1-7                reprex_2.0.0             geneplotter_1.70.0      
 [97] stringi_1.7.3            GenomicFiles_1.28.0      lattice_0.20-44          ProtGenerics_1.24.0     
[101] Matrix_1.3-4             vctrs_0.3.8              pillar_1.6.1             lifecycle_1.0.0         
[105] data.table_1.14.0        bitops_1.0-7             qvalue_2.24.0            rtracklayer_1.52.0      
[109] R6_2.5.0                 BiocIO_1.2.0             latticeExtra_0.6-29      gridExtra_2.3           
[113] codetools_0.2-18         assertthat_0.2.1         rjson_0.2.20             withr_2.4.2             
[117] GenomicAlignments_1.28.0 Rsamtools_2.8.0          GenomeInfoDbData_1.2.6   hms_1.1.0               
[121] grid_4.1.0               rpart_4.1-15             lubridate_1.7.10         base64enc_0.1-3         
[125] restfulr_0.0.13         

column names of colData of RangedSummarizedExperiment inconsistent?

Hi, I noticed that the names of the columns in the column data (colData) table for a RangedSummarizedExperiment object seem to be inconsistent with the data in those columns, unless I am misunderstanding something (I am an R novice).

I download a RangedSummarizedExperiment as follows:

url <- download_study('SRP009615')
load(file.path('SRP009615', 'rse_gene.Rdata'))
colData(rse_gene)

I get a DataFrame with 21 columns. The order of the names of the columns does not seem to coincide with the data in that column. For example, the first column name is "project"; however, the first column seems to contain the run accession. Is this a bug? Or is there another way I am supposed to find the name of each column?
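
One possible explanation, stated here as an assumption rather than a confirmed answer: when a table with row names is printed, the row names appear as an unlabeled first column, so the header looks shifted by one. The toy data frame below uses hypothetical identifiers to show the effect; `colnames()` and `rownames()` report the two sets of labels separately.

```r
# Hypothetical phenotype table; the run identifiers are the row names
pheno <- data.frame(
    project = c("PROJ1", "PROJ1"),
    sample = c("SAMPLE_A", "SAMPLE_B"),
    row.names = c("RUN_A", "RUN_B"),
    stringsAsFactors = FALSE
)

# Printed output shows RUN_A and RUN_B on the left without a header
print(pheno)

# The real column names start at "project"
print(colnames(pheno))
print(rownames(pheno))
```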

Thanks!

Download rse for a tissue from GTEx/TCGA

Previously, @jtleek requested a function for loading just the RSE files for a given tissue from GTEx or TCGA. At least two people mentioned something similar at BioC2019.

This is a reminder for myself to implement this functionality :P

Plot base level coverage

I've received feedback from users interested in being able to plot the base-level coverage for a set of samples in recount2.

One way of doing this is via derfinderPlot::plotRegionCoverage(). I could write a function where the user specifies:

  • (1) a list of sample ids (run for SRA and GTEx, gdc_file_id for TCGA),
  • (2) a GRanges with the regions of interest (which would have to be short -- or the user would be warned) and
  • (3) colors and labels for each sample (ideally representing different groups).

Then internally run derfinder::loadCoverage(), accessing the data from either local bigWigs (if at JHPCE or SciServer, or if the user downloaded them) or via the web. Something that could be a bit of a bottleneck is the annotation needed for derfinderPlot::plotRegionCoverage(). That would normally involve creating a genomic state object from Gencode v25 hg38, which takes time even for one chromosome. I could pre-make them split by chromosome and have the function download the files from the web behind the scenes. In any case, right now this approach would have to deal with rafalab/bumphunter#15, since derfinderPlot::plotRegionCoverage() requires the output of bumphunter::matchGenes().

Another option would be to open the data in the UCSC genome browser or something like that.

A third option could be to use epivizr http://bioconductor.org/packages/release/bioc/vignettes/epivizr/inst/doc/IntroToEpivizr.html. Although I'm not sure that it can read bigwig files from the web.

Thoughts @andrewejaffe @jtleek ?

Right now, users can get the list of URLs from recount::recount_url and load them into IGV or similar programs.

NA's in rse_tx_brain

Hi,
I downloaded the rse_tx_brain from https://jhubiostatistics.shinyapps.io/recount/ for both TCGA and GTEx. I only renamed them as "rse_tx_brain_TCGA.RData" and "rse_tx_brain_GTEX.RData" (otherwise both objects would be called "rse_tx_brain.RData").

When I load both objects, I notice a lot of NA's for many transcripts. I was wondering why that is? The NA's most likely make getRPKM() fail too.

> rm(list=ls())
> library(SummarizedExperiment)
> setwd("~/HollandLabShared/Sonali/recount_brain")
> gtex = get(load("rse_tx_brain_GTEX.Rdata"))
> tcga = get(load("rse_tx_brain_TCGA.RData"))
> 
> tcga_rpkm = getRPKM(tcga)
Error in base::colSums(x, na.rm = na.rm, dims = dims, ...) : 
  'x' must be an array of at least two dimensions
> gtex_rpkm = getRPKM(gtex)
Error in base::colSums(x, na.rm = na.rm, dims = dims, ...) : 
  'x' must be an array of at least two dimensions

I was under the impression that maybe the same transcript is NA for all samples - but that is not the case.

> length(which(is.na(assay(gtex[,1]))))
[1] 522
> length(which(is.na(assay(gtex[,151]))))
[1] 436
> 
> length(which(is.na(assay(gtex[,20]))))
[1] 892
> length(which(is.na(assay(gtex[,251]))))
[1] 436

> summary(as.numeric(assay(tcga[grep("ENST00000304053.10", rownames(tcga)), ])))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
      0       0       0       0       0       0     543 

Kindly advise.
Thanks,
Sonali.
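
The per-sample checks in the post above can be run for all samples at once. The sketch below is a minimal base R example on a toy matrix that stands in for assay(gtex); the values and dimension names are made up for illustration.

```r
# Toy transcript-by-sample matrix standing in for assay(gtex); values are made up.
counts <- matrix(
    c(1, NA, 3,
      NA, NA, NA,
      5, 6, NA,
      7, 8, 9),
    nrow = 4, byrow = TRUE,
    dimnames = list(paste0("tx", 1:4), paste0("sample", 1:3))
)

# Number of NA transcripts in each sample (one value per column).
na_per_sample <- colSums(is.na(counts))

# Number of samples in which each transcript is NA (one value per row).
na_per_tx <- rowSums(is.na(counts))

# Transcripts that are NA in every sample versus only in some samples.
na_in_all <- names(na_per_tx)[na_per_tx == ncol(counts)]
na_in_some <- names(na_per_tx)[na_per_tx > 0 & na_per_tx < ncol(counts)]
```

Comparing na_in_all with na_in_some shows whether the missing values are tied to specific transcripts or vary from sample to sample.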

TCGA metadata issue in column age_at_initial_pathologic_diagnosis

Hi,

@ShanEllis found a bug in the TCGA metadata. Basically, a column that should only contain ages has a mixture of data. She also found that re-running https://github.com/leekgroup/recount-website/blob/master/metadata/tcga_prep/tcga_clinical.R fixes the problematic column. It looks like cgc_case_age_at_diagnosis has the age data for the rows that are weird in age_at_initial_pathologic_diagnosis.

For now, this will be a known issue while I update the TCGA files in https://github.com/leekgroup/recount-website.
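
Until the files are updated, one possible workaround is to treat non-numeric entries as missing and fall back to cgc_case_age_at_diagnosis. The sketch below shows this on a toy data.frame with illustrative values, not the real metadata. Using the cgc column as a fallback is an assumption based on the observation above, not a confirmed fix.

```r
# Toy stand-in for the TCGA metadata; the values are illustrative only.
md_toy <- data.frame(
    xml_age_at_initial_pathologic_diagnosis = c("70", "Wall Lateral", "63", "Trigone"),
    cgc_case_age_at_diagnosis = c(70L, 73L, 63L, 58L),
    stringsAsFactors = FALSE
)

age_chr <- md_toy$xml_age_at_initial_pathologic_diagnosis

# Entries made up only of digits are treated as valid ages.
is_age <- grepl("^[0-9]+$", age_chr)

# Convert valid entries to integer and leave everything else as NA.
age_clean <- rep(NA_integer_, length(age_chr))
age_clean[is_age] <- as.integer(age_chr[is_age])

# Assumed fallback: use cgc_case_age_at_diagnosis where the XML column is not an age.
age_filled <- ifelse(is_age, age_clean, md_toy$cgc_case_age_at_diagnosis)
```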

Best,
Leo

Unevaluated code

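library('recount')
library('devtools') # provides session_info()

## Download the TCGA metadata and tabulate the problematic column
md <- all_metadata('TCGA')
table(md$xml_age_at_initial_pathologic_diagnosis)
md <- recount::all_metadata('TCGA')
## Rows where the age column holds non-numeric labels instead of ages
weird <- which(md$xml_age_at_initial_pathologic_diagnosis %in% c('Trigone', 'Wall Anterior', 'Wall Lateral', 'Wall NOS', 'Wall Posterior'))
## Show the columns whose name contains 'age' (this also matches names such as 'stage')
md[weird, colnames(md)[grep('age', colnames(md))]]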

## Reproducibility information
print('Reproducibility information:')
Sys.time()
proc.time()
options(width = 120)
session_info()

Evaluated code

> library('recount')
> library('devtools')
> 
> md <- all_metadata('TCGA')
2017-02-24 13:28:09 downloading the metadata to /var/folders/cx/n9s558kx6fb7jf5z_pgszgb80000gn/T//RtmpSiDQCg/metadata_clean_tcga.Rdata
trying URL 'https://github.com/leekgroup/recount-website/blob/master/metadata/metadata_clean_tcga.Rdata?raw=true'
Content type 'application/octet-stream' length 16351229 bytes (15.6 MB)
==================================================
downloaded 15.6 MB

> table(md$xml_age_at_initial_pathologic_diagnosis)

             0             14             15             16             17             18             19             20             21             22 
            25              1              4              2              4              8             11             17             14             11 
            23             24             25             26             27             28             29             30             31             32 
            22             29             22             28             25             33             40             50             46             46 
            33             34             35             36             37             38             39             40             41             42 
            53             72             67             62             66             89             81            102            100            107 
            43             44             45             46             47             48             49             50             51             52 
           124             99            147            138            159            176            158            165            235            184 
            53             54             55             56             57             58             59             60             61             62 
           219            227            226            240            261            275            274            302            292            296 
            63             64             65             66             67             68             69             70             71             72 
           290            265            276            273            258            270            262            247            231            208 
            73             74             75             76             77             78             79             80             81             82 
           230            229            202            163            161            140            144            114             95             81 
            83             84             85             86             87             88             89             90        Trigone  Wall Anterior 
            64             74             51             28             35             24             10             49              1              6 
  Wall Lateral       Wall NOS Wall Posterior 
            11              1              9 
> md <- recount::all_metadata('TCGA')
2017-02-24 13:28:19 downloading the metadata to /var/folders/cx/n9s558kx6fb7jf5z_pgszgb80000gn/T//RtmpSiDQCg/metadata_clean_tcga.Rdata
trying URL 'https://github.com/leekgroup/recount-website/blob/master/metadata/metadata_clean_tcga.Rdata?raw=true'
Content type 'application/octet-stream' length 16351229 bytes (15.6 MB)
==================================================
downloaded 15.6 MB

> weird <- which(md$xml_age_at_initial_pathologic_diagnosis %in% c('Trigone', 'Wall Anterior', 'Wall Lateral', 'Wall NOS', 'Wall Posterior'))
> md[weird, colnames(md)[grep('age', colnames(md))]]
DataFrame with 28 rows and 39 columns
    gdc_cases.diagnoses.tumor_stage gdc_cases.diagnoses.age_at_diagnosis cgc_case_age_at_diagnosis cgc_case_clinical_stage cgc_case_pathologic_stage
                        <character>                            <numeric>                 <integer>             <character>               <character>
1                          stage ii                                25672                        70                      NA                  Stage II
2                          stage iv                                23236                        63                      NA                  Stage IV
3                          stage iv                                26893                        73                      NA                  Stage IV
4                          stage iv                                26874                        73                      NA                  Stage IV
5                          stage ii                                28204                        77                      NA                  Stage II
...                             ...                                  ...                       ...                     ...                       ...
24                         stage iv                                27963                        76                      NA                  Stage IV
25                        stage iii                                25185                        68                      NA                 Stage III
26                        stage iii                                28328                        77                      NA                 Stage III
27                         stage iv                                27816                        76                      NA                  Stage IV
28                         stage iv                                21196                        58                      NA                  Stage IV
    xml_primary_pathology_age_at_initial_pathologic_diagnosis xml_age_at_initial_pathologic_diagnosis xml_stage_event_system_version
                                                    <integer>                             <character>                    <character>
1                                                          NA                            Wall Lateral                            7th
2                                                          NA                            Wall Lateral                            7th
3                                                          NA                            Wall Lateral                            7th
4                                                          NA                            Wall Lateral                            7th
5                                                          NA                            Wall Lateral                            7th
...                                                       ...                                     ...                            ...
24                                                         NA                          Wall Posterior                            7th
25                                                         NA                            Wall Lateral                            7th
26                                                         NA                          Wall Posterior                            7th
27                                                         NA                           Wall Anterior                            7th
28                                                         NA                           Wall Anterior                            7th
    xml_stage_event_clinical_stage xml_stage_event_pathologic_stage xml_stage_event_tnm_categories xml_stage_event_psa xml_stage_event_gleason_grading
                       <character>                      <character>                    <character>         <character>                       <integer>
1                               NA                         Stage II                        T2aN0MX                  NA                              NA
2                               NA                         Stage IV                       T2T3N2MX                  NA                              NA
3                               NA                         Stage IV                      T2T3aN2MX                  NA                               6
4                               NA                         Stage IV                      T2T4bN1MX                  NA                               7
5                               NA                         Stage II                        T2aN0MX                  NA                              NA
...                            ...                              ...                            ...                 ...                             ...
24                              NA                         Stage IV                        T3bN2M0                  NA                               6
25                              NA                        Stage III                      T2T3bN0MX                  NA                              NA
26                              NA                        Stage III                      T1T3aN0MX                  NA                               7
27                              NA                         Stage IV                        T3bN3M1                  NA                               6
28                              NA                         Stage IV                        T3bN2MX                  NA                               6
    xml_stage_event_ann_arbor xml_stage_event_serum_markers xml_stage_event_igcccg_stage xml_stage_event_masaoka_stage xml_asbestos_exposure_age
                  <character>                   <character>                  <character>                   <character>                 <integer>
1                          NA                            NA                           NA                            NA                        NA
2                          NA                            NA                           NA                            NA                        NA
3                          NA                            NA                           NA                            NA                        NA
4                          NA                            NA                           NA                            NA                        NA
5                          NA                            NA                           NA                            NA                        NA
...                       ...                           ...                          ...                           ...                       ...
24                         NA                            NA                           NA                            NA                        NA
25                         NA                            NA                           NA                            NA                        NA
26                         NA                            NA                           NA                            NA                        NA
27                         NA                            NA                           NA                            NA                        NA
28                         NA                            NA                           NA                            NA                        NA
    xml_asbestos_exposure_age_last xml_birth_control_pill_history_usage_category xml_age_began_smoking_in_years xml_axillary_lymph_node_stage_method_type
                         <integer>                                   <character>                      <integer>                               <character>
1                               NA                                            NA                             12                                        NA
2                               NA                                            NA                             NA                                        NA
3                               NA                                            NA                             NA                                        NA
4                               NA                                            NA                             NA                                        NA
5                               NA                                            NA                             18                                        NA
...                            ...                                           ...                            ...                                       ...
24                              NA                                            NA                             15                                        NA
25                              NA                                            NA                             25                                        NA
26                              NA                                            NA                             NA                                        NA
27                              NA                                            NA                             NA                                        NA
28                              NA                                            NA                             NA                                        NA
    xml_axillary_lymph_node_stage_other_method_descriptive_text xml_er_level_cell_percentage_category xml_history_of_esophageal_cancer
                                                    <character>                           <character>                      <character>
1                                                            NA                                    NA                               NA
2                                                            NA                                    NA                               NA
3                                                            NA                                    NA                               NA
4                                                            NA                                    NA                               NA
5                                                            NA                                    NA                               NA
...                                                         ...                                   ...                              ...
24                                                           NA                                    NA                               NA
25                                                           NA                                    NA                               NA
26                                                           NA                                    NA                               NA
27                                                           NA                                    NA                               NA
28                                                           NA                                    NA                               NA
    xml_primary_pathology_esophageal_tumor_cental_location xml_primary_pathology_esophageal_tumor_involvement_sites
                                               <character>                                              <character>
1                                                       NA                                                       NA
2                                                       NA                                                       NA
3                                                       NA                                                       NA
4                                                       NA                                                       NA
5                                                       NA                                                       NA
...                                                    ...                                                      ...
24                                                      NA                                                       NA
25                                                      NA                                                       NA
26                                                      NA                                                       NA
27                                                      NA                                                       NA
28                                                      NA                                                       NA
    xml_primary_pathology_tumor_infiltrating_macrophages xml_cumulative_agent_total_dose xml_hydroxyurea_agent_administered_day_count
                                             <character>                       <integer>                                    <integer>
1                                                     NA                              NA                                           NA
2                                                     NA                              NA                                           NA
3                                                     NA                              NA                                           NA
4                                                     NA                              NA                                           NA
5                                                     NA                              NA                                           NA
...                                                  ...                             ...                                          ...
24                                                    NA                              NA                                           NA
25                                                    NA                              NA                                           NA
26                                                    NA                              NA                                           NA
27                                                    NA                              NA                                           NA
28                                                    NA                              NA                                           NA
    xml_person_history_nonmedical_leukemia_causing_agent_type xml_lab_procedure_blast_cell_outcome_percentage_value
                                                  <character>                                             <integer>
1                                                          NA                                                    NA
2                                                          NA                                                    NA
3                                                          NA                                                    NA
4                                                          NA                                                    NA
5                                                          NA                                                    NA
...                                                       ...                                                   ...
24                                                         NA                                                    NA
25                                                         NA                                                    NA
26                                                         NA                                                    NA
27                                                         NA                                                    NA
28                                                         NA                                                    NA
    xml_prior_tamoxifen_administered_usage_category xml_radiosensitizing_agent_administered_indicator
                                        <character>                                       <character>
1                                                NA                                                NA
2                                                NA                                                NA
3                                                NA                                                NA
4                                                NA                                                NA
5                                                NA                                                NA
...                                             ...                                               ...
24                                               NA                                                NA
25                                               NA                                                NA
26                                               NA                                                NA
27                                               NA                                                NA
28                                               NA                                                NA
    xml_person_concomitant_prostate_carcinoma_pathologic_t_stage xml_first_diagnosis_age_asth_ecz_hay_fev_mold_dust xml_first_diagnosis_age_of_food_allergy
                                                     <character>                                        <character>                             <character>
1                                                             NA                                                 NA                                      NA
2                                                             NA                                                 NA                                      NA
3                                                             NA                                                 NA                                      NA
4                                                             NA                                                 NA                                      NA
5                                                             NA                                                 NA                                      NA
...                                                          ...                                                ...                                     ...
24                                                            NA                                                 NA                                      NA
25                                                            NA                                                 NA                                      NA
26                                                            NA                                                 NA                                      NA
27                                           7thStage IVT3bN3M16                                                 NA                                      NA
28                                           7thStage IVT3bN2MX6                                                 NA                                      NA
    xml_first_diagnosis_age_of_animal_insect_allergy xml_undescended_testis_corrected_age
                                         <character>                          <character>
1                                                 NA                                   NA
2                                                 NA                                   NA
3                                                 NA                                   NA
4                                                 NA                                   NA
5                                                 NA                                   NA
...                                              ...                                  ...
24                                                NA                                   NA
25                                                NA                                   NA
26                                                NA                                   NA
27                                                NA                                   NA
28                                                NA                                   NA
> 
> ## Reproducibility information
> print('Reproducibility information:')
[1] "Reproducibility information:"
> Sys.time()
[1] "2017-02-24 13:28:27 EST"
> proc.time()
   user  system elapsed 
 20.786   1.699  28.800 
> options(width = 120)
> session_info()
Session info -----------------------------------------------------------------------------------------------------------
 setting  value                                             
 version  R Under development (unstable) (2016-10-26 r71594)
 system   x86_64, darwin13.4.0                              
 ui       AQUA                                              
 language (EN)                                              
 collate  en_US.UTF-8                                       
 tz       America/New_York                                  
 date     2017-02-24                                        

Packages ---------------------------------------------------------------------------------------------------------------
 package              * version  date       source                            
 acepack                1.4.1    2016-10-29 CRAN (R 3.4.0)                    
 AnnotationDbi          1.37.3   2017-02-09 Bioconductor                      
 assertthat             0.1      2013-12-06 CRAN (R 3.4.0)                    
 backports              1.0.5    2017-01-18 CRAN (R 3.4.0)                    
 base64enc              0.1-3    2015-07-28 CRAN (R 3.4.0)                    
 Biobase              * 2.35.1   2017-02-23 Bioconductor                      
 BiocGenerics         * 0.21.3   2017-01-12 Bioconductor                      
 BiocParallel           1.9.5    2017-01-24 Bioconductor                      
 biomaRt                2.31.4   2017-01-13 Bioconductor                      
 Biostrings             2.43.4   2017-02-02 Bioconductor                      
 bitops                 1.0-6    2013-08-17 CRAN (R 3.4.0)                    
 BSgenome               1.43.5   2017-02-02 Bioconductor                      
 bumphunter             1.15.0   2016-10-23 Bioconductor                      
 checkmate              1.8.2    2016-11-02 CRAN (R 3.4.0)                    
 cluster                2.0.5    2016-10-08 CRAN (R 3.4.0)                    
 codetools              0.2-15   2016-10-05 CRAN (R 3.4.0)                    
 colorspace             1.3-2    2016-12-14 CRAN (R 3.4.0)                    
 data.table             1.10.4   2017-02-01 CRAN (R 3.4.0)                    
 DBI                    0.5-1    2016-09-10 CRAN (R 3.4.0)                    
 DelayedArray         * 0.1.7    2017-02-17 Bioconductor                      
 derfinder              1.9.6    2017-01-13 Bioconductor                      
 derfinderHelper        1.9.3    2016-11-29 Bioconductor                      
 devtools             * 1.12.0   2016-12-05 CRAN (R 3.4.0)                    
 digest                 0.6.12   2017-01-27 CRAN (R 3.4.0)                    
 doRNG                  1.6      2014-03-07 CRAN (R 3.4.0)                    
 downloader             0.4      2015-07-09 CRAN (R 3.4.0)                    
 foreach                1.4.3    2015-10-13 CRAN (R 3.4.0)                    
 foreign                0.8-67   2016-09-13 CRAN (R 3.4.0)                    
 Formula                1.2-1    2015-04-07 CRAN (R 3.4.0)                    
 GenomeInfoDb         * 1.11.9   2017-02-08 Bioconductor                      
 GenomeInfoDbData       0.99.0   2017-02-14 Bioconductor                      
 GenomicAlignments      1.11.9   2017-02-02 Bioconductor                      
 GenomicFeatures        1.27.8   2017-02-11 Bioconductor                      
 GenomicFiles           1.11.3   2016-11-29 Bioconductor                      
 GenomicRanges        * 1.27.22  2017-02-02 Bioconductor                      
 GEOquery               2.41.0   2016-10-25 Bioconductor                      
 ggplot2                2.2.1    2016-12-30 CRAN (R 3.4.0)                    
 gridExtra              2.2.1    2016-02-29 CRAN (R 3.4.0)                    
 gtable                 0.2.0    2016-02-26 CRAN (R 3.4.0)                    
 Hmisc                  4.0-2    2016-12-31 CRAN (R 3.4.0)                    
 htmlTable              1.9      2017-01-26 CRAN (R 3.4.0)                    
 htmltools              0.3.5    2016-03-21 CRAN (R 3.4.0)                    
 htmlwidgets            0.8      2016-11-09 CRAN (R 3.4.0)                    
 httr                   1.2.1    2016-07-03 CRAN (R 3.4.0)                    
 IRanges              * 2.9.18   2017-02-02 Bioconductor                      
 iterators              1.0.8    2015-10-13 CRAN (R 3.4.0)                    
 jsonlite               1.2      2016-12-31 CRAN (R 3.4.0)                    
 knitr                  1.15.1   2016-11-22 CRAN (R 3.4.0)                    
 lattice                0.20-34  2016-09-06 CRAN (R 3.4.0)                    
 latticeExtra           0.6-28   2016-02-09 CRAN (R 3.4.0)                    
 lazyeval               0.2.0    2016-06-12 CRAN (R 3.4.0)                    
 locfit                 1.5-9.1  2013-04-20 CRAN (R 3.4.0)                    
 magrittr               1.5      2014-11-22 CRAN (R 3.4.0)                    
 Matrix                 1.2-8    2017-01-20 CRAN (R 3.4.0)                    
 matrixStats          * 0.51.0   2016-10-09 CRAN (R 3.4.0)                    
 memoise                1.0.0    2016-01-29 CRAN (R 3.4.0)                    
 munsell                0.4.3    2016-02-13 CRAN (R 3.4.0)                    
 nnet                   7.3-12   2016-02-02 CRAN (R 3.4.0)                    
 pkgmaker               0.22     2014-05-14 CRAN (R 3.4.0)                    
 plyr                   1.8.4    2016-06-08 CRAN (R 3.4.0)                    
 qvalue                 2.7.0    2016-10-23 Bioconductor                      
 R6                     2.2.0    2016-10-05 CRAN (R 3.4.0)                    
 RColorBrewer           1.1-2    2014-12-07 CRAN (R 3.4.0)                    
 Rcpp                   0.12.9   2017-01-14 CRAN (R 3.4.0)                    
 RCurl                  1.95-4.8 2016-03-01 CRAN (R 3.4.0)                    
 recount              * 1.1.18   2017-02-22 Github (leekgroup/recount@ced5db4)
 registry               0.3      2015-07-08 CRAN (R 3.4.0)                    
 rentrez                1.0.4    2016-10-26 CRAN (R 3.4.0)                    
 reshape2               1.4.2    2016-10-22 CRAN (R 3.4.0)                    
 rngtools               1.2.4    2014-03-06 CRAN (R 3.4.0)                    
 rpart                  4.1-10   2015-06-29 CRAN (R 3.4.0)                    
 Rsamtools              1.27.12  2017-01-24 Bioconductor                      
 RSQLite                1.1-2    2017-01-08 CRAN (R 3.4.0)                    
 rtracklayer            1.35.6   2017-02-19 cran (@1.35.6)                    
 S4Vectors            * 0.13.15  2017-02-14 cran (@0.13.15)                   
 scales                 0.4.1    2016-11-09 CRAN (R 3.4.0)                    
 stringi                1.1.2    2016-10-01 CRAN (R 3.4.0)                    
 stringr                1.2.0    2017-02-18 CRAN (R 3.4.0)                    
 SummarizedExperiment * 1.5.7    2017-02-23 Bioconductor                      
 survival               2.40-1   2016-10-30 CRAN (R 3.4.0)                    
 tibble                 1.2      2016-08-26 CRAN (R 3.4.0)                    
 VariantAnnotation      1.21.17  2017-02-12 Bioconductor                      
 withr                  1.0.2    2016-06-20 CRAN (R 3.4.0)                    
 XML                    3.98-1.5 2016-11-10 CRAN (R 3.4.0)                    
 xtable                 1.8-2    2016-02-05 CRAN (R 3.4.0)                    
 XVector                0.15.2   2017-02-02 Bioconductor                      
 zlibbioc               1.21.0   2016-10-23 Bioconductor  

POU5F1 and genes on more than one chromosome are currently not in recount

I'm migrating the discussion on this topic to the issues page.


Original issue with POU5F1

About a week ago Ron Stewart (https://gist.github.com/ronstewart) reported in an email to me that POU5F1 is missing from recount. That led to my reply, which involved the following gist https://gist.github.com/lcolladotor/374a0de6be5c202bbf216295989e534a:

I looked into the issue you reported and indeed POU5F1 is not part of
recount. It's because it's not part of
TxDb.Hsapiens.UCSC.hg38.knownGene which is the annotation we used.
However, you can still use the recount package to compute the coverage
matrix at the exon level (the gene level is just the sum of the exon
level) by modifying the code at
http://bioconductor.org/packages/release/bioc/vignettes/recount/inst/doc/recount-quickstart.html#using-anothernewer-annotation.

Here are the actual details of what I did
https://gist.github.com/lcolladotor/374a0de6be5c202bbf216295989e534a.

Actual issue

Ron replied with a comment on the gist where he is concerned that other genes might be missing in recount.

In recount::reproduce_ranges() we use GenomicFeatures::genes(), which by default has the argument single.strand.genes.only set to TRUE. As seen in the documentation, this argument also drops genes that are on two different chromosomes.

single.strand.genes.only	
TRUE or FALSE. If TRUE (the default), then genes that have exons located on both strands of the same chromosome or on two different chromosomes are dropped. In that case, the genes are returned in a GRanges object. Otherwise, all genes are returned in a GRangesList object with the columns specified thru the columns argument set as top level metadata columns. (Please keep in mind that the top level metadata columns of a GRangesList object are not displayed by the show method.)

This explains why POU5F1 is missing, as shown below: it is present on chr6 (a canonical chromosome) and on several alternative chromosomes, so it is dropped when single.strand.genes.only = TRUE.

> library('GenomicFeatures')
> library('TxDb.Hsapiens.UCSC.hg38.knownGene')
> g <- genes(TxDb.Hsapiens.UCSC.hg38.knownGene, single.strand.genes.only = FALSE)
> g['5460']
GRangesList object of length 1:
$5460 
GRanges object with 7 ranges and 0 metadata columns:
                 seqnames               ranges strand
                    <Rle>            <IRanges>  <Rle>
  [1]                chr6 [31164337, 31180731]      -
  [2] chr6_GL000251v2_alt [ 2646761,  2653135]      -
  [3] chr6_GL000252v2_alt [ 2423653,  2430027]      -
  [4] chr6_GL000253v2_alt [ 2474838,  2481213]      -
  [5] chr6_GL000254v2_alt [ 2508452,  2514827]      -
  [6] chr6_GL000255v2_alt [ 2422370,  2428743]      -
  [7] chr6_GL000256v2_alt [ 2467753,  2474125]      -

-------
seqinfo: 455 sequences (1 circular) from hg38 genome
> 
> 
> 
> g2 <- genes(TxDb.Hsapiens.UCSC.hg38.knownGene, single.strand.genes.only = TRUE)
> g2['5460']
Error: subscript contains invalid names
> 
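
The filtering rule quoted from the documentation can be illustrated with a small base R sketch. It does not call GenomicFeatures; it only mimics the documented rule on a toy table. The seqnames and strand for gene 5460 are copied from the output above, while `hypothetical_gene` and all variable names are made up for illustration.

```r
# Toy gene table. Rows for Entrez gene 5460 (POU5F1) come from the output
# above; "hypothetical_gene" is an invented single-chromosome gene.
ranges <- data.frame(
    gene_id = c(rep("5460", 7), "hypothetical_gene"),
    seqnames = c(
        "chr6", "chr6_GL000251v2_alt", "chr6_GL000252v2_alt",
        "chr6_GL000253v2_alt", "chr6_GL000254v2_alt",
        "chr6_GL000255v2_alt", "chr6_GL000256v2_alt", "chr1"
    ),
    strand = c(rep("-", 7), "+"),
    stringsAsFactors = FALSE
)

# Number of distinct chromosomes and strands per gene
n_chr <- tapply(ranges$seqnames, ranges$gene_id, function(x) length(unique(x)))
n_strand <- tapply(ranges$strand, ranges$gene_id, function(x) length(unique(x)))

# Simplified version of the single.strand.genes.only = TRUE rule:
# keep only genes on exactly one chromosome and one strand
kept <- names(n_chr)[n_chr == 1 & n_strand == 1]
dropped <- setdiff(names(n_chr), kept)

print(kept)
print(dropped)
```

In this toy table gene 5460 lands in `dropped` because it spans seven sequence names, which is the same reason it is absent from `g2` above.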

Status

For now, we are discussing internally what we'll do. There are several options, and in any case it's possible to compute the coverage matrix for genes like POU5F1 that are present on a canonical chromosome and several non-canonical ones; these are genes that researchers like Ron Stewart might be interested in.

Best,
Leo

Transcript level matrix for thousands of experiments

See original at #4.


@amadeusX posted this message:

Hi,
Congratulations on the wonderful recount package and the huge dataset you compiled!
We would like to use the normalized (or, with a lot more effort, we can normalize) gene expression compendium. Say the rows are the genes and the columns are experiments. Hence the (i,j) element of the matrix is the transcript level of gene i in experiment j. We would need that for the identification of generally co-expressed pairs of genes and, for the negative set, independently expressed gene pairs.

Thank you so much and Happy Holidays,
Steve
Istvan Ladunga,
University of Nebraska-Lincoln

inconsistent pheno tables between gene and transcript datasets

Dear recount team,

First of all, congratulations on this great resource, which I regularly use for training and research.

I have a problem with several studies (e.g. SRP042620), where the structure of the pheno table extracted from RData files downloaded from recount differs between transcript and gene datasets.

For the gene dataset, the characteristics field is a CompressedCharacterList (as expected):

> phenoTable <- colData(rse_gene) ## phenotype per run
> class(phenoTable$characteristics)
[1] "CompressedCharacterList"
attr(,"package")
[1] "IRanges"

For the transcript dataset, it is a plain character vector:

> phenoTable <- colData(rse_tx) ## phenotype per run
> class(phenoTable$characteristics)
[1] "character"
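
One possible workaround while the two objects disagree is to coerce the field to a plain list before using it, so downstream code sees the same structure in both cases. The sketch below is only an assumption about what might be acceptable downstream; `characteristics_as_list` is a hypothetical helper, and the toy values stand in for phenoTable$characteristics.

```r
# Hypothetical helper: return the characteristics field as a plain list with
# one element per run, whatever class it was stored as.
characteristics_as_list <- function(x) {
    # as.list() yields one list element per entry of a character vector; with
    # IRanges loaded it also converts a CompressedCharacterList to a list
    out <- as.list(x)
    # Sanity check: every element should be a character vector
    stopifnot(all(vapply(out, is.character, logical(1))))
    out
}

# Made-up values standing in for phenoTable$characteristics
toy <- c("tissue: liver", "tissue: brain")
res <- characteristics_as_list(toy)
print(class(res))
print(length(res))
```

This does not change how the individual strings are formatted; it only makes the container type the same for the gene and transcript objects.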

Is there a way to fix this?

Many thanks,

Jacques van Helden

Results for TCGA samples having all "NA" values

Hi there,
I was following the vignette and found that TCGA samples do not have any data points. Please help.
The code is below:

## Find a project of interest
project_info <- abstract_search("TCGA")

## Download the gene-level RangedSummarizedExperiment data
download_study(project_info$project[7])

## Load the data
load(file.path(project_info$project[7], "rse_gene.Rdata"))

## Delete it if you don't need it anymore
# unlink(project_info$project[7], recursive = TRUE)

## Before adding predictions
dim(colData(rse_gene)) ## 11284 864

## Add the predictions
rse_gene <- add_predictions(rse_gene)

## After adding the predictions
dim(colData(rse_gene)) ## 11284 876

## Explore the variables
x <- colData(rse_gene)[, 865:ncol(colData(rse_gene))]
x[1:50, 1:5]
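
To report how much of the added data is actually missing, the fraction of NA values per column can be computed. This is a minimal base R sketch on a toy data frame; the column names `pred_a` and `pred_b` and their values are made up and do not reflect the real prediction columns.

```r
# Toy stand-in for the prediction columns; names and values are invented
x_toy <- data.frame(
    pred_a = c(NA, NA, NA),
    pred_b = c("liver", NA, "brain"),
    stringsAsFactors = FALSE
)

# Fraction of missing values in each column
na_fraction <- colMeans(is.na(x_toy))

# Names of the columns that contain only NA values
all_na <- names(na_fraction)[na_fraction == 1]

print(na_fraction)
print(all_na)
```

With the real object, `as.data.frame(x)` could be passed in place of `x_toy`, assuming the conversion to a regular data frame is acceptable for this check.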

session_info()
─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────
setting value
version R version 4.0.3 (2020-10-10)
os Red Hat Enterprise Linux Server 7.5 (Maipo)
system x86_64, linux-gnu
ui RStudio
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz America/Chicago
date 2021-07-06

─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────
! package * version date lib source
annotate 1.68.0 2020-10-27 [1] Bioconductor
AnnotationDbi * 1.52.0 2020-10-27 [1] Bioconductor
AnnotationFilter 1.14.0 2020-10-27 [1] Bioconductor
AnnotationHub * 2.22.1 2021-04-16 [1] Bioconductor
askpass 1.1 2019-01-13 [1] CRAN (R 4.0.3)
assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.3)
backports 1.2.1 2020-12-09 [1] CRAN (R 4.0.3)
base64enc 0.1-3 2015-07-28 [1] CRAN (R 4.0.3)
Biobase * 2.50.0 2020-10-27 [1] Bioconductor
BiocFileCache * 1.14.0 2020-10-27 [1] Bioconductor
BiocGenerics * 0.36.1 2021-04-16 [1] Bioconductor
BiocManager 1.30.15 2021-05-11 [1] CRAN (R 4.0.3)
BiocParallel 1.24.1 2020-11-06 [1] Bioconductor
BiocStyle 2.18.1 2020-11-24 [1] Bioconductor
BiocVersion 3.12.0 2020-04-27 [1] Bioconductor
biomaRt 2.46.3 2021-02-09 [1] Bioconductor
Biostrings 2.58.0 2020-10-27 [1] Bioconductor
biovizBase 1.38.0 2020-10-27 [1] Bioconductor
bit 4.0.4 2020-08-04 [1] CRAN (R 4.0.3)
bit64 4.0.5 2020-08-30 [1] CRAN (R 4.0.3)
bitops 1.0-7 2021-04-24 [1] CRAN (R 4.0.3)
blob 1.2.1 2020-01-20 [1] CRAN (R 4.0.3)
broom 0.7.6 2021-04-05 [1] CRAN (R 4.0.3)
BSgenome 1.58.0 2020-10-27 [1] Bioconductor
bumphunter * 1.32.0 2020-10-27 [1] Bioconductor
cachem 1.0.5 2021-05-15 [1] CRAN (R 4.0.3)
callr 3.7.0 2021-04-20 [1] CRAN (R 4.0.3)
caTools 1.18.2 2021-03-28 [1] CRAN (R 4.0.3)
CCA * 1.2.1 2021-03-01 [1] CRAN (R 4.0.3)
cellranger 1.1.0 2016-07-27 [1] CRAN (R 4.0.3)
checkmate 2.0.0 2020-02-06 [1] CRAN (R 4.0.3)
chron 2.3-56 2020-08-18 [1] CRAN (R 4.0.3)
cli 2.5.0 2021-04-26 [1] CRAN (R 4.0.3)
cluster 2.1.0 2019-06-19 [2] CRAN (R 4.0.3)
clusterProfiler * 3.18.1 2021-02-09 [1] Bioconductor
codetools 0.2-16 2018-12-24 [2] CRAN (R 4.0.3)
colorspace 2.0-1 2021-05-04 [1] CRAN (R 4.0.3)
cowplot 1.1.1 2020-12-30 [1] CRAN (R 4.0.3)
crayon 1.4.1 2021-02-08 [1] CRAN (R 4.0.3)
curl 4.3.1 2021-04-30 [1] CRAN (R 4.0.3)
data.table * 1.14.0 2021-02-21 [1] CRAN (R 4.0.3)
DBI 1.1.1 2021-01-15 [1] CRAN (R 4.0.3)
dbplyr * 2.1.1 2021-04-06 [1] CRAN (R 4.0.3)
DEFormats 1.18.0 2020-10-27 [1] Bioconductor
DelayedArray 0.16.3 2021-03-24 [1] Bioconductor
derfinder * 1.24.2 2020-12-18 [1] Bioconductor
derfinderHelper 1.24.1 2020-12-18 [1] Bioconductor
derfinderPlot * 1.24.1 2020-12-18 [1] Bioconductor
desc 1.3.0 2021-03-05 [1] CRAN (R 4.0.3)
DESeq2 * 1.30.1 2021-02-19 [1] Bioconductor
devtools 2.4.2 2021-06-07 [1] CRAN (R 4.0.3)
dichromat 2.0-0 2013-01-24 [1] CRAN (R 4.0.3)
digest 0.6.27 2020-10-24 [1] CRAN (R 4.0.3)
DO.db 2.9 2021-07-06 [1] Bioconductor
doRNG 1.8.2 2020-01-27 [1] CRAN (R 4.0.3)
DOSE 3.16.0 2020-10-27 [1] Bioconductor
dotCall64 * 1.0-1 2021-02-11 [1] CRAN (R 4.0.3)
downloader 0.4 2015-07-09 [1] CRAN (R 4.0.3)
dplyr * 1.0.6 2021-05-05 [1] CRAN (R 4.0.3)
edgeR * 3.32.1 2021-01-14 [1] Bioconductor
ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.0.3)
enrichplot 1.10.2 2021-01-28 [1] Bioconductor
ensembldb 2.14.1 2021-04-19 [1] Bioconductor
evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.3)
fansi 0.5.0 2021-05-25 [1] CRAN (R 4.0.3)
farver 2.1.0 2021-02-28 [1] CRAN (R 4.0.3)
fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.0.3)
fastmatch 1.1-0 2017-01-28 [1] CRAN (R 4.0.3)
fda * 5.1.9 2020-12-16 [1] CRAN (R 4.0.3)
fds * 1.8 2018-10-31 [1] CRAN (R 4.0.3)
fgsea 1.16.0 2020-10-27 [1] Bioconductor
fields * 12.3 2021-05-17 [1] CRAN (R 4.0.3)
forcats * 0.5.1 2021-01-27 [1] CRAN (R 4.0.3)
foreach * 1.5.1 2020-10-15 [1] CRAN (R 4.0.3)
foreign 0.8-80 2020-05-24 [2] CRAN (R 4.0.3)
Formula 1.2-4 2020-10-16 [1] CRAN (R 4.0.3)
fs 1.5.0 2020-07-31 [1] CRAN (R 4.0.3)
genefilter 1.72.1 2021-01-21 [1] Bioconductor
geneplotter 1.68.0 2020-10-27 [1] Bioconductor
generics 0.1.0 2020-10-31 [1] CRAN (R 4.0.3)
GenomeInfoDb * 1.26.7 2021-04-08 [1] Bioconductor
GenomeInfoDbData 1.2.4 2021-06-08 [1] Bioconductor
GenomicAlignments 1.26.0 2020-10-27 [1] Bioconductor
GenomicFeatures 1.42.3 2021-04-01 [1] Bioconductor
GenomicFiles 1.26.0 2020-10-27 [1] Bioconductor
GenomicRanges * 1.42.0 2020-10-27 [1] Bioconductor
GenomicState * 0.99.9 2021-07-06 [1] Bioconductor
GEOquery 2.58.0 2020-10-27 [1] Bioconductor
GGally 2.1.2 2021-06-21 [1] CRAN (R 4.0.3)
ggbio 1.38.0 2020-10-27 [1] Bioconductor
ggforce 0.3.3 2021-03-05 [1] CRAN (R 4.0.3)
V ggplot2 * 3.3.3 2021-06-25 [1] CRAN (R 4.0.3)
ggraph 2.0.5 2021-02-23 [1] CRAN (R 4.0.3)
ggrepel 0.9.1 2021-01-15 [1] CRAN (R 4.0.3)
glue 1.4.2 2020-08-27 [1] CRAN (R 4.0.3)
GO.db 3.12.1 2021-06-18 [1] Bioconductor
GOSemSim 2.16.1 2020-10-29 [1] Bioconductor
gplots * 3.1.1 2020-11-28 [1] CRAN (R 4.0.3)
graph 1.68.0 2020-10-27 [1] Bioconductor
graphlayouts 0.7.1 2020-10-26 [1] CRAN (R 4.0.3)
gridExtra 2.3 2017-09-09 [1] CRAN (R 4.0.3)
gsubfn * 0.7 2018-03-16 [1] CRAN (R 4.0.3)
gtable 0.3.0 2019-03-25 [1] CRAN (R 4.0.3)
gtools 3.9.2 2021-06-06 [1] CRAN (R 4.0.3)
haven 2.4.1 2021-04-23 [1] CRAN (R 4.0.3)
hdrcde 3.4 2021-01-18 [1] CRAN (R 4.0.3)
Hmisc 4.5-0 2021-02-28 [1] CRAN (R 4.0.3)
hms 1.1.0 2021-05-17 [1] CRAN (R 4.0.3)
htmlTable 2.2.1 2021-05-18 [1] CRAN (R 4.0.3)
htmltools 0.5.1.1 2021-01-22 [1] CRAN (R 4.0.3)
htmlwidgets 1.5.3 2020-12-10 [1] CRAN (R 4.0.3)
httpuv 1.6.1 2021-05-07 [1] CRAN (R 4.0.3)
httr 1.4.2 2020-07-20 [1] CRAN (R 4.0.3)
igraph 1.2.6 2020-10-06 [1] CRAN (R 4.0.3)
interactiveDisplayBase 1.28.0 2020-10-27 [1] Bioconductor
IRanges * 2.24.1 2020-12-12 [1] Bioconductor
iterators * 1.0.13 2020-10-15 [1] CRAN (R 4.0.3)
janitor * 2.1.0 2021-01-05 [1] CRAN (R 4.0.3)
jpeg 0.1-8.1 2019-10-24 [1] CRAN (R 4.0.3)
jsonlite 1.7.2 2020-12-09 [1] CRAN (R 4.0.3)
KernSmooth 2.23-17 2020-04-26 [2] CRAN (R 4.0.3)
knitr 1.33 2021-04-24 [1] CRAN (R 4.0.3)
knitrBootstrap 1.0.2 2018-05-24 [1] CRAN (R 4.0.3)
ks 1.13.1 2021-06-01 [1] CRAN (R 4.0.3)
later 1.2.0 2021-04-23 [1] CRAN (R 4.0.3)
lattice 0.20-41 2020-04-02 [2] CRAN (R 4.0.3)
latticeExtra 0.6-29 2019-12-19 [1] CRAN (R 4.0.3)
lazyeval 0.2.2 2019-03-15 [1] CRAN (R 4.0.3)
lifecycle 1.0.0 2021-02-15 [1] CRAN (R 4.0.3)
limma * 3.46.0 2020-10-27 [1] Bioconductor
locfit * 1.5-9.4 2020-03-25 [1] CRAN (R 4.0.3)
lubridate 1.7.10 2021-02-26 [1] CRAN (R 4.0.3)
magrittr 2.0.1 2020-11-17 [1] CRAN (R 4.0.3)
maps 3.3.0 2018-04-03 [1] CRAN (R 4.0.3)
markdown 1.1 2019-08-07 [1] CRAN (R 4.0.3)
MASS * 7.3-53 2020-09-09 [2] CRAN (R 4.0.3)
Matrix * 1.3-4 2021-06-01 [1] CRAN (R 4.0.3)
matrixcalc * 1.0-4 2021-06-03 [1] CRAN (R 4.0.3)
MatrixGenerics * 1.2.1 2021-01-30 [1] Bioconductor
matrixStats * 0.59.0 2021-06-01 [1] CRAN (R 4.0.3)
mclust 5.4.7 2020-11-20 [1] CRAN (R 4.0.3)
memoise 2.0.0 2021-01-26 [1] CRAN (R 4.0.3)
mime 0.10 2021-02-13 [1] CRAN (R 4.0.3)
modelr 0.1.8 2020-05-19 [1] CRAN (R 4.0.3)
munsell 0.5.0 2018-06-12 [1] CRAN (R 4.0.3)
mvtnorm 1.1-2 2021-06-07 [1] CRAN (R 4.0.3)
nnet 7.3-14 2020-04-26 [2] CRAN (R 4.0.3)
openssl 1.4.4 2021-04-30 [1] CRAN (R 4.0.3)
org.Hs.eg.db * 3.12.0 2021-06-29 [1] Bioconductor
OrganismDbi 1.32.0 2020-10-27 [1] Bioconductor
pcaPP * 1.9-74 2021-04-23 [1] CRAN (R 4.0.3)
pillar 1.6.1 2021-05-16 [1] CRAN (R 4.0.3)
pkgbuild 1.2.0 2020-12-15 [1] CRAN (R 4.0.3)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.0.3)
pkgload 1.2.1 2021-04-06 [1] CRAN (R 4.0.3)
plyr * 1.8.6 2020-03-03 [1] CRAN (R 4.0.3)
PMA * 1.2.1 2020-02-03 [1] CRAN (R 4.0.3)
png 0.1-7 2013-12-03 [1] CRAN (R 4.0.3)
polyclip 1.10-0 2019-03-14 [1] CRAN (R 4.0.3)
pracma 2.3.3 2021-01-23 [1] CRAN (R 4.0.3)
prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.0.3)
processx 3.5.2 2021-04-30 [1] CRAN (R 4.0.3)
progress 1.2.2 2019-05-16 [1] CRAN (R 4.0.3)
promises 1.2.0.1 2021-02-11 [1] CRAN (R 4.0.3)
ProtGenerics 1.22.0 2020-10-27 [1] Bioconductor
proto * 1.0.0 2016-10-29 [1] CRAN (R 4.0.3)
ps 1.6.0 2021-02-28 [1] CRAN (R 4.0.3)
purrr * 0.3.4 2020-04-17 [1] CRAN (R 4.0.3)
qvalue 2.22.0 2020-10-27 [1] Bioconductor
R.methodsS3 1.8.1 2020-08-26 [1] CRAN (R 4.0.3)
R.oo 1.24.0 2020-08-26 [1] CRAN (R 4.0.3)
R.utils 2.10.1 2020-08-26 [1] CRAN (R 4.0.3)
R6 2.5.0 2020-10-28 [1] CRAN (R 4.0.3)
rainbow * 3.6 2019-01-29 [1] CRAN (R 4.0.3)
rappdirs 0.3.3 2021-01-31 [1] CRAN (R 4.0.3)
RBGL 1.66.0 2020-10-27 [1] Bioconductor
RColorBrewer 1.1-2 2014-12-07 [1] CRAN (R 4.0.3)
Rcpp 1.0.6 2021-01-15 [1] CRAN (R 4.0.3)
RCurl * 1.98-1.3 2021-03-16 [1] CRAN (R 4.0.3)
readr * 1.4.0 2020-10-05 [1] CRAN (R 4.0.3)
readxl 1.3.1 2019-03-13 [1] CRAN (R 4.0.3)
recount * 1.16.1 2020-12-18 [1] Bioconductor
RefManageR 1.3.0 2020-11-13 [1] CRAN (R 4.0.3)
regionReport * 1.24.2 2020-12-18 [1] Bioconductor
remotes 2.4.0 2021-06-02 [1] CRAN (R 4.0.3)
rentrez 1.2.3 2020-11-10 [1] CRAN (R 4.0.3)
reprex 2.0.0 2021-04-02 [1] CRAN (R 4.0.3)
reshape * 0.8.8 2018-10-23 [1] CRAN (R 4.0.3)
reshape2 * 1.4.4 2020-04-09 [1] CRAN (R 4.0.3)
rlang * 0.4.11 2021-04-30 [1] CRAN (R 4.0.3)
rmarkdown 2.8 2021-05-07 [1] CRAN (R 4.0.3)
rngtools 1.5 2020-01-23 [1] CRAN (R 4.0.3)
rpart 4.1-15 2019-04-12 [2] CRAN (R 4.0.3)
rprojroot 2.0.2 2020-11-15 [1] CRAN (R 4.0.3)
Rsamtools 2.6.0 2020-10-27 [1] Bioconductor
RSQLite * 2.2.7 2021-04-22 [1] CRAN (R 4.0.3)
rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.0.3)
rtracklayer 1.50.0 2020-10-27 [1] Bioconductor
rvcheck 0.1.8 2020-03-01 [1] CRAN (R 4.0.3)
rvest 1.0.0 2021-03-09 [1] CRAN (R 4.0.3)
S4Vectors * 0.28.1 2020-12-09 [1] Bioconductor
scales 1.1.1 2020-05-11 [1] CRAN (R 4.0.3)
scatterpie 0.1.6 2021-04-23 [1] CRAN (R 4.0.3)
sessioninfo * 1.1.1 2018-11-05 [1] CRAN (R 4.0.3)
shadowtext 0.0.8 2021-04-23 [1] CRAN (R 4.0.3)
shiny 1.6.0 2021-01-25 [1] CRAN (R 4.0.3)
snakecase 0.11.0 2019-05-25 [1] CRAN (R 4.0.3)
spam * 2.6-0 2020-12-14 [1] CRAN (R 4.0.3)
sqldf * 0.4-11 2017-06-28 [1] CRAN (R 4.0.3)
statmod * 1.4.36 2021-05-10 [1] CRAN (R 4.0.3)
stringi 1.6.2 2021-05-17 [1] CRAN (R 4.0.3)
stringr * 1.4.0 2019-02-10 [1] CRAN (R 4.0.3)
SummarizedExperiment * 1.20.0 2020-10-27 [1] Bioconductor
survival 3.2-7 2020-09-28 [2] CRAN (R 4.0.3)
TCGAbiolinks * 2.18.0 2020-10-27 [1] Bioconductor
TCGAbiolinksGUI.data 1.10.0 2020-10-29 [1] Bioconductor
testthat 3.0.2 2021-02-14 [1] CRAN (R 4.0.3)
tibble * 3.1.2 2021-05-16 [1] CRAN (R 4.0.3)
tidygraph 1.2.0 2020-05-12 [1] CRAN (R 4.0.3)
tidyr * 1.1.3 2021-03-03 [1] CRAN (R 4.0.3)
tidyselect 1.1.1 2021-04-30 [1] CRAN (R 4.0.3)
tidyverse * 1.3.1 2021-04-15 [1] CRAN (R 4.0.3)
tweenr 1.0.2 2021-03-23 [1] CRAN (R 4.0.3)
usethis 2.0.1 2021-02-10 [1] CRAN (R 4.0.3)
utf8 1.2.1 2021-03-12 [1] CRAN (R 4.0.3)
VariantAnnotation 1.36.0 2020-10-27 [1] Bioconductor
vctrs * 0.3.8 2021-04-29 [1] CRAN (R 4.0.3)
viridis * 0.6.1 2021-05-11 [1] CRAN (R 4.0.3)
viridisLite * 0.4.0 2021-04-13 [1] CRAN (R 4.0.3)
withr 2.4.2 2021-04-18 [1] CRAN (R 4.0.3)
xfun 0.23 2021-05-15 [1] CRAN (R 4.0.3)
XML 3.99-0.6 2021-03-16 [1] CRAN (R 4.0.3)
xml2 1.3.2 2020-04-23 [1] CRAN (R 4.0.3)
xtable 1.8-4 2019-04-21 [1] CRAN (R 4.0.3)
XVector 0.30.0 2020-10-27 [1] Bioconductor
yaml 2.2.1 2020-02-01 [1] CRAN (R 4.0.3)
zlibbioc 1.36.0 2020-10-27 [1] Bioconductor

snaptron_query NA coercion error

I'm running into an error with snaptron_query. It appears to be related to processing the Snaptron return values, i.e. 18 columns instead of the expected 19. Either the 'samples' or the 'read_coverage_by_sample' column is missing and everything appears shifted by one. I have tried troubleshooting everything on my end and am not sure what the issue is.

Warning messages:
1: In matrix(strsplit(jxs, "\t")[[1]], ncol = 19) :
  data length [18] is not a sub-multiple or multiple of the number of columns [19]
2: In matrix(strsplit(jxs, "\t")[[1]], ncol = 19) :
  data length [18] is not a sub-multiple or multiple of the number of columns [19]
3: In matrix(strsplit(jxs, "\t")[[1]], ncol = 19) :
  data length [18] is not a sub-multiple or multiple of the number of columns [19]
4: In (S4Vectors:::coercerToClass(element.type))(v, ...) :
  NAs introduced by coercion
5: In snaptron_query(junctions) : NAs introduced by coercion
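
The first three warnings can be reproduced in base R. One way to end up with 18 fields from a 19-column record is that strsplit() does not return a trailing empty string, so a record whose last column is empty loses one field. Whether that is what happened here is not known; the records below are made up and `count_fields` is a hypothetical helper.

```r
# Hypothetical helper: number of tab-separated fields strsplit() returns
count_fields <- function(record) {
    length(strsplit(record, "\t", fixed = TRUE)[[1]])
}

# Made-up records: one with 19 non-empty fields, one whose 19th field is empty
full <- paste(c(letters[1:18], "x"), collapse = "\t")
empty_last <- paste(c(letters[1:18], ""), collapse = "\t")

n_full <- count_fields(full)
n_empty <- count_fields(empty_last)
print(c(n_full, n_empty))

# Pad short records with empty strings before reshaping into 19 columns
fields <- strsplit(empty_last, "\t", fixed = TRUE)[[1]]
padded <- c(fields, rep("", 19L - length(fields)))
m <- matrix(padded, ncol = 19)
print(dim(m))
```

Padding on the right only makes sense if the missing field really is the last one; if a middle column were absent, the values would still be shifted.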

Unable to access data set to reproduce analysis

I have been trying to download the GSE73721 dataset (a dataset that is featured on the recount website) but cannot do so using the recount library.

Below is the list of relevant commands I ran in the session :

library(recount)
project_info <- abstract_search('GSE32465')
project_info
number_samples species
340 12 human

project_info <- abstract_search('GSE73721')
project_info
[1] number_samples species abstract project
<0 rows> (or 0-length row.names)

project_info <- abstract_search('SRP064454')
project_info
[1] number_samples species abstract project
<0 rows> (or 0-length row.names)

The data is there on the recount website. I would be grateful for your help.
(Below is the CSV file from the recount website.)

accession number of samples species abstract
SRP064454 41 human Astrocytes were purified from fetal and adult human brain tissue using an immunopanning method with the HepaCAM antibody. Samples were taken from otherwise 'healthy' pieces of tissue, unless otherwise specified. Overall design: 6 fetal astrocyte samples, 12 adult astrocyte samples, 8 GBM or sclerotic hippocampal samples, 4 whole human cortex samples, 4 adult mouse astrocyte samples, and 11 human samples of other purified CNS cell types

Thanks,

> devtools::session_info()
Session info -----------------------------------------------------------------------------------------------------------
setting value
version R version 3.3.1 (2016-06-21)
system x86_64, linux-gnu
ui X11
language (EN)
collate en_US.UTF-8
tz
date 2016-12-19

Packages ---------------------------------------------------------------------------------------------------------------
package * version date source
acepack 1.4.1 2016-10-29 CRAN (R 3.3.1)
AnnotationDbi 1.34.4 2016-10-06 Bioconductor
assertthat 0.1 2013-12-06 CRAN (R 3.3.1)
Biobase * 2.32.0 2016-05-16 Bioconductor
BiocGenerics * 0.18.0 2016-05-16 Bioconductor
BiocParallel 1.6.6 2016-12-02 Bioconductor
biomaRt 2.28.0 2016-09-03 Bioconductor
Biostrings 2.40.2 2016-08-10 Bioconductor
bitops 1.0-6 2013-08-17 CRAN (R 3.3.0)
BSgenome 1.40.1 2016-12-02 Bioconductor
bumphunter 1.12.0 2016-05-16 Bioconductor
cluster 2.0.5 2016-10-08 CRAN (R 3.3.1)
codetools 0.2-15 2016-10-05 CRAN (R 3.3.1)
colorspace 1.3-1 2016-11-18 CRAN (R 3.3.1)
data.table 1.9.8 2016-11-25 CRAN (R 3.3.1)
DBI 0.5-1 2016-09-10 CRAN (R 3.3.1)
derfinder 1.8.0 2016-12-18 Bioconductor
derfinderHelper 1.6.3 2016-05-17 Bioconductor
devtools 1.12.0 2016-06-24 CRAN (R 3.3.1)
digest 0.6.10 2016-08-02 CRAN (R 3.1.0)
doRNG 1.6 2014-03-07 CRAN (R 3.3.0)
downloader 0.4 2015-07-09 CRAN (R 3.3.0)
foreach 1.4.3 2015-10-13 CRAN (R 3.3.0)
foreign 0.8-67 2016-09-13 CRAN (R 3.3.1)
Formula 1.2-1 2015-04-07 CRAN (R 3.3.0)
GenomeInfoDb * 1.8.7 2016-12-02 Bioconductor
GenomicAlignments 1.8.4 2016-12-02 Bioconductor
GenomicFeatures 1.24.5 2016-12-02 Bioconductor
GenomicFiles 1.8.0 2016-05-12 Bioconductor
GenomicRanges * 1.24.3 2016-12-02 Bioconductor
GEOquery 2.38.4 2016-05-17 Bioconductor
ggplot2 2.2.0 2016-11-11 CRAN (R 3.3.1)
gridExtra 2.2.1 2016-02-29 CRAN (R 3.3.0)
gtable 0.2.0 2016-02-26 CRAN (R 3.1.0)
Hmisc 4.0-0 2016-11-01 CRAN (R 3.3.1)
htmlTable 1.7 2016-10-19 CRAN (R 3.3.1)
htmltools 0.3.5 2016-03-21 CRAN (R 3.3.1)
httr 1.2.1 2016-07-03 CRAN (R 3.3.1)
IRanges * 2.6.1 2016-12-02 Bioconductor
iterators 1.0.8 2015-10-13 CRAN (R 3.3.0)
jsonlite 1.1 2016-09-14 CRAN (R 3.3.1)
knitr 1.15.1 2016-11-22 CRAN (R 3.3.1)
lattice 0.20-34 2016-09-06 CRAN (R 3.3.1)
latticeExtra 0.6-28 2016-02-09 CRAN (R 3.3.0)
lazyeval 0.2.0 2016-06-12 CRAN (R 3.1.0)
locfit 1.5-9.1 2013-04-20 CRAN (R 3.3.0)
magrittr 1.5 2014-11-22 CRAN (R 3.1.0)
Matrix 1.2-7.1 2016-09-01 CRAN (R 3.3.1)
matrixStats 0.51.0 2016-10-09 CRAN (R 3.3.1)
memoise 1.0.0 2016-01-29 CRAN (R 3.3.0)
munsell 0.4.3 2016-02-13 CRAN (R 3.1.0)
nnet 7.3-12 2016-02-02 CRAN (R 3.3.1)
pkgmaker 0.22 2014-05-14 CRAN (R 3.3.0)
plyr 1.8.4 2016-06-08 CRAN (R 3.3.1)
qvalue 2.4.2 2016-05-17 Bioconductor
R6 2.2.0 2016-10-05 CRAN (R 3.3.1)
RColorBrewer 1.1-2 2014-12-07 CRAN (R 3.1.0)
Rcpp 0.12.8 2016-11-17 CRAN (R 3.3.1)
RCurl 1.95-4.8 2016-03-01 CRAN (R 3.3.1)
recount * 1.0.6 2016-12-18 Bioconductor
registry 0.3 2015-07-08 CRAN (R 3.3.0)
rentrez 1.0.4 2016-10-26 CRAN (R 3.3.1)
reshape2 1.4.2 2016-10-22 CRAN (R 3.3.1)
rngtools 1.2.4 2014-03-06 CRAN (R 3.3.0)
rpart 4.1-10 2015-06-29 CRAN (R 3.3.0)
Rsamtools 1.26.1 2016-12-18 Bioconductor
RSQLite 1.1 2016-11-27 CRAN (R 3.3.1)
rtracklayer 1.34.1 2016-12-18 Bioconductor
S4Vectors * 0.10.3 2016-09-27 Bioconductor
scales 0.4.1 2016-11-09 CRAN (R 3.3.1)
stringi 1.1.2 2016-10-01 CRAN (R 3.3.1)
stringr 1.1.0 2016-08-19 CRAN (R 3.3.1)
SummarizedExperiment * 1.2.3 2016-12-02 Bioconductor
survival 2.40-1 2016-10-30 CRAN (R 3.3.1)
tibble 1.2 2016-08-26 CRAN (R 3.3.1)
VariantAnnotation 1.18.7 2016-12-02 Bioconductor
withr 1.0.2 2016-06-20 CRAN (R 3.3.1)
XML 3.98-1.5 2016-11-10 CRAN (R 3.3.1)
xtable 1.8-2 2016-02-05 CRAN (R 3.1.0)
XVector 0.12.1 2016-12-02 Bioconductor
zlibbioc 1.18.0 2016-05-16 Bioconductor
