
annotationdbi's Introduction

AnnotationDbi is an R/Bioconductor package that implements a user-friendly interface for querying SQLite-based annotation data packages.

See https://bioconductor.org/packages/AnnotationDbi for more information including how to install the release version of the package (please refrain from installing directly from GitHub).

annotationdbi's People

Contributors

dtenenba, dvantwisk, hpages, jmacdon, jwokaty, kayla-morrell, link-ny, lshep, mtmorgan, nturaga, sonali-bioc, vobencha


annotationdbi's Issues

TXSTART metadata ignores strand directionality?

Hello,

Love the package. Saw an issue I want to discuss.

Okay, so I was using the Mus.musculus AnnotationDbi database package to make a GenomicRanges object containing all of the promoters. Because I have access to your wonderful package through the AnnotationDbi framework, I can now slap all that wonderful metadata onto this object in one go instead of having to merge three different databases together via ENTREZID.

Here was the code I ran.

# Package setup (BiocManager itself must be installed first)
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(c("OrganismDbi", "Mus.musculus", "GenomicFeatures"))
library(OrganismDbi)
library(Mus.musculus)
library(GenomicFeatures)

# Making this object just for comparison
Mm_gene <- transcriptsBy(Mus.musculus, by = "gene",
                         columns = c("SYMBOL", "ENTREZID", "TXCHROM", "TXSTART", "TXSTRAND", "CDSSTART"))
Mm_gene

# Promoter object: 1500 bp upstream and 500 bp downstream of each transcription
# start site is my "promoter region" for this analysis.
Mm_gene_promoters <- promoters(Mm_gene, upstream = 1500, downstream = 500)
Mm_gene_promoters

Below are screenshots of the outputs.
[Screenshot: Mm_gene output]

Even though these transcripts are on the minus strand, the database reports TXSTART as the first base pair of the genomic range object.

[Screenshot: Mm_gene_promoters output]

Here you can see that the promoters() function from GenomicFeatures gets it right: for Zglp1, which is on the minus strand, it places my promoter region 1500 bp upstream and 500 bp downstream of the transcription start site. On the minus strand, that means adding 1500 bp to the last base pair of the genomic range and subtracting 500 bp from it to get the correct range.

This made me wonder whether the TXSTART metadata in the Mus.musculus package is simply taken from the first base pair of the genomic range. It would be simple to add an if/else check that takes the last base pair of the range instead for transcripts on the minus strand. Otherwise, this is going to confuse people who use this metadata without knowing where these numbers come from.
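The strand logic described in this issue can be sketched in base R. This is a minimal illustration, not AnnotationDbi or GenomicRanges code: the helper names strand_aware_tss() and promoter_window() are hypothetical, the coordinates are toy values, and the window arithmetic assumes the convention used by promoters(), where the downstream width includes the TSS itself.

```r
# Hypothetical helper (not part of AnnotationDbi): transcription start site
# (TSS) of each range, honoring strand. Anything not "-" is treated as "+".
strand_aware_tss <- function(start, end, strand) {
  # Minus-strand transcripts start at their rightmost (highest) coordinate.
  ifelse(strand == "-", end, start)
}

# Hypothetical helper: promoter window around each TSS, assuming the
# promoters() convention in which 'downstream' counts the TSS itself.
promoter_window <- function(start, end, strand, upstream = 1500, downstream = 500) {
  tss   <- strand_aware_tss(start, end, strand)
  minus <- strand == "-"
  data.frame(
    start  = ifelse(minus, tss - downstream + 1, tss - upstream),
    end    = ifelse(minus, tss + upstream, tss + downstream - 1),
    strand = strand,
    stringsAsFactors = FALSE
  )
}

# Toy coordinates for illustration only (not real Mus.musculus data).
tx <- data.frame(start = c(10000, 50000), end = c(20000, 60000),
                 strand = c("+", "-"), stringsAsFactors = FALSE)
strand_aware_tss(tx$start, tx$end, tx$strand)
promoter_window(tx$start, tx$end, tx$strand)
```

Under these assumptions, the plus-strand toy transcript gets its TSS at its start (10000) and the minus-strand one at its end (60000), and each promoter window is 2000 bp wide.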

Remove Deprecated AnnotationDbi.Rnw

https://github.com/Bioconductor/AnnotationDbi/blob/devel/vignettes/AnnotationDbi.Rnw is marked as deprecated; however, @hpages wrote the following on Slack:

Maybe we should keep and ignore that vignette as proposed by Vince. It's true that users are no longer supposed to use the 'bimaps' interface but we don't know whether other Bioconductor packages are still using it or not. So we might want to keep the vignette around until we know for sure that all packages have migrated to the 'select' interface (this is the replacement for the 'bimaps' interface). Thx

multiVals for select?

In my analysis code, I frequently have something like:

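library(AnnotationHub)
library(SummarizedExperiment)  # for rowData(); 'se' is my SummarizedExperiment
ens.mm.v97 <- AnnotationHub()[["AH73905"]]
anno <- select(ens.mm.v97, keys = rownames(se),
    keytype = "GENEID", columns = c("SYMBOL", "SEQNAME"))
# select() can return several rows per key; match() keeps the first one.
rowData(se) <- anno[match(rownames(se), anno$GENEID), ]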

It would be nice to do something like:

anno <- select(ens.mm.v97, keys=rownames(se), multiVals="first",
    keytype="GENEID", columns=c("SYMBOL", "SEQNAME"))

... and save myself an extra line of code (and improve robustness to changes to the annotation object). Sort of like how I get an integer vector if I ask for findOverlaps(..., select="first").
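A minimal base-R sketch of the requested "first" behavior, assuming a data frame shaped like a select() result. first_per_key() is a hypothetical helper, not part of AnnotationDbi, and the IDs are made up. For a single column, AnnotationDbi's mapIds() already accepts multiVals = "first"; the helper below covers the multi-column case by keeping the first row per key.

```r
# Hypothetical helper (not an AnnotationDbi function): reduce a select()-style
# result to exactly one row per requested key, keeping the first match, in the
# same order as 'keys'. Keys with no match get a row of NAs.
first_per_key <- function(anno, keys, keytype) {
  idx <- match(keys, anno[[keytype]])   # match() returns the first hit only
  out <- anno[idx, , drop = FALSE]
  out[[keytype]] <- keys                # restore keys that had no match
  rownames(out) <- NULL
  out
}

# Toy 1:many result for illustration (made-up IDs, not Ensembl data).
anno <- data.frame(
  GENEID  = c("g1", "g2", "g2", "g3"),
  SYMBOL  = c("A", "B", "B2", "C"),
  SEQNAME = c("1", "2", "2", "X"),
  stringsAsFactors = FALSE
)
first_per_key(anno, keys = c("g2", "g4", "g1"), keytype = "GENEID")
```

Like the match() line in the original code, this keeps row order aligned with the keys, which is what assigning to rowData(se) needs.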

Error in installed.packages()["AnnotationDbi", "Version"]: subscript out of bounds

Hi,

I'm trying to make an organism package from annotations using makeOrgPackage() according to the Bioconductor vignette.

Unfortunately, when using the code example from the vignette, I receive the following error flagging a problem with AnnotationDbi:

Error in installed.packages()["AnnotationDbi", "Version"] : 
  subscript out of bounds

The final package isn't built, although the rest appears to work fine:

Populating genes table:
genes table filled
Populating gene_info table:
gene_info table filled
Populating chromosome table:
chromosome table filled
Populating go table:
go table filled
table metadata filled
'select()' returned many:1 mapping between keys and columns
Dropping GO IDs that are too new for the current GO.db
Populating go table:
go table filled
Populating go_bp table:
go_bp table filled
Populating go_cc table:
go_cc table filled
Populating go_mf table:
go_mf table filled
'select()' returned many:1 mapping between keys and columns
Populating go_bp_all table:
go_bp_all table filled
Populating go_cc_all table:
go_cc_all table filled
Populating go_mf_all table:
go_mf_all table filled
Populating go_all table:
go_all table filled
Error in installed.packages()["AnnotationDbi", "Version"] : 
  subscript out of bounds
In addition: There were 50 or more warnings (use warnings() to see the first 50)
> warnings()
Warning messages:
1: In result_fetch(res@ptr, n = n) :
  SQL statements must be issued with dbExecute() or dbSendStatement() instead of dbGetQuery() or dbSendQuery().
...

I would appreciate any help on how to solve this error.

Many thanks in advance!

Jan

> sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.6

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] AnnotationForge_1.28.0  GenomeInfoDb_1.22.1     biomaRt_2.42.1          GO.db_3.10.0           
 [5] org.Pf.plasmo.db_3.10.0 pkgconfig_2.0.3         AnnotationDbi_1.48.0    IRanges_2.20.2         
 [9] S4Vectors_0.24.4        Biobase_2.46.0          BiocGenerics_0.32.0    

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.4.6           pillar_1.4.3           compiler_3.6.3         dbplyr_1.4.2          
 [5] bitops_1.0-6           prettyunits_1.1.1      tools_3.6.3            progress_1.2.2        
 [9] digest_0.6.25          bit_1.1-15.2           lifecycle_0.2.0        RSQLite_2.2.0         
[13] memoise_1.1.0          BiocFileCache_1.10.2   tibble_3.0.0           rlang_0.4.5           
[17] cli_2.0.2              DBI_1.1.0              curl_4.3               GenomeInfoDbData_1.2.2
[21] stringr_1.4.0          httr_1.4.1             dplyr_0.8.5            rappdirs_0.3.1        
[25] vctrs_0.2.4            askpass_1.1            hms_0.5.3              tidyselect_1.0.0      
[29] bit64_0.9-7            glue_1.4.0             R6_2.4.1               fansi_0.4.1           
[33] XML_3.99-0.3           purrr_0.3.3            blob_1.2.1             magrittr_1.5          
[37] ellipsis_0.3.0         assertthat_0.2.1       stringi_1.4.6          RCurl_1.98-1.1        
[41] openssl_1.4.1          crayon_1.3.4         
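For context, indexing a matrix by a row name it does not contain is what produces "subscript out of bounds", so the failing expression installed.packages()["AnnotationDbi", "Version"] implies that no "AnnotationDbi" row was found in installed.packages() in that session, even though sessionInfo() lists it as attached. Why it was missing here is not established. The sketch below is a hypothetical defensive lookup (safe_pkg_version() is not an AnnotationForge function) that returns NA instead of failing; utils::packageVersion() is the standard base-R alternative for a single package.

```r
# Hypothetical helper (not part of AnnotationForge): look up a package version
# in installed.packages() without failing when the package row is absent.
safe_pkg_version <- function(pkg) {
  ip <- utils::installed.packages()
  # ip[pkg, "Version"] raises "subscript out of bounds" if 'pkg' is not a row name.
  if (!pkg %in% rownames(ip)) return(NA_character_)
  unname(ip[pkg, "Version"])
}

safe_pkg_version("stats")              # version string of the installed 'stats'
safe_pkg_version("notAnInstalledPkg")  # NA instead of an error

# Standard base-R alternative for one package (errors if it is not installed):
as.character(utils::packageVersion("stats"))
```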

hom.Rn.inp.db is required but has been removed from the Bioconductor release

I am trying to convert rat genes to human gene orthologs using idConverter(). However, when running the code I get an error saying that hom.Rn.inp.db is required, and this package has been removed from Bioconductor. I want my ortholog retrieval to be up to date, trustworthy, and reproducible, so I don't want to install deprecated packages.

Code:

orthologs <- idConverter(ids = allrats_sigs$NOG,
                         srcSpecies = "RATNO",
                         destSpecies = "HOMSA",
                         srcIDType = "ENSEMBL")

Error:

Loading required package: hom.Rn.inp.db
Error in get(paste0("hom.", srcSpcAbrv, ".inp", destSpecies)) : 
  object 'hom.Rn.inpHOMSA' not found
In addition: Warning message:
In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE,  :
  there is no package called 'hom.Rn.inp.db'
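The error message shows that idConverter() builds an object name with paste0() and looks it up with get(). The sketch below reproduces that name from the values in the message and adds a hypothetical pre-check with requireNamespace(), so a missing hom.Rn.inp.db package is reported explicitly. It illustrates the failure mode only; it is not a replacement for the removed package.

```r
# Values taken from the error message above.
srcSpcAbrv  <- "Rn"
destSpecies <- "HOMSA"

# Same construction as in the error: get(paste0("hom.", srcSpcAbrv, ".inp", destSpecies))
obj <- paste0("hom.", srcSpcAbrv, ".inp", destSpecies)  # "hom.Rn.inpHOMSA"
pkg <- paste0("hom.", srcSpcAbrv, ".inp.db")            # "hom.Rn.inp.db"

# Hypothetical pre-check: report the missing package instead of the get() error.
if (!requireNamespace(pkg, quietly = TRUE)) {
  message("Package ", pkg, " is not installed, so ", obj, " cannot be found.")
}
```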
