bioconductor / annotationdbi

Manipulation of SQLite-based annotations in Bioconductor

Home Page: https://bioconductor.org/packages/AnnotationDbi

R 100.00%

annotationdbi's Introduction

AnnotationDbi is an R/Bioconductor package that implements a user-friendly interface for querying SQLite-based annotation data packages.

See https://bioconductor.org/packages/AnnotationDbi for more information including how to install the release version of the package (please refrain from installing directly from GitHub).


annotationdbi's Issues

TXSTART metadata ignores strand directionality?

Hello,

Love the package. Saw an issue I want to discuss.

Okay, so I was using the Mus.musculus AnnotationDbi database package to make a GenomicRanges object containing all of the promoters, and because I have access to your wonderful package through the AnnotationDbi framework I can now slap all that wonderful metadata onto this object in one go instead of having to merge 3 different databases together via ENTREZID.

Here was the code I ran.

# Package setup
BiocManager::install("OrganismDbi")
library(OrganismDbi)
BiocManager::install("Mus.musculus")
library(Mus.musculus)
BiocManager::install("GenomicFeatures")
library(GenomicFeatures)

# Making this object just for comparison
Mm_gene <- transcriptsBy(Mus.musculus, by="gene", columns=c("SYMBOL", "ENTREZID", "TXCHROM",  "TXSTART", "TXSTRAND", "CDSSTART"))
Mm_gene

# Here is the promoter object. You can see I'm calling 1500 bp upstream of the transcription start and 500 bp downstream of the transcription start site my "promoter region" for this analysis.
Mm_gene_promoters <- promoters(transcriptsBy(Mus.musculus, by="gene", columns=c("SYMBOL", "ENTREZID", "TXCHROM",  "TXSTART", "TXSTRAND", "CDSSTART")), upstream = 1500, downstream = 500)
Mm_gene_promoters

Below are screenshots of the outputs.
[Screenshot: Mm_gene output]

Even though the transcripts are on the minus strand, the database reports the start of the transcript as the first (lowest-coordinate) base pair of the genomic range.

Here you can see that the promoters() function from GenomicFeatures gets it right and assigns my promoter region as 1500 bases upstream and 500 bases downstream of the transcription start site for Zglp1. That gene is on the minus strand, so the correct range comes from adding 1500 bp to the last bp of the genomic range and subtracting 500 bp from it.

This is something I saw, and I was curious whether the TXSTART metadata coming from the Mus.musculus package is just being taken from the first base pair of the genomic range. It would be super simple to add an "if" check that grabs the last base pair of the range instead for transcripts on the minus strand. Otherwise this is going to lead to some confusion for people trying to use this metadata without knowing where these numbers are coming from.
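
The strand-aware "if" check proposed above can be sketched in base R. This is only an illustration on made-up coordinates, not the package's code: the data frame, its column names, and the transcript names are hypothetical, and the promoter arithmetic follows the promoters() convention as I understand it (the TSS counts as the first downstream base).

```r
# Hypothetical transcript coordinates (1-based, start <= end); not real annotation data.
tx <- data.frame(
  tx_name = c("txA", "txB"),
  start   = c(10000L, 50000L),
  end     = c(12000L, 59000L),
  strand  = c("+", "-"),
  stringsAsFactors = FALSE
)

# Strand-aware TSS: leftmost coordinate on "+", rightmost coordinate on "-".
minus <- tx$strand == "-"
tx$tss <- ifelse(minus, tx$end, tx$start)

# Promoter window: 1500 bp upstream and 500 bp downstream of the TSS.
upstream <- 1500L
downstream <- 500L
tx$prom_start <- ifelse(minus, tx$tss - downstream + 1L, tx$tss - upstream)
tx$prom_end   <- ifelse(minus, tx$tss + upstream, tx$tss + downstream - 1L)

print(tx)
```

For the minus-strand row the TSS is the range's end coordinate, so the window extends 1500 bp above it and 500 bp below it, which matches the behaviour described for Zglp1.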

replacing previous import ‘utils::findMatches’ by ‘S4Vectors::findMatches’ when loading ‘AnnotationDbi’

In Bioconductor 3.17, when I call library(AnnotationDbi), or load a package that imports AnnotationDbi, I get this warning:

Warning message:
replacing previous import ‘utils::findMatches’ by ‘S4Vectors::findMatches’ when loading ‘AnnotationDbi’

I'm running R 4.3.0 on macOS 13.3.1.

Remove Deprecated AnnotationDbi.Rnw

https://github.com/Bioconductor/AnnotationDbi/blob/devel/vignettes/AnnotationDbi.Rnw is noted as deprecated; however

From slack with @hpages

Maybe we should keep and ignore that vignette as proposed by Vince. It's true that users are no longer supposed to use the 'bimaps' interface but we don't know whether other Bioconductor packages are still using it or not. So we might want to keep the vignette around until we know for sure that all packages have migrated to the 'select' interface (this is the replacement for the 'bimaps' interface). Thx

multiVals for select?

In my analysis code, I have not-uncommon occurrences of:

library(AnnotationHub)
ens.mm.v97 <- AnnotationHub()[["AH73905"]]
anno <- select(ens.mm.v97, keys=rownames(se), 
    keytype="GENEID", columns=c("SYMBOL", "SEQNAME"))
rowData(se) <- anno[match(rownames(se), anno$GENEID),]

It would be nice to do something like:

anno <- select(ens.mm.v97, keys=rownames(se), multiVals="first",
    keytype="GENEID", columns=c("SYMBOL", "SEQNAME"))

... and save myself an extra line of code (and improve robustness to changes to the annotation object). Sort of like how I get an integer vector if I ask for findOverlaps(..., select="first").
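
Until something like this exists, the "first match per key" behaviour requested above can be approximated with a small wrapper. The sketch below uses a made-up data frame in place of a real select() result, and select_first() is a hypothetical helper, not part of AnnotationDbi.

```r
# Hypothetical select()-style result in which one key ("g2") maps to two rows.
anno <- data.frame(
  GENEID  = c("g1", "g2", "g2", "g3"),
  SYMBOL  = c("A", "B", "B_alt", "C"),
  SEQNAME = c("1", "2", "2", "X"),
  stringsAsFactors = FALSE
)
keys <- c("g3", "g1", "g2", "g4")

# Hypothetical helper: keep the first row for each key, in key order.
# Keys without a match produce a row of NAs, as match() returns NA for them.
select_first <- function(res, keys, keytype) {
  out <- res[match(keys, res[[keytype]]), , drop = FALSE]
  rownames(out) <- NULL
  out
}

first <- select_first(anno, keys, "GENEID")
print(first)
```

The result has exactly one row per key, so it can be assigned to rowData(se) directly. mapIds() already accepts a multiVals argument, but it returns a single column rather than a table.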

Error in installed.packages()["AnnotationDbi", "Version"]: subscript out of bounds

Hi,

I'm trying to make an organism package from annotations using makeOrgPackage() according to the Bioconductor vignette.

Unfortunately, when using the code example from the vignette, I receive the following error flagging a problem with AnnotationDbi:

Error in installed.packages()["AnnotationDbi", "Version"] : 
  subscript out of bounds

The final package isn't built, yet the rest looks like it is working fine:

Populating genes table:
genes table filled
Populating gene_info table:
gene_info table filled
Populating chromosome table:
chromosome table filled
Populating go table:
go table filled
table metadata filled
'select()' returned many:1 mapping between keys and columns
Dropping GO IDs that are too new for the current GO.db
Populating go table:
go table filled
Populating go_bp table:
go_bp table filled
Populating go_cc table:
go_cc table filled
Populating go_mf table:
go_mf table filled
'select()' returned many:1 mapping between keys and columns
Populating go_bp_all table:
go_bp_all table filled
Populating go_cc_all table:
go_cc_all table filled
Populating go_mf_all table:
go_mf_all table filled
Populating go_all table:
go_all table filled
Error in installed.packages()["AnnotationDbi", "Version"] : 
  subscript out of bounds
In addition: There were 50 or more warnings (use warnings() to see the first 50)

> warnings()
Warning messages:
1: In result_fetch(res@ptr, n = n) :
  SQL statements must be issued with dbExecute() or dbSendStatement() instead of dbGetQuery() or dbSendQuery().
...

I would appreciate any help on how to solve the error.

Many thanks in advance!

Jan

> sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.6

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] AnnotationForge_1.28.0  GenomeInfoDb_1.22.1     biomaRt_2.42.1          GO.db_3.10.0           
 [5] org.Pf.plasmo.db_3.10.0 pkgconfig_2.0.3         AnnotationDbi_1.48.0    IRanges_2.20.2         
 [9] S4Vectors_0.24.4        Biobase_2.46.0          BiocGenerics_0.32.0    

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.4.6           pillar_1.4.3           compiler_3.6.3         dbplyr_1.4.2          
 [5] bitops_1.0-6           prettyunits_1.1.1      tools_3.6.3            progress_1.2.2        
 [9] digest_0.6.25          bit_1.1-15.2           lifecycle_0.2.0        RSQLite_2.2.0         
[13] memoise_1.1.0          BiocFileCache_1.10.2   tibble_3.0.0           rlang_0.4.5           
[17] cli_2.0.2              DBI_1.1.0              curl_4.3               GenomeInfoDbData_1.2.2
[21] stringr_1.4.0          httr_1.4.1             dplyr_0.8.5            rappdirs_0.3.1        
[25] vctrs_0.2.4            askpass_1.1            hms_0.5.3              tidyselect_1.0.0      
[29] bit64_0.9-7            glue_1.4.0             R6_2.4.1               fansi_0.4.1           
[33] XML_3.99-0.3           purrr_0.3.3            blob_1.2.1             magrittr_1.5          
[37] ellipsis_0.3.0         assertthat_0.2.1       stringi_1.4.6          RCurl_1.98-1.1        
[41] openssl_1.4.1          crayon_1.3.4
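
The "subscript out of bounds" message reported above is what R raises when a matrix is indexed by a row name it does not contain. The sketch below reproduces that on a toy matrix standing in for the result of installed.packages(). The matrix contents and the lookup_version() helper are hypothetical, and the thread does not confirm why the AnnotationDbi row was missing in the reporter's session.

```r
# Toy stand-in for the matrix returned by installed.packages() (hypothetical contents).
pkgs <- matrix(
  c("1.28.0", "3.10.0"),
  ncol = 1,
  dimnames = list(c("AnnotationForge", "GO.db"), "Version")
)

# Indexing by a row name that is absent raises an error instead of returning NA.
msg <- tryCatch(
  pkgs["AnnotationDbi", "Version"],
  error = function(e) conditionMessage(e)
)
print(msg)

# Hypothetical defensive lookup: check the row name first, return NA if it is missing.
lookup_version <- function(pkg_matrix, pkg) {
  if (pkg %in% rownames(pkg_matrix)) {
    unname(pkg_matrix[pkg, "Version"])
  } else {
    NA_character_
  }
}

print(lookup_version(pkgs, "GO.db"))
print(lookup_version(pkgs, "AnnotationDbi"))
```

When debugging a case like this, comparing rownames(installed.packages()) with .libPaths() is one way to check whether the package is visible in the library the session is actually using.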

Reference to undefined function

AnnotationDbi/R/createAnnObjs.ORGANISM_DB.R

Line 140 in 50cba36

seeds <- c(seeds, makeAnnDbMapSeeds())

makeAnnDbMapSeeds() is defined in a comment in the same R file, but I don't see it defined anywhere else.

Should this be removed?

Unable to find an inherited method for function ‘species’ for signature ‘"character"’

I updated R to 4.0 and am now getting an error with the following code:

OrgDb = 'org.Hs.eg.db'
res@organism <- AnnotationDbi::species(OrgDb)
Unable to find an inherited method for function ‘species’ for signature ‘"character"’

Guessing it has something to do with the stringsAsFactors changes?
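
One possible reading of the message, not confirmed in the thread, is plain S4 dispatch: the generic is called on the package name (a character string) rather than on the annotation object. The sketch below reproduces that kind of error with toy stand-ins. ToyOrgDb, toySpecies(), and toy.Hs.db are hypothetical names, not AnnotationDbi classes or functions.

```r
library(methods)

# Hypothetical stand-ins for an annotation class and a species()-like generic.
setClass("ToyOrgDb", representation(species = "character"))
setGeneric("toySpecies", function(x) standardGeneric("toySpecies"))
setMethod("toySpecies", "ToyOrgDb", function(x) x@species)

toy.Hs.db <- new("ToyOrgDb", species = "Homo sapiens")

# Passing the object's name as a string fails: no method is defined for "character".
err <- tryCatch(
  toySpecies("toy.Hs.db"),
  error = function(e) conditionMessage(e)
)
print(err)

# Resolving the name to the object first lets dispatch find the method.
ok <- toySpecies(get("toy.Hs.db"))
print(ok)
```

If the same applies to the code above, looking up the object by name before calling species() would be the analogous change, assuming the corresponding annotation package is loaded.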

mapIds(..., multiVals="asNA") returns logical() rather than character() when all values map multiply

> class(AnnotationDbi::mapIds(TxDb.Hsapiens.UCSC.hg38.knownGene, "1", "TXID", "GENEID", multiVals="asNA"))
'select()' returned 1:many mapping between keys and columns
[1] "logical"

One solution is to change line

AnnotationDbi/R/methods-geneCentricDbs.R

Line 1142 in 00cc5c5

unlist(data)

to

as.character(unlist(data))

or better, in the line above, explicitly use NA_character_
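
The type change can be reproduced in base R without any annotation package. The sketch below uses a made-up named list in place of the internal data object, and compares the two fixes suggested above.

```r
# Hypothetical stand-in for the internal list when every key maps multiply
# and has therefore been replaced by NA.
all_na <- list(k1 = NA, k2 = NA)

# A bare NA is logical, so unlist() yields a logical vector.
as_is <- unlist(all_na)
print(class(as_is))

# Fix 1: coerce afterwards. The type is right, but as.character() drops the names.
coerced <- as.character(unlist(all_na))
print(class(coerced))
print(names(coerced))

# Fix 2: use NA_character_ when filling in the NA. Type and names are both kept.
typed <- unlist(list(k1 = NA_character_, k2 = NA_character_))
print(class(typed))
print(names(typed))
```

Because as.character() discards names, the NA_character_ variant is the one that keeps the key names on the returned vector, which is consistent with the "or better" remark.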

hom.Rn.inp.db is required but removed from bioconductor release

I am trying to convert rat genes to human gene orthologs using idConverter(). However, when running the code I get an error saying that hom.Rn.inp.db is required, and this package has been removed from Bioconductor. I would like my ortholog retrieval to be up to date, trustworthy, and replicable, so I don't want to install deprecated packages.

Code:

orthologs=idConverter(ids=allrats_sigs$NOG,
                      srcSpecies = "RATNO",
                      destSpecies = "HOMSA",
                      srcIDType ="ENSEMBL" )

Error:

Loading required package: hom.Rn.inp.db
Error in get(paste0("hom.", srcSpcAbrv, ".inp", destSpecies)) : 
  object 'hom.Rn.inpHOMSA' not found
In addition: Warning message:
In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE,  :
  there is no package called 'hom.Rn.inp.db'
