organism.dplyr's Introduction

The package creates an on-disk SQLite database that combines an organism's data from an 'org' package (e.g., org.Hs.eg.db) with the genome coordinate functionality of a 'TxDb' package (e.g., TxDb.Hsapiens.UCSC.hg38.knownGene). It aims to provide an integrated presentation of identifiers and genomic coordinates.

organism.dplyr's People

Contributors

dvantwisk, hpages, jwokaty, lshep, mtmorgan, nturaga, vobencha, yubocheng

organism.dplyr's Issues

Filters

There are filter concepts in S4Vectors, ensembldb, and now here. Shouldn't we have just one? One thing that drove us to implement our own filters rather than reuse those in ensembldb was the ability to generate them programmatically, whereas the filters in ensembldb are all 'hand-crafted'.

Appropriate table structure

Organism.dplyr simplifies the bimap and table structure of the org and TxDb packages to a small number of tables, but what is the optimal arrangement and membership of those tables? Organism.dplyr is already much more user-friendly than the org / TxDb / Homo.sapiens packages, so it is valuable for that reason alone. Note that the genes(), transcripts(), exons(), and cds() verbs are already contracted to return a GRanges; we have genes_tbl() etc. returning tibbles.

Bioconductor BBS: Organism.dplyr / BioC 3.18, 10/27/93

Hi Organism.dplyr maintainer,

According to the Multiple platform build/check report for BioC 3.18,
the Organism.dplyr package has the following problem(s):

o ERROR for 'R CMD build' on nebbiolo2. See the details here:
https://master.bioconductor.org/checkResults/3.18/bioc-LATEST/Organism.dplyr/nebbiolo2-buildsrc.html

Please take the time to address this by committing and pushing
changes to your package at git.bioconductor.org

Notes:

  • This was the status of your package at the time this email was sent to you.
    Given that the online report is updated daily (in normal conditions) you
    could see something different when you visit the URL(s) above, especially if
    you do so several days after you received this email.

  • It is possible that the problems reported in this report are false positives,
    either because another package (from CRAN or Bioconductor) breaks your
    package (if yours depends on it) or because of a Build System problem.
    If this is the case, then you can ignore this email.

  • Please check the report again 24h after you've committed your changes to the
    package and make sure that all the problems have gone.

  • If you have questions about this report or need help with the
    maintenance of your package, please use the Bioc-devel mailing list:

https://bioconductor.org/help/support/

(all package maintainers are requested to subscribe to this list)

For immediate notification of package build status, please
subscribe to your package's RSS feed. Information is at:

https://bioconductor.org/developers/rss-feeds/

Thanks for contributing to the Bioconductor project!

persistent cache for result of src_organism()?

From the current vignette:

Running src_organism() without a given path will save the sqlite file to a tempdir():

...

It might be more convenient if the default behavior were to write the sqlite file into a folder like the one used for AnnotationHub, with src_organism() checking for a relevant database when invoked. It takes a bit of time to build the database and, if I understand the default behavior correctly, it is lost when the session ends.
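
One way to approximate this behavior today is to pass an explicit dbpath inside a per-user cache directory and reopen the file when it already exists. The sketch below is only an illustration: organism_cache_path() and cached_src_organism() are hypothetical helpers, not part of Organism.dplyr, the cache location from tools::R_user_dir() (R >= 4.0) is an assumption, and only the txdb and dbpath arguments of src_organism() described in the vignette are relied on.

```r
# Hypothetical helper: path of the cached SQLite file for a given TxDb name.
organism_cache_path <- function(txdb,
        cache_dir = tools::R_user_dir("Organism.dplyr", which = "cache")) {
    file.path(cache_dir, paste0(txdb, ".sqlite"))
}

# Hypothetical wrapper: build the database once, reopen it in later sessions.
cached_src_organism <- function(txdb,
        cache_dir = tools::R_user_dir("Organism.dplyr", which = "cache")) {
    dbpath <- organism_cache_path(txdb, cache_dir)
    dir.create(dirname(dbpath), recursive = TRUE, showWarnings = FALSE)
    if (file.exists(dbpath)) {
        # Reuse the database written by an earlier session.
        Organism.dplyr::src_organism(dbpath = dbpath)
    } else {
        # First use: create the database at the cached location.
        Organism.dplyr::src_organism(txdb, dbpath)
    }
}
```

Calling cached_src_organism("TxDb.Hsapiens.UCSC.hg38.knownGene") would then pay the build cost only once. The sketch does not detect a stale or partially written file, so a real implementation would probably need some form of versioning or validation.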

remove redundant sqlite file

I created two sqlite databases for testing, and somehow missed example.sqlite! I'll fix this, probably by removing example.sqlite and updating the vignette / documentation.

`fiveUTRsByTranscript(, filter=)` does not return values for all transcripts

> transcripts_tbl(src, filter=list(SymbolFilter("ADA")))
Joining, by = "entrez"
Source:   query [?? x 7]
Database: sqlite 3.11.1 [/home/mtmorgan/a/Organism.dplyr/inst/extdata/light.hg38.knownGene.sqlite]

  tx_chrom tx_start   tx_end tx_strand  tx_id    tx_name symbol
     <chr>    <int>    <int>     <chr>  <int>      <chr>  <chr>
1    chr20 44619522 44626491         - 169786 uc061xfj.1    ADA
2    chr20 44619522 44651742         - 169787 uc002xmj.4    ADA
3    chr20 44619810 44651691         - 169789 uc061xfl.1    ADA
> fiveUTRsByTranscript(src, filter = list(SymbolFilter("ADA")))
Joining, by = "entrez"
Joining, by = "entrez"
GRangesList object of length 1:
$169787 
GRanges object with 1 range and 5 metadata columns:
      seqnames               ranges strand |     tx_id   exon_id   exon_name
         <Rle>            <IRanges>  <Rle> | <integer> <integer> <character>
  [1]    chr20 [44651608, 44651742]      - |    169787    501401        <NA>
      exon_rank      symbol
      <integer> <character>
  [1]         1         ADA

-------

src_organism not working

Hello, every time I run:

src <- src_organism("TxDb.Hsapiens.UCSC.hg38.knownGene")

I get the error:

Error in collect():
! Failed to collect lazy table.
Caused by error in db_collect():
! Arguments in ... must be used.
✖ Problematic argument:
• ..1 = Inf
ℹ Did you misspell an argument name?

Not sure how to resolve this.

> sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.6 LTS

Matrix products: default
BLAS/LAPACK: FlexiBLAS OPENBLAS; LAPACK version 3.10.1

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

time zone: America/Los_Angeles
tzcode source: system (glibc)

attached base packages:
[1] tools stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] tinytex_0.45 viridis_0.6.5 viridisLite_0.4.2
[4] rtracklayer_1.60.0 tidylog_1.0.2 data.table_1.15.4
[7] janitor_2.2.0 stringr_1.5.0 stringi_1.7.12
[10] forcats_1.0.0 readODS_2.2.0 patchwork_1.2.0
[13] ggrepel_0.9.3 ggplot2_3.5.0 RColorBrewer_1.1-3
[16] karyoploteR_1.26.0 regioneR_1.32.0 DOSE_3.26.1
[19] TxDb.Hsapiens.UCSC.hg38.knownGene_3.17.0 GenomicFeatures_1.52.1 GenomicRanges_1.52.0
[22] GenomeInfoDb_1.36.1 AnnotationDbi_1.62.2 IRanges_2.34.1
[25] S4Vectors_0.38.1 Biobase_2.60.0 BiocGenerics_0.46.0
[28] Organism.dplyr_1.28.0 AnnotationFilter_1.24.0 dplyr_1.1.4
[31] biomaRt_2.56.1 BiocManager_1.30.22

loaded via a namespace (and not attached):
[1] rstudioapi_0.14 magrittr_2.0.3 rmarkdown_2.22 BiocIO_1.10.0
[5] zlibbioc_1.46.0 vctrs_0.6.5 memoise_2.0.1 Rsamtools_2.16.0
[9] RCurl_1.98-1.12 base64enc_0.1-3 htmltools_0.5.5 S4Arrays_1.0.4
[13] progress_1.2.3 curl_5.2.1 Formula_1.2-5 htmlwidgets_1.6.2
[17] plyr_1.8.9 lubridate_1.9.3 cachem_1.0.8 GenomicAlignments_1.36.0
[21] lifecycle_1.0.3 pkgconfig_2.0.3 Matrix_1.6-5 R6_2.5.1
[25] fastmap_1.1.1 snakecase_0.11.1 GenomeInfoDbData_1.2.10 MatrixGenerics_1.12.2
[29] digest_0.6.31 colorspace_2.1-0 pkgload_1.3.2 bezier_1.1.2
[33] Hmisc_5.1-0 RSQLite_2.3.1 org.Hs.eg.db_3.17.0 filelock_1.0.2
[37] timechange_0.3.0 fansi_1.0.4 httr_1.4.6 compiler_4.3.1
[41] withr_2.5.0 bit64_4.0.5 htmlTable_2.4.1 backports_1.4.1
[45] BiocParallel_1.34.2 DBI_1.2.2 rappdirs_0.3.3 DelayedArray_0.26.6
[49] rjson_0.2.21 HDO.db_0.99.1 foreign_0.8-84 zip_2.3.0
[53] nnet_7.3-19 glue_1.6.2 restfulr_0.0.15 GOSemSim_2.26.0
[57] grid_4.3.1 checkmate_2.3.1 cluster_2.1.4 reshape2_1.4.4
[61] fgsea_1.26.0 generics_0.1.3 gtable_0.3.4 BSgenome_1.68.0
[65] tidyr_1.3.1 ensembldb_2.24.0 hms_1.1.3 xml2_1.3.4
[69] utf8_1.2.3 XVector_0.40.0 pillar_1.9.0 splines_4.3.1
[73] BiocFileCache_2.8.0 lattice_0.22-6 bit_4.0.5 biovizBase_1.48.0
[77] tidyselect_1.2.1 GO.db_3.17.0 Biostrings_2.68.1 knitr_1.46
[81] gridExtra_2.3 ProtGenerics_1.32.0 SummarizedExperiment_1.30.2 xfun_0.43
[85] matrixStats_1.3.0 lazyeval_0.2.2 yaml_2.3.7 evaluate_0.21
[89] codetools_0.2-20 tibble_3.2.1 qvalue_2.32.0 cli_3.6.1
[93] rpart_4.1.23 munsell_0.5.1 dichromat_2.0-0.1 Rcpp_1.0.10
[97] dbplyr_2.5.0 png_0.1-8 XML_3.99-0.14 parallel_4.3.1
[101] blob_1.2.4 prettyunits_1.1.1 bitops_1.0-7 VariantAnnotation_1.46.0
[105] scales_1.3.0 openxlsx_4.2.5.2 purrr_1.0.1 crayon_1.5.2
[109] clisymbols_1.2.0 bamsignals_1.32.0 rlang_1.1.1 cowplot_1.1.1
[113] fastmatch_1.1-3 KEGGREST_1.40.0

support GRangesFilter() without being in a list

The following should work:

src = src_organism(dbpath=hg38light())
filter = GRangesFilter(GenomicRanges::GRanges("chr8:18391245-18401218"))
exons(src, filter)

but currently requires

exons(src, list(filter))
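
A possible way to support both call styles is to normalize the filter argument before it is used. The sketch below is an assumption about how that could look, not the package's implementation: as_filter_list() is a hypothetical helper, and it assumes that filters such as GRangesFilter inherit from the AnnotationFilter class, as they do in the AnnotationFilter package.

```r
# Hypothetical normalization step: accept a bare filter or a list of filters.
as_filter_list <- function(filter = NULL) {
    if (is.null(filter) || is.list(filter)) {
        # NULL or already a list: pass through unchanged.
        return(filter)
    }
    if (methods::is(filter, "AnnotationFilter")) {
        # A single filter object: wrap it so downstream code sees a list.
        return(list(filter))
    }
    stop("'filter' must be an AnnotationFilter or a list of AnnotationFilter objects")
}
```

With a step like this at the top of exons() and the related verbs, exons(src, filter) and exons(src, list(filter)) would be treated the same way.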

Error regarding new dbplyr (v1.3.0.9000) changes

The tests fail after recent changes to dbplyr. The test output is:

         ERROR
        Running the tests in ‘tests/testthat.R’ failed.
        Last 13 lines of output:
          [32] 95456 - 95461 == -5
          [33] 95456 - 95461 == -5
          [34] 95456 - 95461 == -5
          [35] 95456 - 95461 == -5
          [47] 95461 - 95456 ==  5
          ...
   
          ══ testthat results ═════════════════════════════════════════════════════════════════════════
          OK: 200 SKIPPED: 0 FAILED: 1
          1. Failure: select (@test-src_organism-select.R#44)
   
          Error: testthat unit tests failed
          In addition: Warning message:
          call dbDisconnect() when finished working with a connection
          Execution halted

table id_go_all has unappealing field names

It is unpleasant to have to modify select statements when switching from id_go (which has go, evidence, ...) to id_go_all (which has goall, evidenceall, etc.).

I'd propose using the same simple field names for both tables.
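
Until the tables share field names, a user-side workaround could be a small renaming step. This is only a sketch: go_all_as_go() is a hypothetical helper, and it maps just the two columns named in this issue (goall and evidenceall), leaving any other columns untouched.

```r
library(dplyr)

# Hypothetical helper: expose id_go_all columns under the names used by id_go.
# Only the columns mentioned in this issue are renamed.
go_all_as_go <- function(tbl) {
    rename(tbl, go = goall, evidence = evidenceall)
}
```

Given a src created by src_organism(), tbl(src, "id_go_all") %>% go_all_as_go() %>% select(go, evidence) could then be written the same way as the corresponding id_go query.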
