Love the package. Saw an issue I want to discuss.
Okay, so I was using the Mus.musculus AnnotationDbi database package to make a genomic ranges object containing all of the promoters. Because I have access to your wonderful package through the AnnotationDbi framework, I can now attach all of that metadata to this object in one go instead of having to merge 3 different databases together via ENTREZID.
Here was the code I ran.
# Package setup
BiocManager::install("OrganismDbi")
library(OrganismDbi)
BiocManager::install("Mus.musculus")
library(Mus.musculus)
BiocManager::install("GenomicFeatures")
library(GenomicFeatures)
# Making this object just for comparison
Mm_gene <- transcriptsBy(Mus.musculus, by="gene", columns=c("SYMBOL", "ENTREZID", "TXCHROM", "TXSTART", "TXSTRAND", "CDSSTART"))
Mm_gene
# Here is the promoter object. You can see I'm calling 1500 bp upstream of the transcription start and 500 bp downstream of the transcription start site my "promoter region" for this analysis.
Mm_gene_promoters <- promoters(transcriptsBy(Mus.musculus, by="gene", columns=c("SYMBOL", "ENTREZID", "TXCHROM", "TXSTART", "TXSTRAND", "CDSSTART")), upstream = 1500, downstream = 500)
Mm_gene_promoters
Below are screenshots of the outputs.
Mm_gene
![image](https://user-images.githubusercontent.com/84940857/205757744-bed21df5-6e35-4cd2-8f64-44ffa64ab3ad.png)
Even though the transcripts are on the minus strand, the database reports the start of the transcript (TXSTART) as the first base pair of the genomic range.
Here you can see that the promoters() function from GenomicFeatures gets it right and assigns my promoter region as 1500 bases upstream and 500 downstream of the transcription start site. For Zglp1, which is on the minus strand, that means treating the last bp of the genomic range as the transcription start site, adding 1500 bp to it for the upstream edge, and subtracting 500 bp from it for the downstream edge to get the correct range.
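To make the arithmetic concrete, here is a minimal base R sketch of the strand-aware window as I understand promoters() to compute it. The helper name `promoter_window` and the coordinates are made up for illustration and are not part of any package.

```r
# Hypothetical helper (not from GenomicFeatures): mimics the strand-aware
# promoter arithmetic for a single transcript range per row.
promoter_window <- function(start, end, strand, upstream = 1500, downstream = 500) {
  minus <- strand == "-"
  # The TSS is the leftmost base on "+" and the rightmost base on "-"
  tss <- ifelse(minus, end, start)
  data.frame(
    strand         = strand,
    tss            = tss,
    # On "-", upstream is toward larger coordinates
    promoter_start = ifelse(minus, tss - downstream + 1, tss - upstream),
    promoter_end   = ifelse(minus, tss + upstream, tss + downstream - 1)
  )
}

# Made-up coordinates, one transcript per strand
promoter_window(
  start  = c(10000, 20000),
  end    = c(12000, 23000),
  strand = c("+", "-")
)
```

Both windows come out 2000 bp wide, and the minus-strand one is anchored on the end of the range rather than the start.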
I noticed this and was curious whether the TXSTART metadata coming from the Mus.musculus package is just being taken from the first base pair of the genomic range. It would be super simple to add an "if" check that grabs the last base pair of the range instead for transcripts on the minus strand. Otherwise this is going to confuse people who try to use this metadata without knowing where these numbers are coming from.
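For anyone who needs a strand-aware start in the meantime, something along these lines should work on the user side. This is only a sketch on a toy GRanges with made-up coordinates. It assumes GenomicRanges is installed, and it is not how the package itself builds TXSTART.

```r
library(GenomicRanges)

# Toy ranges with made-up coordinates, one per strand
tx <- GRanges(
  seqnames = c("chr1", "chr1"),
  ranges   = IRanges(start = c(1000, 5000), end = c(2000, 6000)),
  strand   = c("+", "-")
)

# The "if" check: leftmost base on "+", rightmost base on "-"
tss <- ifelse(as.character(strand(tx)) == "-", end(tx), start(tx))
tss

# resize() with fix = "start" is strand-aware, so it should agree
tss_via_resize <- start(resize(tx, width = 1, fix = "start"))
stopifnot(all(tss == tss_via_resize))
```

The same `ifelse()` could be applied to the unlisted transcripts from `transcriptsBy()`, since the strand and both range ends are already available on each transcript.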