dzhang32 / ggtranscript
Visualizing transcript structure and annotation using ggplot2
Home Page: https://dzhang32.github.io/ggtranscript/
License: Other
I realize this involves some complex design/implementation changes and it is a feature more suitable for a full genome browser which is not the goal of this package.
For cases where many smaller non-overlapping transfrags are generated during transcript assembly, it would be very helpful to be able to pack some of those on the same horizontal line/slot, thus reducing the overall height of the plot needed to show a rather large number of transcripts whenever such partial transcript fragments are present.
Of course this involves taking the transcript labels off the y axis and placing them next to each transcript (above or below the left-most exon, or centered?). This is clearly a major change in ggtranscript's current design and I do not expect it to be implemented, or even given attention, at the moment.
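To make the packing idea concrete, here is a minimal base R sketch of a greedy lane assignment. pack_lanes() and its gap argument are hypothetical names used for illustration only; nothing like this is confirmed to exist in ggtranscript. It assumes one start and one end coordinate per transcript (its left-most and right-most exon boundaries).

```r
# Greedy interval packing: each transcript goes into the first lane whose
# current right-most coordinate (plus an optional gap) lies left of its start.
# NOTE: pack_lanes() is a hypothetical helper, not a ggtranscript function.
pack_lanes <- function(tx_start, tx_end, gap = 0) {
  stopifnot(length(tx_start) == length(tx_end))
  ord <- order(tx_start, tx_end)   # visit transcripts from left to right
  lane <- integer(length(tx_start))
  lane_end <- numeric(0)           # right-most coordinate used in each lane
  for (i in ord) {
    free <- which(lane_end + gap < tx_start[i])
    if (length(free) == 0) {
      # no existing lane has room, so open a new one
      lane_end <- c(lane_end, tx_end[i])
      lane[i] <- length(lane_end)
    } else {
      lane[i] <- free[1]
      lane_end[free[1]] <- tx_end[i]
    }
  }
  lane
}

# Made-up transfrag extents, for illustration only
transfrags <- data.frame(
  transcript_name = c("tf1", "tf2", "tf3", "tf4"),
  start = c(100, 150, 400, 700),
  end   = c(300, 500, 600, 900)
)
transfrags$lane <- pack_lanes(transfrags$start, transfrags$end)
# lane: 1 2 1 1
```

The resulting lane column could then stand in for the transcript name on the y axis, with the names drawn as text next to each transcript; how well that fits the existing geoms is untested here.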
Currently, if users wanted to plot CDS (differentiating UTRs) they would be unable to use shorten_gaps().
Hi,
Is there an easy way to reverse the orientation of transcripts on the minus strand, so that they also read from left to right?
Kind regards,
Tabea
Given just how much information can be gleaned from these plots, it would be incredibly useful if plots could be interactive (to allow for zooming, moving along a transcript structure, etc.). plotly enables some ggplot2 geoms to be made interactive via ggplotly(); however, as ggtranscript introduces new geoms, these are not implemented.
Hi, this is not an issue, just a report about a change in ggplot2 affecting your package.
Best, Nicco
# devtools::install_github("dzhang32/ggtranscript")
library(dplyr)
library(ggplot2)
library(ggtranscript)
Running the first example
# extract exons
sod1_exons <- sod1_annotation |> filter(type == "exon")
sod1_exons |>
  ggplot(aes(xstart = start, xend = end, y = transcript_name)) +
  geom_range(aes(fill = transcript_biotype)) +
  geom_intron(
    data = to_intron(sod1_exons, "transcript_name"),
    aes(strand = strand)
  )
Prints this warning from ggplot2
Warning message:
Using the `size` aesthetic in this geom was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` in the `default_aes` field and elsewhere instead.
This warning is displayed once every 8 hours.
Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
Session info:
packageVersion('ggplot2')
[1] ‘3.4.1’
packageVersion('ggtranscript')
[1] ‘0.99.9’
R.version.string
[1] "R version 4.1.2 (2021-11-01)"
When plotting the SOD1 transcripts, as in the package tutorial (https://dzhang32.github.io/ggtranscript/articles/ggtranscript.html):
sod1_exons %>%
  ggplot(aes(xstart = start, xend = end, y = transcript_name)) +
  geom_range(aes(fill = transcript_biotype)) +
  geom_intron(data = to_intron(sod1_exons, "transcript_name"), aes(strand = strand))
I am getting the following warning:
"Using the `size` aesthetic in this geom was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` in the `default_aes` field and elsewhere instead."
The size aesthetic is present in both the geom_range() and geom_intron() functions.
Thank you for this excellent work - I have been looking for something like this for a while as I often needed to explore transcript assemblies within a gene or genomic region while comparing them with reference annotation etc.
I think it would be really useful for the community (and increase the popularity/adoption of the package) to provide a step-by-step workflow example (perhaps adding a few high-level convenience functions) for a use case like this:
as.data.frame(rtracklayer::import("user.gtf"))
A genomic range would of course be required from the start (could be also used to subset the transcripts from the user provided file), with some common-sense checks (if not already implemented) to limit the genomic region width, the maximum number of transcripts etc.
Hi,
Firstly, thanks for this R package; it's already really useful to me, and it's quite user friendly :)
However, I'm having an issue with the geom_junction_label_repel 'geom' and getting different results from the examples in the README. I was expecting the lines coming from the labels to connect to the junction lines, but instead, they connect to the gene models. I have tried this both with the SOD gene example in the GitHub instructions and with my own data.
Below is the code from the README that I copied into R to produce the plot.
sod1_201_exons %>%
  ggplot(aes(
    xstart = start,
    xend = end,
    y = transcript_name
  )) +
  geom_range(
    fill = "white",
    height = 0.25
  ) +
  geom_range(
    data = sod1_201_cds
  ) +
  geom_intron(
    data = to_intron(sod1_201_exons, "transcript_name")
  ) +
  geom_junction(
    data = sod1_junctions,
    junction.y.max = 0.5
  ) +
  geom_junction_label_repel(
    data = sod1_junctions,
    aes(label = round(mean_count, 2)),
    junction.y.max = 0.5
  )
My session info is below:
> sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.5.1
Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] forcats_0.5.2 stringr_1.4.1 purrr_0.3.5 readr_2.1.3 tidyr_1.2.1 tibble_3.1.8 tidyverse_1.3.2 patchwork_1.1.2.9000
[9] ggsci_2.9 ggplot2_3.4.0 rtracklayer_1.54.0 GenomicRanges_1.46.1 GenomeInfoDb_1.30.1 IRanges_2.28.0 S4Vectors_0.32.4 BiocGenerics_0.40.0
[17] dplyr_1.0.10 ggtranscript_0.99.9
loaded via a namespace (and not attached):
[1] bitops_1.0-7 matrixStats_0.62.0 fs_1.5.2 lubridate_1.9.0 bit64_4.0.5 httr_1.4.4
[7] tools_4.1.2 backports_1.4.1 utf8_1.2.2 R6_2.5.1 DBI_1.1.3 colorspace_2.0-3
[13] withr_2.5.0 tidyselect_1.2.0 bit_4.0.4 compiler_4.1.2 textshaping_0.3.6 cli_3.4.1
[19] rvest_1.0.3 Biobase_2.54.0 xml2_1.3.3 DelayedArray_0.20.0 labeling_0.4.2 scales_1.2.1
[25] systemfonts_1.0.4 Rsamtools_2.10.0 XVector_0.34.0 pkgconfig_2.0.3 MatrixGenerics_1.6.0 dbplyr_2.2.1
[31] rlang_1.0.6 readxl_1.4.1 rstudioapi_0.14 BiocIO_1.4.0 generics_0.1.3 farver_2.1.1
[37] jsonlite_1.8.3 BiocParallel_1.28.3 vroom_1.6.0 googlesheets4_1.0.1 RCurl_1.98-1.9 magrittr_2.0.3
[43] GenomeInfoDbData_1.2.7 Matrix_1.5-1 Rcpp_1.0.9 munsell_0.5.0 fansi_1.0.3 lifecycle_1.0.3
[49] stringi_1.7.8 yaml_2.3.6 SummarizedExperiment_1.24.0 zlibbioc_1.40.0 grid_4.1.2 parallel_4.1.2
[55] ggrepel_0.9.2 crayon_1.5.2 lattice_0.20-45 Biostrings_2.62.0 haven_2.5.1 hms_1.1.2
[61] knitr_1.40 pillar_1.8.1 rjson_0.2.21 reprex_2.0.2 XML_3.99-0.12 glue_1.6.2
[67] modelr_0.1.9 vctrs_0.5.0 tzdb_0.3.0 cellranger_1.1.0 gtable_0.3.1 assertthat_0.2.1
[73] xfun_0.34 broom_1.0.1 restfulr_0.0.15 ragg_1.2.4 googledrive_2.0.0 gargle_1.2.1
[79] GenomicAlignments_1.30.0 timechange_0.1.1 ellipsis_0.3.2
Could you please advise what to do?
Thanks!
Is there a way to add a read coverage track above (or below) the transcripts?
Either from a bigwig or bam file?
For transcripts with many exons it would be useful to have the option to display the exon order numbers inside the exon (or above/below when the exon height is variable or too small?).
Perhaps a dedicated boolean option to just enable/disable the automatic drawing of exon order numbers for each transcript, with another option for its placement?
A more generic solution would be mapping such exon labels to some GTF exon attribute, like cov or exon_number as found in StringTie output. Maybe a label option can be added to geom_range() or its aesthetics. However, in many cases the exon_number attribute is missing, so a helper function could be added to generate it automatically in that case.
As for labeling junctions, I suppose a labeling option could be added to geom_junction() to enable showing the numeric coverage values (supporting reads) for each junction, above the junction curve for top curves, or below for bottom ones.
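For the case where the exon_number attribute is missing, the helper suggested above might look roughly like the base R sketch below. add_exon_number() is a hypothetical name and not part of ggtranscript. It assumes the columns start, end and strand plus a grouping column, and it numbers exons in transcription order, so the order is reversed for transcripts on the minus strand.

```r
# Number exons within each transcript in transcription order.
# NOTE: add_exon_number() is a hypothetical helper, not a ggtranscript function.
add_exon_number <- function(exons, group_var = "transcript_name") {
  stopifnot(all(c("start", "end", "strand", group_var) %in% names(exons)))
  exons$exon_number <- NA_integer_
  # row indices of each transcript
  for (idx in split(seq_len(nrow(exons)), exons[[group_var]])) {
    rnk <- rank(exons$start[idx], ties.method = "first")
    # on the minus strand the first exon is the right-most one
    if (all(exons$strand[idx] == "-")) {
      rnk <- length(idx) + 1L - rnk
    }
    exons$exon_number[idx] <- as.integer(rnk)
  }
  exons
}

# Made-up exons, for illustration only
exons <- data.frame(
  transcript_name = c("tx_plus", "tx_plus", "tx_minus", "tx_minus", "tx_minus"),
  start  = c(500, 100, 100, 900, 500),
  end    = c(600, 200, 200, 1000, 600),
  strand = c("+", "+", "-", "-", "-")
)
numbered <- add_exon_number(exons)
# exon_number: 2 1 3 1 2
```

The new column could then be drawn with an ordinary text layer placed at the exon midpoint; whether a label aesthetic on geom_range() would be a better home for it is a design question for the package.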
Hi,
I like the package a lot but I can't figure out how to use facet_wrap to show transcript expression at different timepoints. I can get one plot to work easily but when I try to wrap them the arrows work correctly only for the first leftmost plot.
prop_plot_introns <- ggtranscript::to_intron(prop_plot, "feature_id")
p <- prop_plot %>%
  ggplot2::ggplot(ggplot2::aes(
    xstart = start,
    xend = end,
    y = feature_id
  )) +
  ggplot2::facet_grid(cols = vars(DAI)) +
  ggtranscript::geom_range(
    ggplot2::aes(fill = Proportion)
  ) +
  ggtranscript::geom_intron(
    data = prop_plot_introns,
    ggplot2::aes(strand = strand)
  ) +
  th +
  ggplot2::scale_fill_gradient2(
    high = "darkorchid",
    low = "white",
    mid = "darkorange",
    midpoint = 0.5
    # midpoint = min(prop_plot$Proportion) + ((max(prop_plot$Proportion) - min(prop_plot$Proportion)) / 2)
  ) +
  ggplot2::guides(fill = ggplot2::guide_legend("Proportion"))
print(p)
where prop_plot is:
gene_id feature_id chr start end strand DAI TPM Proportion
AT5G20240 AT5G20240.1 Chr5 6828904 6829457 + 0 109.705848163636 0.201793165193519
AT5G20240 AT5G20240.1 Chr5 6828904 6829457 + 4 519.364511065168 0.164112890096364
AT5G20240 AT5G20240.1 Chr5 6828904 6829457 + 8 515.17501570544 0.12987240173574
AT5G20240 AT5G20240.1 Chr5 6830455 6830516 + 0 109.705848163636 0.201793165193519
AT5G20240 AT5G20240.1 Chr5 6830455 6830516 + 4 519.364511065168 0.164112890096364
AT5G20240 AT5G20240.1 Chr5 6830455 6830516 + 8 515.17501570544 0.12987240173574
AT5G20240 AT5G20240.1 Chr5 6830637 6830736 + 0 109.705848163636 0.201793165193519
AT5G20240 AT5G20240.1 Chr5 6830637 6830736 + 4 519.364511065168 0.164112890096364
AT5G20240 AT5G20240.1 Chr5 6830637 6830736 + 8 515.17501570544 0.12987240173574
AT5G20240 AT5G20240.1 Chr5 6830809 6830838 + 0 109.705848163636 0.201793165193519
AT5G20240 AT5G20240.1 Chr5 6830809 6830838 + 4 519.364511065168 0.164112890096364
AT5G20240 AT5G20240.1 Chr5 6830809 6830838 + 8 515.17501570544 0.12987240173574
AT5G20240 AT5G20240.1 Chr5 6830927 6830971 + 0 109.705848163636 0.201793165193519
AT5G20240 AT5G20240.1 Chr5 6830927 6830971 + 4 519.364511065168 0.164112890096364
AT5G20240 AT5G20240.1 Chr5 6830927 6830971 + 8 515.17501570544 0.12987240173574
AT5G20240 AT5G20240.1 Chr5 6831074 6831515 + 0 109.705848163636 0.201793165193519
AT5G20240 AT5G20240.1 Chr5 6831074 6831515 + 4 519.364511065168 0.164112890096364
AT5G20240 AT5G20240.1 Chr5 6831074 6831515 + 8 515.17501570544 0.12987240173574
AT5G20240 AT5G20240.2 Chr5 6828987 6829457 + 0 433.949077207243 0.798206834806481
AT5G20240 AT5G20240.2 Chr5 6828987 6829457 + 4 2645.31384393917 0.835887109903636
AT5G20240 AT5G20240.2 Chr5 6828987 6829457 + 8 3451.60321292623 0.87012759826426
AT5G20240 AT5G20240.2 Chr5 6830455 6830516 + 0 433.949077207243 0.798206834806481
AT5G20240 AT5G20240.2 Chr5 6830455 6830516 + 4 2645.31384393917 0.835887109903636
AT5G20240 AT5G20240.2 Chr5 6830455 6830516 + 8 3451.60321292623 0.87012759826426
AT5G20240 AT5G20240.2 Chr5 6830637 6830838 + 0 433.949077207243 0.798206834806481
AT5G20240 AT5G20240.2 Chr5 6830637 6830838 + 4 2645.31384393917 0.835887109903636
AT5G20240 AT5G20240.2 Chr5 6830637 6830838 + 8 3451.60321292623 0.87012759826426
AT5G20240 AT5G20240.2 Chr5 6830927 6830971 + 0 433.949077207243 0.798206834806481
AT5G20240 AT5G20240.2 Chr5 6830927 6830971 + 4 2645.31384393917 0.835887109903636
AT5G20240 AT5G20240.2 Chr5 6830927 6830971 + 8 3451.60321292623 0.87012759826426
AT5G20240 AT5G20240.2 Chr5 6831074 6831515 + 0 433.949077207243 0.798206834806481
AT5G20240 AT5G20240.2 Chr5 6831074 6831515 + 4 2645.31384393917 0.835887109903636
AT5G20240 AT5G20240.2 Chr5 6831074 6831515 + 8 3451.60321292623 0.87012759826426
Do you have any idea what goes wrong here? I suspect facet_wrap doesn't play well with the fact that geom_intron is passed its own dataset.
Thank you,
Andrea
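One possible reading of the problem, not confirmed anywhere in this thread, is that prop_plot holds one copy of every exon per DAI value, so deriving introns per feature_id alone mixes the copies together. The base R sketch below derives introns separately for each combination of grouping columns, so that every facet gets its own set. introns_by_group() is a hypothetical helper and is not ggtranscript's to_intron(); the toy data are made up.

```r
# Derive introns as the gaps between consecutive exons, separately for each
# combination of grouping columns (e.g. transcript and facet variable).
# NOTE: introns_by_group() is a hypothetical helper, not a ggtranscript function.
introns_by_group <- function(exons, group_vars) {
  groups <- split(exons, exons[group_vars], drop = TRUE)
  introns <- lapply(groups, function(g) {
    g <- g[order(g$start, g$end), ]
    n <- nrow(g)
    if (n < 2) return(NULL)          # a single exon has no introns
    out <- g[-1, , drop = FALSE]     # keep the other columns (strand, DAI, ...)
    out$start <- g$end[-n]           # intron starts where the previous exon ends
    out$end <- g$start[-1]           # and ends where the next exon starts
    out
  })
  introns <- do.call(rbind, introns)
  if (!is.null(introns)) rownames(introns) <- NULL
  introns
}

# Toy data: the same three exons repeated for two time points
toy_exons <- data.frame(
  feature_id = "tx1",
  start  = rep(c(100, 300, 500), times = 2),
  end    = rep(c(200, 400, 600), times = 2),
  strand = "+",
  DAI    = rep(c(0, 4), each = 3)
)
toy_introns <- introns_by_group(toy_exons, c("feature_id", "DAI"))
```

Here toy_introns has two introns per DAI value, each with start below end, and keeps the DAI column that the facets are built from.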
The to_diff() function is great for showing differences between exons, but sometimes it is very useful to highlight differences between introns and individual splice sites.
I wonder whether it would make more sense to have a dedicated to_jdiff() helper function for this, since I can imagine it being a bit more complex: we could be looking at highlighting per-splice-site differences, i.e. alternate donor vs. alternate acceptor vs. both (wholly novel introns).
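As a rough illustration of the per-splice-site comparison described above, the base R sketch below classifies each intron of a query transcript against a reference set. classify_introns() and its category labels are hypothetical and are not part of ggtranscript. It assumes introns are given as start/end coordinates on a single strand, with the donor at the intron start on the plus strand and at the intron end on the minus strand.

```r
# Classify query introns by which splice sites are shared with the reference.
# NOTE: classify_introns() and its labels are hypothetical, not ggtranscript API.
classify_introns <- function(query, reference, strand = "+") {
  stopifnot(strand %in% c("+", "-"))
  start_known <- query$start %in% reference$start
  end_known <- query$end %in% reference$end
  exact <- paste(query$start, query$end) %in%
    paste(reference$start, reference$end)
  # the donor is the 5' end of the intron in transcription direction
  donor_known <- if (strand == "+") start_known else end_known
  acceptor_known <- if (strand == "+") end_known else start_known
  out <- rep("novel_intron", nrow(query))             # neither site is known
  out[donor_known & !acceptor_known] <- "alt_acceptor"
  out[!donor_known & acceptor_known] <- "alt_donor"
  out[donor_known & acceptor_known] <- "novel_combination"
  out[exact] <- "shared"                              # identical intron
  out
}

# Made-up coordinates, for illustration only
reference <- data.frame(start = c(200, 600), end = c(400, 800))
query <- data.frame(
  start = c(200, 200, 250, 200, 250),
  end   = c(400, 450, 400, 800, 450)
)
classes_plus <- classify_introns(query, reference, strand = "+")
# "shared" "alt_acceptor" "alt_donor" "novel_combination" "novel_intron"
```

The returned labels could be mapped to a colour aesthetic on the intron or junction layer, much as to_diff() output is used for exons.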
Is there a way to create a plot with shortened gaps that still shows exon vs CDS regions?