selkamand / ggoncoplot
Easily Create Interactive Oncoplots
Home Page: https://selkamand.github.io/ggoncoplot/
License: Other
Should be powered by the rank package.
There's a commented section that indicates where the sample sorting code should go (right before the refactor of clinical & mutational dataframes). No need to use the inbuilt sorting functionality of the gg1d package.
When collapsing multiple mutations in the same gene down to 1 row: if all mutations have the same classification, do we classify the gene as multiple or just as that classification itself?
Or potentially add greater margins around the legend so that it's forced away from the edges of the drawing screen
To the right side of an oncoplot, we should optionally plot a barplot showing # of samples with gene mutated (fill colour based on mutation type)
I need to be able to unit test the data transformation code required to plot an oncoplot.
Currently the data transformation code is packaged in the same function as ggoncoplot. I should pull the data transformation out into a separate function, e.g. ggoncoplot_data_prep
- then I can unit test that separately from the visualisation code
Originally discovered with unpublished data, will need to find a public / simulated dataset to replicate
Currently mutations are dropped if col_mutation_type is NA. This is horribly incorrect behaviour. Add a unit test to detect it, then fix it.
Oncoplot gene rankings are inverted. Tests should have picked this up.
Fix the unit tests so they check the gene ranking order appropriately.
Current tests
# Check dataframe has required names
expect_named(prepped_df, expected = c('Sample', 'Gene', 'MutationType', 'Tooltip'), ignore.order = TRUE)
expect_named(prepped_df_no_mutation_type, expected = c('Sample', 'Gene', 'MutationType', 'Tooltip'), ignore.order = TRUE)
Should we add a test for the data_id column?
CRAN is the best place for this package, but currently the package size exceeds the 5 MB limit.
What's blowing out our size:
Both of these have nothing to do with the actual package functionality, so they should be very solvable.
2 is the easiest to solve - just move the MAF csv files / R dataframes that we're using for testing into their own GitHub R package with functions that stream the data. We can then install this package, and since the data is only used for testing and docs we can add it as a Suggests (not an Imports, a.k.a. required dependency).
1 is a little trickier. It's probably all the interactive plots in the vignette; storing these requires some space. Rendering static plots would save the space but really take away from the documentation. The best solution would be to keep the docs big but decouple them from the R package. Not sure of the best way to do this without causing too much pain long term. It's just so convenient to use vignettes and CI workflows. Will need more thought.
gbm_df |>
  ggoncoplot(
    col_genes = 'Hugo_Symbol',
    col_samples = 'Tumor_Sample_Barcode',
    #col_mutation_type = 'Variant_Classification',
    # topn = 10,
    # interactive = TRUE
  )
to
gbm_df |>
  ggoncoplot(
    col_genes = 'Hugo_Symbol',
    col_samples = 'Tumor_Sample_Barcode'
  )
This will power linkage of distinct interactive plots and is the most important feature to make this a useful feature for multiomics data visualisation / exploration
Should default to FALSE.
Reason for implementing:
It allows col_mutation_type to relate to anything, e.g. colour by pathways
Think about:
gg_tmb_height and gg_gene_width are currently ignored by combine_plots. Similarly, there is no gg_metadata_height option yet.
We need to
- add gg_metadata_height as a user-configurable parameter
- rework combine_plots to respect gg_tmb_height, gg_gene_width and gg_metadata_height
ggiraph supports running javascript on click events (without shiny)
See below for details
https://davidgohel.github.io/ggiraph/articles/offcran/using_ggiraph.html#using-onclick-1
One typically annoying thing about oncoplots is seeing interesting samples and having to copy out sample IDs. It would be more convenient to just click the id and automatically copy the sample name.
In javascript, you can write text to clipboard
navigator.clipboard.writeText('text')
Could fire this on an onclick event
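A minimal sketch of how the onclick strings could be generated in R, one per sample. The helper name make_copy_onclick is hypothetical (it is not part of ggoncoplot or ggiraph); the resulting character vector is the kind of value that could be mapped to ggiraph's onclick aesthetic. Note that navigator.clipboard is only available in secure contexts (HTTPS or localhost).

```r
# Hypothetical helper (not part of ggoncoplot or ggiraph): build one JavaScript
# snippet per sample that copies the sample ID to the clipboard when executed.
make_copy_onclick <- function(sample_ids) {
  # Escape backslashes first, then single quotes, so an ID cannot break
  # out of the single-quoted JavaScript string literal
  escaped <- gsub("\\", "\\\\", sample_ids, fixed = TRUE)
  escaped <- gsub("'", "\\'", escaped, fixed = TRUE)
  sprintf("navigator.clipboard.writeText('%s')", escaped)
}

# Illustrative sample IDs only
onclick_js <- make_copy_onclick(c("TCGA-02-0003", "TCGA-02-0033"))
print(onclick_js)
```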
Use of .data in tidyselect expressions was deprecated in tidyselect 1.2.0.
i Please use all_of(var) (or any_of(var)) instead of .data[[var]]
Now that we've created datasets for testing oncoplots that cover all the edge cases I can think of, we should recreate our unit tests using just these smaller datasets
title is wrong and code doesn't show both TMB and gene barplots
Building a general function to do this will be useful in segmenting testing logic but also re-usable for other plots
To get the grey tiles on unmutated squares we render a base tile layer of grey, then render colour over that.
This may lead to nontrivial increases in render time for large cohorts (untested).
Let's change this to only render grey on the tiles that won't have mutations present - basically this means we filter the data first
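The filtering step could be sketched in base R as follows. This is an illustration under assumed column names (Sample, Gene) on a toy table, not the package's actual implementation: it enumerates every sample x gene tile and keeps only those with no mutation, so the grey layer is drawn for those tiles alone.

```r
# Toy mutation table; column names are illustrative assumptions
mutations <- data.frame(
  Sample = c("S1", "S1", "S2"),
  Gene   = c("TP53", "EGFR", "TP53"),
  stringsAsFactors = FALSE
)

# Every sample x gene combination that would appear as a tile
all_tiles <- expand.grid(
  Sample = unique(mutations$Sample),
  Gene   = unique(mutations$Gene),
  stringsAsFactors = FALSE
)

# Build a composite key per tile and keep only tiles with no mutation;
# these are the only tiles that need a grey background
tile_key    <- paste(all_tiles$Sample, all_tiles$Gene, sep = "\t")
mutated_key <- paste(mutations$Sample, mutations$Gene, sep = "\t")
unmutated_tiles <- all_tiles[!(tile_key %in% mutated_key), , drop = FALSE]

print(unmutated_tiles)
```

The grey layer would then take unmutated_tiles as its data and the coloured layer would take mutations, so no tile is drawn twice.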
opts_selection(only_shiny = FALSE, type = "single", css = "stroke:yellow;")
Maybe we should support multiple selection. It would be useful if you could select a bunch of samples so that you could see where they fall on a matched RNAseq t-SNE, for example
gg1d is complex enough that it warrants ... argument support
[X] samples with metadata have no mutations. Fitering these out
ℹ To keep these samples, set metadata_require_mutations = FALSE. To view them in the oncoplot ensure you additionally set show_all_samples = TRUE
The message then lists all samples, even if there are hundreds.
If there are more than 10 samples missing, just print the number
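One way the truncation could look, as a rough sketch: the helper name describe_dropped_samples and its max_listed argument are hypothetical, and the message wording only approximates the package's own.

```r
# Hypothetical helper: describe samples dropped for having no mutations.
# Lists the IDs for small sets; above `max_listed` it reports only the count.
describe_dropped_samples <- function(sample_ids, max_listed = 10) {
  n <- length(sample_ids)
  if (n == 0) {
    return("No samples were filtered out")
  }
  if (n > max_listed) {
    return(sprintf("%d samples with metadata have no mutations. Filtering these out", n))
  }
  sprintf(
    "%d samples with metadata have no mutations. Filtering these out: %s",
    n, paste(sample_ids, collapse = ", ")
  )
}

# Small set: IDs are listed. Large set: only the count is shown.
message(describe_dropped_samples(c("S1", "S2")))
message(describe_dropped_samples(paste0("S", 1:250)))
```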
Problem:
To maximise the flexibility of ggoncoplot, we don't force the mutation types defined by col_mutation_type to align to any ontology. The end-user can use whatever mutation types they like. The problem is that this makes it difficult to automatically choose colours for these different mutation types in a manner that's consistent across different datasets.
Currently, we use an RColorBrewer palette and decide which colour is attached to each mutation type based on the frequency of the mutation types. To demonstrate why this is not ideal, let's go through an example. Say you produce an oncoplot for two different cohorts, one of which is dominated by missense mutations, the other by silent mutations. In one of these oncoplots missense mutations will be the same colour as silent mutations in the other. This would be extremely confusing.
Potential solutions
1. Force users to use some ontology for 'mutation_type'. Then we'll know all the possible mutation types in advance and can make a single manual palette that maps each value to a colour consistently no matter what data is input. The major downside is the lack of choice for the end user. It may also be a lot of work for end-users to convert their mutation_type ontology to whatever we enforce. What ontology should be enforced? Should we try to guess the mapping based on the names of mutation_type? We might be able to provide mappings from one ontology to another to help users streamline data preprocessing.
2. We force users to define a mapping of mutation_types to colours. We make sure they have accounted for every value in their dataset. We could help with this by providing users with a basic example palette they should supply to ggoncoplot. ggoncoplot would error unless the user supplied this mapping.
3. Both: force an ontology UNLESS the user supplies a palette mapping all mutation_types to colours. Best of both worlds.
Each potential solution has its benefits and drawbacks. 1 is more work for the end-user but will make it easier to integrate ggoncoplot in shiny apps and pipelines. 2 is easier and more flexible for the end-user, and allows domain-specific mutation_types to be used (e.g. there'd be the option to colour mutations based on germline/somatic origins in cancer data visualisation). 3 is more work for me and adds some complexity to the usage, BUT with some careful info/warning messages sent to cli we could probably make this quite intuitive for end-users.
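The check that a user-supplied palette covers every mutation type could be sketched like this. The function name check_palette_covers, the example palette and the colour values are all hypothetical and chosen only for illustration; nothing here is ggoncoplot's actual API.

```r
# Hypothetical validator: error unless every observed mutation type has a colour
check_palette_covers <- function(mutation_types, palette) {
  observed <- unique(mutation_types[!is.na(mutation_types)])
  missing_types <- setdiff(observed, names(palette))
  if (length(missing_types) > 0) {
    stop(
      "Palette is missing colours for: ",
      paste(missing_types, collapse = ", "),
      call. = FALSE
    )
  }
  # Return only the colours that are needed, keeping the user's mapping
  palette[observed]
}

# Illustrative named palette: mutation type -> colour
palette <- c(Missense = "#33A02C", Silent = "#B2DF8A", Nonsense = "#E31A1C")

# Two toy cohorts dominated by different mutation types
cohort_a <- c("Missense", "Missense", "Silent")
cohort_b <- c("Silent", "Silent", "Missense", NA)

colours_a <- check_palette_covers(cohort_a, palette)
colours_b <- check_palette_covers(cohort_b, palette)
```

Because colours are looked up by name rather than assigned by frequency, Missense gets the same colour in both cohorts regardless of which mutation type dominates.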
Plan of attack
As we add sample annotations, each needing its own legend (for non-interactive plots at least), it might be better to collect all legends on the right side of the plot:
example below
Original Source:
How I found it
https://www.biostars.org/p/9473274/
This solution is clear, easy to implement and solves the core problem