cidgoh / virus-mvp

VirusMVP is an interactive heatmap-centric app that integrates viral genomic mutations, lineage information and curated functional impact to study the spread and evolution of viruses in Canada and globally.

Home Page: https://virusmvp.org/

License: MIT License

Python 92.91% JavaScript 6.77% Dockerfile 0.29% Shell 0.04%

bioinformatics covid-19 dash plotly sars-cov-2 visualization

virus-mvp's People

anwarmz cancogen-v

virus-mvp's Issues

Improved footer

Footer was made pretty quickly prior to first official release.

Make it look more professional, and include a link to the source code (GitHub) and a contact email.

Add an option to download Surveillance report

Add a button somewhere on the visualization for downloading the surveillance report.
The surveillance report will be generated during the genomics workflow when the user uploads an input file.

Tabular view at bottom of application

Good place for a table, the mutation details currently shown in a popup, and more.

Toggling switch on/off -> improper heatmap sizing

Toggling clade defining mutations switch on and off, and then resizing the window, resizes the heatmap cells. The bars should stay the same size at all times, with scrollable overflow.

Good place to start investigating would be the clade defining mutations switch callback. There is something different about how the heatmap is first rendered on application launch, outside the callback, because this bug does not happen without toggling the switch.

Change dp==0 to N=1

`test_data` files need to be updated

Missing sample size attr. May be more problems.

Add nested checkboxes to select lineages modal

e.g.,

Epsilon
- B.1.427
- B.1.429

You can select all Epsilon lineages by clicking the header, or individual ones.
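A minimal sketch of the selection logic such a nested list would need, assuming lineages are grouped under a variant name in a plain dict. The function names and the data layout are hypothetical and not taken from the app; wiring this to the modal's checkboxes is left out.

```python
# Hypothetical grouping of lineages under a variant header.
GROUPS = {
    "Epsilon": ["B.1.427", "B.1.429"],
}


def toggle_header(selected, groups, header):
    """Return a new selection after clicking a group header.

    If every lineage in the group is already selected, the click clears
    the group; otherwise it selects the whole group.
    """
    members = set(groups[header])
    if members <= selected:
        return selected - members
    return selected | members


def toggle_lineage(selected, lineage):
    """Return a new selection after clicking a single lineage."""
    return selected ^ {lineage}


def header_state(selected, groups, header):
    """Report whether a header should render as 'all', 'some', or 'none'."""
    members = set(groups[header])
    chosen = members & selected
    if chosen == members:
        return "all"
    return "some" if chosen else "none"
```

The "some" state is what a header checkbox would show as indeterminate when only part of its group is selected.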

Add note on adjusting browser zoom in help box

Distinguish VOC and VOI

Encode binary VOC and VOI attributes in heatmap

Ability to filter lineages by mutations

Some end users have expressed interest in being able to filter lineages by mutations.

Add single genome input option

Enhance input options by adding a single-genome VCF file option for high-throughput analysis.

Loading indicators

Good to have loading indicators for operations that may take a while

e.g., regenerating the heatmap cells figs when there are a lot of strains or running the pipeline

Simplify hover display

Should round off alt freqs and think of a better way to summarize mutations instead of listing as many as we can fit into the hover box
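One possible shape for this, as a sketch only: round each alternate frequency and list a fixed number of mutations, collapsing the rest into a count. The function name, the `(name, alt_freq)` input layout, and the default limits are assumptions for illustration. The `<br>` separator is how Plotly hover text breaks lines.

```python
def summarize_hover(mutations, max_listed=3, decimals=2):
    """Build hover text from (name, alt_freq) pairs.

    Frequencies are rounded to `decimals` places, and only the first
    `max_listed` mutations are listed; the rest are collapsed into a count.
    """
    lines = [
        f"{name}: {freq:.{decimals}f}"
        for name, freq in mutations[:max_listed]
    ]
    hidden = len(mutations) - max_listed
    if hidden > 0:
        lines.append(f"... and {hidden} more")
    return "<br>".join(lines)
```

The full list could then move to the tabular view or a click popup instead of the hover box.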

Limit number of visible rows in heatmap

Heatmaps are pretty good at visualizing a large number of rows and columns, but our cells are pretty big and resource intensive due to marker annotations, hover boxes, heterozygosity spacing considerations, etc.

We might need to limit the number of rows a user can see, while still allowing user to select from a pool of rows exceeding that number from the select lineages modal. Another consideration is whether we want to make the heatmap cells and y-axis scrollable, and then limit the number of rows visible based solely on performance considerations--not screen real estate. Scrolling is easier than having to select lineages.

Add data from VirusSeq data portal

Add GVF and surveillance files from the VirusSeq data portal for the main release

Display sample size

Encode sample sizes. One idea is to have a column to the right of the heatmap.

Display single genome inputs

Related #56

We will need to display these without colour since frequency is not a factor.

Things to consider for distinguishing single genomes from lineages:

All black or white filled squares
Different shaped markers?

Uploaded file not immediately displaying after selecting lineages

Change lineage order or hide lineages, upload file: rendering error

Refresh--rendered just fine

Improve documentation

Started in #44

Things to do:

Instructions for interacting with submodule
Instructions for windows users

Not changing strain order still reloads data

This is a regression after #118

Open select lineages modal. Do not change anything. Click "OK". Data reloads.

Expected behaviour: data does not reload
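A sketch of the kind of guard that would give the expected behaviour, assuming the modal state can be reduced to a strain order and a set of hidden strains. The function and parameter names are hypothetical. In a Dash callback, a `False` result could be followed by raising `dash.exceptions.PreventUpdate` to skip the reload.

```python
def needs_reload(old_order, new_order, old_hidden, new_hidden):
    """Return True only if the modal actually changed something.

    Order matters for the strain list, so it is compared as a sequence;
    the hidden strains are compared as a set.
    """
    order_changed = list(old_order) != list(new_order)
    hidden_changed = set(old_hidden) != set(new_hidden)
    return order_changed or hidden_changed
```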

Thicker heatmap cell borders

Some end users have expressed difficulty parsing the borders on their monitors

Encode silent vs non-silent mutations

Display a subset of lineages on start

After #66 and #75 are settled, we may want to display only a subset of lineages to begin with. With all lineages displayed, the heatmap cells callback takes about 1 second to run, which slows things down a bit.

Deploying on website without user upload option

We need a separate version of COVID-MVP (possibly on a website branch) that doesn't have the user upload option.
We will still need the surveillance report download option for the data we will be hosting (VirusSeq Data Portal).

More stuff in graph legend

Bold letters in x-axis == intergenic

Plus == insertion

X == deletion

Indicate cells with functional annotations

One idea: thicker borders

Process uploaded FASTA files

Keep help box open

Perhaps move it to the top of the page

Clicking cell shows heatmap loading indicator

Investigate the cause of this

Add CIDGOH logo and contact info to interface

Replace n/a in hover with more accurate term

N/A could imply we did not look for a function for a mutation or that we are missing data. We should use something more specific, that more strongly implies a function has not yet been discovered for a mutation.

Add non-GISAID data input option

Enhance data input options to accommodate non-GISAID (no metadata) sequences or .fasta sequences and list of ids as .txt file

Linked to #53

Change favicon and title to something more appropriate

Improve COVID-MVP performance

Getting slow now with ~700 lineages

Possibilities:

Greater parallelization of viz generation. Maybe one diagram per heatmap row?
Improve data parsing efficiency when only slight changes are made, or split up data parsing into parallel code for each lineage

"Jump to" fn to supplement scrolling

Scrolling to a position along the x-axis may get annoying as the number of mutations increases. Need a way to jump to a position directly.

One possible solution: search fn that allows you to jump to a specific nucleotide position.
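A sketch of the lookup such a search would need, assuming the x-axis columns are available as a sorted list of nucleotide positions. The function name is hypothetical; scrolling the heatmap to the returned column index is not shown.

```python
from bisect import bisect_left


def nearest_column(positions, target):
    """Return the index of the heatmap column closest to `target`.

    `positions` must be nucleotide positions sorted in ascending order.
    On a tie, the lower position wins.
    """
    if not positions:
        raise ValueError("no positions to search")
    i = bisect_left(positions, target)
    if i == 0:
        return 0
    if i == len(positions):
        return len(positions) - 1
    before, after = positions[i - 1], positions[i]
    return i if after - target < target - before else i - 1
```

Jumping to the nearest column rather than requiring an exact match means a search for a position without a mutation still lands somewhere useful.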

Annotate problematic sites

Display fns as normal but let users know sites are problematic

Started in #48

Heatmap gene bar, aa axis, and nt pos axis font size decreases when resizing resolution

Heatmap div not resizing after deleting samples

When a user deletes a user-uploaded sample, the outer div surrounding the heatmap does not resize as needed.

Split ORF1ab into ORF1a and ORF1b

try issue from slack

https://www.nature.com/articles/s41396-023-01368-2

Slack Message

Select/unselect all lineages button in select lineages modal

Smaller font size and indel markers

Further improvements to caching

Cache does not seem to be re-accessed on page reload--fix this
Some of the other intensive operations could be cached (e.g., heatmap generation)
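As an illustration of caching an intensive step, here is a sketch using the standard library's `functools.lru_cache` as a stand-in for whatever cache the app actually uses. The function names and the returned structure are hypothetical. Note that this cache lives in the server process's memory, so it survives a page reload only while that process keeps running.

```python
from functools import lru_cache

CALLS = {"count": 0}  # only here to show when the slow path runs


@lru_cache(maxsize=32)
def build_heatmap_data(strain_order, show_clade_defining):
    """Stand-in for an expensive step; arguments must be hashable."""
    CALLS["count"] += 1
    return {"rows": list(strain_order), "clade_defining": show_clade_defining}


def get_heatmap_data(strain_order, show_clade_defining=False):
    """Cached entry point that accepts a list of strains."""
    # Lists are not hashable, so normalize to a tuple for the cache key.
    return build_heatmap_data(tuple(strain_order), show_clade_defining)
```

Callers receive the same cached object on a hit, so it should be treated as read-only or copied before modification.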

Integrate Nextflow commands into visualization module

Currently, data_parser.py converts VCF to GVF, which is then parsed for visualization. Let's move towards integrating Nextflow commands now, which will require the following updates:

Allow the user to upload a .vcf, .fasta, or .tsv file.
Need a check to confirm the file type, which determines how this Nextflow command is structured:

nextflow run nf-ncov-voc/main.nf -profile < conda | singularity | docker > --prefix < unique_user_id > --mode user --input_type < vcf | tsv | fasta > --userfile < user_uploaded_file > --outdir < output directory where files will be generated >
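A sketch of how the file type check and command assembly might look, using only the flags shown in the command above. The function name is hypothetical, and checking the file extension alone is an assumption; a real check may also need to inspect the file contents.

```python
from pathlib import Path

# File extensions mapped to the --input_type values named above.
INPUT_TYPES = {".vcf": "vcf", ".tsv": "tsv", ".fasta": "fasta"}
PROFILES = {"conda", "singularity", "docker"}


def build_nextflow_cmd(userfile, user_id, outdir, profile="conda"):
    """Return the nextflow command as an argument list for one upload."""
    suffix = Path(userfile).suffix.lower()
    if suffix not in INPUT_TYPES:
        raise ValueError(f"unsupported file type: {suffix or 'none'}")
    if profile not in PROFILES:
        raise ValueError(f"unknown profile: {profile}")
    return [
        "nextflow", "run", "nf-ncov-voc/main.nf",
        "-profile", profile,
        "--prefix", user_id,
        "--mode", "user",
        "--input_type", INPUT_TYPES[suffix],
        "--userfile", str(userfile),
        "--outdir", str(outdir),
    ]
```

Returning an argument list rather than a single string lets it be passed to `subprocess.run` without shell quoting of user-supplied file names.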

Hide strain
Upload strain with same name
Expect: error; Result: success