nextstrain / nextclade

Viral genome alignment, mutation calling, clade assignment, quality checks and phylogenetic placement

Home Page: https://clades.nextstrain.org

License: MIT License

JavaScript 2.03% TypeScript 45.01% Python 1.56% Dockerfile 1.03% Shell 4.03% SCSS 0.92% Makefile 0.10% TeX 0.59% Rust 43.65% Perl 0.27% MDX 0.41% Jupyter Notebook 0.38%

clade clades coronavirus covid covid-19 dna ncov neherlab nextstrain research

nextclade's Issues

Add ability to cancel current task

We want to allow users to stop the currently running task.

For example, there might be a button which would trigger this action. The implementation should terminate all tasks in both worker pools (parser and analysis) and return the app to the idle state.

The partial results should probably be discarded. However, we may also consider keeping them, especially if #30 is implemented. One nuance to watch out for is the export button: we may or may not allow exporting incomplete results.
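A minimal sketch of the cancel action, assuming each pool exposes its workers as objects with a `terminate()` method (browser `Worker` objects have one). The `RunContext` shape and the `cancelCurrentTask` name are hypothetical and not taken from the codebase; this version discards partial results.

```typescript
// Hypothetical shape: anything with a terminate() method, like a browser Worker.
interface Terminable {
  terminate(): void
}

type AppState = 'idle' | 'running'

// Hypothetical run context holding both worker pools and the results so far.
interface RunContext {
  parserPool: Terminable[]
  analysisPool: Terminable[]
  state: AppState
  partialResults: unknown[]
}

/** Terminate every worker in both pools, drop partial results, return to idle. */
function cancelCurrentTask(ctx: RunContext): RunContext {
  for (const worker of ctx.parserPool.concat(ctx.analysisPool)) {
    worker.terminate()
  }
  // Partial results are discarded here; keeping them (see #30) is the alternative.
  return { parserPool: [], analysisPool: [], state: 'idle', partialResults: [] }
}
```

A terminated `Worker` cannot be restarted, so the pools would have to be recreated before the next run.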

In this case we may also add a button to clear the current results, which would return the app to a pristine state.

This feature will be handy for users if the run takes too long and they are not willing to wait, or if they are running the wrong file by accident. It will be a cleaner solution than just refreshing the page.

QC: score-based output

We might want to extend the current QC algorithms to not only produce binary Yes/No output, but also provide more fine-grained results.

For example, for every rule we may produce a numeric score.

These scores can then be used to sort rows by severity of issues and to provide users with more fine-grained message text that is easier to act upon.

It is preferable for scores of different rules to be numerically comparable, for easier sorting and filtering. That means that more severe issues should have higher weights and produce a higher "badness" (or lower "goodness") score. We may allow users to tweak these weights for their needs, to adjust the desired QC strictness.
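A minimal sketch of weighted, comparable scores, assuming each rule produces a raw "badness" number (0 meaning no issue) and the user supplies per-rule weights. Rule names, the default weight of 1, and the function names are illustrative assumptions.

```typescript
// Hypothetical per-rule raw scores, keyed by rule name; 0 means no issue.
type RuleScores = Record<string, number>

interface SequenceQc {
  seqName: string
  scores: RuleScores
}

/** Weighted sum of rule scores; rules without an explicit weight default to 1. */
function totalBadness(scores: RuleScores, weights: RuleScores): number {
  let total = 0
  for (const rule of Object.keys(scores)) {
    total += scores[rule] * (weights[rule] ?? 1)
  }
  return total
}

/** Sort sequences from worst to best according to the weighted total. */
function sortBySeverity(rows: SequenceQc[], weights: RuleScores): SequenceQc[] {
  return [...rows].sort((a, b) => totalBadness(b.scores, weights) - totalBadness(a.scores, weights))
}
```

Raising a rule's weight makes its issues float to the top, which is one way user-tweakable strictness could work.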

Add user's guide

We want to explain to our users the basics of how our app operates and what they are seeing.

We want to cover particularly carefully the places where we use some conventions (color, notation, terminology, etc.), no matter how well-known they are.

Add landing page

Add a landing page which will contain basic information about the application as well as a file upload box to launch the algorithm. After the algorithm is triggered, the app should navigate to another page with the results.

Deduce genome length from root sequence length

Currently it's hardcoded in a constant.
However we want to support multiple viruses in the future, so this should be dynamic.
The best way is probably to take the length of the root (reference) sequence.

Prettify footer

Currently the Footer component is a complete disaster.

We need to sort out what things we want there.

Vercel logo+link must be present on the page (in footer or otherwise)

scrollable container for table #35

Add status panel showing current filtering and sorting criteria and number of sequences displayed out of total

When filtering and/or sorting is applied to the results table (i.e. not all data is shown, or it is shown in a different order), we want to give the user a visual reminder, for example in the form of a status panel or status bar with text showing exactly which filters and which sort are in use.

We may also display the number of rows shown and total number.

This is to prevent confusion, especially about hidden data rows.

Attach gene map to the bottom-most sequence

Currently the gene map is firmly attached to the bottom of the available table space. This creates a large gap between sequences and gene map when there aren't many sequences.

We want to attach the gene map right after the last sequence in the table, bringing it closer. This should facilitate tracking and comparison.

It might be trickier than usual, because we are using a custom flexbox layout and react-window to manage the table markup and style, especially the positioning.

There might be a decent JavaScript solution: the decision where to put the gene map is made depending on whether the sum of row heights is greater than the available height (calculated with the already present autosizer wrapper or otherwise). There might also be a CSS solution using position: sticky, or a mixture of both using CSS-in-JS (we use styled-components).
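A minimal sketch of the JavaScript option, assuming row heights and the available height (e.g. from the autosizer) are known in pixels. The function name is hypothetical; it returns the gene map's top offset measured from the top of the table area.

```typescript
/**
 * Hypothetical helper: if the rows are shorter than the available height,
 * the gene map follows the last row; otherwise it stays pinned to the
 * bottom of the available space.
 */
function geneMapTop(rowHeights: number[], availableHeight: number): number {
  const rowsHeight = rowHeights.reduce((sum, h) => sum + h, 0)
  return Math.min(rowsHeight, availableHeight)
}
```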

#98 Add marker showing current mouse position in sequences and gene map

Remember previous results

We want user to be able to access previous runs.

We may add a table on the results page that would list previous results and allow switching the current results table to one of the previous ones.

We might also save the results in local storage (watch out for the size).

Add "Rerun" button

Add a button on results page that triggers analysis of the current data all over again, exactly the way it happens after "uploading" a file on main page.

Detect and report input file parsing errors

Currently input files are not being fully verified during parsing.
If I upload a random text or binary file, the app treats it as a file containing 1 sequence. This sequence is then sent to the analysis, which, of course, fails immediately at the alignment stage. Even worse, empty files seem to drive the app into an infinite loop.

We might want to detect whether the data resembles sequencing data and tell the user early if it does not.

Despite the lack of data sanitization and validation, there is no obvious security issue here, because the entire processing happens on the client side and users themselves are responsible for feeding in incorrect files. However, the UX would be better if we could help users spot and report an incorrect "upload". For example, a user may accidentally feed in a compressed sequence file.
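A minimal sketch of early validation, assuming the checks are: non-empty input, no gzip magic bytes (`0x1f 0x8b`), a leading `>` header, and only IUPAC nucleotide characters (plus `-` and `*`) in sequence lines. These rules and the `checkFasta` name are assumptions about what "resembles sequencing data" should mean, not the app's actual parser behavior.

```typescript
type FastaCheck = { ok: true; sequences: number } | { ok: false; reason: string }

// IUPAC nucleotide codes, plus "-" for gaps and "*"; case-insensitive.
const NUCLEOTIDE_LINE = /^[ACGTURYKMSWBDHVN*-]+$/i

function checkFasta(bytes: Uint8Array): FastaCheck {
  if (bytes.length === 0) return { ok: false, reason: 'The file is empty' }
  if (bytes.length >= 2 && bytes[0] === 0x1f && bytes[1] === 0x8b) {
    return { ok: false, reason: 'The file looks gzip-compressed; please decompress it first' }
  }
  // Valid FASTA is plain ASCII, so a byte-to-char mapping is enough for checking.
  let text = ''
  for (let i = 0; i < bytes.length; i++) text += String.fromCharCode(bytes[i])

  const lines = text.split(/\r?\n/).map((l) => l.trim()).filter((l) => l.length > 0)
  if (lines.length === 0) return { ok: false, reason: 'The file contains only whitespace' }
  if (!lines[0].startsWith('>')) return { ok: false, reason: 'The first line is not a ">" header' }

  let sequences = 0
  for (const line of lines) {
    if (line.startsWith('>')) {
      sequences += 1
    } else if (!NUCLEOTIDE_LINE.test(line)) {
      return { ok: false, reason: `Unexpected characters in sequence data: "${line.slice(0, 20)}"` }
    }
  }
  return { ok: true, sequences }
}
```

Returning a reason string rather than throwing makes it straightforward to show the message in the UI before any work is sent to the workers.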

Add more data in clade marks tooltips: list all nucleotides and clades for this position

The format is as follows

Position 8072
'C': 20A, 20B, 20C
'T': 19A, 19B
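A minimal sketch that produces the tooltip text in the format above, assuming clade-defining markers are available as `{ clade, pos, nuc }` records. The `CladeMarker` shape and the function name are hypothetical.

```typescript
// Hypothetical marker shape; `pos` is the position as displayed to users.
interface CladeMarker {
  clade: string
  pos: number
  nuc: string
}

function formatPositionTooltip(pos: number, markers: CladeMarker[]): string {
  // Group clades by nucleotide; Map keeps first-seen order of nucleotides.
  const byNuc = new Map<string, string[]>()
  for (const m of markers) {
    if (m.pos !== pos) continue
    const clades = byNuc.get(m.nuc) ?? []
    clades.push(m.clade)
    byNuc.set(m.nuc, clades)
  }
  const lines = [`Position ${pos}`]
  byNuc.forEach((clades, nuc) => {
    lines.push(`'${nuc}': ${clades.join(', ')}`)
  })
  return lines.join('\n')
}
```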

🌍 Translation. We are looking for translators!

We are looking to extend and improve our translations.

We are using i18next and react-i18next modules in the app.
The extraction of English strings is performed with yarn i18n.
The extracted strings are in packages/web/src/i18n/resources

We use GitLocalize service to perform the translation:

Step-by-step guide:

Register with GitLocalize, for example using your GitHub account: https://gitlocalize.com
Go to our GitLocalize repo: https://gitlocalize.com/repo/4819
Click a language name from the table (other than English)
Navigate to the packages/web/src/i18n/resources/en/common.json file or use these direct links for some of the languages:

If your language is missing, please write a message here and we will add it

Translate the strings: look at the string in the leftmost column of the table and write the translated result into the rightmost column. Do not modify names inside double braces {{ }} - these are used in the program to substitute values, such as numbers and other strings. These names could give you a hint as to what will be substituted.
Note that there might be multiple pages in this table (the pagination is at the bottom of the page)
Note that there's a "Machine translation" button to get you started with some basics, but automatic translation most often requires manual correction and rephrasing
After you are done, click "Create review request"
If you have this capability, click "Create Pull Request" (otherwise write in this issue and one of the maintainers could grant you permissions)
GitLocalize will create the pull request with translated strings so that we can integrate them into the app in the next release
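Because values are substituted by the names inside `{{ }}`, a renamed placeholder silently breaks a string. A minimal sketch of a check that a translation keeps exactly the same placeholder names as the English source; this helper is hypothetical and is not part of i18next or GitLocalize.

```typescript
/** Collect placeholder names such as "count" from "{{count}}" or "{{ count }}", sorted. */
function placeholderNames(s: string): string[] {
  const names: string[] = []
  const re = /\{\{\s*([^}\s]+)\s*\}\}/g
  let m: RegExpExecArray | null
  while ((m = re.exec(s)) !== null) names.push(m[1])
  return names.sort()
}

/** True if both strings use the same multiset of placeholder names. */
function placeholdersMatch(english: string, translated: string): boolean {
  const a = placeholderNames(english)
  const b = placeholderNames(translated)
  return a.length === b.length && a.every((name, i) => name === b[i])
}
```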

Export of results

The current export returns a json structure which contains the results, along with a bunch of other things (like processing status, redundant clade definitions, etc). Most users will want to export a file they can view, share, and analyze in Excel or similar.

I think a tsv table with the following columns would address most needs:

sequenceName
clade
alignmentStart
alignmentEnd
numberOfMutations
numberOfNonACGTN
numberOfN
mutations: C242T,C14408T,A24304G,..
deletions: 567-572,23304-20423,...
insertions: 678:ACG,23565:GTGTCG,...
missing: 16121-16525,27343-27565,...
QC-Fail/Pass
QC-flags: 'too many mutations','SNP clusters'

Some of the fields are themselves arrays (like the mutations) and hence a table is not ideal. But if we provide json export as well, I think we address most needs.
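A minimal sketch of the proposed TSV export, assuming one row per sequence with the array-valued fields joined by commas. Field names follow the column list above; the `ExportRow` type, the `qcStatus` column name, and the tab/newline replacement are assumptions.

```typescript
// Hypothetical per-sequence export record mirroring the proposed columns.
interface ExportRow {
  sequenceName: string
  clade: string
  alignmentStart: number
  alignmentEnd: number
  numberOfNonACGTN: number
  numberOfN: number
  mutations: string[] // e.g. "C242T"
  deletions: string[] // e.g. "567-572"
  insertions: string[] // e.g. "678:ACG"
  missing: string[] // e.g. "16121-16525"
  qcPass: boolean
  qcFlags: string[]
}

const HEADER = [
  'sequenceName', 'clade', 'alignmentStart', 'alignmentEnd', 'numberOfMutations',
  'numberOfNonACGTN', 'numberOfN', 'mutations', 'deletions', 'insertions', 'missing',
  'qcStatus', 'qcFlags',
]

// Replace characters that would break the TSV layout.
function cell(value: string | number): string {
  return String(value).replace(/[\t\r\n]/g, ' ')
}

function toTsv(rows: ExportRow[]): string {
  const lines = [HEADER.join('\t')]
  for (const r of rows) {
    const values: (string | number)[] = [
      r.sequenceName, r.clade, r.alignmentStart, r.alignmentEnd, r.mutations.length,
      r.numberOfNonACGTN, r.numberOfN, r.mutations.join(','), r.deletions.join(','),
      r.insertions.join(','), r.missing.join(','), r.qcPass ? 'pass' : 'fail', r.qcFlags.join(','),
    ]
    lines.push(values.map((v) => cell(v)).join('\t'))
  }
  return lines.join('\n') + '\n'
}
```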

Use virtualization (windowing) when loading table rows

When there are many sequences in the results table, the app becomes very slow.

We have to limit the number of rows displayed and only render rows that are currently visible.

For example, this can be achieved with
https://github.com/bvaughn/react-window
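To illustrate the idea, here is a sketch of the windowing arithmetic for fixed-height rows: only rows intersecting the viewport, plus a small overscan, get rendered. This is a hand-written illustration under that assumption, not react-window's actual code; the function name and the overscan default are hypothetical.

```typescript
/** Inclusive [first, last] row indices to render; [0, -1] when there are no rows. */
function visibleRange(
  scrollTop: number,
  viewportHeight: number,
  rowHeight: number,
  rowCount: number,
  overscan = 2,
): [number, number] {
  if (rowCount === 0) return [0, -1]
  const first = Math.floor(scrollTop / rowHeight)
  const last = Math.ceil((scrollTop + viewportHeight) / rowHeight) - 1
  return [Math.max(0, first - overscan), Math.min(rowCount - 1, last + overscan)]
}
```

With a 100 px viewport and 30 px rows, only about 4 rows plus overscan are rendered, however many sequences are in the table.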

scrollable container #35
searchable and filtering #25

QC: new structure

We want to restructure our quality control (QC) system:

have a set of independent QC rules
each rule has a set of configuration parameters (e.g. thresholds, coefficients, etc.)
each rule outputs a quality score (#105) as well as optional rule-specific outputs
each rule has a specific way to be rendered in the UI and in the export files (using rule-specific outputs)
each sequence can be highlighted with color ranging from pale yellow to bright red, depending on total QC score
rules can be configured or disabled independently

Probably:

treat alignment quality and failures as a QC rule, with its own score (very high weight)
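A minimal sketch of the proposed structure: independent rules that capture their own configuration, can be disabled individually, and return a score plus rule-specific details for rendering and export. The two example rules, their parameters, and all names are illustrative assumptions, not the app's actual QC rules.

```typescript
// Hypothetical input fields; a real QC input would carry more data.
interface QcInput {
  missingCount: number
  mutationCount: number
}

interface QcRuleResult {
  score: number // 0 means "no issue"; larger means worse
  details?: Record<string, number> // rule-specific output for UI and export
}

interface QcRule {
  name: string
  enabled: boolean
  run(input: QcInput): QcRuleResult
}

// Each rule factory captures its own configuration parameters.
function missingDataRule(config: { threshold: number }): QcRule {
  return {
    name: 'missingData',
    enabled: true,
    run: (input) => ({ score: input.missingCount / config.threshold, details: { missingCount: input.missingCount } }),
  }
}

function excessMutationsRule(config: { expected: number; weight: number }): QcRule {
  return {
    name: 'excessMutations',
    enabled: true,
    run: (input) => {
      const excess = Math.max(0, input.mutationCount - config.expected)
      return { score: config.weight * excess, details: { excess } }
    },
  }
}

/** Run enabled rules independently and sum their scores into a total. */
function runQc(input: QcInput, rules: QcRule[]): { total: number; perRule: Record<string, QcRuleResult> } {
  const perRule: Record<string, QcRuleResult> = {}
  let total = 0
  for (const rule of rules) {
    if (!rule.enabled) continue
    const result = rule.run(input)
    perRule[rule.name] = result
    total += result.score
  }
  return { total, perRule }
}
```

An alignment-quality rule would fit the same interface, given a large weight.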

Cleanup tooltips

Each tooltip should only show information which pertains to the tooltip's target.

The tooltip for the sequence name column should gather all of these pieces of information together.

Color rows with QC issues with a "warning" color

We might want to emphasize the sequences with QC issues so that they are easier to find in the table.

Similarly to how we tint rows containing errors in "danger" red color, we may tint rows that contain QC issues in "warning" yellow or orange color.

We may vary tint intensity depending on number and severity of the issues. For example, rows with only slight quality degradation will be light-yellow, while rows with many problems will be in saturated orange.

This "gradient" coloring will compose well with the current coloring of errored rows.
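A minimal sketch of the gradient tint, assuming a total QC score is available per row. The endpoint colors and the score at which the tint saturates (`maxScore`) are assumptions to be tuned.

```typescript
type Rgb = [number, number, number]
const PALE_YELLOW: Rgb = [255, 249, 196] // assumed "slight issues" color
const SATURATED_ORANGE: Rgb = [255, 140, 0] // assumed "many problems" color

/** Linearly interpolate between the two colors; undefined means no tint. */
function qcTint(score: number, maxScore = 10): string | undefined {
  if (score <= 0) return undefined // no QC issues: keep the default row color
  const t = Math.min(score / maxScore, 1)
  const [r, g, b] = PALE_YELLOW.map((c, i) => Math.round(c + (SATURATED_ORANGE[i] - c) * t))
  return `rgb(${r}, ${g}, ${b})`
}
```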

Strip common prefixes in sequence names

We may save some screen real estate by removing common prefixes in sequence names (thus reducing the width required by the "Sequence name" column and increasing the width of sequence views).

Consider for example sequence names

hCoV-19/USA/FL-Miami-06_UMTL-A388/2020
hCoV-19/Madagascar/MA-Antananarivo-01_XXXX-A999/2020

We may strip hCoV-19/, as it does not carry much valuable information (we currently only handle this one virus), and the names would become:

USA/FL-Miami-06_UMTL-A388/2020
Madagascar/MA-Antananarivo-01_XXXX-A999/2020

Which is significantly shorter. This is also what Nextstrain does, so this would not be something unexpected.

We probably want to stay away from any automatic detection of prefixes, because there isn't a standard for how sequences are named. We may present users with some automatic suggestions though, leaving decision to them.

Prefixes should only be removed upon displaying, no actual data should be modified.
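A minimal sketch of display-only prefix stripping with a suggestion the user can accept or ignore. The suggestion heuristic (longest common prefix, cut back to the last `/`) and both function names are assumptions.

```typescript
/** Suggest a prefix shared by all names, cut back to the last "/" (or "" if none). */
function suggestCommonPrefix(names: string[]): string {
  if (names.length === 0) return ''
  let prefix = names[0]
  for (const name of names) {
    // Shrink until it is a prefix of this name; "" always is, so this terminates.
    while (!name.startsWith(prefix)) prefix = prefix.slice(0, -1)
  }
  const cut = prefix.lastIndexOf('/')
  return cut >= 0 ? prefix.slice(0, cut + 1) : ''
}

/** Display-only transformation; the underlying sequence name is never modified. */
function displayName(name: string, prefixes: string[]): string {
  for (const p of prefixes) {
    if (p && name.startsWith(p)) return name.slice(p.length)
  }
  return name
}
```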

Add global quality control indicator

After analysis is done we may display an indicator of the quality of sequences in the batch.

It could replace the progress bar and be just another progress bar with multiple stripes - green stripe will indicate number of passed sequences, and remaining red stripe will show number of failed sequences.

A yellow stripe can be added when we have QC metrics with warning severity.
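A minimal sketch of the stripe widths for such a multi-stripe bar, assuming counts of passed, warning, and failed sequences are available. The `QcCounts` shape and function name are hypothetical.

```typescript
interface QcCounts {
  passed: number
  warning: number
  failed: number
}

/** Convert counts into stripe widths in percent (all zero for an empty batch). */
function qcStripes({ passed, warning, failed }: QcCounts): QcCounts {
  const total = passed + warning + failed
  if (total === 0) return { passed: 0, warning: 0, failed: 0 }
  const pct = (n: number) => (100 * n) / total
  return { passed: pct(passed), warning: pct(warning), failed: pct(failed) }
}
```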

Add a README

Make table sortable and filterable

Allow for sorting and filtering table by sequence names, mutations, possibly other properties.

Possible implementation: react-table
https://github.com/tannerlinsley/react-table

Add marker showing current mouse position in sequences and gene map

Add vertical line to every sequence view and gene map that is positioned according to horizontal position of the mouse cursor.

This should allow for better tracking of which position in a sequence belongs to which gene in the gene map, especially if the sequence is far away on screen from the gene map.
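Since sequence views and the gene map share the same genome coordinate, the cursor's x position can be converted to a genome position once and then back to pixels for every view, even views of different widths. A minimal sketch assuming 0-based positions and a linear scale; both function names are hypothetical.

```typescript
/** Map a cursor x (pixels) within a view to a 0-based genome position. */
function pixelToPosition(x: number, viewWidth: number, genomeSize: number): number {
  const clamped = Math.min(Math.max(x, 0), viewWidth)
  return Math.min(Math.floor((clamped / viewWidth) * genomeSize), genomeSize - 1)
}

/** Map a genome position back to an x offset (pixels) in a view of any width. */
function positionToPixel(pos: number, viewWidth: number, genomeSize: number): number {
  return (pos * viewWidth) / genomeSize
}
```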

Attach gene map to the bottom-most sequence

Speedup rendering of sequence views

Currently the SVG bars in the rightmost column of the results table are extremely slow to render. There is a visible delay when newly analyzed sequences appear and when scrolling the table.

We want to try to make rendering more efficient. The solution might be as simple as wrapping some of the components and functions in React.memo()/useMemo().

Eslint and typescript issues are still reported even after being fixed

When working in dev mode, once eslint and typescript issues appear in the console, they persist across incremental builds even after they are fixed in the actual source code.

I expect errors to go away once they are no longer present.

I think they might be cached somewhere and cache is not flushed on build.

Variable column width

It would be nice to be able to change the width of particular columns. This would allow for example, to increase or decrease the size of the sequence name column, depending on a given naming convention and length of those names.

At first we may simply add it as an option in the settings dialog.
Ideally, we would make the left and right borders of cells draggable, similar to how it's implemented in Excel and other spreadsheets. This might be tricky due to table virtualization (react-window).

We may go even further and allow changing the width of the entire table. This would require some tweaking of the page container styling and thorough testing across device sizes.

Add filtering and sorting reset button(s)

When filtering and/or sorting are applied to the results table, we might give the user a shortcut to reset these to the default state (i.e. no filtering and sort by id column).

We might:

add a button with a "cross" icon to each text field, which will remove that particular filter
add global filter reset button that will reset all the filters
add sorting reset button that will sort the table by id (default sorting)
if status bar is implemented (#102) add global reset buttons to it as well

QC: algorithmic improvements

ingest the frequency of each mutation from Nextstrain, then use it to flag rare mutations
This introduces additional input to the QC rule that evaluates number of mutations
exclude mutations on the edges of sequences from the consideration by QC (similarly to how nextstrain does), or, perhaps, weight QC scores lower for these fragments
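A minimal sketch combining both ideas, assuming a frequency table keyed like `C242T` (e.g. derived from Nextstrain data), 0-based positions, and an exclusive alignment end. The threshold, edge margin, and all names are illustrative assumptions.

```typescript
interface Mutation {
  pos: number // 0-based position in the reference
  ref: string
  alt: string
}

/** Mutations that are rare per the frequency table, ignoring those near sequence edges. */
function rareMutations(
  muts: Mutation[],
  frequency: Map<string, number>,
  alignmentStart: number,
  alignmentEnd: number, // exclusive
  minFrequency = 0.001,
  edgeMargin = 30,
): Mutation[] {
  return muts.filter((m) => {
    const nearEdge = m.pos < alignmentStart + edgeMargin || m.pos >= alignmentEnd - edgeMargin
    if (nearEdge) return false
    const key = `${m.ref}${m.pos + 1}${m.alt}` // 1-based, e.g. "C242T"
    return (frequency.get(key) ?? 0) < minFrequency
  })
}
```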

Make example data more interesting

To attract a first-time user's attention, we might add some example data that allows us to showcase some of the interesting features of the application.

That would include, for example:

more sequences (roughly to cover table height on a typical 1080p screen), to add diversity and demonstrate analysis speed
several sequences with QC warnings
a few sequences with errors
sequences from different countries, with names that can be filtered in an interesting way
sequences with mutations and amino acid changes that can be filtered in an interesting way
sequences specifically targeted at demonstrating a particular feature

This does not have to be real-world data, can be any dummy sequences, with appropriately dummy names.

Care should be taken not to break any licenses. If a sequence is edited (e.g. to showcase gaps and the corresponding QC feature), then its name should be modified.

We may implement a little interactive walkthrough demo in the form of a series of popup boxes or alerts, encouraging the user to try some of the features. This demo would only be available when loading example data, with a possibility to skip and disable it (with a flag saved in local storage).

Format mutations text using common notation

For example: "C123A" and "C -> A"
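A minimal sketch of both notations, assuming positions are stored 0-based internally and displayed 1-based. The `NucMutation` shape and function names are hypothetical.

```typescript
interface NucMutation {
  refNuc: string
  pos: number // 0-based internally (assumption)
  queryNuc: string
}

// Compact notation: reference nucleotide, 1-based position, query nucleotide.
const formatCompact = (m: NucMutation): string => `${m.refNuc}${m.pos + 1}${m.queryNuc}`

// Arrow notation, without the position.
const formatArrow = (m: NucMutation): string => `${m.refNuc} -> ${m.queryNuc}`
```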

Prettify on mobile

We want to make a good first impression on our users, and since the app is advertised on Twitter, most of the first views will probably come from mobile user agents. It would be nice to have a decent-looking main page on mobile.

Making the results page look good on mobile would be a tough task due to the tabular layout. But we don't expect our users to have sequence data on their phones, nor are phones' computational resources appropriate for the task. So the results page will have to stay desktop-first.

Put results table in a vertically-scrollable container

Put results table in a fixed-height container such that the table is scrolled separately, without the entire page being scrolled.

Currently, when there are many sequences, there might be a lot of scrolling to do just to reach the "Back" button or any other controls at the top. The same goes for the Gene Map.

This will make all of the controls accessible at all times. In particular, the Gene Map will always be visible at the bottom (outside the scrollable container) and the user can scroll a particular sequence closer to it.

Problem: the page will not scroll, but we don't necessarily want to waste precious space showing the footer. We still have to show the Vercel logo somewhere, as it was part of the sponsorship deal (#41).

table virtualization/windowing #40
zoom feature #34
footer #41

Add contact/feedback form

We are excited to hear back from our users; however, we currently do not provide any means of contacting us in the app, so only people familiar with GitHub can do so.

I propose adding a chat button in the corner that would pop up a chat window where users could compose a message. This message would then be sent to the project's email box, where maintainers could review the messages and reply.

Many third-party services exist that make integrating this kind of button fairly simple.
The email box is to be set up first.

Setup all-contributors

https://github.com/all-contributors/all-contributors

Add possibility to upload data from results page

Currently the user has to go back to the main page in order to "upload" more data. We want to avoid this unnecessary navigation by allowing uploads right on the results page.

add button which opens dialog with the file uploader widget, same as on main page
make sure it plays well with future result persistence feature #30

#30
#121

Algorithm: find closest neighbor on nextstrain tree

This is an idea for the new measurement.

For every sequence we want to attempt to find closest neighbor on nextstrain tree. Perhaps, on a coarser version of it, to make it faster.

Additionally, we may infer where or when a sequence was common.

Find appropriate number of workers for the worker pool

Currently workers don't use all available resources.

First of all, the size of the worker pool is hard-limited to 4:
https://github.com/neherlab/webclades/blob/45b9930b819e3ffec60b0f01be53a844944a1e1c/packages/web/src/workers/createWorkerPools.ts#L8

Secondly, heavy rendering on main thread delays workers from getting work items, which causes unnecessary idling in worker processes.

The reason is that the task queues which feed the workers are in the memory of the main process, along with rendering. If the main process is busy (rendering SVGs in sequence views, for example), free workers cannot pop tasks from the queue.

Ultimately, there is currently no solution for that in JS, because there is no shared memory (where the queue would be stored and accessible to all threads/processes at all times, as in native environments).

The problem can be mitigated by oversubscribing the available resources. For example, you may create 8 workers, even if there are only 4 physical processors. This way workers will compete for resources more aggressively, and if 4 of the workers are done with their tasks and are unable to get new work items, there are still 4 others that haven't finished and they will keep all 4 CPUs busy. This basically abuses the OS scheduler, using it as an additional task queuing mechanism (where items are temporarily stored in preempted worker's memory until it is scheduled for execution again)

The disadvantage of course is that too much oversubscription may lead to too much context switching overhead. That is, workers will be scheduled and unscheduled periodically to run on cores, so that each can make progress. This context switching takes time and wastes CPU cycles. And there is no way for us to set any scheduling priorities.

So there is a balance we should find when choosing the number of workers. I think that the navigator.hardwareConcurrency + 2 is a safe bet. We may also expose this setting to users.
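A minimal sketch of the proposed `navigator.hardwareConcurrency + 2` heuristic, with a fallback when the value is not reported, an optional user override, and an upper cap. The fallback of 4 cores, the cap of 32, and the function name are assumptions.

```typescript
/** Number of workers: cores + 2 (oversubscription), a user override, capped. */
function chooseWorkerCount(hardwareConcurrency: number | undefined, userOverride?: number, cap = 32): number {
  if (userOverride !== undefined && userOverride >= 1) {
    return Math.min(Math.floor(userOverride), cap)
  }
  const cores = hardwareConcurrency && hardwareConcurrency > 0 ? hardwareConcurrency : 4
  return Math.min(cores + 2, cap)
}
```

In the browser it would be called as `chooseWorkerCount(navigator.hardwareConcurrency)`; the cap limits the context-switching overhead described above on machines that report many cores.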

Mozilla has had experimental work on SharedArrayBuffer going on for a few years:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/SharedArrayBuffer/Planned_changes
It does not seem production-ready and, AFAIK, there were some serious security concerns in the past. Neither Chrome nor our library (Threads.js) currently supports it, so we probably don't want to go there quite yet.

Implement zoom and horizontal scrolling for sequence views

I imagine it to be like Google Maps navigation but in 1 dimension.

We need to figure out what the best user experience here would be, because vertical scrolling is an important part of the table widget (and should take precedence).

We may for example enable zoom when holding Ctrl
UPDATE: Ctrl would not work, because Ctrl + Mouse wheel triggers browser zoom.

We all know Ctrl-to-zoom is much more annoying than modifier-less zoom, so alternatively (or additionally), we might have a button to toggle the Ctrl modifier requirement (enable/disable "zoom mode").

In any case, there should also be buttons for zoom in, zoom out, scroll left and scroll right, which move the viewport by a given number of pixels.

The implementation should be aware of screen width and mobile clients. OS differences should be taken into account (e.g. macOS uses the Cmd key where other systems use Ctrl).

This should not conflict with zoom or other features in browsers or commonly used browser addons.

A completely different approach could be to just borrow the system Nextstrain uses in its Entropy/Diversity section. However, there might be some unexpected challenges, because we may have a lot of rows and I expect d3.js to be extremely slow. A React reimplementation of this feature may be fine though.
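A minimal sketch of 1-D "map-style" navigation: zooming keeps the genome position under the cursor fixed on screen, and the scroll buttons pan by a given number of pixels. The viewport is `[start, end)` in genome coordinates; the minimum span and all names are assumptions.

```typescript
interface Viewport {
  start: number
  end: number
}

/** Zoom by `factor` (> 1 zooms in) around the genome position `anchor`. */
function zoomAt(vp: Viewport, anchor: number, factor: number, genomeSize: number, minSpan = 50): Viewport {
  const span = Math.min(genomeSize, Math.max(minSpan, (vp.end - vp.start) / factor))
  // Keep the anchor at the same fraction of the view width.
  const frac = (anchor - vp.start) / (vp.end - vp.start)
  const start = Math.min(Math.max(0, anchor - frac * span), genomeSize - span)
  return { start, end: start + span }
}

/** Pan by a number of screen pixels, clamped to the genome bounds. */
function panBy(vp: Viewport, dxPixels: number, viewWidth: number, genomeSize: number): Viewport {
  const span = vp.end - vp.start
  const delta = (dxPixels * span) / viewWidth
  const start = Math.min(Math.max(0, vp.start + delta), genomeSize - span)
  return { start, end: start + span }
}
```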

Related: scrollable container feature #35

Add favicon and common SEO tags

Make a releasable draft of the "about" section

I made a terrible mess there, @rneher could you please clean this up and prepare for the end-user's eyes:
https://github.com/neherlab/webclades/blob/feat/landing/packages/web/src/components/About/AboutContent.mdx

Link mutations to nextstrain.org

We want to add clickable links into mutation tooltips. These links would lead to the corresponding mutation view in nextstrain.org/ncov colored by the mutated nucleotide.

For example, the mutations at position 11083 would have a link that leads to this URL:

https://nextstrain.org/ncov/global?c=gt-nuc_11083
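A minimal sketch of building that link, following the URL pattern in the example above (`c=gt-nuc_<position>`). The `dataset` parameter and the function name are hypothetical; the position is assumed to already be the 1-based number shown to users.

```typescript
/** Link to the Nextstrain view colored by the nucleotide at a 1-based position. */
function nextstrainMutationUrl(position: number, dataset = 'ncov/global'): string {
  if (!Number.isInteger(position) || position < 1) {
    throw new RangeError(`Expected a 1-based genome position, got ${position}`)
  }
  return `https://nextstrain.org/${dataset}?c=${encodeURIComponent(`gt-nuc_${position}`)}`
}
```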

This is a great way to integrate the tool with nextstrain.org and will hopefully help to grow the ecosystem further.

This would require tooltips or any other informational widgets containing the link to persist, so that the link can be clicked #83 #84 #85

Implement sticky tooltips

As a part of the experiment with persistent informational widgets (#83), we would like to try triggering tooltips on click and/or focus (in addition to hover). When a tooltip is triggered this way, it will stay even after the mouse leaves the triggering element. It will be dismissed after the element loses focus (e.g. if the user clicks somewhere else).

We may consider keeping the sticky tooltip as it is, or we can trigger a different popover widget instead.

Display some of the PCR primers

We want to highlight some of the primers typically used in PCR.
This can also be further extended to highlighting mutations in these regions.

Add "scroll to top" button

Move data transformations from components into webworkers

We only want to compute the data once and preferably not on UI thread.

However, some of the transformations are currently performed inside components. These transformations need to be moved.

This may require changing of interfaces between the algorithm and UI code.

Implement persistent informational widgets

On results page, various elements (table entries, marks on sequence views, gene boxes on gene map) trigger tooltips/popovers containing important information. They appear on hovering mouse over the corresponding element.

This is somewhat intuitive and this is how elements on nextstrain/auspice also work, however there are several drawbacks of this UX for our application:

There is currently no way to copy text from the tooltip (it disappears if mouse is moved).
We'd like to put links there, but it would not be possible to click on them currently
Hovering over mutation marks, which are currently 4px-wide rectangles, is quite tricky, non-ergonomic for long-term use, and inaccessible for people with motor disabilities or even simply for people with low-resolution mice or trackpads.

We would like to experiment with a few approaches on making the informational widgets more persistent, ergonomic and accessible:

#85 an obvious improvement would be to implement tooltips that stay open if the user clicks on a triggering element
#84 Table row accordion - allow rows to expand
modal popups ?
tabs ?
sidebar ?
bottom bar ?

To expand on constraints of the resulting solution:

we want to reserve as much space as possible for sequence views (the long SVG bars in the rightmost column of the table). This is the central piece of the application. Solutions which reduce horizontal space mostly would not work, unless extremely creative.
we want to display the additional information in these new widgets and we want it to stay until dismissed, so that, for example, click (#86) and other interactions are possible
there could be hundreds of rows at a time
the layout of the results page will soon be modified, such that the table is scrollable, but the page is not #35

This issue involves creativity, trying multiple ideas and then keeping one, few, or all of them. We would love to hear from the community on how we could improve user experience further.

Implement expandable rows ("row accordion")

As a part of the experiment with persistent informational widgets (#83), we would like to try giving table rows the capability to expand.

That is, on trigger, each row would expand down, exposing underlying details section. This is similar to bootstrap's accordion widget: https://getbootstrap.com/docs/4.3/components/collapse/#accordion-example

Each row would have a place where user clicks to trigger the expansion: could add a dedicated button with a chevron or just trigger when clicking on sequence name in the table.

We can allow (or not) for multiple open rows at the same time.

Short sequences are flagged with a QC pass

Very short sequences (1 or even 2 nucleotides) with no ambiguous bases are flagged as a pass. We could implement a filter for sequence length.
How are the leading and trailing 5' and 3' UTRs handled? Should Ns in these regions constitute a QC fail?
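A minimal sketch of a length check that would stop very short sequences from passing, assuming the aligned span (start inclusive, end exclusive) is available. The `minAlignedLength` default is an illustrative threshold, not a value taken from the app.

```typescript
/** Fail sequences whose aligned span is shorter than a minimum length. */
function shortSequenceCheck(
  alignmentStart: number,
  alignmentEnd: number, // exclusive
  minAlignedLength = 25000,
): { pass: boolean; alignedLength: number } {
  const alignedLength = Math.max(0, alignmentEnd - alignmentStart)
  return { pass: alignedLength >= minAlignedLength, alignedLength }
}
```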

Add progress indicator

Add indication of progress for the status of the algorithm run - global, and if possible per sequence

Add help tips

We want to add brief descriptions to certain widgets and points of interest.

For starters we can add "?" buttons to column headers on results page. Clicking such a button will bring up a popup with explanations for this particular column.

We may include small schemas there, such as clade tree diagram and nucleotide coloring convention.
