nextstrain / nextclade
Viral genome alignment, mutation calling, clade assignment, quality checks and phylogenetic placement
Home Page: https://clades.nextstrain.org
License: MIT License
We want to allow users to stop the currently running task.
For example, there might be a button that triggers this action. The implementation should terminate all tasks in the worker pools (both parser and analysis) and return the app to the idle state.
The partial results should probably be discarded. However, we may also consider keeping them, especially if #30 is implemented. One nuance to watch out for is the export button: we may or may not allow exporting incomplete results.
In that case, we may also add a button to clear the current results, which would return the app to its pristine state.
This feature will be handy for users if a run takes too long and they are not willing to wait, or if they have started it on the wrong file by accident. It is a cleaner solution than a page refresh.
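A minimal sketch of the cancellation logic, assuming a run is a loop over work items that checks a cancellation flag before starting each item. CancelToken, runBatch and RunOutcome are hypothetical names, and the sketch discards partial results on cancellation (one of the options discussed here). Terminating the actual parser and analysis worker pools is not modeled.

```typescript
// Hypothetical cancellation flag shared between the "Stop" button and the run loop.
class CancelToken {
  private cancelled = false;

  cancel(): void {
    this.cancelled = true;
  }

  get isCancelled(): boolean {
    return this.cancelled;
  }
}

type RunState = "idle" | "running" | "done";

interface RunOutcome<R> {
  state: RunState; // "idle" after cancellation, "done" after a complete run
  results: R[]; // empty after cancellation: partial results are discarded
}

function runBatch<T, R>(
  items: T[],
  work: (item: T, token: CancelToken) => R,
  token: CancelToken,
): RunOutcome<R> {
  const results: R[] = [];
  for (const item of items) {
    // Do not start new work once the user has pressed "Stop"
    if (token.isCancelled) break;
    results.push(work(item, token));
  }
  return token.isCancelled ? { state: "idle", results: [] } : { state: "done", results };
}
```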
We might want to extend the current QC algorithms to produce not only a binary Yes/No output, but also more fine-grained results.
For example, for every rule we may produce a numeric score.
These scores can then be used to sort rows by the severity of issues and to provide the user with more fine-grained message text, which makes it easier to act on.
It is preferable for the scores of different rules to be numerically comparable, for easier sorting and filtering. That means that more severe issues should have higher weights and produce a higher "badness" (or lower "goodness") score. We may allow users to tweak these weights for their needs, to adjust the desired QC strictness.
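One way to make scores comparable is a weighted sum of normalized per-rule "badness" values. This is a sketch under assumptions: each rule is assumed to report a severity in [0, 1], the rule names and weights are hypothetical, and rules without a user-specified weight default to 1.

```typescript
// Hypothetical per-rule output: a normalized "badness" in [0, 1].
interface RuleResult {
  rule: string;
  severity: number;
}

type Weights = Record<string, number>;

// Weighted "badness" score: higher means more severe issues.
function qcScore(results: RuleResult[], weights: Weights): number {
  let total = 0;
  for (const r of results) {
    const w = weights[r.rule] ?? 1; // default weight when the user has not tweaked it
    total += w * Math.min(Math.max(r.severity, 0), 1); // clamp severity to [0, 1]
  }
  return total;
}

interface Row {
  name: string;
  qc: RuleResult[];
}

// Sort rows by descending badness so that the most problematic sequences come first.
function sortBySeverity(rows: Row[], weights: Weights): Row[] {
  return [...rows].sort((a, b) => qcScore(b.qc, weights) - qcScore(a.qc, weights));
}
```

Raising a rule's weight makes QC stricter for that rule, which is how a user-facing strictness setting could be expressed.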
We want to explain to our users the basics of how our app operates and what they are seeing.
We want to be particularly careful to cover the places where we use conventions (color, notation, terminology, etc.), no matter how well-known they are.
Add a landing page that contains basic information about the application as well as a file upload box to launch the algorithm. After the algorithm is triggered, the app should navigate to another page with the results.
Currently the genome length is hardcoded in a constant.
However, we want to support multiple viruses in the future, so it should be determined dynamically.
The best way is probably to take the length of the root (reference) sequence.
Currently the Footer component is a complete disaster.
We need to sort out what things we want there.
The Vercel logo and link must be present on the page (in the footer or elsewhere).
When filtering and/or sorting is applied to the results table (i.e. not all data is shown, or it is shown in a different order), we want to give the user a visual reminder, for example a status panel or status bar with text showing exactly which filters and which sort order are in use.
We may also display the number of rows shown and the total number of rows.
This is to prevent confusion, especially about hidden data rows.
Currently the gene map is firmly attached to the bottom of the available table space. This creates a large gap between the sequences and the gene map when there aren't many sequences.
We want to attach the gene map right after the last sequence in the table, bringing it closer. This should facilitate tracking and comparison.
It might be trickier than usual, because we are using a custom flexbox layout and react-window to manage the table markup and style, especially the positioning.
There might be a decent JavaScript solution: decide where to put the gene map depending on whether the sum of the row heights is greater than the available height (calculated with the already present autosizer wrapper or otherwise). There might also be a CSS solution using position: sticky, or a mixture of both using CSS-in-JS (we use styled-components).
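The JavaScript approach boils down to one comparison. A minimal sketch, assuming the row heights and the available height are already known in pixels; geneMapOffset is a hypothetical helper that returns where the gene map's top edge should go, relative to the top of the table body.

```typescript
// Returns the vertical offset (px) of the gene map relative to the top of the table body.
function geneMapOffset(rowHeights: number[], availableHeight: number, geneMapHeight: number): number {
  const rowsHeight = rowHeights.reduce((sum, h) => sum + h, 0);
  // Height the table body may occupy before the gene map has to stick to the bottom
  const maxBodyHeight = Math.max(availableHeight - geneMapHeight, 0);
  // Few rows: the gene map is attached right after the last row.
  // Many rows: the body scrolls and the gene map stays at the bottom.
  return Math.min(rowsHeight, maxBodyHeight);
}
```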
We want users to be able to access previous runs.
We may add a table on the results page that lists previous results and allows switching the current results table to one of the previous ones.
We might also save the results in local storage (watch out for the size).
Add a button on the results page that triggers the analysis of the current data all over again, exactly as it happens after "uploading" a file on the main page.
Currently input files are not being fully verified during parsing.
If I upload a random text or binary file, the app treats it as a file containing one sequence. This sequence is then sent to the analysis, which, of course, fails immediately at the alignment stage. Even worse, empty files seem to drive the app into an infinite loop.
We might want to detect whether the data resembles sequencing data and tell the user early if it does not.
Despite the lack of data sanitization and validation, there is no obvious security issue here, because all of the processing happens on the client side and users themselves are responsible for feeding in incorrect files. However, the UX would be better if we could help users spot and report an incorrect "upload". For example, a user may accidentally feed in a compressed sequence file.
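A sketch of a cheap pre-check that could run before parsing. It rejects empty files, gzip and zip archives (recognized by their magic bytes 0x1f 0x8b and "PK\x03\x04"), binary data, and text that does not look like FASTA. The alphabet (IUPAC nucleotide codes plus "-") and the requirement that the first non-empty line is a ">" header are assumptions; checkInput and its messages are hypothetical.

```typescript
type Verdict = { ok: true; sequences: number } | { ok: false; reason: string };

// Assumed alphabet: IUPAC nucleotide codes plus the gap character
const SEQUENCE_LINE = /^[ACGTURYSWKMBDHVN-]+$/i;

function checkInput(bytes: Uint8Array): Verdict {
  if (bytes.length === 0) return { ok: false, reason: "empty file" };

  // gzip data starts with 0x1f 0x8b; zip archives start with "PK\x03\x04"
  if (bytes[0] === 0x1f && bytes[1] === 0x8b) return { ok: false, reason: "gzip-compressed file" };
  if (bytes[0] === 0x50 && bytes[1] === 0x4b && bytes[2] === 0x03 && bytes[3] === 0x04) {
    return { ok: false, reason: "zip archive" };
  }

  const chars: string[] = [];
  for (let i = 0; i < bytes.length; i++) {
    const b = bytes[i];
    // FASTA is plain ASCII text: only printable characters, tab, LF and CR are allowed
    if (b > 0x7e || (b < 0x20 && b !== 0x09 && b !== 0x0a && b !== 0x0d)) {
      return { ok: false, reason: "binary data" };
    }
    chars.push(String.fromCharCode(b));
  }

  const lines = chars
    .join("")
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  if (lines.length === 0) return { ok: false, reason: "no data" };
  if (!lines[0].startsWith(">")) return { ok: false, reason: "missing FASTA header" };

  let sequences = 0;
  for (const line of lines) {
    if (line.startsWith(">")) sequences++;
    else if (!SEQUENCE_LINE.test(line)) return { ok: false, reason: "unexpected characters in sequence" };
  }
  return { ok: true, sequences };
}
```

Because empty and whitespace-only files are rejected up front, they would never reach the parser.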
The format is as follows:
Position 8072
'C': 20A, 20B, 20C
'T': 19A, 19B
We are looking to extend and improve our translations.
We are using the i18next and react-i18next modules in the app.
English strings are extracted with yarn i18n.
The extracted strings are in packages/web/src/i18n/resources.
We use the GitLocalize service to perform the translation:
Step-by-step guide:
packages/web/src/i18n/resources/en/common.json file, or use these direct links for some of the languages:
If your language is missing, please write a message here and we will add it.
{{ }} - these are used in the program to substitute values, such as numbers and other strings. The names inside them may give you a hint as to what will be substituted.
The current export returns a JSON structure that contains the results along with a number of other things (like processing status, redundant clade definitions, etc.). Most users will want to export a file they can view, share, and analyze in Excel or similar.
I think a TSV table with the following columns would address most needs:
sequenceName
clade
alignmentStart
alignmentEnd
numberOfMutations
numberOfNonACGTN
numberOfN
mutations: C242T,C14408T,A24304G,..
deletions: 567-572,23304-20423,...
insertions: 678:ACG,23565:GTGTCG,...
missing: 16121-16525,27343-27565,...
QC-Fail/Pass
QC-flags: 'too many mutations','SNP clusters'
Some of the fields are themselves arrays (like the mutations), so a table is not ideal for them. But if we provide a JSON export as well, I think we address most needs.
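A sketch of how one result could be flattened into the proposed columns, with array-valued fields joined by commas inside a single cell, following the example values in the column list. The AnalysisResult shape is hypothetical (the real result types may differ), "numberOfMutations" is assumed to count substitutions, and a sequence is assumed to pass QC when it has no flags.

```typescript
// Hypothetical result shape; the real Nextclade result types may differ.
interface Substitution { refNuc: string; pos: number; queryNuc: string }
interface NucRange { begin: number; end: number } // 1-based, inclusive
interface Insertion { pos: number; ins: string }

interface AnalysisResult {
  seqName: string;
  clade: string;
  alignmentStart: number;
  alignmentEnd: number;
  substitutions: Substitution[];
  deletions: NucRange[];
  insertions: Insertion[];
  missing: NucRange[];
  totalNonACGTNs: number;
  totalMissing: number;
  qcFlags: string[]; // empty when QC passes (assumption)
}

const COLUMNS = [
  "sequenceName", "clade", "alignmentStart", "alignmentEnd", "numberOfMutations",
  "numberOfNonACGTN", "numberOfN", "mutations", "deletions", "insertions", "missing",
  "QC-Fail/Pass", "QC-flags",
];

const fmtRange = (r: NucRange): string => (r.begin === r.end ? `${r.begin}` : `${r.begin}-${r.end}`);

function toTsvRow(r: AnalysisResult): string {
  const cells = [
    r.seqName,
    r.clade,
    String(r.alignmentStart),
    String(r.alignmentEnd),
    String(r.substitutions.length),
    String(r.totalNonACGTNs),
    String(r.totalMissing),
    // Array-valued fields are flattened into comma-separated lists inside one cell
    r.substitutions.map((s) => `${s.refNuc}${s.pos}${s.queryNuc}`).join(","),
    r.deletions.map(fmtRange).join(","),
    r.insertions.map((i) => `${i.pos}:${i.ins}`).join(","),
    r.missing.map(fmtRange).join(","),
    r.qcFlags.length === 0 ? "Pass" : "Fail",
    r.qcFlags.join(","),
  ];
  // A tab or newline inside a cell (e.g. in a sequence name) would break the table
  return cells.map((c) => c.replace(/[\t\r\n]/g, " ")).join("\t");
}

function toTsv(results: AnalysisResult[]): string {
  return [COLUMNS.join("\t"), ...results.map(toTsvRow)].join("\n") + "\n";
}
```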
When there are many sequences in the results table, the app becomes very slow.
We have to limit the number of rows displayed and only render the rows that are currently visible.
For example, this can be achieved with react-window:
https://github.com/bvaughn/react-window
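The core of the windowing idea behind libraries like react-window, written out by hand for rows of fixed height (this is not react-window's API). Only rows start..end (inclusive) would be rendered, each positioned at index * rowHeight inside a spacer of height rowCount * rowHeight; a few "overscan" rows on each side reduce flicker while scrolling. The function name and the default overscan of 2 are assumptions.

```typescript
// Rows of a fixed-row-height list that intersect the viewport, plus `overscan` rows on each side.
// An empty range (end < start) means there is nothing to render.
function visibleRange(
  scrollTop: number,
  viewportHeight: number,
  rowHeight: number,
  rowCount: number,
  overscan = 2,
): { start: number; end: number } {
  const first = Math.floor(scrollTop / rowHeight);
  const last = Math.ceil((scrollTop + viewportHeight) / rowHeight) - 1;
  return {
    start: Math.max(0, first - overscan),
    end: Math.min(rowCount - 1, last + overscan), // inclusive
  };
}
```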
We want to restructure our quality control (QC) system:
Probably:
Each tooltip should only show information which pertains to the tooltip's target.
The tooltip for the sequence name column should gather all these pieces together.
We might want to emphasize the sequences with QC issues so that they are easier to find in the table.
Similarly to how we tint rows containing errors in a "danger" red color, we may tint rows that contain QC issues in a "warning" yellow or orange color.
We may vary the tint intensity depending on the number and severity of the issues. For example, rows with only slight quality degradation will be light yellow, while rows with many problems will be saturated orange.
This "gradient" coloring will compose well with the current coloring of errored rows.
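The "gradient" can be a linear interpolation between two colors, driven by the QC badness score. A sketch assuming a score that grows with the number and severity of issues and a maxScore at which the tint saturates; the two RGB values are hypothetical placeholders, not the app's palette.

```typescript
type Rgb = [number, number, number];

// Hypothetical palette: light yellow for slight issues, saturated orange for many.
const LIGHT_YELLOW: Rgb = [255, 250, 200];
const SATURATED_ORANGE: Rgb = [255, 140, 0];

// Channel-wise linear interpolation between two colors, t in [0, 1].
function mix(a: Rgb, b: Rgb, t: number): Rgb {
  return [0, 1, 2].map((i) => Math.round(a[i] + (b[i] - a[i]) * t)) as Rgb;
}

// Maps a QC badness score to a CSS color; rows without issues get no tint.
function qcTint(score: number, maxScore: number): string | undefined {
  if (score <= 0) return undefined;
  const t = Math.min(score / maxScore, 1);
  const [r, g, b] = mix(LIGHT_YELLOW, SATURATED_ORANGE, t);
  return `rgb(${r}, ${g}, ${b})`;
}
```

Errored rows would keep their red tint, so this function would only be consulted for rows without errors.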
We may save some screen real estate by removing common prefixes from sequence names (thus reducing the width required by the "Sequence name" column and increasing the width of the sequence views).
Consider, for example, the sequence names
hCoV-19/USA/FL-Miami-06_UMTL-A388/2020
hCoV-19/Madagascar/MA-Antananarivo-01_XXXX-A999/2020
We may strip hCoV-19/, as it does not carry much valuable information (we only ever handle this virus currently), and the names would become:
USA/FL-Miami-06_UMTL-A388/2020
Madagascar/MA-Antananarivo-01_XXXX-A999/2020
This is significantly shorter. It is also what Nextstrain does, so it would not be unexpected.
We probably want to stay away from any automatic detection of prefixes, because there is no standard for how sequences are named. We may present users with some automatic suggestions, though, leaving the decision to them.
Prefixes should only be removed upon displaying, no actual data should be modified.
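A sketch of both halves: a display-only transform that applies prefixes the user has chosen, and a suggestion helper that is never applied automatically. suggestPrefix is a hypothetical heuristic: it takes the longest common prefix of all names and cuts it back to the last "/" so that no name is cut mid-segment.

```typescript
// Strips the first matching user-selected prefix for display only; stored names are untouched.
function displayName(name: string, prefixes: string[]): string {
  for (const p of prefixes) {
    if (p.length > 0 && name.startsWith(p)) return name.slice(p.length);
  }
  return name;
}

// Suggests (but does not apply) a prefix: the longest common prefix of all names,
// truncated after its last "/".
function suggestPrefix(names: string[]): string {
  if (names.length === 0) return "";
  let prefix = names[0];
  for (const n of names) {
    while (!n.startsWith(prefix)) prefix = prefix.slice(0, -1);
  }
  const cut = prefix.lastIndexOf("/");
  return cut >= 0 ? prefix.slice(0, cut + 1) : "";
}
```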
After the analysis is done, we may display an indicator of the quality of the sequences in the batch.
It could replace the progress bar and be just another progress bar with multiple stripes: a green stripe will indicate the number of passed sequences, and the remaining red stripe will show the number of failed sequences.
A yellow stripe can be added when we have QC metrics with warning severity.
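The stripe widths are simple proportions of the batch. A sketch assuming each sequence falls into exactly one of the pass, warn and fail buckets; stripeWidths is a hypothetical helper returning percentages suitable for CSS widths.

```typescript
interface QcCounts {
  pass: number;
  warn: number;
  fail: number;
}

// Percent widths of the green, yellow and red stripes of the summary bar.
function stripeWidths({ pass, warn, fail }: QcCounts): QcCounts {
  const total = pass + warn + fail;
  if (total === 0) return { pass: 0, warn: 0, fail: 0 };
  const pct = (n: number) => (100 * n) / total;
  return { pass: pct(pass), warn: pct(warn), fail: pct(fail) };
}
```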
Allow sorting and filtering the table by sequence names, mutations, and possibly other properties.
Possible implementation: react-table
https://github.com/tannerlinsley/react-table
Add a vertical line to every sequence view and to the gene map, positioned according to the horizontal position of the mouse cursor.
This should allow for better tracking of which positions in a sequence belong to which gene in the gene map, especially if the sequence is far away on screen from the gene map.
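To draw the line consistently, the cursor's x offset can be converted to a genome position once and then mapped back to x in every view. A sketch assuming each view maps the whole genome [1, genomeLength] linearly onto its own width (1-based positions); both helper names are hypothetical.

```typescript
// 1-based nucleotide position under the cursor's x offset within a view.
function positionAtX(x: number, viewWidth: number, genomeLength: number): number {
  const pos = Math.floor((x / viewWidth) * genomeLength) + 1;
  return Math.min(Math.max(pos, 1), genomeLength);
}

// x offset of the left edge of a position in a view of the given width,
// so the same vertical line can be drawn in every sequence view and in the gene map.
function xAtPosition(pos: number, viewWidth: number, genomeLength: number): number {
  return ((pos - 1) / genomeLength) * viewWidth;
}
```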
Currently the SVG bars in the rightmost column of the results table are extremely slow to render. There is a visible delay when newly analyzed sequences appear and when scrolling the table.
We want to try to make rendering more efficient. The solution might be as simple as wrapping some of the components and functions in React.memo()/useMemo().
When working in dev mode, once ESLint and TypeScript issues appear in the console, they persist across incremental builds even after being fixed in the actual source code.
I expect errors to go away once they are no longer present.
I think they might be cached somewhere, and the cache is not flushed on build.
It would be nice to be able to change the width of particular columns. This would allow, for example, increasing or decreasing the size of the sequence name column, depending on a given naming convention and the length of those names.
react-window).
We may go even further and allow changing the width of the entire table. This would require some tweaking of the page container styling and thorough testing across device sizes.
When filtering and/or sorting is applied to the results table, we might give the user a shortcut to reset these to the default state (i.e. no filtering, sorted by the id column).
We might:
ingest the frequency of each mutation on nextstrain, then use it to flag rare mutations
This introduces an additional input to the QC rule that evaluates the number of mutations
exclude mutations at the edges of sequences from consideration by QC (similarly to how nextstrain does), or perhaps weight QC scores lower for these fragments
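Both edge options can be sketched with one predicate. This assumes edges are measured from the aligned region (alignmentStart/alignmentEnd, 1-based) and that the margin and the reduced edge weight are tunable parameters; the helper names and values are hypothetical, not Nextstrain's actual masking rules.

```typescript
interface Mutation {
  pos: number; // 1-based position
}

// True if a mutation lies within `margin` nt of either end of the aligned region.
function isEdge(pos: number, start: number, end: number, margin: number): boolean {
  return pos < start + margin || pos > end - margin;
}

// Option 1: drop edge mutations before the mutation-count QC rule sees them.
function mutationsForQc(muts: Mutation[], start: number, end: number, margin: number): Mutation[] {
  return muts.filter((m) => !isEdge(m.pos, start, end, margin));
}

// Option 2: keep edge mutations but count them with a reduced weight.
function weightedMutationCount(
  muts: Mutation[],
  start: number,
  end: number,
  margin: number,
  edgeWeight: number,
): number {
  return muts.reduce((sum, m) => sum + (isEdge(m.pos, start, end, margin) ? edgeWeight : 1), 0);
}
```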
To attract a first-time user's attention, we might add some example data that showcases some of the interesting features of the application.
That would include, for example:
This does not have to be real-world data; it can be any dummy sequences with appropriately dummy names.
Care should be taken not to break any licenses. If a sequence is edited (e.g. to showcase gaps and the corresponding QC feature), then the name of the sequence should be modified.
We may implement a little interactive walkthrough demo in the form of a series of popup boxes or alerts, encouraging the user to try some of the features. This demo would only be available when loading example data, with a possibility to skip and disable it (with a flag saved in local storage).
For example: "C123A" and "C -> A"
We want to make a good first impression on our users, and since the app is advertised on Twitter, most of the first views will probably come from mobile user agents. It would be nice to have a decent-looking main page on mobile.
Making the results page look good on mobile would be a tough task due to the tabular layout. But we don't expect our users to have sequence data on their phones, nor are the computational resources appropriate for the task. So the results page will have to stay desktop-first.
Put the results table in a fixed-height container, so that the table is scrolled separately, without the entire page being scrolled.
Currently, when there are many sequences, there might be a lot of scrolling to do just to reach the "Back" button or any other controls at the top. The same goes for the Gene Map.
This will make all of the controls accessible at all times. In particular, the Gene Map will always be visible at the bottom (outside the scrollable container), and the user can scroll a particular sequence closer to it.
Problem: the page will not have scrolling, but we don't necessarily want to waste precious space showing the footer. We have to show the Vercel logo somewhere, though, as it was a part of the sponsorship deal (#41).
We are excited to hear back from our users; however, we currently do not provide any means of contacting us in the app, so only people familiar with GitHub can do so.
I propose adding a chat button in the corner that would pop up a chat window where users could compose a message. This message would then be sent to the project's email box, where maintainers could review the messages and reply.
There are many third-party services that make integrating this kind of button fairly simple.
The email box has to be set up first.
Currently the user has to go back to the main page in order to "upload" more data. We want to avoid this unnecessary navigation by allowing data to be uploaded right on the results page.
This is an idea for a new measurement.
For every sequence, we want to attempt to find the closest neighbor on the nextstrain tree, perhaps on a coarser version of it, to make it faster.
Additionally, we may infer where or when a sequence was common.
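A brute-force sketch of the nearest-neighbor search, assuming every tree node (or every node of a coarser tree) is represented by its set of mutations relative to the reference, and that distance is the number of mutations present in exactly one of the two sets. The representation and helper names are assumptions; a coarser tree simply means fewer nodes for this linear scan.

```typescript
// Mutations relative to the reference, e.g. "C241T" (assumed representation).
interface TreeNode {
  name: string;
  mutations: Set<string>;
}

// Size of the symmetric difference: mutations present in exactly one of the sets.
function distance(a: Set<string>, b: Set<string>): number {
  let d = 0;
  a.forEach((m) => {
    if (!b.has(m)) d++;
  });
  b.forEach((m) => {
    if (!a.has(m)) d++;
  });
  return d;
}

// Linear scan over candidate nodes; ties are resolved in favor of the first node.
function closestNode(query: Set<string>, nodes: TreeNode[]): TreeNode | undefined {
  let best: TreeNode | undefined;
  let bestDist = Infinity;
  for (const node of nodes) {
    const d = distance(query, node.mutations);
    if (d < bestDist) {
      best = node;
      bestDist = d;
    }
  }
  return best;
}
```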
Currently workers don't use all available resources.
First of all, the size of the worker pool is hard-limited to 4:
https://github.com/neherlab/webclades/blob/45b9930b819e3ffec60b0f01be53a844944a1e1c/packages/web/src/workers/createWorkerPools.ts#L8
Secondly, heavy rendering on the main thread delays workers from getting work items, which causes unnecessary idling in worker processes.
The reason is that the task queues which feed the workers are in the memory of the main process, along with rendering. If the main process is busy (rendering SVGs in sequence views, for example), free workers cannot pop tasks from the queue.
Ultimately, there is currently no solution for that in JS, because there is no shared memory (where the queue would be stored and accessible to all threads/processes at all times, like in native environments).
The problem can be mitigated by oversubscribing the available resources. For example, we may create 8 workers even if there are only 4 physical processors. This way the workers will compete for resources more aggressively, and if 4 of the workers are done with their tasks and are unable to get new work items, there are still 4 others that haven't finished, and they will keep all 4 CPUs busy. This basically abuses the OS scheduler, using it as an additional task queuing mechanism (where items are temporarily stored in a preempted worker's memory until it is scheduled for execution again).
The disadvantage, of course, is that too much oversubscription may lead to too much context-switching overhead. That is, workers will be scheduled onto and off the cores periodically, so that each can make progress. This context switching takes time and wastes CPU cycles. And there is no way for us to set any scheduling priorities.
So there is a balance we should find when choosing the number of workers. I think that navigator.hardwareConcurrency + 2 is a safe bet. We may also expose this setting to users.
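A sketch of the proposed sizing rule with an optional user override. navigator.hardwareConcurrency reports the number of logical processors and may be unavailable, so the fallback of 4 (the current hard limit) is an assumption, as is the workerPoolSize name.

```typescript
// Assumed fallback when the browser does not report the number of logical processors
const FALLBACK_CONCURRENCY = 4;

// Logical processors plus a small oversubscription margin, unless the user chose a value.
function workerPoolSize(hardwareConcurrency: number | undefined, userOverride?: number, extra = 2): number {
  if (userOverride !== undefined && Number.isInteger(userOverride) && userOverride > 0) {
    return userOverride;
  }
  const cores =
    hardwareConcurrency !== undefined && hardwareConcurrency > 0 ? hardwareConcurrency : FALLBACK_CONCURRENCY;
  return cores + extra;
}
```

In the browser this would be called as workerPoolSize(navigator.hardwareConcurrency), keeping the function itself independent of DOM globals and easy to test.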
There has been experimental work on SharedArrayBuffer happening at Mozilla for a few years:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/SharedArrayBuffer/Planned_changes
It does not seem production-ready, and AFAIK there were some serious security concerns in the past. Neither Chrome nor our library (Threads.js) supports it currently. So we probably don't want to go there quite yet.
I imagine it to be like Google Maps navigation but in 1 dimension.
We need to figure out what the best user experience would be here, because vertical scrolling is an important part of the table widget (and should take precedence).
We may, for example, enable zoom when holding Ctrl.
UPDATE: Ctrl would not work, because Ctrl + mouse wheel triggers browser zoom.
We all know Ctrl-to-zoom is much more annoying than modifier-less zoom, so alternatively (or additionally), we might have a button to toggle the modifier requirement (enable/disable "zoom mode").
In any case, there should also be buttons for zoom in, zoom out, scroll left and scroll right, which move the viewport by a given number of pixels.
The implementation should be aware of screen width and mobile clients. OS differences should be taken into account (what was that "different" Ctrl key on Mac again?).
This should not conflict with zoom or other features in browsers or commonly used browser addons.
A completely different approach could be to just rip off the system that Nextstrain uses in the Entropy/Diversity section. However, there might be some unexpected challenges, because we may have a lot of rows, and I expect d3.js to be extremely slow. A React reimplementation of this feature may be fine, though.
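Whatever the input method, the viewport math is the same: zooming should keep the genome coordinate under the cursor at the same screen position, and the scroll buttons pan by a number of pixels converted to nucleotides. A sketch assuming the visible region is a fractional interval in genome coordinates; zoomAt, panByPixels and the minimum span of 10 nt are hypothetical.

```typescript
// Visible genome interval in nucleotide coordinates (fractional values allowed).
interface Viewport {
  start: number;
  end: number;
}

// Zooms by `factor` (> 1 zooms in, < 1 zooms out) around the genome coordinate under the
// cursor, so that this coordinate stays at the same screen x. `cursorFrac` = cursorX / viewWidth.
function zoomAt(v: Viewport, factor: number, cursorFrac: number, genomeLength: number, minSpan = 10): Viewport {
  const span = v.end - v.start;
  const anchor = v.start + cursorFrac * span;
  const newSpan = Math.min(Math.max(span / factor, minSpan), genomeLength);
  // Keep the anchor at the same relative position, then clamp to the genome bounds
  const start = Math.min(Math.max(anchor - cursorFrac * newSpan, 0), genomeLength - newSpan);
  return { start, end: start + newSpan };
}

// Moves the viewport by `dx` screen pixels (the "scroll left/right" buttons).
function panByPixels(v: Viewport, dx: number, viewWidth: number, genomeLength: number): Viewport {
  const span = v.end - v.start;
  const start = Math.min(Math.max(v.start + (dx * span) / viewWidth, 0), genomeLength - span);
  return { start, end: start + span };
}
```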
Related: scrollable container feature #35
I made a terrible mess there. @rneher, could you please clean this up and prepare it for end users' eyes:
https://github.com/neherlab/webclades/blob/feat/landing/packages/web/src/components/About/AboutContent.mdx
We want to add clickable links to mutation tooltips. These links would lead to the corresponding mutation view on nextstrain.org/ncov, colored by the mutated nucleotide.
For example, the mutations at position 11083 would have a link that leads to this URL:
https://nextstrain.org/ncov/global?c=gt-nuc_11083
This is a great way to integrate the tool with nextstrain.org and will hopefully help to grow the ecosystem further.
This would require tooltips, or any other informational widgets containing the link, to persist so that the link can be clicked (#83, #84, #85).
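Building the link only requires the 1-based nucleotide position. A minimal sketch that follows the URL pattern of the example in this issue; the "global" build is taken from that example, and nextstrainMutationUrl is a hypothetical helper.

```typescript
// Link to the nextstrain.org/ncov global view colored by the nucleotide at `pos` (1-based).
function nextstrainMutationUrl(pos: number): string {
  if (!Number.isInteger(pos) || pos < 1) {
    throw new RangeError(`invalid nucleotide position: ${pos}`);
  }
  return `https://nextstrain.org/ncov/global?c=gt-nuc_${pos}`;
}
```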
As part of the experiment with persistent informational widgets (#83), we would like to try triggering tooltips on click and/or focus (in addition to hover). When a tooltip is triggered this way, it will stay even after the mouse has left the triggering element. It will be dismissed after the element loses focus (e.g. if the user clicks somewhere else).
We may consider keeping the sticky tooltip as it is, or we can trigger a different popover widget instead.
We want to highlight some of the primers typically used in PCR.
This can be further extended to highlighting mutations in these regions.
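Highlighting mutations in primer regions is an interval-membership check. A sketch assuming primers are given as named 1-based inclusive intervals on the reference; no real primer set or coordinates are included here, and mutationsInPrimers is a hypothetical helper.

```typescript
// Hypothetical primer record: 1-based, inclusive binding region on the reference.
interface Primer {
  name: string;
  start: number;
  end: number;
}

// For each primer, the mutation positions falling inside its binding region.
function mutationsInPrimers(positions: number[], primers: Primer[]): Map<string, number[]> {
  const hits = new Map<string, number[]>();
  for (const p of primers) {
    const inside = positions.filter((pos) => pos >= p.start && pos <= p.end);
    if (inside.length > 0) hits.set(p.name, inside);
  }
  return hits;
}
```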
We only want to compute the data once, and preferably not on the UI thread.
However, some of the transformations are currently performed inside components. These transformations need to be moved.
This may require changing the interfaces between the algorithm and the UI code.
On the results page, various elements (table entries, marks on sequence views, gene boxes on the gene map) trigger tooltips/popovers containing important information. They appear when hovering the mouse over the corresponding element.
This is somewhat intuitive, and this is how elements on nextstrain/auspice also work; however, this UX has several drawbacks for our application:
We would like to experiment with a few approaches to making the informational widgets more persistent, ergonomic and accessible:
To expand on the constraints of the resulting solution:
This issue involves creativity, trying multiple ideas and then keeping one, few, or all of them. We would love to hear from the community on how we could improve user experience further.
As part of the experiment with persistent informational widgets (#83), we would like to try giving table rows the capability to expand.
That is, on trigger, each row would expand downwards, exposing an underlying details section. This is similar to Bootstrap's accordion widget: https://getbootstrap.com/docs/4.3/components/collapse/#accordion-example
Each row would have a place where the user clicks to trigger the expansion: we could add a dedicated button with a chevron, or just trigger it when clicking on the sequence name in the table.
We can allow (or not) for multiple open rows at the same time.
Very short sequences (1 or even 2 nucleotides) with no ambiguous bases are flagged as a pass. We could implement a filter for sequence length.
How are the leading and trailing 5' and 3' UTRs handled? Should N's in these regions constitute a QC failure?
Add an indication of progress for the algorithm run: global and, if possible, per sequence.
We want to add brief descriptions to certain widgets and points of interest.
For starters, we can add "?" buttons to the column headers on the results page. Clicking such a button will bring up a popup with explanations for that particular column.
We may include small schemas there, such as a clade tree diagram and the nucleotide coloring convention.