bimberlab / discvrlabkeymodules

A collection of public LabKey modules developed by the Bimber Lab

TSQL 5.52% PLpgSQL 0.02% HTML 55.64% JavaScript 6.00% Java 30.65% Shell 0.31% R 0.48% CSS 0.03% TypeScript 1.35% SCSS 0.01%

discvrlabkeymodules's People

Contributors

labkey jctrotter barionleg

discvrlabkeymodules's Issues

Filter fields with multiple values

Some variant properties may have multiple values in a list. Currently, filters only check the first entry in this list and ignore the rest. As a user, I should be able to filter against all of these values in some capacity.

Example: if a variant has AC = X, Y, Z, a filter will only check X for purposes of filtering.
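A minimal sketch of the intended behavior, assuming an INFO value arrives either as a scalar or as an array. The helper names (toValueList, matchesAny) are hypothetical, not existing plugin code. The idea is to normalize every value to a list and test each entry, instead of only the first.

```typescript
// Hypothetical: INFO values may be scalars or lists (e.g. AC = X, Y, Z).
type InfoValue = string | number | Array<string | number> | null | undefined;

// Normalize any INFO value into an array so every entry can be tested.
function toValueList(value: InfoValue): Array<string | number> {
  if (value === null || value === undefined) {
    return [];
  }
  return Array.isArray(value) ? value : [value];
}

// True if ANY entry satisfies the predicate, not just the first one.
function matchesAny(
  value: InfoValue,
  predicate: (v: string | number) => boolean,
): boolean {
  return toValueList(value).some(predicate);
}
```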

Bypass plugin selection

Convert much of ExtendedVariantPlugin from .js files to proper TypeScript

Write a plugin, hosted in the module, to modify the VCF details page

Task 3

Lucene free-text VCF search

Here are some query use cases we need to capture:

AF: this can be an array of numbers when there's more than one allele at a site. Can the search deal with that intelligently? For example, given data like AF=0.01,0.05, we might search on either AF>=0.05 or AF<0.02. Both queries should return this row.

I doubt we want weighting in the queries, especially if that makes paging hard. I think we probably want results strictly sorted on chromosome/position.

It would be really useful if chromosome (which is alphanumeric) used natural sorting, such that the order is 1,2,3...10,...MT,X,Y, as opposed to 1,10,2,3,...MT,X,Y.
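A sketch of a natural-sort comparator for contig names. It assumes purely numeric names sort numerically and come first, everything else (MT, X, Y, unplaced contigs) sorts alphabetically after, and a leading "chr" prefix is ignored. The function name is hypothetical.

```typescript
// Hypothetical comparator: numeric contig names first, in numeric order,
// then non-numeric names (MT, X, Y, ...) in alphabetical order.
function compareChromosomes(a: string, b: string): number {
  // Assumption: a "chr" prefix is cosmetic and should not affect ordering.
  const strip = (s: string): string => s.replace(/^chr/i, "");
  const sa = strip(a);
  const sb = strip(b);
  const aIsNum = /^\d+$/.test(sa);
  const bIsNum = /^\d+$/.test(sb);
  if (aIsNum && bIsNum) {
    return parseInt(sa, 10) - parseInt(sb, 10);
  }
  if (aIsNum) return -1; // numbers before letters
  if (bIsNum) return 1;
  return sa.localeCompare(sb);
}
```

Intl.Collator with the numeric option may give a similar order; if the reference genome defines its own contig order, sorting on that order (or on genomicPosition) may be preferable to any string-based rule.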

When searching the set of samples that are variable at each site, do proper matching on the whole ID. For example, the pseudocode "return any site variable in the animal 35322" should find 35322 using an exact match, not a simple contains. That is, we don't want "353" to return that record, and we don't want it to match the ID "35322x".
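A minimal client-side illustration of whole-ID matching versus substring matching, assuming the variable samples for a site are available as a string array (the function name is hypothetical):

```typescript
// Exact whole-ID membership via a Set, instead of substring "contains" matching.
function siteHasVariableSample(variableSamples: string[], sampleId: string): boolean {
  const ids = new Set(variableSamples.map((s) => s.trim()));
  return ids.has(sampleId.trim());
}
```

On the Lucene side, the analogous choice would be indexing each sample ID as an untokenized term (for example a StringField) and querying it with a TermQuery, so "353", "35322" and "35322x" are distinct terms.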

Need good testing/examples for pagination and ordering of results.

Create minimal documentation on the config object for ExtendedVariantWidget

I suspect the code is going to keep evolving so we shouldn't invest too much in docs, but it might be nice if the code at least contained a simple example of the config object for ExtendedVariantWidget, perhaps in a comment block?

This is probably out of date, but maybe an example like this, illustrating all the available options:

      "metadata": {
        "extendedVariantDisplayConfig" : [
          { "name": "Predicted Function - 1",
            "properties": ["AF_ESP", "ALLELEID"]},
          { "name": "Regulatory Data - 1",
            "properties": ["CLNDN", "CLNVC"]}
        ]
      }
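One lightweight option is to document the config shape as TypeScript types with the example inline. The interface names below are hypothetical and the fields are taken from the example above, which may itself be out of date:

```typescript
// Shape implied by the example config; field names may not match current code.
interface VariantDisplaySection {
  /** Section title shown in the widget, e.g. "Predicted Function - 1" */
  name: string;
  /** INFO field names shown in this section, e.g. ["AF_ESP", "ALLELEID"] */
  properties: string[];
}

interface ExtendedVariantMetadata {
  extendedVariantDisplayConfig?: VariantDisplaySection[];
}

// Runtime guard, useful because the config arrives as untyped JSON.
function isVariantDisplaySection(x: unknown): x is VariantDisplaySection {
  if (typeof x !== "object" || x === null) return false;
  const o = x as Record<string, unknown>;
  return (
    typeof o.name === "string" &&
    Array.isArray(o.properties) &&
    o.properties.every((p) => typeof p === "string")
  );
}

// The documented example, expressed against the types.
const example: ExtendedVariantMetadata = {
  extendedVariantDisplayConfig: [
    { name: "Predicted Function - 1", properties: ["AF_ESP", "ALLELEID"] },
    { name: "Regulatory Data - 1", properties: ["CLNDN", "CLNVC"] },
  ],
};
```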

Fix Field Title Mapping

Something broke field title mapping. Titles in ExtendedVariantWidget/fields.js are no longer being displayed correctly in the widget.

Free text search: default columns

We should ensure the chromosome, position, REF and ALT columns appear in the client-side table as the left-most columns. See the related issue about sorting on chromosome+position. Consider whether we should expose a composite client-side field that displays "chromosome: position" and has a custom sort function, both to make this information more dense and to make sorting easier.

See the related issue about genomic position. At minimum, I think the client-side position field should actually execute the sort on genomicPosition, to ensure chr2:1 sorts after chr1:2.
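A sketch of the composite column idea, assuming each client-side row carries chrom, start, and a server-supplied genomicPosition (the row shape and function names are hypothetical). The cell displays "chromosome: position", but the comparator only looks at genomicPosition.

```typescript
// Hypothetical client-side row shape.
interface VariantRow {
  chrom: string;
  start: number;
  genomicPosition: number; // indexed server-side, never shown to the user
}

// Display value for the composite "chromosome: position" column.
function formatPosition(row: VariantRow): string {
  return `${row.chrom}: ${row.start}`;
}

// Sort on genomicPosition so chr2:1 lands after chr1:2,
// regardless of how the displayed text would sort.
function comparePosition(a: VariantRow, b: VariantRow): number {
  return a.genomicPosition - b.genomicPosition;
}
```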

Prepare GenotypeTable link in ExtendedVariantWidget

Load hosted plugin from config file

Task 1

Demonstrate ability to load minimal hello world plugin in jbrowse-linear-view

Remove sessionStorage from Variant widget

The variant plugin uses the config property 'variantDisplays' to read which properties to display. Currently, that's parsed by the plugin and stashed in window.sessionStorage. We really ought to be able to do something more elegant within the JB2 architecture. I don't know if any of these work, but some thoughts:

In NewTable(), which is ~line 86, do we have access to any of the jbrowse methods like getContainingTrack()? There must be something exposed within JB itself to let our code discover the owning track.
In selectFeature(), line 28, we can call getContainingTrack(). Assuming this would provide us with the variantDisplay data, can we pass this as a property to VariantWidget? I can understand some reluctance to tack this onto featureData (although it's not crazy). Perhaps we add a typed property to VariantWidget:

const featureWidget = session.addWidget(
  'VariantWidget',
  widgetId,
  { featureData: feature.toJSON(), variantDisplayConfig: variantDisplay },
)

BimberLab labkey-util package

Collect helper code, styled components, reusable code under a unified lab-specific package. Identify code to be included in this package.

Review CSS/styling on the custom variant popup

The default JB2 popup when you click a variant shows a dialog with a table of variant properties. The default has a grey shading around the name. The custom dialog also shows tables of properties, but lacks that styling. The styling of our table should match the JB2 default, and ideally use identical or nearly identical HTML/CSS as they do, to keep us future proof.

Free text search: multi-valued numeric fields

Splitting rows/documents by ALT allele will reduce this, but we will have cases where an INFO field has multiple values per row. We should have a test case (which we could engineer into the mGAP demo VCF), and I think the desired behavior is for the client to be able to do a normal numeric search and have that row returned if any of the values matches.

Example:

FIELD1=1,2,30

search: "FIELD1 > 20"
result: row is returned
same for "FIELD1 < 2" or "FIELD1 == 2"
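A client-side sketch of "return the row if any value matches", assuming the raw INFO value is a comma-separated string and the query uses the simple "FIELD op number" form from the example. The query syntax and helper names are hypothetical, not an existing query language.

```typescript
type Op = ">" | ">=" | "<" | "<=" | "==";

// Parse "FIELD1 > 20" style queries (hypothetical syntax from the example).
function parseQuery(q: string): { field: string; op: Op; value: number } | null {
  // Two-character operators are listed first so ">=" is not read as ">".
  const m = /^\s*(\w+)\s*(>=|<=|==|>|<)\s*(-?\d+(?:\.\d+)?)\s*$/.exec(q);
  if (!m) return null;
  return { field: m[1], op: m[2] as Op, value: Number(m[3]) };
}

function compare(a: number, op: Op, b: number): boolean {
  switch (op) {
    case ">": return a > b;
    case ">=": return a >= b;
    case "<": return a < b;
    case "<=": return a <= b;
    case "==": return a === b;
    default: return false;
  }
}

// A row matches if ANY of the comma-separated values satisfies the comparison.
function rowMatches(info: Record<string, string>, query: string): boolean {
  const parsed = parseQuery(query);
  if (!parsed || info[parsed.field] === undefined) return false;
  return info[parsed.field]
    .split(",")
    .map(Number)
    .filter((n) => !Number.isNaN(n))
    .some((n) => compare(n, parsed.op, parsed.value));
}
```

Server-side, Lucene point fields (e.g. DoublePoint) can be added several times to one document, and a range query matches the document if any of its values falls in range, so indexing each value separately may give this behavior without special query logic.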

Add new filter option based on sample

For JBrowse, we have filter UI to let the user filter variants based on IMPACT and AF. We should add one more filter type, based on sample ID. The general goal is that the user can enter one or more string sample IDs (perhaps as a CSV list). Only variants with a non-wild-type genotype in any one of these IDs will be shown.

As a reminder, genotypes are the raw data used for the genotype graph and table at the bottom of the variant details page.

To take an example, see the mgap demo:

https://prime-seq.ohsu.edu/jbrowse/Labs/Bimber/jbrowse.view?session=mgap

Position 1:116,981,270..116,981,270 is a SNP where the reference is A and alternate allele is T. "Non-wild type" is defined as any genotype that is not A/A (since A is the ref). We also want to exclude no-calls, which are represented as "./." or ".|.".

Some caveats/notes:

The feature should have a SAMPLES attribute. This holds genotypes. However, it is possible for a VCF to lack samples. The demo session has these, I believe. We just want to be sure the code doesn't crash if feature.SAMPLES is null.
If the user enters a sample that doesn't exist, presumably all the variants are filtered (since it is impossible for a non-existent sample to have a non-wild-type variant).
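A sketch of the filter predicate under stated assumptions: SAMPLES is treated as a map from sample ID to an object with a GT string, and the reference allele may appear either as the VCF index "0" or as the REF base (e.g. "A"). The real structure of feature.SAMPLES in the plugin may differ, and all names here are hypothetical.

```typescript
// Hypothetical shape: sample ID -> { GT: "0/1" } (or base-style "A/T").
type Samples = Record<string, { GT?: string }> | null | undefined;

// Split a user-entered CSV list such as "35322, 12345" into trimmed IDs.
function parseSampleIds(csv: string): string[] {
  return csv.split(",").map((s) => s.trim()).filter((s) => s.length > 0);
}

// Non-wild-type: at least one called allele differs from the reference.
// "." alleles are ignored, so no-calls like "./." and ".|." never count.
function isNonWildType(gt: string | undefined, ref: string): boolean {
  if (!gt) return false;
  const alleles = gt.split(/[\/|]/);
  return alleles.some((a) => a !== "." && a !== "0" && a !== ref);
}

// Keep the variant if ANY requested sample is non-wild-type.
function passesSampleFilter(samples: Samples, ids: string[], ref: string): boolean {
  // No genotypes (VCF without samples): nothing can match, so filter it out.
  if (!samples) return false;
  return ids.some((id) => {
    const call = Object.prototype.hasOwnProperty.call(samples, id) ? samples[id] : undefined;
    // Unknown sample IDs have no call and therefore never pass.
    return isNonWildType(call?.GT, ref);
  });
}
```

Treating a partial call such as "./1" as non-wild-type is a judgment call in this sketch; the issue only specifies excluding "./." and ".|.".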

Explicit typing for variantDisplays property in ExtendedVariantTrack model

ExtendedVariantTrack requires a property like the following:

      "variantDisplays": [
        { "name": "Predicted Function - 2",
          "properties": ["ALLELEID"]},
        { "name": "Regulatory Data - 2",
          "properties": ["CLNVC"]}
      ]

We should figure out how to formally define this property in the model for ExtendedVariantTrack and possibly also ExtendedVariantWidget. If we do this correctly, I believe variantDisplays will get passed through the code (I believe it's currently discarded, since the model is explicitly typed and we don't define variantDisplays). If we can do this right, I think it's also likely that we can avoid needing to use window.sessionStorage in ExtendedVariantWidget.

I don't fully understand data flow in JB2, but I believe we need to update the configSchema in ExtendedVariantWidget/index.js as a starting point. This seems to be where we'd create a property and setter for "variantDisplays". We could then work backward to ExtendedVariantDisplay. Around line 32 we have access to the track object, and we also create the ExtendedVariantWidget widget.

Selenium Testing

Task 5

JBrowse: allow more parameters to be set from the URL

The JBrowse React action should allow the following to be set from the URL. These should be deep-merged into the server-supplied session config. URL params should include:

activeTracks: a csv list of the tracks to show by default. because this creates more complex configuration, perhaps these should be read from the URL, passed in the request to getSession, and handled server-side?
We should have url params to set filters. The key use case is sample filtering. However, if we could develop a way to serialize/deserialize the whole filter config into a URL string that might be more powerful.
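A sketch of the two client-side pieces, assuming the session config is plain JSON: reading a CSV "activeTracks" parameter, and deep-merging URL-derived settings into the server config. The parameter name comes from the issue; the merge rule (objects merge recursively, arrays and scalars from the URL replace) is an assumption.

```typescript
type Json = { [key: string]: unknown };

function isPlainObject(x: unknown): x is Json {
  return typeof x === "object" && x !== null && !Array.isArray(x);
}

// Recursively merge `override` into `base`; arrays and scalars in `override` win.
function deepMerge(base: Json, override: Json): Json {
  const out: Json = { ...base };
  for (const [k, v] of Object.entries(override)) {
    const existing = out[k];
    out[k] = isPlainObject(existing) && isPlainObject(v) ? deepMerge(existing, v) : v;
  }
  return out;
}

// Read ?activeTracks=a,b,c into ["a", "b", "c"], dropping empty entries.
function readActiveTracks(search: string): string[] {
  const raw = new URLSearchParams(search).get("activeTracks");
  if (!raw) return [];
  return raw.split(",").map((s) => s.trim()).filter((s) => s.length > 0);
}
```

If activeTracks ends up being handled server-side in getSession, only readActiveTracks would stay on the client.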

Free text search: sort order, genomicPosition, and pagination

The genome is a set of chromosomes, each of which is a different length. The chromosomes have a sort order. Naively sorting on "start" rarely makes sense, because then chromosome 2 position 1 would sort before chromosome 1 position 2. I propose a couple of things to improve the user experience:

The DISCVR-seq tool already indexes a field called 'genomicPosition'. This is a running total of the position within the genome. Therefore the genomicPosition of chromosome 2 position 100 is the length of chromosome 1 plus 100. The idea is to give us a simple integer for positional sorting. I think we want lucene to return 100% of queries sorted on genomicPosition. And I think we never want to show this field to the user.
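The arithmetic, for illustration only (DISCVR-seq already computes this field): each contig's offset is the summed length of all contigs before it in the reference order. The contig names, lengths, and function names are hypothetical.

```typescript
// Offset of each contig = total length of all contigs before it.
function contigOffsets(contigs: Array<{ name: string; length: number }>): Map<string, number> {
  const offsets = new Map<string, number>();
  let total = 0;
  for (const c of contigs) {
    offsets.set(c.name, total);
    total += c.length;
  }
  return offsets;
}

// genomicPosition = contig offset + position within the contig.
function genomicPosition(offsets: Map<string, number>, contig: string, position: number): number {
  const offset = offsets.get(contig);
  if (offset === undefined) throw new Error(`Unknown contig: ${contig}`);
  return offset + position;
}
```

For a roughly 3 Gb genome the totals stay far below 2^53, so JavaScript numbers hold them exactly.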

Practically, we should:

On the client grid, it would be really useful if we could make the position field show the start, but write a custom sort function such that it is actually sorted on the field genomicPosition. This would result in those values sorting in a more natural order.
When results are returned by lucene, we should sort them on genomicPosition.
Pagination should be based on genomicPosition-sorted data, instead of a search-ranking-type order. This gives a deterministic sort order, which I think is important for pagination.
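A sketch of keyset ("search-after") pagination over genomicPosition-sorted hits, done in memory for illustration. Because several rows can share a genomicPosition (e.g. when documents are split by ALT allele), the sketch assumes a hypothetical unique id as a tie-breaker so the order stays deterministic.

```typescript
interface Hit {
  genomicPosition: number;
  id: string; // hypothetical unique document ID used as a tie-breaker
}

// Total order: genomicPosition first, then id.
function compareHits(a: Hit, b: Hit): number {
  if (a.genomicPosition !== b.genomicPosition) return a.genomicPosition - b.genomicPosition;
  return a.id < b.id ? -1 : a.id > b.id ? 1 : 0;
}

// Return the page that starts strictly after the `after` cursor (the last hit
// of the previous page), or the first page when no cursor is given.
function nextPage(hits: Hit[], pageSize: number, after?: Hit): Hit[] {
  const sorted = [...hits].sort(compareHits);
  const start = after ? sorted.findIndex((h) => compareHits(h, after) > 0) : 0;
  return start === -1 ? [] : sorted.slice(start, start + pageSize);
}
```

On the server, Lucene's IndexSearcher.searchAfter together with a Sort on genomicPosition is one existing mechanism for the same idea.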

We need testing for:

We should have integration or similar testing on the lucene search endpoint related to genomicPosition sorting.
We need to test pagination, probably through integration testing

Migrate appropriate parts of ReactJS code to discvr-components

As our react code evolves, it will likely be useful to have some core code/components in a standalone monorepo with multiple npm packages, rather than living in the repo with a specific LabKey module. I created bimberlab/discvr-components, which is currently just a stub repo. We should build the basic repo structure to house multiple npm packages, and add whatever basic github actions are appropriate for CI and publishing artifacts.

Some thoughts on what might make sense to migrate here:

The JBrowse plugins might make sense as discrete standalone packages. This is how JBrowse normally expects a plugin to work. This would make it easier to re-use them across derivatives of the Browser in different LK modules
We might want to make a generic Utils type package, with LabKey-specific code, such as general error handling

Widget to View/Export Variant Data in table form

For mGAP, viewing variant data is very important to users. Currently, we primarily allow users to see these data in a genome browser, which is a complex visual component. However, in some cases viewing data in table form, or exporting it as a table, would be helpful. The general idea of this feature is to re-purpose JBrowse's configuration and query layer, but render the results as a table. Some points:

This is similar to standalone search in that we start with a session ID, call getSession to get jbrowse config, and then load our custom UI.
In addition to session, we need to know the ID of the track to query, and genomic coordinates (contig, start, and stop). We would use JBrowse's internal query code to issue a request to get the features from this region. This should be returned as an array of JSON objects. Each feature object has different properties. Some are present all the time and some are unique to a given track.
The results should be presented using some kind of modern client-side table, ideally that provides some nice features including export, sort, filter, etc.
We should also have some ability to change/select columns. I think we want our code to parse the feature and make a custom column model from the properties. There should be some concept of which properties are shown by default, and some concept of raw name to column title (i.e. we should figure out a user-facing label for the properties and show this).
Users should have some ability to add/remove columns.
Much like the search widget, there should probably be a standalone react component (which might actually live in the to-be-created standalone package). The LabKey view should be responsible for reading the sessionId, trackId, contig, start, and stop coordinates off the URL. It will pass these to the table component to render. This separates the concerns of how to pass config from the actual table.
This table component should accept some configuration around the column model. For example, the code that creates the table might supply a list of defaultVisibleColumns or similar properties.
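A sketch of deriving a column model from the features themselves, assuming each feature is a flat object of properties. The labels map (raw name to user-facing title) and defaultVisibleColumns follow the ideas above; the ColumnDef shape is hypothetical and not tied to any particular grid library.

```typescript
// Hypothetical, library-neutral column definition.
interface ColumnDef {
  field: string;
  headerName: string;
  hide: boolean;
}

// Union of keys across all features becomes the column set. Raw names are
// mapped to titles when a label exists; visibility follows defaultVisibleColumns.
function buildColumns(
  features: Array<Record<string, unknown>>,
  labels: Record<string, string>,
  defaultVisibleColumns: string[],
): ColumnDef[] {
  const seen = new Set<string>();
  for (const f of features) {
    for (const k of Object.keys(f)) seen.add(k);
  }
  const visible = new Set(defaultVisibleColumns);
  return Array.from(seen).map((field) => ({
    field,
    headerName: Object.prototype.hasOwnProperty.call(labels, field) ? labels[field] : field,
    hide: !visible.has(field),
  }));
}
```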

Improve VariantTable feature loading and capacity

Currently the variant table has a single hard-coded range of variants. This performs one ajax request to load those into memory, which has secondary in-memory filtering in the grid. Further, we have a hard-coded maximum for the size of the window. That is important so we don't crash the browser by loading too many variants; however, this is not ideal since the density of variants per window will vary widely by dataset. I propose:

We should do paging with multiple ajax requests. We should figure out how JBrowse does this internally in the browser, since they have the exact same use-case and challenges as the table.
We should leverage JBrowse's internal capabilities to estimate feature density.
We should expose some user-facing UI (maybe co-opting the existing xgrid paging stuff) to let the user page through different coordinates dynamically, where the client automatically issues new requests to load more data.
There might be existing xgrid examples about remote paging that we could follow
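A sketch of density-aware window planning: given an estimated density (features per bp), pick a window size that should return roughly a target number of features per request, capped at a maximum, and split the region into consecutive half-open windows. This does not use JBrowse's own density estimation; the density value would have to come from there or elsewhere, and all names and defaults are hypothetical.

```typescript
// Plan [start, end) windows so each request returns about `targetFeatures`.
function planWindows(
  start: number,
  end: number,
  featuresPerBp: number,
  targetFeatures: number,
  maxWindowBp: number = 1000000,
): Array<[number, number]> {
  // With no density information, fall back to the maximum window size.
  const size =
    featuresPerBp > 0
      ? Math.min(maxWindowBp, Math.max(1, Math.floor(targetFeatures / featuresPerBp)))
      : maxWindowBp;
  const windows: Array<[number, number]> = [];
  for (let s = start; s < end; s += size) {
    windows.push([s, Math.min(s + size, end)]);
  }
  return windows;
}
```

Each window could then back one page of the grid, with the next request issued when the user pages forward.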

Create and host a minimal plugin for testing purposes

Task 2:

Host plugin from LK server
Host plugin json+code either in ./resources/jbrowse/plugins (if no build needed), or in ./src/client/XXXX if we need to build code
Update minimalSession.json to include plugin config (note: this might need to update server URL)
Make the demo plugin do something really obvious
Create test that verifies this plugin is loaded

Figure out property to toggle whether text below variant shows

In JB2, it looks like the text below the variant (i.e. SNV A>T) shows up by default. Is there some config property that toggles that behavior? Can we choose the value for this in the config JSON? In general, we probably want to default to not showing this. If this can be toggled in the config, we can just set that there.

Full text search: the interaction of sorting, client-side sorting, and pagination

We definitely need to support a server-side pagination for lucene. I am currently proposing that we always sort lucene results on genomicPosition. We need to support some kind of pagination, probably based on genomicPosition sorted results. This could be changed.

On the client, the user can also execute sorts on the table. Those are currently in-memory on the client records. How does this interact with client vs. server-side sorting, and also with pagination? Are they totally disjoint? Can we make them integrate better, perhaps at least if the user sorts on position in the client? It would be weird if the user sorts on position client-side and it only operated on the first XX rows in memory.

Idea: maybe we could pass a more flexible sort order to lucene when we issue the query? Perhaps then we actually punt virtually all sorting to the server and do minimal sorting client-side?

Enhancements to filtering

There are a few enhancements around variant filtering:

Currently, filters like ANN (predicted effect) allow the user to pick one item from a drop-down. We should make these a multi-select, and treat the JEXL filter like an OR (i.e. ANN == 'A' OR ANN == 'B')
We should add some more supported filter types and coloration options. In theory, we should just need to add them to the filter and/or color definitions map. We should start with CADD_PH (which is labeled CADD Score).
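A sketch of turning a multi-select into a single OR clause. JEXL's logical OR operator is "||"; the accessor string is supplied by the caller, and the backslash escaping of quotes is an assumption to verify against the jexl version in use. The function name is hypothetical.

```typescript
// Build "(ANN == 'A' || ANN == 'B')" from a multi-select. Returns "" when
// nothing is selected so the caller can skip the clause entirely.
function buildOrExpression(accessor: string, values: string[]): string {
  if (values.length === 0) return "";
  // Escape backslashes first, then single quotes, inside the quoted literal.
  const escape = (v: string): string => v.replace(/\\/g, "\\\\").replace(/'/g, "\\'");
  const clauses = values.map((v) => `${accessor} == '${escape(v)}'`);
  // Parentheses keep the OR grouped when combined with other filters via "&&".
  return `(${clauses.join(" || ")})`;
}
```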

Further filter error handling

We need to handle the case where a user inputs a numeric value for a filter outside the range of that field's minValue and maxValue properties
We need to find a solution for error handling that moves away from try/catch blocks.
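One common alternative to try/catch is returning a discriminated result that callers branch on. The FilterDefinition fields below (datatype, minValue, maxValue) are named after the properties mentioned in these issues; the exact shape and the function name are hypothetical.

```typescript
interface FilterDefinition {
  datatype: "float" | "integer" | "string";
  minValue?: number;
  maxValue?: number;
}

type ValidationResult = { ok: true; value: number } | { ok: false; error: string };

// Validate user input against the definition without throwing.
function validateNumeric(def: FilterDefinition, raw: string): ValidationResult {
  const value = Number(raw);
  if (raw.trim() === "" || Number.isNaN(value)) {
    return { ok: false, error: `"${raw}" is not a number` };
  }
  if (def.datatype === "integer" && !Number.isInteger(value)) {
    return { ok: false, error: `${value} is not an integer` };
  }
  if (def.minValue !== undefined && value < def.minValue) {
    return { ok: false, error: `${value} is below the minimum of ${def.minValue}` };
  }
  if (def.maxValue !== undefined && value > def.maxValue) {
    return { ok: false, error: `${value} is above the maximum of ${def.maxValue}` };
  }
  return { ok: true, value };
}
```

The UI can show `error` next to the input when `ok` is false and only apply the filter when it is true.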

react warning related to unique keys on list

It's not clear to me if this is react-tools being overly aggressive in warnings or if this is a real problem, but I am getting warnings like this logged by react-tools in Chrome. At different times the stack is different. In most stacks you can find the line from our code that triggered it. I see different places that all involve rendering a table. I don't see any mention of tables in the docs on the 'keys' warning, and I don't see our code generating lists anywhere. But maybe the docs are not up to date and we're supposed to give unique keys to table rows or something?


react_devtools_backend.js:2842 Warning: Each child in a list should have a unique "key" prop.

Check the top-level render call using <WithStyles(ForwardRef(TableBody))>. See https://fb.me/react-warning-keys for more information.
    in WithStyles(ForwardRef(TableRow))
    in NewTable (created by wrappedComponent)
    in Suspense (created by wrappedComponent)
    in wrappedComponent (created by wrappedComponent)
    in div (created by ForwardRef(Paper))
    in ForwardRef(Paper) (created by WithStyles(ForwardRef(Paper)))
    in WithStyles(ForwardRef(Paper)) (created by wrappedComponent)
    in div (created by ForwardRef(Paper))
    in ForwardRef(Paper) (created by WithStyles(ForwardRef(Paper)))
    in WithStyles(ForwardRef(Paper)) (created by ForwardRef(Dialog))
    in div (created by Transition)
    in Transition (created by ForwardRef(Fade))
    in ForwardRef(Fade) (created by Unstable_TrapFocus)
    in Unstable_TrapFocus (created by ForwardRef(Modal))
    in div (created by ForwardRef(Modal))
    in ForwardRef(Portal) (created by ForwardRef(Modal))
    in ForwardRef(Modal) (created by ForwardRef(Dialog))
    in ForwardRef(Dialog) (created by WithStyles(ForwardRef(Dialog)))
    in WithStyles(ForwardRef(Dialog)) (created by wrappedComponent)
    in wrappedComponent (created by wrappedComponent)
    in div (created by wrappedComponent)
    in wrappedComponent (created by View)
    in ThemeProvider (created by View)
    in View
overrideMethod @ react_devtools_backend.js:2842
printWarning @ react.development.js:316
error @ react.development.js:288
validateExplicitKey @ react.development.js:1631
validateChildKeys @ react.development.js:1657
createElementWithValidation @ react.development.js:1807
eval @ ExtendedVariantWidget.js:284
commitHookEffectListMount @ react-dom.development.js:19732
commitPassiveHookEffects @ react-dom.development.js:19770
callCallback @ react-dom.development.js:189
invokeGuardedCallbackDev @ react-dom.development.js:238
invokeGuardedCallback @ react-dom.development.js:293
flushPassiveEffectsImpl @ react-dom.development.js:22854
unstable_runWithPriority @ scheduler.development.js:654
runWithPriority$1 @ react-dom.development.js:11040
flushPassiveEffects @ react-dom.development.js:22821
performSyncWorkOnRoot @ react-dom.development.js:21738
eval @ react-dom.development.js:11090
unstable_runWithPriority @ scheduler.development.js:654
runWithPriority$1 @ react-dom.development.js:11040
flushSyncCallbackQueueImpl @ react-dom.development.js:11085
flushSyncCallbackQueue @ react-dom.development.js:11073
discreteUpdates$1 @ react-dom.development.js:21894
discreteUpdates @ react-dom.development.js:807
dispatchDiscreteEvent @ react-dom.development.js:4169

Improve user-facing feedback over which filters are active

Both the genome browser and variant table allow the user to apply filters on either the INFO field or samples. There is basically no user feedback as far as what filters are active. I propose adding a new component, which is passed the LGV model, that is rendered above both the LinearGenomeView and above VariantTable. This component would react to the active filters and display some sort of element (maybe a button-like thing?) indicating each active filter. If you click any of those elements, it opens the corresponding widget, allowing the user to update that kind of filter.

As an aside: if the variant grid widget we choose allows in-memory filtering, we should make sure there is feedback on this. I would be OK if this feedback was part of the grid itself, and it would be nice if we didn't need to make that ourselves.

Improve github actions CI / run LK server and selenium

Currently the github actions code runs this plugin on PRs:

https://github.com/bimberlabinternal/DevOps/tree/master/githubActions/discvr-build

It will infer the proper release from the branch name, and download code from various repos. It then does a gradle build. It provides very simple validation, but starting a server and running tests would be more helpful.

To start LK, we need to run a database and tomcat. GH actions has some facilities that might help us. For example, we could start a suite of containers. We might be able to leverage an out-of-the-box docker database:

https://docs.github.com/en/actions/guides/about-service-containers

We would need to provide a tomcat XML config file (probably in the plugin). But otherwise I think software installation needs are minimal. If we had this alone, we could at least start a server, which adds a layer of validation beyond just a passing build.

There are examples out there about running headless selenium tests on github actions. One layer further would be to execute one or more suites of selenium tests. This is probably the most complicated, at least insofar as capturing useful output on test failure.

Aggregated issues around variant table

This issue is an aggregated list of bugs/enhancements for the variant table:

We should consider swapping in a different React table for the one we have. This would help with a) some of the resizing/styling issues, and b) filtering, since some of the React grids have considerably nicer out-of-the-box filter capabilities, including operators (greater-than, etc.)
The variant grid should have a Details column (maybe with an icon) that loads the same popup that you get when you click a variant in the genome browser.
We should add a column that gives a 'Show Genotypes' link, which links to the genotypeTable action, passing the session, track, and position on the URL. This should open in a new tab.
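A sketch of building that link with URLSearchParams, which handles encoding. The parameter names (session, trackId, location) and the base path are assumptions, not the real genotypeTable action's API.

```typescript
// Hypothetical parameter names; check them against the genotypeTable action.
function genotypeTableUrl(
  base: string,
  session: string,
  trackId: string,
  contig: string,
  position: number,
): string {
  const params = new URLSearchParams({
    session,
    trackId,
    location: `${contig}:${position}`,
  });
  return `${base}?${params.toString()}`;
}
```

Rendering the link with target="_blank" and rel="noopener noreferrer" would open it in a new tab.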

Complete user-facing variant filter dialog

The ultimate goal is to allow the user to fully control what filters are applied. They should be able to add any number of filters (including multiple instances of a filter over a given field), and remove them. The config JSON should be able to supply the set of currently active filters, which are read by this UI. The proposal is:

Change the FilterWidget to display something more like table/grid, rather than list of checkboxes.
Each active filter has one row. This row has inputs for Field Name, Operator, Value. It also has an [X] or button to remove that filter. Perhaps 'Field Name' is not editable. Operator should be a drop-down. Value is discussed below.
There is an 'Add Filter' button. This is perhaps a dropdown menu button. The menu has one item for each filter defined in filterMap. Currently this is AF, AC, IMPACT. We could eventually add more.
When the user chooses one of these filter types, a new row is added to the active filters table. Based on what they chose, the Field Name is populated. The code can use fieldName to look up the filter definition from Filters.js.
The set of operators available in the operator field would be determined by the datatype property of the filter definition.
The input to specify the value should vary based on the filter definition. For example, if the datatype is float, it should enforce numeric values. If integer, same idea. If the filter definition has minValue or maxValue, enforce that. Likewise, many string fields might have an allowableValues property, in which case the field should be something like a combobox.

If the user opens the dialog and there are no active filters provided in the config JSON, it's an empty window with just the 'Add Filter' button. If the user opens the dialog and there are active filters in the config JSON, the UI should read each of them and add one row to the filters table. The inputs for operator and value should be editable. The user could make changes, hit 'apply' and these new filters should get applied.
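A sketch of the data model behind the dialog: an operator table keyed on the definition's datatype, a rule for which value input to render, and 'Add Filter' appending a row. The operator lists and all names are hypothetical placeholders.

```typescript
type Datatype = "float" | "integer" | "string";

// Hypothetical operator table keyed on the filter definition's datatype.
const OPERATORS: Record<Datatype, string[]> = {
  float: ["==", "!=", ">", ">=", "<", "<="],
  integer: ["==", "!=", ">", ">=", "<", "<="],
  string: ["==", "!="],
};

interface FilterDef {
  field: string;
  datatype: Datatype;
  allowableValues?: string[];
}

interface ActiveFilter {
  field: string;
  operator: string;
  value: string;
}

// Which input the Value cell should render for this definition.
function valueInputKind(def: FilterDef): "number" | "combobox" | "text" {
  if (def.datatype === "float" || def.datatype === "integer") return "number";
  return def.allowableValues && def.allowableValues.length > 0 ? "combobox" : "text";
}

// 'Add Filter': append a row with the field filled in and the first allowed operator.
function addFilterRow(rows: ActiveFilter[], def: FilterDef): ActiveFilter[] {
  return [...rows, { field: def.field, operator: OPERATORS[def.datatype][0], value: "" }];
}
```

Active filters read from the config JSON would simply be the initial `rows` array.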

Consider removing the separate filter modal from the table grid, in favor of 100% xgrid-based filters

Currently the variant grid has two parallel filter codepaths: the xgrid-based scheme and the separate scheme (taken from the genome browser and stored in the renderer/adapter). What if we simply removed the entire adapter-based scheme and converted URL-based filters into a filter model that's always applied to the table?

View/Export of Variant Data as TSV

When viewing data for a given track in JBrowse 2, there is an API call to the server, this returns/parses data into JSON, and then it's rendered as colored dots on the browser. I'd like to leverage JB2's existing APIs and code to a high degree, but instead do this:

Make a separate page.
This page would read the sessionId and trackId from the URL.
The code will make an identical request to LabKey to load the appropriate session JSON
it will find the track in that JSON, based on the trackId provided in the URL
It should have UI, ideally using the exact same JB2 components, to select the contig and position (like the box at the top of JB2)
When the user enters coordinates, using existing JB2 APIs and classes, issue a query to return the features overlapping a given interval
Then render a simple Bootstrap-looking table with the results, one variant per line (it does not need to be Bootstrap; something like DataTables might be good)
This table should leverage some pre-existing component, ideally with out-of-the-box capabilities like export to TSV.
We will ultimately add more capabilities to this widget, such as add/remove columns. Bonus points if the table widget we use can already do this.
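If the chosen table component lacks TSV export, a fallback is straightforward. This sketch assumes features are flat objects and the caller supplies the column order; array values are joined with commas, and tabs or newlines inside values are replaced with spaces, since plain TSV has no standard quoting (a design choice, not the only option).

```typescript
// Serialize feature objects to TSV in the given column order.
function toTsv(rows: Array<Record<string, unknown>>, columns: string[]): string {
  const clean = (v: unknown): string => {
    if (v === null || v === undefined) return "";
    const s = Array.isArray(v) ? v.join(",") : String(v);
    return s.replace(/[\t\r\n]+/g, " ");
  };
  const lines = [columns.join("\t")];
  for (const row of rows) {
    lines.push(columns.map((c) => clean(row[c])).join("\t"));
  }
  return lines.join("\n");
}
```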

Compatibility with Labkey 22.3.2

Hi,
I tried to install DISCVR 21.7 to Labkey Server Community 22.3.2 but it was not compatible. I also tried to use only modules for DISCVR-seq only (laboratory, ldk and sequenceanalysis) but it was the same.

Will it be possible in the future to use DISCVR with this version of Labkey?

Regards,
Johann

Generalize Mode of Color Selection

Background: the current code allows the user to choose how variants are colored based on a combination of SNP Type and ANN field. We want to make this more generalizable to fit a few goals:

The cascading menus are going to be too unwieldy as the number of color choices increases. We should instead use a modal window with dedicated UI.
The current code mixes two sources of color, the Type and ANN/IMPACT. While it currently has a hierarchy of values, these are two distinct things. To make an analogy, if the data was animals, that would be like saying, "do you want to color the dot based on species or continent where it lives". The point is that these are two independent types of data, and a given data point can belong to different combinations of them.

The proposal:

In general, the code should never need to encode variables or values related to the states, including HIGH, MODERATE, SNV, Indel, etc. We should be driving code off data and it should be more abstract in terms of configuration.
The user should first choose the variable to use for color. In the current implementation they are choosing 'Variant Type', or 'Predicted Effect'. These map to SNV type and ANN[IMPACT].
When the user picks one of these, they are given a set of fields with one value per state (i.e. High/Moderate/Low, or SNV/Insertion/Deletion). They choose one color per value. These should each have a default color.
We need to always allow a color for 'Other', as a fallback in case the data doesn't match our expectations.
As far as the renderer code: I think this code should accept one string for 'accessorString', and a map for valueToColor. The accessor string is literal jexl code that will be used to build the final jexl string. In the case of Predicted Effect, I think it would be "get(feature,'INFO').ANN['IMPACT']". In the case of Snp Type, it would be "get(feature,'type')"
The valueToColor map is a simple map like {HIGH: 'red', MODERATE: 'green', LOW: 'blue'}.
This config scheme should be very extensible and easy to add new color providers in the future

I might actually think about coding this as having a VariantColorProvider class. This class could define the accessorString, allowable values, and a default color for each value. Using this class to draw the UI in the user-facing popup would be a good idea. We can also make two instances of VariantColorProvider, one for ANN and one for SnpType.

We should support providing the configuration in the session JSON. If we support the idea of ColorProvider, this makes it simpler. We need to save the name string of this provider (which avoids needing to know the accessorString). If all colors are defaults, we don't need to repeat the valueToColor map.
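A sketch of the proposed VariantColorProvider, under these assumptions: the class shape, the provider name "PredictedEffect", and the gray 'Other' default are hypothetical; the accessor string and the valueToColor example come from the proposal above. It covers color lookup with an 'Other' fallback and session serialization that omits default colors.

```typescript
const hasOwn = (o: object, k: string): boolean => Object.prototype.hasOwnProperty.call(o, k);

class VariantColorProvider {
  constructor(
    readonly name: string,
    readonly accessorString: string,
    readonly defaultColors: Record<string, string>,
    readonly otherColor: string = "gray",
  ) {}

  // Allowable values drive the UI: one color choice per value, plus 'Other'.
  get values(): string[] {
    return [...Object.keys(this.defaultColors), "Other"];
  }

  defaultFor(value: string): string {
    return hasOwn(this.defaultColors, value) ? this.defaultColors[value] : this.otherColor;
  }

  // User overrides win; values outside the known set use the 'Other' color.
  colorFor(value: unknown, overrides: Record<string, string> = {}): string {
    const key = String(value);
    if (hasOwn(this.defaultColors, key)) {
      return hasOwn(overrides, key) ? overrides[key] : this.defaultColors[key];
    }
    return hasOwn(overrides, "Other") ? overrides["Other"] : this.otherColor;
  }

  // Session JSON: the provider name, plus only colors that differ from defaults.
  toSessionConfig(colors: Record<string, string>): { provider: string; valueToColor?: Record<string, string> } {
    const changed: Record<string, string> = {};
    for (const [k, v] of Object.entries(colors)) {
      if (v !== this.defaultFor(k)) changed[k] = v;
    }
    return Object.keys(changed).length > 0
      ? { provider: this.name, valueToColor: changed }
      : { provider: this.name };
  }
}

const predictedEffect = new VariantColorProvider(
  "PredictedEffect",
  "get(feature,'INFO').ANN['IMPACT']",
  { HIGH: "red", MODERATE: "green", LOW: "blue" },
);
```

A second instance using the accessor "get(feature,'type')" would cover SNV type in the same way.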

Let me know if the description of what the popup window should do doesn't make sense. I can try to mock something up.

Beta release task list

Further JSON config capabilities

Task 4:

Can our JSON config specify a default region to show?
Can our JSON config specify a default set of tracks to show?

Retain option to show/hide text labels?

The default VariantTrack has a menu with options for About track, Show labels, etc. Can ExtendedVariantTrack inherit and extend this menu, instead of completely override it? This would be a lot more useful instead of re-creating their options.

Also of note, I figured out that one way to set "showLabels": false is something like this:

      "displays": [
        {
          "type": "ExtendedVariantDisplay",
          "displayId": "mgap_hg38-ExtendedVariantDisplay",
          "renderer": {
            "type": "ExtendedVariantRenderer",
            "showLabels": false
          }
        }
      ]

Customization of JB2 Variant Display

In Jbrowse 1, the variants are displayed with variable color and shape:

Single-base variants are diamonds. Insertions or deletions are green rectangles. The single base variants are further colored based on the ANN field. The default is blue. Any variant where the ANN field lists it as having a MODERATE impact are yellow. Any variant where the ANN field lists the variant as HIGH impact is red. This was the extent of JB1 customization.

In JB2 we need to port this capability, but also extend it considerably. The proposal:

Make a new menu item called 'Customize Display' that shows up in the drop-down list for tracks of type ExtendedVariantTrack. This opens a new, possibly large, dialog box. This box will present the user with options to toggle how variants are colored and to possibly filter them. Initially the supported options will be limited, but eventually they will grow.
The default coloration is called 'Color By Predicted Effect'. This will color using the logic listed above with the ANN field (see DiscvrLabKeyModules/jbrowse/resources/web/jbrowse/plugins/AnnotatedVariants/js/View/Track/VCFVariants.js, line 49 in d7f1edd, which begins "color: function(feature){").
If this can be done quickly enough, the next color option will be to give the variants a color gradient based on the AF field (a number from 0 to 1).
For now, put a placeholder in the pop-up UI for filters. The idea behind filters is that you might be able to pick "Only show variants where AF > 0.2", or "Only show sites where predicted effect == HIGH". Again, we can punt on supporting these for this PR; however, we should plan for this in the code and UI.
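The AF gradient option could be as small as a linear ramp between two colors. This is a minimal sketch under stated assumptions: AF may arrive as a number or as an array (one value per ALT allele), the sketch colors by the largest value, and the blue-to-red ramp and the gray fallback are arbitrary choices, not part of the spec above.

```javascript
// Hypothetical "Color By Allele Frequency" sketch.

function clamp01(x) {
  return Math.min(1, Math.max(0, x));
}

// Linearly interpolate between two RGB triples; t is in [0, 1]
function interpolateRgb(from, to, t) {
  const channel = (i) => Math.round(from[i] + (to[i] - from[i]) * t);
  return `rgb(${channel(0)}, ${channel(1)}, ${channel(2)})`;
}

function colorByAlleleFrequency(feature, low = [0, 0, 255], high = [255, 0, 0]) {
  const raw = feature.INFO ? feature.INFO.AF : undefined;
  // Normalize to an array of numbers and drop anything unparseable
  const values = (Array.isArray(raw) ? raw : [raw]).map(Number).filter((v) => !Number.isNaN(v));
  if (values.length === 0) {
    return 'gray'; // no usable AF value
  }
  return interpolateRgb(low, high, clamp01(Math.max(...values)));
}
```

Using the maximum AF is only one option; the popup could later expose min/max/first as a setting.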

For both color and filter, we need to be able to supply default settings in the track's JSON. Any configuration should live under metadata.extendedVariantTrackConfig.XXXX, not at the top level of metadata, so that we don't pollute or conflict with other plugins. Any settings provided in the track config should be respected on track load, and respected in the popup UI when it is first loaded.
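One way to honor this is to resolve the defaults once, from the namespaced object only, and use the result both for the initial render and as the popup's initial state. A minimal sketch; the key names colorScheme and filters and the built-in default values are hypothetical, not an agreed schema.

```javascript
// Hypothetical sketch: read color/filter defaults from
// metadata.extendedVariantTrackConfig, falling back to built-in defaults.

const BUILT_IN_DEFAULTS = Object.freeze({
  colorScheme: 'predictedEffect', // i.e. 'Color By Predicted Effect'
  filters: [],
});

function resolveExtendedVariantConfig(trackConfig) {
  const metadata = (trackConfig && trackConfig.metadata) || {};
  // Only the namespaced object is consulted; top-level metadata keys are ignored
  const custom = metadata.extendedVariantTrackConfig || {};
  return {
    colorScheme: custom.colorScheme ?? BUILT_IN_DEFAULTS.colorScheme,
    // Copy the array so the popup UI can edit it without mutating the track config
    filters: [...(custom.filters ?? BUILT_IN_DEFAULTS.filters)],
  };
}
```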

Finalize ExtendedVariantTrack and Adapt for mGAP VCF

The overall goal is to create ExtendedVariantTrack as a generalized plugin/custom VCF display. This primarily involves a custom popup. The mGAP site will use this in its JB2 session to display its VCF. To help with testing, here is a public URL for that VCF (it is 221 GB):

https://prime-seq.ohsu.edu/_webdav/Labs/Bimber/Collaborations/Public/%40files/mGap.v2.1.vcf.gz

I believe you can put that into a JB2 session. We may or may not need to configure the genome to work with this.

Next, we need to adapt/port the JB1 custom code, including custom renderers, to JB2. Here is the bulk of that code:

DiscvrLabKeyModules/jbrowse/resources/web/jbrowse/plugins/AnnotatedVariants/js/View/Track/_VariantTrackVariantDetailsMixin.js (line 49 in 6ea9ddb): FIELDS: {

You'll see there is a list mapping each INFO field to a label. Rendering then follows this:

DiscvrLabKeyModules/jbrowse/resources/web/jbrowse/plugins/AnnotatedVariants/js/View/Track/_VariantTrackVariantDetailsMixin.js (line 271 in 6ea9ddb): defaultFeatureDetail: function( /** JBrowse.Track */ track, /** Object */ f, /** HTMLElement */ featDiv, /** HTMLElement */ container) {

There are some patterns we don't want to replicate in JB2; however, you'll see that many of the fields have custom renderers, defined by a method named fmtXXXXXValue, such as:

DiscvrLabKeyModules/jbrowse/resources/web/jbrowse/plugins/AnnotatedVariants/js/View/Track/_VariantTrackVariantDetailsMixin.js (line 451 in 6ea9ddb): fmtDetailANNValue: function(parent, title, val, f, class_){

We need to hook up renderers a different way, but the code within each of these might be copied or adapted.

This is roughly comparable to the variantDisplay config we've been giving:

DiscvrLabKeyModules/jbrowse/resources/web/jbrowse/plugins/AnnotatedVariants/js/View/Track/_VariantTrackVariantDetailsMixin.js (line 479 in 6ea9ddb): tagCategories: {
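One possible JB2-friendly shape for the FIELDS map plus the fmtDetailXXXValue methods is a single registry keyed by INFO field, where each entry has a label and an optional render function. This is a hypothetical sketch: the labels, the renderAnn helper, and the decision to skip unregistered fields are illustrative, not the production mGAP configuration.

```javascript
// Hypothetical registry replacing FIELDS + fmtDetailXXXValue methods.

// Default renderer: join multi-valued fields with commas
const defaultRender = (value) => (Array.isArray(value) ? value.join(', ') : String(value));

// Example custom renderer: split SnpEff ANN strings into row objects
function renderAnn(value) {
  const entries = Array.isArray(value) ? value : [value];
  return entries.map((entry) => {
    const [allele, effect, impact, gene] = entry.split('|');
    return { allele, effect, impact, gene };
  });
}

const INFO_FIELDS = {
  AF: { label: 'Allele Frequency' },
  ANN: { label: 'Predicted Effects', render: renderAnn },
};

// Build (key, label, rendered value) rows for the details panel
function buildDetailRows(info, fields = INFO_FIELDS) {
  return Object.keys(fields)
    .filter((key) => info[key] !== undefined)
    .map((key) => {
      const { label, render = defaultRender } = fields[key];
      return { key, label, value: render(info[key]) };
    });
}
```

Because the registry is plain data plus functions, the per-field logic from the existing fmt methods could be moved into render functions one at a time.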

Here is an example from the production mGAP, with some comments added in red:

Clean up plugin loading, make it conditional

Free text search: groups of samples

There are natural groups of samples within the data. For the time being, we can assume those exist, and the server can identify them. It would be helpful if we supported a sub-type of the client-side sample search where the user simply entered a group name, rather than a big list of IDs. I propose we:

Have a client-side field type with all the same operators as sample search.
For this field, the user enters a group name, like "ONPRC Animals".
This is serialized in our lucene query string to the server as something like: ${GROUP:ONPRC Animals}
The server inspects every incoming query string for this pattern. If found, it does basic string replacement of ${....} with the list of sample names, formatted in some consistent manner (TBD depending on how the ultimate query string needs to look).

This keeps all of the sample-to-group resolution code hidden on the server. We can implement this separately.

As above, this needs an integration test case.
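The server-side substitution is plain string replacement, sketched here in JavaScript only to illustrate the logic (the real server is LabKey/Java). The group lookup and the sample-list formatter are passed in because both are TBD; failing on an unknown group, rather than silently matching nothing, is a design assumption.

```javascript
// Hypothetical sketch of ${GROUP:...} expansion.
// lookupGroup(name) -> array of sample names (or undefined)
// formatSamples(samples) -> string spliced into the query (format TBD)

const GROUP_PATTERN = /\$\{GROUP:([^}]+)\}/g;

function expandGroups(queryString, lookupGroup, formatSamples) {
  return queryString.replace(GROUP_PATTERN, (match, groupName) => {
    const samples = lookupGroup(groupName.trim());
    if (!samples || samples.length === 0) {
      throw new Error(`Unknown or empty sample group: ${groupName}`);
    }
    // NOTE: a real formatter should escape Lucene special characters in names
    return formatSamples(samples);
  });
}
```

For example, with a formatter such as `(s) => '(' + s.join(' OR ') + ')'`, the string `variableSamples:${GROUP:ONPRC Animals}` becomes `variableSamples:(id1 OR id2 ...)`.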

Free text search: handling chromosome/start

Users will want to search on start position, but this is always linked to a contig/chromosome. These are actually two fields, but to a user it would be helpful if it acted like one. I propose we:

On the client, expose a filter type for 'position'. They enter values like: "1:1000122". The allowed operators would be numeric (i.e. equals, GT, LT, GTE, LTE).
The react field should validate the user's input
The react field should probably drop whitespace and commas automatically. Or maybe disallow typing them to begin with?
The client-side code should translate the user input of "position greater than 1:1000222", into a lucene query that is "contig equals 1 AND start greater than 1000222". This should just happen without the user needing to see or know it's occurring.
For the task for serializing/deserializing filters on the URL, we can probably serialize the user string.
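The client-side translation could look like this minimal sketch. The field names contig and start come from the text above; the operator keys, the accepted contig characters, and the classic Lucene range syntax (`{a TO *}` exclusive, `[a TO *]` inclusive) are assumptions about what the server parser accepts. Depending on how start is indexed, the server may also need to treat it as a numeric field for ranges to work.

```javascript
// Hypothetical 'position' filter: "1:1,000,222" + 'gt' -> Lucene query string.

const POSITION_PATTERN = /^([A-Za-z0-9_.]+):(\d+)$/;

// Drop whitespace and commas, then validate "contig:position"
function parsePosition(input) {
  const cleaned = input.replace(/[\s,]/g, '');
  const match = POSITION_PATTERN.exec(cleaned);
  if (!match) {
    throw new Error(`Expected a position like 1:1000122, got "${input}"`);
  }
  return { contig: match[1], start: Number(match[2]) };
}

// Curly braces are exclusive bounds, square brackets inclusive
const RANGE_BY_OPERATOR = {
  eq: (v) => `[${v} TO ${v}]`,
  gt: (v) => `{${v} TO *}`,
  gte: (v) => `[${v} TO *]`,
  lt: (v) => `{* TO ${v}}`,
  lte: (v) => `[* TO ${v}]`,
};

function positionToLucene(input, operator) {
  const toRange = RANGE_BY_OPERATOR[operator];
  if (!toRange) {
    throw new Error(`Unsupported operator: ${operator}`);
  }
  const { contig, start } = parsePosition(input);
  return `contig:${contig} AND start:${toRange(start)}`;
}
```

Serializing the user's original string on the URL, as suggested above, keeps this translation an internal detail.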

Side note: the DISCVR-seq tool indexes a field called 'genomicPosition'. This is a running total of the position within the genome, so the genomicPosition of chromosome 2, position 100 is the length of chromosome 1 plus 100. The idea is to give us a simple integer for positional sorting. I think we want lucene to return 100% of queries sorted on genomicPosition, and I think we never want to show this field to the user.
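The arithmetic behind genomicPosition is a cumulative offset per contig. This sketch assumes contig lengths are available in reference order (for example, from the .fai index); it is not the DISCVR-seq implementation. Note that sorting on this value yields whatever contig order the reference uses, so the 1, 2, ... 10, ... MT, X, Y ordering depends on the reference listing contigs that way.

```javascript
// Hypothetical sketch of genomicPosition = (sum of preceding contig lengths) + position.

function buildContigOffsets(contigs) {
  const offsets = new Map();
  let runningTotal = 0;
  for (const { name, length } of contigs) {
    offsets.set(name, runningTotal); // offset = total length of contigs before this one
    runningTotal += length;
  }
  return offsets;
}

function genomicPosition(offsets, contig, position) {
  if (!offsets.has(contig)) {
    throw new Error(`Unknown contig: ${contig}`);
  }
  return offsets.get(contig) + position;
}
```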

Full text search: searching on samples

The DISCVRseq tool will index a field called variableSamples. This should be considered an array of the names of samples that are variable at that position (though I don't know how lucene technically treats it). This is a really important user-facing search type. For these examples, let's assume we have these two rows:

Row1: variableSamples=sample1,sample2,sample30
Row2: variableSamples=sample2,sample3,sample10

Example queries are:

On the client, the user needs to search "find sites where sample1 has a variant". This should return row 1, but not row 2. We cannot do a naive string match, since the string from row 2 has "sample10", which contains "sample1". We need integration testing on this behavior.
Similar to above, we need an operator for "not variable", and it needs to respect the same whole-ID matching behavior (not a naive contains).
Need a way to supply a list of sample names. An operator should be "variable in all of"
Need a way to supply a list of sample names. An operator should be "variable in any of"
Need a way to supply a list of sample names. An operator should be "not variable in any of"
Need a way to supply a list of sample names. An operator should be "not variable in one of" (not sure if we really need this)
All of those need integration test cases
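For the integration tests, a small in-memory reference implementation of these operators could serve as an oracle to compare server results against. A minimal sketch, assuming variableSamples arrives as a comma-separated string; the operator keys are hypothetical, and "not variable in one of" is read here as "at least one listed sample is not variable", which may not be the intended meaning.

```javascript
// Hypothetical oracle for the sample operators: whole-ID matching via a Set,
// never substring matching.

function parseSamples(variableSamples) {
  return new Set(variableSamples.split(',').map((s) => s.trim()).filter(Boolean));
}

const SAMPLE_OPERATORS = {
  variableInAllOf: (set, ids) => ids.every((id) => set.has(id)),
  variableInAnyOf: (set, ids) => ids.some((id) => set.has(id)),
  notVariableInAnyOf: (set, ids) => !ids.some((id) => set.has(id)),
  notVariableInOneOf: (set, ids) => ids.some((id) => !set.has(id)),
};

function matchesSampleFilter(variableSamples, operator, ids) {
  const op = SAMPLE_OPERATORS[operator];
  if (!op) {
    throw new Error(`Unsupported operator: ${operator}`);
  }
  return op(parseSamples(variableSamples), ids);
}
```

With the two example rows, "variable in any of [sample1]" matches Row1 only, even though the raw Row2 string contains the substring "sample1".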

The client-side code needs a special field type for sample.

That field type has these unique operators
needs to construct the right kind of lucene string
needs validation over the user input into the field
for the time being, the sample field can be free-text entry. Eventually the server could supply/validate sample names but let's punt for now.
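The client-side field's two jobs, validating the free-text input and building the query string, might be sketched as follows. This assumes variableSamples is indexed with one untokenized (keyword) term per sample, so a quoted term matches a whole ID only; the operator keys and the classic Lucene syntax (`+` required, `-` prohibited, `*:*` as the base for purely negative queries) are assumptions about the server's parser.

```javascript
// Hypothetical sample field: free text -> sample IDs -> Lucene query string.

function parseSampleInput(text) {
  const ids = text.split(/[\s,;]+/).filter(Boolean);
  if (ids.length === 0) {
    throw new Error('Enter at least one sample name');
  }
  return [...new Set(ids)]; // drop duplicates, keep first-seen order
}

// Quote each ID (escaping " and \) so characters like - or : are not parsed as syntax
const term = (id) => `variableSamples:"${id.replace(/(["\\])/g, '\\$1')}"`;

function sampleFilterToLucene(operator, text) {
  const terms = parseSampleInput(text).map(term);
  switch (operator) {
    case 'variableInAllOf':
      return terms.map((t) => `+${t}`).join(' ');
    case 'variableInAnyOf':
      return `(${terms.join(' OR ')})`;
    case 'notVariableInAnyOf':
      // A purely negative query needs *:* so there is something to subtract from
      return `*:* ${terms.map((t) => `-${t}`).join(' ')}`;
    case 'notVariableInOneOf':
      // Excludes only sites where every listed sample is variable
      return `*:* -(${terms.map((t) => `+${t}`).join(' ')})`;
    default:
      throw new Error(`Unsupported operator: ${operator}`);
  }
}
```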

Reach goal:

See extension related to groups of samples.

ExtendedVariant

ExtendedVariantAdapter has very specific configuration; however, we implement this using an untyped JSON object that we tack onto the metadata slot of the track's config. I don't know if this is new, but I just read these docs:

https://jbrowse.org/jb2/docs/developer_guide/#adding-custom-props-to-the-renderer

I think this is a blueprint for extending the model of ExtendedVariantAdapter to support a defined top-level 'extendedVariantDisplayConfig' object, which would be better practice than tacking our config onto metadata. This would also let us document/define all the config options we support in this model, and we could probably use the model to set default values where appropriate. All in all, it's the way we should be implementing custom config.

Rename VariantWidget to Extended

Standalone JB2 Search Box

In JBrowse 1 and 2, there is a search box near the top of the page. See here:

https://mgap.ohsu.edu/jbrowse/mGAP/browser.view?database=84AF90F4-AE4F-1039-A61D-776ADE0881CD&loc=1%3A116994850..116995646&tracks=reference_sequence%2Cdata-117%2Ctrack-6&highlight=

When the user types in this box, at least in JB1, it auto-completes gene names. JB2 doesn't have this capability yet, but it will in September.

The goal of this feature is to let us render just that search box on a page. The search box should be provided with a sessionId, just like the browser widget. The search box should query the server to get the session JSON. It should initialize whatever JB2 machinery is required in order to make typeahead work right.

This is our JB1 implementation:

When the user enters something into the box and submits (or hits enter), it navigates the page to the full browser (i.e. jbrowse-jbrowse.view), loading that coordinate.
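The submit step only needs to turn the session ID and the chosen location into a URL for the full browser page. A minimal sketch: the query parameter names 'session' and 'location' are placeholders (use whatever jbrowse-jbrowse.view actually expects), and the base URL is assumed to end with a slash so the relative page name resolves inside the same folder.

```javascript
// Hypothetical submit handler helper for the standalone search box.

function buildBrowserUrl(baseUrl, sessionId, location) {
  // baseUrl should end with '/', otherwise URL resolution replaces its last segment
  const url = new URL('jbrowse-jbrowse.view', baseUrl);
  url.searchParams.set('session', sessionId); // placeholder parameter name
  url.searchParams.set('location', location.trim()); // placeholder parameter name
  return url.toString();
}
```

In the page, the handler would then call something like `window.location.assign(buildBrowserUrl(...))`.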

Eval -> Jexl?

In general, it would be better not to use eval() to evaluate our filters. There are two possible changes:

Use jexl. I think we can probably import this:

https://github.com/GMOD/jbrowse-components/blob/89d45c42531cca9f7b0e7b95622f48ee31d0ba1f/packages/core/util/jexlStrings.ts

and possibly add new functions over time. We could then build a jexl string, which is effectively what we are already creating for eval() right now. We could then do something like this; note that we pass 'feature' as context to evaluate it:

jexl.eval(myFilterString, feature)

Alternatively, we could make this a pure JavaScript solution. I'm not sure which is better. It could look vaguely like this:




// Hypothetical filter definitions, keyed by filter name, for illustration only
const filterMap = {
	AF: { fieldName: 'AF', allowMissingValues: false },
};

function isDisplayed(feature, filterName, operator, value) {
	// Get definition based on key, like 'AF'
	const filterObj = filterMap[filterName];
	if (!filterObj) {
		console.error('Unknown filter: ' + filterName);
		return false;
	}

	// One scheme would be to make some assumptions about what we query.
	// For example, if the field is always under INFO, we could do this, where we assume filterObj.fieldName is AF:
	const sourceVal = feature.INFO[filterObj.fieldName];

	// While it might appear hacky, a JavaScript function would allow more complex logic to handle fringe cases.
	// For example:
	if (filterObj.allowMissingValues && sourceVal === undefined) {
		return true;
	}

	// There might be a better way to translate the operator string into a comparison, but this is one possibility:
	switch (operator) {
		case 'eq':
			return sourceVal === value;
		case 'gt':
			return sourceVal > value;
		case 'gte':
			return sourceVal >= value;
		case 'lt':
			return sourceVal < value;
		case 'lte':
			return sourceVal <= value;
		default:
			console.error('Unknown operator: ' + operator);
			return false;
	}
}
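The isDisplayed sketch above compares sourceVal as a single value, so a multi-valued field such as AF=0.01,0.05 would not be handled. A minimal sketch of one option, assuming "keep the site if any value passes" semantics (an assumption that matches the AF example in the Lucene issue): apply the comparator to every value.

```javascript
// Hypothetical helper: compare every value of a possibly multi-valued field.

const COMPARATORS = {
  eq: (a, b) => a === b,
  gt: (a, b) => a > b,
  gte: (a, b) => a >= b,
  lt: (a, b) => a < b,
  lte: (a, b) => a <= b,
};

function anyValueMatches(sourceVal, operator, value) {
  const compare = COMPARATORS[operator];
  if (!compare) {
    throw new Error(`Unknown operator: ${operator}`);
  }
  const values = Array.isArray(sourceVal) ? sourceVal : [sourceVal];
  // Missing values never match here; allowMissingValues is handled by the caller
  return values.some((v) => v !== undefined && compare(v, value));
}
```

A comparator map like this could also replace the switch statement, which keeps the list of supported operators in one place.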
