legumeinfo / gcv

Federating genomes with love (and synteny derived from functional annotations)

License: Apache License 2.0

CSS 0.93% HTML 5.54% JavaScript 3.39% TypeScript 87.13% Dockerfile 0.23% SCSS 2.77%

docker angular bioinformatics synteny functional-annotations federation comparative-genomics pangenomics microservices grpc-web


gcv's Issues

GCV: Search scroll default value

It appears that some of our users did not understand that the scroll control needed to be filled in with a number. The control has been pre-filled with what seems to be the most sensible default. Is there a better solution, such as an alert telling them that a step must be specified if the buttons are clicked with none filled in?

[LEGUME-681] created by alancleary

Left Slider Covered by Loading Overlay

When the loading overlay appears over any of the visualizations, especially the micro and macro-synteny viewer containers, it covers the left slider as well, if the slider is open. This prevents the user from being able to interact with the contents of the slider, which is not intended. Furthermore, if the macro and micro-synteny viewers are both visible, since they each get their own overlay, there is a small gap between the overlays where the split-pane divider lives. This causes an incongruity in the overlays' coverage of the left slider, i.e., it's obvious this is a UI bug.

intergenic segments being (mis)labeled as negative distances

the only case I can think of where it seems valid to call an intergenic distance negative is in the
case of overlapping gene models.

this example has at least a few tracks that exhibit the behavior:
http://legumeinfo.org/lis_context_viewer/index.html#/search/lis/glyma.Glyma.06G017900?regexp=&order=chromosome&neighbors=5&sources=lis&matched=2&intermediate=2&algorithm=repeat&match=3&mismatch=-1&gap=-1&score=25&threshold=10

thought it might be a case of a track aligned in reverse-orientation, but not clear that this is the case.

[LEGUME-680] created by adf_ncgr

search tracks service needs reformulation for distributed search mode

The current implementation utilizes the "standard form" of the search tracks service, whose primary parameter is the focus gene. This basically assumes that any implementing instance of the service will be able to walk out from the focus gene to get the families to be matched by candidate result tracks. That's fine if they happen to have the genome from which the focus gene was derived, but fails in the more likely case for the distributed context which is that they don't have it. Seems like we're going to have to change the interface for this call so that it is the set of gene family identifiers from the query track that is passed to the track retrieval providers.
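A rough sketch of what the reformulated call could carry, assuming the client already knows the family of each gene in the query track. The interface names, field names, and the `matched`/`intermediate` parameters here are hypothetical and only illustrate passing families instead of the focus gene; the real service schema may look quite different.

```typescript
// Hypothetical shapes; not taken from the actual GCV service schema.
interface QueryGene {
  id: string;
  family: string; // empty string when the gene has no family assignment
}

interface FamilySearchRequest {
  families: string[]; // distinct gene family identifiers from the query track
  matched: number; // assumed parameter: minimum number of matched families
  intermediate: number; // assumed parameter: maximum number of intermediate genes
}

// Build a request that carries gene families rather than the focus gene name,
// so a provider can answer without holding the query genome.
function buildFamilySearchRequest(
  queryGenes: QueryGene[],
  matched: number,
  intermediate: number
): FamilySearchRequest {
  const families = queryGenes
    .map((g) => g.family)
    // Drop genes without a family and keep the first occurrence of each family.
    .filter((f, i, all) => f !== "" && all.indexOf(f) === i);
  return { families, matched, intermediate };
}

const familyRequest = buildFamilySearchRequest(
  [
    { id: "geneA", family: "fam1" },
    { id: "geneB", family: "" },
    { id: "geneC", family: "fam2" },
    { id: "geneD", family: "fam1" },
  ],
  2,
  2
);
console.log(JSON.stringify(familyRequest));
```

With a payload like this, a provider only needs to look up tracks containing the listed families, whether or not it has the query genome.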

As noted in email exchange, this is an excellent opportunity to refactor the django code so that we can have a standalone track retrieval service without bringing along much code that has served us well but is now pretty well obsolete.

[LEGUME-530] created by adf_ncgr

Micro-Synteny Track Genus and Species

In commits d0cd204 and b3964ef, the "species_name" attribute of the micro-synteny tracks being returned as json by the server for the basic and search views was replaced with the "genus" and "species" attributes. It appears as though the changes in the first commit (the server code) have been reverted while the changes in the second commit (the client code) remain intact, causing the genus and species name to be broken in the client, specifically in the left slider that appears when a micro-synteny track is clicked.

more "engineering" for the repeat algorithm post-processing

discussed a bit outside of JIRA, but can't seem to find the email to copy-paste. Maybe it was a verbal communication. In any case, this is to address the issue observed in some complex contexts like the one below where some segments aligned elsewhere in the track group are also presented as a huddled mass yearning to be trimmed from another track (synchronized highlighting with the genes in the dotplot proves they are duplicated in the alignment display)

[LEGUME-455] created by adf_ncgr

Add Babel to Auto-Build

Changes to files in the client are automatically detected, causing the build command to be rerun, which does things such as compile TypeScript files into JavaScript. Babel is currently not part of the build command, so although changes to the JavaScript files in the assets directory are detected, the files are not automatically transpiled and minified. This means the application is reloaded after changes occur, but the changes don't actually affect the application until the developer manually transpiles with Babel. Add Babel to the build command so this manual workflow is automated.

GCV: Create generic visualization component

The visualization components have a lot of redundant code. It would be good to encapsulate the redundancies in a generic component that each visualization can then extend.
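One possible shape for such a generic component, sketched as an abstract base class. All names here (`Visualizer`, `LegendViewer`, the `highlight` option) are illustrative assumptions, and the "drawing" is reduced to producing strings so the sketch runs outside a browser.

```typescript
// Hypothetical base class; names are illustrative, not taken from the GCV code.
abstract class Visualizer<TData, TOptions extends object> {
  protected readonly options: TOptions;

  constructor(
    protected readonly data: TData,
    defaults: TOptions,
    overrides: Partial<TOptions> = {}
  ) {
    // Shared option handling lives here instead of in every viewer.
    this.options = { ...defaults, ...overrides };
  }

  // Shared lifecycle: start from an empty output, then let the subclass render.
  draw(): string[] {
    const output: string[] = [];
    this.render(output);
    return output;
  }

  // Each concrete viewer only supplies its own rendering step.
  protected abstract render(output: string[]): void;
}

interface LegendOptions {
  highlight: string; // family to emphasize; empty string for none
}

class LegendViewer extends Visualizer<string[], LegendOptions> {
  constructor(families: string[], overrides: Partial<LegendOptions> = {}) {
    super(families, { highlight: "" }, overrides);
  }

  protected render(output: string[]): void {
    for (const family of this.data) {
      output.push(family === this.options.highlight ? `[${family}]` : family);
    }
  }
}

const legendLines = new LegendViewer(["fam1", "fam2"], { highlight: "fam2" }).draw();
console.log(legendLines.join(", "));
```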

[LEGUME-685] created by alancleary

GCV: Highlight the gene family in the legend

In the GCV, the triangle that represents the gene of interest has a thick yellow line around it.
Add some sort of yellow line/box/highlight around the associated gene family color in the legend so that users can quickly see in the legend which gene family the gene of interest belongs to.

Andrew's thoughts from 12/12 email:
with the current procedure for constructing the legend, the gene family to which the focus gene is assigned (if any) may be initially out of view. Probably the easiest thing to do for this would be to auto-scroll the legend to the highlighted family. Alternatively, there may be some advantages for changing the algorithm for creating family-color relationships so that the focus-family is processed first (and presumably then also would appear first in the legend). I think the current algorithm basically starts at the leftmost gene in the query track and moves right-ward, then downward (as if reading the contexts as a "text"). I was noticing recently that this has the effect of widely separating (in the legend) gene families whose genes are present together in "columns" of the context alignment (these are often probably families that are related but got split up for some reason). I will often "scan" the context view by mousing over the families in the legend (inducing the "highlight" of the genes assigned to that family) and then just moving the mouse down the family list. This works well, but I think the effect might be somewhat more natural if we tried to process the families in this "column-wise" way. Starting with the focus family as the first column might be a little odd since it would introduce a little discontinuity. But on the other hand, it would mean that the color assignments for a specific focus would not be altered when the cache was cleared, even if the neighborhood chosen was bigger (this is sometimes an issue for me when making slides and I'm not careful to avoid re-initializing the color assignments).
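The "column-wise, focus family first" ordering described above might be sketched as follows. This assumes the aligned contexts are available as rows of family identifiers with one entry per alignment column and an empty string for gaps or genes without a family; that representation and the function name are assumptions made for illustration.

```typescript
// rows[r][c] is the family of the gene in track r at alignment column c,
// or "" for a gap or a gene without a family (assumed representation).
function legendOrder(rows: string[][], focusFamily: string): string[] {
  // The focus family is processed first so its color does not depend on
  // how many neighbors were requested.
  const ordered: string[] = focusFamily === "" ? [] : [focusFamily];
  const width = rows.reduce((w, row) => Math.max(w, row.length), 0);
  // Walk column by column, then down the tracks, instead of row by row.
  for (let col = 0; col < width; col++) {
    for (const row of rows) {
      const family = row[col] || "";
      if (family !== "" && ordered.indexOf(family) === -1) {
        ordered.push(family);
      }
    }
  }
  return ordered;
}

const columnWise = legendOrder(
  [
    ["a", "b", "c"],
    ["d", "b", "e"],
  ],
  "b"
);
console.log(columnWise.join(" "));
```

Families that share a column (here `a` and `d`, then `c` and `e`) end up adjacent in the legend, which is the effect the scan-by-mouseover workflow would benefit from.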

[LEGUME-666] created by jdjax

"Search for similar contexts" in Basic View

If you click on a micro-synteny track in the Basic View and click the link "Search for similar contexts", the application will throw a fatal error. In Firefox, "TypeError: can't access dead object" (tab.js). In Chrome, "TypeError: Cannot read property 'setSizes' of undefined" (_onAlignedMicroTracks in search.component.ts).

a tweak to make federated macrosynteny easier?

I'd like to propose the following change:
diff --git a/client/app/selectors/macro-tracks.selector.ts b/client/app/selectors/macro-tracks.selector.ts
index f4bff02..663c9bc 100644
--- a/client/app/selectors/macro-tracks.selector.ts
+++ b/client/app/selectors/macro-tracks.selector.ts
@@ -6,7 +6,7 @@ export const macroTracksSelector = (filter, order) => {
     if (macroTracks !== undefined && filteredMicroTracks.groups.length > 0) {
       let query = filteredMicroTracks.groups[0];
       let chrs = filteredMicroTracks.groups.reduce((l, g, i) => {
-        if (i > 0 && g.source == query.source) l.push(g.chromosome_name);
+        if (i > 0) l.push(g.chromosome_name);
         return l;
       }, []);
       let macro = Object.assign({}, macroTracks);

not requiring "same sourceness" appears to be just what I needed to allow macrosynteny blocks defined on one source with respect to a chromosome whose actual gene content is found in another source to play nicely with respect to the macro-micro coordinated views. See any downsides to this (assuming we can trust "names" to be identifying cross-source)?
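To make the effect of the change concrete, here is a standalone sketch of the same reduce with the source check behind a flag. The `TrackGroup` interface and the `sameSourceOnly` parameter are assumptions for illustration; the selector itself has no such flag, and the sample chromosome names are made up.

```typescript
// Minimal assumed shape of a micro-track group; the real objects carry more fields.
interface TrackGroup {
  source: string;
  chromosome_name: string;
}

// Collect the chromosome names of all result tracks (every group after the
// query at index 0). With sameSourceOnly = true this mirrors the old behavior.
function resultChromosomes(groups: TrackGroup[], sameSourceOnly: boolean): string[] {
  const query = groups[0];
  if (query === undefined) return [];
  return groups.reduce<string[]>((names, g, i) => {
    if (i > 0 && (!sameSourceOnly || g.source === query.source)) {
      names.push(g.chromosome_name);
    }
    return names;
  }, []);
}

const sampleGroups: TrackGroup[] = [
  { source: "lis", chromosome_name: "chrQ" },
  { source: "lis", chromosome_name: "chrA" },
  { source: "other", chromosome_name: "chrB" },
];
console.log(resultChromosomes(sampleGroups, true)); // same-source results only
console.log(resultChromosomes(sampleGroups, false)); // results from every source
```

The relaxed version only behaves well if chromosome names really do identify the same chromosome across sources, which is the assumption raised in the question above.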

GCV: Include white genes in legend

It is not at all apparent why some micro-synteny gene glyphs are white and some are white with dashed outlines. Include these in the legend. Moving the highlighted gene family to the top may help as well.

[LEGUME-697] created by alancleary

search tracks service needs reformulation for distributed search mode

[LEGUME-531] created by adf_ncgr

context viewer: threshold for repeat algorithm probably ought to be separated out

while playing around with some alignments that I was trying to finesse the parameters to align things as I thought they "ought" to show up (in a slide), I realized that threshold is playing a bit of a dual role now, both serving as the minimal total alignment score for display of an alignment group (single optimal alignment in case of SW or sum of subalignments in case of repeat) as well as its original (IIRC) usage as the minimal subalignment score for the repeat algorithm to consider including it in the alignment group.

This means, for example, that if I want to be able to include an inverted segment of 3 genes in one track, then I have to allow other tracks that are basically nothing more than 3 matched genes to be included.
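A minimal sketch of what separating the two roles could look like, assuming each track's alignment group is available as a list of scored subalignments. The names `subThreshold` and `groupThreshold` are hypothetical, as are the scores in the example.

```typescript
interface SubAlignment {
  score: number;
}

// subThreshold: minimum score for one subalignment to join the group.
// groupThreshold: minimum total score for the group to be displayed at all.
function selectAlignmentGroup(
  subs: SubAlignment[],
  subThreshold: number,
  groupThreshold: number
): SubAlignment[] {
  const kept = subs.filter((s) => s.score >= subThreshold);
  const total = kept.reduce((sum, s) => sum + s.score, 0);
  return total >= groupThreshold ? kept : [];
}

// Track A: a long alignment plus a short inverted segment (3 genes at match = 3).
const trackA = [{ score: 30 }, { score: 9 }];
// Track B: nothing more than 3 matched genes.
const trackB = [{ score: 9 }];

console.log(selectAlignmentGroup(trackA, 9, 25).length); // both segments kept
console.log(selectAlignmentGroup(trackB, 9, 25).length); // track filtered out
console.log(selectAlignmentGroup(trackB, 9, 9).length); // one shared threshold lets it through
```

With two values, the short inverted segment in track A can be admitted without also admitting track B.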

As a bonus, we might try to use this as an opportunity to document how the user might want to use the awesome powers given them by all these parameter choices.

[LEGUME-453] created by adf_ncgr

GCV: Basic orientations

In the Basic view, the non-uniform track orientations are disorienting users... Orient all the tracks such that the focus genes all have the same orientation.
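One way to sketch the orientation step: flip any track whose focus gene is on the minus strand, reversing the gene order and negating each gene's strand in the process. The `StrandedGene` and `OrientedTrack` shapes are assumptions, not the GCV models.

```typescript
interface StrandedGene {
  id: string;
  strand: 1 | -1;
}

interface OrientedTrack {
  genes: StrandedGene[];
  focus: string; // id of the focus gene
}

// Reverse the gene order and flip every gene's strand so the glyphs still
// point the right way after the track is mirrored.
function flipTrack(track: OrientedTrack): OrientedTrack {
  const genes = track.genes
    .slice()
    .reverse()
    .map((g): StrandedGene => ({ id: g.id, strand: g.strand === 1 ? -1 : 1 }));
  return { genes, focus: track.focus };
}

function focusStrand(track: OrientedTrack): 1 | -1 {
  for (const g of track.genes) {
    if (g.id === track.focus) return g.strand;
  }
  return 1; // no focus gene found: leave the track as it is
}

// Orient every track so that its focus gene is on the plus strand.
function orientTracks(tracks: OrientedTrack[]): OrientedTrack[] {
  return tracks.map((t) => (focusStrand(t) === -1 ? flipTrack(t) : t));
}

const minusTrack: OrientedTrack = {
  genes: [
    { id: "a", strand: 1 },
    { id: "b", strand: -1 },
    { id: "c", strand: 1 },
  ],
  focus: "b",
};
const oriented = orientTracks([minusTrack])[0];
console.log(oriented.genes.map((g) => g.id + (g.strand === 1 ? "+" : "-")).join(" "));
```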

[LEGUME-691] created by alancleary

context viewer: don't realign when not needed (sorting/filtering param changes)

a bit fussy, but I found myself wishing for the refinement when playing around with trying to get some track displays "just so" for a slide. having to wait for a largish realignment when I was just resorting was a drag. Filtering is probably also not needing realignment (at least, if we effect the separation of the repeat algorithm subsegment threshold from the global threshold filter that I'm thinking of in this context)
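A minimal sketch of skipping the realignment when only sorting or filtering changes, assuming the view parameters match the names in the search URLs (`algorithm`, `match`, `mismatch`, `gap`, `score`, `threshold`, `order`, `regexp`). The helper names are hypothetical. `threshold` is treated as an alignment parameter here because it currently influences the repeat algorithm.

```typescript
interface ViewParams {
  algorithm: string;
  match: number;
  mismatch: number;
  gap: number;
  score: number;
  threshold: number;
  order: string; // sorting only
  regexp: string; // filtering only
}

// Only the parameters that influence the alignment take part in the key.
function alignmentKey(p: ViewParams): string {
  return [p.algorithm, p.match, p.mismatch, p.gap, p.score, p.threshold].join("|");
}

// Wrap an expensive align function so it reruns only when the key changes.
function makeCachedAligner<T>(align: (p: ViewParams) => T): (p: ViewParams) => T {
  let cached: { key: string; result: T } | null = null;
  return (p) => {
    const key = alignmentKey(p);
    if (cached === null || cached.key !== key) {
      cached = { key, result: align(p) };
    }
    return cached.result;
  };
}

let alignCalls = 0;
const cachedAlign = makeCachedAligner((p) => {
  alignCalls += 1;
  return p.algorithm + ":" + p.threshold;
});

const baseParams: ViewParams = {
  algorithm: "repeat", match: 3, mismatch: -1, gap: -1,
  score: 25, threshold: 10, order: "chromosome", regexp: "",
};
cachedAlign(baseParams);
cachedAlign({ ...baseParams, order: "distance" }); // resort: no realignment
cachedAlign({ ...baseParams, threshold: 20 }); // alignment parameter changed
console.log(alignCalls);
```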

[LEGUME-457] created by adf_ncgr

Genome Context Viewer: Minify urls

GCV urls can get a bit unwieldy. There should be a button that minimizes these URLs for easy sharing.

[LEGUME-641] created by alancleary

Implement chromosomal diagonal synteny viewer

Should have functionality similar to the synteny dot plots.

[LEGUME-602] created by alancleary

context viewer: repeat algorithm considered dangerous

there seem to be some parameterizations of the repeat algorithm that can cause it to get a little overly intensive of browser resources (sometimes leading to crash). For example, I think if the
threshold is set to lower than the match value, it may be considering many suboptimal alternatives or something like that. I put a very minimal (ie braindead) guard against thresholds
that aren't at least positive integers, but I think it probably needs to be considered more carefully to give us a good idea for how the various parameters interact before deciding how to put a more sensible safeguard into place.
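As a starting point for discussion, a guard along these lines could report suspicious parameter combinations before the alignment runs. The check that `threshold` should not be lower than `match` is only the hunch from the report above, not a verified property of the repeat algorithm, and the names are hypothetical.

```typescript
interface RepeatParams {
  match: number;
  mismatch: number;
  gap: number;
  threshold: number;
}

// Returns human-readable problems; an empty array means nothing suspicious was found.
function checkRepeatParams(p: RepeatParams): string[] {
  const problems: string[] = [];
  if (Math.floor(p.threshold) !== p.threshold || p.threshold <= 0) {
    problems.push("threshold must be a positive integer");
  }
  // Assumption taken from the report, not a verified property of the algorithm.
  if (p.threshold < p.match) {
    problems.push("threshold is lower than match; the alignment may be very expensive");
  }
  return problems;
}

console.log(checkRepeatParams({ match: 3, mismatch: -1, gap: -1, threshold: 10 }));
console.log(checkRepeatParams({ match: 10, mismatch: -1, gap: -1, threshold: 5 }));
```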

[LEGUME-452] created by adf_ncgr

GCV: Contextmenu

Requiring that users right-click to reveal the viewer contextmenu effectively makes it a hidden feature. Let's put this content front and center by making it a discreet but apparent menu in every viewer's container.

[LEGUME-694] created by alancleary

add ability to search for gene(s) in current view

in addition to the thing we've discussed for just having an entry box that will initiate a new search with a given gene, sometimes it is useful to just want to ask to have a given gene (or possibly a set of genes) that one expects to be in the current view to be highlighted (e.g. as if the user had moused-over it, but without requiring them to do the needle in haystack shtick).
This could be a more exciting functionality if genes could have more metadata associated with them and it would be made searchable in this fashion- e.g. supposing a provider could add descriptors or gene ontology terms.

let's leave this one as an idea for the future (ie after our fling with pangenomics or else when you need a software engineering task to calm your nerves).

context viewer: hasten track retrieval

we're starting to push the limits of user attention spans with our nicely widened contexts in the refactored version. first step is probably to revisit the django code and see if any quick
obvious fixes could be made; alternatively, maybe some parallelization along the lines of separate requests per species (which would fit in nicely to the plan for parallelizing to multiple data sources!). longer term, we may want to consider whether any of the algorithmic explorations we (sensu royal) made this summer of suffix trees or related data structures could be brought to bear on this problem (in some superficial ways, this is also like the "seed and extend" of BLAST before gapping is brought into the picture).

[LEGUME-456] created by adf_ncgr

GCV: Help

The Genome Context Viewer Help needs help... Remove the blue help boxes and make the Help button toggle Help mode. When in help mode, whatever element the user clicks on will pull them into that element's step in the Bootstrap tour. Such "hooks" into the tour should cause the next button to be suppressed.

[LEGUME-690] created by alancleary

gene-search component error handling

was just looking into this a bit; seems like the discussion here:
https://angular.io/guide/router#resolve-pre-fetching-component-data

is relevant to our situation. maybe- it's entirely possible I have no idea what they are talking about!

but we seem not to have a previous issue for the error handling problem that is preventing us from joyfully merging the work you did on the gene-search branch. so here it is...

improve load times

like we tried to do once before- only better!

GCV: Sticky alerts

It would be nice to have some (sticky) alerts return after other, more transient alerts have been shown. For example, it would be nice to have the context search result statistics reappear after an alert saying that the gene links service failed.

[LEGUME-682] created by alancleary

when basic view tracks are flipped for inclusion in MSA, individual gene strandedness should be flipped

not completely sure, but I think this is probably why one of these tracks is not quite like the others

Extend HMM topology to handle structural variations of interest

In the search view, the Repeat algorithm is used to capture structural variations, such as inversions. It would be nice to capture such structural variations in the basic MSAs as well. This will require modifying the HMM, likely its topology.

allow control over "highlighting" of genes in context view

just tried the experiment of allowing genes in the trees to link into the context search, and
it is working pretty well; the one drawback is that it's not that easy to pick out the gene
that triggered the search from the resultant display unless you know you want the gene at the
center of the top track. Would be nice to be able to call out the referring gene in some way.
For now, just thinking that a bold border similar to what is done for the focus genes in the classic context view would be reasonable, but would be open to other ideas.

[LEGUME-378] created by adf_ncgr

highlight focus gene of search similarly to basic search case

LEGUME-378 : and as requested by David Grant

Robust tooltips

Currently, tooltips are tightly coupled with the D3 visualizations. This makes them hard to extend and prevents their use outside of the visualizations. Encapsulate tooltips in their own module independent of the visualization code. Additionally, make the tips robust so they can handle arbitrary content, such as meta data.

load synteny block data from gff files into chado for consumption by new context viewer feature

[LEGUME-598] created by adf_ncgr

GCV: avoid "crossed swords" labeling effect

maybe rotating clockwise from noon for the labels that run upwards would solve this:

[LEGUME-678] created by adf_ncgr

Add general meta data support

We updated the v1 schema to indicate that certain AJAX payloads sent by the server may contain meta data, though none of the service implementations actually provide meta data at the moment. Update some services to provide relevant meta data, for example, add ks values to the macro-synteny service. Also, update the UI to nicely handle meta data, for example, update the context-menu filter to robustly filter on meta data.

add "outliers" for the query track?

currently, the outliers are displayed for the result track, but not for the query track. seems like there is no reason in principle for this asymmetry. Those genes present in the query track but not in the result track could be displayed along the right hand-vertical edge of the plot area.
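The symmetric computation could be sketched like this, assuming an outlier is a gene whose family does not occur in the other track and that genes without a family are always outliers. The `PlotGene` shape and function name are illustrative only.

```typescript
interface PlotGene {
  id: string;
  family: string; // empty string when the gene has no family
}

// A gene is an outlier when no gene in the other track shares its family.
function splitOutliers(query: PlotGene[], result: PlotGene[]) {
  const hasFamily = (genes: PlotGene[], family: string): boolean =>
    family !== "" && genes.some((g) => g.family === family);
  return {
    // Currently drawn along the top of the plot.
    resultOutliers: result.filter((g) => !hasFamily(query, g.family)),
    // Proposed: drawn along the right-hand vertical edge.
    queryOutliers: query.filter((g) => !hasFamily(result, g.family)),
  };
}

const outliers = splitOutliers(
  [
    { id: "q1", family: "f1" },
    { id: "q2", family: "f2" },
    { id: "q3", family: "" },
  ],
  [
    { id: "r1", family: "f1" },
    { id: "r2", family: "f3" },
  ]
);
console.log(outliers.queryOutliers.map((g) => g.id), outliers.resultOutliers.map((g) => g.id));
```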

GCV: Basic phylogram

This is an experimental feature.

Landing on the Basic view of the Genome Context Viewer from an LIS phylogram can be disorienting if you've never encountered the Genome Context Viewer. To make it more apparent what is happening, a phylogram should be placed next to the micro-synteny viewer such that the leaves (genes) align with their corresponding tracks.

[LEGUME-692] created by alancleary

GCV: Gene entry

There should be an input where users can specify a gene to use as the focus.

[LEGUME-689] created by alancleary

outliers display seems to only function on initial vis. of local dotplot

If you have a local plot with outliers, and then "zoom" using the brush, the outliers disappear, even when it seems they should be in view. In fact, if you then clear brush, they do not return to the display until you refresh by clicking on the local tab of the dotplot widget. seems like a bug.

large scale synteny track data integration with context viewer microsynteny display

integrate the large scale synteny info displayed in the gbrowse tracks into the dotplot global views (to give a sense for how far the local regions of synteny
displayed in the context tracks may be extended).

I'd like to get these synteny blocks loaded as features into our chado, which would provide some benefits in being able to more easily share them among other relevant chado consumers, e.g. intermine.

there are also some questions I have on how the tracks are currently defined. For example, when inversions are present, they seem to appear as an overlapping block in the other orientation, which basically makes sense, except that it seems the overlapped track should be "broken" at that point rather than simply overlapped (since it otherwise seems to imply a duplication of some sort).

[LEGUME-454] created by adf_ncgr

GCV: Visualization workers

The various visualizations are currently blocking the interpreter when being drawn, effectively freezing the UI until they are done. This can be remedied by using Web Workers when drawing.

[LEGUME-683] created by alancleary

500 error on context retrieval for some genes

e.g. http://localhost:8888/lis_gene_families/chado/context_viewer/search_tracks_service/vigra.Vradi08g07350/?numMatchedFamilies=6&numNeighbors=8&numNonFamily=5

error in DEBUG is :
TypeError at /chado/context_viewer/search_tracks_service/vigra.Vradi08g07350/
reduce() of empty sequence with no initial value
Request Method: GET
Request URL: http://localhost:8888/lis_gene_families/chado/context_viewer/search_tracks_service/vigra.Vradi08g07350/?numMatchedFamilies=6&numNeighbors=8&numNonFamily=5
Django Version: 1.8.4
Exception Type: TypeError
Exception Value:
reduce() of empty sequence with no initial value
Exception Location: /usr/local/www/lis_gene_families/django/chadotest/chado/views.py in context_viewer_search_tracks_service, line 1389
Python Executable:
Python Version: 2.7.10
Python Path:
['/usr/local/www/lis_gene_families/django/chadotest',
'/usr/local/lib/python27.zip',
'/usr/local/lib/python2.7',
'/usr/local/lib/python2.7/plat-freebsd9',
'/usr/local/lib/python2.7/lib-tk',
'/usr/local/lib/python2.7/lib-old',
'/usr/local/lib/python2.7/lib-dynload',
'/usr/local/lib/python2.7/site-packages']
Server time: Sun, 20 Dec 2015 16:06:04 -0600
Traceback:

/usr/local/lib/python2.7/site-packages/django/core/handlers/base.py in get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
/usr/local/lib/python2.7/site-packages/django/views/decorators/csrf.py in wrapped_view
    return view_func(*args, **kwargs)
/usr/local/www/lis_gene_families/django/chadotest/chado/views.py in context_viewer_search_tracks_service
    gene_pool = list(GeneOrder.objects.filter(reduce(operator.or_, gene_queries)))
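The last frame suggests `gene_queries` was empty, and Python's `reduce` raises a TypeError on an empty sequence when no initial value is given. The service code is Python, so the following is only a TypeScript analogue of the failure and of one possible guard (JavaScript's `Array.prototype.reduce` fails the same way). `GeneFilter` and `anyOf` are made-up names that stand in for the OR-combined queries.

```typescript
// Stand-in for one query condition; in the Django code these are query objects.
type GeneFilter = (geneId: string) => boolean;

// OR-combine filters. Without the guard, reduce() on an empty array with no
// initial value throws a TypeError, much like the Python call in the traceback.
function anyOf(filters: GeneFilter[]): GeneFilter {
  if (filters.length === 0) {
    return () => false; // nothing to match: yield an empty result instead of failing
  }
  return filters.reduce((a, b) => (id: string) => a(id) || b(id));
}

const matchNothing = anyOf([]);
const matchEither = anyOf([(id) => id === "geneA", (id) => id === "geneB"]);
console.log(matchNothing("geneA"), matchEither("geneB"));
```

An equivalent guard on the server would return an empty track list when no gene queries were built, rather than letting the request end in a 500.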

[LEGUME-488] created by adf_ncgr

Reimplement context viewer dot plots using WebGL

Should have all the same functionality as the existing dot plots.

[LEGUME-601] created by alancleary

v1.0 context viewer assigning colors to singleton representatives of gene families?

This may be something we decide is not a bug but a feature, but I think the behavior has changed. Case in point:
/lis_context_viewer/index.html#/search/lis/phavu.Phvul.002G085200?algorithm=repeat&match=10&mismatch=-1&gap=-1&score=30&threshold=25&order=distance&sources=lis&regexp=&neighbors=15&matched=2&intermediate=2

gene family with single gene representative:
phytozome_10_2.59032189
phavu.Phvul.002G085100: 13274583 - 13278925

[LEGUME-676] created by adf_ncgr

gene families represented in multiple copies per track probably should not count independently towards region support

as with the approach we've taken for finding candidate alignment tracks in the search view, I think we ought to only count distinct gene family instances as contributing to a track's support for an FR. for example here is a group whose two members have very little in common and I'm guessing that the way each individually meets the FR criterion is by having multiple instances of genes that were added to the FR (note that these repeated families don't themselves appear to be shared between the two tracks in this case):
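The difference between counting every instance and counting distinct families can be sketched as below. The function names are hypothetical, and the FR is reduced to a plain list of family identifiers for illustration.

```typescript
// Counts every gene whose family is in the region, repeats included.
function instanceSupport(trackFamilies: string[], regionFamilies: string[]): number {
  return trackFamilies.filter((f) => regionFamilies.indexOf(f) !== -1).length;
}

// Counts each family at most once, however many copies the track has.
function distinctSupport(trackFamilies: string[], regionFamilies: string[]): number {
  const distinct = trackFamilies.filter((f, i, all) => f !== "" && all.indexOf(f) === i);
  return distinct.filter((f) => regionFamilies.indexOf(f) !== -1).length;
}

// A track with three copies of one family looks well supported when instances
// are counted, but shares only a single family with the region.
const repeatedTrack = ["f1", "f1", "f1", "f2"];
const regionFamilies = ["f1", "f3"];
console.log(instanceSupport(repeatedTrack, regionFamilies));
console.log(distinctSupport(repeatedTrack, regionFamilies));
```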

some issues with tracks from different species getting interleaved when they should be grouped together

One outstanding issue is the occasional incorrect display of merged tracks in Chrome (i.e. #/search/cicar.Ca_08611_gene?numNeighbors=20&numMatchedFamilies=6&numNonFamily=5&algorithm=repeat&match=5&mismatch=-1&gap=-1&threshold=17&order=distance)

[LEGUME-440] created by adf_ncgr

Legend Auto-Resizing

When the legend is first drawn, if it is tall enough to require the containing element to have a vertical scroll-bar then the scroll-bar covers part of the legend. The legend should automatically adjust for this scenario whether the autoResize flag has been used or not. When the autoResize flag has been used and the containing element's width changes, there is a delay between the resizing of the container and the legend, which is fine. What isn't fine is that this delay makes the legend look like its horizontal position is sliding around as the container is resized. This is because the contents of the legend are right justified. Fix this by either left justifying the content of the legend or floating the legend to the right of the container so this illusion does not occur.

does it make sense to have "outliers" in global dotplot?

The global dotplot retains the "Outliers" label above the graph, but does not seem to actually display any content there. It seems a little less useful in the global context, just because the "holes" that it highlights in a local segment are more likely to get "filled in" by a non-local member of the same gene family, although only if the non-local member is on the same chromosome, which seems a little arbitrary. I guess we should either:
a) suppress the Outliers label on the global plot if we decide it doesn't make sense
b) make it functional (I'm pretty sure it should at least retain the genes that are unassigned to
families when you go from local to global, since those by definition can't be "filled in")

probably some more thinking to do about the utility of outliers before making this decision

need to account for vertical offsets introduced by new UI elements (?)

I'm not 100% sure I've diagnosed this correctly, but I've noticed that mouseover on the top microsynteny track and other elements near the tops of their respective divs (macro and gene family legends, for example) seems to "not work" unless you are shifted down an amount that seems approximately equal to the height of the little menubar elements. I suspect this is easily fixed for someone who knows how all this fits together (yes, this means you @alancleary)

Dead Space Below Left Slider

In commit 4ce060c the contents of the fixed-position bottom navbar were moved to the new context menu that was introduced in an earlier commit, and thus, the bottom navbar was removed. A vestigial trait of the bottom navbar that remains is the space below the left slider that ensured the slider would not cover the bottom navbar when it is visible. Now that the bottom navbar is gone, this is apparently a UI bug.

Clustering Parameters

Add the parameters for the clustering (FR) algorithm to the "parameters" slider in the basic view.