tiagofilipe12 / patlas

Plasmid Atlas - A web interface to browse for plasmids and their associated genes. Visit us at:

Home Page: http://www.patlas.site

License: GNU General Public License v3.0

JavaScript 54.39% HTML 22.10% CSS 2.44% Python 21.07%

plasmids database mash visualization vivagraph flask

patlas's People

odiogosilva bfrgoncalves gitter-badger thanhleviet b-ummi nunoalexandrefaria

patlas's Issues

Duplicated links removal needs refactor

Currently, duplicated links are removed in the JS front end; however, this could be done more efficiently in the Python back end. While creating the JSON file with all entries, something like this gist could be done.
hash() could be used to improve script efficiency, but it may not be worth it given that the strings are small (needs testing).

Also, consider whether the JSON file should follow a structure closer to the database: {acc: {length: x, links: [a, b, c]}}. This would be nicer for JS to parse, but it would require more refactoring on the front-end side.
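
A minimal sketch of the two ideas above, written in JavaScript to match the other snippet on this page (the back-end version would be Python). The input shape (links as `[source, target]` pairs), the `lengths` lookup and the function names `dedupeLinks` and `toNested` are assumptions made for illustration, not code from the repository.

```javascript
// Hypothetical input: undirected links as [source, target] pairs,
// where [a, b] and [b, a] describe the same link.
function dedupeLinks(pairs) {
  const seen = new Set();
  const unique = [];
  for (const [a, b] of pairs) {
    // Order the two accessions so both directions map to one key.
    const key = a < b ? `${a}\t${b}` : `${b}\t${a}`;
    if (!seen.has(key)) {
      seen.add(key);
      unique.push([a, b]);
    }
  }
  return unique;
}

// Build {acc: {length: x, links: [a, b, c]}} from the unique pairs.
// `lengths` is an assumed lookup of accession -> sequence length.
function toNested(pairs, lengths) {
  const out = {};
  const entry = (acc) =>
    (out[acc] = out[acc] || { length: lengths[acc], links: [] });
  for (const [a, b] of dedupeLinks(pairs)) {
    entry(a).links.push(b); // store each link once, under its source
    entry(b);               // make sure the target also exists as a node
  }
  return out;
}
```

Using a `Set` of string keys keeps the check at constant time per link, which is the same role hash() would play on the Python side.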

Order filters not working properly

Order filters are not working properly: all orders are being appended to the same entry and the colour is not plotted (branch:taxa).

Update database

Before releasing the full database, it should be updated from NCBI, given that the NCBI database is updated every 3 months, which often breaks FASTA parsing.

length selection should return to previous color instead of default color

A different method should be implemented for returning each node to its previous color.
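
One possible approach, sketched under assumptions: remember each node's color just before the length selection recolors it, and restore from that record afterwards. The node objects here are plain objects with a `color` property, and the names `highlight` and `restore` are hypothetical; the real node UI objects in patlas may look different.

```javascript
// node id -> color the node had before it was highlighted
const previousColor = new Map();

function highlight(nodeUI, id, color) {
  // Record the color only on the first highlight, so repeated highlights
  // do not overwrite the color we want to return to.
  if (!previousColor.has(id)) previousColor.set(id, nodeUI.color);
  nodeUI.color = color;
}

function restore(nodeUI, id) {
  if (previousColor.has(id)) {
    nodeUI.color = previousColor.get(id);
    previousColor.delete(id);
  }
}
```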

multi-level selection issue

Multi-level selection of taxa has an issue when all 4 levels are selected, rendering no selection at all.

add new taxa tree

Add new taxa_tree.json file to populate the taxa menus within the app.

Multiple calls on file handling function

When uploading .json files to the application, the handling function is called twice.

more than 20 colors

Currently, the visualization does not support more than 20 colors for each taxa level. This should be addressed in future versions.
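
One way to go beyond a fixed palette is to generate hues on demand. The sketch below steps the hue by the golden angle so that consecutive colors stay far apart; `distinctColors` is a hypothetical helper, and the output is CSS `hsl()` strings, which would still need converting to whatever color format the renderer expects.

```javascript
function distinctColors(n) {
  const goldenAngle = 137.508; // degrees; spreads hues evenly around the wheel
  const colors = [];
  for (let i = 0; i < n; i++) {
    const hue = ((i * goldenAngle) % 360).toFixed(1);
    colors.push(`hsl(${hue}, 70%, 50%)`);
  }
  return colors;
}
```

Colors generated this way become harder to tell apart by eye as `n` grows, so a legend remains necessary.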

Fix plasmid names

Plasmid names are being retrieved as, for example, pLMG9303 instead of pLMG930.3. The database needs to be reworked in order to correct this issue.

Playing around with p_values, mash distances and maximum number of links

Future implementations should consider including options that let the user define cutoffs for the p-value, the mash distance and the maximum number of links between sequences.
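
A rough sketch of how such cutoffs could be applied to the links of one sequence. The field names `pValue` and `distance`, the option names and the function `applyCutoffs` are assumptions for illustration only.

```javascript
// Keep links that pass both thresholds, then keep only the closest ones.
function applyCutoffs(links, { maxPValue, maxDistance, maxLinks }) {
  return links
    .filter((l) => l.pValue <= maxPValue && l.distance <= maxDistance)
    .sort((a, b) => a.distance - b.distance) // smallest distance first
    .slice(0, maxLinks);
}
```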

Memory overkill

When many sequences are given as input, pairwise comparisons can become very intensive, and the function mash_distance_matrix stores a lot of entries, which might consume a lot of memory.

Add labels to nodes

One way to quickly visualize metadata such as the accession number would be to display it in a label next to the corresponding node. However, this might be very confusing... But perhaps there is another way.

This would be very useful for displaying images outside patlas as PNG or JPG.

After first filtering, can't remove color from legend and graph

After filtering with a given set of taxa, cannot properly remove color from legend and graph.

Add zoom in and out

Add a zoom in and zoom out slider to the vivagraph output.

add each coverage length on read comparison

When comparing reads (diff), add each read's coverage length of the reference sequences.

Add circular plots for coverage

Taking the results from the samtools depth file generated by PlasmidCoverage, it would be nice to generate a plot with the coverage depth of all positions of a given plasmid.
However, this should be done only for the results under the defined cutoff of PlasmidCoverage script, in order to avoid an overload of information.

We should check if plotly or any other js library has implemented any kind of circular histogram that we can re-use.
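
As a starting point, the depth file can be turned into polar coordinates independently of the plotting library. The sketch below assumes the usual three tab-separated columns written by samtools depth (reference name, 1-based position, depth); `parseDepth` and `toPolar` are hypothetical helper names, and the plasmid length is passed in by the caller.

```javascript
// Parse "ref<TAB>pos<TAB>depth" lines into records.
function parseDepth(text) {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => {
      const [ref, pos, depth] = line.split("\t");
      return { ref, pos: Number(pos), depth: Number(depth) };
    });
}

// Map each position to an angle around the circular plasmid.
function toPolar(records, plasmidLength) {
  return records.map(({ pos, depth }) => ({
    theta: ((pos - 1) / plasmidLength) * 360, // degrees, position 1 at 0
    r: depth,                                 // radius encodes coverage depth
  }));
}
```

The resulting `theta`/`r` pairs are the kind of input a polar bar or polar scatter chart typically takes.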

not all taxa are being properly colored [branch:taxa]

When selecting a taxon, some of its child taxa are not processed.

distance filters after re-run

Distance filters after re-run currently don't have the actual distance value (only the accession is stored in the database); therefore, it would be important to populate the database with the accession numbers + distances.

Currently this has the following structure:

{"significantLinks": ["NC_010869_1", "NC_025192_1"], .... }

However, this would need either a more nested structure or the name and distance linked together, e.g. accession|distance instead of accession. The latter would be easier to implement in a first instance.
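
A small sketch of how the front end could read the accession|distance form. The distance values used with it are made up, and `parseSignificantLinks` and `filterByDistance` are hypothetical names; entries without a distance are tolerated so the current format keeps working.

```javascript
// Turn ["NC_010869_1|0.05", ...] into [{accession, distance}, ...].
function parseSignificantLinks(entries) {
  return entries.map((entry) => {
    const sep = entry.lastIndexOf("|");
    if (sep === -1) return { accession: entry, distance: null }; // old format
    return {
      accession: entry.slice(0, sep),
      distance: Number(entry.slice(sep + 1)),
    };
  });
}

// Keep only links whose distance is known and within the cutoff.
function filterByDistance(links, maxDistance) {
  return links.filter((l) => l.distance !== null && l.distance <= maxDistance);
}
```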

change filtering

Right now the filter iterates through all nodes and removes the nodes that don't have a color attributed or a link to a colored node. However, this behavior results in slow loading times and thus should be replaced by database queries that retrieve the information on the nodes and generate a new JSON to render a new (smaller) instance of the graph.

Dark mode

Add a dark mode to the visualization.

four additional nodes spamming links in visualization.html

In the example provided in modules/dict_temp_005_l4.json, four additional nodes are being created, each linking to every other node. From a total of 5384 sequences retrieved in Python, 5388 nodes are being created, 4 of which connect to every other node.
Note that currently only 4 links per node are stored in the JSON file, so visualization.html should not have nodes with more than 4 links and should have 5384 nodes instead of 5388.

reader is not defined

When clicking cancel selection in the file modals, the reader variable is not defined, which makes the button useless.

Add cluster visualization

A way to cycle between clusters should be implemented. There is already a way to search for accessions, which could help to find the cluster associated with a given sequence.

add loading information for plots

Linked with #74. Plots would benefit from loading information where the user can see which queries are in progress and which have already completed.

progress bar broken

The progress bar broke after inserting a pool.join() to wait for the mp process to finish.

add UI for graph control

A UI to control the graph in the vivagraph display may help to achieve a better visualization. Therefore, add a div that allows the user to specify and change parameters for the vivagraph layout.

Remove gifs

Remove example gifs that are not used anymore.

Display metadata box

Metadata box could be displayed on some event click (button or something else). This metadata could show:

already available - check listGiFilter variable

still needing implementation

number of nodes with a given resistance gene / plasmid family

Add a slider for coverage

Coverage results could have a slider, similar to the length filters, that enables the user to select and unselect previous nodes with a certain coverage.
The legend should also be updated while interacting with this slider, but only once a definitive range of coverage percentages is submitted.

Show filter coverage

Add a filter that only shows a given coverage threshold in reads mode.

Adding two entries to html legend

Two entries of the same taxa are being added to the html legend.

database cleanup

For some reason, the latest NCBI database (plasmid) from 20/7/2017 has genes mixed with plasmid sequences. To remove them, search the headers for CDS and match the string using .lower(), because both "CDS" and "cds" occur.
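
The case-insensitive match can be sketched as follows, in JavaScript for consistency with the other snippet on this page (`toLowerCase()` plays the role of Python's `.lower()`). The function name `dropCdsRecords` and the example headers in the usage note are hypothetical, and a plain substring match on "cds" may need tightening if plasmid names can contain those letters.

```javascript
// Remove FASTA records whose header mentions CDS in any letter case.
function dropCdsRecords(fastaText) {
  return fastaText
    .split(/^>/m)                          // one chunk per record
    .filter((record) => record.trim() !== "")
    .filter((record) => {
      const header = record.split("\n", 1)[0];
      return !header.toLowerCase().includes("cds");
    })
    .map((record) => ">" + record)         // restore the header marker
    .join("");
}
```

For example, given records headed `>seq1 plasmid pA`, `>seq2 CDS gene` and `>seq3 cds thing`, only the first record is kept.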

modals in different window size

The different elements of modals overlap in small window sizes.

Update README

The README needs to be updated according to the api branch.

center graph after removing nodes

The graph should be re-centered after removing nodes and links with the Re run button.

conflict between legends and reset buttons

When the read filter legend is triggered and taxa filters are then appended to the legend, the list of all species present in the legend is not removed until the next instance of taxa filters.

order appending color scheme

If we choose a color scheme for distances, it is appended to the taxa filters modal body.

Minimap

Add a mini-map to the bottom-right corner.

asynchronous removal of all nodes and links [branch:taxa]

When triggering the Re run button, the removal of nodes and links behaves oddly: only after several clicks on the Re run button are all the removals performed.

error while filtering with no taxa filters

When trying to submit while no taxa filters are applied, an error message is raised:

Uncaught ReferenceError: assocFamilyGenusGenus is not defined
    at HTMLButtonElement.<anonymous> (visualization_functions.js:855)
    at HTMLButtonElement.dispatch (jquery-3.1.1.js:5201)
    at HTMLButtonElement.elemData.handle (jquery-3.1.1.js:5009)

Although this doesn't affect the final result and a proper warning is raised for the user, error messages in the console should be avoided. Thus, the cases where assocFamilyGenus, assocOrderGenus and assocGenus are undefined should be handled.
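
A minimal sketch of such handling, assuming the three variables are globals that may never have been declared when no filter is applied. `typeof` does not throw on an undeclared identifier, whereas a bare reference does; `gatherTaxaSelections` is a hypothetical name.

```javascript
// Collect whichever association variables exist, without touching
// the ones that were never declared.
function gatherTaxaSelections() {
  const selections = [];
  if (typeof assocFamilyGenus !== "undefined") selections.push(assocFamilyGenus);
  if (typeof assocOrderGenus !== "undefined") selections.push(assocOrderGenus);
  if (typeof assocGenus !== "undefined") selections.push(assocGenus);
  return selections;
}
```

An empty result can then drive the existing user-facing warning without any ReferenceError reaching the console. Declaring the three variables up front with empty defaults would be an alternative.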

Linked node selection

When two nodes are selected by mouse click and one is then deselected, the linked node is also deselected even though the initial node is still selected.
A check has to be implemented to see whether the linked node is still linked to another selected node.
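
One way to implement that check is to count, for every linked node, how many selected nodes currently point at it, and only drop the highlight when the count reaches zero. This is a sketch with hypothetical names (`select`, `deselect`, `isHighlighted`); neighbors are passed in by the caller rather than read from the graph.

```javascript
const selected = new Set();    // nodes the user clicked
const linkedCount = new Map(); // linked node -> number of selected neighbors

function select(node, neighbors) {
  if (selected.has(node)) return;
  selected.add(node);
  for (const n of neighbors) linkedCount.set(n, (linkedCount.get(n) || 0) + 1);
}

function deselect(node, neighbors) {
  if (!selected.delete(node)) return; // was not selected, nothing to undo
  for (const n of neighbors) {
    const left = linkedCount.get(n) - 1;
    if (left > 0) linkedCount.set(n, left);
    else linkedCount.delete(n);
  }
}

// A node stays highlighted while it is selected itself or still linked
// to at least one selected node.
function isHighlighted(node) {
  return selected.has(node) || linkedCount.has(node);
}
```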

taxa_fetch.py refactor

taxa_fetch.py should be refactored so that it is loaded by MASHix.py instead of being run separately. This implies that the doc dictionary will hold the taxa information and be committed just once, rather than removing the previous entry and adding a new one each time we want to add taxa information to the psql database.

Concurrency

Adding nodes asynchronously causes the browser to freeze in Firefox and on PCs with fewer resources.

Tried to implement a concurrency limit like this:

const limit = 10
let running = 0

const scheduler = () => {
  while (running < limit && json.nodes.length > 0) {
    const array = json.nodes.shift()
    console.log(array)
    addAllNodes(array, () => {
      running--
      if (json.nodes.length > 0) {
        scheduler()
      }
    })
    running++
  }
}

scheduler()

This throws a "too much recursion" error because scheduler is being called from inside scheduler.
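
One possible way around the recursion, sketched under the assumption that the callback passed to addAllNodes may fire synchronously: guard the loop with a flag so that a re-entrant call returns immediately and the already running loop picks up the next item. `runWithLimit`, `worker` and `pump` are hypothetical names; `worker(item, done)` stands in for addAllNodes.

```javascript
function runWithLimit(items, worker, limit) {
  const queue = items.slice(); // do not mutate the caller's array
  let running = 0;             // tasks started but not yet finished
  let pumping = false;         // true while the loop below is active

  const pump = () => {
    if (pumping) return;       // re-entered from a synchronous callback
    pumping = true;
    while (running < limit && queue.length > 0) {
      const item = queue.shift();
      running++;               // count the task before starting it
      worker(item, () => {
        running--;
        pump();                // no-op if the loop is still active
      });
    }
    pumping = false;
  };

  pump();
}
```

This keeps the call stack flat, but it does not by itself keep the browser responsive: if each `worker` call finishes synchronously, all items are still processed in one go. Having `worker` call `done` from a `setTimeout` would hand control back to the browser between batches.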