benkaehler / q2-makarsa
A QIIME 2 plugin to generate and visualise microbial networks.
License: BSD 3-Clause "New" or "Revised" License
Dependencies: #6
Prerequisites:
Write a tutorial for the QIIME 2 forum.
Example: https://forum.qiime2.org/t/using-q2-clawback-to-assemble-taxonomic-weights/5859
Prerequisites:
Write the plugin. It should register a function that calls your R script.
Example: https://github.com/qiime2/q2-dada2/blob/master/q2_dada2/plugin_setup.py.
Background: https://dev.qiime2.org/latest/tutorials/first-plugin-tutorial/.
Dependencies: #4
Prerequisites:
Create a conda package for distribution of the necessary R dependencies and the plugin itself. Add continuous integration.
Will need to figure out how to get SpiecEasi to install via conda.
Here: https://github.com/BenKaehler/q2-SpiecEasi/actions/new?category=continuous-integration.
Example: https://github.com/qiime2/q2-dada2/blob/master/ci/recipe/meta.yaml and https://anaconda.org/kaehler/q2-clawback.
Dependencies: #1
Prerequisites:
Create a Network QIIME 2 semantic type. It should provide transformers for reading and writing the igraph data structures saved by the R script.
Outdated example with lots of explanatory comments: https://github.com/qiime2-graveyard/q2-dummy-types/tree/master/q2_dummy_types.
Real-world working example: https://github.com/qiime2/q2-dada2/blob/master/q2_dada2/_stats.py and https://github.com/qiime2/q2-dada2/blob/master/q2_dada2/_transformer.py. Background: https://dev.qiime2.org/latest/storing-data/.
Hi,
thanks for putting this together!
I've been trying to apply it to my FeatureTable of 323 samples, ~5,000 features, and ~2,900,000 total frequency. Running it with 8 cores, it crashed after 3 days when memory usage reached 60 GB. Do you have any recommendation for the number of cores to use, and perhaps an estimate of the corresponding run time and memory requirements?
Thanks a lot!
Best,
Lena
This is a note that I've taken the pulsar config file parameter out of the inputs, because it breaks QIIME 2 provenance tracking.
Could be added in future, but would need extra work. Perhaps a new config file semantic type would be needed, if it doesn't already exist.
See here.
Create a new method that exposes basic FlashWeave functionality and update installation instructions and unit tests.
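A minimal sketch of how such a method might turn its parameters into a call to the Julia script. The script name and flag spellings are copied from the `run_FlashWeave.jl` command in the error report later in this thread; the helper `build_flashweave_command` itself is hypothetical, and the rule that booleans become bare switches is an assumption inferred from flags such as `--sensitive` and `--FDR` in that command.

```python
from typing import Dict, List, Union

# Hypothetical helper. The flag names mirror the run_FlashWeave.jl call
# shown in the error report in this thread; nothing else is assumed.
ParamValue = Union[int, float, bool]


def build_flashweave_command(datapath: str, output: str,
                             params: Dict[str, ParamValue]) -> List[str]:
    """Turn keyword parameters into a run_FlashWeave.jl argument list.

    Booleans become bare switches that are included only when True;
    every other value becomes a "--name value" pair.
    """
    cmd = ["run_FlashWeave.jl", "--datapath", datapath, "--output", output]
    for name, value in params.items():
        if isinstance(value, bool):  # check bool before int: bool subclasses int
            if value:
                cmd.append(f"--{name}")
        else:
            cmd.extend([f"--{name}", str(value)])
    return cmd


cmd = build_flashweave_command(
    "input-data.tsv", "network.gml",
    {"max_k": 3, "alpha": 0.01, "sensitive": True, "normalize": False},
)
print(" ".join(cmd))
# run_FlashWeave.jl --datapath input-data.tsv --output network.gml --max_k 3 --alpha 0.01 --sensitive
```

Keeping the command as a list (rather than one string) lets it go straight to `subprocess.run` without shell quoting issues.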
FlashWeave:
The Python code is linted, but the R and Julia code has not been.
Prerequisites:
Write unit tests for the semantic type, the plugin function, and the visualisation.
Example: https://github.com/qiime2/q2-dada2/tree/master/q2_dada2/tests.
FlashWeave seems to be sensitive to having exactly matching metadata, so we can probably put something in _flashweave.py to encourage compatibility before sending it for analysis.
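One possible pre-processing step for `_flashweave.py`, sketched under the assumption that "exactly matching" means the table and metadata must contain the same sample IDs in the same order. The function name `align_samples` and the dict-based table representation are illustrative only; the real plugin works with QIIME 2 table and metadata objects.

```python
from typing import Dict, List, Tuple

Counts = Dict[str, List[int]]         # sample ID -> feature counts
Metadata = Dict[str, Dict[str, str]]  # sample ID -> metadata columns


def align_samples(table: Counts, metadata: Metadata
                  ) -> Tuple[Counts, Metadata, List[str]]:
    """Restrict both inputs to the samples they share, in the same order.

    Returns the aligned table, the aligned metadata and the sorted list of
    sample IDs that were dropped because they appeared in only one input.
    """
    # Strip stray whitespace so "S1 " and "S1" are treated as the same ID.
    table = {s.strip(): v for s, v in table.items()}
    metadata = {s.strip(): v for s, v in metadata.items()}
    shared = sorted(set(table) & set(metadata))
    dropped = sorted(set(table) ^ set(metadata))
    return ({s: table[s] for s in shared},
            {s: metadata[s] for s in shared},
            dropped)


table = {"S1": [3, 0], "S2": [1, 4], "S3": [0, 2]}
meta = {"S2 ": {"site": "gut"}, "S1": {"site": "skin"}, "S4": {"site": "gut"}}
aligned_table, aligned_meta, dropped = align_samples(table, meta)
print(list(aligned_table), dropped)  # ['S1', 'S2'] ['S3', 'S4']
```

Reporting the dropped IDs, rather than silently discarding them, would let the method warn users about mismatches before FlashWeave runs.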
Review and expose all the parameters to the default SpiecEasi function and Pulsar.
Move node descriptions to the corner of the graph and have them persist for long enough to be copied.
As of now, Louvain community detection uses edge weights, but the centrality calculations discard weight information.
It would be better if weight information were used consistently. That is, Louvain community detection should optionally allow unweighted calculation, and centrality calculations should use weights by default but optionally allow unweighted calculation.
But there are some complications. I'll collect them here to help with future unravelling.
SpiecEasi MB, SpiecEasi Glasso, and FlashWeave all return different "weights". They appear to be:
So while there is some doubt about the specific interpretation of each weight, they all seem to be "correlation-like": a larger absolute value implies a stronger connection. I contrast this with "distance-like" weights, where a smaller value implies a stronger connection.
Reviewing how weights are handled in our centrality statistics:
So correlation-like weights are probably appropriate for the latter two, but should be flipped for the second and third. For the first it doesn't matter.
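Shortest-path statistics such as closeness and betweenness treat edge weights as lengths, so correlation-like weights need flipping before they are used there. The sketch below shows the effect with a small Dijkstra implementation; the choice of `1/|w|` as the flip is an assumption (`1 - |w|` is another common option), and the function names are illustrative.

```python
import heapq
from typing import Dict, List, Tuple

Edge = Tuple[str, str, float]  # (node, node, correlation-like weight)


def to_distance(weight: float) -> float:
    """Correlation-like -> distance-like: strong links become short.

    1/|w| is one common choice; 1 - |w| is another. The sign is discarded.
    """
    return 1.0 / abs(weight)


def shortest_path_lengths(edges: List[Edge], source: str,
                          flip: bool = True) -> Dict[str, float]:
    """Dijkstra from `source`, using flipped weights when `flip` is True
    and raw |w| as edge lengths otherwise."""
    graph: Dict[str, List[Tuple[str, float]]] = {}
    for u, v, w in edges:
        length = to_distance(w) if flip else abs(w)
        graph.setdefault(u, []).append((v, length))
        graph.setdefault(v, []).append((u, length))
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for nbr, length in graph.get(node, []):
            nd = d + length
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist


# A-B and B-C are strong links, A-C is weak.
edges = [("A", "B", 0.9), ("B", "C", -0.9), ("A", "C", 0.1)]
print(shortest_path_lengths(edges, "A", flip=False)["C"])  # 0.1: the weak edge wins
print(shortest_path_lengths(edges, "A", flip=True)["C"])   # ~2.22: routed via B
```

Without flipping, the weakest association becomes the "shortest" path, which is the opposite of what closeness and betweenness are meant to capture.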
In the visualise-network visualization, it would be nice to have an overview tab showing all groups (even if it is just static) for an easier overview (and a condensed figure for publication). Switching between tabs makes it difficult to compare groups if there are many.
Likewise, such a tab could give a nice comparison of the network topologies. It could display a table of average network topology metrics for each group (and perhaps an additional statistical test?). This would not need to be dynamic, as the groups would not change, so a static table would suffice.
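A minimal sketch of the static per-group table, assuming each group's network is available as an undirected edge list plus a node count. Only two simple metrics (density and mean degree) are shown; which metrics the overview tab should really report is still open, and the group names are made up.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def topology_summary(edges: List[Edge], n_nodes: int) -> Dict[str, float]:
    """Simple whole-network statistics for one group's network.

    `n_nodes` is passed separately so that isolated nodes still count.
    """
    n_edges = len(edges)
    possible = n_nodes * (n_nodes - 1) / 2  # undirected, no self-loops
    return {
        "nodes": n_nodes,
        "edges": n_edges,
        "density": n_edges / possible if possible else 0.0,
        "mean_degree": 2 * n_edges / n_nodes if n_nodes else 0.0,
    }


def summary_table(groups: Dict[str, Tuple[List[Edge], int]]) -> str:
    """One tab-separated row per group, suitable for a static overview tab."""
    rows = ["group\tnodes\tedges\tdensity\tmean_degree"]
    for name, (edges, n_nodes) in groups.items():
        s = topology_summary(edges, n_nodes)
        rows.append(f"{name}\t{s['nodes']}\t{s['edges']}\t"
                    f"{s['density']:.3f}\t{s['mean_degree']:.2f}")
    return "\n".join(rows)


groups = {
    "healthy": ([("a", "b"), ("b", "c"), ("c", "d")], 4),
    "disease": ([("a", "b")], 4),
}
print(summary_table(groups))
```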
Prerequisites:
Write a Python script to create the visualisation that takes the Network semantic type and maybe the original table as inputs and displays an interactive network. It would be great if we could produce an interactive display of the network using the d3 javascript library.
Example: https://github.com/ConstantinoSchillebeeckx/q2-phylogram/blob/master/q2_phylogram/plugin_setup.py.
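d3 force layouts are usually fed a JSON object with a `nodes` array and a `links` array, where links can name their endpoints by node id and be resolved in JavaScript with `d3.forceLink(links).id(d => d.id)`. The sketch below assumes the network has already been read into a dict of node attributes and a list of weighted edges; `to_d3_json` is a hypothetical name.

```python
import json
from typing import Dict, List, Tuple


def to_d3_json(nodes: Dict[str, Dict[str, object]],
               edges: List[Tuple[str, str, float]]) -> str:
    """Serialise a network as {"nodes": [...], "links": [...]}.

    Each node keeps its attributes alongside an "id"; each link refers to
    its endpoints by node id and carries the edge weight.
    """
    payload = {
        "nodes": [{"id": node_id, **attrs} for node_id, attrs in nodes.items()],
        "links": [{"source": s, "target": t, "weight": w} for s, t, w in edges],
    }
    return json.dumps(payload, indent=2)


doc = to_d3_json({"f1": {"community": 1}, "f2": {"community": 2}},
                 [("f1", "f2", -0.4)])
print(doc)
```

Keeping node attributes (community, taxonomy, statistics) on the node objects lets the JavaScript side colour or size nodes without a second lookup table.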
The new Louvain community detection functionality needs to be mentioned in the JOSS paper.
Please create a PR to the joss-paper branch with appropriate amendments to paper.md and joss.bib.
In the visualise-network visualization, the "download as PNG" button is a bit hidden. Could it be made more prominent, e.g. slightly larger and placed at the top of the aesthetics controls?
This method would take an existing network as an input and annotate inferred statistics like betweenness to the nodes.
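A minimal sketch of the annotation step, using degree and strength (sum of absolute edge weights) as stand-ins because they need no shortest-path machinery; betweenness would additionally require flipping the correlation-like weights, as discussed in the weights issue in this thread. The input representation and function name are assumptions.

```python
from typing import Dict, List, Tuple


def annotate_nodes(nodes: List[str], edges: List[Tuple[str, str, float]]
                   ) -> Dict[str, Dict[str, float]]:
    """Attach degree and strength (sum of |weight|) to every node."""
    stats = {n: {"degree": 0, "strength": 0.0} for n in nodes}
    for u, v, w in edges:
        for n in (u, v):
            stats[n]["degree"] += 1
            stats[n]["strength"] += abs(w)
    return stats


stats = annotate_nodes(["a", "b", "c"], [("a", "b", 0.5), ("a", "c", -0.25)])
print(stats["a"])  # {'degree': 2, 'strength': 0.75}
```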
In the visualise-network visualization, it would be nice if the aesthetics settings remained locked when switching between tabs, for easier comparison of groups. Actually, I would say this is essential for consistency when preparing a figure for publication (otherwise it would be easy to accidentally introduce inconsistencies when creating a figure with all groups).
It would be useful to be able to extract the node-wise information that is available in the visualisation in tabular form.
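A sketch of a tabular export, assuming the node-wise information is available as a dict of per-node attribute dicts. The header `feature-id` is one of the ID column names QIIME 2 metadata accepts, so a file like this should be loadable as feature metadata; `node_table_tsv` is a hypothetical name.

```python
import csv
import io
from typing import Dict


def node_table_tsv(node_attrs: Dict[str, Dict[str, object]]) -> str:
    """Write one row per node and one column per attribute (tab-separated).

    Nodes missing an attribute get an empty cell.
    """
    columns = sorted({key for attrs in node_attrs.values() for key in attrs})
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["feature-id"] + columns)
    for node_id, attrs in node_attrs.items():
        writer.writerow([node_id] + [attrs.get(c, "") for c in columns])
    return buf.getvalue()


print(node_table_tsv({"f1": {"degree": 2, "community": 1},
                      "f2": {"degree": 1}}), end="")
```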
When putting in #78 and #79, I decided to try using various makarsa outputs with other QIIME 2 actions to test the new functionality.
The outputs of louvain-communities did not look quite like what I expected. I had understood that the node map would map features to modules, but the node and module IDs are both arbitrary. This prevents the node map from being useful, e.g., for annotating or collapsing features by module.
@BenKaehler @rhernandvel is this expected? Shouldn't node IDs correspond to feature IDs? Are the feature IDs being replaced by arbitrary node IDs in louvain-communities, or are these the node labels in the input Network?
To reproduce:
Using the outputs from the readme tutorial, this action will show you the node and module IDs:
qiime metadata tabulate \
--m-input-file node-map.qza \
--o-visualization node-map.qzv
and this action fails, because the node IDs do not actually correspond to feature IDs:
qiime feature-table group \
--i-table sponge-feature-table.qza \
--p-axis feature \
--m-metadata-file node-map.qza \
--m-metadata-column COMMUNITY \
--p-mode sum \
--o-grouped-table grouped-table.qza
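If the input Network does keep feature IDs as node labels (an assumption; that is exactly the open question above), the node map could be re-keyed by feature ID before it is saved, which is what `feature-table group` needs. A minimal sketch with a hypothetical helper and made-up IDs:

```python
from typing import Dict, Hashable


def node_map_by_feature(node_labels: Dict[str, str],
                        communities: Dict[str, Hashable]) -> Dict[str, Hashable]:
    """Re-key a node -> community map by feature ID.

    node_labels maps each internal node ID to the feature ID it represents.
    """
    missing = sorted(set(communities) - set(node_labels))
    if missing:
        raise KeyError(f"no feature ID recorded for nodes: {missing}")
    return {node_labels[node]: community
            for node, community in communities.items()}


labels = {"0": "4b5eeb30", "1": "fe30ff0f"}  # node ID -> feature ID (made up)
print(node_map_by_feature(labels, {"0": 1, "1": 2}))
# {'4b5eeb30': 1, 'fe30ff0f': 2}
```

Failing loudly on unlabelled nodes seems preferable to writing a node map whose IDs silently fail to match the feature table.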
This could be done in the R script:
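library(SpiecEasi)
library(Matrix)
# se.gl.amgut, se.mb.amgut, sparcc.amgut and sparcc.graph are the fitted
# glasso, MB and SparCC objects created earlier in the SpiecEasi README
# glasso: convert the optimal covariance estimate to correlations
secor <- cov2cor(getOptCov(se.gl.amgut))
# MB: symmetrise the coefficients, keeping the larger-magnitude value
sebeta <- symBeta(getOptBeta(se.mb.amgut), mode='maxabs')
# keep weights on selected edges only; triu(k=1) drops the diagonal
elist.gl <- summary(triu(secor*getRefit(se.gl.amgut), k=1))
elist.mb <- summary(sebeta)
elist.sparcc <- summary(sparcc.graph*sparcc.amgut$Cor)
# summary() of a sparse Matrix gives (i, j, x) rows; column 3 is the weight
hist(elist.sparcc[,3], main='', xlab='edge weights')
hist(elist.mb[,3], add=TRUE, col='forestgreen')
hist(elist.gl[,3], add=TRUE, col='red')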
(taken from the SpiecEasi README)
Ideally we would like to be able to colour by taxonomic classification at a specified level.
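A minimal sketch of colouring by taxonomy at a chosen level, assuming QIIME 2-style semicolon-separated taxonomy strings (`d__...; p__...; ...`). The palette is an arbitrary example and the function names are hypothetical; in the visualisation itself this logic would live in the JavaScript.

```python
from typing import Dict, List

# Arbitrary example palette; any list of colours works.
PALETTE = ["#1b9e77", "#d95f02", "#7570b3", "#e7298a", "#66a61e", "#e6ab02"]


def taxon_at_level(taxonomy: str, level: int) -> str:
    """Return the rank at 1-based `level` of a 'd__X; p__Y; ...' string."""
    ranks = [r.strip() for r in taxonomy.split(";") if r.strip()]
    return ranks[level - 1] if level <= len(ranks) else "Unassigned"


def colour_by_taxon(taxonomy_by_feature: Dict[str, str], level: int,
                    palette: List[str] = PALETTE) -> Dict[str, str]:
    """Give every feature the colour of its taxon at `level`.

    Taxa are sorted before colours are assigned, so the mapping is stable
    across runs; colours repeat if there are more taxa than palette entries.
    """
    labels = {f: taxon_at_level(t, level) for f, t in taxonomy_by_feature.items()}
    taxa = sorted(set(labels.values()))
    colour_of = {taxon: palette[i % len(palette)] for i, taxon in enumerate(taxa)}
    return {f: colour_of[label] for f, label in labels.items()}


taxonomy = {
    "f1": "d__Bacteria; p__Firmicutes; c__Bacilli",
    "f2": "d__Bacteria; p__Bacteroidota",
    "f3": "d__Bacteria; p__Firmicutes",
}
print(colour_by_taxon(taxonomy, level=2))
# {'f1': '#d95f02', 'f2': '#1b9e77', 'f3': '#d95f02'}
```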
Hi @rhernandvel and @nbokulich, could you please have a quick look at this?
There might be a quick fix that you can see straight away.
This was my shell session:
$ qiime makarsa louvain-communities --i-network-input pd-mouse-network.qza --o-community-out louvain-nodes.qza
Saved NodeMap to: louvain-nodes.qza
$ qiime makarsa visualise-network --i-network pd-mouse-network.qza --m-metadata-file louvain-nodes.qza --o-visualization louvain-network.qza
There was an issue with viewing the artifact 'louvain-nodes.qza' as QIIME 2 Metadata:
Artifacts with type NodeMap cannot be viewed as QIIME 2 metadata.
I can see the metadata registrations for NodeMaps in the plugin setup, so I guess some small component is missing.
So that you can reproduce the issue, I've included the input. (I had to zip it because GitHub doesn't like Q2 artifacts.)
Add link strength slider.
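The slider logic amounts to a threshold on absolute edge weight. The sketch below is in Python for illustration only (the visualisation would do this in JavaScript), and it assumes that strength means `|w|`, so strong negative associations are kept as well; both function names are hypothetical.

```python
from typing import List, Tuple

Edge = Tuple[str, str, float]


def filter_links(edges: List[Edge], min_strength: float) -> List[Edge]:
    """Keep links whose absolute weight is at least `min_strength`.

    Using |w| means strong negative associations survive the filter too.
    """
    return [(u, v, w) for u, v, w in edges if abs(w) >= min_strength]


def slider_range(edges: List[Edge]) -> Tuple[float, float]:
    """Smallest and largest |w|: sensible bounds for the slider."""
    strengths = [abs(w) for _, _, w in edges]
    return (min(strengths), max(strengths)) if strengths else (0.0, 0.0)


edges = [("a", "b", 0.8), ("b", "c", -0.6), ("a", "c", 0.1)]
print(slider_range(edges))        # (0.1, 0.8)
print(filter_links(edges, 0.5))   # [('a', 'b', 0.8), ('b', 'c', -0.6)]
```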
Turns out FlashWeave can do cross-domain inference.
Check for old, unused data.
The JOSS reviewer criteria require
A description of how this software compares to other commonly-used packages in this research area.
We don't currently have that but we should.
If a node's taxonomy has fewer than the maximum number of levels, we get a NaN at each missing level for that node.
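One way to avoid the NaNs is to pad every taxonomy to the same depth when it is parsed. This sketch assumes semicolon-separated QIIME 2-style taxonomy strings, and the choice to fill missing levels with the deepest known rank marked "(unclassified)" is an illustrative convention, not existing plugin behaviour.

```python
from typing import List


def pad_taxonomy(taxonomy: str, depth: int) -> List[str]:
    """Split a 'd__X; p__Y; ...' string into exactly `depth` levels.

    Missing levels inherit the deepest known rank, marked as unclassified,
    instead of becoming NaN.
    """
    ranks = [r.strip() for r in taxonomy.split(";") if r.strip()][:depth]
    if not ranks:
        return ["Unassigned"] * depth
    filler = f"{ranks[-1]} (unclassified)"
    return ranks + [filler] * (depth - len(ranks))


print(pad_taxonomy("d__Bacteria; p__Firmicutes", 4))
# ['d__Bacteria', 'p__Firmicutes', 'p__Firmicutes (unclassified)', 'p__Firmicutes (unclassified)']
```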
Currently, if you input multiple metadata files to visualise-network and metadata is missing for a node in one of the files, then the metadata for that node from all of the other files is hidden as well.
This relates to how QIIME 2 merges metadata. A fix is coming, which will require users to merge metadata using, say, an outer join before feeding it to visualise-network. However, the fix will only arrive after the next QIIME 2 release.
It would be good to figure out a work-around in the meantime, perhaps implement our own merge method until the official one is available.
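A minimal sketch of a stop-gap outer join over metadata tables represented as `ID -> {column: value}` dicts; the representation and the name `outer_join` are assumptions. In pandas, `DataFrame.merge(..., how="outer")` performs the same kind of join.

```python
from typing import Dict, Optional

Table = Dict[str, Dict[str, str]]  # ID -> {column: value}


def outer_join(*tables: Table) -> Dict[str, Dict[str, Optional[str]]]:
    """Merge metadata tables on ID, keeping every ID from every table.

    Cells missing from a table become None rather than hiding the whole row.
    A column that appears in two tables must agree for shared IDs.
    """
    merged: Dict[str, Dict[str, str]] = {}
    for table in tables:
        for record_id, row in table.items():
            target = merged.setdefault(record_id, {})
            for column, value in row.items():
                if column in target and target[column] != value:
                    raise ValueError(f"conflicting {column!r} values for {record_id!r}")
                target[column] = value
    columns = sorted({c for row in merged.values() for c in row})
    return {rid: {c: row.get(c) for c in columns} for rid, row in merged.items()}


taxonomy = {"f1": {"Taxon": "p__Firmicutes"}, "f2": {"Taxon": "p__Bacteroidota"}}
modules = {"f1": {"COMMUNITY": "1"}}  # f2 has no module assignment
print(outer_join(taxonomy, modules)["f2"])  # {'COMMUNITY': None, 'Taxon': 'p__Bacteroidota'}
```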
Should be a fairly straightforward generalisation of cross-domain analysis.
Dependencies: #8
Prerequisites: None
Add the plugin to the QIIME 2 plugin library.
It's here: https://library.qiime2.org/plugins/.
Prerequisites:
Write an R script that loads a table exported from a FeatureTable[Frequency] type table and saves down an igraph when it has completed. It should expose the parameters of a normal call to SpiecEasi.
Example: https://github.com/qiime2/q2-dada2/blob/master/q2_dada2/assets/run_dada.R.
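On the Python side, the plugin function would write the table to a temporary file for the R script to read. SpiecEasi expects samples in rows and taxa in columns, so this sketch writes that orientation; the layout, the `sample-id` header and the helper name are assumptions, and on the R side something like `read.delim(path, row.names = 1, check.names = FALSE)` could load it.

```python
from typing import List, Sequence


def table_to_tsv(counts: Sequence[Sequence[int]], sample_ids: List[str],
                 feature_ids: List[str]) -> str:
    """Samples as rows, features as columns; first column holds sample IDs.

    counts[i][j] is the count of feature j in sample i.
    """
    if len(counts) != len(sample_ids):
        raise ValueError("one row of counts is needed per sample")
    lines = ["\t".join(["sample-id"] + feature_ids)]
    for sample_id, row in zip(sample_ids, counts):
        if len(row) != len(feature_ids):
            raise ValueError(f"sample {sample_id!r} has the wrong number of counts")
        lines.append("\t".join([sample_id] + [str(c) for c in row]))
    return "\n".join(lines) + "\n"


print(table_to_tsv([[5, 0], [2, 7]], ["S1", "S2"], ["f1", "f2"]), end="")
```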
Hi,
thanks so much for wrapping FlashWeave - I'm super excited about this plugin!
I installed q2-makarsa in a fresh qiime2-2023.2 environment and tried running it on my FeatureTable without modifying any of the optional parameters, but got the error:
An error was encountered while running FlashWeave in Julia (return code 127)
I saw that it might mean that the command is not found in Julia?
I hope it's okay that I'm already trying to use FlashWeave!
Cheers!
Lena
This is the complete error message:
Running external command line application(s). This may print messages to stdout and/or stderr.
The command(s) being run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.
Command: run_FlashWeave.jl --datapath /scratch/lfloerl/tmpdata/tmp2ah97p88/input-data.tsv --output /scratch/lfloerl/tmpdata/tmp2ah97p88/network.gml --max_k 3 --alpha 0.01 --conv 0.01 --max_tests 1000000 --hps 5 --n_obs_min -1 --time_limit -1.0 --prec 64 --update_interval 30 --verbose --sensitive --feed_forward --FDR --normalize --make_sparse
/usr/bin/env: julia: No such file or directory
Traceback (most recent call last):
File "/scratch/lfloerl/.condaenvs/qiime2-2023.2-new/lib/python3.8/site-packages/q2_makarsa/_flashweave.py", line 84, in flashweave
run_commands([cmd])
File "/scratch/lfloerl/.condaenvs/qiime2-2023.2-new/lib/python3.8/site-packages/q2_makarsa/_run_commands.py", line 19, in run_commands
subprocess.run(cmd, check=True)
File "/scratch/lfloerl/.condaenvs/qiime2-2023.2-new/lib/python3.8/subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['run_FlashWeave.jl', '--datapath', '/scratch/lfloerl/tmpdata/tmp2ah97p88/input-data.tsv', '--output', '/scratch/lfloerl/tmpdata/tmp2ah97p88/network.gml', '--max_k', '3', '--alpha', '0.01', '--conv', '0.01', '--max_tests', '1000000', '--hps', '5', '--n_obs_min', '-1', '--time_limit', '-1.0', '--prec', '64', '--update_interval', '30', '--verbose', '--sensitive', '--feed_forward', '--FDR', '--normalize', '--make_sparse']' returned non-zero exit status 127.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/scratch/lfloerl/.condaenvs/qiime2-2023.2-new/lib/python3.8/site-packages/q2cli/commands.py", line 352, in __call__
results = action(**arguments)
File "<decorator-gen-398>", line 2, in flashweave
File "/scratch/lfloerl/.condaenvs/qiime2-2023.2-new/lib/python3.8/site-packages/qiime2/sdk/action.py", line 234, in bound_callable
outputs = self._callable_executor_(scope, callable_args,
File "/scratch/lfloerl/.condaenvs/qiime2-2023.2-new/lib/python3.8/site-packages/qiime2/sdk/action.py", line 381, in _callable_executor_
output_views = self._callable(**view_args)
File "/scratch/lfloerl/.condaenvs/qiime2-2023.2-new/lib/python3.8/site-packages/q2_makarsa/_flashweave.py", line 86, in flashweave
raise Exception(
Exception: An error was encountered while running FlashWeave in Julia (return code 127), please inspect stdout and stderr to learn more.
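Exit status 127 is what `/usr/bin/env` (like POSIX shells) returns when it cannot find the command, which matches the `julia: No such file or directory` line above: Julia itself is not on the PATH. A preflight check in `_flashweave.py` could report this before the subprocess runs. The sketch uses the real `shutil.which`; the helper name and message are hypothetical.

```python
import shutil


def require_executable(name: str) -> str:
    """Return the full path of `name`, or fail early with a clear message."""
    path = shutil.which(name)
    if path is None:
        raise RuntimeError(
            f"{name!r} was not found on PATH; install it (or activate the "
            f"environment that provides it) before running this action."
        )
    return path


try:
    require_executable("julia")
except RuntimeError as err:
    print(err)
```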
SpiecEasi has a cross-domain interaction method. This could be useful to add as a separate method: https://github.com/zdk123/SpiecEasi#cross-domain-interactions