Hi,
there is no special handling of 1/0 (presence/absence) data in the code.
If you use presence/absence data, you will probably want to skip the normalization of read counts with the option -n no.
The rest of the workflow is based on Spearman's rank correlation and randomized matrices, so you still need to choose the null model that suits your data.
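Because the workflow builds on Spearman's rank correlation, it may help to see why 1/0 data behave differently: with presence/absence values almost every entry is tied. The following is a plain-Python sketch of Spearman's rho computed as the Pearson correlation of tie-averaged ranks (the usual definition). It is not the tool's R code, and the function names are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values 1..n; tied values all receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Move j to the end of the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Sorted positions i..j correspond to ranks i+1..j+1; use their mean.
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho as the Pearson correlation of tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length, at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        # An OTU with the same value in every site (e.g. present everywhere) has no ranking.
        return float("nan")
    return sxy / math.sqrt(sxx * syy)

# Presence/absence columns: every value is a tie, so each OTU has only two distinct ranks.
print(spearman([1, 1, 0, 0, 1], [1, 1, 0, 0, 0]))  # prints 0.6666666666666666
```

With 1/0 data each OTU collapses to two tied rank levels, so far fewer distinct correlation values are possible than with abundance data.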
I have not tested the analysis of 1/0 data. In microbiology we also have the sequencing depth bias, but we consider that relative abundance still carries valuable information. Log or square-root transformations of relative abundances are then recommended to reduce the weight of hyper-abundant taxa, whose abundance is sometimes inflated by PCR amplification bias (i.e. using option -n ratio_log or -n ratio_sqrt).
So you might want to run NetworkNullHPC on a test dataset for which you trust the counts, to investigate the potential impact of the 1/0 transformation on the co-occurrence and co-exclusion results.
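To make these options concrete, here is a rough Python sketch of what the abundance-based normalizations and a 1/0 conversion could look like for a site-by-OTU count table. The function names are hypothetical, and the exact formulas behind -n ratio_log and -n ratio_sqrt are not given in this thread: dividing by the site total and then applying math.log1p or math.sqrt is an assumption.

```python
import math

def relative_abundance(site_by_otu):
    """Divide each read count by its site (row) total; empty sites stay all zero."""
    result = []
    for row in site_by_otu:
        total = sum(row)
        result.append([c / total if total else 0.0 for c in row])
    return result

def transform(site_by_otu, method):
    """Hypothetical stand-ins for the -n options; the tool's exact formulas may differ."""
    if method == "no":
        # No normalization: keep the raw counts (suggested above for 1/0 data).
        return [[float(c) for c in row] for row in site_by_otu]
    ratios = relative_abundance(site_by_otu)
    if method == "ratio_sqrt":
        return [[math.sqrt(v) for v in row] for row in ratios]
    if method == "ratio_log":
        # log1p is an assumption: unlike a plain log it maps zero to zero.
        return [[math.log1p(v) for v in row] for row in ratios]
    raise ValueError(f"unknown method: {method!r}")

def to_presence_absence(site_by_otu):
    """1/0 version of the same count table: 1 where an OTU has at least one read."""
    return [[1 if c > 0 else 0 for c in row] for row in site_by_otu]
```

For the test run suggested above, the raw counts would go to the tool with -n ratio_log or -n ratio_sqrt, and the output of to_presence_absence with -n no, so that both runs share the same samples and OTUs.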
from networknullhpc.
Hello @lentendu,
Thank you for the quick response!
I am relatively new to Linux systems and to running programs that combine several languages. Could you help me with the process? I am trying to run this locally on my computer with WSL, and I have installed R in WSL along with all the required packages.
As mentioned in the README, this tool only runs on Linux servers with a SLURM job scheduler.
The individual R scripts are available in the rscripts directory if you want to re-implement the workflow as a single script, but I cannot invest time in that.
Alternatives are the original Matlab code of Connor, Barberán and Clauset (2017), or a different way to build networks, e.g. using the RMThreshold R package to detect the correct Spearman's rank correlation threshold; see for example Bunick et al. (2021).
Hello @lentendu,
I have had some preliminary success running the whole program (after some debugging and editing the script to fit the HPC I use), and I guess my next step will be to adjust the parameters and see how they affect my results.
Meanwhile, I want to double-check that I have the OTU table format right. You mentioned in the README that rows are samples and columns are OTUs, correct? I ask because the OTU table output by a bioinformatics pipeline (say, vsearch) usually has OTUs as rows and samples/locations as columns.
Besides, I am also curious about how you visualize the network once you obtain the edge list as the final output. In the paper, you plotted the network with each node representing one OTU and an edge between two nodes representing a significant co-occurrence. Did you have any other thoughts or intuitions while exploring the data?
Right now, using all default options, I only obtain about 10 pairs of OTUs with significant co-occurrence patterns (not ideal for visualization with network methods). However, the median Spearman's rank correlation is above 0.9 for all of those pairs. Is it possible to select, filter or adjust the threshold? E.g., retain all pairs with a correlation above 0.5 or 0.8.
Hi @timz0605 ,
here are my replies to your last questions:
- the OTU table format follows the standard of the R vegan package, that is, sites as rows and OTUs/ASVs/species as columns. You can easily transpose your matrix in R with the function t() if needed.
- for visualization, you can use the igraph and ggnetwork packages in R, or other software such as Cytoscape or Gephi
- the heart of this co-occurrence network approach is to learn the appropriate Spearman's rank correlation threshold from your data, that is, the level above which correlations do not originate from random co-occurrence. The threshold can vary a lot depending on the size (number of sites and species) of your matrix. With small matrices or presence/absence data, the threshold will be relatively high. You should really avoid setting a hard threshold. I do not know your data, but it might just be that only 10 pairs of OTUs have non-random co-occurrences across your samples.
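The idea of learning the threshold from randomized data can be illustrated with a small, self-contained Python sketch. It is not NetworkNullHPC's implementation, which runs R scripts under SLURM and lets you choose the null model. Everything specific here is an assumption: the null model (shuffling each OTU independently across sites), the stopping rule (the largest connected component of the permuted-data network may hold at most 5% of all OTUs), the 10 permutations, keeping the strictest per-permutation threshold, and all function names. The sketch starts from an OTU-by-sample table, as a pipeline like vsearch would produce, and transposes it to the site-by-OTU layout described above.

```python
import math
import random

def average_ranks(values):
    """Ranks 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Pearson correlation of tie-averaged ranks; NaN if either OTU is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else float("nan")

def transpose(matrix):
    """Swap rows and columns, e.g. OTU-by-sample -> site-by-OTU (vegan layout)."""
    return [list(col) for col in zip(*matrix)]

def pairwise_rho(otu_columns):
    """(rho, i, j) for every OTU pair whose correlation is defined."""
    pairs = []
    for i in range(len(otu_columns)):
        for j in range(i + 1, len(otu_columns)):
            r = spearman(otu_columns[i], otu_columns[j])
            if not math.isnan(r):
                pairs.append((r, i, j))
    return pairs

def largest_component_fraction(n_nodes, edges):
    """Share of all nodes in the largest connected component (union-find)."""
    parent = list(range(n_nodes))
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a
    for a, b in edges:
        parent[find(a)] = find(b)
    sizes = {}
    for v in range(n_nodes):
        root = find(v)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values()) / n_nodes

def null_threshold(site_by_otu, max_lcc_fraction=0.05, n_perm=10, seed=1):
    """Lowest cutoff at which networks from permuted data stay fragmented (assumed rule)."""
    rng = random.Random(seed)
    n_otu = len(site_by_otu[0])
    per_perm = []
    for _ in range(n_perm):
        # Assumed null model: shuffle each OTU independently across sites.
        cols = transpose(site_by_otu)
        for col in cols:
            rng.shuffle(col)
        rhos = pairwise_rho(cols)
        for t in sorted({r for r, _, _ in rhos}):
            edges = [(i, j) for r, i, j in rhos if r > t]
            if not edges or largest_component_fraction(n_otu, edges) <= max_lcc_fraction:
                per_perm.append(t)
                break
    if not per_perm:
        raise ValueError("no defined correlations in the permuted matrices")
    return max(per_perm)  # conservative: the strictest permutation wins

def network_edges(site_by_otu, threshold):
    """Observed OTU pairs whose Spearman rho exceeds the learned threshold."""
    return [(i, j, r) for r, i, j in pairwise_rho(transpose(site_by_otu)) if r > threshold]

# Toy OTU-by-sample table, as a pipeline such as vsearch would output it.
otu_by_sample = [
    [3, 8, 1, 12, 5, 9, 2, 11, 7, 4, 10, 6],
    [6, 16, 2, 24, 10, 18, 4, 22, 14, 8, 20, 12],  # same ranking as OTU 0
    [5, 2, 9, 1, 12, 7, 11, 3, 8, 10, 4, 6],
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
    [12, 3, 7, 5, 1, 10, 8, 2, 6, 11, 9, 4],
]
site_by_otu = transpose(otu_by_sample)
threshold = null_threshold(site_by_otu)
edges = network_edges(site_by_otu, threshold)
```

With only five OTUs, even a single random edge breaks the 5% limit, so the learned threshold is simply the strongest correlation seen among the permuted OTUs; this mirrors the point above that small matrices lead to high thresholds. The identically ranked pair (OTUs 0 and 1, rho = 1) is expected to survive, while weaker pairs may not.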
Related Issues (20)
- Include the median correlation value as third column in the edge list outputs
- Adapt requested time and memory to data size for threshold, edges and network steps
- Avoid error if co-occurrence or co-exclusion network are empty, just report
- Add an option to select the type of OTU matrix normalization
- Add an option to control hdf5 storage chunk size
- Change OTU matrix cleaning order
- Fix Slurm job names
- Null model 1 transpose the randomize matrix
- Option -m not correctly set in case
- Add an option to control the threshold of percent total OTUs included in the largest connected component from randomly permuted matrices
- Add SLURM partition name option -p
- Any possibilities to support qsub?
- Add clr transformation
- Avoid nodelist option for non-array jobs
- Allow fine tuning of memory and CPU request
- hdf5r vs. rhdf5
- graph.data.frame() was deprecated in igraph 2.0.0.
- -m columns option not working, Error in match.arg
- choosing null model using options from permatful in vegan