OTU Table Read Abundance vs. Present/Absent Data about networknullhpc HOT 6 OPEN

timz0605 commented on July 21, 2024

OTU Table Read Abundance vs. Present/Absent Data

from networknullhpc.

Comments (6)

lentendu commented on July 21, 2024

Hi,

There is no special implementation in the code to handle 1/0 data.
If you use presence/absence data, you will probably want to skip the normalization of read counts by using the option: -n no
The rest is based on Spearman's rank correlation and randomized matrices, so you still need to choose the null model that suits your data.
I have not tested the tool on 1/0 data. In microbiology we also have the depth bias, but we consider that relative abundance is still valuable information. Log or square-root transformations of relative abundance are then recommended to reduce the weight of hyper-abundant taxa, which are sometimes due to PCR amplification bias (i.e. using option -n ratio_log or -n ratio_sqrt).
So, you might want to run NetworkNullHPC on a test dataset for which you are sure about the counts, to investigate the potential impact of the 1/0 transformation on the co-occurrence and co-exclusion results.
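To get a feel for what the 1/0 transformation can do before running the full pipeline, here is a minimal Python sketch that compares Spearman's rank correlation on read counts with the same correlation on presence/absence data. It is only an illustration and not NetworkNullHPC's code: the function names (`presence_absence`, `average_ranks`, `spearman`) and the two count vectors are made up for this example, and no null model or normalization is applied.

```python
from math import sqrt


def presence_absence(counts):
    """Turn read counts into 1/0 data: 1 if the OTU was detected, else 0."""
    return [1 if c > 0 else 0 for c in counts]


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return float("nan")  # a constant column has no defined correlation
    return cov / (sx * sy)


# Hypothetical read counts of two OTUs across six samples.
otu_a = [120, 0, 35, 4, 0, 60]
otu_b = [3, 0, 40, 90, 0, 1]

rho_counts = spearman(otu_a, otu_b)
rho_binary = spearman(presence_absence(otu_a), presence_absence(otu_b))
print(f"rho on counts: {rho_counts:.3f}")
print(f"rho on 1/0:    {rho_binary:.3f}")
```

In this toy case the two OTUs occur in exactly the same samples, so the 1/0 data are perfectly correlated, while the count ranks agree only partly. Binary data also produce many ties, which is one reason the learned threshold may behave differently.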

timz0605 commented on July 21, 2024

Hello @lentendu,

Thank you for the quick response!

I am relatively new to Linux and to running programs that combine several languages. I was wondering if you could help me with the process? I am trying to run this locally on my computer, using WSL. I have installed R in WSL along with all the required packages.

lentendu commented on July 21, 2024

As mentioned in the readme, this tool is only for Linux servers with a SLURM job scheduler.

The individual R scripts are available in the rscripts directory if you want to re-implement it in a single script, but I cannot invest time in it.

Alternatives are the original Matlab code of Connor, Barberán and Clauset (2017), or a different way to produce networks, e.g. using the RMThreshold R package to detect the correct Spearman's rank correlation threshold; see for example Bunick et al. (2021).

timz0605 commented on July 21, 2024

Hello @lentendu,

I have had some preliminary success running the whole program (after some debugging and editing the script to fit the HPC I use), and I guess the next step for me will be adjusting the parameters to see how they affect my results.

Meanwhile, I want to double-check that I have the OTU table format correct. You mentioned in the readme that rows are samples and columns are OTUs, correct? I ask because the OTU table output from a bioinformatics pipeline (say vsearch) usually has OTUs as rows and samples/locations as columns.

timz0605 commented on July 21, 2024

Besides, I am also curious about how you visualize the network once you have the edge list as the final output. In the paper, you plotted the network with each node representing one OTU and each edge between two nodes representing a significant co-occurrence. I was wondering if you had other thoughts or intuitions while exploring the data?

Right now, using all default options, I only obtain approx. 10 pairs of OTUs with significant co-occurrence patterns (not ideal for visualizing with network methods). However, the median Spearman's rank correlation values for those pairs are all above 0.9. I was wondering if it is possible to select, filter or adjust the threshold? E.g., all pairs with a correlation value above 0.5 or 0.8 would be retained.

lentendu commented on July 21, 2024

Hi @timz0605,
here are my replies to your last questions:

The OTU table format follows the standard of the R vegan package, that is sites as rows and OTUs/ASVs/species as columns. You can easily transpose your matrix in R with the function t() if needed.
For visualization, you can use the igraph and ggnetwork packages in R, or other software such as Cytoscape or Gephi.
The heart of this co-occurrence network approach is to learn the appropriate Spearman's rank correlation threshold from your data, that is, the level above which a correlation does not originate from random co-occurrence. The threshold can vary a lot depending on the size of your matrix (number of sites and species). With small matrices or with presence/absence data, the threshold will be relatively high. You should really avoid setting a hard threshold. I do not know your data, but it might just be that only 10 pairs of OTUs have non-random co-occurrences across your samples.
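On the table orientation, the transposition mentioned above (done with t() in R) can be sketched in Python as follows. This is only an illustration under stated assumptions: `transpose_table` is a hypothetical helper written for this example, and the small table stands in for a pipeline output with OTUs as rows.

```python
def transpose_table(table):
    """Swap rows and columns of a table given as a list of equal-length rows."""
    if len({len(row) for row in table}) > 1:
        raise ValueError("all rows must have the same number of fields")
    return [list(col) for col in zip(*table)]


# Hypothetical pipeline output: OTUs as rows, samples as columns.
otus_as_rows = [
    ["OTU_ID", "site1", "site2", "site3"],
    ["OTU_1", 12, 0, 7],
    ["OTU_2", 0, 5, 3],
]

# After transposition each row is one site and each column one OTU,
# which is the orientation described in the reply above.
samples_as_rows = transpose_table(otus_as_rows)
for row in samples_as_rows:
    print("\t".join(str(v) for v in row))
```

The header row and the ID column swap roles along with the data, so the first column of the result holds the site names.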
