
Comments (6)

lentendu avatar lentendu commented on July 21, 2024

Hi,

there is no special implementation in the code to handle 1/0 data.
If you use presence/absence data, you will probably want to skip the normalization of read counts by using the option: -n no
The rest is based on Spearman's rank correlation and randomized matrices, so you still need to choose the null model that suits your data.
I have not tested the analysis of 1/0 data. In microbiology we also have the depth bias, but we consider that relative abundance is still valuable information. A log or square-root transformation of relative abundance is then recommended to reduce the importance of hyper-abundant taxa, which are sometimes due to PCR amplification bias (i.e. using option -n ratio_log or -n ratio_sqrt).
So, you might want to run NetworkNullHPC on a test dataset whose counts you trust, to investigate the potential impact of the 1/0 transformation on the co-occurrence and co-exclusion results.
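To get a feel for that comparison before running the full tool, here is a minimal standalone sketch. It is not NetworkNullHPC code. It computes Spearman's rank correlation between two OTUs on made-up counts, once after a 1/0 transformation and once after a square-root transformation of relative abundance. The table, the function names and the exact definition of the square-root transformation are assumptions for illustration only; what -n ratio_sqrt does internally may differ.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # a constant column has no defined correlation
    return cov / math.sqrt(var_x * var_y)

def presence_absence(table):
    """Replace every positive count by 1 and every zero by 0."""
    return [[1 if count > 0 else 0 for count in row] for row in table]

def sqrt_relative_abundance(table):
    """Divide each count by its sample total, then take the square root."""
    result = []
    for row in table:
        total = sum(row)
        result.append([math.sqrt(c / total) if total else 0.0 for c in row])
    return result

def column(table, j):
    """Extract column j (one OTU) from a samples-by-OTUs table."""
    return [row[j] for row in table]

# Hypothetical read counts: rows are samples, columns are OTUs.
counts = [
    [120, 30, 50],
    [80, 45, 75],
    [0, 0, 200],
    [10, 60, 130],
    [5, 0, 195],
]

pa = presence_absence(counts)
sq = sqrt_relative_abundance(counts)

rho_pa = spearman(column(pa, 0), column(pa, 1))
rho_sqrt = spearman(column(sq, 0), column(sq, 1))
print(f"rho on 1/0 data: {rho_pa:.3f}")
print(f"rho on sqrt relative abundance: {rho_sqrt:.3f}")
```

Because Spearman's rho depends only on ranks, the square root itself does not change rho within this sketch. The difference between the two values comes from the many ties that 1/0 data introduce.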

from networknullhpc.

timz0605 avatar timz0605 commented on July 21, 2024

Hello @lentendu

Thank you for the quick response!

I am relatively new to Linux and to running programs that combine several languages. Could you help me with the process? I am trying to run this locally on my computer using WSL. I have installed R in WSL along with all the required packages.


lentendu avatar lentendu commented on July 21, 2024

As mentioned in the readme, this tool is only for Linux servers with a SLURM job scheduler.

The individual R scripts are available in the rscripts directory if you want to re-implement it in a single script, but I cannot invest time in it.

Alternatives are the original MATLAB code of Connor, Barberán and Clauset (2017), or a different way to produce networks, e.g. using the RMThreshold R package to detect the correct Spearman's rank correlation threshold; see for example Bunick et al. (2021).


timz0605 avatar timz0605 commented on July 21, 2024

Hello @lentendu,

I have had some preliminary success running the whole program (after some debugging and editing the script to fit the HPC I use), and I guess the next step for me will be playing around with adjusting the parameters to see how they affect my results.

Meanwhile, I want to double-check that I have the correct format for the OTU table. You mentioned in the readme that rows are samples and columns are OTUs, correct? I ask because the OTU table output by a bioinformatics pipeline (say vsearch) usually has OTUs as rows and samples/locations as columns.


timz0605 avatar timz0605 commented on July 21, 2024

Besides, I am also curious about how you visualize the network after you obtain the edge list as the final output. In the paper, you plotted the network where each node represents one OTU and an edge between two nodes represents a significant co-occurrence. I was wondering if you had any other thoughts or intuitions while exploring the data?

Right now, using all default options, I only obtain approx. 10 OTU pairs with significant co-occurrence patterns (not ideal for visualization with network methods). However, the median Spearman's rank correlation values for those pairs are all above 0.9. I was wondering if it's possible to select/filter/adjust the threshold? E.g., all pairs with a correlation value above 0.5 or 0.8 would be retained.


lentendu avatar lentendu commented on July 21, 2024

Hi @timz0605,
here are my replies to your last questions:

  • the OTU table format follows the standard of the R vegan package, that is, sites as rows and OTUs/ASVs/species as columns. You can easily transpose your matrix in R with the function t() if needed.
  • for visualization, you can use the igraph and ggnetwork packages in R, or other software such as Cytoscape or Gephi.
  • the heart of this co-occurrence network computation approach is to learn the appropriate Spearman's rank correlation threshold from your data, that is, the level above which correlations do not originate from random co-occurrence. The threshold can vary a lot depending on the size (number of sites and species) of your matrix. With small matrices or when using presence/absence data, the threshold will be relatively high. You should really avoid setting a hard threshold. I do not know your data, but it might just be that only 10 pairs of OTUs have non-random co-occurrences across your samples.
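The first and last points above can be illustrated with a rough standalone sketch. It transposes a made-up OTUs-as-rows table into the samples-as-rows layout, then estimates a data-driven correlation threshold by shuffling each OTU column independently and recording the strongest chance correlation. This is a deliberately simplified permutation null, not the null models or the procedure implemented in NetworkNullHPC. The table and the names `null_threshold`, `n_perm`, `quantile` and `seed` are all hypothetical.

```python
import math
import random

def transpose(table):
    """Swap rows and columns (what t() does in R)."""
    return [list(col) for col in zip(*table)]

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)

def max_abs_rho(samples_as_rows):
    """Largest |rho| over all pairs of OTU columns."""
    cols = transpose(samples_as_rows)
    best = 0.0
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            rho = spearman(cols[i], cols[j])
            if not math.isnan(rho):
                best = max(best, abs(rho))
    return best

def null_threshold(samples_as_rows, n_perm=200, quantile=0.95, seed=1):
    """Simplified null: shuffle every OTU column independently, keep the
    strongest chance correlation per permutation, return the given quantile."""
    rng = random.Random(seed)
    cols = transpose(samples_as_rows)
    maxima = []
    for _ in range(n_perm):
        shuffled = [rng.sample(col, len(col)) for col in cols]
        maxima.append(max_abs_rho(transpose(shuffled)))
    maxima.sort()
    return maxima[min(len(maxima) - 1, int(quantile * len(maxima)))]

# Hypothetical pipeline output: 4 OTUs as rows, 6 samples as columns.
otus_as_rows = [
    [12, 0, 7, 30, 2, 9],
    [3, 14, 0, 8, 21, 5],
    [0, 6, 18, 1, 4, 11],
    [25, 2, 9, 0, 13, 7],
]

samples_as_rows = transpose(otus_as_rows)  # now 6 samples x 4 OTUs
threshold = null_threshold(samples_as_rows)
print(f"correlations below about {threshold:.2f} can arise by chance here")
```

Re-running `null_threshold` on tables with more or fewer samples is one way to explore how strongly such a threshold depends on matrix size.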

