lentendu / networknullhpc

OTU co-occurrence network inference based on null model for HPC

License: MIT License

Shell 51.89% R 48.11%

otu networks null-model r

networknullhpc's Issues

Null model 1 transposes the randomized matrix

Add an option to control hdf5 storage chunk size

For example -c chunk_size, default value: 1e5

A lower value would increase parallelisation with lower memory request for the edge step (currently 12G for 1 thread).

The Slurm memory request for the edge step needs to be adapted to this chunk size.

Avoid nodelist option for non-array jobs

If more than one node is provided to nodelist, single-task jobs will complain and/or not be queued.

Any possibility of supporting qsub?

It would be wonderful if you, or we, could supply a version that supports qsub-based HPC systems. I think that would not be so difficult to do.

Integrate environmental matrix to observed correlations

Fix Slurm job names

Replace %x with the actual job names; %x only works with sbatch.

graph.data.frame() was deprecated in igraph 2.0.0.

Hello,

While running the program, I encountered this warning message in the log file:

Warning message:
`graph.data.frame()` was deprecated in igraph 2.0.0.
ℹ Please use `graph_from_data_frame()` instead.

The program works fine. Just an FYI perhaps for future updates? Thank you!

Change OTU matrix cleaning order

First remove low occurrence OTUs

Then remove low abundance samples

The other way around can retain low-abundance samples when most OTUs in those samples are low-occurrence OTUs, which are subsequently removed.
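The effect of the ordering can be seen on a toy table. The following is a minimal Python sketch, not the repository's R code; the function names, the thresholds (an OTU must occur in at least 2 samples, a sample must hold at least 50 reads) and the counts are all assumptions made for illustration.

```python
# Toy OTU table: rows are samples, columns are OTUs (all values hypothetical).
def drop_low_occurrence_otus(matrix, min_occurrence):
    """Keep OTU columns present (count > 0) in at least min_occurrence samples."""
    if not matrix:
        return matrix
    keep = [j for j in range(len(matrix[0]))
            if sum(1 for row in matrix if row[j] > 0) >= min_occurrence]
    return [[row[j] for j in keep] for row in matrix]


def drop_low_abundance_samples(matrix, min_abundance):
    """Keep sample rows whose total read count is at least min_abundance."""
    return [row for row in matrix if sum(row) >= min_abundance]


otu_table = [
    [50, 40, 0],   # sample A
    [45, 35, 0],   # sample B
    [5, 0, 60],    # sample C: dominated by an OTU found in no other sample
    [30, 30, 0],   # sample D
]

# Samples filtered first, then OTUs: sample C passes the abundance filter
# with 65 reads, then loses its dominant OTU and is left with only 5 reads.
samples_first = drop_low_occurrence_otus(
    drop_low_abundance_samples(otu_table, 50), 2)

# OTUs filtered first, then samples: sample C drops to 5 reads and is removed.
otus_first = drop_low_abundance_samples(
    drop_low_occurrence_otus(otu_table, 2), 50)

print(len(samples_first), len(otus_first))
```

With samples filtered first, the near-empty sample C survives into the cleaned table; with OTUs filtered first, it is removed.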

OTU Table Read Abundance vs. Presence/Absence Data

Hello,

First of all, thanks for the code and package! It is something I've been thinking of and trying to do, and I love to see that work on it has already been done.

For the input OTU table, I was wondering whether it only considers read count data. We all know that many potential biases can be introduced during the PCR process and the bioinformatics pipeline. Therefore, in many metazoan metabarcoding studies, people convert the read counts to presence/absence data (1 vs. 0) for downstream analyses. So, I am curious about which approach this code takes.

Adapt requested time and memory to data size for threshold, edges and network steps

Choosing null model using options from permatfull in vegan

Hello @lentendu,

Thank you again for creating the script; I have been making progress and getting results along the way. I have a quick question about the -m null_model option, which uses the permatfull function from vegan. Although I have been looking for documentation on the different options for fixedmar, almost none of it discusses the rationale for choosing one over another, e.g., when to choose rows vs. columns vs. both.

Therefore, I was wondering whether you have had experience with these options before. I am also curious what you think about them, for example, which one is better suited to analyzing an OTU table. Thank you!

Allow fine tuning of memory and CPU request

Could be a limiting factor on some HPC systems.
Allow setting the maximum memory per job per CPU for parallel and array jobs.

Add an option to control the threshold of percent total OTUs included in the largest connected component from randomly permuted matrices

Default is 1%

Option -p for percent
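To make the controlled quantity concrete, here is a minimal Python sketch that computes the percentage of all OTUs falling in the largest connected component of an edge list and compares it with the 1% default. It is not the repository's code: the function name, the toy edge list and the way the comparison is used are assumptions for illustration only.

```python
from collections import Counter


def largest_component_percent(n_otus, edges):
    """Percent of all OTUs that belong to the largest connected component."""
    parent = list(range(n_otus))

    def find(x):
        # Follow parent links to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    sizes = Counter(find(i) for i in range(n_otus))
    return 100.0 * max(sizes.values()) / n_otus


# Hypothetical network built from one permuted matrix: 200 OTUs, 3 edges.
edges = [(0, 1), (1, 2), (10, 11)]
percent = largest_component_percent(200, edges)

max_percent = 1.0  # the default stated above
exceeds = percent > max_percent
print(percent, exceeds)
```

Here the largest component holds 3 of 200 OTUs, so it exceeds a 1% limit.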

Avoid an error if the co-occurrence or co-exclusion network is empty; just report it

Add an option to select the type of OTU matrix normalization

-m option for null model
-n option for normalization

ratio: count ratio scaled to sum to same total (as defined by option -d) in each sample and rounded
log_ratio: ratio, then log transform (log method of decostand), then scale
sqrt_ratio: ratio, then sqrt transform (a.k.a. Hellinger transformation), then scale
no: no normalization
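The options above can be illustrated on a single sample. This is a rough Python sketch rather than the repository's R implementation: the function names are hypothetical, `depth` stands in for the total set by option -d, the log transform follows the definition of the "log" method in vegan's decostand (log_b(x) + 1 for x > 0 with base 2 by default, zeros left as zeros), and the final "scale" step is left out because it is not specified here.

```python
import math


def ratio(sample, depth):
    """Scale counts so the sample sums to about `depth`, then round."""
    total = sum(sample)
    return [round(count * depth / total) for count in sample]


def log_transform(sample, base=2):
    """decostand-style log: log_b(x) + 1 for x > 0, zeros stay zero."""
    return [math.log(x, base) + 1 if x > 0 else 0.0 for x in sample]


def sqrt_transform(sample):
    """Square root of each (already ratio-scaled) count."""
    return [math.sqrt(x) for x in sample]


counts = [30, 10, 0, 60]          # hypothetical read counts for one sample
scaled = ratio(counts, 1000)       # "ratio"
logged = log_transform(scaled)     # "log_ratio", before scaling
rooted = sqrt_transform(scaled)    # "sqrt_ratio", before scaling
print(scaled)
```

Because of the rounding, the scaled counts of a real sample may not sum to exactly `depth`.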

Implement support for the Grid Engine batch-queueing system

Include the median correlation value as third column in the edge list outputs

hdf5r vs. rhdf5

Hello!

It has come to my attention that there seem to be two hdf5 packages available in R: hdf5r and rhdf5. The readme says to install the hdf5r package, while the script loads the other one by calling library(rhdf5). I was wondering if you could clarify this a bit more? Thank you!

Add SLURM partition name option -p

-m columns option not working, Error in match.arg

Hello,

While trying the -m columns option for choosing the null model, I encountered the error below, which leaves the submitted job with a status of DependencyNeverSatisfied:

Error in match.arg(fixedmar, c("none", "rows", "columns", "both")) : 
  'arg' must be NULL or a character vector
Calls: permatfull -> match.arg
Execution halted

I was wondering if you could help look into this? Thank you!

Option -m not correctly set in case

Add clr transformation
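For reference, the centered log-ratio (clr) of a sample is the log of each value minus the mean of the logs over that sample. The Python sketch below illustrates this; it is not the repository's code, and the pseudocount added to avoid log(0) is an assumption, since zero handling is not specified here.

```python
import math


def clr(sample, pseudocount=1.0):
    """Centered log-ratio: log of each value minus the mean log of the sample.

    The pseudocount is an assumed way to handle zero counts.
    """
    logs = [math.log(x + pseudocount) for x in sample]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


transformed = clr([9, 0, 99])  # hypothetical read counts for one sample
print(transformed)
```

The transformed values of a sample always sum to zero (up to floating-point error).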

Group spearman bootstraps for small matrices

Compute observed and null matrices Spearman's rho for multiple seeds in each array job to reduce time associated with job queueing (initialisation, completion).
For example, compute 10 seeds per array for matrices with less than 1000 OTUs.
The runtime should come close to (but not exceed) the maximum time limit for the short queue.
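The grouping itself could look like the following Python sketch. The function names and the fallback of one seed per task for larger matrices are assumptions; only the figures of 10 seeds and 1000 OTUs come from the example above.

```python
def seeds_per_task_for(n_otus, small_matrix_limit=1000, grouped=10):
    """10 seeds per array task below 1000 OTUs, otherwise 1 (assumed fallback)."""
    return grouped if n_otus < small_matrix_limit else 1


def group_seeds(n_seeds, seeds_per_task):
    """Split seeds 1..n_seeds into consecutive groups, one group per array task."""
    seeds = list(range(1, n_seeds + 1))
    return [seeds[i:i + seeds_per_task]
            for i in range(0, n_seeds, seeds_per_task)]


# Hypothetical run: 25 seeds on a matrix with 800 OTUs.
groups = group_seeds(25, seeds_per_task_for(800))
for task_id, seeds in enumerate(groups, start=1):
    print(task_id, seeds[0], seeds[-1])
```

In this example the 25 seeds are covered by 3 array tasks instead of 25, with the last task taking the remaining 5 seeds.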
