microbiome / nmgs

Neutral model

Perl 5.47% Python 5.44% R 12.98% MATLAB 10.41% Shell 0.43% C 65.08% Makefile 0.19%

nmgs's Introduction

microbiome R package

NOTE: While we continue to maintain this R package, development has been discontinued because we have shifted to supporting methods development based on the new TreeSummarizedExperiment data container, which provides added capabilities for multi-omics data analysis. See the miaverse project for details.

Tools for the exploration and analysis of microbiome profiling data sets.

This R package extends the phyloseq data container. The package is actively maintained, but we have discontinued its development and shifted to supporting methods development based on the (Tree)SummarizedExperiment data containers; see microbiome.github.io for more details.

Installation and use

See the package tutorial.

Kindly cite as follows: "Leo Lahti, Sudarshan Shetty et al. (Bioconductor, 2017). Tools for microbiome analysis in R. Microbiome package version 1.23.1. URL: http://microbiome.github.com/microbiome". See also the relevant references listed in the manual page of each function.

Contribute

Contributions and feedback are very welcome:

Issue Tracker
Pull requests
Subscribe to the mailing list
Gitter chat room
Star us on the GitHub page

Publications using the microbiome package

Below are some publications that use the tools implemented in this package. The list is not exhaustive. Let us know if you are aware of further publications using the microbiome package; we are collecting these on the website.

Intestinal microbiome landscaping: Insight in community assemblage and implications for microbial modulation strategies. Shetty S, Hugenholtz F, Lahti L, Smidt H, de Vos WM, Danchin A. FEMS Microbiology Reviews fuw045, 2017.

Metagenomics meets time series analysis: unraveling microbial community dynamics. Faust K, Lahti L, Gonze D, de Vos WM, Raes J. Current Opinion in Microbiology 25:56-66, 2015.

Tipping elements in the human intestinal ecosystem. Lahti L, Salojärvi J, Salonen A, Scheffer M, de Vos WM. Nature Communications 5:4344, 2014.

Fat, Fiber and Cancer Risk in African Americans and Rural Africans. O’Keefe S, Li JV, Lahti L, Ou J, Carbonero F, Mohammed K, Posma JM, Kinross J, Wahl E, Ruder E, Vipperla K, Naidoo V, Mtshali L, Tims S, Puylaert PGB, DeLany J, Krasinskas A, Benefiel AC, Kaseb HO, Newton K, Nicholson JK, de Vos WM, Gaskins HR, Zoetendal EG. Nature Communications 6:6342, 2015.

Associations between the human intestinal microbiota, Lactobacillus rhamnosus GG and serum lipids indicated by integrated analysis of high-throughput profiling data. Lahti L, Salonen A, Kekkonen RA, Salojärvi J, Jalanka-Tuovinen J, Palva A, Orešič M, de Vos WM. PeerJ 1:e32, 2013.

The adult intestinal core microbiota is determined by analysis depth and health status. Salonen A, Salojärvi J, Lahti L, de Vos WM. Clinical Microbiology and Infection 18(S4):16-20, 2012.

Acknowledgements

Main developer: Leo Lahti

Main co-authors: Sudarshan Shetty

Contributors

Thanks to @johanneskoester and @nick-youngblut for contributing the Bioconda installation recipe.

The work has been supported by the following bodies:

Academy of Finland (grants 256950, 295741, 307127)
University of Turku, Department of Mathematics and Statistics
Molecular Ecology group, Laboratory of Microbiology, Wageningen University, Netherlands

This work extends the independent phyloseq package and data structures for R-based microbiome analysis.

nmgs's People

Forkers

beadyallen ml-lab nejcstopno apascualgarcia gaberoo myanggh

nmgs's Issues

Required number of iterations (C code?)

NMGS only generates the file_out_m.csv and file_out_s.csv files if the model runs for 50000 iterations; it only worked when I ran it for 50000. That is manageable for my smaller files, but the run on the larger one was killed at about iteration 27000. I think it has to do with memory starvation or something similar.

NAs and computation for very very large datasets

Hello,
I've downloaded the scripts and run the readme.md 'tutorial' fine on a large computer cluster.

I've been able to produce results from a mock (greatly reduced) dataset; however, I get
nan nan 0 1964 0.000000
nan nan 0 1963 0.000000
when I run ./Scripts/Sig.pl 1 3 and ./Scripts/Sig.pl 2 3 on the results (I checked that there were no rows with sum = 0, i.e. empty rows).

I've also used one of the (very large) datasets but tried to reduce the computing effort (b 10 and t 20) just to see how it goes. While each iteration takes a few seconds, the 'sampling fit....' step is taking forever.
Are there any limits on dataset size, or any suggestions on which parameters to use?

thanks,
D
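The empty-row check described in this report can be scripted. The following is a minimal Python sketch, assuming the input is a comma-separated count table with one header row, one row per OTU, the OTU label in the first column, and counts in the remaining columns; `zero_sum_rows` is a hypothetical helper, not part of NMGS. If your table stores OTUs as columns, transpose it first.

```python
import csv
import io


def zero_sum_rows(csv_text, skip_header=True):
    """Return (data_row_index, label) for each data row whose counts sum to zero.

    Assumed layout (hypothetical, adjust to your file): an optional header row,
    the OTU label in the first column, numeric counts in the remaining columns.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if skip_header:
        rows = rows[1:]
    empty = []
    for i, row in enumerate(rows):
        if not row:
            continue  # ignore blank lines
        counts = [float(x) for x in row[1:] if x.strip()]
        if sum(counts) == 0:
            empty.append((i, row[0]))
    return empty


if __name__ == "__main__":
    table = "OTU,S1,S2\nOTU1,3,0\nOTU2,0,0\n"
    print(zero_sum_rows(table))  # [(1, 'OTU2')]
```

Passing `open("input.csv").read()` instead of the inline table lists every all-zero row in a real input file.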

Segmentation Fault

Hi,
I'm running NMGS with microbial datasets. It runs OK with one containing 7000 OTUs and 122 samples using 20000 iterations (burnin 10000). Then, when I try another dataset with 122 samples but 17000 OTUs I get a Segmentation Fault error, even though there's plenty of RAM. I've tried reducing the number of iterations to see if that's the problem but same error pops up. Now I'm running it with more iterations to see what happens. Any idea/suggestion?
I've compiled it on Ubuntu, and there are a couple of warnings during compilation, but they don't seem to affect the program, as the test run works fine.
Do you have a binary linux version that I could try just to check it is not a compilation issue?
thanks in advance,
Ramiro

out_s.csv has "nan" in the third and fourth columns

Hi,

I have run NMGS on Simulation.csv and on another file of my experimental data without problems. However, when I ran it on a new dataset, the output out_s.csv file had "nan" values in the LL and LO columns.

Example:
25000,-43423.860267,nan,nan,7.833056,8.455062,9.456408,591,512,488

The out.csv and out_m.csv files both look normal to me (no "nan" values in either file). Any idea what might have gone wrong?

Thanks,
Fangqiong
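Affected rows like the example above can be located automatically. This is a minimal Python sketch, assuming out_s.csv is unquoted CSV, that its first column is the iteration number, and that LL and LO are the third and fourth columns as described in the report; `rows_with_nan` is a hypothetical helper. It only finds the affected iterations and does not explain why the values become NaN.

```python
import csv
import io
import math


def rows_with_nan(csv_text, columns=(2, 3)):
    """Return the first field (assumed: iteration number) of every row in which
    any of the given 0-based columns parses as NaN (default: LL and LO)."""
    flagged = []
    for row in csv.reader(io.StringIO(csv_text)):
        for c in columns:
            if c >= len(row):
                continue  # short or blank row
            try:
                value = float(row[c])
            except ValueError:
                continue  # non-numeric field, e.g. a header
            if math.isnan(value):
                flagged.append(row[0])
                break
    return flagged


if __name__ == "__main__":
    sample = "25000,-43423.860267,nan,nan,7.833056,8.455062,9.456408,591,512,488\n"
    print(rows_with_nan(sample))  # ['25000']
```

Calling it on `open("out_s.csv").read()` lists every iteration whose LL or LO value is NaN.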

Memory overload if very abundant OTUs are present (~10^5)

Hi Chris,

I found a problem with matrices containing OTUs with large abundances; I attach a reproducible example. With the attached matrix, the script allocates a lot of memory (>32 GB). I guess this is because the size of some of the generated structures depends on the OTU abundances. Simply deleting OTU 18, which has abundances on the order of 10⁵, makes it run normally.

Thanks in advance for your help,

Alberto

Example: test5.txt
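To find OTUs like OTU 18 before starting a run, one can flag rows whose largest count reaches a threshold. This is a minimal Python sketch, assuming a comma-separated table with a header row, one row per OTU, and the OTU label in the first column; `abundant_otus` is a hypothetical helper, and the default threshold of 10⁵ simply mirrors the abundance mentioned in this report. It only identifies candidates; whether removing them is acceptable for a given analysis is a separate question.

```python
import csv
import io


def abundant_otus(csv_text, threshold=1e5, skip_header=True):
    """Return labels of OTU rows whose largest count is >= threshold.

    Assumed layout (hypothetical): header row, OTU label in the first column,
    counts in the remaining columns.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if skip_header:
        rows = rows[1:]
    flagged = []
    for row in rows:
        counts = [float(x) for x in row[1:] if x.strip()]
        if counts and max(counts) >= threshold:
            flagged.append(row[0])
    return flagged


if __name__ == "__main__":
    table = "OTU,S1,S2\nOTU17,10,20\nOTU18,120000,95000\n"
    print(abundant_otus(table))  # ['OTU18']
```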

HDPSSample.py with indentation error

Hello,

I tried:
HDPSSample.py -t 21.39 -n 50 -o Trial.csv -s 1 -m 2.19

and got an IndentationError on line 132 ('else:')

thanks,
D

nmgs_metapopulation_average for the full model

Dear @antagomir,

I was wondering if you figured out how to reconstruct the composition of the metacommunity for the full neutral model. As far as I can see in the R vignette, this problem was not solved. You said:

Metacommunity distribution for the full neutral model (q) - are the species indices directly comparable, or how to combine across MCMC samples?

As far as I can see, the current version of nmgs_metapopulation_average returns TODO. I guess the NMGS script would need to return an indexed vector or something similar. Are you planning to tackle this problem?

Thanks in advance,