xy-repo / q2-repeat-rarefy

License: BSD 3-Clause "New" or "Revised" License

Python 96.03% TeX 3.97%

q2-repeat-rarefy's Introduction

q2-repeat-rarefy: QIIME2 plugin for generating the average rarefied table for library size normalization using repeated rarefaction

When handling a sparse dataset, I noticed that traditional one-shot rarefaction easily drops rare taxa.
To deal with this problem, I proposed the "Average Rarefied Table" method and wrote a very simple plugin (reference: https://github.com/qiime2/q2-feature-table/tree/master/q2_feature_table/_normalize.py).
Repeat rarefy simply runs random rarefaction N times and computes the average count of each OTU (ASV/feature), rounding fractional averages up, to generate the final average rarefied OTU table.
In unpublished results, normalizing library size with repeat rarefy kept significantly more OTUs than one-shot rarefaction.
Because the fractional average count of each OTU is rounded up, the total OTU count of each sample may not be exactly the same.
This method has the potential to be an alternative to the current one-shot rarefaction, as it can retain information and avoid variation in composition.
In addition to OTU (ASV/feature) tables, the "Average Rarefied Table" method can also be extended to other profile tables (e.g., taxonomic profile tables, gene profile tables).
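
The averaging step described above can be sketched for a single sample as follows. This is only an illustration under stated assumptions, not the plugin's code: the function name `average_rarefied_counts`, its parameters, and the use of NumPy's multivariate hypergeometric sampler as a stand-in for rarefaction without replacement are all hypothetical choices made for this example.

```python
import numpy as np


def average_rarefied_counts(counts, depth, repeats, seed=None):
    """Average `repeats` random rarefactions of one sample, rounding up.

    Illustrative sketch only: the name, signature, and NumPy-based
    subsampling are assumptions, not taken from the plugin.
    """
    counts = np.asarray(counts, dtype=np.int64)
    if counts.sum() < depth:
        raise ValueError("sample has fewer reads than the sampling depth")
    rng = np.random.default_rng(seed)
    # Each row is one rarefaction: `depth` reads drawn without replacement.
    draws = rng.multivariate_hypergeometric(counts, depth, size=repeats)
    # Average per feature, then round fractional averages up.
    return np.ceil(draws.mean(axis=0)).astype(np.int64)


# Made-up sample: three abundant features and three rare ones.
sample = np.array([5000, 3000, 1990, 5, 3, 2])
averaged = average_rarefied_counts(sample, depth=2000, repeats=100, seed=0)
print(averaged, averaged.sum())
```

Because every fractional average is rounded up, the printed total can be slightly larger than the sampling depth, which is the behavior noted above for per-sample totals.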

Installing

conda activate qiime2-2020.11
pip install git+https://github.com/yxia0125/q2-repeat-rarefy.git

Type "qiime repeat-rarefy" to test if the installation is successful.

Uninstalling

pip uninstall q2-repeat-rarefy

Using

qiime repeat-rarefy repeat-rarefy --i-table table.qza \
                                  --p-sampling-depth 2000 \
                                  --p-repeat-times 100 \
                                  --o-rarefied-table average_rarefied_table.qza

The above example rarefies 'table.qza' at a sampling depth of 2000 with 100 repeats and writes the result to 'average_rarefied_table.qza'.
You can set the sampling depth based on your own dataset and increase the repeat times to 1,000, 10,000, or more.

Citing

If you are interested in using this method, please include the following citation:

Yao Xia, q2-repeat-rarefy: QIIME2 plugin for generating the average rarefied table for library size normalization using 
repeated rarefaction, (2021), GitHub repository, https://github.com/yxia0125/q2-repeat-rarefy.

q2-repeat-rarefy's Issues

I think doing repeated rarefaction is statistically incorrect

Hi,
I saw your q2-repeat-rarefy qiime2 plugin and really appreciate your contribution to the microbiome community.
However, I think the idea of multiple rarefaction is incorrect from a statistical point of view:
The purpose of rarefaction is to remove the effects that are due to different read depths in the different samples. For example, let's take the situation where we have a single biological sample, and we sequence it to two depths (and assume for each depth we have 10 technical repeats): 10 repeats with 10k reads and 10 repeats with 20k reads.
If we rarefy all repeats to 10k reads/repeat, and then look for differences between the repeats originating from 10k reads and those originating from 20k reads, we will get no significant differences, as we would expect.
However, if we instead apply the repeat-rarefy procedure to the 20k reads repeats, and then look for differences between the repeats originating from 10k reads and those originating from 20k reads, I think we may get some bacteria that differ between the 2 groups.
To explain why I think this will happen, let's assume we have some rare bacteria (say 100) that are each at a (true) frequency of 1/20000 in the original sample. In the 10k reads/sample repeats, we expect approx. 50 of the rare bacteria to have 1 read and 50 to have 0 reads. In the 20k reads/sample repeats, we expect approx. all the rare bacteria to have 1 read/bacteria.
If we just rarefy to 10k reads/sample, we will lose approx. 50 of these rare bacteria and keep the other 50 (similar to the 10k reads/sample repeats).
However, if we do repeat-rarefy, we will get approx. 0.5 reads/sample for each of these 100 rare bacteria. Then (if we round up), we will get 1 read/sample for all 100 rare bacteria, and therefore the result will differ from the 10k reads/sample repeats.

Another way to think about it is that doing an infinite number of repeat-rarefy runs is equivalent to total-sum scaling (i.e., infinite repeat-rarefaction to 10k reads is similar to normalizing by dividing by the original number of reads in the sample and multiplying by 10k).

Will be happy to continue the discussion.
And please do not let this discourage you from continuing to contribute to the microbiome and qiime2 community!
Amnon
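
The two claims in this issue can be checked numerically. The following is a minimal sketch, assuming NumPy's multivariate hypergeometric sampler as a stand-in for rarefaction without replacement; the sample composition (one abundant taxon plus 100 rare taxa with 1 read each in a 20k-read sample) and all variable names are illustrative and do not come from the plugin.

```python
import numpy as np

rng = np.random.default_rng(42)
depth, repeats = 10_000, 1_000

# Hypothetical 20k-read sample: one abundant taxon plus 100 rare taxa
# that each received exactly 1 read.
counts = np.array([19_900] + [1] * 100)

# Each row is one rarefaction to `depth` reads, drawn without replacement.
draws = rng.multivariate_hypergeometric(counts, depth, size=repeats)
mean_counts = draws.mean(axis=0)

# Rare taxa still present after a single (one-shot) rarefaction.
one_shot_rare = int((draws[0, 1:] > 0).sum())
# Rare taxa still present after averaging all repeats and rounding up.
averaged_rare = int((np.ceil(mean_counts[1:]) > 0).sum())

# Total-sum scaling to the same depth, for comparison with the raw averages.
tss = counts * depth / counts.sum()
max_gap = float(np.abs(mean_counts - tss).max())

print(one_shot_rare, averaged_rare, max_gap)
```

In this setup a single rarefaction keeps roughly half of the rare taxa, while the rounded-up average keeps essentially all of them, and the unrounded averages sit close to the total-sum-scaled values. The sketch only illustrates the arithmetic of the argument; it does not show how the plugin behaves on real data.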

How to cite this software?

Hi,
I wanted to ask whether it would be possible for you to generate a DOI for the software, for instance using Zenodo, https://zenodo.org, which would make it easier to cite the relevant version of the software.

Many thanks in advance,

Inti Pedroso

Repeat rarefaction using a phyloseq object in R

Hi @yxia0125. I was wondering if you have a similar function in R that gives an averaged rarefied table from a phyloseq object. I have tried a couple of functions, but they have very different objectives and hence produce different output from what I am looking for. I am aware that the phyloseq object can be converted and imported into the QIIME environment; however, I am trying to avoid a lot of back and forth in my analysis pipeline. Any help is appreciated!
Thanks!

How does this script deal with a sample depth lower than the setting?

Dear Xia

How will this script work if the sample depth is lower than the set number, e.g., the setting is 10,000 but one sample only has 8,000 reads? Will all 8,000 reads be used, or will this sample be discarded?

Thanks
