
q2-repeat-rarefy's Introduction

q2-repeat-rarefy: QIIME2 plugin for generating the average rarefied table for library size normalization using repeated rarefaction

  • When handling a sparse dataset, I noticed that rare taxa were easily lost by the traditional one-shot rarefaction.
  • To deal with this problem, I proposed the "Average Rarefied Table" method and wrote a very simple plugin (reference: https://github.com/qiime2/q2-feature-table/tree/master/q2_feature_table/_normalize.py).
  • Repeat rarefy simply runs random rarefaction N times and computes the average count of each OTU (ASV/feature), rounding fractional averages up, to generate the final average rarefied OTU table.
  • Compared with one-shot rarefaction, normalizing library size with repeat rarefy retains significantly more OTUs (unpublished results).
  • Because fractional average counts are rounded up, the total OTU count may not be exactly the same for every sample.
  • This method has the potential to be an alternative to the current one-shot rarefaction, as it retains more information and reduces the random variation in composition that a single subsampling introduces.
  • In addition to OTU (ASV/feature) tables, the "Average Rarefied Table" method can also be extended to other profile tables (e.g., taxonomic profile tables, gene profile tables).
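
The procedure described in the list above can be sketched in a few lines of NumPy. This is a minimal illustration for a single sample, not the plugin's actual code: the function name `repeat_rarefy` and its parameters are hypothetical, rarefaction is assumed to mean subsampling reads without replacement, and "round up" is assumed to mean the ceiling of the average.

```python
import numpy as np


def repeat_rarefy(counts, sampling_depth, repeat_times, seed=None):
    """Average `repeat_times` random rarefactions of one sample, rounding up.

    Hypothetical helper for illustration; not the plugin's implementation.
    """
    counts = np.asarray(counts, dtype=np.int64)
    if counts.sum() < sampling_depth:
        raise ValueError("sample has fewer reads than the sampling depth")
    rng = np.random.default_rng(seed)
    # Each row is one rarefaction: `sampling_depth` reads drawn without
    # replacement from the sample's reads.
    draws = rng.multivariate_hypergeometric(
        counts, sampling_depth, size=repeat_times
    )
    # Assumption: fractional averages are rounded up (ceiling).
    return np.ceil(draws.mean(axis=0)).astype(np.int64)


# Made-up counts for one sample with a few rare features (3000 reads total).
sample = np.array([1500, 900, 400, 150, 40, 6, 3, 1])
one_shot = repeat_rarefy(sample, 2000, 1, seed=0)
averaged = repeat_rarefy(sample, 2000, 100, seed=0)
print("one-shot:", one_shot)
print("averaged:", averaged)
```

Because every fractional average is rounded up, the averaged row sums to at least the sampling depth, and a feature is retained if it was drawn in any of the repeats.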

Installing

conda activate qiime2-2020.11
pip install git+https://github.com/yxia0125/q2-repeat-rarefy.git

Type "qiime repeat-rarefy" to test if the installation is successful.

Uninstalling

pip uninstall q2-repeat-rarefy

Using

qiime repeat-rarefy repeat-rarefy --i-table table.qza \
                                  --p-sampling-depth 2000 \
                                  --p-repeat-times 100 \
                                  --o-rarefied-table average_rarefied_table.qza

The example above rarefies 'table.qza' to a sampling depth of 2000, repeating the rarefaction 100 times, and writes the result to 'average_rarefied_table.qza'.
Set the sampling depth to suit your own dataset; the number of repeats can be increased to 1,000, 10,000, or more.

Citing

If you use this method, please include the following citation:

Yao Xia, q2-repeat-rarefy: QIIME2 plugin for generating the average rarefied table for library size normalization using repeated rarefaction, (2021), GitHub repository, https://github.com/yxia0125/q2-repeat-rarefy.

q2-repeat-rarefy's Issues

I think doing repeated rarefaction is statistically incorrect

Hi,
I saw your q2-repeat-rarefy qiime2 plugin and really appreciate your contribution to the microbiome community.
However, I think the idea of multiple rarefaction is incorrect from a statistical point of view:
The purpose of rarefaction is to remove the effects which are due to different read depths in the different samples. For example, let's take the situation where we have a single biological sample, and we sequence it to two depths (and assume for each depth we have 10 technical repeats): 10 repeats with 10k reads and 10 repeats with 20k reads.
If we rarefy all repeats to 10k reads/repeat, and then look for differences between the repeats originating from 10k reads and those originating from 20k reads, we will get no significant differences, as we would expect.
However, if we instead apply the repeat-rarefy procedure to the 20k-read repeats, and then look for differences between the repeats originating from 10k reads and those originating from 20k reads, I think we may find some bacteria that differ between the 2 groups.
To explain why I think this will happen, let's assume we have some rare bacteria (say 100) that are at a (true) frequency of 1/10000 in the original sample. In the 10k reads/sample repeats, we expect to get approx. 50 of the rare bacteria with 1 read, and 50 with 0 reads. In the 20k reads/sample repeats, we expect to get approx. all the rare bacteria with 1 read/bacteria.
If we just rarefy to 10k reads/sample, we will lose approx. 50 of these rare bacteria and keep the other 50 (similar to the 10k reads/sample repeats).
However, if we do repeat-rarefy, we will get approx. 0.5 read/sample for these 100 rare bacteria. Then (if we round up), we will get 1 read/sample for the 100 rare bacteria. Therefore, the result will differ from the 10k reads/sample repeats.

Another way to think about it is that doing an infinite number of repeat-rarefy iterations is equivalent to total-sum scaling (i.e. infinite repeat-rarefaction to 10k reads is similar to normalizing by dividing by the original number of reads in the sample and multiplying by 10k).

Will be happy to continue the discussion.
And please do not let this discourage you from continuing to contribute to the microbiome and qiime2 community!
Amnon
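
The equivalence raised in this issue can be checked numerically. The sketch below is an illustration only, using made-up counts and assuming rarefaction means subsampling reads without replacement; the helper name `mean_rarefied` is hypothetical. It compares the average of many rarefactions with total-sum scaling to the same depth.

```python
import numpy as np


def mean_rarefied(counts, depth, repeats, rng):
    """Average (without rounding) of `repeats` rarefactions to `depth` reads."""
    draws = rng.multivariate_hypergeometric(counts, depth, size=repeats)
    return draws.mean(axis=0)


rng = np.random.default_rng(42)
# Made-up sample with 20,000 reads, including two rare features.
counts = np.array([12000, 6000, 1500, 400, 90, 8, 2])
depth = 10_000

# Total-sum scaling: relative abundance multiplied by the target depth.
tss = counts / counts.sum() * depth
avg = mean_rarefied(counts, depth, 5000, rng)

# The gap shrinks as the number of repeats grows.
max_gap = np.abs(avg - tss).max()
# Rounding up pushes every nonzero fractional average to at least 1 read.
rounded_up = np.ceil(avg)

print("total-sum scaling:", tss)
print("mean of repeats:  ", avg)
print("largest gap:      ", max_gap)
print("rounded up:       ", rounded_up)
```

Each feature's expected count in a single rarefaction equals its total-sum-scaled value, so the average converges to it; the rounding step is what then lifts rare features with fractional averages up to a whole read.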

How to cite this software?

Hi,
I wanted to ask whether it would be possible for you to generate a DOI for the software, for instance using Zenodo, https://zenodo.org, which would make it easier to cite the relevant version of the software.

Many thanks in advance,

Inti Pedroso

Repeat rarefaction using a phyloseq object in R

Hi @yxia0125. I was wondering if you have a similar function in R that gives an averaged rarefied table from a phyloseq object. I have tried a couple of functions, but they have very different objectives and hence produce different output from what I am looking for. I am aware that the phyloseq object can be converted and imported into the QIIME environment; however, I am trying to avoid a lot of back and forth in my analysis pipeline. Any help is appreciated!
Thanks!
