
Comments (14)

gavinmdouglas commented on August 22, 2024

Hi there,

My guess is that you have insufficient memory to run that command (at least with that many cores). How much RAM do you have?

Thanks,

Gavin

YuZhang-learner commented on August 22, 2024

[Screenshot of available memory; image not preserved]

YuZhang-learner commented on August 22, 2024

If the problem is caused by insufficient memory, can I split the samples into groups, analyze each group separately, and then compare the results?

gavinmdouglas commented on August 22, 2024

Hey @YuZhang-learner,

Hmm, with that much memory you wouldn't normally run into memory limitations with PICRUSt2 unless you had an incredibly large amount of input data. How many ASVs and samples are you inputting?

You could split the input data, but unfortunately there can be some batch effects when using PICRUSt2: the predictions can differ slightly depending on which sequences are input together, due to variation at the sequence placement step. The differences shouldn't be too big, but it is something to keep in mind.

All the best,

Gavin

YuZhang-learner commented on August 22, 2024

Thanks. Unfortunately, my NovaSeq data contain nearly 400,000 ASVs. I couldn't believe it either, but it's true.

> (e.g., the predictions can differ slightly depending on what input sequences are input together, which is due to variation at the sequence placement step). So the differences shouldn't be too big, but something to keep in mind.

So far, I have successfully run the following commands on all ASV sequences in the entire dataset:

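# Place all ASV sequences into the reference tree
place_seqs.py \
-s rs_rsingle_rep_seq.fna \
-o out.tre -p 20 \
--intermediate intermediate/place_seqs

# Predict 16S copy numbers per ASV; -n also computes NSTI values
hsp.py \
-i 16S \
-t out.tre \
-o marker_predicted_and_nsti.tsv.gz \
-p 20 \
-n

# Predict per-ASV copy numbers of EC numbers
hsp.py \
-i EC \
-t out.tre \
-o EC_predicted.tsv.gz \
-p 20

# Predict per-ASV copy numbers of KOs
hsp.py \
-i KO \
-t out.tre \
-o KO_predicted.tsv.gz \
-p 20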

But the next step,

pathway_pipeline.py \
-i EC_metagenome_out/pred_metagenome_contrib.tsv.gz \
-o pathways_out \
-p 20

reported an error.

Therefore, can the batch effect be minimized by running the first few steps on the full set of ASV sequences, and then running the subsequent steps separately on each group of data?

gavinmdouglas commented on August 22, 2024

Hi there,

Aha, yes: when I put this pipeline together I was envisioning much smaller datasets, so I bet it actually is a memory problem.

But yes, running the first few steps with all ASVs and then splitting the data into different samples should work fine.
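As a rough sketch of that splitting step (an illustration, not part of PICRUSt2): the place_seqs.py and hsp.py outputs are computed once from all ASVs and reused unchanged, and only the abundance table is divided into sample groups. The sketch assumes a tab-separated table with ASV IDs in the first column and one column per sample; the function names, the `ASV_ID` header label and the example groups are hypothetical.

```python
import csv
from typing import Dict, List

# ASV ID -> {sample ID -> count}
Table = Dict[str, Dict[str, float]]


def read_asv_table(path: str) -> Table:
    """Read a TSV with ASV IDs in the first column and one column per sample."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        samples = next(reader)[1:]
        return {row[0]: {s: float(v) for s, v in zip(samples, row[1:])} for row in reader}


def split_by_sample_groups(table: Table, groups: Dict[str, List[str]]) -> Dict[str, Table]:
    """Keep only each group's samples; drop ASVs absent from every sample in that group."""
    result: Dict[str, Table] = {}
    for group, samples in groups.items():
        sub: Table = {}
        for asv, counts in table.items():
            kept = {s: counts.get(s, 0.0) for s in samples}
            if any(v > 0 for v in kept.values()):
                sub[asv] = kept
        result[group] = sub
    return result


def write_asv_table(sub: Table, samples: List[str], path: str) -> None:
    """Write one group's table back out as a TSV (the header label is hypothetical)."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["ASV_ID"] + samples)
        for asv, counts in sub.items():
            writer.writerow([asv] + [format(counts[s], "g") for s in samples])


if __name__ == "__main__":
    table = {"asv1": {"s1": 5, "s2": 0, "s3": 2}, "asv2": {"s1": 0, "s2": 3, "s3": 0}}
    groups = {"groupA": ["s1", "s3"], "groupB": ["s2"]}
    for name, sub in split_by_sample_groups(table, groups).items():
        print(name, sorted(sub))
```

Each written file would then go to metagenome_pipeline.py as `-i`, with the same `-m marker_predicted_and_nsti.tsv.gz` and `-f EC_predicted.tsv.gz` files for every group, assuming your PICRUSt2 version accepts a TSV abundance table there (otherwise convert to BIOM first).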

YuZhang-learner commented on August 22, 2024

> But yes, running the first few steps with all ASVs and then splitting the data into different samples should work fine.

Yes, this works, but does it help reduce the batch effect?

gavinmdouglas commented on August 22, 2024

Yes, there should not be any batch effect as long as the placement + prediction steps are done with all the ASVs together.

Gavin

YuZhang-learner commented on August 22, 2024

Excuse me, I'd like to continue here with the issue we closed earlier: https://github.com/picrust/picrust2/issues/258

Hi @gavinmdouglas, thank you for your patience. I think I understand what you mean. Is it as follows?

place_seqs.py -s all_tax_rep_seqs.fna -o out.tre -p 20 --intermediate intermediate/place_seqs

hsp.py -i 16S -t out.tre -o marker_predicted_and_nsti.tsv.gz -p 20 -n

These two steps were run on the sequences of my whole dataset ("all_tax_rep_seqs.fna").

metagenome_pipeline.py -i rare_otu.biom -m marker_predicted_and_nsti.tsv.gz -f EC_predicted.tsv.gz -o EC_metagenome_out --strat_out

However, this step starts from an OTU table containing only part of the dataset, for example "rare_otu.biom". After running all of the subsequent commands, should I then rerun from this step with the other subset, "abundant_otu.biom"?

If I get PICRUSt2 results for rare, abundant, and all taxa using the method discussed above, how should I compare them? Can the results for rare and abundant taxa be converted into relative abundances? For example, I want to calculate the contribution of rare taxa to a particular metabolic function, that is, the ratio of the function's abundance from rare taxa to its abundance from all taxa.
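A minimal sketch of that ratio, assuming the per-ASV hsp.py predictions (16S copy numbers and the copy numbers of one function) are already loaded into dictionaries. It follows the general idea behind metagenome_pipeline.py (divide each ASV's counts by its predicted 16S copy number, then multiply by its predicted function copy number) but is a simplified stand-in that ignores the script's other options, such as its NSTI cutoff. Because the predictions come from a single run on all ASVs, the same per-ASV values enter both the rare and the all-ASV sums. All names are hypothetical.

```python
from typing import Dict, Iterable, Set

Table = Dict[str, Dict[str, float]]  # ASV ID -> {sample ID -> count}


def function_abundance(counts: Dict[str, float], marker_copies: Dict[str, float],
                       func_copies: Dict[str, float], asvs: Iterable[str]) -> float:
    """Sum over `asvs` of (count / predicted 16S copies) * predicted copies of the function."""
    total = 0.0
    for asv in asvs:
        n = counts.get(asv, 0.0)
        if n > 0:
            total += n / marker_copies[asv] * func_copies.get(asv, 0.0)
    return total


def rare_contribution(table: Table, marker_copies: Dict[str, float],
                      func_copies: Dict[str, float], rare: Set[str]) -> Dict[str, float]:
    """Per sample: share of the function's predicted abundance coming from rare ASVs."""
    samples = sorted({s for counts in table.values() for s in counts})
    shares: Dict[str, float] = {}
    for s in samples:
        counts = {asv: per_asv.get(s, 0.0) for asv, per_asv in table.items()}
        total = function_abundance(counts, marker_copies, func_copies, table)
        from_rare = function_abundance(counts, marker_copies, func_copies,
                                       [asv for asv in table if asv in rare])
        # NaN when no ASV in the sample contributes to the function at all
        shares[s] = from_rare / total if total > 0 else float("nan")
    return shares
```

A share is only defined for samples where the function has a non-zero predicted abundance; those samples are reported as NaN rather than silently set to zero.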

gavinmdouglas commented on August 22, 2024

Hey there,

So it sounds like you have already classified ASVs into rare and common, based on their relative abundances and independently of PICRUSt2, correct?

If you are interested in comparing the predicted genome content of these different ASV sets, then I would restrict the analysis to the genome prediction tables output at the hsp.py step. For instance, you could run a Fisher's exact test for each function to see whether the proportion of taxa that do or do not encode that function differs between the ASV sets. You could also simply compute, for a given function, the proportion of the taxa encoding it that you classified as rare.
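The second suggestion can be sketched as follows, assuming the hsp.py output (for example EC_predicted.tsv.gz) has ASV IDs in the first column and one column of predicted copy numbers per function, and treating a copy number above zero as "encodes the function". The function names are hypothetical, and the `rare` set is whatever classification was made outside PICRUSt2.

```python
import csv
import gzip
from typing import Dict, Set

Predictions = Dict[str, Dict[str, float]]  # ASV ID -> {function ID -> predicted copies}


def read_hsp_table(path: str) -> Predictions:
    """Read a gzipped hsp.py-style TSV: ASV ID first, then one column per function."""
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        functions = next(reader)[1:]
        return {row[0]: {f: float(v) for f, v in zip(functions, row[1:])} for row in reader}


def rare_share_of_encoders(predictions: Predictions, rare: Set[str], function: str) -> float:
    """Fraction of ASVs predicted to encode `function` (copies > 0) that are rare."""
    encoders = [asv for asv, copies in predictions.items() if copies.get(function, 0.0) > 0]
    if not encoders:
        return float("nan")
    return sum(asv in rare for asv in encoders) / len(encoders)
```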

Cheers,

Gavin

YuZhang-learner commented on August 22, 2024

> So it sounds like you have already classified ASVs into rare and common, based on the relative abundances independent on PICRUSt2, correct?

Yes. I ran the place_seqs.py and prediction steps on the sequences of all ASVs. I then classified the ASVs into rare and abundant ASVs and obtained three tables, for all, rare, and abundant ASVs, respectively.
Can I run the annotation steps on the three ASV tables separately and then compare the three results? Or is it possible to calculate the contribution of rare or abundant ASVs to a particular metabolic pathway, relative to the overall ASV table?

> If you are interested in comparing the predicted genome content of these different ASV sets then I would restrict the analysis to the genome prediction tables output at the hsp.py step. You could run Fisher's exact tests for instance for each function to see if there are different proportions of taxa that do or do not encode each function within each ASV set. You could also just get the proportion of taxa that encode a given function that you classified as rare.

Excuse me, can you give me an example and explain it in detail? I don't quite follow you.

gavinmdouglas commented on August 22, 2024

Hi again,

To clarify, what I meant was that in this case it sounds like you are interested in whether the rare vs. common ASVs are enriched for different functions. So rather than looking at the relative abundance of functions across samples, it would be clearer to just directly compare which genes/functions are encoded in each ASV's predicted genome (which is what hsp.py outputs at the prediction steps).

So for a given function X you could create a contingency table of these four categories:

  • the number of rare ASVs that encode X
  • the number of rare ASVs that do not encode X
  • the number of common ASVs that encode X
  • the number of common ASVs that do not encode X
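For a single function X, the four counts above and a two-sided Fisher's exact test could be computed roughly like this. It is a self-contained sketch using only the standard library (`math.comb`, Python 3.8+); in practice `scipy.stats.fisher_exact` would be the usual choice. It assumes the hsp.py predictions are already loaded as ASV ID -> {function: copy number}, treats copies > 0 as "encodes X", and all names are hypothetical.

```python
import math
from typing import Dict, Set, Tuple

Predictions = Dict[str, Dict[str, float]]  # ASV ID -> {function ID -> predicted copies}
Table2x2 = Tuple[Tuple[int, int], Tuple[int, int]]


def contingency_table(predictions: Predictions, rare: Set[str], common: Set[str],
                      function: str) -> Table2x2:
    """((rare encode, rare not), (common encode, common not)) for one function."""
    def encodes(asv: str) -> bool:
        return predictions[asv].get(function, 0.0) > 0

    rare_yes = sum(encodes(asv) for asv in rare)
    common_yes = sum(encodes(asv) for asv in common)
    return ((rare_yes, len(rare) - rare_yes), (common_yes, len(common) - common_yes))


def fisher_exact_two_sided(table: Table2x2) -> float:
    """Two-sided Fisher's exact test p-value for a 2x2 table."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    denom = math.comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins held fixed
        return math.comb(row1, x) * math.comb(row2, col1 - x) / denom

    observed = prob(a)
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    # Sum every table that is no more probable than the observed one
    # (a small relative tolerance guards against floating-point ties)
    p = sum(prob(x) for x in support if prob(x) <= observed * (1 + 1e-7))
    return min(1.0, p)
```

For the table ((8, 2), (1, 5)) this returns 400/11440 ≈ 0.035. Since the test would be repeated for every function, the resulting p-values would normally be adjusted for multiple testing (for example with the Benjamini-Hochberg procedure).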

Hopefully that's clearer.

Cheers,

Gavin

YuZhang-learner commented on August 22, 2024

Do you mean that I can directly compare the number of ASVs encoding function X between rare and common taxa? But I also want to compare whether the functions of the rare taxa differ between sample groups, so would it be more reliable to convert to relative abundance? For example: number of rare ASVs that encode X / number of all ASVs that encode X. I got the number of rare ASVs that encode X by running the annotation step separately.
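The ratio proposed above can also be computed per sample, so that sample groups can be compared, using only each ASV's presence in a sample and the hsp.py predictions from the single run on all ASVs. A minimal sketch, assuming an ASV is "present" in a sample when its count is above zero and "encodes X" when its predicted copy number is above zero; the names are hypothetical.

```python
from typing import Dict, Set

Table = Dict[str, Dict[str, float]]        # ASV ID -> {sample ID -> count}
Predictions = Dict[str, Dict[str, float]]  # ASV ID -> {function ID -> predicted copies}


def per_sample_rare_encoder_ratio(table: Table, predictions: Predictions,
                                  rare: Set[str], function: str) -> Dict[str, float]:
    """Per sample: (# rare ASVs present that encode X) / (# ASVs present that encode X)."""
    samples = sorted({s for counts in table.values() for s in counts})
    ratios: Dict[str, float] = {}
    for s in samples:
        encoders = [
            asv for asv, counts in table.items()
            if counts.get(s, 0.0) > 0 and predictions.get(asv, {}).get(function, 0.0) > 0
        ]
        # NaN when no ASV present in this sample encodes the function
        ratios[s] = (sum(asv in rare for asv in encoders) / len(encoders)
                     if encoders else float("nan"))
    return ratios
```

Counting ASVs this way ignores how abundant each one is; an abundance-weighted version would sum counts instead of counting ASVs.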

gavinmdouglas commented on August 22, 2024

Sorry I missed your reply. I don't think I understand exactly what you're trying to do. It sounds like you have two sets of taxa, rare and common, and you want to know whether the rare taxa are enriched for a given function compared to the common taxa?
