Sequence ids don't overlap at metagenome pipeline step about picrust2 HOT 11 CLOSED

picrust commented on August 22, 2024

Sequence ids don't overlap at metagenome pipeline step

from picrust2.

Comments (11)

JCSzamosi commented on August 22, 2024

Oh, hah, good catch! Thanks.

It seems to be working now that I have removed the consensus lineage column and prepended a string to the sequence ids. Sorry about the bug recurring :(


gavinmdouglas commented on August 22, 2024

@itiago - This step takes the relative abundances of the OTUs or ASVs and multiplies the predicted abundance of each gene family by them. It then outputs a table of function abundances for each sample (both stratified and unstratified by which sequence contributed each function). The predicted abundance of marker genes is also used to normalize the abundances of the input OTUs/ASVs.
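As a rough illustration of the arithmetic described above, here is a minimal sketch in plain Python. It is not PICRUSt2's implementation: the function names, the toy ids, the numbers, and the order of operations (normalize first, then multiply) are all assumptions made for the example.

```python
# Toy inputs; every name and number here is made up for illustration.
# Read counts per ASV per sample.
abundance = {
    "seq1": {"sampleA": 10.0, "sampleB": 0.0},
    "seq2": {"sampleA": 30.0, "sampleB": 20.0},
}
# Predicted marker gene (e.g. 16S) copy number per ASV.
marker_copies = {"seq1": 2.0, "seq2": 5.0}
# Predicted gene family copy number per ASV.
function_copies = {
    "seq1": {"K00001": 1.0, "K00002": 0.0},
    "seq2": {"K00001": 2.0, "K00002": 3.0},
}


def normalize_by_marker(abundance, marker_copies):
    """Divide each ASV's abundance by its predicted marker copy number."""
    return {
        asv: {s: count / marker_copies[asv] for s, count in samples.items()}
        for asv, samples in abundance.items()
    }


def predict_functions(norm_abundance, function_copies):
    """Return (stratified, unstratified) function abundances."""
    stratified = {}    # (sample, function, asv) -> contribution of that ASV
    unstratified = {}  # (sample, function) -> total over all ASVs
    for asv, samples in norm_abundance.items():
        for sample, abund in samples.items():
            for func, copies in function_copies[asv].items():
                contrib = abund * copies
                stratified[(sample, func, asv)] = contrib
                key = (sample, func)
                unstratified[key] = unstratified.get(key, 0.0) + contrib
    return stratified, unstratified


norm = normalize_by_marker(abundance, marker_copies)
strat, unstrat = predict_functions(norm, function_copies)
```

Every lookup in this sketch is keyed by sequence id, so an id that is spelled differently in any one of the three inputs would leave nothing to combine. That is the situation the "No sequence ids overlap" error reports.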

I believe the problem you're running into here is that the cluster ids in the mothur output file don't match the ids in the fasta file you placed into the tree. Is this correct? For instance, is the sequence "M01028_125_000000000-AN36D_1_1101_9843_5463" the name of an OTU in the mothur output file? If so I'm not sure why this error is coming up and it would be great if you could send me the input files you're trying to use privately.
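One way to check for this kind of mismatch before running the pipeline is to compare the ids in the FASTA file with the ids in the first column of the abundance table. The sketch below is a standalone check, not part of PICRUSt2. The helper names and the example ids are hypothetical, and it assumes a tab-delimited table whose first non-empty line is a header.

```python
import io


def fasta_ids(handle):
    """Collect sequence ids: the first whitespace-delimited token after '>'."""
    ids = set()
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def table_ids(handle, sep="\t"):
    """Collect ids from the first column, skipping the header line."""
    ids = set()
    header_seen = False
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        if not header_seen:
            header_seen = True  # assume the first non-empty line is a header
            continue
        ids.add(line.split(sep)[0])
    return ids


# Toy data: one id matches, one is only in the FASTA, one only in the table.
fasta = io.StringIO(">Otu001\nACGT\n>Otu002\nGGCC\n")
table = io.StringIO("OTU\tsampleA\tsampleB\nOtu001\t5\t0\nread_0001\t1\t2\n")

in_fasta = fasta_ids(fasta)
in_table = table_ids(table)
shared = in_fasta & in_table
only_fasta = in_fasta - in_table
only_table = in_table - in_fasta
```

With real files, replace the `io.StringIO` objects with `open(path)` handles. An empty `shared` set would point to the kind of id mismatch described above.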

Thanks,

Gavin


gavinmdouglas commented on August 22, 2024

I believe this problem is due to confusion about which sequences should be added into the tree (i.e. that it should be OTU representative sequences in this case) so I'm closing it for now.


itiago commented on August 22, 2024

Agreed. I will rerun the workflow with a representative sequence for each OTU and rename the OTU designations accordingly. Thanks for the help.

On Tuesday, 29/05/2018, 17:36, Gavin Douglas wrote:

I believe this problem is due to confusion about which sequences should be added into the tree (i.e. that it should be OTU representative sequences in this case) so I'm closing it for now.


pedres commented on August 22, 2024

Hi Gavin,

I am having the same problem when running picrust2. I installed the latest version and ran the full pipeline command with the tutorial files without problems. However, when I tried to run the pipeline with my own data set, the process failed. I used dada2 to process my samples (6456 ASVs in 24 samples) and ran the pipeline with the following command:

    picrust2_pipeline.py -s ASV_raref.fa -i ASV_raref.txt -o picrust2_out_MENCIA --threads 24 -n

The error is:
    Traceback (most recent call last):
      File "/home/fulgencio/miniconda3/envs/picrust2/bin/picrust2_pipeline.py", line 7, in <module>
        exec(compile(f.read(), __file__, 'exec'))
      File "/home/fulgencio/picrust2/scripts/picrust2_pipeline.py", line 227, in <module>
        main()
      File "/home/fulgencio/picrust2/scripts/picrust2_pipeline.py", line 220, in main
        verbose=args.verbose)
      File "/home/fulgencio/picrust2/picrust2/pipeline.py", line 195, in full_pipeline
        verbose=verbose)
      File "/home/fulgencio/picrust2/picrust2/pipeline.py", line 357, in metagenome_pipeline_steps
        output_normfile=True)
      File "/home/fulgencio/picrust2/picrust2/metagenome_pipeline.py", line 64, in run_metagenome_pipeline
        pred_marker)
      File "/home/fulgencio/picrust2/picrust2/util.py", line 300, in three_df_index_overlap_sort
        "input files.")
    ValueError: No sequence ids overlap between all three of the input files.

I think that in my case representative sequences are not the problem because I used dada2 (not de novo OTUs). I have checked that the sequence IDs in the FASTA file and in the table are the same. I have attached the files I used.

Thank you very much in advance.

Manuel

fasta_and_table.zip


gavinmdouglas commented on August 22, 2024

This is a cross-post of https://groups.google.com/forum/#!topic/picrust-users/HdZjZtYHRbQ and was resolved.


JCSzamosi commented on August 22, 2024

I am getting this same error, but when I look at my three input files, the sequence IDs are, in fact, identical across all three files. I've attached the three files here (subset to their first 9 sequences; the error still persists):

16S_head.txt
KO_head.txt
asvtab_head.txt


gavinmdouglas commented on August 22, 2024

Hey @JCSzamosi,

What version of PICRUSt2 are you using? I think the issue is that the ids are being interpreted as strings in one file and as integers in the others. This problem should be fixed in the latest release. In the meantime, a quick fix is to add a string to the beginning of each sequence id (like "seq1" rather than "1").
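The effect is easy to reproduce outside PICRUSt2. The following sketch uses a hypothetical reader, `read_ids`, that optionally converts all-digit ids to integers, which mimics a parser that infers column types. It is only an illustration of why `1` and `"1"` never match, and of why a prefix sidesteps the problem; it makes no claim about how PICRUSt2 reads its inputs.

```python
def read_ids(text, infer_types):
    """Return the first-column ids of a tab-delimited table (header skipped).

    With infer_types=True, all-digit ids become ints, as a type-inferring
    parser might do. This helper is hypothetical, for illustration only.
    """
    ids = set()
    for line in text.strip().splitlines()[1:]:
        raw = line.split("\t")[0]
        ids.add(int(raw) if infer_types and raw.isdigit() else raw)
    return ids


def add_prefix(text, prefix="seq"):
    """Prepend a string to every id in the first column (header untouched)."""
    lines = text.strip().splitlines()
    return "\n".join([lines[0]] + [prefix + line for line in lines[1:]]) + "\n"


table_text = "sequence\tsampleA\n1\t10\n2\t30\n"

# The same file read two ways gives ids that look identical but never match.
as_int = read_ids(table_text, infer_types=True)
as_str = read_ids(table_text, infer_types=False)
overlap_before = as_int & as_str

# After prefixing, the ids can no longer be mistaken for integers.
fixed_text = add_prefix(table_text)
overlap_after = read_ids(fixed_text, True) & read_ids(fixed_text, False)
```

The same prefix has to be applied consistently to the FASTA headers and to every table that carries the ids, otherwise the mismatch just moves elsewhere.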


JCSzamosi commented on August 22, 2024

Thanks so much for the quick response!

I'm using version 2.2.0_b, which I assume is the latest since it's what I got from following the instructions here.

Prepending the sequence IDs with a string has created a new error. Log and input files attached below:

16S_head.txt
asvtab_head.txt
KO_head.txt
metagenome_pipe.log


gavinmdouglas commented on August 22, 2024

Hmm that's annoying that the problem resurfaced in v2.2.0-b - I thought it was resolved.

Anyway I think the new issue is because of the "Consensus Lineage" column in your BIOM table - the script is expecting only numeric columns.
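If the table has already been exported to tab-delimited text, one way to strip such a column is sketched below. This is a generic helper, not a PICRUSt2 or BIOM function: `drop_non_numeric_columns` and the toy table are made up for the example, and it assumes a rectangular table with one header line and the ids in the first column.

```python
def is_number(value):
    """True if the string parses as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def drop_non_numeric_columns(text, sep="\t"):
    """Keep the id column plus every column whose values are all numeric."""
    rows = [line.split(sep) for line in text.strip().splitlines()]
    header, body = rows[0], rows[1:]
    keep = [0] + [
        j for j in range(1, len(header))
        if all(is_number(row[j]) for row in body)
    ]
    kept_rows = [[row[j] for j in keep] for row in rows]
    return "\n".join(sep.join(row) for row in kept_rows) + "\n"


# Toy table with a taxonomy column like the one mentioned above.
table_text = (
    "sequence\tsampleA\tsampleB\tConsensus Lineage\n"
    "seq1\t10\t0\tk__Bacteria; p__Firmicutes\n"
    "seq2\t30\t20\tk__Bacteria; p__Proteobacteria\n"
)
cleaned = drop_non_numeric_columns(table_text)
```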

Gavin


manterd commented on August 22, 2024

I too had this problem when using a mothur shared file, but converting it to a BIOM file with mothur's 'make.biom(shared=<shared_file>)' fixed the problem. It seems the extra columns in the shared file are the problem...
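For context, a mothur shared file has one row per sample and starts with three bookkeeping columns (label, Group, numOtus) before the OTU counts. The sketch below shows what removing those columns and putting OTUs on the rows could look like. It is not what `make.biom` does internally; `shared_to_table` is a hypothetical helper, and it assumes the file contains a single label (distance cutoff) and that the downstream tool wants OTUs as rows and samples as columns.

```python
def shared_to_table(text, sep="\t"):
    """Turn mothur shared-file text into an OTU-by-sample count table.

    Assumes one header line, a single label, and the column order
    label, Group, numOtus, then one column per OTU.
    """
    rows = [line.split(sep) for line in text.strip().splitlines()]
    header, body = rows[0], rows[1:]
    otu_names = header[3:]                # skip label, Group, numOtus
    samples = [row[1] for row in body]    # the Group column holds sample names
    out = [sep.join(["OTU"] + samples)]
    for k, otu in enumerate(otu_names):
        counts = [row[3 + k] for row in body]
        out.append(sep.join([otu] + counts))
    return "\n".join(out) + "\n"


# Toy shared file with two samples and two OTUs.
shared_text = (
    "label\tGroup\tnumOtus\tOtu001\tOtu002\n"
    "0.03\tsampleA\t2\t10\t30\n"
    "0.03\tsampleB\t2\t0\t20\n"
)
table = shared_to_table(shared_text)
```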

