Deion of the bug Phyloseq object creation will fail if any s

Thanks for promptly looking into this <a class="user-mention notranslate" data-hoverca

Phyloseq object creation will fail if any samples have all reads removed by the tax filtering step about ampliseq HOT 5 CLOSED

agrier-wcm commented on September 20, 2024

Phyloseq object creation will fail if any samples have all reads removed by the tax filtering step

from ampliseq.

Comments (5)

d4straub commented on September 20, 2024

Thanks for the report, that indeed requires fixing.

from ampliseq.

d4straub commented on September 20, 2024

I tested:
nextflow run nf-core/ampliseq -r 2.7.1 -profile test_failed,singularity --outdir result_test_failed_2-7-1 that is loosing a sample during early QC. This sample is missing in the ASV table but phyloseq works just fine.
nextflow run nf-core/ampliseq -r 2.7.1 -profile test_failed,singularity --outdir result_test_failed_2-7-1 --min_frequency 135 --exclude_taxa Moraxellaceae --diversity_rarefaction_depth 100 --skip_diversity_indices --skip_ancom to make it also loose ASV so that an additional sample will loose all counts during taxonomic filtering (QIIME2_FILTERTAXA), and consequentely only 3 samples are left in the ASV table. But the pipeline is still fine and the phyloseq object is created.

@agrier-wcm I'll need more information here, I cannot reproduce the problem.

from ampliseq.

agrier-wcm commented on September 20, 2024

Thanks for promptly looking into this @d4straub. I have a relatively small dataset that works fine with 2.6.1, but I get this error with 2.7.1:

Error executing process > 'NFCORE_AMPLISEQ:AMPLISEQ:PHYLOSEQ_WORKFLOW:PHYLOSEQ (dada2)'

Caused by:
  Process `NFCORE_AMPLISEQ:AMPLISEQ:PHYLOSEQ_WORKFLOW:PHYLOSEQ (dada2)` terminated with an error exit status (1)

Command executed:

  #!/usr/bin/env Rscript
  
  suppressPackageStartupMessages(library(phyloseq))
  
  otu_df  <- read.table("reformat_filtered-table.tsv", sep="\t", header=TRUE, row.names=1)
  tax_df  <- read.table("ASV_tax_species.silva.tsv", sep="\t", header=TRUE, row.names=1)
  otu_mat <- as.matrix(otu_df)
  tax_mat <- as.matrix(tax_df)
  
  OTU     <- otu_table(otu_mat, taxa_are_rows=TRUE)
  TAX     <- tax_table(tax_mat)
  phy_obj <- phyloseq(OTU, TAX)
  
  if (file.exists("Metadata.tsv")) {
      sam_df  <- read.table("Metadata.tsv", sep="\t", header=TRUE, row.names=1)
      SAM     <- sample_data(sam_df)
      phy_obj <- merge_phyloseq(phy_obj, SAM)
  }
  
  if (file.exists("")) {
      TREE    <- read_tree("")
      phy_obj <- merge_phyloseq(phy_obj, TREE)
  }
  
  saveRDS(phy_obj, file = paste0("dada2", "_phyloseq.rds"))
  
  # Version information
  writeLines(c("\"NFCORE_AMPLISEQ:AMPLISEQ:PHYLOSEQ_WORKFLOW:PHYLOSEQ\":",
      paste0("    R: ", paste0(R.Version()[c("major","minor")], collapse = ".")),
      paste0("    phyloseq: ", packageVersion("phyloseq"))),
      "versions.yml"
  )

Command exit status:
  1

Command output:
  (empty)

Command error:
  Error in validObject(.Object) : invalid class “phyloseq” object: 
   Component sample names do not match.
   Try sample_names()
  Calls: merge_phyloseq ... do.call -> new -> initialize -> initialize -> validObject
  Execution halted

It does have a sample that loses all reads to the tax filtering step, so I assumed that was the problem because I thought for phyloseq object creation the set of samples in the Metadata file would have to match the set in the OTU table. Maybe there's something else going on. I'm testing a couple other small datasets now with 2.7.1 to see if I can reproduce the error with another dataset. Here is the command I was using, for reference:

nextflow run ampliseq -profile singularity --input ./SampleSheet.tsv --FW_primer GTGYCAGCMGCCGCGGTAA --RV_primer CCGYCAATTYMTTTRAGTTT --metadata ./Metadata.tsv --outdir ./results --email my.email@me --dada_ref_taxonomy silva --ignore_empty_input_files --ignore_failed_trimming --min_frequency 10 --retain_untrimmed --trunclenf 240 --trunclenr 160 --metadata_category_barplot "TumorLocation,SampleTissueType" --tax_agglom_max 7 --max_memory 32.GB

Does your test also use silva as the taxonomic database? I will have more information shortly from the tests that I am running, but I'm not exactly sure what to interrogate if the problem is not the tax filter thing. I will update this evening.

from ampliseq.

agrier-wcm commented on September 20, 2024

Well, it ended up being something stupid. In the Metadata file, some samples had a space at the end of the sample names, which they did not have in the SampleSheet. Somewhat interestingly, this did not cause any problems in 2.6.1, even for the comparisons/group assignments. Sorry to waste your time @d4straub , but thank you for looking into it. Closing.

from ampliseq.

d4straub commented on September 20, 2024

I see, happens to the best of us ;)
Thanks for returning feedback.

from ampliseq.

Phyloseq object creation will fail if any samples have all reads removed by the tax filtering step about ampliseq HOT 5 CLOSED

Comments (5)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent