nf-core / ampliseq

Amplicon sequencing analysis workflow using DADA2 and QIIME2

Home Page: https://nf-co.re/ampliseq

License: MIT License

HTML 0.49% R 3.94% Python 5.13% Nextflow 88.75% Shell 1.22% Groovy 0.27% CSS 0.20%
16s 18s amplicon-sequencing edna illumina iontorrent its metabarcoding metagenomics microbiome nextflow nf-core pacbio pipeline qiime2 rrna taxonomic-classification taxonomic-profiling workflow

ampliseq's People

Contributors

a4000, apeltzer, asafpr, colindaven, d4straub, danclaytondev, danilodileo, dariader, diegobrambilla, drpatelh, emnilsson, erikrikarddaniel, ewels, ggabernet, johnne, jtangrot, kevinmenden, lokeshbio, matthewjm96, maxulysse, nf-core-bot, philpalmer, sateeshperi, sminot, tillenglert, vaulot, vsmalladi

ampliseq's Issues

Add demultiplexing step

Hi,
A very helpful feature to add would be demultiplexing of the reads as an optional step. This functionality already exists in QIIME2, so it should be possible to add it to the rrna-ampliseq pipeline.
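For illustration, a minimal sketch of what such an optional step could run, assuming EMP-format multiplexed paired-end input and a metadata column holding the barcodes (file and column names below are placeholders):

  # demultiplex with q2-demux, writing one per-sample artifact
  qiime demux emp-paired \
    --i-seqs multiplexed-seqs.qza \
    --m-barcodes-file sample-metadata.tsv \
    --m-barcodes-column BarcodeSequence \
    --o-per-sample-sequences demux.qza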

Add PICRUSt2 analysis

PICRUSt2 pipeline to get EC, KO, and MetaCyc pathway predictions based on 16S data

Requirements:
- the QIIME2 version needs an upgrade to >2018.8
- the container has to include PICRUSt2 for QIIME2

Edit: Use PICRUSt2 outside of QIIME2: now that the pipeline uses DSL2 it relies on Biocontainers, and the QIIME2 Biocontainer does not include PICRUSt2 by default. Running PICRUSt2 independently of QIIME2 also makes it possible to skip QIIME2 and use the DADA2 output directly.
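A sketch of the standalone invocation, assuming the DADA2 ASV sequences and feature table have first been exported to FASTA and BIOM (file names are placeholders):

  # run the full PICRUSt2 pipeline: place ASVs in the reference tree, then
  # predict EC, KO and MetaCyc pathway abundances
  picrust2_pipeline.py \
    -s rep-seqs.fna \
    -i feature-table.biom \
    -o picrust2_output \
    -p 4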

Provide list with citations in docs/

Make a document that lists all citations that should be acknowledged when running the pipeline, such as FastQC, MultiQC, DADA2, QIIME2, ANCOM, whichever q2 modules were involved, ...

Enhancement: add quality control options

Hi there,

Here's another new feature request: adding quality control options.

As part of quality control, negative (blank extraction/library) and positive (mock) controls are often processed together with the samples. Allowing input of the mock composition would therefore help to evaluate the reliability of the whole pipeline (wet and dry lab), which can be implemented with the q2-quality-control plugin. For the use of negative controls, the feature table with taxonomy can be generated and exported; users would then supply a tsv file containing the feature IDs of contaminant sequences for filtering in the Nextflow pipeline. Alternatively, users could provide a "Sample_DNA_concentration.tsv" file, which can be used to identify contaminant sequences with tools like the decontam package. The sample DNA is ideally measured via qPCR using universal primers targeting the 16S rRNA gene, which provides accurate quantification of bacterial DNA from host-associated samples.
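As a sketch of the mock-evaluation part, assuming the expected mock composition and the observed mock feature table are available as relative-frequency artifacts (file names are placeholders):

  # compare the observed mock community against its known composition
  qiime quality-control evaluate-composition \
    --i-expected-features mock-expected.qza \
    --i-observed-features mock-observed.qza \
    --o-visualization mock-comparison.qzv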

Cheers,
Yanxian

unzip: cannot find zipfile directory in one of Silva_132_release.zip

Hi, I ran into the issue shown below. I am running the standard Docker profile. Any suggestions for solving this?

[2d/5ce92b] Submitted process > make_SILVA_132_16S_classifier (1)
[ac/9130c2] Submitted process > metadata_category_all (1)
ERROR ~ Error executing process > 'make_SILVA_132_16S_classifier (1)'

Caused by:
Process make_SILVA_132_16S_classifier (1) terminated with an error exit status (9)

Command executed:

unzip -qq Silva_132_release.zip

fasta="SILVA_132_QIIME_release/rep_set/rep_set_16S_only/99/silva_132_99_16S.fna"
taxonomy="SILVA_132_QIIME_release/taxonomy/16S_only/99/consensus_taxonomy_7_levels.txt"

if [ "false" = "true" ]; then
    sed 's/#//g' $taxonomy >taxonomy-99_removeHash.txt
    taxonomy="taxonomy-99_removeHash.txt"
    echo "######## WARNING! The taxonomy file was altered by removing all hash signs!"
fi

### Import
qiime tools import --type 'FeatureData[Sequence]' \
    --input-path $fasta \
    --output-path ref-seq-99.qza
qiime tools import --type 'FeatureData[Taxonomy]' \
    --source-format HeaderlessTSVTaxonomyFormat \
    --input-path $taxonomy \
    --output-path ref-taxonomy-99.qza

#Extract sequences based on primers
qiime feature-classifier extract-reads \
    --i-sequences ref-seq-99.qza \
    --p-f-primer ACTCCTACGGGAGGCAGCA \
    --p-r-primer GGACTACHVGGGTWTCTAAT \
    --o-reads ACTCCTACGGGAGGCAGCA-GGACTACHVGGGTWTCTAAT-99-ref-seq.qza \
    --quiet

#Train classifier
qiime feature-classifier fit-classifier-naive-bayes \
    --i-reference-reads ACTCCTACGGGAGGCAGCA-GGACTACHVGGGTWTCTAAT-99-ref-seq.qza \
    --i-reference-taxonomy ref-taxonomy-99.qza \
    --o-classifier ACTCCTACGGGAGGCAGCA-GGACTACHVGGGTWTCTAAT-99-classifier.qza \
    --quiet

Command exit status:
9

Command output:
(empty)

Command error:
[Silva_132_release.zip]
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of Silva_132_release.zip or
Silva_132_release.zip.zip, and cannot find Silva_132_release.zip.ZIP, period.
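This "End-of-central-directory" error usually indicates a truncated or interrupted download rather than a broken archive upstream. A quick check before re-running (sketch):

  # test the archive's integrity; if it fails, resume the download
  unzip -t Silva_132_release.zip || \
    wget -c https://www.arb-silva.de/fileadmin/silva_databases/qiime/Silva_132_release.zip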

Report stats after taxa filtering (commit ee4425b)

Run count_table_filter_stats.py in main.nf after the process filter_taxa to report how many counts were filtered. The current output is a table printed to stdout, which might need improvement for larger experiments.

change params.metadata to an optional parameter because it isn't always required

Make params.metadata not strictly required, because:

  • params.metadata is "only" required for processes:
    barplot
    alpha_rarefaction
    diversity_core
    alpha_diversity
    metadata_category_all
    metadata_category_pairwise
    beta_diversity
    beta_diversity_ordination
    ancom

  • so, in turn, it isn't required with --untilQ2import or --onlyDenoising, or if none of the above processes are executed.

Read Input Parameter as Folder

params.reads should specify only the folder; from it, only files matching *_L001_R{1,2}_001.fastq.gz are chosen (QIIME2 PE input requires the *_L001_R{1,2}_001.fastq.gz format, so this naming is required anyway).

Bonus tasks (later make individual issues out of this ticket...)

  • Bonus: multiple sequencing runs (DOES NOT WORK yet!)
  • Analysis of multiple sequencing runs:
    • input are multiple folders, one per sequencing run (a data folder with multiple subfolders (a, b, c), each containing sequencing read files and a metadata file with column new_name)
    • params.metadata is still required and contains the merged metadata for all samples, with an ID column containing the new_name values of all metadata files in the subfolders
    • trimming on each folder
    • qiime_import on each trimming folder
    • dada_multi on each qiime_import folder
    • mergeDADA to combine all sequencing runs, then continue as usual.
  • Dream: final report
    • make some sort of final report, like an extended MultiQC report?
    • interesting output (everything in params.outdir) and the MultiQC/Cutadapt report (% pairs retained) could get a link/representation in the final report?
    • .html reports (e.g. from alpha_diversity, beta_diversity, beta_diversity_ordination, ancom) could get a link/representation in the final report?

diversity related processes only run for one channel element

Diversity-related processes produce results only for the first of several files in the input channel "qiime_diversity_core_for_*", because there is only one element in the second input channel "ch_metadata_for_*". (A Nextflow queue channel with a single element is consumed after the first task; turning it into a value channel, e.g. with .first(), would let it be reused for every element of the other channel.)
This is true for the following processes:
alpha_diversity
beta_diversity
beta_diversity_ordination

Enhancement: more quality control options

Hi Alex,

Thank you for developing the Nextflow pipeline for fully reproducible analysis of 16S rRNA amplicon data.

I have some new feature requests:

  1. Enable filtering singletons and low-prevalence features (those present in only one sample) from the feature table. If samples were denoised on a per-sample basis, filtering singletons is probably a good option, as explained by the DADA2 developers here. Filtering low-prevalence features can also further reduce false sequence variants, as suggested in the QIIME2 forum.

  2. Add quality control options. As part of quality control, negative (blank extraction/library) and positive (mock) controls are often processed together with the samples. Allowing input of the mock composition would therefore help to evaluate the reliability of the whole pipeline, which can be implemented with the q2-quality-control plugin. For the negative control, the feature table with taxonomy can be generated and exported; users would then supply a tsv file containing the feature IDs of contaminant sequences for filtering in the Nextflow pipeline. Alternatively, users could provide a "Sample_DNA_concentration.tsv" file, which can be used to identify contaminant sequences with tools like the decontam package.

Regards,
Yanxian

Keep certain key .qza files

Always publish the following files in the results folder:
table.qza
rep-seqs.qza
taxonomy.qza
rooted-tree.qza

Set up NF-Core Syncing

As this was just created using nf-core create, we shouldn't have any bigger issues setting things up properly here!

Write Documentation

We should document all parts of the pipeline, and some of the reasoning behind it, in Markdown accompanying the usage docs.
A detailed explanation of what each report means in general should follow too.

Read "Manifest" file format?

Hi,
There is another way to import single- and paired-end demultiplexed sequences into QIIME2 besides the Casava format, one that is not picky about names. The "Manifest" file format (https://docs.qiime2.org/2018.11/tutorials/importing/?highlight=manifest#fastq-manifest-formats) is a text file that imposes no limitations on file naming. It is up to users to prepare it, but I think that with some coding the preparation of manifest files can be automated, which would enhance the pipeline's flexibility in accepting input files. Another option would be for the pipeline to accept both Casava file formats and manifest files manually created by users. This way, downstream analyses like taxonomy or alpha diversity would benefit from having clearer ID names.
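A sketch of what an auto-generated manifest and its import could look like, using the CSV layout from the linked 2018-era docs (paths are placeholders; newer QIIME2 releases renamed --source-format to --input-format):

  # manifest.csv
  sample-id,absolute-filepath,direction
  sample-1,/data/run1/s1_R1.fastq.gz,forward
  sample-1,/data/run1/s1_R2.fastq.gz,reverse

  qiime tools import \
    --type 'SampleData[PairedEndSequencesWithQuality]' \
    --input-path manifest.csv \
    --source-format PairedEndFastqManifestPhred33 \
    --output-path demux.qza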

Pipeline error

Hi,

I tried to run the nextflow pipeline for a small dataset but encountered an error.

The commands I used:

nextflow run nf-core/rrna-ampliseq \
  -profile standard,docker \
  -name "test1" \
  -r 1.0.0 \
  --reads '/home/nutrition_group/desktop/data/Yanxian/misc/beta-conglycinin/16s/casava-18-paired-end-demultiplexed' \
  --untilQ2import  \
  --Q2imported  \
  --FW_primer GTGCCAGCMGCCGCGGTAA \
  --RV_primer GGACTACHVGGGTWTCTAAT \
  --trunclenf 239 \
  --trunclenr 230 \
  --retain_untrimmed \
  --metadata "/home/nutrition_group/desktop/data/Yanxian/misc/beta-conglycinin/16s/metadata.tsv"\
  --metadata_category "Diet,Compartment" \
  --exclude_taxa "mitochondria,chloroplast" \
  --outdir "/home/nutrition_group/desktop/data/Yanxian/misc/beta-conglycinin/16s/nextflow/" \
  --email "[email protected]" \
  --max_memory '16.GB' \
  --max_cpus 12 

The error message:

ERROR ~ No signature of method: static nextflow.Channel.fromFile() is applicable for argument types: (org.codehaus.groovy.runtime.GStringImpl) values: [true]
Possible solutions: from([Ljava.lang.Object;), from(java.util.Collection), fromPath(java.lang.Object)

 -- Check script 'main.nf' at line: 129 or see '.nextflow.log' file for more details

My java version:

java version "1.8.0_191"
Java(TM) SE Runtime Environment (build 1.8.0_191-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.191-b12, mixed mode)

Regards,
Yanxian

Sample name issues for parsing reads in QIIME

There seems to be a read-name parsing problem in my situation.
[screenshot omitted]
My command is:

nextflow run ampliseq \
 --reads "/data3/zqf/16S_Anaysis/rawdata/28892494"  \
 --FW_primer GTGYCAGCMGCCGCGGTAA --RV_primer GGACTACNVGGGTWTCTAAT \
 --metadata "/data3/zqf/16S_Anaysis/rawdata/Metadata.tsv"\
 --outdir /data3/zqf/16S_Anaysis/results/28892494/ \
 -profile docker --genomes greengenes -resume

My read files are named like 2095566_L001_R1_001.fastq.gz.
Does the sample name need a _ inside?
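The Casava 1.8 convention the importer expects is SampleName_SXX_L001_R1_001.fastq.gz, so the file above appears to be missing the sample-number segment. A hypothetical rename to work around it (assumes every file in the folder needs the same fix):

  # insert a placeholder "_S1" segment so the parser can split the name
  for f in *_L001_R*_001.fastq.gz; do
    mv "$f" "${f%%_L001_*}_S1_L001_${f#*_L001_}"
  done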

Process "trimming" doesnt provide required input for process "qiime_import"

Process "qiime_import" expects as "--input-path $trimmed" a path/to/folder containing all files with trimmed reads. The files with trimmed reads need to follow the naming scheme "*_L001_R{1,2}_001.fastq.gz".

Problems - process "trimming":

  • it outputs files with the naming scheme "*_L001_R{1,2}_001.fastq.gz.trimmed" instead of "*_L001_R{1,2}_001.fastq.gz"
  • the output files are not collected in a folder that is accessible via the "qiime_import" parameter "--input-path $trimmed"
  • publishing all trimmed files would create significant data overhead

Problems - process "qiime_import":

  • "--input-path $trimmed" is a channel containing all files from process "trimming" instead of a folder containing all these files

Support for analysing multiple sequencing runs

Currently, only data from one sequencing run can be analyzed comfortably.

Starting point:

  • input are multiple folders, one per sequencing run, specified as --reads "folder1,folder2,..,folderN"
  • each input folder contains sequencing read files of samples named e.g. "sample1, ... , sampleN"
  • params.metadata points to metadata for all samples of all sequencing runs with sample IDs such as "folder1-sample1"

Idea:

  • trimming and QC on all files
  • qiime_import on each input folder
  • dada_multi on each qiime_import folder
  • sample IDs have to be renamed to e.g. "folder1-sample1" to avoid overlap
  • mergeDADA to combine all sequencing runs, then continue as usual.

This is part of #13.

Problem running offline

I'm running the pipeline on an offline cluster. I downloaded the repository and the Singularity image.
I initialize the script with: export NXF_OFFLINE='TRUE'

Running as
nextflow run path/to/ampliseq -with-singularity "path/to/img"

I got this warning:

WARN: Unable to stage foreign file: https://www.arb-silva.de/fileadmin/silva_databases/qiime/Silva_132_release.zip (try 1) -- Cause: Connection timed out (Connection timed out)

Can I predownload the archive and place it somewhere?
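A sketch of that approach; the parameter name below is an assumption, so check nextflow.config for the option that actually holds this URL:

  # on a machine with internet access:
  wget https://www.arb-silva.de/fileadmin/silva_databases/qiime/Silva_132_release.zip
  # then point the pipeline at the local copy instead of the URL:
  nextflow run path/to/ampliseq -with-singularity "path/to/img" \
    --reference_database "$PWD/Silva_132_release.zip"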

Temp File Handling

All files in params.outdir are valuable output; all files in params.temp_dir are temporary files that can be hidden and are only needed when resuming, e.g. with --Q2imported.

Enhancement: allow more feature table filtering options

Hi there,

I have a new feature request: enable filtering singletons and low-prevalence features (those present in only one sample) from the feature table.

According to the DADA2 developers, if samples were denoised on a per-sample basis (as in QIIME2 2018.11), filtering singletons is probably a good option, as these singletons are more likely artifacts than real biological sequence variants. However, if samples were denoised using the pooling or pseudo-pooling mode in future QIIME2 releases, filtering singletons would be invalid. Relevant discussions can be found here.

Filtering low-prevalence features can further reduce false sequence variants, as suggested in the QIIME2 forum.
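Both filters are available in q2-feature-table; a sketch (thresholds are examples only):

  # drop singletons (total frequency < 2) and features seen in only one sample
  qiime feature-table filter-features \
    --i-table table.qza \
    --p-min-frequency 2 \
    --p-min-samples 2 \
    --o-filtered-table table-filtered.qza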

Cheers,
Yanxian

Dependency hell...

Had to polish the environment.yaml quite a few times, unfortunately:

  • ncurses requires 5.9 but doesn't specify where to pull it from; prefixing it as conda-forge::ncurses=5.9 solved that issue

Similar things then happen when running qiime_import:

ampliseq-1.0dev/lib/python3.5/site-packages/matplotlib/backends/qt_editor/figureoptions.py", line 20, in <module>
      import matplotlib.backends.qt_editor.formlayout as formlayout
    File "/opt/conda/envs/nf-core-rrna-ampliseq-1.0dev/lib/python3.5/site-packages/matplotlib/backends/qt_editor/formlayout.py", line 54, in <module>
      from matplotlib.backends.qt_compat import QtGui, QtWidgets, QtCore
    File "/opt/conda/envs/nf-core-rrna-ampliseq-1.0dev/lib/python3.5/site-packages/matplotlib/backends/qt_compat.py", line 140, in <module>
      from PyQt5 import QtCore, QtGui, QtWidgets
  ImportError: libGL.so.1: cannot open shared object file: No such file or directory
  
  An unexpected error has occurred:
  
    libGL.so.1: cannot open shared object file: No such file or directory
  
  See above for debug info.

Work dir:
  /home/alex/IDEA/nf-core/rrna-ampliseq/work/be/06192acf88d8197b617ef9e3f8d064

Apparently the installed Qt library does not ship the required libGL.so.1 shared object.
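A possible workaround (an assumption, not verified here): force a headless matplotlib backend so the Qt/libGL import path is never touched inside the container:

  # set before invoking any qiime command in the container
  export MPLBACKEND=Agg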

No more space error on classifier

Current path   : /home/alex/IDEA/nf-core/rrna-ampliseq
Script dir     : /home/alex/IDEA/nf-core/rrna-ampliseq
Config Profile : test,docker
=========================================
[warm up] executor > local
[8f/1bdbbf] Cached process > get_software_versions
[23/9a0ff7] Cached process > output_documentation
[08/bc4aca] Cached process > metadata_category_all (1)
[f1/6e41af] Cached process > metadata_category_pairwise (1)
[e0/1388c4] Cached process > fastqc (1a_S103)
[14/1981e3] Cached process > trimming (1a_S103)
[3e/3d4ff6] Cached process > trimming (2a_S115)
[f2/c6524d] Cached process > fastqc (2a_S115)
[40/c66b2f] Cached process > trimming (1_S103)
[7f/99415d] Cached process > fastqc (1_S103)
[46/12c4bd] Cached process > fastqc (2_S115)
[1a/047962] Cached process > trimming (2_S115)
[a0/8ebb9c] Cached process > qiime_import
[75/0fc607] Cached process > qiime_demux_visualize
[12/3e642e] Cached process > multiqc
[0a/954abc] Cached process > dada_trunc_parameter
[ec/712c2c] Cached process > dada_single
[3f/19e7c9] Submitted process > classifier (1)
ERROR ~ Error executing process > 'classifier (1)'

Caused by:
  Process `classifier (1)` terminated with an error exit status (1)

Command executed:

  qiime feature-classifier classify-sklearn \
      --i-classifier GTGYCAGCMGCCGCGGTAA-GGACTACNVGGGTWTCTAAT-classifier.qza \
      --p-n-jobs "-1" \
      --i-reads rep-seqs.qza \
      --o-classification taxonomy.qza \
      --verbose

  qiime metadata tabulate \
      --m-input-file taxonomy.qza \
      --o-visualization taxonomy.qzv \
      --verbose

  #produce "taxonomy/taxonomy.tsv"
  qiime tools export taxonomy.qza \
      --output-dir taxonomy

  qiime tools export taxonomy.qzv \
      --output-dir taxonomy

Command exit status:
  1

Command output:
  (empty)

Command error:
  WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
  Traceback (most recent call last):
    File "/opt/conda/envs/nf-core-rrna-ampliseq-1.0dev/lib/python3.5/site-packages/q2cli/commands.py", line 274, in __call__
      results = action(**arguments)
    File "<decorator-gen-292>", line 2, in classify_sklearn
    File "/opt/conda/envs/nf-core-rrna-ampliseq-1.0dev/lib/python3.5/site-packages/qiime2/sdk/action.py", line 232, in bound_callable
      output_types, provenance)
    File "/opt/conda/envs/nf-core-rrna-ampliseq-1.0dev/lib/python3.5/site-packages/qiime2/sdk/action.py", line 367, in _callable_executor_
      output_views = self._callable(**view_args)
    File "/opt/conda/envs/nf-core-rrna-ampliseq-1.0dev/lib/python3.5/site-packages/q2_feature_classifier/classifier.py", line 215, in classify_sklearn
      confidence=confidence)
    File "/opt/conda/envs/nf-core-rrna-ampliseq-1.0dev/lib/python3.5/site-packages/q2_feature_classifier/_skl.py", line 45, in predict
      for chunk in _chunks(reads, chunk_size)) for m in c)
    File "/opt/conda/envs/nf-core-rrna-ampliseq-1.0dev/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py", line 789, in __call__
      self.retrieve()
    File "/opt/conda/envs/nf-core-rrna-ampliseq-1.0dev/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py", line 699, in retrieve
      self._output.extend(job.get(timeout=self.timeout))
    File "/opt/conda/envs/nf-core-rrna-ampliseq-1.0dev/lib/python3.5/multiprocessing/pool.py", line 644, in get
      raise self._value
    File "/opt/conda/envs/nf-core-rrna-ampliseq-1.0dev/lib/python3.5/multiprocessing/pool.py", line 424, in _handle_tasks
      put(task)
    File "/opt/conda/envs/nf-core-rrna-ampliseq-1.0dev/lib/python3.5/site-packages/sklearn/externals/joblib/pool.py", line 371, in send
      CustomizablePickler(buffer, self._reducers).dump(obj)
    File "/opt/conda/envs/nf-core-rrna-ampliseq-1.0dev/lib/python3.5/site-packages/sklearn/externals/joblib/pool.py", line 240, in __call__
      for dumped_filename in dump(a, filename):
    File "/opt/conda/envs/nf-core-rrna-ampliseq-1.0dev/lib/python3.5/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 484, in dump
      NumpyPickler(f, protocol=protocol).dump(value)
    File "/opt/conda/envs/nf-core-rrna-ampliseq-1.0dev/lib/python3.5/pickle.py", line 408, in dump
      self.save(obj)
    File "/opt/conda/envs/nf-core-rrna-ampliseq-1.0dev/lib/python3.5/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 278, in save
      wrapper.write_array(obj, self)
    File "/opt/conda/envs/nf-core-rrna-ampliseq-1.0dev/lib/python3.5/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 93, in write_array
      pickler.file_handle.write(chunk.tostring('C'))
  OSError: [Errno 28] No space left on device
  
  Plugin error from feature-classifier:
  
    [Errno 28] No space left on device
  
  See above for debug info.

Work dir:
  /home/alex/IDEA/nf-core/rrna-ampliseq/work/3f/19e7c9e31a840d983cf984979ce1bc

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

 -- Check '.nextflow.log' file for details
Execution cancelled -- Finishing pending tasks before exit
[nf-core/rrna-ampliseq] Pipeline Complete
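Since classify-sklearn pickles large arrays to the default temp directory via joblib, one likely fix is to point temporary files at a disk with more room before re-running (a sketch; the path is a placeholder):

  export TMPDIR=/path/with/more/space
  export JOBLIB_TEMP_FOLDER="$TMPDIR"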

Remove SE mode for now

Also, none of the --singleEnd options would work at the moment: qiime import expects paired-end data, and we would need to implement single-end processing in several steps.

Container for entire pipeline

One container for the whole pipeline (remove lines such as "singularity exec ${params.qiimeimage} ", change "~/PROGRAMS/FastQC/fastqc", adjust params.qiimeimage = "$baseDir/qiime2_2018.6.simg").

Export dada2 report

The DADA2 report might be valuable for troubleshooting. It includes information such as whether convergence was reached when calculating the run-specific sequencing error model.

publish demux.qza when --untilQ2import

change

process qiime_import {
    publishDir "${params.outdir}/demux", mode: 'copy',
        saveAs: {params.keepIntermediates ? filename : null}

to

process qiime_import {
    publishDir "${params.outdir}/demux", mode: 'copy',
        saveAs: {params.untilQ2import ? filename : null}

related to #55

Taxonomy classification raises a ValueError for some datasets. This is a known but unsolved bug in QIIME2.

There is a ValueError when specific sequences in a sample hit specific taxonomies of the classifier:
ValueError: CategoricalMetadataColumn does not support strings with leading or trailing whitespace characters:

As long as this issue isn't solved in QIIME2, a way to analyze these datasets anyway will be implemented.

The idea is to modify the file with taxonomy strings; most likely, hash signs in this file are causing the issue.
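A hypothetical pre-processing step along those lines, applied to the taxonomy file before import:

  # strip hash signs and any trailing whitespace from the taxonomy strings
  sed -e 's/#//g' -e 's/[[:space:]]*$//' consensus_taxonomy_7_levels.txt \
    > taxonomy_clean.txt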

Replace Bash Loops

Bash for-loops seem inefficient in: RelativeAbundanceReducedTaxa, alpha_diversity, beta_diversity, beta_diversity_ordination, ancom.

Missing "unzip" in the silva classifier step

Process make_SILVA_132_16S_classifier slipped through our test runs.
A test command for fast processing, e.g.:
nextflow run rrna-ampliseq -profile test,singularity --classifier false --dereplication 90

from #33
