nf-core/ampliseq
Amplicon sequencing analysis workflow using DADA2 and QIIME2
Home Page: https://nf-co.re/ampliseq
License: MIT License
Current path : /home/alex/IDEA/nf-core/rrna-ampliseq
Script dir : /home/alex/IDEA/nf-core/rrna-ampliseq
Config Profile : test,docker
=========================================
[warm up] executor > local
[8f/1bdbbf] Cached process > get_software_versions
[23/9a0ff7] Cached process > output_documentation
[08/bc4aca] Cached process > metadata_category_all (1)
[f1/6e41af] Cached process > metadata_category_pairwise (1)
[e0/1388c4] Cached process > fastqc (1a_S103)
[14/1981e3] Cached process > trimming (1a_S103)
[3e/3d4ff6] Cached process > trimming (2a_S115)
[f2/c6524d] Cached process > fastqc (2a_S115)
[40/c66b2f] Cached process > trimming (1_S103)
[7f/99415d] Cached process > fastqc (1_S103)
[46/12c4bd] Cached process > fastqc (2_S115)
[1a/047962] Cached process > trimming (2_S115)
[a0/8ebb9c] Cached process > qiime_import
[75/0fc607] Cached process > qiime_demux_visualize
[12/3e642e] Cached process > multiqc
[0a/954abc] Cached process > dada_trunc_parameter
[ec/712c2c] Cached process > dada_single
[3f/19e7c9] Submitted process > classifier (1)
ERROR ~ Error executing process > 'classifier (1)'
Caused by:
Process `classifier (1)` terminated with an error exit status (1)
Command executed:
qiime feature-classifier classify-sklearn --i-classifier GTGYCAGCMGCCGCGGTAA-GGACTACNVGGGTWTCTAAT-classifier.qza --p-n-jobs "-1" --i-reads rep-seqs.qza --o-classification taxonomy.qza --verbose
qiime metadata tabulate --m-input-file taxonomy.qza --o-visualization taxonomy.qzv --verbose
#produce "taxonomy/taxonomy.tsv"
qiime tools export taxonomy.qza --output-dir taxonomy
qiime tools export taxonomy.qzv --output-dir taxonomy
Command exit status:
1
Command output:
(empty)
Command error:
WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
Traceback (most recent call last):
File "/opt/conda/envs/nf-core-rrna-ampliseq-1.0dev/lib/python3.5/site-packages/q2cli/commands.py", line 274, in __call__
results = action(**arguments)
File "<decorator-gen-292>", line 2, in classify_sklearn
File "/opt/conda/envs/nf-core-rrna-ampliseq-1.0dev/lib/python3.5/site-packages/qiime2/sdk/action.py", line 232, in bound_callable
output_types, provenance)
File "/opt/conda/envs/nf-core-rrna-ampliseq-1.0dev/lib/python3.5/site-packages/qiime2/sdk/action.py", line 367, in _callable_executor_
output_views = self._callable(**view_args)
File "/opt/conda/envs/nf-core-rrna-ampliseq-1.0dev/lib/python3.5/site-packages/q2_feature_classifier/classifier.py", line 215, in classify_sklearn
confidence=confidence)
File "/opt/conda/envs/nf-core-rrna-ampliseq-1.0dev/lib/python3.5/site-packages/q2_feature_classifier/_skl.py", line 45, in predict
for chunk in _chunks(reads, chunk_size)) for m in c)
File "/opt/conda/envs/nf-core-rrna-ampliseq-1.0dev/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py", line 789, in __call__
self.retrieve()
File "/opt/conda/envs/nf-core-rrna-ampliseq-1.0dev/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py", line 699, in retrieve
self._output.extend(job.get(timeout=self.timeout))
File "/opt/conda/envs/nf-core-rrna-ampliseq-1.0dev/lib/python3.5/multiprocessing/pool.py", line 644, in get
raise self._value
File "/opt/conda/envs/nf-core-rrna-ampliseq-1.0dev/lib/python3.5/multiprocessing/pool.py", line 424, in _handle_tasks
put(task)
File "/opt/conda/envs/nf-core-rrna-ampliseq-1.0dev/lib/python3.5/site-packages/sklearn/externals/joblib/pool.py", line 371, in send
CustomizablePickler(buffer, self._reducers).dump(obj)
File "/opt/conda/envs/nf-core-rrna-ampliseq-1.0dev/lib/python3.5/site-packages/sklearn/externals/joblib/pool.py", line 240, in __call__
for dumped_filename in dump(a, filename):
File "/opt/conda/envs/nf-core-rrna-ampliseq-1.0dev/lib/python3.5/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 484, in dump
NumpyPickler(f, protocol=protocol).dump(value)
File "/opt/conda/envs/nf-core-rrna-ampliseq-1.0dev/lib/python3.5/pickle.py", line 408, in dump
self.save(obj)
File "/opt/conda/envs/nf-core-rrna-ampliseq-1.0dev/lib/python3.5/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 278, in save
wrapper.write_array(obj, self)
File "/opt/conda/envs/nf-core-rrna-ampliseq-1.0dev/lib/python3.5/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 93, in write_array
pickler.file_handle.write(chunk.tostring('C'))
OSError: [Errno 28] No space left on device
Plugin error from feature-classifier:
[Errno 28] No space left on device
See above for debug info.
Work dir:
/home/alex/IDEA/nf-core/rrna-ampliseq/work/3f/19e7c9e31a840d983cf984979ce1bc
Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`
-- Check '.nextflow.log' file for details
Execution cancelled -- Finishing pending tasks before exit
[nf-core/rrna-ampliseq] Pipeline Complete
use bin/count_table_minmax_reads.py to replace bash code in processes alpha_rarefaction and diversity_core
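The replacement helper can stay small; here is a minimal sketch of what bin/count_table_minmax_reads.py could do, assuming a TSV export of the feature table with features as rows and samples as columns (the exact export layout is an assumption):

```python
import csv
import io

def count_table_minmax_reads(tsv_text):
    """Return (min, max) of per-sample read totals from a feature table.

    Assumes a TSV with features as rows and samples as columns, where the
    first column holds feature IDs (hypothetical layout for illustration).
    """
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    header, data = rows[0], rows[1:]
    # Sum counts per sample column, skipping the feature-ID column.
    totals = [sum(float(r[i]) for r in data) for i in range(1, len(header))]
    return min(totals), max(totals)

table = "#OTU ID\tsampleA\tsampleB\nasv1\t10\t5\nasv2\t20\t30\n"
print(count_table_minmax_reads(table))  # (30.0, 35.0)
```

alpha_rarefaction and diversity_core would then call this once instead of repeating the bash arithmetic.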
Currently, only data from one sequencing run can be analyzed comfortably.
Starting point:
Idea:
This is part of #13.
Update to reflect all changes.
As this was just created using nf-core create, we shouldn't have bigger issues in setting things up properly here!
Singularity transparently mounts required directories into the container using features such as OverlayFS.
This is nice when it comes down to temporary files, but problematic because we need to configure the matplotlib backend to "Agg" so that QIIME2 does not use QT5 (see #25 for details why); this should be set a bit more appropriately.
https://matplotlib.org/users/customizing.html#the-matplotlibrc-file
All files in params.outdir are valuable output; all files in params.temp_dir are temporary files that can be hidden and are only needed when resuming, e.g. with --Q2imported.
params.reads should only specify the folder, but only files like *_L001_R{1,2}_001.fastq.gz are chosen (QIIME2 PE input requires *_L001_R{1,2}_001.fastq.gz format, so this is required anyway)
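The selection step could be sketched as follows (the folder-scanning logic is hypothetical; only the *_L001_R{1,2}_001.fastq.gz pattern comes from the QIIME2 requirement above):

```python
import fnmatch

def select_casava_reads(folder_files):
    """From a list of file names, keep only those matching the Casava
    naming *_L001_R{1,2}_001.fastq.gz that QIIME2 PE import expects."""
    patterns = ("*_L001_R1_001.fastq.gz", "*_L001_R2_001.fastq.gz")
    return sorted(f for f in folder_files
                  if any(fnmatch.fnmatch(f, p) for p in patterns))

files = ["1a_S103_L001_R1_001.fastq.gz", "1a_S103_L001_R2_001.fastq.gz",
         "notes.txt", "1a_S103.trimmed.fastq.gz"]
print(select_casava_reads(files))
# ['1a_S103_L001_R1_001.fastq.gz', '1a_S103_L001_R2_001.fastq.gz']
```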
When choosing too high or too low dada2 truncation values (manually or automatically), all reads of a sample can get lost. This results in a cryptic error message, better report that properly and exit.
fastqc doesn't work, see attachment
fastqc_command.log
Hi there,
I have a new feature request: enable filtering singletons and low prevalent features (only present in one sample) from the feature table.
According to the DADA2 developers, if samples were denoised on a per-sample basis (as in QIIME2-2018.11), filtering singletons is probably a good option, as these singletons are likely artifacts rather than real biological sequence variants. However, if the samples were denoised using the pooling or pseudo-pooling mode in future QIIME2 releases, filtering singletons would be invalid. Relevant discussions can be found here.
Filtering low-prevalent features can further reduce false sequence variants as suggested in the QIIME2 forum.
Cheers,
Yanxian
always publish the following files in the result folder:
table.qza
rep-seqs.qza
taxonomy.qza
rooted-tree.qza
Hi,
I tried to run the nextflow pipeline for a small dataset but encountered an error.
The commands I used:
nextflow run nf-core/rrna-ampliseq \
-profile standard,docker \
-name "test1" \
-r 1.0.0 \
--reads '/home/nutrition_group/desktop/data/Yanxian/misc/beta-conglycinin/16s/casava-18-paired-end-demultiplexed' \
--untilQ2import \
--Q2imported \
--FW_primer GTGCCAGCMGCCGCGGTAA \
--RV_primer GGACTACHVGGGTWTCTAAT \
--trunclenf 239 \
--trunclenr 230 \
--retain_untrimmed \
--metadata "/home/nutrition_group/desktop/data/Yanxian/misc/beta-conglycinin/16s/metadata.tsv" \
--metadata_category "Diet,Compartment" \
--exclude_taxa "mitochondria,chloroplast" \
--outdir "/home/nutrition_group/desktop/data/Yanxian/misc/beta-conglycinin/16s/nextflow/" \
--email "[email protected]" \
--max_memory '16.GB' \
--max_cpus 12
The error message:
ERROR ~ No signature of method: static nextflow.Channel.fromFile() is applicable for argument types: (org.codehaus.groovy.runtime.GStringImpl) values: [true]
Possible solutions: from([Ljava.lang.Object;), from(java.util.Collection), fromPath(java.lang.Object)
-- Check script 'main.nf' at line: 129 or see '.nextflow.log' file for more details
My java version:
java version "1.8.0_191"
Java(TM) SE Runtime Environment (build 1.8.0_191-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.191-b12, mixed mode)
Regards,
Yanxian
better solution for small helper processes: qiime_existing_demux, use_existing_classifier, dada_trunc_parameter (second script), skip_filter_taxa, metadata_category_all (second script)
We should probably get rid of the rrna- prefix and just do ampliseq.
A data folder with multiple subfolders (a, b, c), each containing sequencing read files and a metadata file with column new_name.
params.metadata is still required and contains the merged metadata for all samples, with an ID column containing the new_name values of all metadata files in the subfolders.
Run qiime_import on each trimming folder, dada_multi on each qiime_import folder, then mergeDADA to combine all sequencing runs, and continue as usual.
params.outdir and the multiqc/cutadapt report (% pairs retained) could get a link/representation in the final report?
.html reports (e.g. from alpha_diversity, beta_diversity, beta_diversity_ordination, ancom) could get a link/representation in the final report?
Process "qiime_import" expects as "--input-path $trimmed" a path/to/folder containing all files with trimmed reads. The files with trimmed reads need to follow the naming scheme "*_L001_R{1,2}_001.fastq.gz".
Problems - process "trimming":
Problems - process "qiime_import":
Run count_table_filter_stats.py in main.nf after process filter_taxa to report how many counts were filtered. Current output is a table to stdout, that might need improvement for larger experiments.
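As a rough illustration of the improved report (per-sample totals as plain dicts are a stand-in for the two exported count tables; the real script would parse the QIIME2 exports):

```python
def filter_stats(before, after):
    """Summarize how many counts the filter_taxa process removed per sample.

    `before` and `after` map sample IDs to total read counts; both are
    hypothetical in-memory stand-ins for the exported count tables."""
    lines = ["sample\tbefore\tafter\tlost"]
    for sample in sorted(before):
        kept = after.get(sample, 0)
        lines.append(f"{sample}\t{before[sample]}\t{kept}\t{before[sample] - kept}")
    return "\n".join(lines)

print(filter_stats({"1a_S103": 1000, "2a_S115": 800},
                   {"1a_S103": 950, "2a_S115": 800}))
```

A tab-separated summary like this scales better to larger experiments than raw stdout tables.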
bash for-loops seem inefficient: RelativeAbundanceReducedTaxa, alpha_diversity, beta_diversity, beta_diversity_ordination, ancom
Hi there,
Here's another new feature request: adding quality control options.
As part of the quality control, negative (blank extraction/library) and positive (mock) controls are often processed together with the samples. Therefore, allowing input of the mock composition will help to evaluate the reliability of the whole pipeline (wet and dry lab), which can be implemented via the q2-quality-control plugin. For the use of negative controls, the feature table with taxonomy can be generated and exported. Users will then need to supply a TSV file containing the feature IDs of contaminant sequences for filtering in the Nextflow pipeline. Alternatively, users can provide a "Sample_DNA_concentration.tsv" file, which can be used to identify contaminant sequences with tools like the decontam package. The sample DNA is ideally measured via qPCR using universal primers targeting the 16S rRNA gene, which provides accurate quantification of bacterial DNA from host-associated samples.
Cheers,
Yanxian
Make a document that lists all citations that should be acknowledged when running the pipeline.
Such as FastQC, MultiQC, DADA2, QIIME2, ANCOM, q2-modules that were involved, ...
make nice looking log with stuff written currently to stdout ("echo" etc.)
one container for whole pipeline (remove lines such as "singularity exec ${params.qiimeimage} ", change "~/PROGRAMS/FastQC/fastqc", adjust params.qiimeimage = "$baseDir/qiime2_2018.6.simg")
Hi Alex,
Thank you for developing the Nextflow pipeline to do fully reproducible analysis of 16S rRNA amplicon data.
I have some new feature requests:
Enable filtering singletons and low prevalent features (only present in one sample) from the feature table. If samples were denoised on a per sample basis, filtering singletons is probably a good option as explained by the DADA2 developers here. Filtering low-prevalent features can also further reduce false sequence variants as suggested in the QIIME2 forum.
Add quality control options. As part of the quality control, negative (blank extraction/library) and positive (mock) controls are often processed together with the samples. Therefore, allowing input of the mock composition will help to evaluate the reliability of the whole pipeline, which can be implemented via the q2-quality-control plugin. For the negative control, the feature table with taxonomy can be generated and exported. Users will then need to supply a TSV file containing the feature IDs of contaminant sequences for filtering in the Nextflow pipeline. Alternatively, users can provide a "Sample_DNA_concentration.tsv" file, which can be used to identify contaminant sequences with tools like the decontam package.
Regards,
Yanxian
I'm running the pipeline in an offline cluster. I downloaded the repository and the singularity image.
I initialize the script with: export NXF_OFFLINE='TRUE'
Running as
nextflow run path/to/ampliseq -with-singularity "path/to/img"
I got this warning:
WARN: Unable to stage foreign file: https://www.arb-silva.de/fileadmin/silva_databases/qiime/Silva_132_release.zip (try 1) -- Cause: Connection timed out (Connection timed out)
Can I pre-download the archive and place it somewhere?
Had to polish the environment.yaml quite a bunch of times, unfortunately:
conda-forge::ncurses=5.9
solved that issue. Similar things happen when running qiime_import:
ampliseq-1.0dev/lib/python3.5/site-packages/matplotlib/backends/qt_editor/figureoptions.py", line 20, in <module>
import matplotlib.backends.qt_editor.formlayout as formlayout
File "/opt/conda/envs/nf-core-rrna-ampliseq-1.0dev/lib/python3.5/site-packages/matplotlib/backends/qt_editor/formlayout.py", line 54, in <module>
from matplotlib.backends.qt_compat import QtGui, QtWidgets, QtCore
File "/opt/conda/envs/nf-core-rrna-ampliseq-1.0dev/lib/python3.5/site-packages/matplotlib/backends/qt_compat.py", line 140, in <module>
from PyQt5 import QtCore, QtGui, QtWidgets
ImportError: libGL.so.1: cannot open shared object file: No such file or directory
An unexpected error has occurred:
libGL.so.1: cannot open shared object file: No such file or directory
See above for debug info.
Work dir:
/home/alex/IDEA/nf-core/rrna-ampliseq/work/be/06192acf88d8197b617ef9e3f8d064
Apparently the installed qt library does not provide the required libGL.so.1 shared object.
change params.metadata as not strictly required, because:
params.metadata is "only" required for processes:
barplot
alpha_rarefaction
diversity_core
alpha_diversity
metadata_category_all
metadata_category_pairwise
beta_diversity
beta_diversity_ordination
ancom
so in turn it isn't required if --untilQ2import or --onlyDenoising or if all of the above are not executed.
Indicate '.+_.+_L[0-9][0-9][0-9]_R[12]_001.fastq.gz' as required format.
To prevent errors as in #56
Preferably, check that regex before even starting the analysis.
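A minimal pre-flight check along these lines (a sketch; the actual integration point in main.nf is not decided here):

```python
import re

# Required naming scheme from above; anchored at the end of the file name.
READ_NAME = re.compile(r'.+_.+_L[0-9][0-9][0-9]_R[12]_001\.fastq\.gz$')

def check_read_names(filenames):
    """Return files that do NOT match the required naming scheme, so the
    pipeline can fail early with a clear message instead of a cryptic
    downstream error."""
    return [f for f in filenames if not READ_NAME.match(f)]

bad = check_read_names(["1a_S103_L001_R1_001.fastq.gz",
                        "2095566_L001_R1_001.fastq.gz"])
print(bad)  # ['2095566_L001_R1_001.fastq.gz'] - name lacks a second '_' field
```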
test with real data on binac
Process make_SILVA_132_16S_classifier slipped our test runs.
Test command for fast processing e.g.:
nextflow run rrna-ampliseq -profile test,singularity --classifier false --dereplication 90
from #33
--metadata seems to require an absolute path instead of a relative path
Should have a look at this one here:
https://github.com/bbarad/matplotlibrc
Super easy to adapt that for the pipeline - simply take the matplotlibrc there, adapt it to our needs and ship it with ampliseq ;-)
Also, all --singleEnd options would not work at the moment: qiime import expects paired-end data, so we would need to implement single-end processing in several steps.
expand sanity check for input values, also --metadata_category could be verified: subset of metadata_category_all
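The subset check itself is simple; a sketch assuming the metadata columns have already been parsed into a list:

```python
def check_metadata_category(metadata_category, metadata_columns):
    """Verify that every requested --metadata_category is a column of the
    metadata file (i.e. a subset of what metadata_category_all would find).
    Returns the unknown categories; an empty list means the input is sane."""
    requested = [c.strip() for c in metadata_category.split(",") if c.strip()]
    return [c for c in requested if c not in metadata_columns]

unknown = check_metadata_category("Diet,Compartment,Tank",
                                  ["ID", "Diet", "Compartment"])
print(unknown)  # ['Tank']
```

The pipeline could exit with the list of unknown categories before submitting any process.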
Get Testdata from daniel :-)
Upload Dataset to https://github.com/nf-core/test-datasets
Use them in the CI testing!
Polish CI tests to work for sample data
change
process qiime_import {
publishDir "${params.outdir}/demux", mode: 'copy',
saveAs: {params.keepIntermediates ? filename : null}
to
process qiime_import {
publishDir "${params.outdir}/demux", mode: 'copy',
saveAs: {params.untilQ2import ? filename : null}
related to #55
Hi, I went into an issue as shown below. I am running the docker standard profile. Any suggestions on solving this issue?
[2d/5ce92b] Submitted process > make_SILVA_132_16S_classifier (1)
[ac/9130c2] Submitted process > metadata_category_all (1)
ERROR ~ Error executing process > 'make_SILVA_132_16S_classifier (1)'
Caused by:
Process `make_SILVA_132_16S_classifier (1)` terminated with an error exit status (9)
Command executed:
unzip -qq Silva_132_release.zip
fasta="SILVA_132_QIIME_release/rep_set/rep_set_16S_only/99/silva_132_99_16S.fna"
taxonomy="SILVA_132_QIIME_release/taxonomy/16S_only/99/consensus_taxonomy_7_levels.txt"
if [ "false" = "true" ]; then
sed 's/#//g' $taxonomy >taxonomy-99_removeHash.txt
taxonomy="taxonomy-99_removeHash.txt"
echo "
######## WARNING! The taxonomy file was altered by removing all hash signs!"
fi
### Import
qiime tools import --type 'FeatureData[Sequence]' --input-path $fasta --output-path ref-seq-99.qza
qiime tools import --type 'FeatureData[Taxonomy]' --source-format HeaderlessTSVTaxonomyFormat --input-path $taxonomy --output-path ref-taxonomy-99.qza
#Extract sequences based on primers
qiime feature-classifier extract-reads --i-sequences ref-seq-99.qza --p-f-primer ACTCCTACGGGAGGCAGCA --p-r-primer GGACTACHVGGGTWTCTAAT --o-reads ACTCCTACGGGAGGCAGCA-GGACTACHVGGGTWTCTAAT-99-ref-seq.qza --quiet
#Train classifier
qiime feature-classifier fit-classifier-naive-bayes --i-reference-reads ACTCCTACGGGAGGCAGCA-GGACTACHVGGGTWTCTAAT-99-ref-seq.qza --i-reference-taxonomy ref-taxonomy-99.qza --o-classifier ACTCCTACGGGAGGCAGCA-GGACTACHVGGGTWTCTAAT-99-classifier.qza --quiet
Command exit status:
9
Command output:
(empty)
Command error:
[Silva_132_release.zip]
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of Silva_132_release.zip or
Silva_132_release.zip.zip, and cannot find Silva_132_release.zip.ZIP, period.
Could you have a look at that script and fix it in bin?
define compute resources for each process, also see params.tree_cores and params.diversity_cores
We should document all parts of the pipeline and some of the reasoning behind it in markdown, accompanying the usage.
Also a detailed explanation what which reports mean in general should follow too.
Add support for multiple reference databases.
Infos for path to zipped files (db_zip), path to fasta and taxonomy file in zip file are in "conf/ref_databases.config"
PICRUSt2 pipeline to get EC, KO, and MetaCyc pathway predictions based on 16S data
Requirements:
- qiime2 version needs upgrade to >2018.8
- container has to include PICRUSt2 for QIIME2
Edit: Use PICRUSt2 outside of QIIME2, because now with DSL2 the pipeline uses biocontainers and the QIIME2 biocontainer does not contain PICRUSt by default. Also, using PICRUSt2 independently of QIIME2 allows skipping QIIME2 and using DADA2 output directly.
There is a ValueError when specific sequences in a sample relate to specific taxonomies of the classifier:
ValueError: CategoricalMetadataColumn does not support strings with leading or trailing whitespace characters:
As long as this issue isn't solved in QIIME2, a way to analyze these data sets anyway will be implemented.
The idea is to modify the file with taxonomy strings; most likely hash signs in this file are causing the issue.
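A sketch of such a taxonomy-file cleanup (the two-column TSV layout and the exact characters to strip are assumptions):

```python
import csv
import io

def clean_taxonomy(tsv_text):
    """Strip leading/trailing whitespace and hash signs from taxonomy
    strings so QIIME2's CategoricalMetadataColumn accepts them. This is a
    sketch of the proposed workaround, not the pipeline's actual code."""
    out = []
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        out.append([field.strip().replace("#", "") for field in row])
    return "\n".join("\t".join(r) for r in out)

dirty = "asv1\t D_0__Bacteria; D_1__#Firmicutes \nasv2\tD_0__Bacteria\n"
print(clean_taxonomy(dirty))
```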
Hi,
A very helpful feature to add would be demultiplexing of the reads as an optional step. This function is already available in QIIME2 and, as such, it should be possible to add it to the rrna-ampliseq pipeline.
Diversity related processes produce results only for the first of several files in the input channel "qiime_diversity_core_for_*", because there is only one element in the second input channel "ch_metadata_for_*".
This is true for the following processes:
alpha_diversity
beta_diversity
beta_diversity_ordination
The DADA2 report might be valuable for troubleshooting. This report includes information such as whether convergence was reached when calculating the run-specific sequencing error model.
There seems to be a read-name parsing problem in my situation.
My command is
nextflow run ampliseq \
--reads "/data3/zqf/16S_Anaysis/rawdata/28892494" \
--FW_primer GTGYCAGCMGCCGCGGTAA --RV_primer GGACTACNVGGGTWTCTAAT \
--metadata "/data3/zqf/16S_Anaysis/rawdata/Metadata.tsv" \
--outdir /data3/zqf/16S_Anaysis/results/28892494/ \
-profile docker --genomes greengenes -resume
My read files are named like 2095566_L001_R1_001.fastq.gz.
Does the sample name need a _ inside?
Hi,
There is another way to import single- and paired-end demultiplexed sequences in QIIME2 aside from the Casava format that is not picky with names. The "Manifest" file format (https://docs.qiime2.org/2018.11/tutorials/importing/?highlight=manifest#fastq-manifest-formats) is a text file that does not pose limitations to file naming. It is up to users to prepare it but I think that with some coding the preparation of manifest files can be automatized and thus can enhance the flexibility of the pipeline in accepting input files. Another way could also be that the pipeline would accept both casava file formats and manifest files manually created by users. This way, downstream analyses like taxonomy or alpha diversity would benefit from having clearer ID names.
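Automating the manifest preparation could look roughly like this (the _R1/_R2 pairing rule is an assumption; the CSV header follows the QIIME2 2018.11 fastq manifest format):

```python
def make_manifest(read_files):
    """Build a QIIME2 paired-end fastq manifest (CSV columns: sample-id,
    absolute-filepath, direction) from arbitrary file names, pairing on a
    hypothetical _R1/_R2 marker so input is not limited to Casava naming."""
    lines = ["sample-id,absolute-filepath,direction"]
    for path in sorted(read_files):
        name = path.rsplit("/", 1)[-1]
        if "_R1" in name:
            direction, sample = "forward", name.split("_R1")[0]
        elif "_R2" in name:
            direction, sample = "reverse", name.split("_R2")[0]
        else:
            continue  # skip files that are not paired reads
        lines.append(f"{sample},{path},{direction}")
    return "\n".join(lines)

print(make_manifest(["/data/s1_R1.fastq.gz", "/data/s1_R2.fastq.gz"]))
```

The resulting file could then be fed to qiime tools import with the PairedEndFastqManifestPhred33 format as an alternative to the Casava layout.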