demultiplex's Introduction

nf-core/demultiplex

Introduction

nf-core/demultiplex is a bioinformatics pipeline used to demultiplex the raw data produced by next generation sequencing machines. The following platforms are supported:

  1. Illumina (via bcl2fastq or bclconvert)
  2. Element Biosciences (via bases2fastq)
  3. Singular Genomics (via sgdemux)
  4. FASTQ files with user supplied read structures (via fqtk)

The pipeline is built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It uses Docker/Singularity containers making installation trivial and results highly reproducible. The Nextflow DSL2 implementation of this pipeline uses one container per process which makes it much easier to maintain and update software dependencies. Where possible, these processes have been submitted to and installed from nf-core/modules in order to make them available to all nf-core pipelines, and to everyone within the Nextflow community!

On release, automated continuous integration tests run the pipeline on a full-sized dataset on the AWS cloud infrastructure. This ensures that the pipeline runs on AWS, has sensible resource allocation defaults set to run on real-world datasets, and permits the persistent storage of results to benchmark between pipeline releases and other analysis sources. The results obtained from the full-sized test can be viewed on the nf-core website.

Pipeline summary

  1. Demultiplexing
     • bcl-convert - converting bcl files to fastq, and demultiplexing (CONDITIONAL)
     • bases2fastq - converting bases files to fastq, and demultiplexing (CONDITIONAL)
     • bcl2fastq - converting bcl files to fastq, and demultiplexing (CONDITIONAL)
     • sgdemux - demultiplexing bgzipped fastq files produced by Singular Genomics (CONDITIONAL)
     • fqtk - a toolkit for working with FASTQ files, written in Rust (CONDITIONAL)
  2. fastp - Adapter and quality trimming
  3. Falco - Raw read QC
  4. md5sum - Creates an MD5 (128-bit) checksum of every fastq.
  5. MultiQC - aggregate report, describing results of the whole pipeline
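
The md5sum step produces one checksum file per fastq. As a rough illustration of how such a checksum could be re-verified after transfer, here is a minimal Python sketch. The helper names `md5_of_file` and `verify_md5` are hypothetical, and the sketch assumes the `.md5` file follows the usual `md5sum` layout (digest first, then the file name); it is not part of the pipeline.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large fastq files are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(fastq: Path, md5_file: Path) -> bool:
    """Compare a file against the first field of an md5sum-style file.

    Assumption: the checksum file starts with the hex digest, followed by
    whitespace and the file name.
    """
    expected = md5_file.read_text().split()[0]
    return md5_of_file(fastq) == expected
```

Reading in chunks keeps memory use flat regardless of file size, which matters for multi-gigabyte fastq files.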

subway map

Usage

Note

If you are new to Nextflow and nf-core, please refer to this page on how to set-up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.

nextflow run nf-core/demultiplex --input samplesheet.csv --outdir <OUTDIR> -profile <docker/singularity/podman/shifter/charliecloud/conda/institute>
nextflow run nf-core/demultiplex \
   -profile <docker/singularity/.../institute> \
   --input samplesheet.csv \
   --outdir <OUTDIR>

Warning

Please provide pipeline parameters via the CLI or Nextflow -params-file option. Custom config files including those provided by the -c Nextflow option can be used to provide any configuration except for parameters; see docs.
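
As an illustration of the -params-file route, the following Python sketch writes a small JSON parameters file containing the two parameters used in the command above (`input` and `outdir`). The helper name `write_params_file` and the file name `params.json` are hypothetical, and the sketch assumes a JSON file is acceptable for -params-file in your Nextflow version.

```python
import json
from pathlib import Path


def write_params_file(path: Path, input_csv: str, outdir: str) -> Path:
    """Write pipeline parameters to a JSON file for use with -params-file."""
    # Only parameters go here; other configuration belongs in a -c config file.
    params = {"input": input_csv, "outdir": outdir}
    path.write_text(json.dumps(params, indent=2) + "\n")
    return path


if __name__ == "__main__":
    params_path = write_params_file(Path("params.json"), "samplesheet.csv", "results")
    # Print the command that would consume the file (docker chosen as an example profile).
    print(f"nextflow run nf-core/demultiplex -profile docker -params-file {params_path}")
```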

For more details and further functionality, please refer to the usage documentation and the parameter documentation.

Pipeline output

To see the results of an example test run with a full size dataset refer to the results tab on the nf-core website pipeline page. For more details about the output files and reports, please refer to the output documentation.

Credits

The nf-core/demultiplex pipeline was written by Chelsea Sawyer from The Bioinformatics & Biostatistics Group for use at The Francis Crick Institute, London.

The pipeline was re-written in Nextflow DSL2 and is primarily maintained by Matthias De Smet (@matthdsm) from Center For Medical Genetics Ghent, Ghent University and Edmund Miller (@edmundmiller) from Element Biosciences.

We thank the following people for their extensive assistance in the development of this pipeline:

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines.

For further information or help, don't hesitate to get in touch on the Slack #demultiplex channel (you can join with this invite).

Citations

If you use nf-core/demultiplex for your analysis, please cite it using the following doi: 10.5281/zenodo.7153103

An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.

You can cite the nf-core publication as follows:

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.

demultiplex's People

Contributors

adamrtalbot, aratz, cmgg-be, csawye01, drpatelh, edmundmiller, glichtenstein, kevinmenden, matthdsm, maxulysse, nf-core-bot, nh13, nvnieuwk, robsyme, sam-white04, sateeshperi, thanhleviet, tomkellygenetics

demultiplex's Issues

What is the status of the test_full?

Description of feature

Hi,
I was looking into the test_full profile vs test, and it looks like the flowcell is the same.
Is there a plan for a real test_full?
Also https://raw.githubusercontent.com/nf-core/test-datasets/demultiplex/testdata/MiSeq/SampleSheet.csv doesn't seem like proper csv to me.

bases2fastq doesn't use generate_fastq_meta so fcid is null

Description of the bug

example output:

tree null/
null/
├── 20220601_PLT-03_BBS-0174_1.fastp.fastq.gz
├── 20220601_PLT-03_BBS-0174_1_fastqc.html
├── 20220601_PLT-03_BBS-0174_1_fastqc.zip
├── 20220601_PLT-03_BBS-0174_2.fastp.fastq.gz
├── 20220601_PLT-03_BBS-0174_2_fastqc.html
├── 20220601_PLT-03_BBS-0174_2_fastqc.zip
├── 20220601_PLT-03_BBS-0174.fastp.html
├── 20220601_PLT-03_BBS-0174.fastp.json
├── 20220601_PLT-03_BBS-0174.fastp.log
├── DefaultSample_R1.fastq.gz.md5
├── DefaultSample_R2.fastq.gz.md5
├── Unassigned_R1.fastq.gz.md5
└── Unassigned_R2.fastq.gz.md5

Command used and terminal output

No response

Relevant files

No response

System information

No response

Handle different demultiplexers in the same workflow

Description of feature

Rather than params.demultiplexer maybe we can handle this just off the sample sheet, either by name of the run or by the presence of a SampleSheet.csv or a RunManifest.csv and maybe parsing them a bit to infer the type. I'm thinking the latter is going to be the most elegant.
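
A minimal Python sketch of the second idea (inferring the tool from the file name) might look like the following. The function `infer_demultiplexer` and the mapping table are hypothetical; in particular, mapping SampleSheet.csv to bclconvert rather than bcl2fastq is an assumption made for illustration, since both tools read Illumina sample sheets.

```python
from pathlib import PurePosixPath
from typing import Optional

# Assumed mapping from manifest file name (lower-cased) to demultiplexer.
# SampleSheet.csv could equally mean bcl2fastq; bclconvert is picked here
# only as an example default.
MANIFEST_TO_DEMULTIPLEXER = {
    "samplesheet.csv": "bclconvert",
    "runmanifest.csv": "bases2fastq",
}


def infer_demultiplexer(samplesheet: str) -> Optional[str]:
    """Guess the demultiplexer from the file name of a path or URL.

    Returns None when the name is not recognised, so the caller can fall
    back to an explicit setting such as params.demultiplexer.
    """
    name = PurePosixPath(samplesheet).name.lower()
    return MANIFEST_TO_DEMULTIPLEXER.get(name)


if __name__ == "__main__":
    for entry in ("./sim-data/RunManifest.csv", "/path/to/SampleSheet.csv", "other.csv"):
        print(entry, "->", infer_demultiplexer(entry))
```

Name-based inference is cheap but brittle; parsing the file contents, as suggested above, would be needed to tell bclconvert and bcl2fastq sheets apart.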

Test Groovy Functions

Description of feature

  • MAIN::extract_csv
  • MAIN::parse_flowcell_csv
  • BCL_DEMULTIPLEX::generate_fastq_meta
  • BCL_DEMULTIPLEX::readgroup_from_fastq
  • BASES_DEMULTIPLEX::generate_fastq_meta
  • BASES_DEMULTIPLEX::readgroup_from_fastq

Regex does not parse sample names correctly

Description of the bug

A SampleSheet.csv that contains Sample_IDs with characters that partly match the [Undetermined] pattern causes the corresponding files not to be recognized as outputs of the BCLCONVERT process.

Steps to reproduce:
Edit the SampleSheet.csv that comes with the test dataset by changing the Sample_ID to something like 'Sample-Unspiked'

Example SampleSheet.csv:
[Header]

[Reads]
110

[Settings]

[Data]
Sample_ID,Sample_Name,Description,Sample_Project
Sample-Unspiked,Sample-Unspiked,,

Running the nf-core/demultiplex test with this SampleSheet.csv will cause the error:
Missing output file(s) **[!Undetermined]_S*_R?_00?.fastq.gz expected by process NFCORE_DEMULTIPLEX:DEMULTIPLEX:BCL_DEMULTIPLEX:BCLCONVERT (220422_M11111_0222_000000000-K9H97.1)
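
The likely cause can be reproduced outside Nextflow: in a glob, `[!Undetermined]` is a character class that matches one character not in the set {U, n, d, e, t, r, m, i}, not a negation of the word "Undetermined". The Python sketch below uses `fnmatch` as a stand-in for the glob matching Nextflow performs (an assumption; the two implementations are not identical), with the leading `**` simplified to `*`. The file names are illustrative.

```python
from fnmatch import fnmatchcase

# Simplified form of the output glob from the error message.
PATTERN = "*[!Undetermined]_S*_R?_00?.fastq.gz"


def matches_output_glob(filename: str) -> bool:
    """Return True when the file name matches the simplified output glob."""
    return fnmatchcase(filename, PATTERN)


if __name__ == "__main__":
    for name in (
        "Sample1_S1_R1_001.fastq.gz",          # '1' before _S is outside the set
        "Sample-Unspiked_S1_R1_001.fastq.gz",  # 'd' before _S is inside the set
        "Undetermined_S0_R1_001.fastq.gz",     # excluded, as intended
    ):
        print(name, matches_output_glob(name))
```

Any sample name whose last character is one of those eight letters is rejected, which is why 'Sample-Unspiked' (ending in "d") triggers the missing-output error.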

Command used and terminal output

No response

Relevant files

No response

System information

No response

Fix bclconvert image

Description of the bug

Command error:
  Unable to find image 'nfcore/bclconvert:3.9.3' locally
  3.9.3: Pulling from nfcore/bclconvert
  c229119241af: Pulling fs layer
  2c12762f1e6a: Pulling fs layer
  2c12762f1e6a: Verifying Checksum
  2c12762f1e6a: Download complete
  c229119241af: Verifying Checksum
  c229119241af: Download complete
  c229119241af: Pull complete
  2c12762f1e6a: Pull complete
  Digest: sha256:abece369e059b22c415f340142e64e1adf4c90c5470136ffc3e23d6ee832ecd9
  Status: Downloaded newer image for nfcore/bclconvert:3.9.3
  Command 'ps' required by nextflow to collect task metrics cannot be found

Command used and terminal output

No response

Relevant files

No response

System information

No response

bcl2fastq installation error

Description of the bug

The bcl2fastq module is not correctly installed in the docker image. When we run the pipeline with any dataset, including the bcl2fastq test profile, we see this warning in the .command.err file:

I/O warning : failed to load external entity "/usr/local/share/xsl/demux/GenerateReport.xsl"
  error
  xsltParseStylesheetFile : cannot parse /usr/local/share/xsl/demux/GenerateReport.xsl
  2023-02-02 18:28:14 [1743880] WARNING: Could not compute and write statistics for this conversion: libxslt failure
  2023-02-02 18:28:14 [1743880] Processing completed with 0 errors and 1 warnings.

Because of this error, bcl2fastq isn't able to generate any reports, and the Reports directory is empty. When we run the pipeline with S3 in the backend, Nextflow tries to upload the output on S3 but since the Reports directory is empty it gets confused and throws another error that breaks the pipeline.

Error executing process > 'NFCORE_DEMULTIPLEX:DEMULTIPLEX:BCL_DEMULTIPLEX:BCL2FASTQ (220422_M11111_0222_000000000-K9H97.1)'

Caused by:
  Missing output file(s) `Reports` expected by process `NFCORE_DEMULTIPLEX:DEMULTIPLEX:BCL_DEMULTIPLEX:BCL2FASTQ (220422_M11111_0222_000000000-K9H97.1)`

It looks like only the executable for bcl2fastq is extracted from the RPM, and the rest of the package is not fully installed: https://github.com/nf-core/demultiplex/tree/master/modules/nf-core/bcl2fastq and https://github.com/nf-core/modules/tree/master/modules/nf-core/bcl2fastq

Command used and terminal output

nextflow run 'https://github.com/nf-core/demultiplex' \
		 -name berserk_engelbart \
		 -params-file 'https://dev-nf-tower.tesseratx.internal/api/ephemeral/32X5Sn6bbRn7gWdllRq-0g.json' \
		 -with-tower 'https://dev-nf-tower.tesseratx.internal/api' \
		 -profile docker,test_bcl2fastq

Relevant files

No response

System information

Nextflow version: 22.10.1
Hardware: Nextflow Tower
Executor: awsbatch
Container: Docker

Check Samplesheet expects files to be remote

Description of the bug

request.urlopen(row[self._samplesheet_col]).getcode() == 200

This pattern doesn't work because it breaks for users running on local files and not downloading them.
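
One possible shape for a check that accepts both remote and local entries is sketched below. The function `location_exists` is hypothetical and is not the pipeline's actual fix; it assumes only `http` and `https` entries should be probed over the network and treats everything else as a local path, so other schemes such as `s3://` would need separate handling.

```python
from pathlib import Path
from urllib import request
from urllib.parse import urlparse

REMOTE_SCHEMES = {"http", "https"}


def location_exists(location: str, timeout: float = 10.0) -> bool:
    """Return True if a samplesheet entry points at something reachable.

    Remote URLs are probed with urlopen; anything else is checked on the
    local filesystem.
    """
    scheme = urlparse(location).scheme
    if scheme in REMOTE_SCHEMES:
        try:
            with request.urlopen(location, timeout=timeout) as response:
                return response.getcode() == 200
        except (OSError, ValueError):
            # URLError and HTTPError are OSError subclasses; ValueError
            # covers malformed URLs.
            return False
    return Path(location).exists()
```

With this split, an entry such as `./sim-data/RunManifest.csv` is tested with `Path.exists()` and never reaches `urlopen`.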

Command used and terminal output

Using this samplesheet 

flowcell,samplesheet,lane,run_dir
miseq,https://raw.githubusercontent.com/nf-core/test-datasets/demultiplex/testdata/MiSeq/SampleSheet.csv,1,https://github.com/nf-core/test-datasets/raw/demultiplex/testdata/MiSeq/220422_M11111_0222_000000000-K9H97.tar.gz
aviti,./sim-data/RunManifest.csv,1,./sim-data/

It broke when I was trying to use local files

Relevant files

No response

System information

No response

bcl2fastq installation error

Description of the bug

The bcl2fastq module is not correctly installed in the docker image. When we run the demultiplex nf-core pipeline with any dataset, including the bcl2fastq test profile, we see this warning in the .command.err file:

I/O warning : failed to load external entity "/usr/local/share/xsl/demux/GenerateReport.xsl"
  error
  xsltParseStylesheetFile : cannot parse /usr/local/share/xsl/demux/GenerateReport.xsl
  2023-02-02 18:28:14 [1743880] WARNING: Could not compute and write statistics for this conversion: libxslt failure
  2023-02-02 18:28:14 [1743880] Processing completed with 0 errors and 1 warnings.

Because of this error, bcl2fastq isn't able to generate any reports, and the Reports directory is empty. When we run the pipeline with S3 in the backend, Nextflow tries to upload the output on S3 but since the Reports directory is empty it gets confused and throws another error that breaks the pipeline.

Error executing process > 'NFCORE_DEMULTIPLEX:DEMULTIPLEX:BCL_DEMULTIPLEX:BCL2FASTQ (220422_M11111_0222_000000000-K9H97.1)'

Caused by:
  Missing output file(s) `Reports` expected by process `NFCORE_DEMULTIPLEX:DEMULTIPLEX:BCL_DEMULTIPLEX:BCL2FASTQ (220422_M11111_0222_000000000-K9H97.1)`

It looks like only the executable for bcl2fastq is extracted from the RPM, and the rest of the package is not fully installed: https://github.com/nf-core/demultiplex/tree/master/modules/nf-core/bcl2fastq and https://github.com/nf-core/modules/tree/master/modules/nf-core/bcl2fastq

Command used and terminal output

nextflow run 'https://github.com/nf-core/demultiplex' \
		 -name berserk_engelbart \
		 -params-file 'https://dev-nf-tower.tesseratx.internal/api/ephemeral/32X5Sn6bbRn7gWdllRq-0g.json' \
		 -with-tower 'https://dev-nf-tower.tesseratx.internal/api' \
		 -profile docker,test_bcl2fastq

These are the params:

params {
   input = 's3://fvl58-dev-pipelines-execution-data/inputs/nf-demux/flowcell_input.csv'
   demultiplexer = 'bcl2fastq'
   trim_fastq = true
   skip_tools = []
   multiqc_config = null
   multiqc_title = null
   multiqc_logo = null
   max_multiqc_email_size = '25.MB'
   multiqc_methods_description = null
   outdir = 's3://fvl58-dev-pipelines-execution-data/results/nf-demux/nf_tests'
   custom_config_version = 'master'
   custom_config_base = 'https://raw.githubusercontent.com/nf-core/configs/master'
   max_cpus = 16
   max_memory = '128.GB'
   max_time = '240.h'
   help = false
   publish_dir_mode = 'copy'
   plaintext_email = false
   monochrome_logs = false
   tracedir = '${params.outdir}/pipeline_info'
   validate_params = true
   show_hidden_params = false
   enable_conda = false
   email = null
   email_on_fail = null
   hook_url = null
   version = false
   schema_ignore_params = 'genomes'
   config_profile_description = 'Minimal test dataset to check pipeline function with bcl2fastq'
   config_profile_contact = null
   config_profile_url = null
   config_profile_name = 'Test bcl2fastq profile'
}

Relevant files

No response

System information

Nextflow version: 22.10.1
Hardware: Nextflow Tower
Executor: awsbatch
Container: Docker

-profile test failing due to relative path in flowcell_input.csv

Description of the bug

As discussed in Slack, the test profile for this pipeline results in failure due to the relative path to the SampleSheet.csv in flowcell_input.csv. The pipeline seems to look for the SampleSheet.csv in a path relative to the current working directory.

Updating the path to the SampleSheet.csv to be absolute worked to resolve this issue, but there's probably a more generalizable solution. I'm new to nextflow / nf-core, so apologies if this should have been obvious.
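
The failure mode can be illustrated with a short Python sketch that resolves a relative entry against an explicit base directory instead of the current working directory. The function `resolve_entry` is hypothetical, and the relative entry `assets/inputs/SampleSheet.csv` is an assumption inferred from the error message in the log below; which base directory is appropriate (the pipeline's project directory or the directory of the input CSV) is a design choice this sketch leaves to the caller.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def resolve_entry(entry: str, base_dir: str) -> str:
    """Resolve a samplesheet entry against base_dir when it is relative.

    URLs and absolute paths are returned unchanged.
    """
    if urlparse(entry).scheme:
        return entry  # e.g. https://... or s3://...
    path = PurePosixPath(entry)
    if path.is_absolute():
        return str(path)
    return str(PurePosixPath(base_dir) / path)


if __name__ == "__main__":
    project_dir = "/home/jgbaum/.nextflow/assets/nf-core/demultiplex"
    launch_dir = "/mnt/hpcscratch/jgbaum"
    entry = "assets/inputs/SampleSheet.csv"  # assumed relative entry
    print(resolve_entry(entry, launch_dir))   # where the pipeline looked
    print(resolve_entry(entry, project_dir))  # where the file ships
```

Resolving against the launch directory reproduces the path from the "No such file" message, while resolving against the project directory points at the bundled file.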

Command used and terminal output

(nf-core-3.10.10) [jgbaum@node035 jgbaum]$ nextflow run nf-core/demultiplex -profile test,singularity --outdir test6
N E X T F L O W  ~  version 22.10.7
Launching `https://github.com/nf-core/demultiplex` [small_gates] DSL2 - revision: 8e42b6cb4e [master]


------------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/demultiplex v1.1.0-g8e42b6c
------------------------------------------------------
Core Nextflow options
  revision                  : master
  runName                   : small_gates
  containerEngine           : singularity
  launchDir                 : /mnt/hpcscratch/jgbaum
  workDir                   : /mnt/hpcscratch/jgbaum/work
  projectDir                : /home/jgbaum/.nextflow/assets/nf-core/demultiplex
  userName                  : jgbaum
  profile                   : test,singularity
  configFiles               : /home/jgbaum/.nextflow/assets/nf-core/demultiplex/nextflow.config

Workflow options
  skip_tools                : []

Input/output options
  input                     : /home/jgbaum/.nextflow/assets/nf-core/demultiplex/assets/inputs/flowcell_input.csv
  outdir                    : test6

Demultiplexing options
  demultiplexer             : bclconvert

Institutional config options
  config_profile_name       : Test profile
  config_profile_description: Minimal test dataset to check pipeline function

Max job request options
  max_cpus                  : 2
  max_memory                : 6.GB
  max_time                  : 6.h

!! Only displaying parameters that differ from the pipeline defaults !!
------------------------------------------------------
If you use nf-core/demultiplex for your analysis please cite:

* The pipeline
  https://doi.org/10.5281/zenodo.7153103

* The nf-core framework
  https://doi.org/10.1038/s41587-020-0439-x

* Software dependencies
  https://github.com/nf-core/demultiplex/blob/master/CITATIONS.md
------------------------------------------------------
[-        ] process > NFCORE_DEMULTIPLEX:DEMULTIPLEX:UNTAR                       -
[-        ] process > NFCORE_DEMULTIPLEX:DEMULTIPLEX:BCL_DEMULTIPLEX:UNTAR       -
[-        ] process > NFCORE_DEMULTIPLEX:DEMULTIPLEX:BCL_DEMULTIPLEX:BCLCONVERT  -
[-        ] process > NFCORE_DEMULTIPLEX:DEMULTIPLEX:FASTP                       -
[-        ] process > NFCORE_DEMULTIPLEX:DEMULTIPLEX:FALCO                       -
[-        ] process > NFCORE_DEMULTIPLEX:DEMULTIPLEX:MD5SUM                      -
[-        ] process > NFCORE_DEMULTIPLEX:DEMULTIPLEX:CUSTOM_DUMPSOFTWAREVERSIONS -
[-        ] process > NFCORE_DEMULTIPLEX:DEMULTIPLEX:UNTAR                       -
[-        ] process > NFCORE_DEMULTIPLEX:DEMULTIPLEX:BCL_DEMULTIPLEX:UNTAR       -
[-        ] process > NFCORE_DEMULTIPLEX:DEMULTIPLEX:BCL_DEMULTIPLEX:BCLCONVERT  -
[-        ] process > NFCORE_DEMULTIPLEX:DEMULTIPLEX:FASTP                       -
[-        ] process > NFCORE_DEMULTIPLEX:DEMULTIPLEX:FALCO                       -
[-        ] process > NFCORE_DEMULTIPLEX:DEMULTIPLEX:MD5SUM                      -
[-        ] process > NFCORE_DEMULTIPLEX:DEMULTIPLEX:CUSTOM_DUMPSOFTWAREVERSIONS -
[-        ] process > NFCORE_DEMULTIPLEX:DEMULTIPLEX:MULTIQC                     -
No such file: /mnt/hpcscratch/jgbaum/assets/inputs/SampleSheet.csv

 -- Check script '/home/jgbaum/.nextflow/assets/nf-core/demultiplex/./workflows/demultiplex.nf' at line: 306 or see '.nextflow.log' file for more details

Relevant files

No response

System information

No response

Documentation and file type for SampleSheet.csv

Description of feature

This is related to #81, which states (but is not the main concern of the linked issue)

Also https://raw.githubusercontent.com/nf-core/test-datasets/demultiplex/testdata/MiSeq/SampleSheet.csv doesn't seem like proper csv to me.

Would it be possible to add some documentation on how to write such a SampleSheet.csv file, and (probably) fix the extension to TOML?

I tried to look for it on https://nf-co.re/demultiplex/1.1.0/usage, but there was no mention of this.

Ambiguous sampleSheet.csv structure

Description of the bug

Hello,

I am trying to run this pipeline to demultiplex an Illumina MiSeq dataset of ITS amplicons.
I am confused about what should be in the samplesheet. In .nextflow/assets/nf-core/demultiplex/docs/example_input.csv it is given as

flowcell,samplesheet,lane,run_dir
DDMMYY_SERIAL_NUMBER_FC,/path/to/SampleSheet.csv,1,/path/to/sequencer/output
DDMMYY_SERIAL_NUMBER_FC,/path/to/SampleSheet.csv,2,/path/to/sequencer/output
DDMMYY_SERIAL_NUMBER_FC2,/path/to/SampleSheet2.csv,1,/path/to/sequencer/output2

while on the main nf-core page for the pipeline it is

id,samplesheet,lane,flowcell
DDMMYY_SERIAL_NUMBER_FC,/path/to/SampleSheet.csv,1,/path/to/sequencer/output
DDMMYY_SERIAL_NUMBER_FC,/path/to/SampleSheet.csv,2,/path/to/sequencer/output
DDMMYY_SERIAL_NUMBER_FC2,/path/to/SampleSheet2.csv,1,/path/to/sequencer/output2
DDMMYY_SERIAL_NUMBER_FC3,/path/to/SampleSheet3.csv,3,/path/to/sequencer/output3

and the description is even more confusing

Column	Description
flowcell	flowcell id
samplesheet	Full path to the SampleSheet.csv file containing the sample information and indexes
lane	Optional lane number. When a lane number is provided, only the given lane will be demultiplexed
run_dir	Full path to the Illumina sequencer output directory or a tar.gz file containing the contents of said directory

I tried both versions but got an error back. What is the right way to specify this parameter?
Thanks a lot,
Gian

Command used and terminal output

nextflow run \
	nf-core/demultiplex \
	--input samplesheet.csv \
	--outdir demux_results \
	-profile singularity

Relevant files

No response

System information

22.10.6

The process stops or cannot find the correct file path when NoLaneSplitting, true is added

Description of the bug

I added NoLaneSplitting, true to the sample sheet to merge Lane 1 and Lane 2. After that, the BCLCONVERT demultiplexing step reports that a file or path could not be found. I think this is caused by Missing output file(s) **[!Undetermined]_S*_L00?_R?_00?.fastq.gz expected by process. The Undetermined reads are actually there, but the full name of the file I generated is Undetermined_S0_R2_001.fastq.gz. I think there is no L00? in the name because I merged the lanes.

This error is not limited to the Undetermined reads: when lanes are merged, all output files are generated with names like *_S_00?_R?_00?.fastq.gz or *_S_00?_I?_00?.fastq.gz. Would it be possible to add or change the globs so that the process also works when people use NoLaneSplitting, true?
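
To illustrate what a lane-tolerant match could look like, here is a hedged Python sketch that treats the `_L00?` token as optional. The regular expression and the helper `parse_fastq_name` are hypothetical and only model the naming described in this issue; the actual fix in the module would have to be expressed as Nextflow output globs.

```python
import re
from typing import Dict, Optional

# The lane token (_L001, _L002, ...) is optional so that files written with
# NoLaneSplitting are matched as well.
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_S(?P<index>\d+)"
    r"(?:_L(?P<lane>\d{3}))?"
    r"_(?P<read>[RI]\d)_\d{3}\.fastq\.gz$"
)


def parse_fastq_name(filename: str) -> Optional[Dict[str, Optional[str]]]:
    """Split a fastq file name into its parts, or return None if it does not fit."""
    match = FASTQ_NAME.match(filename)
    return match.groupdict() if match else None


if __name__ == "__main__":
    for name in ("Sample1_S1_L001_R1_001.fastq.gz", "Undetermined_S0_R2_001.fastq.gz"):
        print(name, parse_fastq_name(name))
```

For a merged-lane file the `lane` field is simply `None`, so downstream code can distinguish the two layouts without a second pattern.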

Command used and terminal output

No response

Relevant files

No response

System information

No response

It looks like samplesheet.csv doesn't exist.

Description of the bug

At this site: https://nf-co.re/demultiplex/usage

When you click on "example samplesheet" a 404 is returned stating:

It looks like https://nf-co.re/demultiplex/1.0.0/assets/samplesheet.csv doesn't exist.

Command used and terminal output

When running the test on my terminal I received: 

WARN: There's no process matching config selector: CELLRANGER_MKFASTQ
No such file: /home/gxl674/sdp/sdp-nextflow/tests/assets/inputs/SampleSheet.csv

 -- Check script '/home/gxl674/.nextflow/assets/nf-core/demultiplex/./workflows/demultiplex.nf' at line: 223 or see '.nextflow.log' file for more details


The command was:
 nextflow run nf-core/demultiplex -profile test,docker --outdir results_demultiplex/

Relevant files

No response

System information

No response

RFC: Convert to stand-alone subworkflow(s)

Description of feature

Future plans for demultiplex workflow:

Attempt to convert into atomic subworkflows:

  1. wrap different demux strategies into subworkflows:
     • demultiplex_illumina (bcl-convert, bcl2fastq)
     • demultiplex_element (bases2fastq)
     • demultiplex_singlecell (cellranger, contributed by @apeltzer)
  2. wrap subworkflows into a self-contained subworkflow and submit to modules
  3. wrap the subworkflow into a stand-alone demultiplex workflow with QC steps etc. baked in.

Open for comment!

cfr https://nfcore.slack.com/archives/CL88J906S/p1665042215379769

Input file name collision occurs for fastp and falco

Description of the bug

When I ran the pipeline (v1.1.0), this error occurred at the start of the fastp step.

ERROR ~ Error executing process > 'NFCORE_DEMULTIPLEX:DEMULTIPLEX:FASTP (7)'

Caused by:
  Process `NFCORE_DEMULTIPLEX:DEMULTIPLEX:FASTP` input file name collision -- There are multiple input files for each of the following file names:  [[NAMES OF fastq.gz]]

Also, on the Nextflow monitoring screen, the number of tasks assigned to Fastp is one less than the original number of samples, and it seems that two fastq.gz files are assigned to one sample and an error has occurred.

What should I do about this error? I am a beginner with Nextflow and don’t know how to debug it.
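
A name collision of this kind usually means that two input files with the same file name were grouped into one task. As a debugging aid, the Python sketch below lists which file names occur more than once in a set of paths. The helper `find_name_collisions` and the example paths are hypothetical; this is a way to inspect the inputs, not a fix for the pipeline.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, Iterable, List


def find_name_collisions(paths: Iterable[str]) -> Dict[str, List[str]]:
    """Group paths by file name and keep only names that occur more than once."""
    by_name: Dict[str, List[str]] = defaultdict(list)
    for path in paths:
        by_name[PurePosixPath(path).name].append(path)
    return {name: group for name, group in by_name.items() if len(group) > 1}


if __name__ == "__main__":
    # Hypothetical paths: the same sample name appears under two lanes.
    paths = [
        "work/L001/SampleA_R1.fastq.gz",
        "work/L002/SampleA_R1.fastq.gz",
        "work/L001/SampleB_R1.fastq.gz",
    ]
    for name, group in find_name_collisions(paths).items():
        print(name, "->", group)
```

Running it over the fastq paths from the work directory shows whether duplicate sample names in the sample sheet are the source of the collision.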

Command used and terminal output

No response

Relevant files

No response

System information

Nextflow version: version 23.04.1 build 5866
Hardware: HPC
Executor: local
Container engine: Singularity
OS: CentOS Linux
Version of nf-core/demultiplex: 1.1.0

Test Dataset

  • Where to store test dataset
  • Find a small test dataset
  • Can be made publicly available

Add UniverSC to support other technologies

We've recently released an open-source tool to expand the functionality of Cell Ranger to apply to other technologies. We provide a Docker container on an open-source license (see #2) which can be used to run Cell Ranger on other technologies without violating the 10X Genomics EULA. Note that this only applies to scRNA-Seq techniques at this stage.

GitHub repo: https://github.com/minoda-lab/universc
Docker container: https://hub.docker.com/r/tomkellygenetics/universc
Manuscript: https://www.biorxiv.org/content/10.1101/2021.01.19.427209v1

I'm willing to prepare a PR to call UniverSC in place of Cell Ranger if there is interest in doing this.

Add support for `WatchPath`

Description of feature

Add support for WatchPath to automatically trigger demultiplexing on run completion.

demultiplex: Convert parameter docs to json schema

Hi!

this is not necessarily an issue with the pipeline, but in order to streamline the documentation group next week for the hackathon, I'm opening issues in all repositories / pipeline repos that might need this update to switch from parameter docs to auto-generated documentation based on the JSON schema.

This will then supersede any further parameter documentation, thus making things a bit easier :-)

If this doesn't apply (anymore), please close the issue. Otherwise, I'm hoping to have some helping hands on this next week in the documentation team on Slack https://nfcore.slack.com/archives/C01QPMKBYNR

MultiQC report is empty

Description of the bug

Successful run yields empty MQC report

Command used and terminal output

No response

Relevant files

No response

System information

No response

Remove confusing samplesheet language

Description of the bug

The Demux tool "samplesheet" and the nf-core "samplesheet" are confusing to discuss.

This is not to get rid of either, just to rename one of them. I'm biased towards Element's "RunManifest" instead of "samplesheet" but I acknowledge if we adopted that we'd probably confuse more people.

So I think the way forward is to change the nf-core "samplesheet" terminology.

Command used and terminal output

No response

Relevant files

No response

System information

No response

Add code-owners

Description of feature

Since we've got so many people supporting different pieces of software.

Make trimming optional

Description of feature

  • Add param to make trimming optional, while still keeping the QC report
  • Add param to skip fastp altogether
