carpentries-incubator / workflows-nextflow

Workflow management with Nextflow and nf-core

Home Page: https://carpentries-incubator.github.io/workflows-nextflow/

License: Other

Nextflow 99.32% Python 0.68%

lesson carpentries-incubator english pre-alpha rna-seq-analysis workflow-management bulk-rna-seq nextflow nf-core

workflows-nextflow's Introduction

This lesson is an introduction to the workflow manager Nextflow, and nf-core, a community effort to collect a curated set of analysis pipelines built using Nextflow.

Nextflow enables scalable and reproducible scientific workflows using software containers such as Docker and Singularity. It allows the adaptation of pipelines written in the most common scripting languages such as R and Python. Nextflow is a Domain Specific Language (DSL) that simplifies the implementation and the deployment of complex parallel and reactive workflows on clouds and clusters.

This lesson also introduces nf-core: a framework that provides a community-driven, peer reviewed platform for the development of best practice analysis pipelines written in Nextflow.

This lesson motivates the use of Nextflow and nf-core as a development tool for building and sharing computational pipelines that facilitate reproducible (data) science workflows.

Contributing

We welcome all contributions to improve the lesson! Maintainers will do their best to help you if you have any questions, concerns, or experience any difficulties along the way.

We'd like to ask you to familiarize yourself with our Contribution Guide and have a look at the [more detailed guidelines][lesson-example] on proper formatting, ways to render the lesson locally, and even how to write new episodes.

Please see the current list of [issues](https://github.com/carpentries-incubator/workflows-nextflow/issues) for ideas for contributing to this repository. For making your contribution, we use the GitHub flow, which is nicely explained in the chapter Contributing to a Project in Pro Git by Scott Chacon. Look for issues that the maintainers have tagged as open for contributions; the tag indicates that the maintainers will welcome a pull request fixing that issue.

Maintainer(s)

Current maintainers of this lesson are

Graeme R. Grimes
Mahesh Binzer-Panchal

Authors

A list of contributors to the lesson can be found in AUTHORS

Citation

To cite this lesson, please consult with CITATION

workflows-nextflow's Issues

Add episode for workflow design

Sketch a diagram of your workflow.
Make a test data set (vital for making reproducible errors).
Start small.
Don't plan too much.
Quick and dirty vs Production level considerations.

Applicable material: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1009809
Large scale data processing:

Malformed exercise in Reporting episode

Two exercise boxes in the Reporting episode seem to be malformed, presumably as an artifact from the Workbench transition. I would have opened a PR to fix, but I am not sure if they should be a single exercise or two.

Troubleshooting Nextflow: List of error examples, and their fixes.

I thought it would be a nice idea to keep track of various error messages from Nextflow, and how to solve them.

This should go nicely also with Phil's bytesize talk on troubleshooting workflows.

Another ref: https://www.nature.com/articles/d41586-022-00217-0

Scripting error.

Error executing process > 'krona_chart_kraken'

Caused by:
  Not a valid path value type: nextflow.script.ProcessDef (ProcessDef[process kraken2nt_reads])

This error indicates that a process is receiving the wrong type of input in the workflow block.

error causing code:

krona_chart_kraken(kraken2nt_reads)

where kraken2nt_reads is also a process which has been called.

solution:

krona_chart_kraken( kraken2nt_reads.out )

The process krona_chart_kraken is expecting output from a process kraken2nt_reads which has been run previously. To
access output from another process, the .out suffix is needed.

Nextflow review @bobturneruk ep1

ep1
I get this output from the script, which is different to the course material. Might be good if the output number wasn't 0?

(nf-training) bobturner@tubby:~/nf-training$ nextflow run word_count.nf
N E X T F L O W  ~  version 20.10.0
Launching `word_count.nf` [boring_nobel] - revision: 72656509cb
executor >  local (1)
[14/523859] process > NUM_LINES (1) [100%] 1 of 1 ✔
SRR2584863_1.fastq.gz   0

Perhaps this helps? bobturneruk@6be012f
I think there are some duplicate items in the list of script contents. I suggest bobturneruk@0878421

Link suggestions from 15-16 delivery

Suggest adding a links glossary with the following

Nf-core pipelines: https://nf-co.re/pipelines
Also, a really nice step-by-step usage guide to nf-core pipelines: https://nf-co.re/usage/introduction
Troubleshooting guide: https://nf-co.re/usage/troubleshooting
Example nf-core institutional config file: https://github.com/nf-core/configs/blob/master/conf/eddie.config
If you wanted to check if your institute has nf-core docs (for example a config file), they have a list of the systems here: https://github.com/nf-core/configs#documentation
This is a good example of how you can usefully use log.info in a larger pipeline https://github.com/nextflow-io/rnaseq-nf/blob/3b5b49f/main.nf#L41-L48
For advanced or specific questions, nextflow have a slack space - there's a blog post which explains how to join that and what their other communication channels are: https://www.nextflow.io/blog/2022/nextflow-is-moving-to-slack.html
https://nf-co.re/join
Here are some details on a Nextflow issue relating to cancelling pipelines that are running: nextflow-io/nextflow#1441
Cool paper on how a ribosome profiling software went about choosing a workflow management system: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008622

Feedback on Course materials from Tomas Larsson

Notes for Mahesh with comments on Nextflow course material (https://carpentries-incubator.github.io/workflows-nextflow/):

SETUP section:

For creating the training environment there is no environment.yml file

Under the heading "Add Nextflow binary to your user’s PATH:"
the section "Check the correct installation running the following command:"
is accidentally repeated twice.

Getting started with nextflow

General comment: Pictures are rather small with small font.

Your first script section:
"Open the file wc.nf in the script directory with your favourite text editor."
could instead be
"Open the file "nf-training/scripts/introductions/wc.nf" with your favourite text editor."

Weird formatting of last comment in CODE section. (line breaking in "without")

Since the environment.yml file in introduction does not exist the reference to the nextflow
environment in nextflow config does not work as it is written now.

"We will learn how exactly nextflow using work directory to execute processes in the following sections."
could be:
"We will learn how exactly nextflow is using the work directory to execute processes in the following sections."

Nextflow scripting

This section would probably benefit from some more exercises with solutions. Otherwise fine.

Workflow parameterization

First section: "Objectives"
"Add a pipeline parameters to a Nextflow script."
Should be:
"Add pipeline parameters to a Nextflow script."

"Pipeline parameters"
"wild cards" should be replaced with "wildcards", right?

Channels

no comments on this section

Processes

no comments on this section

Processes Part 2
"Output values"
"...if we want to share a value input into one process as input to another process we would need..."
this wording is confusing to me (might have misunderstood this), shouldn't it read something like:
"...if we want to share a value output from one process as input to another process we would need..."

"Output files"
the example code is missing the "script:" line (should probably be there to be consistent with the rest of the examples)

Workflow

no comments on this section

Operators

The groovy keyword $it "(‘it’ is for ‘item’)" is first explained in this section although it is used several times in earlier examples.
Maybe introduce it like this first time it is encountered in the lectures.

"Grouping contents of a channel by a key."
Shouldn't the output of the first code example here be: "[wt, [wt_1.fq, wt_2.fq]]"
instead of "[wt, [wt_1.fq, wt_1.fq]]"?

Nextflow configuration

The section "Configuring Nextflow vs Configuring a Nextflow workflow"
is repeated twice by mistake.

"Inspecting the Nextflow configuration"
missing example output.

Simple RNA-Seq pipeline

Under objectives:
"Produce an execution report and generates run metrics from a pipeline run."
should read:
"Produce an execution report and generate run metrics from a pipeline run."

Had to change multiqc version in environment.yml (instructions point to old version that doesn't work on my mac) in order to get it to work
under "MultiQC report".

Modules

no comments on this section

Sub-workflows

Under heading "Workflow outputs":
"The output QUANT.out is assigned the name read_quant The the result of the above snippet can accessed using:"

should read:
"The output QUANT.out is assigned the name read_quant. The result of the above snippet can be accessed using:"

Under heading "Workflow composition":

"After which thet can then be invoked" should be "After which they can then be invoked"

Reporting

no comments on this section

Workflow caching and checkpointing

no comments on this section

Deploying nf-core pipelines

under heading "Sorting available nf-core pipelines":

"Archived pipelines are not returned by default. To include them, use the --show_archived flag."
should read:
"Archived pipelines are not returned by default. To include them, use the --show-archived flag."

Under heading "Running nf-core pipelines" and "Development Releases"
"For pipelines with a stable release this the default branch is..."
should be:
"For pipelines with a stable release the default branch is..."

under heading "Config files" point 2 in the list:

"There two test profile,..." should read "There are two test profiles,..."

Add difference between channel.empty and []

Explain this difference.
Is this in channels episode?
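
A minimal sketch of the distinction, assuming the question concerns process inputs: an empty list is an ordinary value, so a process that receives it still runs once, whereas a channel created with channel.empty() emits nothing, so a process reading from it is never executed. The process name COUNT_ITEMS is hypothetical and used only for illustration.

```groovy
nextflow.enable.dsl = 2

// Hypothetical process that reports how many items it was given
process COUNT_ITEMS {

    input:
    val items

    output:
    stdout

    script:
    """
    echo "received ${items.size()} item(s)"
    """
}

workflow {
    // [] is a plain value: it is treated as a value channel,
    // so COUNT_ITEMS runs exactly once and sees an empty list.
    COUNT_ITEMS( [] ).view()

    // Replacing the call above with
    //     COUNT_ITEMS( channel.empty() ).view()
    // would run no tasks at all, because the channel never emits an item.
}
```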

Add learner profile for undergraduate

Add a learner profile based on an undergraduate honours student, Kevin Chen.

Self-assigning to me as I have students like that in my lab and an idea what to write.

Related to #78

Nextflow review @bobturneruk ep2 sleep parameters

The first exercise in ep2 has a lot of content shared directly with the preceding bit of live coding - we've already talked participants through how to add a sleep parameter, then this is repeated as an exercise.

An alternative exercise might be to ask for sleep_before and sleep_after parameters?

Incorrect module name?

Is it possible that the second example in the include module section has a small error?

E.g. it currently reads:

include { index; index as index2 } from './modules/rnaseq.nf'

workflow {
  transcriptome_ch = channel.fromPath('/data/yeast/transcriptome/*.fa'
  index(transcriptome)
  salmon_index(transcriptome)
}

using salmon_index as in the previous example. Maybe that should be index2 instead? For example

include { index; index as index2 } from './modules/rnaseq.nf'

workflow {
  transcriptome_ch = channel.fromPath('/data/yeast/transcriptome/*.fa')
  index(transcriptome_ch)
  index2(transcriptome_ch)
}

Suggestion: Limit RNAseq tools to a single episode

One suggestion that may help with uptake is to limit the RNAseq workflow tools and scripts to a single episode.

@sateeshperi would like to use a variant calling workflow instead for his episodes. This could be included here then as a supplementary episode which could be a drop-in replacement for the RNAseq workflow scripting episode.

It would also allow us to incorporate other domain specific workflow episodes such as for image analysis, proteomics, etc so teachers can deliver this course more easily to their domain.

This may also then help with environment setup if the scripts are then changed to only use bash commands, such that only nextflow, java, and nf-core are necessary requirements. Each domain specific episode could have its own conda/package manager environment setup too.

Learner profiles?

Hello,
I was trying to look at your learner profiles to find out who the lesson is aimed at.

It would be very helpful to add some: https://cdh.carpentries.org/deciding-what-to-teach.html

The material and figures look very clear!
Edward

Reordering description of the 'First Script' description

The reordering and reconfiguring allow the learner to follow the script alongside the description line by line, rather than searching through the description for the line that matches each section of code. I also removed some lines of code that were repeated.

Here is the revised text:

An optional interpreter directive (“Shebang”) line, specifying the location of the Nextflow interpreter.
nextflow.enable.dsl=2 to enable DSL2 syntax.
A multi-line Nextflow comment, written using C style block comments, followed by a single line comment.
A pipeline parameter params.input which is given a default value, of the relative path to the location of a compressed fastq file, as a string.
An unnamed workflow execution block, which is the default workflow to run.
A Nextflow channel used to read in data to the workflow.
A call to the process NUM_LINES.
An operation on the process output, using the channel operator .view().
A Nextflow process block named NUM_LINES, which defines what the process does.
An input definition block that assigns the input to the variable read, and declares that it should be interpreted as a file path.
An output definition block that uses the Linux/Unix standard output stream stdout from the script block.
A script block that contains the bash commands printf '${read}' and gunzip -c ${read} | wc -l.
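
For reference, the description above can be read against a script of roughly the following shape. This is a reconstruction from the list, not a copy of the lesson's file: the default value of params.input is an assumed path, and the comment text is a placeholder.

```groovy
#!/usr/bin/env nextflow

nextflow.enable.dsl=2

/*
 * A multi-line comment, written as a C style block comment
 */
// A single line comment

// Pipeline parameter with a default value (assumed path, adjust to your data)
params.input = "data/yeast/reads/ref1_1.fq.gz"

// Unnamed workflow block: the default workflow to run
workflow {
    // Channel used to read data into the workflow
    input_ch = Channel.fromPath(params.input)

    // Call the process NUM_LINES
    NUM_LINES(input_ch)

    // Operate on the process output with the view operator
    NUM_LINES.out.view()
}

// Process block named NUM_LINES
process NUM_LINES {

    input:
    path read

    output:
    stdout

    script:
    """
    printf '${read} '
    gunzip -c ${read} | wc -l
    """
}
```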

Add work folder structure and resuming with cached results to key points.

For episode 1 add work folder structure to key points.

JOSE Submission Requirements

Create manuscript

Create paper.md
Create paper.bib

files as described in the JOSE submission guide for learning modules

Update setup instructions

Update the setup instructions to include

Required Software and versions in a table
A link to conda environment.yml file
Replace atom editor with visual studio code setup

Engineering workflows to apply processes to a second/nth set of inputs seems to cause confusion.

Nextflow does not allow processes/subworkflows to be reused with the same name. This causes trouble when engineering workflows in which a learner wants to apply the same process to another output channel.
e.g. https://twitter.com/yokofakun/status/1466101898625363968?t=Iw-opIManrMXGjKLLlmhxw&s=03 (slightly different case)

This use case perhaps needs to be more explicit in the possible solutions.

Often we think of a workflow as a series of steps, e.g. read QC, trimming, post-trimming QC, resulting in trying:

workflow {
    FASTQC( read_ch )
    TRIM_READS( read_ch )
    FASTQC( TRIM_READS.out.trimmed_reads )
}

It's often not intuitive that one can swap the position of lines 2 and 3, and then use a channel operator like mix to merge channels for processing.

workflow {
    TRIM_READS( read_ch )
    FASTQC( read_ch.mix( TRIM_READS.out.trimmed_reads) )
}

The alternative is using an alias and separating the process/subworkflow into a module. This option also enhances readability, helps follow a writer's natural mental flow, and allows for additional configuration using process selectors.
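
The alias approach might look like the sketch below. The module paths ./modules/fastqc and ./modules/trim_reads and the parameter params.reads are assumptions made for illustration; the process names and the trimmed_reads output name are taken from the snippets above. It needs those module files to exist before it will run.

```groovy
nextflow.enable.dsl = 2

// Hypothetical module files; each is assumed to define the named process.
include { FASTQC as FASTQC_RAW     } from './modules/fastqc'
include { FASTQC as FASTQC_TRIMMED } from './modules/fastqc'
include { TRIM_READS               } from './modules/trim_reads'

// Assumed parameter pointing at paired-end reads
params.reads = 'data/reads/*_{1,2}.fq.gz'

workflow {
    read_ch = channel.fromFilePairs( params.reads, checkIfExists: true )

    // The same process is used twice under two different names,
    // so the workflow reads in the order the steps are thought of.
    FASTQC_RAW( read_ch )
    TRIM_READS( read_ch )
    FASTQC_TRIMMED( TRIM_READS.out.trimmed_reads )
}
```

Because each alias is a distinct process name, a config file can target them separately with process selectors such as withName: 'FASTQC_TRIMMED'.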

Change Training Datasets for Nextflow lesson

Change the existing RNA-Seq data used as the training dataset to the data in https://github.com/ggrimes/nextflow_rnaseq_training_dataset

Nextflow review @bobturneruk ep3 DSL1 reference

Can the DSL1 reference be removed?

I assume the course is DSL2 (the current DSL) specific and that people doing it are new to Nextflow, so DSL1 might not be relevant to them.

Remove conda step episode 09 Simple_Rna-Seq_pipeline

Remove the conda step from episode 09 simple rnaseq pipeline script02.nf

wc.nf script is missing?

The first lesson refers to a wc.nf script in the scripts folder:

Open the file wc.nf in the script directory with your favourite text editor.

It seems this folder is not included in the material downloaded from Seqera.

ls
data		env.yml		hello.nf	parsing		script3.nf	script6.nf
docs		exercises	mail.config	script1.nf	script4.nf	script7.nf
dsl2		hands-on	nextflow.config	script2.nf	script5.nf	setup

Perhaps it has moved? Or perhaps I misunderstood and learners are actually expected to create a scripts folder and the wc.nf script within it?

Thanks a ton for creating the nextflow training material, super useful!
Thomas

Add Alt text to Getting Started with Nextflow episode images

The first three figures in Getting Started with Nextflow are lacking alternative text descriptions.

Episode 1 introduces too many new concepts.

What a workflow or pipeline is.
There is a mention of programming logic
workflow management system
wfms features:
- run time management
- software management
- portability
- interoperability
- reproducibility
- reentrancy
- checkpointing
- prototyping
DSL and syntax versions
data flow programming model
Groovy and Java
processes, channels, workflows
workflow execution.
batch schedulers, cloud platforms, various execution platforms
writing a script.

Perhaps there are more things too that someone taking this course might be unfamiliar with as well.
I think material here needs to be separated and some of these concepts explained better.

Add Atom editor to setup guide.

Add the setup of the Atom editor in the setup guide so learners can have syntax highlighting for nextflow scripts during the lesson.

Add to channels episode to use `file` to make a value channel.

Often one wants to make a value channel for a file input.
The channel factory fromPath however creates a queue channel. One can either do:

input_ch = channel.fromPath( params.input, checkIfExists: true ).collect()
// OR
input_ch = file ( params.input, checkIfExists: true )

The difference here is that the first is a true channel, while the second is a file object. However,
in DSL2, non-channel objects are implicitly converted into a value channel when used as input to a process.
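
A sketch of why this matters, under the assumption that a process takes one queue channel of reads plus a single reference file. The process name ALIGN_STUB and both parameter defaults are hypothetical. Because the reference is a file object (implicitly a value channel), it is reused for every item in the reads channel; if it were a one-item queue channel instead, the process would run only once.

```groovy
nextflow.enable.dsl = 2

// Assumed locations, for illustration only
params.reads     = 'data/reads/*.fq.gz'
params.reference = 'data/reference.fa'

// Hypothetical process that just reports which files it was given
process ALIGN_STUB {

    input:
    path reads
    path reference

    output:
    stdout

    script:
    """
    echo "${reads} against ${reference}"
    """
}

workflow {
    // Queue channel: one item per matching file
    reads_ch = channel.fromPath( params.reads, checkIfExists: true )

    // File object: implicitly converted to a value channel when passed to a process
    reference = file( params.reference, checkIfExists: true )

    // One task per read file, each receiving the same reference
    ALIGN_STUB( reads_ch, reference ).view()
}
```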

Should we add a Handling and recovering from errors episode?

I was wondering whether we should add an episode specific to handling and recovering from errors.

It would include how to use the .command.sh and .command.run in the working directories to debug.
Perhaps mention some useful directives, e.g. beforeScript or afterScript.

Or should we have something instead on bad coding practice, e.g. downloading files using wget/curl instead of
Nextflow's inbuilt file staging mechanism?

Clarify that directories should be searched recursively?

The Channel glob exercise currently reads:

Use the Channel.fromPath method to create a channel containing all files in the data/yeast/ directory

It doesn't specify that files in the subdirectories should be listed as well. (I only realized that when I looked at the solution.) Perhaps add "recursively" to the task or - with less jargon - "including those in subdirectories" to the task's description?
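
To make the difference concrete, the following sketch contrasts the two glob patterns, assuming the data/yeast/ directory from the exercise exists relative to where the script is launched. A single asterisk matches only within one directory level, while a double asterisk also crosses directory boundaries.

```groovy
nextflow.enable.dsl = 2

workflow {
    // Files directly inside data/yeast/ only
    channel.fromPath( 'data/yeast/*' )
        .view { "top level: $it" }

    // Files in data/yeast/ and in all of its subdirectories
    channel.fromPath( 'data/yeast/**' )
        .view { "recursive: $it" }
}
```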

Mental model confusion example for channels.

I'm not sure where to record this yet, so I decided to store it as an issue here for now.

I was just demonstrating how to use Nextflow and I came across this difficulty one had in building their mental model. Might be useful for review later.

script:

#! /usr/bin/env nextflow

nextflow.enable.dsl = 2

workflow {

    channel.of(1, 2, 3, 4).set { input_ch }
    FOO(input_ch)

}

process FOO {

    input:
    val nums
  
    output:
    path "*.txt"

    script:
    """
    echo "$nums" > ${nums}.txt
    """
}

The confusion was why did we need both input_ch and nums in the script ( solution was explained by their colleague - peer instruction ). Their explanation: input_ch is a variable that has all the inputs, while nums only takes one value. nums is like an i in a bash for loop.

$basedir variable might need to be introduced

When the publishDir directive is introduced the $basedir variable is mentioned. Perhaps it would be useful to introduce it as well?

Explaining the tools-software we are going to use in exercises

Based on the learner feedback we got after the first training it may be a good idea to explain the tools-software we are using in the examples/exercises (e.g. samtools, salmon, multiqc, fastqc etc.) or since it is argued that Nextflow is not only designed for the use of bioinformaticians the examples/exercises should be replaced to use more general tools/software (e.g. bash commands)

I am more in favour of explaining the tools we are using currently in the pipeline. But, also accept that it is a valid argument and should be probably discussed here or via Slack.

Config priority confusion issue.

Config priority is listed as:

Parameters specified on the command line (--something value)
Parameters provided using the -params-file option
Config file specified using the -c my_config option
The config file named nextflow.config in the current directory
The config file named nextflow.config in the workflow project directory
The config file $HOME/.nextflow/config
Values defined within the pipeline script itself (e.g. main.nf)

Then code written such as this:
main.nf:

nextflow.enable.dsl = 2

workflow {
    if ( params.foo ) {
        FOO ( params.message )
    } else {
        BAR ( params.message )
    }
}

nextflow.config:

params {
    message = 'Hello'
    foo = false
}
if ( params.foo ){
    process {
        withName: 'FOO' {
            echo = true
        }
    }
} else {
    process {
        withName: 'BAR' {
            echo = true
        }
    }
}

Then override params.foo with a custom config passed via -c:

params.foo = true

The output of this is that the process FOO runs, but it doesn't echo. A confusing issue here is that custom.config has a higher priority than nextflow.config, so why does FOO execute but not echo?

The logic appears to be that first the command-line parameters are read in. Then comes the -params-file, but not overriding anything that has a value. Then comes the nextflow.config, in which the code is evaluated before being stored. Then comes the custom config, which is again evaluated before, but then overrides anything set in nextflow.config.

This then means that when nextflow.config is read in, params.foo is false, setting the configuration for BAR. After which the custom config is read in, overriding params.foo, so FOO executes in the workflow.
This leads to unexpected behaviour as -c is thought to have higher priority, and while this is true for the workflow execution, it's not true for the config "execution".

wc.nf script duplicates output lines

When I run the wc.nf script with the specified version of nextflow, the output (number of lines + filename) is duplicated:

nextflow run wc.nf
N E X T F L O W  ~  version 20.10.0
Launching `wc.nf` [goofy_hugle] - revision: 46c287748f
executor >  local (1)
[14/39d7d2] process > numLines (1) [100%] 1 of 1 ✔
    3628 ref1_1.fq.gz

    3628 ref1_1.fq.gz

When I rerun it after commenting out the num_out.view() line in the script, the output is only shown once:

nextflow run wc.nf
N E X T F L O W  ~  version 20.10.0
Launching `wc.nf` [kickass_stone] - revision: 0ccac4963b
executor >  local (1)
[51/c915be] process > numLines (1) [100%] 1 of 1 ✔
    3628 ref1_1.fq.gz

Add more explanation to caching behaviour with and without using '-resume'

In the training we had recently, there was a question:

if the hash path under work directory is calculated using the following:

Inputs values

Input files

Command line string

Container ID

Conda environment

Environment modules

Any executed scripts in the bin directory

Then how come, if we run the same pipeline twice without -resume, the calculated hashes are different?

I can speculate that if we run it without -resume then Nextflow probably also takes some other value into account (e.g. the current time), but I do not know for sure and could not easily find the answer. It may be a good idea to add a section on this to avoid confusion.

Nextflow configuration

When configuring workflows, we need to explain that workflow parameters should be overridden in a file such as params.yml and passed to -params-file, instead of using -c with a custom config. There is an important distinction here since params are evaluated immediately as they are read in.

Episode 1: Key features of wfms is very similar to Nextflow core features.

Episode 1 seems to have a lot of redundancy when presenting key features of a wfms, and Nextflow core features.

Can we shorten this to reduce reading material? Also wfms may not support all of these things.

Add to channels about using `.out`

The channels episode does not currently describe how to access process output channels, or when .out should be used.
e.g.

workflow {
    index = INDEX( input_ch )
}

workflow {
    INDEX( input_ch )
    index = INDEX.out
}
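
A self-contained sketch of the two access styles in one workflow. The process body, the input value 'transcriptome' and the emit name index are assumptions added for illustration; only the names INDEX and input_ch come from the snippets above.

```groovy
nextflow.enable.dsl = 2

// Hypothetical stand-in for an indexing process
process INDEX {

    input:
    val name

    output:
    val "${name}.idx", emit: index

    script:
    """
    echo "indexing ${name}"
    """
}

workflow {
    input_ch = channel.of( 'transcriptome' )

    // Style 1: capture the return value of the process call
    index_ch = INDEX( input_ch )
    index_ch.view { "from return value: $it" }

    // Style 2: refer to the output afterwards through .out,
    // here using the name given with emit
    INDEX.out.index.view { "from .out: $it" }
}
```

Both styles refer to the same output channel; .out is the only option when the call was not assigned to a variable.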

Tree command explanation

The lesson sometimes uses the unix tree command to display the folder structure.
Write a callout to explain the tree command in the first lesson.

Nf-tower episode

A supplementary episode using Tower and its CLI would be helpful

Channel broken mental model example.

I saw some people do this in the nf-core hackathon today.

This is valid syntax, but doesn't combine input as wanted, which some have tried to use:

workflow {

     input = [
        [ id: "test", single_end: false  ], 
        [ file('SRA_test_1.fastq.gz'), file('SRA_test_2.fastq.gz') ]
     ]
     FOO(input)
     
      next_input = [
          [ id: "test", single_end: false ],
          FOO.out          // This is invalid. Channels and data can not be combined like this.
      ]
      BAR(next_input)
}
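
One way to get the intended result is to attach the metadata with a channel operator rather than by building a list around a channel. The sketch below uses map for this; the process bodies of FOO and BAR are hypothetical stand-ins, and the two FASTQ files named in the issue must exist for it to run.

```groovy
nextflow.enable.dsl = 2

// Hypothetical stand-in: writes the names of the reads to a report file
process FOO {

    input:
    tuple val(meta), path(reads)

    output:
    path "${meta.id}.txt"

    script:
    """
    ls ${reads} > ${meta.id}.txt
    """
}

// Hypothetical stand-in: prints the sample id next to the report name
process BAR {

    input:
    tuple val(meta), path(report)

    output:
    stdout

    script:
    """
    echo "${meta.id}: ${report}"
    """
}

workflow {
    meta = [ id: 'test', single_end: false ]

    input_ch = channel.of(
        [ meta, [ file('SRA_test_1.fastq.gz'), file('SRA_test_2.fastq.gz') ] ]
    )
    FOO( input_ch )

    // Combine data with each item emitted by the channel using map,
    // instead of placing the channel itself inside a list.
    next_input = FOO.out.map { report -> [ meta, report ] }
    BAR( next_input ).view()
}
```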

Filter exercise tries to divide a string by an integer

Executing the solution for the "Filter a channel exercise" raises an error because the strings X and Y cannot be used for division.

chr_ch = channel
 .of( 1..22, 'X', 'Y' )
 .filter({ it %2 == 0 })
 .view()

returns

2
4
6
8
10
12
14
16
18
20
22
Unknown method invocation `mod` on String type
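
A possible fix, sketched under the assumption that the exercise only wants the even-numbered autosomes: check the type of each item before applying the modulo operator, so that the strings X and Y are filtered out instead of raising an error.

```groovy
nextflow.enable.dsl = 2

workflow {
    chr_ch = channel
        .of( 1..22, 'X', 'Y' )
        // Keep only integers, then test them for evenness;
        // 'X' and 'Y' fail the type check and are dropped.
        .filter { it instanceof Integer && it % 2 == 0 }
        .view()
}
```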

Typo in nfcore_config.png

In this image there is a typo in the config file path:

$HOME/.nextflow.config should be $HOME/.nextflow/config

I'm not sure this image is being used anymore, but in any case might be good to fix it.

Channel operator example: Merge vs Join

merge_vs_join_example.nf:

nextflow.enable.dsl = 2

include { SLEEP as SLEEP_ALPHA; SLEEP as SLEEP_BETA; MERGE_CHANNELS; JOIN_CHANNELS } from './nf_procs'

workflow {
	alpha_ch = Channel.of(['A', 1 ], ['B' ,2 ], ['C', 3 ])
	beta_ch  = Channel.of(['A', 4 ], ['B' ,5 ], ['C', 6 ])

	SLEEP_ALPHA( alpha_ch )
	SLEEP_BETA ( beta_ch )
	MERGE_CHANNELS( SLEEP_ALPHA.out, SLEEP_BETA.out      ).view()
	JOIN_CHANNELS ( SLEEP_ALPHA.out.join(SLEEP_BETA.out) ).view()
}

nf_procs.nf:

process SLEEP {

	input:
	tuple val(char_i), val(num_i)

	output:
	tuple val(char_i), val(num_i)

	script:
	"""
	sleep \$[ ( \$RANDOM % 10 )  + 1 ]s
	"""
}

process MERGE_CHANNELS {

	input:
	tuple val(char_i), val(num_i)
	tuple val(char_j), val(num_j)

	output:
	stdout()

	script:
	"""
	echo "${char_i},${num_i} + ${char_j},${num_j}"
	"""
}

process JOIN_CHANNELS {

	input:
	tuple val(char_ij), val(num_i), val(num_j)

	output:
	stdout()

	script:
	"""
	echo "${char_ij}, ${num_i}, ${num_j}"
	"""
}

output:

nextflow run merge_vs_join_example.nf 
N E X T F L O W  ~  version 21.04.0
Launching `merge_vs_join_example.nf` [jovial_pare] - revision: b7291d9907
executor >  local (12)
[7a/2b9736] process > SLEEP_ALPHA (2) [100%] 3 of 3 ✔
[e9/419ff2] process > SLEEP_BETA (3)  [100%] 3 of 3 ✔
[56/85577d] process > MERGE_CHANNELS (3)    [100%] 3 of 3 ✔
[a8/41a969] process > JOIN_CHANNELS (3)     [100%] 3 of 3 ✔
C,3 + B,5
A, 1, 4
A,1 + A,4
C, 3, 6
B, 2, 5
B,2 + C,6

Using this example, although clear, would entail introducing DSL2 modules before channel operators.
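
If modules have to be avoided at that point in the lesson, the key-matching behaviour of join can also be sketched with operators alone. This is a reduced illustration rather than a replacement for the example above: the second channel is deliberately declared in a different order to show that join pairs items by their first element, not by arrival order.

```groovy
nextflow.enable.dsl = 2

workflow {
    alpha_ch = channel.of( ['A', 1], ['B', 2], ['C', 3] )
    beta_ch  = channel.of( ['C', 6], ['A', 4], ['B', 5] )

    // Each emitted tuple has the form [ key, value_from_alpha, value_from_beta ]
    alpha_ch
        .join( beta_ch )
        .view()
}
```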

Add instructions for using Gitpod to learn Nextflow

The course can use the nf-core Gitpod container, which will spin up a cloud-based dev environment that has Nextflow and nf-core preinstalled along with some other tools, including docker, mamba, and conda.
Learners have a browser based file editor and terminal in one to learn Nextflow.
Add notes to make a subdirectory in /workspace, otherwise it's not saved if the session goes idle.
There are 16 cores to use, and some small amount of memory and diskspace.
It's able to run the test profile of some nf-core workflows ( using -profile test,docker).

Interesting style presentation.

https://twitter.com/bioinforad/status/1465208701883584513?t=mY7LqQwp_x0unXif1w1Htw&s=19

Thoughts on Core vs Supplementary material

What are your thoughts on the core curriculum and supplementary material?

Should nf-core be part of the core materials or a supplementary material?
There are a few episodes we could write on nf-core workflows.

How to run an nf-core workflow (perhaps this should be core material?)
How to use nf-core modules in your own workflow ( should this be supplementary, or is this beyond the scope of this lesson and we point to nf-core docs?)
Including nf-core workflows in your workflow. (same thoughts as above)

Multiple outputs in a Process.

Maybe include an example of multiple defined outputs in the Process episode

For example:

Below we create two files and then, in the workflow scope, mix the output channels and group them together by tuple id:

nextflow.enable.dsl = 2

process P1 {

    output:
    tuple val('tupleid'), path('p1.txt')
    tuple val('tupleid'), path('p2.txt')

    script:
    """
    touch p1.txt
    touch p2.txt
    """
}

workflow {

    P1()
    P1.out[0].mix(P1.out[1]).groupTuple().view()

}
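
The same idea can be written with named outputs, which may be easier for learners to read than positional indices. This is a variant sketch of the example above; the emit names first and second are arbitrary choices.

```groovy
nextflow.enable.dsl = 2

process P1 {

    output:
    tuple val('tupleid'), path('p1.txt'), emit: first
    tuple val('tupleid'), path('p2.txt'), emit: second

    script:
    """
    touch p1.txt
    touch p2.txt
    """
}

workflow {
    P1()

    // Refer to each output by its emit name instead of P1.out[0] and P1.out[1]
    P1.out.first
        .mix( P1.out.second )
        .groupTuple()
        .view()
}
```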

Thoughts on the duration of training

We have completed the in-person training in Tartu on 22-23 November 2021.

This issue contains my thoughts on the timing of the whole course and for each individual episode:

First, per individual episode:
DAY 1

01-getting-started-with-nextflow -> 23 mins
02-nextflow_scripting -> 1 hour 13 mins
03-workflow_parameters -> 30 minutes
04-channels -> 57 minutes
05-processes-part1 -> 1 hour 55 mins
05-processes-part2 -> 1 hour 45 mins
====== (End of DAY 1) =======
OVERALL DAY 1: ~400 minutes = 6 hours 40 minutes

DAY 2

I switched the order of operators and workflow episodes because it makes more sense (e.g. we use collect operator in workflow episode which we learn in operators)

07-operators -> 1 hour 30 minutes
06-workflow -> 45 minutes
08-configuration -> 1 hour 30 minutes
09-Simple_Rna-Seq_pipeline -> 2 hour 10 minutes
10-Modules -> SKIPPED
11-subworkflows -> SKIPPED
12-reporting -> SKIPPED
13-workflow_checkpoint_caching -> 30 minutes
14-nfcore -> 40 minutes
====== (End of DAY 2) =======
OVERALL DAY 2: ~420 minutes = 7 hours

NOTES:

I waited for everyone to complete the exercises and did not hurry them.
I skipped almost no exercises (except the ones in the nf-core episode, which I demonstrated myself)
Attendees were happy with the pace of the training (not too fast, not too slow/boring)
These timings are based on the material I have slightly changed from the original https://github.com/kerimoff/workflows-nextflow
I am completely convinced that this training should be a 3-day training

This is how I would prefer to do it next time:
DAY 1

01-getting-started-with-nextflow -> 23 mins
02-nextflow_scripting -> 1 hour 13 mins
03-workflow_parameters -> 30 minutes
04-channels -> 57 minutes
07-operators -> 1 hour 30 minutes (adding some more operators and exercises + 20 minutes)
Give some introduction to Processes (maybe 40 minutes)
====== (End of DAY 1) =======
OVERALL DAY 1: ~270 + 20 + 40 minutes = 330 minutes = 5 hours 30 minutes

DAY 2

05-processes-part1 -> 1 hour 25 mins (since we did some introduction in DAY1)
05-processes-part2 -> 1 hour 45 mins
06-workflow -> 45 minutes
08-configuration -> 1 hour 30 minutes
13-workflow_checkpoint_caching -> 30 minutes
====== (End of DAY 2) =======
OVERALL DAY 2: ~360 minutes = 6 hours

DAY 3

09-Simple_Rna-Seq_pipeline -> 2 hour 10 minutes
10-Modules -> SKIPPED
11-subworkflows -> SKIPPED
NEW EPISODE - Debugging pipelines
12-reporting -> SKIPPED
14-nfcore -> 1 hour 20 minutes (instead of 40)
====== (End of DAY 3) =======
OVERALL DAY 3: ~210 minutes + UNKNOWN for 3 skipped episodes + overall questions and discussion= 6 hours
