nextflow-io / training
Nextflow training material
Home Page: https://training.nextflow.io/
License: Other
Add some narrative, a couple of lines, then a table of contents
Replace Nextflow overview image with an image from Seqera Labs slides.
Make compute environment images smaller, see Seqera Slides
Describe each line of Your First Script (as in example1 on the Nextflow website).
Image: Switch to the top-to-bottom DAG layout. Remove the Seqera logo stripes and replace them with the actual code for each task. Important to show that splitLetters is 1 process with 1 task and convertToUpper is 1 process with 2 tasks. Try in Figma.
For RNAseq, provide more background, i.e. there is a series of scripts, script1.nf to script7.nf, each building on the previous.
Explain parameters: Parameters are inputs and options that can be changed when the pipeline is run.
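To support that explanation, a minimal illustration could sit right next to the sentence (the path and script name below are hypothetical, not the actual training code):

```nextflow
// Default value, used when no --reads option is given on the command line
params.reads = "$projectDir/data/*_{1,2}.fq"

// Overridable at launch time:
//   nextflow run script1.nf --reads 'other/path/*_{1,2}.fq'
println "reads: ${params.reads}"
```

Showing the default assignment and the `--reads` override side by side makes the "inputs and options that can be changed" point concrete.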
TYPO "The second example" should say, "The second script"
TYPO In index process "process that creates a binary"
Change params.transcriptome to params.transcriptome_file, this way in the process there is no confusion.
TYPO "defines a $transcriptome variable"
TYPO "-with-docker to launch each task of the execution as a Docker container run command"
In all scripts and docs, replace pair_id with sample_id.
Not essential, but it would be best if we have all the programs loaded as a single image.
At the moment, we have all the code being run in the .gitpod.yml init and script sections.
I assume it will be faster and simpler to have this prebuilt in a container
We need a section early on in the tutorial to explain this.
Plus, some basic screenshots to show where to find the newer versions and updates.
Currently we only have working training account on chriswyatt1 (without private Seqera material).
We need to get this fully on seqeralabs and off chriswyatt1. We had some issues with permissions before.
The channel, processes and operators sections are pretty much copy and paste from the docs.
These sections could be way more interactive, building on the previous sections and be less dry than at present.
e.g. all channel types tested in the processes section, to see how they have been used.
Personally, I think I found this all tricky to grasp when I did the training. Having real world examples as exercises could help bring this section to life.
We are finding weird behaviour in the docs:
If a sub-doc does not have at least three heading levels in the page (=, ==, and ===), it will break and not print the lines correctly in the next doc.
In channel.adoc, if the text runs longer than 326 lines, just adding "The code below creates a channel containing 24 samples from a chromatin dynamics study (from SRA) and runs FASTQC on the resulting files." at line 326 breaks the next section (processes).
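For reference, the three heading levels mentioned above look like this in AsciiDoc (titles are placeholders):

```asciidoc
= Page title
== Section
=== Subsection
```

A minimal sub-doc with only one or two of these levels should reproduce the rendering bug.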
We need documentation for how we set this up. Hosting etc.
Also, need instructions in the docs for how users access the gitpod run of the training
See exercise in section 4.3.1
We need a new setup all.sh with correct nextflow version.
```
## NF version
export NXF_VER=20.04.1-edge
echo "export NXF_VER=20.04.1-edge" >> ~/.bashrc
mkdir -p ~/bin
```
The above was what we used, which meant dsl2 and running from git didn't work.
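A sketch of a corrected setup script, assuming we pin a later release that supports DSL2 and running from git (the exact version number here is a placeholder to be confirmed):

```shell
## NF version: pin a DSL2-capable release (placeholder version, to be confirmed)
export NXF_VER=22.10.6
echo "export NXF_VER=${NXF_VER}" >> "$HOME/.bashrc"
mkdir -p "$HOME/bin"
```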
We also need documentation on how to:
Update the web training contents page
Create users for AWS.
How to set up the environment.
RNAseq pipeline:
- Add val(pair_id) to the input, not pair_id isolated on its own (the default).
- $baseDir/results should be results.
- Use sampleId and not pairId, for clarity.
- baseDir is deprecated; it should be projectDir.
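To make the points above concrete, a sketch of what the quantification process could look like after the renames (the process body and salmon arguments are illustrative, not the actual training code):

```nextflow
process quantification {
    // relative path; no need for "$baseDir/results"
    publishDir 'results'

    input:
    tuple val(sampleId), path(reads)

    output:
    path sampleId

    script:
    """
    salmon quant --libType=U -i index -1 ${reads[0]} -2 ${reads[1]} -o $sampleId
    """
}
```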
Things to do:
If we switch to gitpod, singularity may not be possible to demonstrate.
I tried to find other repos with singularity. https://community.gitpod.io/t/singularity/5486/5
Looks like we would have to use a virtual machine to get around mount permissions.
See Snakemake example
Creating this issue here to keep track of the feedback
Abhinav:
Include pair programming to allow people to use their knowledge practically
Work on an advanced course
Apart from -c, -C, -ansi-log, -with-tower, -with-docker, -resume, etc., we should point people to the docs. But we should also mention other key CLI options for run in the examples, e.g.:
-bg
-with-report
-with-trace
...
We could probably add many more.
I think it didn't resume because of the usage of stdout
```nextflow
process convertToUpper {
    input:
    file y from letters.flatten()

    output:
    stdout into result

    """
    cat $y | tr '[a-z]' '[A-Z]'
    """
}
```
I have run into it a couple of times earlier as well (with DSL2).
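If stdout output is indeed what breaks -resume, one possible workaround (an assumption, not a confirmed fix) is to write the result to a file instead:

```nextflow
process convertToUpper {
    input:
    file y from letters.flatten()

    output:
    file 'upper.txt' into result

    """
    cat $y | tr '[a-z]' '[A-Z]' > upper.txt
    """
}
```

File outputs give Nextflow a stable artifact to cache, which should make re-runs resumable.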
The Python version needs brackets added (print() requires parentheses in Python 3).
Where is this documented (getting individual process information printed to the screen), or is it a typo?
Need to go through all docs and check that the code in the asciidoc files is the same as in the repo files.
Resource requirements such as CPU and memory can change between workflow executions and platforms. Nextflow can use $task.cpus as a variable for the number of CPUs.
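A short snippet to accompany that explanation (the directive value and tool invocation are illustrative):

```nextflow
process index {
    cpus 2    // requested CPUs; $task.cpus resolves to this value at run time

    input:
    path transcriptome

    output:
    path 'index'

    script:
    """
    salmon index --threads $task.cpus -t $transcriptome -i index
    """
}
```

Using $task.cpus in the command keeps the script in sync with whatever the cpus directive (or a config override) requests.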
Setup: Improve instructions for Cloud9 or switch to Gitpod. Make it clear that the "local" setup is not used in the Cloud9 tutorial.
Talk
"embarassingly parrelization" sounds wrong. -> Change on the slide to "embarrassingly parallelizable".
What is idempotent? Maybe too technical
Mention that nf-core pipelines are active and community-owned.
Hello world
Mention what we are trying to achieve in this section of the tutorial (a toy example using text, but normally we use file inputs).
Where does task.cpus come from? We don't explain it (people always get confused and ask questions at this point). Either remove it or explain it.
RNA-Seq
May need to rename the index to salmon_index for that process[done]
Make sure directives doc has all the links to cpus memory etc.[done, added link to docs in directives section]
Day 3:
Value and queue channels should be under different headers.
Weird function within the compression example. remove, and make sure it works.[done]
We should mention that most times you point to a script in a separate folder
The terms "item" and "element" are used interchangeably, and people get confused.
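For the value/queue split, a two-line example could head each new section (the contents are placeholders):

```nextflow
ch_value = Channel.value('GRCh38')   // value channel: bound once, can be read any number of times
ch_queue = Channel.of(1, 2, 3)       // queue channel: each item is consumed exactly once
```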
Day 4:
Listen to the recording of the best DSL2 section run-through, and rewrite to fit.
Azure not in configuration docs
Random points
There are always questions about quote marks. How do we address this? Write a help section on it.
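Such a help section could open with a minimal demonstration of the interpolation rules (variable names are arbitrary):

```nextflow
x = 'world'
println "Hello $x"    // double quotes interpolate Groovy variables: Hello world
println 'Hello $x'    // single quotes do not: Hello $x

// Inside a script block, escape with \$ to reach Bash variables instead:
//   echo "Nextflow var: $x, Bash var: \$HOME"
```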
Gitpod:
Having the Nextflow install in the init section of Gitpod doesn't work. We should just package the lot into a container.
Ask users to log in to Gitpod the day before. It may avoid GitHub issues if they re-enter a Gitpod env. Need to test this.
Save the chat after each Zoom; it is useful for picking up questions that help build the training (and docs).
Grep not finding file
Check answers to exercises:
https://github.com/seqeralabs/nf-training-public/blob/master/asciidocs/processes.adoc#L350
https://github.com/seqeralabs/nf-training-public/blob/master/asciidocs/processes.adoc#L476
https://github.com/seqeralabs/nf-training-public/blob/master/asciidocs/processes.adoc#L535
Currently the tower section follows the docs (https://github.com/seqeralabs/nf-training-public/blob/master/asciidocs/tower.adoc), with help showing how to run tower for the first time and basic functions etc.
We need to review what we have already here.
Maybe though, we should think about rewriting this section to follow how it is taught in the tutorial. With examples
Still, many of the examples do not work, which is fine for hypothetical examples, but it would be cooler if they actually processed data too.
E.g. Blast run Section:6.4.
To complete the DSL2 part, we can create:
INCLUDE and aliases
For material see:
PDF attached
https://www.nextflow.io/docs/latest/dsl2.html
https://github.com/seqeralabs/nf-training-public/tree/master/nf-training/dsl2
Need to ensure this works for the tutorial outside of the AWS run.
Line 460- https://github.com/seqeralabs/nf-training-public/blob/Autumn_update/asciidocs/rnaseq_pipeline.adoc
There should be working local and gitpod setups.
Also, the AWS Cloud9 setup is different from what we ask for in the tutorial (Evan to update).
How to upload the tarball as a file to aws s3 bucket?
E.g. exercise in containers.adoc
What can we put for the answer for this exercise:
=== Exercise
Use the option -u $(id -u):$(id -g) to allow Docker to create files with the right permissions.
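A candidate answer sketch for the exercise dropdown (the image name and touched file are arbitrary; it is an assumption that this matches the exercise setup):

```console
$ docker run -u $(id -u):$(id -g) -v "$PWD":"$PWD" -w "$PWD" ubuntu:20.04 touch out.txt
$ ls -l out.txt    # owned by the invoking user rather than root
```

Without -u, files created inside the container are owned by root and cannot be cleaned up by the workshop user.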
Include:
Exercise: Write a custom function that given the compressor name as a parameter, returns the command string to be executed. Then use this function as the process script body.
See here: https://github.com/seqeralabs/nf-training-public/blob/master/asciidocs/processes.adoc#L196
Not sure how to do this.
FYI: The exercise changed from being tcoffee to just compressing files, so that it will work without the need to download software.
Once we have an answer here, we can put it in the exercise dropdown answer.
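A possible answer sketch for that exercise (the helper name, supported compressors, and params.compressor option are all assumptions, not the repo's actual solution):

```nextflow
// Hypothetical helper: given a compressor name, return the command string
def compressCmd(String name, infile) {
    switch (name) {
        case 'gzip':  return "gzip -c $infile > ${infile}.gz"
        case 'bzip2': return "bzip2 -c $infile > ${infile}.bz2"
        default:      throw new IllegalArgumentException("Unknown compressor: $name")
    }
}

process compress {
    input:
    path infile

    output:
    path "${infile}.*"

    script:
    compressCmd(params.compressor, infile)
}
```

Since the script block is just a Groovy expression returning a string, calling the function there is enough to use it as the process script body.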
I noticed that if you close the session, it won't let you reopen the environment. Try opening the following:
https://eu-west-3.console.aws.amazon.com/cloud9/ | IAM User | 195996028523 | user7-22 | Secret123!
Maybe some of the participants have the same issue and don't want to tell us
Use path instead of file in the training doc.
For run, give option examples, e.g. -bg -C myriad.config