
training's People

Contributors

a16n, abhi18av, atrigila, christopher-hakkaart, chriswyatt1, drpatelh, dxu104, evanfloden, ewels, fiuzatayna, flowuenne, iaradsouza1, jorgeaguileraseqera, justicengom, jvfe, laribritto, llewellyn-sl, louislenezet, mariguilardi, mribeirodantas, munishchouhan, mythicalfish, pablo-aledo, pditommaso, robsyme, sateeshperi, swampie, vdauwera, xophmeister, yagdias


training's Issues

Basic review of section 1 to 3

  1. Add some narrative, a couple of lines, then a table of contents

  2. Replace Nextflow overview image with an image from Seqera Labs slides.

  3. Make compute environment images smaller, see Seqera Slides

  4. Describe each line of Your First Script (as in example1 on the Nextflow website).

  5. Image: Switch to the top-to-bottom DAG layout. Remove the Seqera logo stripes and replace them with the actual code for each task. It is important to show that splitLetters is 1 process with 1 task and convertToUpper is 1 process with 2 tasks. Try in Figma.

  6. For RNAseq, provide more background, i.e. that there is a series of scripts, script1.nf to script7.nf, each building on the previous one.

  7. Explain parameters: Parameters are inputs and options that can be changed when the pipeline is run.

  8. TYPO "The second example" should say "The second script"

  9. TYPO In index process "process that creates a binary"

  10. Change params.transcriptome to params.transcriptome_file, so that there is no confusion inside the process.

  11. TYPO "defines a *$*transcriptome variable"

  12. TYPO "-with-docker to launch each task of the execution as a Docker container run command"

  13. In all scripts and docs, replace pair_id with sample_id

Make a nf training docker image

Not essential, but it would be best to have all the programs installed in a single image.

At the moment, all the code runs in the init and script sections of .gitpod.yml.

I assume it will be faster and simpler to have this prebuilt in a container.

Create New gitpod repo on Seqera

Currently we only have a working training account on chriswyatt1 (without private Seqera material).

We need to get this fully onto seqeralabs and off chriswyatt1. We had some issues with permissions before.

Channel, processes and operators sections

The channel, processes and operators sections are pretty much copy and paste from the docs.

These sections could be way more interactive, building on the previous sections and be less dry than at present.

e.g. all channel types could be tested in the processes section, to see how they are used.

Personally, I found all of this tricky to grasp when I did the training. Having real-world examples as exercises could help bring this section to life.

Adoc missing sections

I am finding weird behaviour in the docs:

  1. If a sub-doc does not have at least three heading levels (=, == and ===), it breaks and the lines in the next doc are not printed correctly.

  2. In channel.adoc, once the text is longer than 326 lines, it breaks: just adding "The code below creates a channel containing 24 samples from a chromatin dynamics study (from SRA) and runs FASTQC on the resulting files." at line 326 breaks the next section (processes).

Setup documentation Gitpod

We need documentation on how we set this up (hosting, etc.).

We also need instructions in the docs on how users access the Gitpod instance of the training.

aws setup file needs updating with documentation

We need a new setup all.sh with the correct Nextflow version.

## NF version
export NXF_VER=20.04.1-edge
echo "export NXF_VER=20.04.1-edge" >> ~/.bashrc
mkdir -p ~/bin

This is what we used, which meant that DSL2 and running from Git didn't work.
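A minimal sketch of pinning the Nextflow version without piling up duplicate lines in ~/.bashrc when the setup script is re-run. NXF_VER is the environment variable the nextflow launcher reads to choose which version to run. The function name pin_nxf_version is hypothetical, and the version is left as an argument because this issue does not say which release is the correct one.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pin the Nextflow version for this shell and future logins.
# Usage: pin_nxf_version <version> [rc-file]   (rc-file defaults to ~/.bashrc)
pin_nxf_version() {
  local version="$1"
  local rcfile="${2:-$HOME/.bashrc}"

  # Applies to the current shell session
  export NXF_VER="$version"

  # Remove any earlier pin so re-running the setup does not add duplicates
  if [ -f "$rcfile" ]; then
    grep -v '^export NXF_VER=' "$rcfile" > "$rcfile.tmp" || true
    mv "$rcfile.tmp" "$rcfile"
  fi

  # Persist the pin for future shells
  echo "export NXF_VER=${version}" >> "$rcfile"
}
```

After calling pin_nxf_version with the chosen release, nextflow -version can be used to confirm which version the launcher picks up.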

We also need documentation on how to:

  • Update the web training contents page
  • Create users for AWS
  • Set up the environment

RNAseq pipeline refinement

RNAseq pipeline:

Use an explicit val(pair_id) rather than a bare pair_id (which defaults to val)

$baseDir/results should be results

Use sampleId and not pairId for clarity

baseDir is deprecated, should be projectDir

AWS setup

We also need documentation on how to:

  • Update the web training contents page
  • Create users for AWS
  • Set up the environment

dsl2 issues

Things to do:

  1. Are we happy with the generic foo bar examples, or should they be replaced with something "real"
  2. At line 143, can we give an answer that prints the contents of bar.txt to the screen, not just its path?
  3. The workflow section should include exercises.
  4. The workflow section should have working code (not just foo/bar examples).
  5. We need a "best approach" for converting DSL1 to DSL2.

Feedback from seqera team

Creating this issue here to keep track of the feedback

Abhinav:

  1. Include pair programming to allow people to use their knowledge practically

  2. Work on an advanced course

  • Show basic errors to people, how to debug
  • How to develop a workflow and make it cloud-ready (and Tower-ready)
  • Transition a workflow from DSL1 to DSL2

Inclusion of more command line options

Besides -c, -C, -ansi-log, -with-tower, -with-docker, -resume, etc.

We should point people to the docs, but also mention other key CLI options for run in the examples, e.g.:
-bg
-with-report
-with-trace
...
We could probably add many more.

Resumption issue for section-2.2

@evanfloden,

I think it didn't resume because of the use of stdout:

process convertToUpper {

    input:
    file y from letters.flatten()

    output:
    stdout into result

    """
    cat $y | tr '[a-z]' '[A-Z]'
    """
}

I have run into this a couple of times before as well (with DSL2).
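One way to see what each task in this process runs is to reproduce the shell steps outside Nextflow. A minimal sketch, assuming that splitLetters in hello.nf splits the greeting into 6-byte chunk files with split -b 6 (an assumption; that process is not shown in this issue). The function name split_and_upper is hypothetical. A 12-character greeting gives two chunk files, hence one splitLetters task and two convertToUpper tasks.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring splitLetters + convertToUpper in plain shell.
split_and_upper() {
  local greeting="$1"
  local workdir
  workdir="$(mktemp -d)"

  # splitLetters: a single task writing 6-byte chunks chunk_aa, chunk_ab, ...
  printf '%s' "$greeting" | split -b 6 - "$workdir/chunk_"

  # convertToUpper: one task per chunk file; each task prints to stdout
  local y
  for y in "$workdir"/chunk_*; do
    cat "$y" | tr '[a-z]' '[A-Z]'
    echo
  done

  rm -rf "$workdir"
}

split_and_upper 'Hello world!'
```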

-ansi-log

Where is this documented? Is it the way to get individual process information to the screen, or is it a typo?

EBI training issues

Setup: Improve the instructions for Cloud9 or switch to Gitpod. Make it clear that the "local" setup is not used in the Cloud9 tutorial.

Talk
"embarassingly parrelization" sounds wrong. Change it on the slide to "embarrassingly parallelizable".
What is idempotent? Maybe too technical.
Mention that nf-core pipelines are active and community-owned.

Hello world
Mention what we are trying to achieve in this section of the tutorial (a toy example using text, whereas normally we use file inputs).
Where does task.cpus come from? We don't explain it (people always get confused and ask questions at this point). Either remove it or explain it.

RNA-Seq
May need to rename the index to salmon_index for that process[done]
Make sure directives doc has all the links to cpus memory etc.[done, added link to docs in directives section]

Day 3:
Value and queue channels should be under different headers.
Weird function within the compression example. Remove it, and make sure it works.[done]
We should mention that most of the time you point to a script in a separate folder.
"Item" and "element" are used interchangeably, and people get confused.

Day 4

Listen to the recording of the best DSL2 section run-through, and rewrite to fit.
Azure is not in the configuration docs.

Random points
There are always questions about quote marks. How do we address this? Write a help section on it.

Training improvements 22_03_2022

Gitpod:

Installing Nextflow in the init section of Gitpod doesn't work. We should just package everything into a container.

Ask users to log in to Gitpod the day before. This may avoid GitHub issues when they re-enter a Gitpod environment. This needs testing.

Save the chat after each Zoom session; it is useful for picking up questions that help build the training (and docs).

Gitpod loads old deleted untracked files into the IDE

For some reason, we can see all the old deleted files in the IDE. They are listed as untracked, but we need to get rid of them.


The entries with a green U are untracked, but they should not be in this master branch at all.

Processes section working examples

Many of the examples still do not work, which is fine for hypothetical cases, but it would be better if they actually processed data too.
E.g. the BLAST run in Section 6.4.

Presentation Day 1 issues raised

  1. Explain what the hello.nf script does first: what we want to produce from this script.
  2. Should we have a public version of the training? Not the full version, but a mini public Git one.
  3. Remove baseDir from the outdir.
  4. Mention that 'path' is different from 'val' and 'file' as soon as we have these in "hello.nf".
  5. It would be good to have examples exploring the work directory to find the .command.sh.
  6. Explain that Nextflow channel factory inputs are specific Nextflow calls.
  7. Explain the RNA-seq pipeline properly at the start: we have an RNA-seq dataset for multiple tissues etc. and a genome file we want to map to. I guess everybody at these courses has done RNA-seq before.
  8. Check path index or 'index'. People are getting confused.
  9. Really good recap, for multiqc. Really good to slow the pace and get everyone on board.
  10. Using the return and printing out exactly what is in each channel, is super helpful.
  11. Running nextflow on docker. Find a dsl1 or 2 version.
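For item 5, a small helper along these lines could back an exercise. Nextflow runs each task in its own work/<xx>/<hash>/ directory, which holds the task script .command.sh along with files such as .command.run, .command.out and .command.err. This sketch assumes the default work directory name, work; the function name list_task_scripts is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the script that each task ran.
# Usage: list_task_scripts [work-dir]   (defaults to ./work)
list_task_scripts() {
  local work="${1:-work}"
  # Task directories sit two levels below the work dir: work/<xx>/<hash>/
  find "$work" -mindepth 3 -maxdepth 3 -name .command.sh | sort |
    while IFS= read -r script; do
      echo "== ${script%/.command.sh}"   # the task's work directory
      cat "$script"
    done
}
```

Run it from the launch directory after a pipeline run, e.g. list_task_scripts work, then open the matching .command.err or .command.out in any directory that looks suspicious.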

Local and Gitpod variant setups

There should be working local and gitpod setups.

Also, the AWS Cloud9 setup is different from what we ask in the tutorial (Evan to update).

Upload to aws

How do we upload the tarball as a file to an AWS S3 bucket?
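A minimal sketch, assuming the AWS CLI is installed and configured with credentials that can write to the target bucket; aws s3 cp copies a local file to an s3:// URI. The function name upload_tarball, the DRY_RUN switch and the bucket URI are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: package a directory as a .tar.gz and copy it to S3.
# Usage: upload_tarball <dir> <s3-uri>   e.g. s3://<bucket>/<prefix>/
# Set DRY_RUN=1 to print the upload command instead of running it.
upload_tarball() {
  local src_dir="$1"
  local s3_uri="$2"
  local tarball
  tarball="$(basename "$src_dir").tar.gz"

  # -C keeps paths inside the archive relative to the directory's parent
  tar -czf "$tarball" -C "$(dirname "$src_dir")" "$(basename "$src_dir")" || return 1

  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "aws s3 cp $tarball $s3_uri"
  else
    aws s3 cp "$tarball" "$s3_uri"
  fi
}
```

When the destination URI ends in a slash, the object keeps the tarball's file name under that prefix.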

Answer for docker user rights

What can we put as the answer to this exercise:

=== Exercise

Use the option -u $(id -u):$(id -g) to allow Docker to create files with the right permission.
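A possible answer, sketched two ways. With plain Docker, the flag goes on docker run; for Nextflow, the same flag can go in the docker.runOptions setting of nextflow.config. The config line is written with a quoted here-document so that $(id -u) stays literal in the file; as we understand it, it is then evaluated by the task's bash wrapper when the container is launched. The image and command in the comment are placeholders.

```shell
#!/usr/bin/env bash
# The current user's UID:GID, e.g. 1000:1000
user_flag="$(id -u):$(id -g)"
echo "docker run option: -u ${user_flag}"

# Plain Docker (placeholder image and command); files written to the mounted
# directory are then owned by you rather than root:
#   docker run --rm -u "$(id -u):$(id -g)" -v "$PWD":"$PWD" -w "$PWD" <image> <command>

# Nextflow equivalent: append the option to the project's nextflow.config.
# The quoted 'EOF' keeps $(id -u) and $(id -g) unexpanded in the file.
cat >> nextflow.config <<'EOF'
docker.runOptions = '-u $(id -u):$(id -g)'
EOF
```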

Add a section for Tower

Include:

  • how to log in (concept of auth),
  • how to add a pipeline (nextflow_schema.json)
  • concepts of credentials
  • concepts of compute environment
  • concept of orgs and workspaces

Answer to exercise (custom functions)

Exercise: Write a custom function that given the compressor name as a parameter, returns the command string to be executed. Then use this function as the process script body.

See here: https://github.com/seqeralabs/nf-training-public/blob/master/asciidocs/processes.adoc#L196

Not sure how to do this.

FYI: The exercise changed from being tcoffee to just compressing files, so that it will work without need to download software.

Once we have an answer here, we can put it in the exercise's drop-down answer.
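The exercise itself asks for a Groovy function in the Nextflow script, whose returned string becomes the process script body. As a language-neutral sketch of the branching that function needs, here is the same logic in shell: given a compressor name, return the command string, and fail on unknown names. The function name compress_command and the supported names gzip and bzip2 are assumptions for illustration, not the training's answer.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a compressor name to the command that compresses a file.
# Usage: compress_command <gzip|bzip2> <file>
compress_command() {
  local compressor="$1" file="$2"
  case "$compressor" in
    gzip)  echo "gzip -c ${file} > ${file}.gz" ;;
    bzip2) echo "bzip2 -c ${file} > ${file}.bz2" ;;
    *)     echo "Unknown compressor: ${compressor}" >&2
           return 1 ;;
  esac
}
```

For example, cmd="$(compress_command gzip transcriptome.fa)" && eval "$cmd" builds and then runs the gzip command; the Nextflow version would return the string from the function and use it directly as the script block.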

Presentation Day 2 issues raised

  1. Make a new directory for the container to stop all the files in folder being added to the container.
  2. Explain the docker run options properly in the documentation.
  3. Explain docker launch within the script showing how docker is working within a nextflow .command.run file.
  4. Need version for SRA to be correct, do not need edge now.
  5. Make sure all the scripts are viewed, so that they actually do something.
  6. Create many more exercises. Not to use them all in the training, but for people learning in their own time.
  7. Add -process.echo to scripts as needed, so we can see output.
  8. Explain when to use path: instead of file in the training doc.
  9. Section 6.2.4 should have drop-down answers for all the tests.
  10. Section 6.3.3: the exercise should continue from the previous one, rather than introduce a new script.
  11. Do we need directives and publishDir in the processes section, when they are already covered in the RNA-Seq example?
  12. Not sure .map is explained very well and the syntax, at start of operator section.
  13. Show full syntax and shortened, else it gets confusing (maybe).
  14. Show a full map function operation in both training and main nextflow docs. Real examples.
  15. Make sure nextflow flag options go before run. e.g. -bg -C myriad.config
