iwc's Introduction

IWC - Intergalactic Workflow Commission

The IWC maintains high-quality Galaxy Workflows

Workflows are categorized in the workflows directory, and listed in Dockstore and WorkflowHub.

All workflows are reviewed and tested before publication and with every new Galaxy release. Deposited workflows follow best practices and are versioned using GitHub releases. Workflows also contain important metadata, such as:

  • License
  • Author
  • Institutes

Additionally, the IWC will collect further best practices, tips and tricks, and FAQs, and will assist the community in designing high-quality Galaxy workflows.

Importing Workflows into Galaxy

To import IWC workflows into your Galaxy instance, use the TRS workflow search in the Galaxy interface. Click "Workflows", then "Import", then the "search form" link. Select a TRS server from the drop-down menu and enter organization:iwc (for workflowhub.eu) or organization:iwc-workflows (for Dockstore) in the text box.
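The same organization filter can also be expressed as a query against a TRS server. The sketch below only builds the request URL. The base URLs are assumptions about where each registry exposes its GA4GH TRS v2 API, and the mapping from Galaxy's search box to the `organization` query parameter is assumed as well, so check both against the registries' own documentation before relying on them.

```python
from urllib.parse import urlencode

# Assumed TRS v2 base URLs and the organization names given in the text above.
TRS_SERVERS = {
    "dockstore": ("https://dockstore.org/api/ga4gh/trs/v2", "iwc-workflows"),
    "workflowhub": ("https://workflowhub.eu/ga4gh/trs/v2", "iwc"),
}


def trs_search_url(server: str) -> str:
    """Build a TRS /tools query URL filtered by the IWC organization."""
    base, organization = TRS_SERVERS[server]
    return f"{base}/tools?{urlencode({'organization': organization})}"
```

For example, `trs_search_url("workflowhub")` yields a URL that can be opened in a browser or fetched with any HTTP client.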

Accessing IWC workflows via usegalaxy.* servers

All IWC workflows are automatically installed onto usegalaxy.* servers (i.e. Galaxy Main, Galaxy Europe, Galaxy Australia). They can be accessed via the following lists of published workflows:

Contributing a Workflow

Anyone can contribute a Galaxy Workflow. Please check out the Adding workflow guidelines.

If linting passes, tests pass, and human review passes, the PR is merged and

Becoming an IWC member

Everyone is welcome and can help out with reviewing workflows. Post a comment here with your expertise and we will add you to the IWC organization.

iwc's People

Contributors

annasyme, bebatut, bernt-matthias, bgruening, bwlang, clsiguret, debjyoti197, delphine-l, drosofff, engynasr, gallardoalba, jxtx, kciy, kostrykin, lecorguille, lldelisle, martenson, mblue9, mvdbeek, nagoue, nekrut, nsoranzo, pvanheus, rlibouba, simleo, simonbray, wm75

iwc's Issues

Maintenance of workflows

  • Replace subworkflow using a picker widget
  • Pull out subworkflows to stored workflows

Eventually allow references to workflows

Add version as tag?

I noticed that when people import a workflow it is difficult to find which version of the workflow it was. Do you think it would be a good idea to automatically (or not) add the version as a tag of the workflow?
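One way this could look, as a minimal sketch: it assumes the parsed .ga file carries a top-level `release` field and a `tags` list, and the `version:` tag prefix is a hypothetical naming choice, not an existing convention.

```python
def add_version_tag(workflow: dict) -> dict:
    """Return a copy of a parsed .ga workflow with its release added as a tag."""
    release = workflow.get("release")
    if not release:
        # Nothing to tag if the workflow declares no release.
        return workflow
    tag = f"version:{release}"  # hypothetical tag format
    tags = list(workflow.get("tags", []))
    if tag not in tags:
        tags.append(tag)
    return {**workflow, "tags": tags}
```

Because the function returns a copy and skips tags that already exist, it could be run on every deployment without accumulating duplicates.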

Minutes from our first meeting

Codename: IWC

ACTION: Logo
ACTION: write a mission statement out of this document

  • Brad and Jen

Best practices

  • encourage good things, prevent bad things
  • recommendations
  • organizing workflows
  • cleaning up published workflows
  • what makes a good workflow usable
  • annotation of workflows
  • ACTION: EVERYONE - define a list of metadata
    • linter for workflow metadata, does it fulfill our recommendations
  • generic workflows, workflow templates, modules
  • example datasets for workflows

Discoverability

  • finding and discover workflows
  • central repository
  • EDAM
  • ACTION: everyone please think about it and come up with ideas (e.g. github, myexperiments)
    • start with an example repo -> jmchilton/workflow-testing
    • spin up a central-repo which is only to link to other repos

User

  • collecting use-cases (ACTION: everyone)
  • workflows are not used!!! Figure out why, and fix this, care about the best-practices later
  • user evaluation, usability
  • workflow language is not our thing - political nonsense
  • we should concentrate on the user perspective
  • workshops
  • testing
    • ACTION: documentation testing and running workflows from a command-line
    • Brad, with hints from John

Diverse

  • recommendation of tools

  • tours for workflows

  • different more educational view of workflows

  • histories are workflows; we should consider this and define guidelines for when a history is useful and when a workflow is useful

  • responsible for the infrastructure: come up with ideas and push them to the core team if we cannot implement them on our own

  • versioning, provenance of workflows

  • ACTION item: monthly update could be written. --> Brad and Jen.

Improvement of CI

Would be good if CI could:

  • check the workflow has a release
  • check the workflow release is in the CHANGELOG.md
  • check the .dockstore.yml has testParameterFiles and authors entries.
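The three checks above could be sketched roughly as follows. The function works on already-parsed inputs so that it needs no YAML library. It assumes the workflow JSON has a top-level `release` field, that CHANGELOG.md marks releases as `[x.y.z]`, and that .dockstore.yml has a `workflows` list. The function name is hypothetical.

```python
def check_workflow_metadata(workflow: dict, changelog: str, dockstore: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means all checks pass."""
    problems = []

    release = workflow.get("release")
    if not release:
        problems.append("workflow has no release")
    elif f"[{release}]" not in changelog:
        # Assumes changelog headings of the form "## [0.1.4] ...".
        problems.append(f"release {release} is missing from CHANGELOG.md")

    for entry in dockstore.get("workflows", []):
        name = entry.get("name", "<unnamed>")
        for key in ("testParameterFiles", "authors"):
            if not entry.get(key):
                problems.append(f".dockstore.yml entry {name} lacks {key}")

    return problems
```

A CI step could print the returned problems and fail the job when the list is non-empty.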

Does it make sense to create a Gitpod for workflow PRs?

We've got pretty sophisticated Galaxy test environments enabled on galaxyproject/galaxy using Gitpod. It might be cool to have a configuration that would create Galaxy containers ready to run PRs, with the workflow populated and published and the tools available.

This is a response to a question from @bwlang at the GCC 2021 Americas workflow session.

"Meaningful IWC review involves downloading the .ga file, uploading to a local galaxy, then running locally. Any plans to reduce the barrier to review?"

Run tests against several servers and implement blocklist

Benefits:

  1. ensures that iwc WFs are usable on more public Galaxy instances
  2. helps discover differences in installed tools on usegalaxy.* instances

We could start by testing on usegalaxy.org and usegalaxy.eu.

WFs that are known to run only on a specific server could have an accompanying blocklist in their tests (one option could be in the form of comment lines in the test.yml file).
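A minimal sketch of the comment-line option follows. The `# blocklist: <server>` syntax is purely hypothetical, since the issue does not fix a format. Comments are used because they leave the test file's YAML content unchanged.

```python
import re

# Hypothetical comment syntax: "# blocklist: usegalaxy.org"
BLOCKLIST_RE = re.compile(r"^#\s*blocklist:\s*(\S+)\s*$")


def blocked_servers(test_file_text: str) -> set[str]:
    """Collect the servers named in blocklist comment lines."""
    servers = set()
    for line in test_file_text.splitlines():
        match = BLOCKLIST_RE.match(line.strip())
        if match:
            servers.add(match.group(1))
    return servers


def servers_to_test(test_file_text: str, candidates: list[str]) -> list[str]:
    """Filter the candidate servers, keeping their original order."""
    blocked = blocked_servers(test_file_text)
    return [server for server in candidates if server not in blocked]
```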

Assemble a list of members

Please post to this issue if you would like to participate in the Intergalactic Workflow Commission. Everyone is welcome, we have big plans :)

Improve WF outputs parsing by workflowhub

@simleo Galaxy workflows use labels for workflow outputs. These are unique within a workflow and used to address the outputs in e.g. workflow tests.

            "workflow_outputs": [
                {
                    "label": "filtered_variants",
                    "output_name": "output",
                    "uuid": "d6db8dad-1774-4ec8-a858-26b17a287252"
                }

Currently, however, workflowhub reports Galaxy WF outputs like this:
Screenshot from 2022-02-18 09-37-18

i.e., it uses output_name both as ID and Name, while imo it would be more correct and more informative to use a WF output's label as its ID.
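To illustrate the suggestion, here is a small sketch that collects output IDs from a parsed .ga file, preferring the label and falling back to `output_name` only when no label is set. It assumes the usual .ga layout in which `steps` is a mapping of step index to step dictionary. The function name is hypothetical and this is not WorkflowHub's actual parser.

```python
def workflow_output_ids(workflow: dict) -> list[str]:
    """List workflow output IDs, using each output's label where available."""
    ids = []
    for step in workflow.get("steps", {}).values():
        for output in step.get("workflow_outputs", []):
            # Labels are unique within a workflow; output_name is not.
            ids.append(output.get("label") or output["output_name"])
    return ids
```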

Workflow tutorials to write

  • Workflow Reports
  • workflow invocations
  • a good tutorial on using subworkflows and breaking up workflows to make them modular
  • all the tips & tricks of managing workflows (copying steps, updating tools in a WF, adding license/user metadata, what to beware of, how to resolve issues like converting a workflow to a collection type)
  • importing WFs from TRS/Dockstore/WFH

from galaxyproject/training-material#3134

iwc-workflows membership

After the merge of #74, we got individual workflow test histories on the LifeMonitor dashboard. Unfortunately, however, GitHub has a policy of automatically disabling periodic GitHub Actions workflows after 60 days since the last repository activity (see, for instance, sars-cov-2-pe-illumina-artic-variant-calling).

Can I join the iwc-workflows organization, with sufficient permissions to enable workflows? This would allow me to manually re-enable workflows as needed and would also pave the way for automation in the near future with the GitHub app we're developing (for that I'm going to also need permissions to install / configure GitHub apps, I guess).

CC: @kikkomep @ilveroluca

Periodic test for pe-wgs-ivar-analysis failing

This is for pangolin --update-data --datadir datadir && pangolin --threads ${GALAXY_SLOTS:-1} --datadir 'datadir' --outfile report.csv --max-ambig 0.5 --min-length 10000 '/tmp/tmp0w_tc3zq/files/000/dataset_34.dat' && csvtk csv2tab report.csv > '/tmp/tmp0w_tc3zq/files/000/dataset_35.dat':

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/http/client.py", line 556, in _get_chunk_left
    chunk_left = self._read_next_chunk_size()
  File "/usr/local/lib/python3.8/http/client.py", line 523, in _read_next_chunk_size
    return int(line, 16)
ValueError: invalid literal for int() with base 16: b''

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/http/client.py", line 573, in _readall_chunked
    chunk_left = self._get_chunk_left()
  File "/usr/local/lib/python3.8/http/client.py", line 558, in _get_chunk_left
    raise IncompleteRead(b'')
http.client.IncompleteRead: IncompleteRead(0 bytes read)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/pangolin", line 10, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/site-packages/pangolin/command.py", line 172, in main
    update({'pangolearn': pangoLEARN.__version__,
  File "/usr/local/lib/python3.8/site-packages/pangolin/command.py", line 520, in update
    open(tarball_path, 'wb').write(request.urlopen(latest_release_tarball).read())
  File "/usr/local/lib/python3.8/http/client.py", line 466, in read
    return self._readall_chunked()
  File "/usr/local/lib/python3.8/http/client.py", line 580, in _readall_chunked
    raise IncompleteRead(b''.join(value))
http.client.IncompleteRead: IncompleteRead(80709684 bytes read)

Do we need to update pangolin here @wm75 ?

Store workflow in repo

Would it be possible to have the ga file stored in this repo?
Have the CI expand and reformat the JSON so that each element is on its own line. It could then add a commit to the PR with the expanded JSON. This would allow git to better track changes to the workflow.
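The expansion step itself is small. A sketch using only the standard library is below. The four-space indent is an arbitrary choice, and key order is deliberately left as it is in the input.

```python
import json


def expand_workflow_json(text: str) -> str:
    """Re-serialize workflow JSON with one element per line."""
    return json.dumps(json.loads(text), indent=4) + "\n"
```

Running the function on its own output returns the same text, so a CI job could apply it unconditionally and commit only when the file changes.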

Extended to-do list

These are things we should keep an eye on. We've only selected a small subset for #29:

IWC goals 21.09

We've had a meeting today and we focused on these things we would like to accomplish:

  • Get the IWC started

    • Writing how to contribute (@mvdbeek) #41
    • Best practices: (@mvdbeek) (Based on / linked to planemo documentation) #41
      • Don’t include internal galaxy version
      • Require release tag (or something)
    • Deposit workflows
    • Deploy workflows to usegalaxy.* (and interested community servers) (@mvdbeek)
      • And/or let deployers specify list of workflows to monitor
      • Maybe integration with usegalaxy-tools (???) (Stretchgoal)
    • Community outreach, find contributors (everyone)
  • Enhance Galaxy for IWC

  • Continue improving command line execution with planemo:

  • Static Display of Workflows in Repo (Stretch Goal - @jmchilton )

    • Linking test data, pages for describing how to run.

We've decided on monthly meetings, every 3rd Monday of the Month. A calendar is here

dockstore uses readme.md only if annotation is not filled

Which is a bit of a problem if we want to have both filled and not have markdown in the galaxy workflow file itself.

We could try to render the description as markdown in Galaxy if it looks like markdown.

In the CI we could copy the readme contents into the annotation field ?

Workflow-specific CI runs

For clarity, in the discussion that follows, I will use "workflow" for a scientific workflow and "gh-workflow" for a GitHub Actions workflow.

The fact that the same gh-workflow is run to test all workflows is a problem for monitoring. Since all RO-Crates point to https://api.github.com/repos/galaxyproject/iwc/actions/workflows/workflow_test.yml, the LifeMonitor dashboard ends up showing the same health status and test build history for all workflows, which is not particularly informative. For instance, if the test for parallel-accession-download/main fails, the build will be marked as a failure for all workflows, even if all the others are doing fine.

For this reason, it would be useful to run a separate gh-workflow for each workflow. One solution would be to have all these separate gh-workflows in this repository. However, they would have to be added manually by the author of each new workflow. Also, as the number of workflows in the repo increases, its GitHub Actions history could become pretty cluttered. Adding them to each individual repository in iwc-workflows, however, should work. Here are some considerations on the individual gh-workflows:

  • They can be much simpler than the main one in iwc, since they only have to run the tests (no deployment, etc.).

  • How to author them? They can be generated before repo deployment in the main gh-workflow, much like RO-Crate metadata files are generated now.

  • When should they be run? Not at each commit: that would be useless, since updates to the individual repositories are only made after tests run successfully in the main gh-workflow. Instead, they should be run periodically (e.g., with a cron: '0 3 * * *') to check for failures due to changes in dependencies, external references, etc.
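As a rough sketch of how the main gh-workflow might generate such a file before repo deployment: the function below renders a minimal scheduled gh-workflow as text. The job layout, the step list, and the function name are assumptions for illustration, and a real file would likely need more setup around the planemo call.

```python
def render_periodic_test_workflow(workflow_file: str, cron: str = "0 3 * * *") -> str:
    """Render a minimal scheduled GitHub Actions workflow that tests one workflow."""
    return f"""\
name: Periodic workflow test
on:
  schedule:
    - cron: '{cron}'
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install planemo
      - run: planemo test {workflow_file}
"""
```

The rendered text could be written to `.github/workflows/` in each individual repository alongside the generated RO-Crate metadata.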

Workflow autoupdates

Parallel to galaxyproject/tools-iuc#3533, it would be great to implement automatic workflow updates.

I wrote a planemo PR which should achieve that here: galaxyproject/planemo#1151. Basically it does exactly the same thing as a user who 1) manually uploads a workflow to a Galaxy server with the latest versions of all tools installed, 2) presses the Upgrade All Workflow Steps button, and 3) redownloads the updated workflow.

Unfortunately the resulting diffs are currently a horrible mess. Here is an example: main...simonbray:example-autoupdate

Not sure if anyone has any ideas about how or whether this can be improved. My other idea was to implement a much simpler approach of just iterating over the workflow steps and checking each tool ID against the toolshed(s) for any newer versions. This should give much cleaner diffs and should also be a lot faster (no need to start up a Galaxy server and install all the necessary tools). But running the autoupgrade through a Galaxy server should be more sophisticated in correcting for e.g. added or deleted tool parameters.
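The simpler approach could be sketched as below. The `latest_versions` mapping is a hypothetical stand-in for a Tool Shed lookup, which is not implemented here. The sketch only reports steps whose version differs from the latest known one and makes no attempt to order versions or rewrite tool parameters.

```python
def split_tool_id(tool_id: str) -> tuple[str, str]:
    """Split 'host/repos/owner/repo/tool/version' into (versionless ID, version)."""
    base, _, version = tool_id.rpartition("/")
    return base, version


def outdated_steps(workflow: dict, latest_versions: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Map step index to (current version, latest version) for steps that differ."""
    updates = {}
    for index, step in workflow.get("steps", {}).items():
        tool_id = step.get("tool_id")
        if not tool_id or "/repos/" not in tool_id:
            continue  # skip inputs, subworkflows, and tools not installed from a toolshed
        base, version = split_tool_id(tool_id)
        latest = latest_versions.get(base)
        if latest and latest != version:
            updates[index] = (version, latest)
    return updates
```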

metadata in one file or two?

I see advantages to either path here. Any thoughts from others?

  • new header to workflow data file
  • or a separate data file?

Should it be workflow agnostic or specific to a single engine for now?

criteria:

  • flexible
  • easy for scientists to specify, human readable
  • usable by workflow executors
    • conveys information to workflow runner
    • GUI can use this info for display

advantages of one file:

  • meta can't get detached from workflow
  • keeps things simple

advantages of 2 files:

  • no need to modify workflow specifications to accommodate metadata

New timeslot for IWC meetings

Hey everyone, we're looking for a new regular time slot for the IWC call. Unfortunately, the current slot collides with the VGP meetings and we'd also like to accommodate new members from the tools working group. I've created a poll here: https://www.when2meet.com/?17474061-88e14. If you're interested in joining the call please select the options that would suit you best! I think my preference would be to have the call once every 2 weeks, but let me know if you think that's too much.

meetup notes

  • Put videos and slides on IWC repo
  • Include overview of workflow in README
  • Static site

New workflows:

Some of the new workflows might need more processing power than we can get from a GitHub-hosted runner.
(Though we might try to use the tool test data for the most expensive step?)
Some ideas:

  • Submit to real servers ?
  • A pulsar endpoint ?

Would like to make it easier to find workflows from within Galaxy:

  • More central TRS
    • what to do about missing tools ?
      • Request a tool for missing tools, send to admin ?
      • Give link to install / ephemeris yaml … open issue see if someone jumps at it

For next roadmap.

  • Rewrite workflow extraction, extract with labels, preview, bugs

Publishing to IWC requires coding / git experience. Lower submission threshold:

  • Graphic interface for definition of test file.
  • Atomic renaming, to rename tests. Extract test from invocation.

More automation for usegalaxy.* import

If someone has time to do this it would be so cool...
In the log of the GitHub Action that installs workflows on usegalaxy.*, there are some important lines:

# usegalaxy.org
ERROR:root:Error importing #workflow/github.com/iwc-workflows/fragment-based-docking-scoring/main:v0.1.4 with message Imported, but some steps in this workflow have validation errors. 
ERROR:root:Error importing #workflow/github.com/iwc-workflows/fragment-based-docking-scoring/main:v0.1.2 with message Imported, but some steps in this workflow have validation errors.
ERROR:root:Error importing #workflow/github.com/iwc-workflows/fragment-based-docking-scoring/main:v0.1.1 with message Imported, but some steps in this workflow have validation errors.
ERROR:root:Error importing #workflow/github.com/iwc-workflows/hic-hicup-cooler/hic-fastq-to-cool-hicup-cooler:v0.3 with message Imported, but some steps in this workflow have validation errors.
ERROR:root:Error importing #workflow/github.com/iwc-workflows/hic-hicup-cooler/chic-fastq-to-cool-hicup-cooler:v0.3 with message Imported, but some steps in this workflow have validation errors. 
ERROR:root:Error importing #workflow/github.com/iwc-workflows/hic-hicup-cooler/hic-juicermediumtabix-to-cool-cooler:v0.3 with message Imported, but some steps in this workflow have validation errors.
ERROR:root:Error importing #workflow/github.com/iwc-workflows/pox-virus-amplicon/main:v0.1 with message Imported, but some steps in this workflow have validation errors.
ERROR:root:Error importing #workflow/github.com/iwc-workflows/openms-metaprosip/main:v0.1 with message Imported, but some steps in this workflow have validation errors.
ERROR:root:Error importing #workflow/github.com/iwc-workflows/sra-manifest-to-concatenated-fastqs/main:v0.2 with message Imported, but some steps in this workflow have validation errors.
ERROR:root:Error importing #workflow/github.com/iwc-workflows/sra-manifest-to-concatenated-fastqs/main:v0.1 with message Imported, but some steps in this workflow have validation errors.

## EU
ERROR:root:Error importing #workflow/github.com/iwc-workflows/parallel-accession-download/main:v0.1.5 with message Imported, but some steps in this workflow have validation errors.
ERROR:root:Error importing #workflow/github.com/iwc-workflows/fragment-based-docking-scoring/main:v0.1.4 with message Imported, but some steps in this workflow have validation errors.
ERROR:root:Error importing #workflow/github.com/iwc-workflows/fragment-based-docking-scoring/main:v0.1.2 with message Imported, but some steps in this workflow have validation errors.
ERROR:root:Error importing #workflow/github.com/iwc-workflows/fragment-based-docking-scoring/main:v0.1.1 with message Imported, but some steps in this workflow have validation errors.
ERROR:root:Error importing #workflow/github.com/iwc-workflows/hic-hicup-cooler/hic-fastq-to-cool-hicup-cooler:v0.3 with message Imported, but some steps in this workflow have validation errors.
ERROR:root:Error importing #workflow/github.com/iwc-workflows/hic-hicup-cooler/chic-fastq-to-cool-hicup-cooler:v0.3 with message Imported, but some steps in this workflow have validation errors.
ERROR:root:Error importing #workflow/github.com/iwc-workflows/hic-hicup-cooler/hic-juicermediumtabix-to-cool-cooler:v0.3 with message Imported, but some steps in this workflow have validation errors.
ERROR:root:Error importing #workflow/github.com/iwc-workflows/sra-manifest-to-concatenated-fastqs/main:v0.2 with message Imported, but some steps in this workflow have validation errors.

## AU TODO

Probably because:

  • Some tool versions are not present on the usegalaxy.* servers
  • Some tools are not present on the usegalaxy.* servers
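Capturing the error lines shown in the log above is straightforward with a regular expression. The sketch below groups the failing versions by workflow. The pattern is derived only from the log lines quoted in this issue, so other message formats would need their own handling.

```python
import re
from collections import defaultdict

# Matches lines such as:
# ERROR:root:Error importing #workflow/<trs id>:<version> with message <text>
IMPORT_ERROR_RE = re.compile(
    r"^ERROR:root:Error importing #workflow/(?P<trs_id>\S+):(?P<version>\S+)"
    r" with message (?P<message>.*)$"
)


def failed_imports(log_text: str) -> dict[str, list[str]]:
    """Map each workflow's TRS ID to the versions that failed to import cleanly."""
    failures = defaultdict(list)
    for line in log_text.splitlines():
        match = IMPORT_ERROR_RE.match(line.strip())
        if match:
            failures[match["trs_id"]].append(match["version"])
    return dict(failures)
```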

It would be marvelous if CI could capture these lines and write a PR to:

Template repo

We may make this repo a template repo, or create one. Maybe later.

Weekly lint and tests are failing

Hi,
Here is a picture of the last weekly lint and tests:
image

Errored Tests

❌ ChIPseq_PE.ga_0

Execution Problem:

    [Errno 2] No such file or directory: '/tmp/tmp0m104th4/filtered BAM'

❌ gromacs-dctmd.ga_0

Execution Problem:

    Unexpected HTTP status code: 400: {"err_msg": "Workflow was not invoked; the following required tools are not installed: toolshed.g2.bx.psu.edu/repos/chemteam/gmx_solvate/gmx_solvate/2021.3+galaxy0 (version 2021.3+galaxy0)", "err_code": 0, "traceback": "Traceback (most recent call last):\n  File \"/tmp/tmpcs5y89ad/galaxy-dev/lib/galaxy/web/framework/decorators.py\", line 337, in decorator\n    rval = func(self, trans, *args, **kwargs)\n  File \"/tmp/tmpcs5y89ad/galaxy-dev/lib/galaxy/webapps/galaxy/api/workflows.py\", line 830, in invoke\n    raise exceptions.MessageException(missing_tools_message)\ngalaxy.exceptions.MessageException: Workflow was not invoked; the following required tools are not installed: toolshed.g2.bx.psu.edu/repos/chemteam/gmx_solvate/gmx_solvate/2021.3+galaxy0 (version 2021.3+galaxy0)\n"}

❌ gromacs-mmgbsa.ga_0

Execution Problem:

    Failed to find output [Complex topology] in invocation outputs [{}]

❌ pe-wgs-variation.ga_0

Execution Problem:

    Unexpected HTTP status code: 400: {"err_msg": "Workflow was not invoked; the following required tools are not installed: toolshed.g2.bx.psu.edu/repos/devteam/samtools_stats/samtools_stats/2.0.3 (version 2.0.3)", "err_code": 0, "traceback": "Traceback (most recent call last):\n  File \"/tmp/tmpfbprjxrq/galaxy-dev/lib/galaxy/web/framework/decorators.py\", line 337, in decorator\n    rval = func(self, trans, *args, **kwargs)\n  File \"/tmp/tmpfbprjxrq/galaxy-dev/lib/galaxy/webapps/galaxy/api/workflows.py\", line 830, in invoke\n    raise exceptions.MessageException(missing_tools_message)\ngalaxy.exceptions.MessageException: Workflow was not invoked; the following required tools are not installed: toolshed.g2.bx.psu.edu/repos/devteam/samtools_stats/samtools_stats/2.0.3 (version 2.0.3)\n"}

❌ variation-reporting.ga_0

Execution Problem:

    Failed to find output [af_filter_threshold] in invocation outputs [{'all_variants_all_samples': {'id': 'a131cbc51f1fd9e9', 'src': 'hda', 'workflow_step_id': '4f6802b953fe5453'}, 'by_variant_report': {'id': '5a16b1422edd60d6', 'src': 'hda', 'workflow_step_id': 'b96161ff9e0be2b1'}, 'combined_variant_report': {'id': '030f0c55c98ae64e', 'src': 'hda', 'workflow_step_id': '8f470d64c61aa7d7'}, 'variant_frequency_plot': {'id': '270fb9fed54cf20c', 'src': 'hda', 'workflow_step_id': '6ced77daa85bcfc0'}}]

For more details, just download https://github.com/galaxyproject/iwc/suites/8800928810/artifacts/400205101

I checked ChIPseq_PE, which is failing. I reran the CI on it (individually, with a fake commit: https://github.com/lldelisle/iwc/actions/runs/3281782971) and it passed...

Link README in dockstore?

I find it difficult to find info about workflow inputs/outputs on Dockstore.
How could we link the README, which is much more informative?
Should we include the link in the description?
