Hello,
First of all - thank you, @mr-c, for the response to my previous problem!
I've continued trying to run the emg workflows, specifically emg-pipeline-v4-paired.cwl, on my machine. I have run into several more problems, most of which seem too trivial to file as separate issues. If there is a preferred channel of communication, please let me know.
Some of the problems might reveal my unfamiliarity with CWL, so I apologise, and I'm working on it! Seems like it's really quite an elegant way to deal with patchworks of wildly different tools, commonly known as pipelines.
Here's what I had problems with so far:
ISSUE:
workflows/emg-pipeline-v4-paired-job.yaml specifies Rfam libraries contained in two directories: "other" (e.g. .../CWL/data/libraries/Rfam/other/Archaea_SRP.cm) and "ribosomal" (e.g. .../CWL/data/libraries/Rfam/ribosomal/RF02542.cm).
I downloaded the Rfam database from ftp://ftp.ebi.ac.uk/pub/databases/Rfam/CURRENT/Rfam.tar.gz, and it does not contain a corresponding directory structure.
SOLUTION:
I have not found any so far.
ISSUE:
The MGRAST_base.py script used in tools/qc-stats.cwl is missing from the repository, and I can't find it online.
ISSUE: At step trim_quality_control when running workflows/emg-qc-paired.cwl:
[job trim_quality_control] /tmp/tmp688nyarp$ /bin/sh \
-c \
'java' 'org.usadellab.trimmomatic.Trimmomatic' 'PE' '-trimlog' 'trim.log' '-threads' '8' '-phred33' '/tmp/tmpvkjt1pef/stg827d3176-4f1b-4179-8404-4b46397fff43/merged_with_unmerged_reads' 'merged_with_unmerged_reads.trimmed.fastq' 'LEADING:3' 'TRAILING:3' 'SLIDINGWINDOW:4:15' 'MINLEN:100'
Error: Could not find or load main class org.usadellab.trimmomatic.Trimmomatic
EXPLANATION:
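On my computer (or perhaps with a different version of Trimmomatic), Trimmomatic installs as a bash executable, which then calls java itself. Calling java directly with the class name fails because the class is not on the Java classpath.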
SOLUTION:
In tools/trimmomatic.cwl, change:
baseCommand: [ java, org.usadellab.trimmomatic.Trimmomatic ]
to:
baseCommand: [ trimmomatic ]
ISSUE: Is the Trimmomatic log file saved in a directory where it can be found?
Error collecting output for parameter 'output_log':
ebi-metagenomics-cwl/tools/trimmomatic.cwl:221:3: Traceback (most recent call last):
ebi-metagenomics-cwl/tools/trimmomatic.cwl:221:3:
ebi-metagenomics-cwl/tools/trimmomatic.cwl:221:3: File "/P/cwl/venv/lib/python3.6/site-packages/cwltool/command_line_tool.py", line 707, in collect_output
ebi-metagenomics-cwl/tools/trimmomatic.cwl:221:3: raise WorkflowException("Did not find output file with glob pattern: '{}'".format(globpatterns))
ebi-metagenomics-cwl/tools/trimmomatic.cwl:221:3:
ebi-metagenomics-cwl/tools/trimmomatic.cwl:221:3: cwltool.errors.WorkflowException: Did not find output file with glob pattern: '['trim.log']'
SOLUTION:
For now, I just commented out the output_log section of outputs in trimmomatic.cwl (lines 221-233). I guess that might break something downstream, and I'm sure there's a better solution.
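A possibly less invasive alternative would be to keep the output but declare it optional, so that a missing trim.log yields null instead of an error. This is an untested sketch, not the actual contents of trimmomatic.cwl: I'm assuming the glob is the literal file name trim.log (taken from the error message above), and the real section may be defined differently.

```yaml
# Hypothetical replacement for the output_log entry in tools/trimmomatic.cwl.
# Assumption: the log is expected under the literal name "trim.log".
outputs:
  output_log:
    type: File?            # "?" marks the output as optional (File or null)
    outputBinding:
      glob: trim.log       # if nothing matches, the output is null, not an error
```

This only silences the error; it does not explain why the log is not being written. Any workflow step that consumes output_log would also need to accept File? for the workflow to validate.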