theroddywms / roddy
The Roddy workflow development and management system.
Home Page: http://roddy-documentation.readthedocs.io
License: MIT License
When "IndelPlugin-1.0.176-5-1" is used as directory name for the plugin and "IndelPlugin-1.0.175-4" is present, the older plugin version is used and no error message or warning is issued.
Desired behaviour?
Fail fatally: the idea is "keep your plugin directories clean, or you are on your own".
I think warnings are not sufficient here. A meaningful error message needs to be shown.
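A minimal sketch of such a fatal check, written in Java. The naming rule it enforces (`<Name>` followed by `_` or `-`, a dotted version, and an optional numeric revision) is an assumption for illustration and not Roddy's actual rule; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

final class PluginDirectoryNames {
    // ASSUMPTION: versioned plugin directories look like Name_1.0.167 or Name-1.0.175-4.
    // This pattern is illustrative only and is not taken from the Roddy sources.
    private static final Pattern VERSIONED_NAME =
            Pattern.compile("[A-Za-z][A-Za-z0-9]*[_-]\\d+\\.\\d+(\\.\\d+)?(-\\d+)?");

    static boolean isValid(String directoryName) {
        return VERSIONED_NAME.matcher(directoryName).matches();
    }

    // Fails with a message that names the offending directory instead of silently
    // falling back to another plugin version.
    static void requireValid(String directoryName) {
        if (!isValid(directoryName)) {
            throw new IllegalArgumentException(
                    "Plugin directory '" + directoryName
                    + "' does not match the expected <Name>[_-]<version>[-<revision>] form.");
        }
    }
}
```

With this rule, `IndelPlugin-1.0.175-4` passes and `IndelPlugin-1.0.176-5-1` is rejected with an error rather than being skipped.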
Like e.g.
cvalue name="A_BINARY" value="$A_BINARY" ?
1,4M Jul 5 10:35 cTools_ACEseqWorkflow:1.2.8-1_copyNumberEstimationWorkflow_170705_103552779.zip
36K Jun 26 21:34 cTools_AlignmentAndQCWorkflows:1.1.39-0_bisulfiteWorkflow_170626_213432499.zip
5,9K Jun 26 21:34 cTools_AlignmentAndQCWorkflows:1.1.39-0_exomePipeline_170626_213432499.zip
88K Jun 26 21:34 cTools_AlignmentAndQCWorkflows:1.1.39-0_qcPipeline_170626_213432499.zip
1,8M Jun 26 21:34 cTools_AlignmentAndQCWorkflows:1.1.39-0_qcPipelineTools_170626_213432499.zip
4,1K Jun 26 21:34 cTools_COWorkflows:1.1.59-0_devel_170626_213432499.zip
39K Jun 26 21:34 cTools_COWorkflows:1.1.59-0_tools_170626_213432499.zip
39K Jun 9 18:03 cTools_COWorkflows:1.1.76-1_tools_170705_103632264.zip
2,8K Jun 26 21:34 cTools_DefaultPlugin:1.0.33-1_roddyNativeTools_170626_213432499.zip
3,5K Jun 26 21:34 cTools_DefaultPlugin:1.0.33-1_roddyTools_170626_213432499.zip
3,5K Jul 5 21:17 cTools_DefaultPlugin:1.0.33-1_roddyTools_170705_211750327.zip
0 Jul 5 10:36 dir_cTools_ACEseqWorkflow:1.2.8-1_copyNumberEstimationWorkflow_170705_103552779.zip
35 Jun 26 21:34 dir_cTools_AlignmentAndQCWorkflows:1.1.39-0_bisulfiteWorkflow_170626_213432499.zip
31 Jun 26 21:34 dir_cTools_AlignmentAndQCWorkflows:1.1.39-0_exomePipeline_170626_213432499.zip
The ID field for analyses in the following exemplary output is empty:
ID: OTPTest-AQCWF-WGS-1-1-Roddy-2-3.Sambamba
Sub configurations:
ID: OTPTest-AQCWF-WGS-1-1-Roddy-2-3.Sambamba.SoftwareBwa
Sub configurations:
ID: OTPTest-AQCWF-WGS-1-1-Roddy-2-3.Sambamba.SoftwareBwa.WGS
Available analyses:
0: ID=[useplugin=], Workflow=[killswitches=FilenameSection]
ID: OTPTest-AQCWF-WGS-1-1-Roddy-2-3.Sambamba.BluebeeBwa
Sub configurations:
ID: OTPTest-AQCWF-WGS-1-1-Roddy-2-3.Sambamba.BluebeeBwa.WGS
Available analyses:
0: ID=[useplugin=], Workflow=[killswitches=FilenameSection]
SSHJ uses {user.home}/.ssh/id_rsa and {user.home}/.ssh/id_dsa as default SSH keys.
It would be nice if it was possible to pass a different file name in the Roddy configuration, in case you have multiple keys for different machines.
This is possible with the method SSHClient.authPublickey(String username, String... locations).
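A sketch of how the key locations could be resolved before authentication. The comma-separated setting and the class name are hypothetical, since no such Roddy option exists in the text above; only the two default paths come from the issue.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SshKeyLocations {
    // The two default key files named in the issue text.
    static List<String> defaults(String userHome) {
        String sshDir = userHome + File.separator + ".ssh" + File.separator;
        return Arrays.asList(sshDir + "id_rsa", sshDir + "id_dsa");
    }

    // "configured" is a HYPOTHETICAL comma-separated setting value.
    // null, blank, or a value with only empty entries falls back to the defaults.
    static List<String> resolve(String configured, String userHome) {
        if (configured == null || configured.trim().isEmpty()) {
            return defaults(userHome);
        }
        List<String> result = new ArrayList<>();
        for (String entry : configured.split(",")) {
            String trimmed = entry.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result.isEmpty() ? defaults(userHome) : result;
    }
}
```

The resolved list could then be handed to `SSHClient.authPublickey(username, locations.toArray(new String[0]))`.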
Required JRE/JDK: 1.8
Required Groovy: 2.4
/data/michael/.roddy/runtimeDevel/jdk1.8.0_121/bin/java
/data/michael/.roddy/runtimeDevel/groovy-2.4.8/bin/groovy
Roddy version 2.3.160
Loading feature toggle file /data/michael/.roddy/featureToggles.ini
Need to find a way to properly get the job state for a completed job. Neither tracejob nor qstat -f is a good way. qstat -f only works for 'active' jobs. Lists with long active lists are not default.
Set logfile location, parameter file and job state log file on job creation (or override a method).
Allow enabling and disabling of options for resource arbitration for defective job managers.
parseToJob() is not implemented and will return null.
Ignore directory /data/michael/Projekte/RoddyInfra/plugins_2.3/IndelCallingWorkflow_1.0.167/resources/analysisTools/.HiddenAndBad; It is not a valid tools directory.
Ignore directory /data/michael/Projekte/RoddyInfra/plugins_2.3/IndelCallingWorkflow_1.0.167/resources/analysisTools/.HiddenAndyMaybeBad; It is not a valid tools directory.
Ignore directory /data/michael/Projekte/Roddy/dist/plugins/PluginBase/resources/analysisTools/.keep; It is not a valid tools directory.
Ignore directory /data/michael/Projekte/RoddyInfra/Roddy/dist/plugins/PluginBase/resources/analysisTools/.keep; It is not a valid tools directory.
The plugin SophiaWorkflow:current could not be found, are the plugin paths properly set?
Could not build the plugin queue for:
SophiaWorkflow
The plugin :current could not be found, are the plugin paths properly set?
Could not build the plugin queue for:
Unrecoverable errors, could not load plugins for analysis sophia
Could not load analysis coWorkflowsTestProject@sophia
Prevent usage of wrong configuration files
Use case: A complex command with many options fails. Because many options are used (iodir, confdirs), maybe with absolute paths, the user edits the previous command line and forgets to remove the workflow identifier and/or PID. These are then used to filter the output of listworkflows and listdatasets, but the user is not told this explicitly.
Solution: Print the filtering criteria during listworkflows and listdatasets.
Some time ago I noticed that the onScriptParameter filename-patterns were broken in the AlignmentAndQCWorkflowsPlugin:
<filename class="TextFile" onScriptParameter="annotateCovWindows:FILENAME_COV_WINDOWS_ANNO"
pattern='${outputAnalysisBaseDirectory}/qualitycontrol/merged/${gccorOutputDirectory}/${sample}_${pid}.readCoverage_${cvalue,name="WINDOW_SIZE",default="1"}kb_windows.anno.txt.gz'/>
<filename class="TextFile" onScriptParameter="annotateCovWindows:FILENAME_SEX"
pattern='${outputAnalysisBaseDirectory}/qualitycontrol/merged/${gccorOutputDirectory}/${sample}_${pid}_sex.txt'/>
<filename class="TextFile" onScriptParameter="mergeAndFilterCovWindows:FILENAME_COV_WINDOWS_WG"
pattern='${outputAnalysisBaseDirectory}/qualitycontrol/merged/${gccorOutputDirectory}/${sample}_${pid}_readCoverage_10kb_windows.filtered.txt.gz'/>
<filename class="TextFile" onScriptParameter="correctGc:FILENAME_GC_CORRECTED_WINDOWS"
pattern='${outputAnalysisBaseDirectory}/qualitycontrol/merged/${gccorOutputDirectory}/${sample}_${pid}_readCoverage_10kb_windows.filtered.corrected.txt.gz'/>
<filename class="TextFile" onScriptParameter="correctGc:FILENAME_QC_GC_CORRECTION_JSON"
pattern='${outputAnalysisBaseDirectory}/qualitycontrol/merged/${gccorPlotOutputDirectory}/${sample}_${pid}_qc_gc_corrected.json'/>
<filename class="TextFile" onScriptParameter="correctGc:FILENAME_GC_CORRECTED_QUALITY"
pattern='${outputAnalysisBaseDirectory}/qualitycontrol/merged/${gccorPlotOutputDirectory}/${sample}_${pid}_qc_gc_corrected.tsv'/>
<filename class="TextFile" onScriptParameter="correctGc:FILENAME_GC_CORRECT_PLOT"
pattern='${outputAnalysisBaseDirectory}/qualitycontrol/merged/${gccorPlotOutputDirectory}/${sample}_${pid}_gc_corrected.png'/>
See the comment with the assert below.
private void assembleJobParameters() {
    // Assemble initial parameters
    parameters[PRM_WORKFLOW_ID] = context.analysis.configuration.getName()
    if (toolName) {
        parameters[PRM_TOOL_ID] = toolName;
        parameters[PRM_TOOLS_DIR] = configuration.getProcessingToolPath(context, toolName).getParent();
    }
    // Assemble additional parameters
    for (Object entry in additionalInput) {
        if (entry instanceof BaseFile)
            // assert(((BaseFile) entry).isEvaluated)
            allInputValues << (BaseFile) entry;
        else if (entry instanceof FileGroup) {
            // Take a group and store all files in that group.
            allInputValues << (FileGroup) entry;
        } else if (entry instanceof Map<String, String>) {
            (entry as Map<String, String>).forEach { String k, String v ->
                parameters[k] = v
            }
        } else { // Catch-all, in case one still wants to use a string with '=' to define a parameter (deprecated).
            String[] split = entry.toString().split("=");
            if (split.length != 2)
                throw new RuntimeException("Not able to convert entry ${entry.toString()} to parameter.")
            parameters[split[0]] = split[1];
        }
    }
}
There were questions and remarks in #93, which need to be discussed.
With e.g. a NumberFormatException
Bad paths that look like regular Roddy plugin folders can cause problems:
e.g. ABC_1.0-BACKUP will cause problems
It would be nice to know which tasks still need to be done, before the LSF cluster is fully supported.
See TheRoddyWMS/BatchEuphoria#46 for how this is in LSF.
e.g.
RoddyCore/src/de/dkfz/roddy/client/cliclient/RoddyCLIClient.groovy
int longestAnalysisID = 0
int longestAnalysisSrcID = 0
int longestPluginID = 0
List<List<String>> splitted = pti.icc.listOfAnalyses.sort().collect {
The detailed analysis descriptor is joined by :: and the splitting code is repeated at several positions. This is error-prone, and a custom value class for the descriptor could help.
Also look for a "disect" / "dissect" method
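A sketch of what such a value class might look like. The number of components and their names (`analysisId`, `sourceId`, `pluginId`) are assumptions made for illustration; the real descriptor layout is not given here, and only the `::` separator comes from the text.

```java
final class AnalysisDescriptor {
    static final String SEPARATOR = "::";

    // HYPOTHETICAL component names; the actual descriptor fields may differ.
    final String analysisId;
    final String sourceId;
    final String pluginId;

    AnalysisDescriptor(String analysisId, String sourceId, String pluginId) {
        this.analysisId = analysisId;
        this.sourceId = sourceId;
        this.pluginId = pluginId;
    }

    // Single place where the joined form is taken apart.
    static AnalysisDescriptor parse(String joined) {
        // Limit -1 keeps trailing empty components instead of dropping them.
        String[] parts = joined.split(SEPARATOR, -1);
        if (parts.length != 3) {
            throw new IllegalArgumentException(
                    "Expected 3 components separated by '" + SEPARATOR + "' but got: " + joined);
        }
        return new AnalysisDescriptor(parts[0], parts[1], parts[2]);
    }

    // Single place where the joined form is produced.
    String join() {
        return analysisId + SEPARATOR + sourceId + SEPARATOR + pluginId;
    }
}
```

Keeping `parse` and `join` in one class means the separator and the component order are defined once instead of at every call site.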
I get the following output if a rollback is initiated:
$ /basePath/RoddyProject_Roddy2.4/Roddy/roddy.sh rerun testCoBaseConfigs-Roddy-2-4.Picard.SoftwareBwa.WGS@alignment testpid --useconfig=/basePath//RoddyProject_Roddy2.4/testConfigs/applicationProperties-analysis-local-lsf.ini --usefeaturetoggleconfig=/basePath//RoddyProject_Roddy2.4/testConfigs/featureToggles.ini --configurationDirectories=/basePath//RoddyProject_Roddy2.4/configs,/basePath//RoddyProject_Roddy2.4/testConfigs/AlignmentAndQCWorkflows/,/basePath//RoddyProject_Roddy2.4/testConfigs/AlignmentAndQCWorkflows//resources --usePluginVersion=AlignmentAndQCWorkflows:current --useiodir=/basePath/testData-small//view-by-pid,/basePath/tests/AlignmentAndQCWorkflows-OTPConfig-1.5-Roddy-2.4/testCoBaseConfigs-Roddy-2-4.Picard.SoftwareBwa.WGS --cvalues=INDEX_PREFIX:/icgc/dkfzlsdf/dmg/otp/production/processing/reference_genomes/bwa06_1KGRef/hs37d5.fa,CHROM_SIZES_FILE:/icgc/dkfzlsdf/dmg/otp/production/processing/reference_genomes/bwa06_1KGRef/stats/hs37d5.fa.chrLenOnlyACGT_realChromosomes.tab,usedResourcesSize:xs,runFingerprinting:true,outputAllowAccessRightsModification:false,runACEseqQc:true,cnv_min_coverage_ALN:1,mapping_quality_ALN:0,min_windows_ALN:1,runFastQC:true,RODDY_SCRATCH:/local/${USER}/${RODDY_JOBID},workflowEnvironmentScript:workflowEnvironment_tbiLsf --useRoddyVersion=current --additionalImports=wgs-standard-lsf-fpga-0.7.8
The plugin AlignmentAndQCWorkflows [ Version: current ] was loaded (/basePath/RoddyProject_Roddy2.4/plugins_R2.4/AlignmentAndQCWorkflows).
The plugin COWorkflowsBasePlugin [ Version: current ] was loaded (/basePath/RoddyProject_Roddy2.4/plugins_R2.4/COWorkflowsBasePlugin).
The plugin PluginBase [ Version: current ] was loaded (/basePath/RoddyProject_Roddy2.4/plugins_R2.4/PluginBase).
The plugin DefaultPlugin [ Version: current ] was loaded (/basePath/RoddyProject_Roddy2.4/plugins_R2.4/DefaultPlugin).
Fully load configurationFile /basePath/RoddyProject_Roddy2.4/testConfigs/AlignmentAndQCWorkflows/testCoBaseConfigs-roddy-2.4.xml
Fully load configurationFile /basePath/RoddyProject_Roddy2.4/testConfigs/AlignmentAndQCWorkflows/testCoBaseConfigs-roddy-2.4.xml
Fully load configurationFile /basePath/RoddyProject_Roddy2.4/testConfigs/AlignmentAndQCWorkflows/testCoBaseConfigs-roddy-2.4.xml
Fully load configurationFile /basePath/RoddyProject_Roddy2.4/testConfigs/AlignmentAndQCWorkflows/testCoBaseConfigs-roddy-2.4.xml
Fully load configurationFile /basePath/RoddyProject_Roddy2.4/plugins_R2.4/DefaultPlugin/resources/configurationFiles/default.xml
Fully load configurationFile /basePath/RoddyProject_Roddy2.4/configs/coBaseProject.xml
Fully load configurationFile /basePath/RoddyProject_Roddy2.4/plugins_R2.4/DefaultPlugin/resources/configurationFiles/default.xml
Fully load configurationFile /basePath/RoddyProject_Roddy2.4/configs/cofilenames.xml
Fully load configurationFile /basePath/RoddyProject_Roddy2.4/configs/coApplicationsAndReferenceFiles.xml
Fully load configurationFile /basePath/RoddyProject_Roddy2.4/plugins_R2.4/AlignmentAndQCWorkflows/resources/configurationFiles/analysisQC.xml
Fully load configurationFile /basePath/RoddyProject_Roddy2.4/plugins_R2.4/COWorkflowsBasePlugin/resources/configurationFiles/commonCOWorkflowsSettings.xml
Fully load configurationFile /basePath/RoddyProject_Roddy2.4/testConfigs/AlignmentAndQCWorkflows/resources/wgs-standard-lsf-fpga-0.7.8.xml
Found 3 datasets in the in- and output directories.
Found 1 samples for dataset testpid
Searching for lane files in directory /basePath/testData-small/view-by-pid/testpid/tumor/paired
Processed sample tumor and found 2 groups of lane files.
Tried to call a non generic tool via the generic call method
A workflow error occurred, try to rollback / abort submitted jobs.
bkill 0x88FBF8B0F98A8 0x88FBF8C21B1B5 0x88FBF91065209 0x88FBF91A4F17F 0x88FBF9290F166 0x88FBF9464C3D2 0x88FBFAEE44660 0x88FBFB0F56E55 0x88FBFB269C380 0x88FBFB4D6C618
An unknown / unhandled exception occurred: 'Not able to context tool coveragePlotSingle'
de.dkfz.roddy.knowledge.methods.GenericMethod._callGenericToolOrToolArray(GenericMethod.groovy:232)
de.dkfz.roddy.knowledge.methods.GenericMethod.callGenericTool(GenericMethod.groovy:50)
de.dkfz.b080.co.files.CoverageTextFile.plot(CoverageTextFile.java:49)
de.dkfz.b080.co.files.CoverageTextFileGroup.plot(CoverageTextFileGroup.java:47)
de.dkfz.b080.co.qcworkflow.QCPipeline.execute(QCPipeline.groovy:94)
de.dkfz.roddy.core.ExecutionContext.execute(ExecutionContext.groovy:625)
de.dkfz.roddy.core.Analysis.executeRun(Analysis.java:397)
de.dkfz.roddy.core.Analysis.run(Analysis.java:206)
de.dkfz.roddy.client.cliclient.RoddyCLIClient.rerun(RoddyCLIClient.groovy:493)
de.dkfz.roddy.client.cliclient.RoddyCLIClient.parseStartupMode(RoddyCLIClient.groovy:114)
de.dkfz.roddy.Roddy.parseRoddyStartupModeAndRun(Roddy.java:688)
de.dkfz.roddy.Roddy.startup(Roddy.java:282)
de.dkfz.roddy.Roddy.main(Roddy.java:214)
Found 1 samples for dataset testpid
Processed sample tumor and found 2 groups of lane files.
Creating the following execution directory to store information about this process:
/basePath/tests/AlignmentAndQCWorkflows-OTPConfig-1.5-Roddy-2.4/testCoBaseConfigs-Roddy-2-4.Picard.SoftwareBwa.WGS/testpid/roddyExecutionStore/exec_171019_121111722_kensche_alignment
Tried to call a non generic tool via the generic call method
A workflow error occurred, try to rollback / abort submitted jobs.
bkill 0x88FC0435A70E4 0x88FC045E1E94B 0x88FC0523BB7F4 0x88FC055016FF0 0x88FC057534853 0x88FC066054A97 8276 8277 8278 8279
An unknown / unhandled exception occurred: 'Not able to context tool coveragePlotSingle'
de.dkfz.roddy.knowledge.methods.GenericMethod._callGenericToolOrToolArray(GenericMethod.groovy:232)
de.dkfz.roddy.knowledge.methods.GenericMethod.callGenericTool(GenericMethod.groovy:50)
de.dkfz.b080.co.files.CoverageTextFile.plot(CoverageTextFile.java:49)
de.dkfz.b080.co.files.CoverageTextFileGroup.plot(CoverageTextFileGroup.java:47)
de.dkfz.b080.co.qcworkflow.QCPipeline.execute(QCPipeline.groovy:94)
de.dkfz.roddy.core.ExecutionContext.execute(ExecutionContext.groovy:625)
de.dkfz.roddy.core.Analysis.executeRun(Analysis.java:397)
de.dkfz.roddy.core.Analysis.executeRun(Analysis.java:341)
de.dkfz.roddy.core.Analysis.rerun(Analysis.java:229)
de.dkfz.roddy.client.cliclient.RoddyCLIClient.rerun(RoddyCLIClient.groovy:494)
de.dkfz.roddy.client.cliclient.RoddyCLIClient.parseStartupMode(RoddyCLIClient.groovy:114)
de.dkfz.roddy.Roddy.parseRoddyStartupModeAndRun(Roddy.java:688)
de.dkfz.roddy.Roddy.startup(Roddy.java:282)
de.dkfz.roddy.Roddy.main(Roddy.java:214)
There were errors for the execution context for dataset testpid
* An uncaught error occurred during a run. SEVERE: An uncaught error occurred during a run.
bkill is apparently called on jobs that don't have a valid identifier.
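One way to guard against this would be to filter the identifiers before building the kill command. The sketch below assumes that a submitted LSF job has a positive decimal ID, optionally followed by an array index such as `[3]`, and that everything else (for example the `0x88FC...` values in the log above) is a placeholder. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class JobIdFilter {
    // ASSUMPTION: real LSF job IDs are positive decimal integers, optionally with an array index.
    private static final Pattern LSF_JOB_ID = Pattern.compile("[1-9]\\d*(\\[\\d+\\])?");

    // Returns only the identifiers that look like IDs handed out by the scheduler.
    static List<String> submittedOnly(List<String> ids) {
        List<String> result = new ArrayList<>();
        for (String id : ids) {
            if (id != null && LSF_JOB_ID.matcher(id).matches()) {
                result.add(id);
            }
        }
        return result;
    }
}
```

For the second `bkill` line in the log, only `8276 8277 8278 8279` would remain, and the call could be skipped entirely if the filtered list is empty.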
Currently, the code is present at several positions:
Suggestion:
and without their stack trace.
This is a bug in the GenericMethod class
if (split.length != 2)
throw new RuntimeException("Not able to convert entry ${entry.toString()} to parameter.")
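The report does not say what exactly goes wrong. One failure mode of the quoted check is that `split("=")` without a limit rejects values that themselves contain `=` (three or more parts) as well as empty values (trailing empty strings are dropped, leaving one part). The following Java sketch shows a possible alternative that splits on the first `=` only; the class name is hypothetical and this is not the project's fix.

```java
import java.util.AbstractMap;
import java.util.Map;

final class ParameterEntry {
    // Splits "KEY=VALUE" on the first '=' only, so the value may contain '=' or be empty.
    static Map.Entry<String, String> parse(String entry) {
        String[] split = entry.split("=", 2);
        if (split.length != 2 || split[0].isEmpty()) {
            throw new IllegalArgumentException(
                    "Not able to convert entry " + entry + " to parameter.");
        }
        return new AbstractMap.SimpleImmutableEntry<>(split[0], split[1]);
    }
}
```

With this variant, `A=b=c` yields the value `b=c` and `A=` yields an empty value, while an entry without any `=` is still rejected.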
In de.dkfz.roddy.knowledge.methods.GenericMethod#_callGenericToolOrToolArray the non-BaseFile parameters are not evaluated.
E.g. in the ACEseq WF
"CHR_NR" -> "${PARM_CHR_INDEX}"
So to say:
Implement --additionalImports to add further configuration names to the project's configuration via the command line. Required to implement a solution for #100.
E.g. a configuration Id is available in several directories and the wrong configuration could be used. Prevent this by exiting if several configurations with the same Id would override each other.
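The check could be sketched as follows, assuming a map from configuration Id to the files in which that Id was found has already been collected. How that map is built, and the class and method names, are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class DuplicateConfigCheck {
    // Returns every configuration Id that is defined in more than one file.
    static Map<String, List<String>> duplicates(Map<String, List<String>> filesById) {
        Map<String, List<String>> result = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : filesById.entrySet()) {
            if (e.getValue().size() > 1) {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }

    // Aborts instead of letting one configuration silently override another.
    static void failOnDuplicates(Map<String, List<String>> filesById) {
        Map<String, List<String>> dups = duplicates(filesById);
        if (!dups.isEmpty()) {
            throw new IllegalStateException(
                    "Configuration Ids defined more than once: " + dups);
        }
    }
}
```

Listing all conflicting files in the message lets the user decide which directory to remove from the configuration path.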
When a dependent plugin is checked but the dependent plugin is not a "-0" version, it is not loaded.
After the requested plugin was processed the loop in de.dkfz.roddy.plugins.LibrariesFactory#buildupPluginQueue
tries the dependent plugin. If the dependent plugin does not have revision number 0, the following code will skip the plugin and eventually remove it from the chain.
if (!mapOfPlugins.checkExistence(id as String, version as String)) {
    if (id) { // Skip empty entries and reduce one message.
        mapOfErrorsForPluginEntries.get(id, []) << ("The plugin ${id}:${version} could not be found, are the plugin paths properly set?").toString();
    }
}
pluginsToCheck.remove(0);
It is possible that then only the default and base plugins remain, because they are added separately. The error manifests itself in errors about missing dependencies, because the dependent plugin is not loaded.
For a fix I suggest using the Version and VersionIO classes in RoddyToolLibs. Version has a compatibleTo(Version other, Collection<tuple2<Version,Version>> compatibleRanges) method that could take a list of compatible ranges for the plugin and check whether this and the other Version are within any of the compatibleRanges, taking into account the revision numbers, which are always compatible (if used correctly).
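To illustrate the idea only, here is a self-contained Java sketch of a range-based compatibility check. `SimpleVersion` is a hypothetical stand-in and does not reproduce the RoddyToolLibs `Version` API; ranges are modelled as two-element arrays (inclusive lower and upper bound) instead of tuples.

```java
import java.util.List;

final class SimpleVersion implements Comparable<SimpleVersion> {
    final int major, minor, patch, revision;

    SimpleVersion(int major, int minor, int patch, int revision) {
        this.major = major;
        this.minor = minor;
        this.patch = patch;
        this.revision = revision;
    }

    // Orders by major.minor.patch only; the revision is deliberately ignored,
    // following the statement that revisions are always compatible.
    @Override
    public int compareTo(SimpleVersion o) {
        if (major != o.major) return Integer.compare(major, o.major);
        if (minor != o.minor) return Integer.compare(minor, o.minor);
        return Integer.compare(patch, o.patch);
    }

    // range[0] is the inclusive lower bound, range[1] the inclusive upper bound.
    boolean within(SimpleVersion[] range) {
        return compareTo(range[0]) >= 0 && compareTo(range[1]) <= 0;
    }

    boolean compatibleTo(SimpleVersion other, List<SimpleVersion[]> compatibleRanges) {
        if (compareTo(other) == 0) {
            return true; // the two versions differ at most in their revision
        }
        for (SimpleVersion[] range : compatibleRanges) {
            if (within(range) && other.within(range)) {
                return true;
            }
        }
        return false;
    }
}
```

Under this scheme a dependency on revision 0 of a plugin would also accept any other revision of the same version, which is the case described above.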
The two tests de.dkfz.roddy.config.FilenamePattern#fillConfigurationVariables and de.dkfz.roddy.config.FilenamePattern#fillConfigurationVariables result in an infinite loop in function de.dkfz.roddy.config.FilenamePattern#fillConfigurationVariables.
Might be an interesting question, as we already had several cases, where the developer assumed that values are available when they added them to the config within the execute() method. We could diff the current config / workflow config to the static one and put differences to the parameter file.
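The diff proposed above could be sketched like this, treating both configurations as flat name-to-value maps. That flat view and the class name are simplifying assumptions, not Roddy's configuration model.

```java
import java.util.Map;
import java.util.TreeMap;

final class ConfigDiff {
    // Returns the entries of currentValues that are new or changed relative to staticValues.
    static Map<String, String> changedOrAdded(Map<String, String> staticValues,
                                              Map<String, String> currentValues) {
        Map<String, String> diff = new TreeMap<>();
        for (Map.Entry<String, String> e : currentValues.entrySet()) {
            String before = staticValues.get(e.getKey());
            if (before == null || !before.equals(e.getValue())) {
                diff.put(e.getKey(), e.getValue());
            }
        }
        return diff;
    }
}
```

Only the returned entries would need to be written to the parameter file in addition to the static values.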
Currently, Roddy does not support SSH-agent. According to hierynomus/sshj#33, this feature has long been implemented by a combination of SSHJ with jsch-agent-proxy.
List<String> valueBlacklist = [ConfigurationConstants.CVALUE_PLACEHOLDER_RODDY_JOBID_RAW, 'PWD', Constants.PID_CAP, Constants.PID
The current implementation does not cover it! It always tries to use GroovyServ with the socket port of 1961.
Defaults will stay in the Java code, but it might be nice to have a file in the application directory, which could then be used by default in multi user environments.
Apparently, the output is generated from a catch-clause in de.dkfz.roddy.config.loader.ConfigurationFactory#readFilenamePatterns, line 474.
An exception is thrown from readOnMethodFileNamePattern out of the GroovyClassLoader.
de.dkfz.roddy.config.loader.ConfigurationFactory#readOnMethodFilenamePattern, line 555:
calledClass = (Class<FileObject>) LibrariesFactory.getInstance().loadClass(calledClassName)
See #69 for comments.
e.g. #82
That might make it much easier to load source files!
In PBS this was implemented via accountName and -A option (might be specific for our cluster though).
All the logging output that should be produced with set -x, up to the line "RODDY_SCRATCH is set to ...", is missing from the STDERR log files.
This happens e.g. if the storage is full.
Also consider that these identifiers are used as parameters to jobs. Maybe there should always be a dataset and DATASET identifier, but COWorkflowsBasePlugin copies their values into pid/PID.