theroddywms / roddy

The Roddy workflow development and management system.

Home Page: http://roddy-documentation.readthedocs.io

License: MIT License

Languages: Java 27.27%, Groovy 70.11%, CSS 0.22%, Shell 1.08%, Python 1.32%
Topics: workflow-management, bioinformatics, hpc-applications, workflow-engine, scientific-workflows, pipeline, pbs, lsf, grid-engine, workflow

roddy's People

Contributors

askask, dankwart-de, fkaercher, gwarsow, vinjana

roddy's Issues

Malformed plugin version string may lead to silently loading the wrong plugin version

When "IndelPlugin-1.0.176-5-1" is used as directory name for the plugin and "IndelPlugin-1.0.175-4" is present, the older plugin version is used and no error message or warning is issued.

Desired behaviour?

Fail fatally. The idea is "keep your plugin directories clean, or you are on your own".

I think warnings are not sufficient here. A meaningful error message needs to be shown.
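
A minimal sketch of what "fail fatally" could look like, assuming plugin directory names follow the layout `<name>-<major>.<minor>.<patch>[-<revision>]` (with `-` or `_` after the name, as seen in the directory names quoted in these issues). The class `PluginDirName` and its pattern are hypothetical and not taken from Roddy's code. Special versions such as `current` are not covered.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical strict parser for plugin directory names; not Roddy's actual code. */
final class PluginDirName {
    // Assumed layout: <name>(-|_)<major>.<minor>.<patch>[-<revision>]
    private static final Pattern STRICT = Pattern.compile(
            "^([A-Za-z][A-Za-z0-9]*)[-_](\\d{1,9})\\.(\\d{1,9})\\.(\\d{1,9})(?:-(\\d{1,9}))?$");

    final String name;
    final int major;
    final int minor;
    final int patch;
    final int revision;

    private PluginDirName(String name, int major, int minor, int patch, int revision) {
        this.name = name;
        this.major = major;
        this.minor = minor;
        this.patch = patch;
        this.revision = revision;
    }

    /** Parses a directory name or fails loudly instead of silently guessing a version. */
    static PluginDirName parse(String dirName) {
        Matcher m = STRICT.matcher(dirName);
        if (!m.matches()) {
            throw new IllegalArgumentException("Malformed plugin directory name '" + dirName
                    + "', expected <name>-<major>.<minor>.<patch>[-<revision>]");
        }
        // Assumption: a missing revision is treated as revision 0.
        int revision = m.group(5) == null ? 0 : Integer.parseInt(m.group(5));
        return new PluginDirName(m.group(1), Integer.parseInt(m.group(2)),
                Integer.parseInt(m.group(3)), Integer.parseInt(m.group(4)), revision);
    }
}
```

With this pattern, `IndelPlugin-1.0.175-4` parses, while `IndelPlugin-1.0.176-5-1` is rejected with an error message that names the offending directory.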

If the compressed tool file exists and the extracted folder exists without files, Roddy runs will fail.

    1,4M Jul  5 10:35 cTools_ACEseqWorkflow:1.2.8-1_copyNumberEstimationWorkflow_170705_103552779.zip
     36K Jun 26 21:34 cTools_AlignmentAndQCWorkflows:1.1.39-0_bisulfiteWorkflow_170626_213432499.zip
    5,9K Jun 26 21:34 cTools_AlignmentAndQCWorkflows:1.1.39-0_exomePipeline_170626_213432499.zip
     88K Jun 26 21:34 cTools_AlignmentAndQCWorkflows:1.1.39-0_qcPipeline_170626_213432499.zip
    1,8M Jun 26 21:34 cTools_AlignmentAndQCWorkflows:1.1.39-0_qcPipelineTools_170626_213432499.zip
    4,1K Jun 26 21:34 cTools_COWorkflows:1.1.59-0_devel_170626_213432499.zip
     39K Jun 26 21:34 cTools_COWorkflows:1.1.59-0_tools_170626_213432499.zip
     39K Jun  9 18:03 cTools_COWorkflows:1.1.76-1_tools_170705_103632264.zip
    2,8K Jun 26 21:34 cTools_DefaultPlugin:1.0.33-1_roddyNativeTools_170626_213432499.zip
    3,5K Jun 26 21:34 cTools_DefaultPlugin:1.0.33-1_roddyTools_170626_213432499.zip
    3,5K Jul  5 21:17 cTools_DefaultPlugin:1.0.33-1_roddyTools_170705_211750327.zip
       0 Jul  5 10:36 dir_cTools_ACEseqWorkflow:1.2.8-1_copyNumberEstimationWorkflow_170705_103552779.zip
      35 Jun 26 21:34 dir_cTools_AlignmentAndQCWorkflows:1.1.39-0_bisulfiteWorkflow_170626_213432499.zip
      31 Jun 26 21:34 dir_cTools_AlignmentAndQCWorkflows:1.1.39-0_exomePipeline_170626_213432499.zip

`listworkflows` does not mention the analysis identifiers

The ID field for analyses is empty in the following example output:

        ID:       OTPTest-AQCWF-WGS-1-1-Roddy-2-3.Sambamba
         Sub configurations:
                ID:       OTPTest-AQCWF-WGS-1-1-Roddy-2-3.Sambamba.SoftwareBwa
                 Sub configurations:
                        ID:       OTPTest-AQCWF-WGS-1-1-Roddy-2-3.Sambamba.SoftwareBwa.WGS
                         Available analyses:
                           0: ID=[useplugin=], Workflow=[killswitches=FilenameSection]


                ID:       OTPTest-AQCWF-WGS-1-1-Roddy-2-3.Sambamba.BluebeeBwa
                 Sub configurations:
                        ID:       OTPTest-AQCWF-WGS-1-1-Roddy-2-3.Sambamba.BluebeeBwa.WGS
                         Available analyses:
                           0: ID=[useplugin=], Workflow=[killswitches=FilenameSection]

Support configuration option for location of SSH key

SSHJ uses {user.home}/.ssh/id_rsa and {user.home}/.ssh/id_dsa as default SSH keys.
It would be nice if it was possible to pass a different file name in the Roddy configuration, in case you have multiple keys for different machines.

This is possible with the method SSHClient.authPublickey(String username, String... locations).
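
A rough sketch of how a configured value could be turned into the `locations` argument, assuming the configuration holds a comma-separated list of key files. The class `SshKeyLocations` and the comma-separated format are assumptions for illustration. The fallback paths are the SSHJ defaults named above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that resolves SSH key file locations from a configuration value. */
final class SshKeyLocations {
    /**
     * Returns the configured key files, or the default locations when nothing is configured.
     * The result is meant to be passed on as the varargs "locations" of
     * SSHClient.authPublickey(String username, String... locations).
     */
    static String[] resolve(String configuredValue, String userHome) {
        List<String> result = new ArrayList<>();
        if (configuredValue != null) {
            // Assumption: several key files are separated by commas.
            for (String part : configuredValue.split(",")) {
                String trimmed = part.trim();
                if (!trimmed.isEmpty()) {
                    result.add(trimmed);
                }
            }
        }
        if (result.isEmpty()) {
            result.add(userHome + "/.ssh/id_rsa");
            result.add(userHome + "/.ssh/id_dsa");
        }
        return result.toArray(new String[0]);
    }
}
```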

Fix bad loader errors / messages for CommandFactory init and first plugin in build queue

Required JRE/JDK: 1.8
Required Groovy: 2.4
/data/michael/.roddy/runtimeDevel/jdk1.8.0_121/bin/java
/data/michael/.roddy/runtimeDevel/groovy-2.4.8/bin/groovy
Roddy version 2.3.160
Loading feature toggle file /data/michael/.roddy/featureToggles.ini
Need to find a way to properly get the job state for a completed job. Neither tracejob, nor qstat -f are a good way. qstat -f only works for 'active' jobs. Lists with long active lists are not default.
Set logfile location, parameter file and job state log file on job creation (or override a method).
Allow enabling and disabling of options for resource arbitration for defective job managers.
parseToJob() is not implemented and will return null.
Ignore directory /data/michael/Projekte/RoddyInfra/plugins_2.3/IndelCallingWorkflow_1.0.167/resources/analysisTools/.HiddenAndBad; It is not a valid tools directory.
Ignore directory /data/michael/Projekte/RoddyInfra/plugins_2.3/IndelCallingWorkflow_1.0.167/resources/analysisTools/.HiddenAndyMaybeBad; It is not a valid tools directory.
Ignore directory /data/michael/Projekte/Roddy/dist/plugins/PluginBase/resources/analysisTools/.keep; It is not a valid tools directory.
Ignore directory /data/michael/Projekte/RoddyInfra/Roddy/dist/plugins/PluginBase/resources/analysisTools/.keep; It is not a valid tools directory.
The plugin SophiaWorkflow:current could not be found, are the plugin paths properly set?
Could not build the plugin queue for: 
SophiaWorkflow
The plugin :current could not be found, are the plugin paths properly set?
Could not build the plugin queue for: 

Unrecoverable errors, could not load plugins for analysis sophia
Could not load analysis coWorkflowsTestProject@sophia

Print filtering criteria when doing `listworkflows` and `listdatasets`

Use case: A complex command with many options fails. Because many options are used (iodir, confdirs), possibly with absolute paths, the user edits the previous command line and forgets to remove the workflow identifier and/or PID. These are then used to filter the output of listworkflows and listdatasets, but the user is not told this explicitly.

Solution: Print filtering criteria during listworkflows and listdatasets.
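
One possible shape for such a notice, as a sketch only. The class `FilterNotice`, its parameters, and the message wording are hypothetical and do not reflect the existing CLI code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical formatter for the filter notice printed before list output. */
final class FilterNotice {
    static String describe(String mode, String analysisFilter, List<String> datasetFilters) {
        List<String> parts = new ArrayList<>();
        if (analysisFilter != null && !analysisFilter.isEmpty()) {
            parts.add("analysis identifier '" + analysisFilter + "'");
        }
        if (datasetFilters != null && !datasetFilters.isEmpty()) {
            parts.add("dataset identifiers " + datasetFilters);
        }
        if (parts.isEmpty()) {
            return mode + ": no filter applied, showing all entries.";
        }
        return mode + ": output is filtered by " + String.join(" and ", parts) + ".";
    }
}
```

Printing this line first makes a leftover workflow identifier or PID on the command line visible immediately.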

OnScriptParameter filename patterns do not work

Some time ago I noticed that the onScriptParameter filename-patterns were broken in the AlignmentAndQCWorkflowsPlugin:

       <filename class="TextFile" onScriptParameter="annotateCovWindows:FILENAME_COV_WINDOWS_ANNO" 
                 pattern='${outputAnalysisBaseDirectory}/qualitycontrol/merged/${gccorOutputDirectory}/${sample}_${pid}.readCoverage_${cvalue,name="WINDOW_SIZE",default="1"}kb_windows.anno.txt.gz'/> 
       <filename class="TextFile" onScriptParameter="annotateCovWindows:FILENAME_SEX" 
                 pattern='${outputAnalysisBaseDirectory}/qualitycontrol/merged/${gccorOutputDirectory}/${sample}_${pid}_sex.txt'/> 

       <filename class="TextFile" onScriptParameter="mergeAndFilterCovWindows:FILENAME_COV_WINDOWS_WG" 
                 pattern='${outputAnalysisBaseDirectory}/qualitycontrol/merged/${gccorOutputDirectory}/${sample}_${pid}_readCoverage_10kb_windows.filtered.txt.gz'/> 

       <filename class="TextFile" onScriptParameter="correctGc:FILENAME_GC_CORRECTED_WINDOWS" 
                 pattern='${outputAnalysisBaseDirectory}/qualitycontrol/merged/${gccorOutputDirectory}/${sample}_${pid}_readCoverage_10kb_windows.filtered.corrected.txt.gz'/> 
       <filename class="TextFile" onScriptParameter="correctGc:FILENAME_QC_GC_CORRECTION_JSON" 
                 pattern='${outputAnalysisBaseDirectory}/qualitycontrol/merged/${gccorPlotOutputDirectory}/${sample}_${pid}_qc_gc_corrected.json'/> 
       <filename class="TextFile" onScriptParameter="correctGc:FILENAME_GC_CORRECTED_QUALITY" 
                 pattern='${outputAnalysisBaseDirectory}/qualitycontrol/merged/${gccorPlotOutputDirectory}/${sample}_${pid}_qc_gc_corrected.tsv'/> 
       <filename class="TextFile" onScriptParameter="correctGc:FILENAME_GC_CORRECT_PLOT" 
                 pattern='${outputAnalysisBaseDirectory}/qualitycontrol/merged/${gccorPlotOutputDirectory}/${sample}_${pid}_gc_corrected.png'/> 

Assert that input file objects passed via parameters in genericMethod have already evaluated filename (patterns)

See the comment with the assert below.

private void assembleJobParameters() {
        // Assemble initial parameters
        parameters[PRM_WORKFLOW_ID] = context.analysis.configuration.getName()
        if (toolName) {
            parameters[PRM_TOOL_ID] = toolName;
            parameters[PRM_TOOLS_DIR] = configuration.getProcessingToolPath(context, toolName).getParent();
        }

        // Assemble additional parameters
        for (Object entry in additionalInput) {
            if (entry instanceof BaseFile)
                // assert(((BaseFile) entry).isEvaluated)
                allInputValues << (BaseFile) entry;
            else if (entry instanceof FileGroup) {
                //Take a group and store all files in that group.
                allInputValues << (FileGroup) entry;
            } else if (entry instanceof Map<String, String>) {
                (entry as Map<String, String>).forEach { String k, String v ->
                    parameters[k] = v
                }
            } else {               // Catch-all, in case one still wants to use a string with '=' to define a parameter (deprecated).
                String[] split = entry.toString().split("=");
                if (split.length != 2)
                    throw new RuntimeException("Not able to convert entry ${entry.toString()} to parameter.")
                parameters[split[0]] = split[1];
            }
        }
    }

Bad plugin folders are not ignored

For example, they can trigger a NumberFormatException.

Bad paths that look like regular Roddy plugin folders can cause problems:

e.g. ABC_1.0-BACKUP will cause problems

Unify and fix the :: splitting code

e.g.
RoddyCore/src/de/dkfz/roddy/client/cliclient/RoddyCLIClient.groovy

        int longestAnalysisID = 0
        int longestAnalysisSrcID = 0
        int longestPluginID = 0
        List<List<String>> splitted = pti.icc.listOfAnalyses.sort().collect {

The detailed analysis descriptor is joined by "::", and the splitting code is duplicated in several places. This is error-prone; a custom value class could help.
Also look for a "disect" / "dissect" method.
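
A minimal sketch of such a value class. The name `AnalysisDescriptor` and the generic, index-based field access are assumptions, since the issue does not list the fields of the descriptor. The point is to keep joining and splitting in one place.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical value class that owns both joining and splitting of "::" descriptors. */
final class AnalysisDescriptor {
    static final String SEPARATOR = "::";

    private final List<String> fields;

    private AnalysisDescriptor(List<String> fields) {
        this.fields = Collections.unmodifiableList(new ArrayList<>(fields));
    }

    /** Splits a joined descriptor; the limit of -1 keeps trailing empty fields. */
    static AnalysisDescriptor parse(String joined) {
        return new AnalysisDescriptor(Arrays.asList(joined.split(Pattern.quote(SEPARATOR), -1)));
    }

    /** Builds a descriptor from fields and rejects fields that contain the separator. */
    static AnalysisDescriptor of(String... fields) {
        for (String field : fields) {
            if (field.contains(SEPARATOR)) {
                throw new IllegalArgumentException("Field must not contain '" + SEPARATOR + "': " + field);
            }
        }
        return new AnalysisDescriptor(Arrays.asList(fields));
    }

    String field(int index) {
        return fields.get(index);
    }

    int size() {
        return fields.size();
    }

    @Override
    public String toString() {
        return String.join(SEPARATOR, fields);
    }
}
```

Callers would then pass `AnalysisDescriptor` objects around and never call `split("::")` themselves.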

Rollback seems to be requested for test jobs?

I get the following output when a rollback is initiated:

$ /basePath/RoddyProject_Roddy2.4/Roddy/roddy.sh rerun testCoBaseConfigs-Roddy-2-4.Picard.SoftwareBwa.WGS@alignment testpid --useconfig=/basePath//RoddyProject_Roddy2.4/testConfigs/applicationProperties-analysis-local-lsf.ini --usefeaturetoggleconfig=/basePath//RoddyProject_Roddy2.4/testConfigs/featureToggles.ini --configurationDirectories=/basePath//RoddyProject_Roddy2.4/configs,/basePath//RoddyProject_Roddy2.4/testConfigs/AlignmentAndQCWorkflows/,/basePath//RoddyProject_Roddy2.4/testConfigs/AlignmentAndQCWorkflows//resources --usePluginVersion=AlignmentAndQCWorkflows:current --useiodir=/basePath/testData-small//view-by-pid,/basePath/tests/AlignmentAndQCWorkflows-OTPConfig-1.5-Roddy-2.4/testCoBaseConfigs-Roddy-2-4.Picard.SoftwareBwa.WGS --cvalues=INDEX_PREFIX:/icgc/dkfzlsdf/dmg/otp/production/processing/reference_genomes/bwa06_1KGRef/hs37d5.fa,CHROM_SIZES_FILE:/icgc/dkfzlsdf/dmg/otp/production/processing/reference_genomes/bwa06_1KGRef/stats/hs37d5.fa.chrLenOnlyACGT_realChromosomes.tab,usedResourcesSize:xs,runFingerprinting:true,outputAllowAccessRightsModification:false,runACEseqQc:true,cnv_min_coverage_ALN:1,mapping_quality_ALN:0,min_windows_ALN:1,runFastQC:true,RODDY_SCRATCH:/local/${USER}/${RODDY_JOBID},workflowEnvironmentScript:workflowEnvironment_tbiLsf --useRoddyVersion=current --additionalImports=wgs-standard-lsf-fpga-0.7.8

The plugin AlignmentAndQCWorkflows [ Version: current ] was loaded (/basePath/RoddyProject_Roddy2.4/plugins_R2.4/AlignmentAndQCWorkflows).
The plugin COWorkflowsBasePlugin [ Version: current ] was loaded (/basePath/RoddyProject_Roddy2.4/plugins_R2.4/COWorkflowsBasePlugin).
The plugin PluginBase [ Version: current ] was loaded (/basePath/RoddyProject_Roddy2.4/plugins_R2.4/PluginBase).
The plugin DefaultPlugin [ Version: current ] was loaded (/basePath/RoddyProject_Roddy2.4/plugins_R2.4/DefaultPlugin).
  Fully load configurationFile /basePath/RoddyProject_Roddy2.4/testConfigs/AlignmentAndQCWorkflows/testCoBaseConfigs-roddy-2.4.xml
  Fully load configurationFile /basePath/RoddyProject_Roddy2.4/testConfigs/AlignmentAndQCWorkflows/testCoBaseConfigs-roddy-2.4.xml
  Fully load configurationFile /basePath/RoddyProject_Roddy2.4/testConfigs/AlignmentAndQCWorkflows/testCoBaseConfigs-roddy-2.4.xml
  Fully load configurationFile /basePath/RoddyProject_Roddy2.4/testConfigs/AlignmentAndQCWorkflows/testCoBaseConfigs-roddy-2.4.xml
  Fully load configurationFile /basePath/RoddyProject_Roddy2.4/plugins_R2.4/DefaultPlugin/resources/configurationFiles/default.xml
  Fully load configurationFile /basePath/RoddyProject_Roddy2.4/configs/coBaseProject.xml
  Fully load configurationFile /basePath/RoddyProject_Roddy2.4/plugins_R2.4/DefaultPlugin/resources/configurationFiles/default.xml
  Fully load configurationFile /basePath/RoddyProject_Roddy2.4/configs/cofilenames.xml
  Fully load configurationFile /basePath/RoddyProject_Roddy2.4/configs/coApplicationsAndReferenceFiles.xml
  Fully load configurationFile /basePath/RoddyProject_Roddy2.4/plugins_R2.4/AlignmentAndQCWorkflows/resources/configurationFiles/analysisQC.xml
  Fully load configurationFile /basePath/RoddyProject_Roddy2.4/plugins_R2.4/COWorkflowsBasePlugin/resources/configurationFiles/commonCOWorkflowsSettings.xml
  Fully load configurationFile /basePath/RoddyProject_Roddy2.4/testConfigs/AlignmentAndQCWorkflows/resources/wgs-standard-lsf-fpga-0.7.8.xml
Found 3 datasets in the in- and output directories.
Found 1 samples for dataset testpid
Searching for lane files in directory /basePath/testData-small/view-by-pid/testpid/tumor/paired
Processed sample tumor and found 2 groups of lane files.
Tried to call a non generic tool via the generic call method
A workflow error occurred, try to rollback / abort submitted jobs.
bkill 0x88FBF8B0F98A8 0x88FBF8C21B1B5 0x88FBF91065209 0x88FBF91A4F17F 0x88FBF9290F166 0x88FBF9464C3D2 0x88FBFAEE44660 0x88FBFB0F56E55 0x88FBFB269C380 0x88FBFB4D6C618
An unknown / unhandled exception occurred: 'Not able to context tool coveragePlotSingle'
de.dkfz.roddy.knowledge.methods.GenericMethod._callGenericToolOrToolArray(GenericMethod.groovy:232)
de.dkfz.roddy.knowledge.methods.GenericMethod.callGenericTool(GenericMethod.groovy:50)
de.dkfz.b080.co.files.CoverageTextFile.plot(CoverageTextFile.java:49)
de.dkfz.b080.co.files.CoverageTextFileGroup.plot(CoverageTextFileGroup.java:47)
de.dkfz.b080.co.qcworkflow.QCPipeline.execute(QCPipeline.groovy:94)
de.dkfz.roddy.core.ExecutionContext.execute(ExecutionContext.groovy:625)
de.dkfz.roddy.core.Analysis.executeRun(Analysis.java:397)
de.dkfz.roddy.core.Analysis.run(Analysis.java:206)
de.dkfz.roddy.client.cliclient.RoddyCLIClient.rerun(RoddyCLIClient.groovy:493)
de.dkfz.roddy.client.cliclient.RoddyCLIClient.parseStartupMode(RoddyCLIClient.groovy:114)
de.dkfz.roddy.Roddy.parseRoddyStartupModeAndRun(Roddy.java:688)
de.dkfz.roddy.Roddy.startup(Roddy.java:282)
de.dkfz.roddy.Roddy.main(Roddy.java:214)
Found 1 samples for dataset testpid
Processed sample tumor and found 2 groups of lane files.
Creating the following execution directory to store information about this process:
        /basePath/tests/AlignmentAndQCWorkflows-OTPConfig-1.5-Roddy-2.4/testCoBaseConfigs-Roddy-2-4.Picard.SoftwareBwa.WGS/testpid/roddyExecutionStore/exec_171019_121111722_kensche_alignment
Tried to call a non generic tool via the generic call method
A workflow error occurred, try to rollback / abort submitted jobs.
bkill 0x88FC0435A70E4 0x88FC045E1E94B 0x88FC0523BB7F4 0x88FC055016FF0 0x88FC057534853 0x88FC066054A97 8276 8277 8278 8279
An unknown / unhandled exception occurred: 'Not able to context tool coveragePlotSingle'
de.dkfz.roddy.knowledge.methods.GenericMethod._callGenericToolOrToolArray(GenericMethod.groovy:232)
de.dkfz.roddy.knowledge.methods.GenericMethod.callGenericTool(GenericMethod.groovy:50)
de.dkfz.b080.co.files.CoverageTextFile.plot(CoverageTextFile.java:49)
de.dkfz.b080.co.files.CoverageTextFileGroup.plot(CoverageTextFileGroup.java:47)
de.dkfz.b080.co.qcworkflow.QCPipeline.execute(QCPipeline.groovy:94)
de.dkfz.roddy.core.ExecutionContext.execute(ExecutionContext.groovy:625)
de.dkfz.roddy.core.Analysis.executeRun(Analysis.java:397)
de.dkfz.roddy.core.Analysis.executeRun(Analysis.java:341)
de.dkfz.roddy.core.Analysis.rerun(Analysis.java:229)
de.dkfz.roddy.client.cliclient.RoddyCLIClient.rerun(RoddyCLIClient.groovy:494)
de.dkfz.roddy.client.cliclient.RoddyCLIClient.parseStartupMode(RoddyCLIClient.groovy:114)
de.dkfz.roddy.Roddy.parseRoddyStartupModeAndRun(Roddy.java:688)
de.dkfz.roddy.Roddy.startup(Roddy.java:282)
de.dkfz.roddy.Roddy.main(Roddy.java:214)

There were errors for the execution context for dataset testpid
        * An uncaught error occurred during a run. SEVERE: An uncaught error occurred during a run.

bkill is apparently called on jobs that don't have a valid identifier.
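
One way to guard against this is to filter the identifiers before the kill command is built. The sketch below assumes that real LSF job IDs are positive decimal integers and that the `0x…` values in the log are placeholder identifiers. The class `KillCommandBuilder` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Pattern;

/** Hypothetical helper that only passes plausible job IDs to bkill. */
final class KillCommandBuilder {
    // Assumption: a valid job ID is a positive decimal integer.
    private static final Pattern NUMERIC_ID = Pattern.compile("[1-9]\\d*");

    static List<String> validJobIds(List<String> ids) {
        List<String> valid = new ArrayList<>();
        for (String id : ids) {
            if (id != null && NUMERIC_ID.matcher(id).matches()) {
                valid.add(id);
            }
        }
        return valid;
    }

    /** Returns the bkill command line, or an empty Optional when there is nothing to kill. */
    static Optional<String> buildBkill(List<String> ids) {
        List<String> valid = validJobIds(ids);
        if (valid.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of("bkill " + String.join(" ", valid));
    }
}
```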

Unify and simplify job parameter conversion code

Currently, the code is spread over several places:

  • GenericMethod
  • BashConverter
  • Job
  • Configuration values

Suggestion:

  • Move Bash-specific code into BashConverter.
  • Provide one simplified ParameterConversion class that can also be used when developers construct Jobs manually, without the GenericMethod class.
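
A sketch of what a standalone conversion class could look like, modelled on the Map and "key=value" branches of assembleJobParameters() shown earlier. The class below is hypothetical. Unlike the original code, it splits at the first '=' only, so values may themselves contain '='.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical single entry point for converting loose inputs into job parameters. */
final class ParameterConversion {
    static Map<String, String> toParameters(Iterable<?> entries) {
        Map<String, String> parameters = new LinkedHashMap<>();
        for (Object entry : entries) {
            if (entry instanceof Map) {
                // Maps are copied entry by entry, converting keys and values to strings.
                for (Map.Entry<?, ?> e : ((Map<?, ?>) entry).entrySet()) {
                    parameters.put(String.valueOf(e.getKey()), String.valueOf(e.getValue()));
                }
            } else {
                // Deprecated catch-all: a string of the form key=value.
                String text = String.valueOf(entry);
                int separator = text.indexOf('=');
                if (separator <= 0) {
                    throw new IllegalArgumentException("Not able to convert entry " + text + " to parameter.");
                }
                parameters.put(text.substring(0, separator), text.substring(separator + 1));
            }
        }
        return parameters;
    }
}
```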

Task overview for 4.0

  • Clean up Roddy Jobs (relating to BatchEuphoria)
  • Clean up @deprecated in ToolLib
  • Consider all TODOs in the code and make issues or drop them
  • Rework the Roddy instance model
  • Rework ExecutionService and FileSystemAccessProvider

Compatibility by revision does not work for dependent plugins

When a dependent plugin is checked and its version is not a "-0" revision, it is not loaded.

After the requested plugin has been processed, the loop in de.dkfz.roddy.plugins.LibrariesFactory#buildupPluginQueue tries the dependent plugin. If the dependent plugin does not have revision number 0, the following code skips the plugin and eventually removes it from the chain.

            if (!mapOfPlugins.checkExistence(id as String, version as String)) {
                if (id) { // Skip empty entries and reduce one message.
                    mapOfErrorsForPluginEntries.get(id, []) << ("The plugin ${id}:${version} could not be found, are the plugin paths properly set?").toString();
                }
            }
            pluginsToCheck.remove(0);

It is then possible that only the default and base plugins remain, because they are added separately. The problem shows up as errors about missing dependencies, because the dependent plugin is not loaded.

For a fix I suggest using the Version and VersionIO classes in RoddyToolLibs. Version has a compatibleTo(Version other, Collection<tuple2<Version,Version>> compatibleRanges) method that could take a list of compatible ranges for the plugin and check whether this and the other Version are within any of the compatibleRanges, taking into account the revision numbers, which are always compatible (if used correctly).
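
To illustrate the idea, here is a simplified stand-in, not the actual Version class from RoddyToolLibs. `SketchVersion` is hypothetical, ranges are modelled as two-element arrays instead of tuples, and range bounds are assumed to be inclusive. Revisions are ignored in all comparisons, following the statement that they are always compatible.

```java
import java.util.Collection;

/** Hypothetical, simplified version type used only to illustrate range-based compatibility. */
final class SketchVersion {
    final int major;
    final int minor;
    final int patch;
    final int revision;

    SketchVersion(int major, int minor, int patch, int revision) {
        this.major = major;
        this.minor = minor;
        this.patch = patch;
        this.revision = revision;
    }

    /** Compares major.minor.patch only; the revision is deliberately ignored. */
    private static int compareBase(SketchVersion a, SketchVersion b) {
        if (a.major != b.major) return Integer.compare(a.major, b.major);
        if (a.minor != b.minor) return Integer.compare(a.minor, b.minor);
        return Integer.compare(a.patch, b.patch);
    }

    /** Assumption: each range is {lowerBound, upperBound}, both inclusive. */
    private static boolean inRange(SketchVersion v, SketchVersion[] range) {
        return compareBase(range[0], v) <= 0 && compareBase(v, range[1]) <= 0;
    }

    boolean compatibleTo(SketchVersion other, Collection<SketchVersion[]> compatibleRanges) {
        if (compareBase(this, other) == 0) {
            return true; // The versions differ at most in their revision.
        }
        for (SketchVersion[] range : compatibleRanges) {
            if (inRange(this, range) && inRange(other, range)) {
                return true;
            }
        }
        return false;
    }
}
```

Under these assumptions, 1.0.175-4 is compatible with 1.0.175-0 without any declared range, while 1.0.175 and 1.0.176 are compatible only if a declared range covers both.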

Pass cvalues set / used at workflow execution to jobs via parameter file?

This might be worth considering, as we have already had several cases in which developers assumed that values are available once they have added them to the config within the execute() method. We could diff the current config / workflow config against the static one and write the differences to the parameter file.
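
The diff step could be as small as the following sketch, assuming both configurations can be flattened to string maps. The class `ConfigDiff` and the map representation are assumptions for illustration only.

```java
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

/** Hypothetical helper that finds values added or changed at workflow execution time. */
final class ConfigDiff {
    static Map<String, String> changedOrAdded(Map<String, String> staticValues,
                                              Map<String, String> runtimeValues) {
        Map<String, String> diff = new TreeMap<>();
        for (Map.Entry<String, String> e : runtimeValues.entrySet()) {
            boolean isNew = !staticValues.containsKey(e.getKey());
            boolean isChanged = !Objects.equals(staticValues.get(e.getKey()), e.getValue());
            if (isNew || isChanged) {
                diff.put(e.getKey(), e.getValue());
            }
        }
        return diff;
    }
}
```

Only the returned entries would need to be appended to the parameter file. Values removed at runtime are not reported by this sketch.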

Introduce a default feature toggle file.

Defaults will stay in the Java code, but it would be nice to have a file in the application directory that is used by default in multi-user environments.
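
A sketch of the resulting lookup order (code defaults, then the application-wide file, then the user's file), assuming the toggle files contain plain key=value lines that java.util.Properties can read. The class `FeatureToggleLayers` is hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical layering of feature toggles: code defaults < application file < user file. */
final class FeatureToggleLayers {
    static Properties layer(Properties codeDefaults, Reader applicationFile, Reader userFile) {
        try {
            // Each layer falls back to the one below it for keys it does not define.
            Properties application = new Properties(codeDefaults);
            if (applicationFile != null) {
                application.load(applicationFile);
            }
            Properties user = new Properties(application);
            if (userFile != null) {
                user.load(userFile);
            }
            return user;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Lookups must go through `getProperty`, since only that method consults the fallback layers.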

Lots of output in `listworkflows`

Apparently, the output is generated from a catch-clause in de.dkfz.roddy.config.loader.ConfigurationFactory#readFilenamePatterns, line 474.

An exception is thrown from readOnMethodFilenamePattern, originating in the GroovyClassLoader.
de.dkfz.roddy.config.loader.ConfigurationFactory#readOnMethodFilenamePattern line 555
calledClass = (Class<FileObject>) LibrariesFactory.getInstance().loadClass(calledClassName)

Investigate: Allow setting fast-track processing

In PBS this was implemented via accountName and -A option (might be specific for our cluster though).

  • How is this implemented in LSF? Via queue(s)?
  • How can we implement this? ResourceSetSize?
  • Note that queue and priority are independent concepts. If there are convey and convey_fast queues as in the PBS cluster, this may mix concepts.

Rename pid/PID into dataset in Roddy

Also consider that these identifiers are used as parameters to jobs. Maybe there should always be a dataset and DATASET identifier, but COWorkflowsBasePlugin copies their values into pid/PID.
