crbs / chmutil



Utilities to run Cascaded Hierarchical Model (CHM) jobs on a cluster of computers

License: Other

Makefile 0.65% Python 99.35%

chmutil's Issues

createchmjob.py outputs Pillow image size warning on large images

When using large images (> 10k x 10k pixels), Pillow outputs a warning about a possible DOS (decompression bomb) attack. This page says there is a way to silence this warning:

zimeon/iiif#11
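
Here is a minimal sketch of one way to do this. It relies on Pillow's module-level `Image.MAX_IMAGE_PIXELS` limit and the `Image.DecompressionBombWarning` class. The helper name `allow_large_images` is hypothetical.

Note that newer Pillow versions raise an error rather than warn once an image exceeds twice the limit. For that reason the sketch raises or disables the limit instead of only filtering the warning.

```python
import warnings

from PIL import Image


def allow_large_images(max_pixels=None):
    """Relax Pillow's decompression bomb guard for trusted, very large images.

    max_pixels=None disables the size check entirely; an integer sets a new
    limit (Pillow's default is roughly 89 million pixels).
    """
    Image.MAX_IMAGE_PIXELS = max_pixels
    # Images between the limit and twice the limit only trigger a warning;
    # silence it so it does not clutter createchmjob.py output.
    warnings.simplefilter("ignore", Image.DecompressionBombWarning)
```

It would be called once at startup, for example `allow_large_images(40000 * 40000)`, before any `Image.open` call.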

Modify createchmjob.py to allow caller to restrict what parts of images are segmented

It would be nice to be able to pass a mask image file, or a stack of them, to createchmjob.py via, say, a --mask flag whose value would be a directory or a single image file. The mask image would indicate which parts of the image should be segmented.
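
A minimal sketch of how a mask could be used to skip tiles. It assumes:

- the mask has already been loaded (for example with Pillow or NumPy) as a 2D array with the same dimensions as the input image;
- any nonzero pixel marks a region to segment.

The function names are hypothetical and not part of createchmjob.py.

```python
def tile_has_mask_pixels(mask, x, y, width, height):
    """Return True if any nonzero mask pixel falls inside the tile.

    mask is a 2D sequence indexed as mask[row][column]; nonzero means
    "segment this region".
    """
    for row in mask[y:y + height]:
        if any(row[x:x + width]):
            return True
    return False


def select_tiles(mask, tile_width, tile_height):
    """Return the (x, y) origins of tiles that overlap the mask."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    return [(x, y)
            for y in range(0, rows, tile_height)
            for x in range(0, cols, tile_width)
            if tile_has_mask_pixels(mask, x, y, tile_width, tile_height)]
```

Tiles that are not returned by `select_tiles` would simply not be added to any task.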

Modify createchmjob.py to run pychm jobs

Modify createchmjob.py to also run pychm jobs.

Modify checkchmjob.py to update contents of runjobs.gordon with task array range

It appears Gordon uses a custom job filter for commands passed to qsub, and this filter does NOT allow the -t #-# flag to be set as a command line argument. The -t #-# flag marks the job as an array job, with #-# giving the starting and ending job ids. To work around this, checkchmjob.py --submit needs to put the -t line into the runjobs.gordon script file and update it if the command is rerun.

Example line:
#PBS -t 1-3432

This means that the user just needs to run qsub runjobs.gordon to submit jobs.
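
A minimal sketch of the update step, assuming checkchmjob.py holds the contents of runjobs.gordon as a string. `set_pbs_array_range` is a hypothetical helper.

- If a `#PBS -t` line already exists, it is replaced.
- Otherwise a new line is inserted after the shebang and any other `#PBS` directives, because PBS stops reading directives at the first executable line.

```python
import re

# Matches an existing array directive such as "#PBS -t 1-3432"
_PBS_ARRAY_RE = re.compile(r"^#PBS -t \S+$", re.MULTILINE)


def set_pbs_array_range(script_text, start, end):
    """Return script_text with its #PBS -t line set to start-end."""
    new_line = "#PBS -t %d-%d" % (start, end)
    if _PBS_ARRAY_RE.search(script_text):
        # Rerun case: update the range in place
        return _PBS_ARRAY_RE.sub(new_line, script_text, count=1)
    lines = script_text.split("\n")
    # Skip the shebang, then any existing #PBS directives
    idx = 1 if lines and lines[0].startswith("#!") else 0
    while idx < len(lines) and lines[idx].startswith("#PBS"):
        idx += 1
    lines.insert(idx, new_line)
    return "\n".join(lines)
```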

add --rawthreshold to createprobmapoverlay.py

Add a new argument, --rawthreshold, that lets the caller specify a pixel intensity as the threshold cutoff. This is needed to overlay training data, which has an intensity of 1.
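
A minimal sketch of the thresholding step. It assumes the cutoff is inclusive (>=) and that the image is available as rows of integer pixel values. The helper name is hypothetical. With Pillow the same mapping could be expressed with `Image.point`.

```python
def apply_raw_threshold(rows, rawthreshold):
    """Build an overlay mask from raw pixel intensities.

    Pixels whose value is >= rawthreshold become 255 (shown in the overlay),
    all others become 0. With rawthreshold=1, every labeled pixel in
    training data (intensity 1) is kept.
    """
    return [[255 if value >= rawthreshold else 0 for value in row]
            for row in rows]
```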

add virtual_free flag to createchmtrainjob.py rocce submit file

Modify SGEScheduler in image.py to add a virtual_free=XXG entry to the requirements. This is needed because h_vmem alone does not seem to put jobs on machines with enough memory.
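
SGEScheduler's actual interface is not shown in the issue, so this is a standalone sketch with a hypothetical name. It assumes the requirements are kept as a comma-separated SGE `-l` string. Reserving the same amount for virtual_free as for h_vmem is also an assumption.

```python
def add_virtual_free(requirements):
    """Add virtual_free to an SGE -l resource string, mirroring h_vmem.

    e.g. "h_vmem=12G,h_rt=24:00:00" -> "h_vmem=12G,h_rt=24:00:00,virtual_free=12G".
    The string is returned unchanged if virtual_free is already present or
    h_vmem is not set.
    """
    parts = [p.strip() for p in requirements.split(",") if p.strip()]
    values = dict(p.split("=", 1) for p in parts if "=" in p)
    if "h_vmem" in values and "virtual_free" not in values:
        parts.append("virtual_free=" + values["h_vmem"])
    return ",".join(parts)
```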

Add more information when --detailed is passed to checkchmjob.py

Output information about the input dataset:

          Number input images: 1,234 (145 Gigabytes)
          Dimensions of images: 24,000 x 32,000
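
A minimal sketch of gathering and formatting these two lines. It makes two assumptions:

- every regular, non-hidden file in the input directory is an image;
- "Gigabytes" means 1024**3 bytes.

The image dimensions would come from opening one image. Both function names are hypothetical.

```python
import os


def collect_input_stats(image_dir):
    """Return (image_count, total_bytes) for regular, non-hidden files."""
    count = 0
    total = 0
    for entry in os.scandir(image_dir):
        # Skip hidden files such as .DS_Store and anything that is not a file
        if entry.name.startswith(".") or not entry.is_file():
            continue
        count += 1
        total += entry.stat().st_size
    return count, total


def format_input_stats(count, total_bytes, width, height):
    """Format the --detailed summary lines for the input dataset."""
    return [
        "Number input images: {:,} ({:,.0f} Gigabytes)".format(
            count, total_bytes / 1024.0 ** 3),
        "Dimensions of images: {:,} x {:,}".format(width, height),
    ]
```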

Add --detailed option to checkchmjob.py

Add a --detailed option that adds the following information to the output:

          CHM tasks: 4% complete (960 of 23,456 completed)
          CHM task runtime: 3.2 hours per task (6gb ram per job)
          CHM task CPU consumption so far: 3,069 CPU hours (~0.35 CPU years)
          CHM task estimated remaining compute: 75,059 CPU hours (~8.69 CPU years)
          

          Merge tasks: 0% complete (0 of 1,234 completed)
          Merge task runtime: NA
          Merge task CPU consumption so far: NA
          Merge task estimated remaining compute: NA
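
A minimal sketch of how these lines could be computed. It assumes:

- the per-task runtimes (in hours) have already been parsed from the task stdout files;
- each task uses one CPU;
- a CPU year is 365 days.

The RAM figure is left out, and `summarize_tasks` is a hypothetical name.

```python
HOURS_PER_YEAR = 24 * 365


def summarize_tasks(label, completed, total, task_hours):
    """Build --detailed summary lines for one kind of task.

    task_hours holds the runtime in hours of each completed task; CPU hours
    here assume one CPU per task.
    """
    pct = int(100.0 * completed / total) if total else 0
    lines = ["{} tasks: {}% complete ({:,} of {:,} completed)".format(
        label, pct, completed, total)]
    if not task_hours:
        # Nothing has finished yet, so no averages can be computed
        for item in ("runtime", "CPU consumption so far",
                     "estimated remaining compute"):
            lines.append("{} task {}: NA".format(label, item))
        return lines
    used = float(sum(task_hours))
    average = used / len(task_hours)
    remaining = average * (total - completed)
    lines.append("{} task runtime: {:.1f} hours per task".format(label, average))
    lines.append("{} task CPU consumption so far: {:,.0f} CPU hours "
                 "(~{:.2f} CPU years)".format(label, used, used / HOURS_PER_YEAR))
    lines.append("{} task estimated remaining compute: {:,.0f} CPU hours "
                 "(~{:.2f} CPU years)".format(label, remaining,
                                             remaining / HOURS_PER_YEAR))
    return lines
```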

create tool to generate jobs to create probability map overlay images

This tool should be named createprobmapoverlayjob.py and it should be called like so:

createprobmapoverlayjob.py (options of probability map overlay creation)

The above script should take a completed CHM job and use information in that job to generate a new set of tasks to run on a cluster. These tasks (1 per probability map) should take the probability map, filter it based on options set by the caller, and then overlay it onto the original raw image with a color and opacity defined by the user.
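
A minimal per-pixel sketch of what one overlay task could do. It assumes:

- 8-bit grayscale raw and probability images of equal size;
- a single threshold as the only filter;
- simple alpha blending.

The function names are hypothetical. On whole images, Pillow's `Image.composite` with a thresholded mask would be a faster way to do the same thing.

```python
def overlay_pixel(raw, prob, threshold, color, opacity):
    """Blend one grayscale raw pixel with an overlay color.

    raw and prob are 0-255 intensities; color is an (r, g, b) tuple and
    opacity is between 0.0 (invisible) and 1.0 (opaque). Probability values
    below threshold leave the raw pixel unchanged (as gray RGB).
    """
    if prob < threshold:
        return (raw, raw, raw)
    # Round to the nearest integer intensity
    return tuple(int(raw * (1.0 - opacity) + c * opacity + 0.5) for c in color)


def overlay_image(raw_rows, prob_rows, threshold, color, opacity):
    """Apply overlay_pixel to every pixel of two equally sized 2D images."""
    return [[overlay_pixel(r, p, threshold, color, opacity)
             for r, p in zip(raw_row, prob_row)]
            for raw_row, prob_row in zip(raw_rows, prob_rows)]
```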

If a job fails look into altering job submission to exclude that node

On the Gordon cluster there are a few nodes that do not have the singularity module. This causes the job to fail very quickly, which is fine. The problem is that the node then becomes available again and proceeds to eat other jobs. To remedy this, it would be nice if chmutil could catch this failure and add the node to the job's exclude list so subsequent jobs do NOT get assigned to it:

Command run in runjobs.gordon:
module load singularity/2.1.2

Error message (not sure if module load returns an exit code or not)
ModuleCmd_Load.c(204):ERROR:105: Unable to locate a modulefile for
'singularity/2.1.2'

List of nodes that had problems on a run over 1-14-2017:

gcn-13-77
gcn-4-28
gcn-7-75
gcn-8-22
gcn-8-74
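
A minimal sketch of the detection half of this. It assumes each task's stdout file records a `HOST: <name>` line, as the runjobs.local script proposed in another issue does. It also assumes the module error text above appears in that same file.

How the resulting list is turned into an exclusion depends on the scheduler, so that part is left out. `find_failed_nodes` is a hypothetical name.

```python
import re

MODULE_ERROR = "Unable to locate a modulefile for"


def find_failed_nodes(task_outputs):
    """Return sorted host names whose task output shows the module error.

    task_outputs is an iterable of stdout file contents, each containing a
    'HOST: <name>' line.
    """
    bad = set()
    for text in task_outputs:
        host = re.search(r"^HOST: (\S+)", text, re.MULTILINE)
        if host and MODULE_ERROR in text:
            bad.add(host.group(1))
    return sorted(bad)
```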

Look into adding LocalCluster to run jobs on local computer

Perhaps this class should generate the following script named runjobs.local:

#!/bin/sh

if [ $# -ne 2 ] ; then
  echo "$0 <start task id> <end task id>"
  echo ""
  echo "Runs sequence of CHM tasks"
  echo ""
  echo "Ex: $0 1 50"
  exit 1
fi

start=$1
end=$2

for Y in $(seq "$start" "$end") ; do
  outfile="/fakechmjob/gordon2/chmrun/stdout/${Y}.out"
  echo "HOST: $(hostname)" > "$outfile"
  echo "DATE: $(date)" >> "$outfile"
  echo "TASKID: $Y" >> "$outfile"
  /usr/bin/time -p /usr/bin/chmrunner.py "$Y" /fakechmjob/gordon2 --scratchdir /fakechmjob/gordon2/chmrun/tmp --log DEBUG >> "$outfile" 2>&1

  exitcode=$?
  echo "chmrunner.py exited with code: $exitcode" >> "$outfile"
done

Look into automatically determining default project for comet submissions

Should look into automatically setting the project for comet jobs.

Add --gentiles to createchmimage.py

To make it easier to create tiles for the probability map viewer, add a --gentiles flag to createchmimage.py that tiles the image in the format needed by the probability map viewer.
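
The probability map viewer's tile format is not described here, so this sketch only shows the tiling geometry. It assumes square, fixed-size tiles in row-major order, with edge tiles clipped to the image. The boxes use the (left, upper, right, lower) order that Pillow's `Image.crop` expects. `tile_boxes` is a hypothetical name.

```python
def tile_boxes(width, height, tile_size):
    """Yield (left, upper, right, lower) boxes covering a width x height image.

    Edge tiles are clipped to the image, so they can be smaller than
    tile_size.
    """
    for upper in range(0, height, tile_size):
        for left in range(0, width, tile_size):
            yield (left, upper,
                   min(left + tile_size, width),
                   min(upper + tile_size, height))
```

Each box could then be passed to `image.crop(box)` and saved under whatever naming scheme the viewer requires.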

modify createchmjob.py to have the 1st couple tasks only analyze 1 tile

createchmjob.py should create the first couple of tasks with only 1 tile each in the argument list. runchmjob.py should then tell the caller to run these tasks first to verify correct operation. This will also give runtime estimates for the entire dataset.
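
A minimal sketch of the task split, assuming tiles are already enumerated in a list. The function name and the default of two single-tile tasks are hypothetical.

```python
def build_task_tile_lists(tiles, tiles_per_task, single_tile_tasks=2):
    """Split tiles into per-task lists.

    The first single_tile_tasks tasks get exactly one tile each so they
    finish quickly and can be run first as a sanity check and runtime
    estimate; the remaining tiles are grouped tiles_per_task at a time.
    """
    head = [[tile] for tile in tiles[:single_tile_tasks]]
    rest = tiles[single_tile_tasks:]
    tail = [rest[i:i + tiles_per_task]
            for i in range(0, len(rest), tiles_per_task)]
    return head + tail
```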

Fix incorrect script names in readme.txt

checkjobstatus.py needs to be replaced with checkchmjob.py

Add ability to overlay additional probability maps in createprobmapoverlay.py

Add an --addprobmap option that lets the caller overlay additional probability maps, each with its own color and settings.
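
A minimal argparse sketch of a repeatable option. It assumes each additional map is given as a path, a color and an opacity. That three-value format and the helper names are hypothetical, not the tool's actual interface.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="createprobmapoverlay.py")
    # action="append" lets the flag be repeated, one entry per extra map
    parser.add_argument("--addprobmap", nargs=3, action="append",
                        metavar=("PROBMAP", "COLOR", "OPACITY"),
                        help="Overlay an additional probability map with "
                             "its own color and opacity (may be repeated)")
    return parser


def parse_extra_probmaps(args):
    """Convert --addprobmap values into (path, color, opacity) tuples."""
    return [(path, color, float(opacity))
            for path, color, opacity in (args.addprobmap or [])]
```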

createchmjob.py should create readme.txt file for job

This readme file should contain the following information:

-- Descriptions of all files and directories pertaining to the job.

-- The arguments passed to createchmjob.py to create this directory

-- Commands to submit jobs and check status

-- Links to get help
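
A minimal sketch of writing such a readme. The file descriptions and help URL are passed in because they are not specified here. The function name and layout are hypothetical. The submit command mirrors the `checkchmjob.py "<jobdir>" --submit` hint that createchmjob.py prints.

```python
import os
import shlex


def write_job_readme(jobdir, argv, descriptions, help_url):
    """Write readme.txt into jobdir and return its path.

    descriptions maps file/directory names in jobdir to one-line
    descriptions; help_url is wherever the project keeps its help pages.
    """
    lines = ["Files and directories in " + jobdir + ":"]
    # List the directory before readme.txt exists so it is not included
    for name in sorted(os.listdir(jobdir)):
        lines.append("  {} - {}".format(name, descriptions.get(name, "")))
    lines += [
        "",
        "This directory was created with:",
        # shlex.quote keeps arguments containing spaces copy-pasteable
        "  " + " ".join(shlex.quote(arg) for arg in argv),
        "",
        "To submit jobs run:",
        '  checkchmjob.py "{}" --submit'.format(jobdir),
        "To check job status run:",
        '  checkchmjob.py "{}"'.format(jobdir),
        "",
        "For help see: " + help_url,
    ]
    path = os.path.join(jobdir, "readme.txt")
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")
    return path
```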

Modify chmrunner.py to catch USR2 signals

The SGE scheduler will send a USR2 signal a few seconds before killing the job. Depending on whether a flag is set on the command line, chmrunner.py should catch this signal, output any stderr/stdout output, and remove any temp files.
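
A minimal sketch using Python's `signal` module (POSIX only, since SIGUSR2 does not exist on Windows). Keep in mind that SGE only sends this warning signal when the job is submitted with `qsub -notify`. The helper names are hypothetical, and the handler would only be installed when the proposed command-line flag is set.

```python
import os
import shutil
import signal
import sys


def install_usr2_handler(cleanup, exit_code=None):
    """Run cleanup when SGE sends SIGUSR2 ahead of killing the job.

    If exit_code is not None the process exits after cleaning up.
    """
    def _handler(signum, frame):
        # Push out any buffered output before the job is killed
        sys.stdout.flush()
        sys.stderr.flush()
        cleanup()
        if exit_code is not None:
            sys.exit(exit_code)
    signal.signal(signal.SIGUSR2, _handler)


def remove_scratch(scratchdir):
    """Delete the task's temporary directory, ignoring missing files."""
    shutil.rmtree(scratchdir, ignore_errors=True)
```

In chmrunner.py this might look like `install_usr2_handler(lambda: remove_scratch(scratchdir), exit_code=1)`.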

Account value not being saved in configuration files

For the Comet and rocce clusters, the runjobs.CLUSTER and runmerge.CLUSTER files are generated once by createchmjob.py, which sets the account value from the --account flag.

For the Gordon cluster this value is lost, since runjobs.gordon and runmerge.gordon are rewritten when checkchmjob.py --submit is invoked. To remedy this, the account value needs to be stored in a configuration file so it can be loaded into CHMConfig.
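
A minimal sketch of persisting the account with the standard library's `configparser` (the Python 2 module is named `ConfigParser`). The section name and the helper functions are hypothetical. Interpolation is disabled so account names containing `%` are stored verbatim.

```python
import configparser

SECTION = "chmutil"  # hypothetical section name


def save_account(config_path, account):
    """Store the --account value, keeping any other existing settings."""
    config = configparser.ConfigParser(interpolation=None)
    config.read(config_path)  # silently ignores a missing file
    if not config.has_section(SECTION):
        config.add_section(SECTION)
    config.set(SECTION, "account", account)
    with open(config_path, "w") as f:
        config.write(f)


def load_account(config_path, default=None):
    """Return the saved account, or default if none was saved."""
    config = configparser.ConfigParser(interpolation=None)
    config.read(config_path)
    return config.get(SECTION, "account", fallback=default)
```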

mergetiles job failed on rocce because it ran out of memory

A job on rocce failed because it ran out of memory in mergetiles.py:

2017-07-23 17:52:22,722 ERROR (11321) chmutil.mergetiles Caught exception
Traceback (most recent call last):
  File "/home/rdrigo/miniconda2/bin/mergetiles.py", line 95, in main
    theargs.suffix)
  File "/home/rdrigo/miniconda2/bin/mergetiles.py", line 53, in _merge_image_tiles
    merged = sim.merge_images(im_list)
  File "/home/rdrigo/miniconda2/lib/python2.7/site-packages/chmutil/image.py", line 42, in merge_images
    merged = self._merge_two_images(merged, entry)
  File "/home/rdrigo/miniconda2/lib/python2.7/site-packages/chmutil/image.py", line 58, in _merge_two_images
    b=image2)
  File "/home/rdrigo/miniconda2/lib/python2.7/site-packages/PIL/ImageMath.py", line 265, in eval
    out = builtins.eval(expression, args)
  File "<string>", line 1, in <module>
  File "/home/rdrigo/miniconda2/lib/python2.7/site-packages/PIL/ImageMath.py", line 232, in imagemath_max
    return self.apply("max", self, other)
  File "/home/rdrigo/miniconda2/lib/python2.7/site-packages/PIL/ImageMath.py", line 88, in apply
    out = Image.new(mode or im1.mode, im1.size, None)
  File "/home/rdrigo/miniconda2/lib/python2.7/site-packages/PIL/Image.py", line 2154, in new
    return Image()._new(core.new(mode, size))
MemoryError
2017-07-23 17:52:23,456 INFO (11318) chmutil.core Process 11320 exited with code: 2

The job was merging 71 tiles of size 31237x29138 pixels. To remedy this, we need to estimate the memory needed by the merge and add the following to the -l line in the runmerge.rocce configuration file:

h_vmem=XXG,virtual_free=XXG

where XX is the number of gigabytes of RAM needed. This is only needed on rocce, since its queue places multiple users' jobs on a node. Comet should be okay, since a job gets an entire node with 128 gigabytes of RAM.
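
A rough estimation sketch. The defaults are assumptions, not measurements:

- 4 bytes per pixel, because ImageMath works on 32-bit integer images;
- three full-size images alive at once: the running result, the next tile and the output of the max operation.

The function names are hypothetical, and a safety margin can be applied on top.

```python
import math


def estimate_merge_memory_gb(width, height, bytes_per_pixel=4,
                             images_in_memory=3, margin=1.0):
    """Rough whole-GB estimate of RAM needed to merge width x height tiles."""
    needed = width * height * bytes_per_pixel * images_in_memory * margin
    return int(math.ceil(needed / 1024.0 ** 3))


def rocce_memory_requirement(gigabytes):
    """Format the h_vmem/virtual_free pair for the -l line."""
    return "h_vmem={0}G,virtual_free={0}G".format(gigabytes)
```

Under these assumptions the 31237x29138 tiles above need about 11 GB, so the requirement would be `rocce_memory_requirement(estimate_merge_memory_gb(31237, 29138))`.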

Add --account option to createchmjob.py

Add an --account option that lets the user specify the account, which is needed for the Gordon and Comet clusters. This value should be stored in CHMConfig and obtainable via a get method.

createtrainingmrcstack.py failing due to incorrect reference to get_image_path_list function

In version 0.8.0, createtrainingmrcstack.py is failing because it tries to call get_image_path_list from the core module, but that function was moved to the image module.

Comet has new version of singularity

Need to change the singularity module that is loaded to:
singularity/2.3.2

update job readme file to correct checking job status on gordon

Update the qstat example in readme.txt to include the -t flag needed for array jobs:

qstat -t -u "$USER"

add option to skip deletion of scratch directory for createtrainingmrcstack.py

Add a new option, --dontdeletescratch, to createtrainingmrcstack.py that skips deletion of the scratch directory.
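
A minimal argparse sketch of the flag. It assumes cleanup currently removes the whole scratch directory with `shutil.rmtree`. The helper names are hypothetical.

```python
import argparse
import shutil


def add_scratch_option(parser):
    parser.add_argument("--dontdeletescratch", action="store_true",
                        help="Do not delete the scratch directory when done")


def cleanup_scratch(scratchdir, args):
    """Remove scratchdir unless --dontdeletescratch was given."""
    if args.dontdeletescratch:
        return
    shutil.rmtree(scratchdir, ignore_errors=True)
```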

Create new tool createchmtrainjob.py

This script should generate a CHM train job with a design similar to createchmjob.py.

Usage:

createchmtrainjob.py ./images ./labels ./run --stage 2 --level 2 --account foo --walltime 24:00:00

The layout under ./run should be similar to the job directory created by createchmjob.py.
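
A minimal argparse sketch matching the usage line above. The defaults are hypothetical and are simply taken from the example values.

```python
import argparse


def parse_arguments(argv):
    parser = argparse.ArgumentParser(prog="createchmtrainjob.py")
    parser.add_argument("images", help="Directory of training images")
    parser.add_argument("labels", help="Directory of label images")
    parser.add_argument("outdir", help="Directory to write the train job to")
    # Defaults below are hypothetical, copied from the usage example
    parser.add_argument("--stage", type=int, default=2,
                        help="Number of CHM stages")
    parser.add_argument("--level", type=int, default=2,
                        help="Number of CHM levels")
    parser.add_argument("--account", default=None,
                        help="Account to charge (Gordon and Comet)")
    parser.add_argument("--walltime", default="24:00:00",
                        help="Walltime limit for the job")
    return parser.parse_args(argv)
```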

.DS_Store in input images directory causes IOError in createchmjob.py

$ createchmjob.py chmimages model mychm --disablechmhisteq --cluster rocce --chmbin /data/churas/chm_s22.img
2017-07-24 16:46:15,539 ERROR chmutil.image Skipping file unable to open /data/scratch/churastest/chmimages/.DS_Store
Traceback (most recent call last):
  File "/home/churastest/miniconda2/lib/python2.7/site-packages/chmutil/image.py", line 382, in get_input_image_stats
    im = Image.open(fp)
  File "/home/churastest/miniconda2/lib/python2.7/site-packages/PIL/Image.py", line 2519, in open
    % (filename if filename else fp))
IOError: cannot identify image file '/data/scratch/churastest/chmimages/.DS_Store'
2017-07-24 16:46:15,560 ERROR chmutil.image Caught exception attempting to close image
Traceback (most recent call last):
  File "/home/churastest/miniconda2/lib/python2.7/site-packages/chmutil/image.py", line 391, in get_input_image_stats
    im.close()
AttributeError: 'NoneType' object has no attribute 'close'
Run this to submit job
/home/churastest/miniconda2/bin/checkchmjob.py "/data/scratch/churastest/mychm" --submit
[churastest@login-0-0 churastest]$ ls -la chmimages/.DS_Store
-rw-r--r-- 1 churastest churastest 6148 Jul 24 16:40 chmimages/.DS_Store
[churastest@login-0-0 churastest]$ file chmimages/.DS_Store
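
A minimal sketch of the two fixes the log suggests:

- skip hidden files and non-image extensions when listing inputs;
- only call `close()` on an image that was actually opened.

The extension list and the function names are hypothetical. `opener` stands in for something like Pillow's `Image.open`.

```python
import os

IMAGE_EXTENSIONS = (".png", ".tif", ".tiff", ".jpg", ".jpeg")


def list_input_images(image_dir, extensions=IMAGE_EXTENSIONS):
    """Return sorted paths of image files, ignoring hidden files."""
    paths = []
    for name in sorted(os.listdir(image_dir)):
        if name.startswith("."):  # e.g. .DS_Store
            continue
        path = os.path.join(image_dir, name)
        if os.path.isfile(path) and name.lower().endswith(extensions):
            paths.append(path)
    return paths


def get_image_size(path, opener):
    """Open path with opener and return its size, closing it safely."""
    im = None
    try:
        im = opener(path)
        return im.size
    finally:
        # If opener raised, im is still None and must not be closed
        if im is not None:
            im.close()
```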

Setting --cluster flag in createchmjob.py should correctly set best values for cluster

Setting the --cluster flag should update --jobspernode to 1 for rocce and to 11 or 12 for gordon, and set the correct value for comet.
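
A minimal sketch of applying per-cluster defaults only when the user did not pass --jobspernode. The values are hypothetical placeholders:

- gordon uses 12, although the issue says 11 or 12;
- comet is omitted because its value is not given;
- unknown clusters fall back to 1, an arbitrary choice;
- the default cluster "rocce" is also an assumption.

```python
import argparse

# Hypothetical per-cluster defaults; comet's value is not known here
JOBS_PER_NODE_DEFAULTS = {"rocce": 1, "gordon": 12}


def parse_cluster_arguments(argv):
    parser = argparse.ArgumentParser(prog="createchmjob.py")
    parser.add_argument("--cluster", default="rocce")
    parser.add_argument("--jobspernode", type=int, default=None)
    args = parser.parse_args(argv)
    if args.jobspernode is None:
        # Only fill in a cluster default when the user did not set a value
        args.jobspernode = JOBS_PER_NODE_DEFAULTS.get(args.cluster, 1)
    return args
```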

checkchmjob.py --detailed outputting incorrect value for ram

When checking a job with checkchmjob.py --detailed, the memory usage was output as:

CHM tasks: 100% complete (2 of 2 completed)
CHM runtime: 0.5 hours per task (12,846.76GB ram)

Looking at the standard out files, here is the memory usage output in kilobytes:

Maximum resident set size (kbytes): 12846688
Maximum resident set size (kbytes): 12846832

The above output should be 12.85GB of RAM. It looks like checkchmjob.py is outputting megabytes of RAM instead of gigabytes.
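
A minimal sketch of the corrected conversion. It assumes the per-task values are averaged and that GB means 1,000,000 kB, which reproduces the 12.85GB figure above. Dividing the average by only 1,000 gives 12,846.76, the megabyte value that was reported. The function name is hypothetical.

```python
import re

RSS_RE = re.compile(r"Maximum resident set size \(kbytes\): (\d+)")


def average_max_rss_gb(stdout_texts):
    """Average peak RSS across task stdout files, in (decimal) gigabytes.

    Returns None when no memory lines are found.
    """
    values = [int(m.group(1))
              for text in stdout_texts
              for m in RSS_RE.finditer(text)]
    if not values:
        return None
    # kilobytes -> gigabytes is a factor of 1,000,000, not 1,000
    return sum(values) / float(len(values)) / 1000000.0
```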

Add new option to createchmjob.py to specify image type for probability map images

Since IMOD needs TIFF files to create an MRC stack, it would be nice to have createchmjob.py generate probability map images as TIFF files instead of PNG files. The easiest implementation is a new flag, --gentifs, that tells createchmjob.py to generate the merged probability map images as TIFF files.
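
A minimal sketch of the flag's effect on output names. It assumes images are saved with Pillow, which picks the file format from the extension when none is given explicitly. `merged_image_path` is a hypothetical name.

```python
import os


def merged_image_path(output_dir, input_name, gentifs=False):
    """Return the merged probability map path with a .tif or .png suffix."""
    base = os.path.splitext(os.path.basename(input_name))[0]
    return os.path.join(output_dir, base + (".tif" if gentifs else ".png"))
```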
