crbs / cdeep3m
Please go to https://github.com/CRBS/cdeep3m2 for the most recent version
License: Other
Found by damsport11 (see: #46 (comment))
Passing --1fmonly to runtraining.sh kicks out this error message:
$0: unrecognized option '--1fmonly'
The failure is because trainworker.sh was switched to the newer --models flag. We just need to adjust runtraining.sh to pass --models 1fm to trainworker.sh.
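Until that adjustment lands, the fix could be sketched as a small translation loop in runtraining.sh. The flag names come from this issue; the `translate_flags` helper is our own invention, not existing repo code:

```shell
#!/bin/bash
# Sketch: translate the legacy --1fmonly flag into the newer --models
# syntax before runtraining.sh forwards arguments to trainworker.sh.
translate_flags() {
  local out=()
  for arg in "$@" ; do
    if [ "$arg" = "--1fmonly" ] ; then
      out+=(--models 1fm)     # old flag maps onto the new equivalent
    else
      out+=("$arg")           # everything else passes through untouched
    fi
  done
  echo "${out[@]}"
}

translate_flags --1fmonly /some/train_dir   # -> --models 1fm /some/train_dir
```

The real script would then call trainworker.sh with the translated argument list instead of echoing it.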
Would it be possible for Merge_LargeData.m to look for the de_augmentation_info.mat file in the parent directory? Or perhaps it could be given the path to that file as a command-line argument.
$ /home/ubuntu/training_data/predict/run_all_predict.sh /home/ubuntu/100_it_trained_model/ /home/ubuntu/training_data/5stackaug/
Running 1fm predict (1) packages to process
Processing Pkg001_Z01 1 of 1
Non zero exit code from caffe for predict /home/ubuntu/training_data/predict/1fm/Pkg001_Z01 model. Exiting.
Here are the last 10 lines of /home/ubuntu/training_data/predict/1fm/Pkg001_Z01/out.log:
Starting to merge large image dataset
Processing:
/home/ubuntu/training_data/predict/1fm/de_augmentation_info.mat
error: load: unable to find file /home/ubuntu/training_data/predict/1fm/de_augmentation_info.mat
error: called from
/home/ubuntu/deep3m/Merge_LargeData.m at line 41 column 1
Command exited with non-zero status 1
real 227.88
user 120.86
sys 78.49
If only one model is specified (i.e. --models 1fm), EnsemblePredictions.m will fail when it is run, because it requires at least two models to merge the data. To fix this, simply check the model count and, if there is only one, make a symlink named ensembled pointing to the model directory where the images reside.
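A minimal sketch of that symlink fallback (the `link_or_ensemble` function and its arguments are ours; only the `ensembled` directory name comes from the repo):

```shell
#!/bin/bash
# Sketch of the proposed fix: when only one model was predicted, skip
# EnsemblePredictions.m and symlink "ensembled" to that model's output.
link_or_ensemble() {
  local out_dir="$1" ; shift
  if [ "$#" -eq 1 ] ; then
    # Nothing to merge; "ensembled" just points at the single model dir.
    ln -sfn "$out_dir/$1" "$out_dir/ensembled"
  else
    echo "would run: EnsemblePredictions.m ... $out_dir/ensembled"
  fi
}

# Demo in a throwaway directory:
demo=$(mktemp -d) ; mkdir "$demo/1fm"
link_or_ensemble "$demo" 1fm
readlink "$demo/ensembled"      # shows the 1fm directory path
```

`ln -sfn` replaces any stale symlink from a previous run instead of nesting a new link inside it.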
run_all_predict.sh didn't actually do the prediction; it finished in one second, reporting it was done.
Write a script to extract the loss rate and other key learning values from the prediction log files. Ideally it should also be able to create a plot of this data.
Remove the Pkg folders (and the individual PNGs within) once the segmented images are stitched back together.
I tried running Merge_LargeData.m on this directory:
.
├── 1fm
│ ├── de_augmentation_info.mat
│ └── Pkg001_Z01
│ ├── DONE
│ ├── log
│ ├── out.log
│ ├── test.h5_shift__0000.png
│ ├── test.h5_shift__0001.png
│ ├── test.h5_shift__0002.png
│ ├── test.h5_shift__0003.png
│ └── test.h5_shift__0004.png
├── 3fm
│ ├── de_augmentation_info.mat
│ └── Pkg001_Z01
│ ├── DONE
│ ├── log
│ ├── out.log
│ ├── test.h5_shift__0000.png
│ ├── test.h5_shift__0001.png
│ ├── test.h5_shift__0002.png
│ ├── test.h5_shift__0003.png
│ └── test.h5_shift__0004.png
├── 5fm
│ ├── de_augmentation_info.mat
│ └── Pkg001_Z01
│ ├── DONE
│ ├── log
│ ├── out.log
│ ├── test.h5_shift__0000.png
│ ├── test.h5_shift__0001.png
│ ├── test.h5_shift__0002.png
│ ├── test.h5_shift__0003.png
│ └── test.h5_shift__0004.png
├── caffe_predict.sh
├── de_augmentation_info.mat
├── out
└── run_all_predict.sh
With this command for 1fm:
Merge_LargeData.m 1fm/de_augmentation_info.mat ./out/
but got this error.
octave: X11 DISPLAY environment variable not set
octave: disabling GUI features
Starting to merge large image dataset
Processing:
1fm/de_augmentation_info.mat
Combining image stacks
error: 'fileformats' undefined near line 13 column 30
error: called from
filter_files at line 13 column 23
/home/ubuntu/deep3m/Merge_LargeData.m at line 72 column 9
error: evaluating argument list element number 1
error: called from
filter_files at line 13 column 23
/home/ubuntu/deep3m/Merge_LargeData.m at line 72 column 9
error: evaluating argument list element number 1
error: called from
/home/ubuntu/deep3m/Merge_LargeData.m at line 72 column 9
I'm guessing I am doing something wrong.
In the Run CDeep3M training and prediction instructions, under the Run Segmentation subtitle,
runprediction.sh ~/my_images ~/predictout
should be
runprediction.sh ~/my_trained_model ~/my_images ~/predictout
This worked previously. Possibly a new error was introduced while augmenting the image data:
Generating Average Prediction of /home/ubuntu/ImageData/Results_Mito/1fm/
all_files =
104x1 struct array containing the fields:
name
date
bytes
isdir
datenum
statinfo
Merging 16 variations of file test.h5_shift_ ... number 1 of 101
H5 Dimensions:
1 2 1024 1024
H5 Dimensions:
1 2 1024 1024
H5 Dimensions:
1 2 1024 1024
H5 Dimensions:
1 2 1024 1024
H5 Dimensions:
1 2 1024 1024
H5 Dimensions:
1 2 1024 1024
H5 Dimensions:
1 2 1024 1024
H5 Dimensions:
1 2 1024 1024
error: recover8Variation: A(I,J,...) = X: dimensions mismatch
error: called from
merge_16_probs_v2>recover8Variation at line 104 column 32
merge_16_probs_v2 at line 27 column 15
./StartPostprocessing.m at line 43 column 21
Copy over the VERSION file in the source tree to the predict and train job directories so there is a record of the software version used.
Realized that training data was converted and saved as single, whereas other data was saved as uint8. Aside from single being four times as large, I'm not sure what other effects this could have; currently testing, but I think this has been in the code for a while already.
They both seem to just generate noise when augmenting the data. Not sure what changed. Could it have to do with the h5 package?
CreateTrainJob.m should directly call PreprocessTrainingData.m so the user does not need to make a separate call.
Not all systems have time installed, or in /usr/bin for that matter, so it might be a good idea to remove the /usr/bin prefix and have the user update their path, or to add a parameter to the .sh scripts that lets the user set the time command, with the default set to just time.
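One way this could look (the CDEEP3M_TIME variable and run_timed helper are our inventions, not existing repo code):

```shell
#!/bin/bash
# Sketch: stop hard-coding /usr/bin/time; let the caller pick the time
# command via an environment variable, falling back to an untimed run.
TIME_CMD="${CDEEP3M_TIME:-time}"

run_timed() {
  # A command resolved from a variable must be a real binary (the bash
  # "time" keyword is not usable after expansion), so verify that first.
  if command -v "$TIME_CMD" >/dev/null 2>&1 && [ -x "$(command -v "$TIME_CMD")" ] ; then
    "$TIME_CMD" "$@"
  else
    "$@"        # no usable time binary; just run the job untimed
  fi
}

run_timed echo "job ran"
```

Setting CDEEP3M_TIME=/usr/bin/time restores the current behaviour on systems that have it.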
Update the caffepredict.sh script to examine the number of GPUs and run predictions in parallel to utilize all GPUs found. The same should be done in the run_all_predict.sh script.
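A sketch of the detection plus a round-robin assignment of packages to GPUs (`nvidia-smi -L` is a real command; the `assign_gpus` helper and package names are illustrative):

```shell
#!/bin/bash
# Sketch: count GPUs and assign prediction packages round-robin so that
# every GPU found gets work. The real script would launch
# caffepredict.sh for each assignment in the background and wait.
assign_gpus() {
  local num_gpus="$1" ; shift
  local i=0
  for pkg in "$@" ; do
    echo "GPU $(( i % num_gpus )) -> $pkg"   # round-robin assignment
    i=$(( i + 1 ))
  done
}

# nvidia-smi -L prints one line per GPU; default to 1 if none found.
num_gpus=$(nvidia-smi -L 2>/dev/null | wc -l)
[ "$num_gpus" -lt 1 ] && num_gpus=1
assign_gpus "$num_gpus" Pkg001_Z01 Pkg002_Z01 Pkg003_Z01
```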
Add this at the end of CreateTrainJob.m; the code should look at out.log for each model, get the loss value for all iterations, and find where the slope of this curve levels out or drops below some threshold. Output this value into a README.txt file placed in the train output directory.
grep "loss = " LOG/* | sed "s/^.*]//"
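Building on the grep/sed one-liner above, an extraction script could emit tab-separated iteration/loss pairs ready for plotting. This assumes caffe's usual "Iteration N, loss = X" log line format; the function name is ours:

```shell
#!/bin/bash
# Sketch: pull "iteration<TAB>loss" pairs out of a caffe training log so
# they can be plotted with gnuplot or octave.
extract_loss() {
  sed -n 's/.*Iteration \([0-9]*\), loss = \([0-9.eE+-]*\).*/\1\t\2/p' "$1"
}

# Tiny demo log; real use: extract_loss 1fm/log/out.log > loss.tsv
log=$(mktemp)
printf 'I0412 10:00:00 solver.cpp:228] Iteration 100, loss = 0.682\n' > "$log"
extract_loss "$log"
```

The output is a two-column TSV, so `plot "loss.tsv" with lines` in gnuplot would graph it directly.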
Hi, I got a problem here when trying to reproduce your results locally in a Linux docker. I followed all the steps as in the wiki just fine until this page: https://github.com/CRBS/cdeep3m/wiki/Tutorial-3-Run-CDeep3M.
I got stuck in step 5 with the command as
runtraining.sh --additerations 20 --retrain ~/sbem/mitochrondria/xy5.9nm40nmz/30000iterations_train_out ~/augtrain ~/model
The command failed, and I realized that when using the pretrained model at sbem/mitochrondria/xy5.9nm40nmz/30000iterations_train_out/1fm/trainedmodel/1fm_classifer_iter_30000.solverstate, the solverstate file refers by default to the model file "1fm_classifer_iter_30000.caffemodel" at the location /home/ubuntu/sbem/mitochrondria/xy5.9nm40nmz/30000iterations_train_out/1fm/trainedmodel/1fm_classifer_iter_30000.caffemodel, and that path was written into the solverstate binary file!
Since my installation of CDeep3M is not under /home/ubuntu, I'm wondering if it is possible to update this default caffemodel path in the solverstate file when retraining the model from 30000iterations_train_out. Furthermore, if I train my own model, will the snapshot recorded in the solverstate file include the absolute path instead of a relative path? This matters if someone else wants to use the pretrained model; otherwise, they would have to reproduce exactly the same directory layout I have locally.
Thanks for your help.
On master branch the h5write calls have been commented out in PreprocessTrainingData.m
I already ran dos2unix on EnsemblePredictions.m because Octave was giving errors when trying to run the script from the command line. I also made the script executable, so please refresh your source tree (on the master branch).
When I run EnsemblePredictions.m with no arguments I get this error message:
./EnsemblePredictions.m
error: Invalid call to exist. Correct usage is:
-- Built-in Function: C = exist (NAME)
-- Built-in Function: C = exist (NAME, TYPE)
error: called from
print_usage at line 90 column 5
./EnsemblePredictions.m at line 21 column 1
Octave does not like calling exist on the to_process variable. I wasn't sure what was intended here, so I made a ticket.
Hi,
I installed the software on our Ubuntu 16.04 workstation with CUDA 8.0 and tried to run the example scripts; however, both produce errors... Any suggestions? Thanks in advance!
Kevin
k.knoops@nano:~$ runprediction.sh ~/sbem/mitochrondria/xy5.9nm40nmz/30000iterations_train_out ~/cdeep3m-1.4.0/mito_testsample/testset/ ~/predictout30k
Starting Image Augmentation
Check image size of:
/home/local/UNIMAAS/k.knoops/cdeep3m-1.4.0/mito_testsample/testset/
Reading file: /home/local/UNIMAAS/k.knoops/cdeep3m-1.4.0/mito_testsample/testset/images.081.png
z_blocks =1 5
Start up worker to generate packages to process
Start up worker to run prediction on packages
Start up worker to run post processing on packages
To see progress run the following command in another window:
tail -f /home/local/UNIMAAS/k.knoops/predictout30k/logs/*.log
error: 'fileformats' undefined near line 13 column 30
error: called from
filter_files at line 13 column 23
/home/local/UNIMAAS/k.knoops/cdeep3m-1.4.0/EnsemblePredictions.m at line 35 column 12
error: evaluating argument list element number 1
error: called from
filter_files at line 13 column 23
/home/local/UNIMAAS/k.knoops/cdeep3m-1.4.0/EnsemblePredictions.m at line 35 column 12
ERROR, a non-zero exit code (1) was received from: EnsemblePredictions.m /home/local/UNIMAAS/k.knoops/predictout30k/1fm /home/local/UNIMAAS/k.knoops/predictout30k/3fm /home/local/UNIMAAS/k.knoops/predictout30k/5fm /home/local/UNIMAAS/k.knoops/predictout30k/ensembled
k.knoops@nano:~$
k.knoops@nano:~/cdeep3m-1.4.0$ ./runtraining.sh /home/local/UNIMAAS/k.knoops/mito_testaugtrain ~/output
Verifying input training data is valid ... success
Copying over model files and creating run scripts ... success
A new directory has been created: /home/local/UNIMAAS/k.knoops/output
In this directory are 3 directories 1fm,3fm,5fm which
correspond to 3 caffe models that need to be trained
Detected 2 GPU(s). Will run in parallel.
ERROR: caffe had a non zero exit code: 127
ERROR: caffe had a non zero exit code: 127
ERROR: caffe had a non zero exit code: 127
ERROR: caffe had a non zero exit code: 127
ERROR: caffe had a non zero exit code: 127
ERROR: caffe had a non zero exit code: 127
ERROR: caffe had a non zero exit code: 127
ERROR: caffe had a non zero exit code: 127
ERROR: caffe had a non zero exit code: 127
ERROR: caffe had a non zero exit code: 127
ERROR: caffe had a non zero exit code: 127
ERROR: caffe had a non zero exit code: 127
ERROR: caffe had a non zero exit code: 127
ERROR: caffe had a non zero exit code: 127
ERROR: caffe had a non zero exit code: 127
Non zero exit code from caffe for train of model. Exiting.
ERROR, a non-zero exit code (1) was received from: trainworker.sh --numiterations 30000
k.knoops@nano:~/cdeep3m-1.4.0$
Update documentation: Deep3M Wiki
Train.m should create a train_file.txt and put it in the base of the output directory. solver.prototxt should then use that file as its input. This way, if a user wants to change the input training data, they just modify train_file.txt in the output directory.
Add commands to install IMOD to the CloudFormation template file.
Increase default number of iterations from 2000 to 30000 in runtraining.sh
Could you duplicate versions of Run_all called Run_2D, which will only launch the 1fm versions of train and predict?
There seems to be a mismatch of dimensions when augmenting or de-augmenting the data, causing a slight shift in x-y.
Instead of running PreprocessImageData.m in runprediction.sh, have the code call def_datapackages.m. This also writes a text file in the same folder (where the de_augmentation MATLAB file is stored), which you could use to tell the wrapper how many z-stacks and how many x/y packages are to be done. Then, in the loop, preprocess_package.m must be called for every z-stack and every data package to create the augmented images:
preprocess_package.m <indir> <outdir> <xy_package> <z_stack> <model> <speed>
% Example: preprocess_package ~/EMdata1/ ~/AugmentedEMData/ 15 2 1fm 10
%
% Speed: supported values 1, 2, 4 or 10
% speeds up processing, potentially with a negative effect on accuracy (a speed of 1 gives the highest accuracy)
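The wrapper loop described above could be sketched like this. The package and z-stack counts would really come from the text file def_datapackages.m writes; they are hard-coded arguments here, and the helper name is ours:

```shell
#!/bin/bash
# Sketch: call preprocess_package.m once per (package, z-stack) pair,
# following the usage/example shown in this issue.
gen_preprocess_calls() {
  local num_pkgs="$1" num_z="$2"
  local z p
  for z in $(seq 1 "$num_z") ; do
    for p in $(seq 1 "$num_pkgs") ; do
      # Real script would execute this instead of echoing it.
      echo "preprocess_package.m ~/EMdata1/ ~/AugmentedEMData/ $p $z 1fm 10"
    done
  done
}

gen_preprocess_calls 2 3    # 2 x/y packages, 3 z-stacks -> 6 calls
```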
Currently the 1fm trained weights are named differently than the 3fm and 5fm ones.
Add a --gpu option to runprediction.sh so the caller can specify a specific GPU to use instead of all available GPUs.
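A minimal sketch of the flag parsing (the "all" default and the parse_gpu helper are our assumptions, not existing repo behaviour):

```shell
#!/bin/bash
# Sketch: accept an optional --gpu N flag ahead of the usual positional
# arguments of runprediction.sh; default to using every GPU.
parse_gpu() {
  gpu="all"
  local positional=()
  while [ "$#" -gt 0 ] ; do
    case "$1" in
      --gpu) gpu="$2" ; shift 2 ;;      # take the flag's value
      *)     positional+=("$1") ; shift ;;
    esac
  done
  echo "gpu=$gpu args=${positional[*]}"
}

parse_gpu --gpu 1 trained_model images predictout
# -> gpu=1 args=trained_model images predictout
```

The parsed value could then be passed through to caffe's own device-selection option.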
Use GNU parallel to run training in parallel if possible.
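A sketch of that, with a sequential fallback when GNU parallel is not installed (train_one is a stand-in for the real per-model training call):

```shell
#!/bin/bash
# Sketch: run the three model trainings concurrently via GNU parallel,
# falling back to a plain loop when it is unavailable.
train_one() { echo "training $1" ; }   # stand-in for trainworker.sh --models "$1"
export -f train_one                    # make the function visible to parallel

if command -v parallel >/dev/null 2>&1 && parallel --version 2>/dev/null | grep -q GNU ; then
  parallel --will-cite -j 3 train_one ::: 1fm 3fm 5fm
else
  for m in 1fm 3fm 5fm ; do train_one "$m" ; done
fi
```

The `grep -q GNU` guard avoids accidentally invoking the incompatible moreutils parallel with `:::` syntax.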
Are these used or can they be deleted?
scripts/de_augment_data copy.m
scripts/Images2H5.m
scripts/post_processing/combinePredicctionSlice copy.m
Remove the quotes around the path passed to the tail command in the output from runprediction.sh:
To see progress run the following command in another window:
tail -f "/home/ubuntu/predictout30k/logs/*.log"
Delete the augmented h5 files after StartPostprocessing has run, or probably remove the entire package folder (e.g. ~/predictout30k/augimages/1fm/Pkg001_Z01/).
Once post-processing has run, merge the Lucas branch back into master.
Rename or write new scripts to replace CreateTrainJob and CreatePredictJob. They should be named something like runtrain and runpredict and take non-augmented images as input.
In case a given model does better than the others, it is better not to delete the png files under the model directories (1fm, 3fm, 5fm) in runprediction.sh. Currently this is done toward the end of the runprediction.sh script by these lines:
for Y in `echo $space_sep_models` ; do
/bin/rm -f $out_dir/$Y/*.png
done
Just remove these lines for now.
Call EnsemblePredictions.m after all predictions.
Add flags to runtraining.sh to allow user to optionally adjust these values in solver.prototxt:
base_lr: 1e-02
power: 0.8
momentum: 0.9
weight_decay: 0.0005
average_loss: 16
lr_policy: "poly"
iter_size: 8
snapshot: 2000
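A hypothetical helper such flags could map onto, rewriting one key of solver.prototxt at a time (GNU sed's -i in-place editing assumed; the function name is ours):

```shell
#!/bin/bash
# Sketch: each runtraining.sh flag (e.g. --base_lr 1e-03) would become
# one call to this helper against the generated solver.prototxt.
set_solver_param() {
  local key="$1" value="$2" file="$3"
  # Replace the whole "key: value" line in place (GNU sed -i).
  sed -i "s/^${key}: .*/${key}: ${value}/" "$file"
}

# Demo on a throwaway file:
f=$(mktemp)
printf 'base_lr: 1e-02\nmomentum: 0.9\n' > "$f"
set_solver_param base_lr 1e-03 "$f"
grep '^base_lr' "$f"            # prints: base_lr: 1e-03
```

Quoted values such as lr_policy: "poly" would need the quotes included in the value argument.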
Add a --gpu # option to runtraining.sh so users can specify a certain GPU instead of all GPUs.
Add to the end of PreprocessTraining:
tee ~/deep3m/model/inception_residual_train_prediction_1fm/train_file.txt ~/deep3m/model/inception_residual_train_prediction_3fm/train_file.txt ~/deep3m/model/inception_residual_train_prediction_5fm/train_file.txt < ./train_file.txt >/dev/null
There should be progress output for each Pkg_Z## and each .h5 file being processed within. An output of elapsed time would also be nice, something like:
Running 1fm: 53 Pkg_Z## folders to process
Running Pkg_Z### X of 53 .....(one period per .h5 file) <time in seconds>
This tool should take the output of PreProcessImageData.m and the trained models directory from CreateTrainJob.m and run prediction on all the Pkg directories. The output should be another directory with a mirrored structure that can be consumed by the postprocessing script.
Usage:
CreatePredictJob.m <Output of Train.m after training run> <augmented image data> <output directory>
Desired output structure:
<output>/
1fm/
<copy over de_augment.m file, or hardlink it>
Pkg001/
Pkg002/
3fm/
<copy over de_augment.m file, or hardlink it>
Pkg001/
Pkg002/
5fm/
<copy over de_augment.m file, or hardlink it>
Pkg001/
Pkg002/
There is a typo in the fprintf on line 105 (the % should presumably be %s). I can fix it, but I wasn't sure if you were working on this file, so I made a ticket.
Output from run:
Created 1 packages in x/y with 1 z-stacks
error: fprintf: invalid format specified
error: called from
./PreprocessImageData.m at line 105 column 1
fprintf('Data stored in:\n %\n', outdir);
Would you be opposed to a new structure for the source tree?
I'm also going to shy away from dropping scripts into the train job and predict job folders and instead just have them on the path, because it's easier to test.
I was thinking of something like this (I already started experimenting with this new style in the chrisdev branch):
.
├── aws/
├── Makefile
├── model/
│ ├── inception_residual_train_prediction_1fm
│ ├── inception_residual_train_prediction_3fm
│ └── inception_residual_train_prediction_5fm
├── README.md
├── scripts/
│ ├── caffepredict.sh
│ ├── caffetrain.sh
│ ├── CreatePredictJob.m
│ ├── CreateTrainJob.m
│ ├── EnsemblePredictions.m
│ ├── functions/ (all non directly executable matlab files would be put in here)
│ ├── Merge_LargeData.m
│ ├── PreprocessImageData.m
│ ├── PreprocessTrainingData.m
│ ├── run_all_predict.sh
│ ├── run_all_train.sh
│ ├── RunUnitTests.m
│ └── StartPostprocessing.m
├── vagrant/
└── VERSION
Released versions would look like this:
.
├── model/
│ ├── inception_residual_train_prediction_1fm
│ ├── inception_residual_train_prediction_3fm
│ └── inception_residual_train_prediction_5fm
├── README.txt
├── scripts/
│ ├── caffepredict.sh
│ ├── caffetrain.sh
│ ├── CreatePredictJob.m
│ ├── CreateTrainJob.m
│ ├── EnsemblePredictions.m
│ ├── functions/ (all non directly executable matlab files would be put in here)
│ ├── Merge_LargeData.m
│ ├── PreprocessImageData.m
│ ├── PreprocessTrainingData.m
│ ├── run_all_predict.sh
│ ├── run_all_train.sh
│ ├── RunUnitTests.m
│ └── StartPostprocessing.m
└── VERSION
We need to include a specific error message if imageimporter or imageimporter_large comes back empty.
Script to determine the accuracy when using a decreasing number of augmentations.
For this, the old processing structure will be used, with modifications in the postprocessing.
To be included in the distribution, to let the end user assess how many augmentations to use (if the goal is to reduce processing time without losing too much accuracy).
Right after prediction is run for each model (1fm, 3fm, 5fm), run the post image processing script (StartPostprocessing.m) to reduce the data footprint. This process should be invoked and NOT waited on unless it is the last job to run.
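The scheduling could be sketched like this (postprocess is a stand-in for invoking StartPostprocessing.m; the loop body is illustrative):

```shell
#!/bin/bash
# Sketch: launch post-processing in the background right after each
# model's prediction finishes, and only wait once at the very end.
postprocess() { echo "postprocessing $1" ; }   # stand-in for StartPostprocessing.m

pids=()
for model in 1fm 3fm 5fm ; do
  echo "predict $model done"     # stand-in for the real prediction step
  postprocess "$model" &         # do not block the next model's prediction
  pids+=($!)
done
wait "${pids[@]}"                # block only after all work is queued
```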
Adjust the caffepredict.sh script to run with fewer than 16 augmented variations of the data. The code should instead just see what .h5 files are present and run prediction on those files, keeping the same naming convention.
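A sketch of the globbing approach (file names here are made up; only the *.h5 pattern matters):

```shell
#!/bin/bash
# Sketch: predict on whatever .h5 files actually exist in a package
# directory instead of assuming all 16 augmented variations are present.
predict_pkg() {
  local pkg_dir="$1"
  local h5
  shopt -s nullglob                  # empty dir -> loop body never runs
  for h5 in "$pkg_dir"/*.h5 ; do
    echo "would predict on: $(basename "$h5")"
  done
}

# Demo in a throwaway directory:
d=$(mktemp -d)
touch "$d/aug_v1.h5" "$d/aug_v2.h5"
touch "$d/notes.txt"                 # ignored: not an .h5 file
predict_pkg "$d"
```

Because the loop echoes the original file names, the downstream naming convention is preserved automatically.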
We should add a set of test and training images somewhere (not sure where, but we could host them on CCDB?) for users to go through all the steps.
Improve stitching of large images