Comments (7)
Hi,
Nice work getting CDeep3M running on Docker.
We ran into that same issue a while back. To correct the problem, we ended up forking and modifying the custom version of Caffe (fork located here: https://github.com/coleslaw481/caffe_nd_sense_segmentation) to look for the .caffemodel file in the same directory where the .solverstate file is located.
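The fork's path logic, as described above, can be sketched roughly like this (the actual change lives in the forked Caffe C++ code; this is just the same directory-lookup idea expressed in shell, and the path below is a made-up example):

```shell
# Illustrative only: derive the .caffemodel path from a .solverstate path,
# looking in the same directory rather than the current working directory.
solverstate="/data/trainedmodel/1fm_classifer_iter_2000.solverstate"
model_dir="$(dirname "$solverstate")"
base="$(basename "$solverstate" .solverstate)"
caffemodel="$model_dir/$base.caffemodel"
echo "$caffemodel"
```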
The AWS CloudFormation template configuration aws/basic_cloudformation.json was updated to use the repo mentioned above, but we forgot to update the link in the README.md. I just made a pull request to get the README.md link updated.
Try compiling and installing this version of caffe:
https://github.com/coleslaw481/caffe_nd_sense_segmentation
If you look at the last commit on the above repo you can see what was changed.
Thanks for catching this and let us know if it works or if you run into any other issues.
from cdeep3m.
Hi Chris,
Thanks for your super-fast response! We really appreciate it.
Yes, we noticed the different Caffe version used in the AWS template, which provides a "user data" section recording all the configuration commands used in the cloud. Honestly, this "coleslaw481" fork of Caffe confused us a lot, since we had no idea why CDeep3M on AWS deployed a different version from the official GitHub repo... Anyway, good to know. There were two more issues we ran into when installing locally:
(1) /usr/bin/time was missing on our system.
(2) We had to update the two scripts script/functions/update__*.m to change the default path of the *prototxt files.
Also, for some reason, we had to update caffetrain.sh at line 166 to give the absolute path to --solver. Not sure why that happened, but getting the whole thing to work was the top priority, so we didn't dig further.
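For issue (1): on Ubuntu the GNU time binary ships in a separate package (a plain `time` in a shell usually resolves to the bash builtin, which is not the same thing). A quick check, assuming an apt-based system:

```shell
# Check for the standalone GNU time binary the CDeep3M scripts expect.
if [ -x /usr/bin/time ]; then
    status="GNU time present"
else
    status="missing: try 'sudo apt-get install time'"
fi
echo "$status"
```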
Thanks again; it's really a nice model!
Hi,
I made ticket #56 to track the first issue you mentioned.
I have a couple of questions about the second issue. Were you passing an absolute or a relative path? Also, were there any spaces or unusual characters in the model path?
thanks,
chris
Hi,
Thanks for the update! Yes, we had to use absolute paths in both solver.prototxt and train_val.prototxt, and I didn't notice any special characters in the path of our working directory. For instance, in solver.prototxt the updates are
net: "/usr/local/src/CDeep3M/cdeep3m-1.6.2/model_imod_train_out//1fm/train_val.prototxt"
..
snapshot_prefix: "/usr/local/src/CDeep3M/cdeep3m-1.6.2/model_imod_train_out//1fm/trainedmodel/1fm_classifer"
(the latter if we also want to retrain the model from a snapshot). To generate these absolute paths, we modified the two Matlab scripts in script/functions/update__*.m. In train_val.prototxt, we also had to update the line containing "class_prob_mapping_file:" to include the absolute path. At this stage, I'm not sure whether the need for absolute paths in these two files is a bug in Caffe or in my Ubuntu environment setup, since on AWS everything seems to work fine.
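The manual edit described above could also be scripted. Here is a rough sketch (the file contents and base directory are made-up examples, not the actual CDeep3M output) that prefixes the net: and snapshot_prefix: entries of a solver.prototxt with an absolute base path:

```shell
# Hypothetical example: rewrite relative prototxt paths as absolute ones.
BASE="/usr/local/src/CDeep3M/cdeep3m-1.6.2/model_imod_train_out"
cat > solver.prototxt <<'EOF'
net: "1fm/train_val.prototxt"
snapshot_prefix: "1fm/trainedmodel/1fm_classifer"
EOF
# Prefix both path entries with the absolute base directory.
sed -e "s|net: \"|net: \"$BASE/|" \
    -e "s|snapshot_prefix: \"|snapshot_prefix: \"$BASE/|" \
    solver.prototxt > solver.abs.prototxt
cat solver.abs.prototxt
```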
Also, two quick questions:
(1) We are using two 1080 Ti cards here, and prediction on a single 1024x1024 PNG with a CDeep3M model pretrained for 30000 epochs takes ~3 minutes, and ~30 seconds for a single 512x512 PNG. Is that normal?
(2) Could you provide some estimate of the memory usage for training? We tried to train on our own dataset of 200 images, each at 1024x1024 resolution. It failed; the server is currently equipped with only 32 GB of RAM, but the 1080 Ti card seems fine for us.
Thanks.
Brett
Hi,
Regarding your questions:
- I ran a quick test with a 210-image stack; during training the process used about 45% of the memory on an instance with 60 GB of RAM (and a single K80), and 9819 MiB of the K80's 11439 MiB of GPU memory. So I agree with you: if it failed, it is more likely the RAM on the machine than the 1080 Ti. I'm not sure whether you were using both 1080 Tis to train two models in parallel; that would most likely be too much for 32 GB of RAM. For a single GPU the numbers should be pretty close, so with a stack of 150 images I'd expect you could already run it even with 32 GB of RAM.
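A back-of-envelope scaling of those numbers (assuming host memory grows roughly linearly with stack size, which is an assumption, not something we measured):

```shell
# 210 images used ~45% of 60 GB of host RAM on the K80 instance.
mb_used=$(( 60000 * 45 / 100 ))   # ~27000 MB for 210 images
per_image=$(( mb_used / 210 ))    # ~128 MB per image
est_150=$(( per_image * 150 ))    # ~19200 MB for a 150-image stack
echo "$mb_used $per_image $est_150"
```

About 19 GB for a 150-image stack would indeed fit under a 32 GB budget.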
- We will do some tests on 1080s in the next weeks, but I'd expect them to be slightly slower than the K80. Generally speaking, the time each image takes to process depends on a couple of factors, but if speed is a concern, note that in the standard setting each image is processed 40 times (8x 1fm + 16x 3fm + 16x 5fm), and if you have a well-trained model, the augmentation can be reduced without losing too much accuracy (see https://github.com/CRBS/cdeep3m/wiki/Speed-up). I'd recommend starting with --augspeed 10; you can combine it with using fewer models, e.g. --models 1fm. For example, for Demorun 1 I get the 5 images predicted in 9 min with the K80, whereas I get the result in 19 sec when adding the flags --augspeed 10 --models 3fm. Pre- and postprocessing also add some time to the whole process, since we needed a standard pipeline that can handle large images as well.
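The augmentation arithmetic above works out as follows; the speedup figure is simply the ratio of the two quoted Demorun 1 timings, not a general guarantee:

```shell
# Default prediction passes per image: 8 (1fm) + 16 (3fm) + 16 (5fm).
passes=$(( 8 + 16 + 16 ))
# Demorun 1: 9 min (540 s) with defaults vs 19 s with
# --augspeed 10 --models 3fm, i.e. roughly a 28x wall-clock speedup.
speedup=$(( 9 * 60 / 19 ))
echo "passes=$passes speedup=~${speedup}x"
```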
Hope this helps
Best,
Matt
Hi Matt,
Thanks a lot for the detailed response! It helps us a lot. We will continue to apply CDeep3M to our vessel and neural data and see how it works. By the way, have you ever considered forking a TensorFlow/Keras version of CDeep3M? I personally believe it would benefit greatly from the full power of the Python libraries, instead of the Octave/Matlab style currently used... Thanks again.
Brett
Hi Brett,
Yes, we are working on some enhancements in the background, including a Python version.
-Matt