anlthms / dsb-2017
Data Science Bowl 2017
License: Apache License 2.0
The Kaggle dataset is not required to run this code, right? In other words, this repo can work on LUNA16 data alone?
If so, why does the note at the end of README.md mention preprocessing the DSB data?
Is the Kaggle data needed for this repo?
Please reply.
Is there a script in the current repository that does that? Ideally, index.py would generate three CSV files: train, validation, and test sets. That way we can tune hyper-parameters on the validation set and get an idea of final performance on the test set. Could you please advise how to do so, especially the prediction part?
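A three-way split like the one described above could be sketched roughly as follows. This is only an illustration under assumptions: split_ids and write_split are hypothetical helpers, the "uid,label" column layout is made up, and nothing here reflects what index.py actually writes. It targets Python 3.

```python
import csv
import random


def split_ids(ids, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle scan IDs reproducibly and cut them into three disjoint sets."""
    ids = sorted(ids)                      # fixed starting order, so the seed decides the split
    random.Random(seed).shuffle(ids)
    n_val = int(len(ids) * val_frac)
    n_test = int(len(ids) * test_frac)
    return {
        "test": ids[:n_test],
        "validation": ids[n_test:n_test + n_val],
        "train": ids[n_test + n_val:],
    }


def write_split(labels_by_id, split, prefix):
    """Write one CSV per subset; the column names are an assumption."""
    paths = {}
    for name, ids in split.items():
        path = "%s-%s.csv" % (prefix, name)
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["uid", "label"])
            for uid in ids:
                writer.writerow([uid, labels_by_id[uid]])
        paths[name] = path
    return paths
```

Calling split_ids(labels.keys(), 0.25, 0.25) on 20 scans would give 5 test, 5 validation, and 10 training IDs; the test file should be scored only once, after hyper-parameters have been settled on the validation file.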
I'm running the project on a MacBook Pro with a 2.5 GHz Intel Core i7.
The first epoch looks like this:
Epoch 0 [Train |████████████████████| 696/696 batches, 0.74 cost, 45575.87s] [CrossEntropyBinary Loss 0.69, 2769.73s]
2017-04-01 08:42:29,323 - neon.callbacks.callbacks - INFO - Epoch 0 complete. Train Cost 0.780730. Eval Cost 0.687632
...which is way longer than your GPU example. Is that OK?
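To put the logged numbers in perspective, the arithmetic can be done directly from the progress line above. A small sketch; epoch_timing is a hypothetical helper, and only the 696 batches and 45575.87 s come from the log.

```python
def epoch_timing(train_seconds, n_batches):
    """Return (seconds per batch, hours per epoch) for one training pass."""
    per_batch = train_seconds / float(n_batches)
    hours = train_seconds / 3600.0
    return per_batch, hours


per_batch, hours = epoch_timing(45575.87, 696)
print("%.1f s per batch, %.1f h per epoch" % (per_batch, hours))
# prints "65.5 s per batch, 12.7 h per epoch"
```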
How can we see the output? And can I run this repo on Windows?
I am mainly concerned about the output. It would be really helpful.
Thank you.
I got the two prediction files for the train and validation data, but I have no idea what the 128 numbers for each patient mean. Would you mind explaining? Thank you so much!
I'm testing run.py (training) with a toy-size subset of 24 samples in total from LUNA16: 18 for training and 6 for validation (testing). All other settings are the defaults.
./run.py -e 6 -w luna/vids -r 0 -v -eval 1 -s model.pkl
The output looks like the following:
Network Layers:
Sequential
Convolution Layer 'Convolution_0': 1 x (64x64x64) inputs, 16 x (29x29x29) outputs, 1,1,1 padding, 2,2,2 stride, 1,1,1 dilation
Activation Layer 'Convolution_0_Rectlin': Rectlin
Convolution Layer 'Convolution_1': 16 x (29x29x29) inputs, 32 x (21x21x21) outputs, 0,0,0 padding, 1,1,1 stride, 2,2,2 dilation
BatchNorm Layer 'Convolution_1_bnorm': 296352 inputs, 1 steps, 32 feature maps
Activation Layer 'Convolution_1_Rectlin': Rectlin
Convolution Layer 'Convolution_2': 32 x (21x21x21) inputs, 64 x (17x17x17) outputs, 0,0,0 padding, 1,1,1 stride, 2,2,2 dilation
BatchNorm Layer 'Convolution_2_bnorm': 314432 inputs, 1 steps, 64 feature maps
Activation Layer 'Convolution_2_Rectlin': Rectlin
Pooling Layer 'Pooling_0': 64 x (17x17) inputs, 64 x (9x9) outputs
Convolution Layer 'Convolution_3': 64 x (9x9x9) inputs, 128 x (8x8x8) outputs, 0,0,0 padding, 1,1,1 stride, 1,1,1 dilation
BatchNorm Layer 'Convolution_3_bnorm': 65536 inputs, 1 steps, 128 feature maps
Activation Layer 'Convolution_3_Rectlin': Rectlin
Convolution Layer 'Convolution_4': 128 x (8x8x8) inputs, 128 x (7x7x7) outputs, 0,0,0 padding, 1,1,1 stride, 1,1,1 dilation
BatchNorm Layer 'Convolution_4_bnorm': 43904 inputs, 1 steps, 128 feature maps
Activation Layer 'Convolution_4_Rectlin': Rectlin
Convolution Layer 'Convolution_5': 128 x (7x7x7) inputs, 128 x (6x6x6) outputs, 0,0,0 padding, 1,1,1 stride, 1,1,1 dilation
BatchNorm Layer 'Convolution_5_bnorm': 27648 inputs, 1 steps, 128 feature maps
Activation Layer 'Convolution_5_Rectlin': Rectlin
Convolution Layer 'Convolution_6': 128 x (6x6x6) inputs, 256 x (5x5x5) outputs, 0,0,0 padding, 1,1,1 stride, 1,1,1 dilation
BatchNorm Layer 'Convolution_6_bnorm': 32000 inputs, 1 steps, 256 feature maps
Activation Layer 'Convolution_6_Rectlin': Rectlin
Convolution Layer 'Convolution_7': 256 x (5x5x5) inputs, 1024 x (4x4x4) outputs, 0,0,0 padding, 1,1,1 stride, 1,1,1 dilation
BatchNorm Layer 'Convolution_7_bnorm': 65536 inputs, 1 steps, 1024 feature maps
Activation Layer 'Convolution_7_Rectlin': Rectlin
Convolution Layer 'Convolution_8': 1024 x (4x4x4) inputs, 4096 x (3x3x3) outputs, 0,0,0 padding, 1,1,1 stride, 1,1,1 dilation
BatchNorm Layer 'Convolution_8_bnorm': 110592 inputs, 1 steps, 4096 feature maps
Activation Layer 'Convolution_8_Rectlin': Rectlin
Convolution Layer 'Convolution_9': 4096 x (3x3x3) inputs, 2048 x (2x2x2) outputs, 0,0,0 padding, 1,1,1 stride, 1,1,1 dilation
BatchNorm Layer 'Convolution_9_bnorm': 16384 inputs, 1 steps, 2048 feature maps
Activation Layer 'Convolution_9_Rectlin': Rectlin
Convolution Layer 'Convolution_10': 2048 x (2x2x2) inputs, 1024 x (1x1) outputs, 0,0,0 padding, 1,1,1 stride, 1,1,1 dilation
BatchNorm Layer 'Convolution_10_bnorm': 1024 inputs, 1 steps, 1024 feature maps
Activation Layer 'Convolution_10_Rectlin': Rectlin
Dropout Layer 'Dropout_0': 1024 inputs and outputs, keep 50% (caffe_compat False)
Linear Layer 'Linear_0': 1024 inputs, 2 outputs
BatchNorm Layer 'Linear_0_bnorm': 2 inputs, 1 steps, 2 feature maps
Activation Layer 'Linear_0_Softmax': Softmax
Traceback (most recent call last):
File "./run.py", line 94, in <module>
model.fit(train, optimizer=opt, num_epochs=args.epochs, cost=cost, callbacks=callbacks)
File "/pyenv/lib/python2.7/site-packages/neon-1.8.2-py2.7.egg/neon/models/model.py", line 182, in fit
self._epoch_fit(dataset, callbacks)
File "/pyenv/lib/python2.7/site-packages/neon-1.8.2-py2.7.egg/neon/models/model.py", line 200, in _epoch_fit
for mb_idx, (x, t) in enumerate(dataset):
File "/dsb-2017/data.py", line 131, in __iter__
yield self.next_minibatch(start)
File "/dsb-2017/data.py", line 96, in next_minibatch
self.next_macrobatch()
File "/dsb-2017/data.py", line 88, in next_macrobatch
self.shuffle(self.data, self.targets)
File "/luna16/dsb-2017/data.py", line 214, in shuffle
data[:] = data[inds]
MemoryError
Is this error due to the small number of training samples?
If so, what is the minimum number of samples required for training?
'convert.py' takes too long to process the whole LUNA16 dataset, so I just wanted to make sure neon can handle a subset of it properly.
Please advise.
Thanks!
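A note on the failing line, data[:] = data[inds]: NumPy fancy indexing returns a copy, so that statement temporarily needs a second buffer as large as data itself. Whether that is what exhausts memory in this run is not confirmed by the traceback. As a minimal sketch only, a pairwise in-place shuffle that swaps one row at a time could look like this; shuffle_in_place is a hypothetical helper and not part of the repo.

```python
import numpy as np


def shuffle_in_place(data, targets, seed=0):
    """Fisher-Yates shuffle applied to both arrays with the same swaps.

    Only two rows are copied per step, so no full-size temporary is created.
    """
    rng = np.random.RandomState(seed)
    n = data.shape[0]
    for i in range(n - 1, 0, -1):
        j = rng.randint(0, i + 1)          # random index in [0, i]
        if i != j:
            data[[i, j]] = data[[j, i]]    # right-hand side copies just two rows
            targets[[i, j]] = targets[[j, i]]
```

The Python-level loop is slower than a single vectorized assignment, so this trades speed for a smaller peak memory footprint.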
Could you please release the code for the visualization of the lung nodule? I have already tried the code from https://www.kaggle.com/gzuidhof/data-science-bowl-2017/full-preprocessing-tutorial, but failed.
It seems the zip files in the shared Google Drive are corrupted. Or am I missing something?
(.venv2)bash-4.2$ unzip subset0.zip
Archive: subset0.zip
warning [subset0.zip]: 2516934707 extra bytes at beginning or within zipfile
(attempting to process anyway)
error [subset0.zip]: start of central directory not found;
zipfile corrupt.
(please check that you have transferred or created the zipfile in the
appropriate BINARY mode and that you have compiled UnZip properly)
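One way to tell an incomplete download from a genuinely bad archive is to check the file with Python's standard zipfile module before unzipping. A rough sketch, assuming Python 3; check_zip is a hypothetical helper, and the log above does not by itself say which case applies here.

```python
import os
import zipfile


def check_zip(path):
    """Return (ok, message) describing whether the archive looks intact."""
    size = os.path.getsize(path)
    # is_zipfile looks for the end-of-central-directory record,
    # which is missing when a download was cut short.
    if not zipfile.is_zipfile(path):
        return False, "no central directory found (%d bytes on disk)" % size
    try:
        with zipfile.ZipFile(path) as zf:
            bad = zf.testzip()             # reads every member and checks its CRC
    except zipfile.BadZipFile as exc:
        return False, str(exc)
    if bad is not None:
        return False, "CRC mismatch in member %s" % bad
    return True, "all members passed the CRC check"
```

Comparing the size reported on disk with the size shown by the download page is a quick first check; testzip reads the whole archive, so it can take a while on multi-gigabyte files.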
Hi, I hit a memory error in the first epoch with the GPU enabled. Monitoring with top and nvidia-smi suggests that GPU memory is sufficient but CPU memory is exhausted quickly. I also tried running with the GPU disabled (CPU only): the first epoch starts normally but is too slow to continue.
The error output is:
BatchNorm Layer 'Convolution_10_bnorm': 1024 inputs, 1 steps, 1024 feature maps
Activation Layer 'Convolution_10_Rectlin': Rectlin
Dropout Layer 'Dropout_0': 1024 inputs and outputs, keep 50% (caffe_compat False)
Linear Layer 'Linear_0': 1024 inputs, 2 outputs
BatchNorm Layer 'Linear_0_bnorm': 2 inputs, 1 steps, 2 feature maps
Activation Layer 'Linear_0_Softmax': Softmax
Traceback (most recent call last):
File "./run.py", line 94, in <module>
model.fit(train, optimizer=opt, num_epochs=args.epochs, cost=cost, callbacks=callbacks)
File "/home/xx/neon/neon/models/model.py", line 182, in fit
self._epoch_fit(dataset, callbacks)
File "/home/xx/neon/neon/models/model.py", line 204, in _epoch_fit
x = self.fprop(x)
File "/home/xx/neon/neon/models/model.py", line 235, in fprop
return self.layers.fprop(x, inference)
File "/home/xx/neon/neon/layers/container.py", line 332, in fprop
x = l.fprop(x, inference=inference)
File "/home/xx/neon/neon/layers/layer.py", line 798, in fprop
bsum=self.batch_sum)
File "/home/xx/neon/neon/backends/nervanagpu.py", line 1982, in fprop_conv
layer.fprop_kernels.bind_params(I, F, O, X, bias, bsum, alpha, beta, relu, brelu, slope)
File "/home/xx/neon/neon/backends/convolution.py", line 541, in bind_params
bsum_data, x_data = self.xprop_params(O, X, bias, bsum, beta, relu, brelu, slope)
File "/home/xx/neon/neon/backends/convolution.py", line 77, in xprop_params
bsum_data = self.bsum.bind_params(bsum)
File "/home/xx/neon/neon/backends/convolution.py", line 1051, in bind_params
bsum_data = self.lib.scratch_buffer_offset(self.size)
File "/home/xx/neon/neon/backends/nervanagpu.py", line 875, in scratch_buffer_offset
data = int(_get_scratch_data(self.scratch_size)) + self.scratch_offset
File "<decorator-gen-60>", line 2, in _get_scratch_data
File "/home/xx/neon/.venv2/local/lib/python2.7/site-packages/pycuda/tools.py", line 430, in context_dependent_memoize
result = func(*args)
File "/home/xx/neon/neon/backends/nervanagpu.py", line 3195, in _get_scratch_data
return drv.mem_alloc(scratch_size)
pycuda._driver.MemoryError: cuMemAlloc failed: out of memory
Shouldn't pycuda.driver.mem_alloc allocate memory on the GPU? Then why is CPU memory exhausted? Is this a bug, or did I miss something?
During make, I'm getting this output:
make
Building bin/loader.so...
g++ -shared -o bin/loader.so -fPIC -Wall -Wno-deprecated-declarations -O3 -std=c++11 src/loader.cpp
In file included from src/loader.cpp:19:
In file included from src/loader.hpp:27:
src/matrix.hpp:39:2: warning: ("OpenCV support not built-in. Certain features will not work.") [-W#warnings]
#warning ("OpenCV support not built-in. Certain features will not work.")
^
1 warning generated.
I just installed OpenCV3 with Homebrew. What might be wrong?
My env:
macOS Sierra 10.12.3 (16D32)
@anlthms Thanks for sharing and documenting your work. I have really liked your research.
I just want to ask one thing. How do we run this code on test data and get predictions from it? And what does the output look like? Please provide details about the testing and evaluation phase.
Thank you.
Hi anlthms, thank you for your script. I ran it but got a different result. If I could plot the chunk in 3D, I think the difference would be easy to spot.
Would you mind sharing the script for visualization? Thank you.
Hi anlthms,
1. mask = np.array(image > -320, dtype=np.int8)
The -320 is an HU value, but you apply it directly to the array values.
2. coord_z, coord_y, coord_x are world coordinates. After you run
slices = ndimage.interpolation.zoom(data, spacing, mode='nearest')
you should convert the world coordinates to voxel coordinates using the origin and spacing of the CT scan, like:
'''
import numpy as np

def world_2_voxel(world_coordinates, origin, spacing):
    # Offset from the scan origin in world units, then divide by the voxel size.
    stretched_voxel_coordinates = np.absolute(world_coordinates - origin)
    voxel_coordinates = stretched_voxel_coordinates / spacing
    return voxel_coordinates
'''
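A worked example of the conversion suggested above may make the point concrete. The origin, spacing, and nodule position below are made-up numbers in (z, y, x) order, not values from any real scan, and world_to_voxel simply restates the function above so that the snippet runs on its own.

```python
import numpy as np


def world_to_voxel(world, origin, spacing):
    """Convert world coordinates to voxel indices for a given voxel size."""
    world = np.asarray(world, dtype=float)
    origin = np.asarray(origin, dtype=float)
    spacing = np.asarray(spacing, dtype=float)
    return np.absolute(world - origin) / spacing


# Hypothetical scan geometry, (z, y, x) order.
origin = [-300.0, -200.0, -150.0]
spacing = [2.5, 0.5, 0.5]
nodule_world = [-250.0, -150.0, -100.0]

# Index in the original grid.
voxel = world_to_voxel(nodule_world, origin, spacing)
print(voxel)        # [ 20. 100. 100.]

# Index after the volume has been zoomed by `spacing`, i.e. unit voxel size.
resampled = world_to_voxel(nodule_world, origin, [1.0, 1.0, 1.0])
print(resampled)    # [50. 50. 50.]
```

If the volume is zoomed by the spacing factors as in the line quoted above, each voxel would cover roughly one world unit, so the index in the resampled volume is the offset from the origin itself.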