wikke / tianchi-medical-lungtumordetect

Tianchi Medical AI Competition [Season 1]: Intelligent Diagnosis of Lung Nodules. UNet/VGG/Inception/ResNet/DenseNet

Jupyter Notebook 99.48% Python 0.52%

unet neural-network keras segmentation classification inception resnet vgg densenet lung-cancer-detection

tianchi-medical-lungtumordetect's Introduction

Alibaba Cloud Tianchi Medical Competition: Lung Nodule Detection

Features

3D Segmentation & Classification with Keras
Fine preprocessing with scikit-image
Fine visualization for clarification
Modified UNet for segmentation
Modified VGG/Inception/ResNet/DenseNet for classification ensemble
Fine hyperparameter tuning of both the models and the training process.

Code Hierarchy

- config.py # good practice: centralize all hyperparameters here

- preprocess.py # Step 1, preprocess, store numpy/meta 'cache' at ./preprocess/

- train_segmentation.py # Step 2, segmentation with UNet Model
- model_UNet.py # UNet model definition

- train_classificaion.py # Step 3, classification with VGG/Inception/ResNet/DenseNet
- model_VGG.py # VGG model definition
- model_Inception.py # Inception model definition
- model_ResNet.py # ResNet model definition
- model_DenseNet.py # DenseNet model definition

- generators.py # generators for segmentation & classification models
- visual_utils.py # 3D visual tools

- dataset/ # dataset, changed in config.py
- preprocess/ # 'cache' preprocessed numpy/meta data, changed in config.py

- train_ipynbs # training process notebooks

Preprocess

Use SimpleITK to read the CT files, process them, and store the results in the cache as numpy arrays.
Process with the scikit-image library, trying many parameters to get the best cut:
- binarize
- clear border
- label
- regions
- closing
- dilation
Collect all meta information (seriesuid, shape, file_path, origin, spacing, coordinates, cover_ratio, etc.) and store it in ONE cache file for fast training initialization.
See the preprocessing steps in the /train_ipynbs/preprocess.ipynb file.
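The meta information includes origin, spacing, and coordinates, which are what is needed to map an annotated nodule position onto the voxel grid. The sketch below shows the usual conversion. It is not taken from preprocess.py: the function names are hypothetical, and it assumes an axis-aligned scan whose origin and spacing are given in (x, y, z) order, as SimpleITK reports them.

```python
def world_to_voxel(world_xyz, origin_xyz, spacing_xyz):
    """Convert a world coordinate in mm to a voxel index, all in (x, y, z) order.

    Assumes the image axes are aligned with the world axes (no rotation).
    """
    return tuple(
        int(round((w - o) / s))
        for w, o, s in zip(world_xyz, origin_xyz, spacing_xyz)
    )


def xyz_to_zyx(index_xyz):
    """Reverse the axis order.

    Arrays returned by SimpleITK's GetArrayFromImage are indexed (z, y, x),
    while origin and spacing are reported as (x, y, z).
    """
    return tuple(reversed(index_xyz))


# Hypothetical values, for illustration only
origin = (-200.0, -150.0, -300.0)
spacing = (0.5, 0.5, 1.25)
nodule_world = (-100.0, -50.0, -250.0)

voxel_xyz = world_to_voxel(nodule_world, origin, spacing)
voxel_zyx = xyz_to_zyx(voxel_xyz)
```

Keeping the axis order explicit in function names is one way to avoid confusing (x, y, z) with the array's own index order.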

Distribution of the fraction of a whole CT that the lung part takes up.

Tumor size distribution

Segmentation

Both a simplified and a full UNet were tested.
dice_coef_loss is used as the loss function.
The model is evaluated periodically with many metrics, which helps a lot in understanding it.
30% of the samples are negative (they contain no tumor), for better generalization.
Because of memory limits, a batch size of 16 is used.
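For readers unfamiliar with the Dice coefficient, here is a minimal NumPy sketch. The name dice_coef_loss comes from the text above, but the exact formula in the repository is not shown here. The smoothing term and the `1 - dice` form are assumptions based on the commonly used definition.

```python
import numpy as np


def dice_coef(y_true, y_pred, smooth=1.0):
    """Dice coefficient between two masks (binary or soft), in (0, 1]."""
    y_true = np.asarray(y_true, dtype=np.float64).ravel()
    y_pred = np.asarray(y_pred, dtype=np.float64).ravel()
    intersection = np.sum(y_true * y_pred)
    # `smooth` (an assumed value) avoids division by zero on empty masks
    return (2.0 * intersection + smooth) / (np.sum(y_true) + np.sum(y_pred) + smooth)


def dice_coef_loss(y_true, y_pred, smooth=1.0):
    """Loss that decreases as the overlap between the masks grows."""
    return 1.0 - dice_coef(y_true, y_pred, smooth)
```

With a smoothing term, an all-zero target paired with an all-zero prediction gives a Dice coefficient of 1, so negative samples without a tumor still produce a well-defined loss.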

Classification

VGG

Both a simplified and a full VGG model were tested. The simplified VGG is used as the baseline.

The pictures show that hyperparameter tuning really matters.

Inception

A simplified network built from Inception modules. Each block combines 4-5 different types of convolution branch:
- 1*1*1 depth-wise separable conv
- 1*1*1 depth-wise separable conv, then 3*3*3 conv_bn_relu
- 1*1*1 depth-wise separable conv, then two 3*3*3 conv_bn_relu
- AveragePooling3D, then 1*1*1 depth-wise separable conv
- (optional in config) 1*1*1 depth-wise separable conv, then (5, 1, 1), (1, 5, 1), (1, 1, 5) spatially separable convolutions
- Concatenate the outputs of the branches above.
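To illustrate why the optional branch uses three one-dimensional kernels, the sketch below counts weights for a single 5*5*5 convolution and for the (5, 1, 1), (1, 5, 1), (1, 1, 5) sequence. It is plain arithmetic, not code from model_Inception.py. The channel counts are hypothetical, and applying the three convolutions one after another is an assumption.

```python
from math import prod


def conv3d_params(kernel, c_in, c_out, bias=True):
    """Number of trainable weights in a standard 3D convolution."""
    return prod(kernel) * c_in * c_out + (c_out if bias else 0)


def factorized_params(kernels, c_in, c_out, bias=True):
    """Weights for a chain of convolutions applied one after another (assumed)."""
    total, channels = 0, c_in
    for kernel in kernels:
        total += conv3d_params(kernel, channels, c_out, bias)
        channels = c_out
    return total


# Hypothetical channel counts, for illustration only
full = conv3d_params((5, 5, 5), 32, 32)
separable = factorized_params([(5, 1, 1), (1, 5, 1), (1, 1, 5)], 32, 32)
```

For equal input and output channel counts the factorized form needs roughly 15/125 of the weights of the full kernel, while covering the same 5*5*5 receptive field.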

ResNet

Uses the bottleneck block instead of basic_block for the implementation.
A bottleneck residual block consists of:
- (1, 1, 1) conv_bn_relu
- (3, 3, 3) conv_bn_relu
- (1, 1, 1) conv_bn_relu
- (optional in config) kernel_size=(3, 3, 3), strides=(2, 2, 2) conv_bn_relu for compression.
- Add (not Concatenate) with the input
RESNET_BLOCKS is left in the config as a value to tune.
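The block layout above can be sketched in Keras roughly as follows. This is not the code from model_ResNet.py. The filter counts, the `compress` flag name, and the 1*1*1 projection on the shortcut (needed so that Add receives tensors of the same shape) are assumptions made for the illustration.

```python
from keras import Input, Model, layers


def conv_bn_relu(x, filters, kernel_size, strides=(1, 1, 1)):
    """Conv3D followed by batch normalization and ReLU."""
    x = layers.Conv3D(filters, kernel_size, strides=strides, padding="same")(x)
    x = layers.BatchNormalization()(x)
    return layers.Activation("relu")(x)


def bottleneck_block(x, filters, compress=False):
    """Bottleneck residual block: 1x1x1 -> 3x3x3 -> 1x1x1, then Add with the input."""
    shortcut = x
    y = conv_bn_relu(x, filters // 4, (1, 1, 1))   # reduce channels (ratio assumed)
    y = conv_bn_relu(y, filters // 4, (3, 3, 3))
    y = conv_bn_relu(y, filters, (1, 1, 1))        # restore channels
    if compress:
        # Optional strided convolution that halves each spatial dimension
        y = conv_bn_relu(y, filters, (3, 3, 3), strides=(2, 2, 2))
        # Assumption: project the shortcut so both Add inputs match in shape
        shortcut = layers.Conv3D(filters, (1, 1, 1), strides=(2, 2, 2), padding="same")(shortcut)
    elif shortcut.shape[-1] != filters:
        shortcut = layers.Conv3D(filters, (1, 1, 1), padding="same")(shortcut)
    return layers.Add()([shortcut, y])
```

Stacking a configurable number of such blocks is one plausible reading of what RESNET_BLOCKS controls.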

DenseNet

The DenseNet implementation draws heavily on the original paper: https://arxiv.org/abs/1608.06993
- 3 dense_block with 5 bn_relu_conv layers, following the paper.
- A transition_block after every dense_block, except the last one.
- Optional config for DenseNet-BC (as the paper calls it): 1*1*1 depth-wise separable conv, and transition_block compression.
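As a worked example of how the channel count evolves through such a network, the sketch below follows the bookkeeping from the DenseNet paper: each layer in a dense block concatenates `growth_rate` new feature maps, and a transition block with compression factor theta keeps floor(theta * m) of them. The initial channel count and growth rate are hypothetical values. Reading "5 bn_relu_conv layers" as five layers per block is an assumption.

```python
import math


def densenet_channels(initial, growth_rate, num_blocks=3,
                      layers_per_block=5, compression=1.0):
    """Return the channel count at the output of each dense block."""
    channels = initial
    trace = []
    for block in range(num_blocks):
        # Every layer concatenates `growth_rate` new feature maps
        channels += layers_per_block * growth_rate
        trace.append(channels)
        # A transition block follows every dense block except the last one
        if block != num_blocks - 1:
            channels = math.floor(channels * compression)
    return trace


# Hypothetical settings, for illustration only
plain = densenet_channels(initial=32, growth_rate=16)
compressed = densenet_channels(initial=32, growth_rate=16, compression=0.5)
```

With compression 0.5 the final block ends at 148 channels instead of 272, which is the kind of saving the DenseNet-BC option is meant to provide.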

Fine Tuning & Lessons Learned

Learning rate: 3e-5 works well for UNet, 1e-4 works well for the classification models.
Because of memory limits, a batch size of 16 is used.
Data augmentation: shift, rotate, etc.
Visualization cannot be more important!!!
coord (x, y, z) corresponds to (width, height, depth). Mixing these up causes nasty bugs.
Putting all config in one file saves a lot of time. Keep everything clean and tidy.
Disk reads are the bottleneck. Read from an SSD.
Each run gets its own log directory, for better TensorBoard visualization. Name it like /train_logs/<model-name>-run-<hour>-<minute>.
There are lots of debug options in the config file.
Sampling probability is boosted 4 times for tumors < 10mm, 3 times for tumors > 10mm and < 30mm, and left unchanged for tumors > 30mm. This gives more focus to small tumors.
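The size-dependent boost can be expressed as sampling weights. The sketch below is one possible reading of it, not the repository's generator code: the function names are hypothetical, and so is the treatment of tumors that measure exactly 10mm or 30mm, which the text does not specify.

```python
import random


def sampling_weight(diameter_mm):
    """Relative sampling weight for a tumor of the given diameter."""
    if diameter_mm < 10:
        return 4.0
    if diameter_mm <= 30:  # boundary handling at 10mm and 30mm is an assumption
        return 3.0
    return 1.0


def sample_tumor_indices(diameters_mm, k, seed=None):
    """Draw k tumor indices with replacement, favoring small tumors."""
    rng = random.Random(seed)
    weights = [sampling_weight(d) for d in diameters_mm]
    return rng.choices(range(len(diameters_mm)), weights=weights, k=k)
```

For example, `sample_tumor_indices([5.0, 20.0, 40.0], k=16, seed=0)` draws the 5mm tumor about four times as often as the 40mm one on average.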


tianchi-medical-lungtumordetect's Issues

Why are nodules smaller than 12 dropped?

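I find a lot of the author's code confusing. I got everything to run in testing, but I noticed the following problem:
1. The UNet is trained only on nodules larger than 12 and with a lung-area ratio above 0.1, so a lot of nodules are left out.
I hope the author can reply after seeing this.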

Test dataset

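Hello, and thank you for sharing. How did you test on your test dataset? And is the data segmented by train_segmentation not used in the training of train_classification? Those are my two questions. Thank you, and I look forward to your reply.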

Question about get_block and get_mask not matching in generators during UNet segmentation

mask[coord[0] - radius[0]:coord[0] + radius[0] + 1,
coord[1] - radius[1]:coord[1] + radius[1] + 1,
coord[2] - radius[2]:coord[2] + radius[2] + 1] = 1.0
This statement uses the nodule radius to build a cube at the exact center of the mask, with the nodule region set to 1.
w, h, d = int(coord[0] - shape[0] // 2), int(coord[1] - shape[1] // 2), int(coord[2] - shape[2] // 2)
w, h, d = max(w, 0), max(h, 0), max(d, 0)
w, h, d = min(w, W - shape[0] - 1), min(h, H - shape[1] - 1), min(d, D - shape[2] - 1)
block = hf['img'][w:w + shape[0], h:h + shape[1], d:d + shape[2]]
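These lines generate the block. If w, h, d = max(w, 0), max(h, 0), max(d, 0) clamps any of the values to 0, the nodule is no longer at the exact center of the generated block. Won't the block then fail to match its mask?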
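One way to make the block and the mask agree regardless of clamping is to build the mask in the block's own coordinates, using the same clamped start offset for both. The sketch below shows the idea with NumPy. It is not the repository's get_block or get_mask, the function name is hypothetical, and it assumes the volume is at least as large as the block along every axis.

```python
import numpy as np


def crop_block_and_mask(volume, coord, radius, shape):
    """Crop a block around `coord` and build a mask aligned with that block."""
    # Start of the block: centered on the nodule, then clamped into the volume
    start = []
    for c, s, dim in zip(coord, shape, volume.shape):
        lo = int(c) - s // 2
        start.append(min(max(lo, 0), dim - s))
    stop = [st + s for st, s in zip(start, shape)]
    block = volume[start[0]:stop[0], start[1]:stop[1], start[2]:stop[2]]

    # Nodule center expressed relative to the (possibly shifted) block
    local = [int(c) - st for c, st in zip(coord, start)]
    lo = [max(l - r, 0) for l, r in zip(local, radius)]
    hi = [min(l + r + 1, s) for l, r, s in zip(local, radius, shape)]

    mask = np.zeros(shape, dtype=np.float32)
    mask[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]] = 1.0
    return block, mask, tuple(start)
```

When no clamping happens, the nodule cube lands at the center of the mask, as in the original snippet. When the block is shifted, the cube shifts by the same amount.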