yuanqing811 / isic2018

ISIC 2018: Skin Lesion Analysis Towards Melanoma Detection

Python 100.00%

deep-learning computer-vision biomedical-image-processing melanoma-recognition

https://challenge2018.isic-archive.com/

Summary

Update (July 15, 2018): added k-fold validation, validation/test prediction, and submission generation.

This repository provides a starting solution for Task 1 and Task 3 of the ISIC 2018 challenge, based on Keras/TensorFlow.

The performance currently achieved is:

Task 1	Task 3
81.5% mean Jaccard	83% accuracy
77.2% thresholded Jaccard	68.5% mean recall

We support most of the backbones supported by Keras (Inception, DenseNet, VGG, etc.). For the segmentation problem, we additionally support using Keras pre-trained backbones in a U-Net type structure.

The code is highly configurable, allowing you to change and try many aspects of the algorithm. Below, we describe how to run a baseline solution.

Installation / setup

This code uses: Python 3.5, Keras 2.1.6, and TensorFlow 1.8.0. Please see the requirements file for needed packages.

Please make sure that your project directory is in your PYTHONPATH.

export PYTHONPATH="${PYTHONPATH}:yourprojectpath"
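
To confirm that the setting took effect, the import search path can be checked from Python. The following is a minimal sketch, not part of the repository; project_on_path is a hypothetical helper name, and the script assumes it is launched from the project root.

```python
import os
import sys


def project_on_path(project_dir, search_path=None):
    """Return True if project_dir is one of the import search-path entries."""
    search_path = sys.path if search_path is None else search_path
    target = os.path.realpath(project_dir)
    # An empty entry means "current directory", so resolve it the same way.
    return any(os.path.realpath(entry or os.getcwd()) == target
               for entry in search_path)


if __name__ == "__main__":
    # Assumption: the current working directory is the project root.
    print(project_on_path(os.getcwd()))
```

If this prints False, imports such as "from datasets.ISIC2018 import *" are likely to fail with ModuleNotFoundError.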

Note that we use the development version of scikit-image for image resizing because it supports anti-aliasing. You can install the development version directly from GitHub. Alternatively, you can change the resize call in load_image_by_id in datasets/ISIC2018/__init__.py so that it does not use the anti-aliasing flag.
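
One way to tolerate both old and new scikit-image releases is to retry the call without the flag when it is rejected. This is only a sketch under assumptions: resize_compat is a hypothetical wrapper that does not exist in the repository, and it assumes the resize function raises TypeError for an unknown keyword, as plain Python functions do.

```python
def resize_compat(resize_fn, image, output_shape):
    """Call resize_fn with anti_aliasing=True, retrying without it if unsupported."""
    try:
        return resize_fn(image, output_shape, anti_aliasing=True)
    except TypeError:
        # Older releases reject the keyword; fall back to a plain resize.
        # Caveat: this also hides unrelated TypeErrors raised inside resize_fn.
        return resize_fn(image, output_shape)


# Intended use (assumes scikit-image is installed):
#   from skimage.transform import resize
#   small = resize_compat(resize, image, (224, 224))
```

Without anti-aliasing, downscaled images may show aliasing artifacts, so results could differ slightly from those reported here.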

Data preparation

Place the unzipped ISIC 2018 data in the folder datasets/ISIC2018/data. This folder should have the following subfolders:

ISIC2018_Task1-2_Training_Input
ISIC2018_Task1-2_Validation_Input
ISIC2018_Task1-2_Test_Input
ISIC2018_Task1_Training_GroundTruth
ISIC2018_Task3_Training_GroundTruth
- This folder must also include the ISIC2018_Task3_Training_LesionGroupings.csv file.
ISIC2018_Task3_Training_Input
ISIC2018_Task3_Validation_Input
ISIC2018_Task3_Test_Input
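
Before starting the lengthy pre-processing step, it can help to verify that layout. The sketch below is not part of the repository; missing_subfolders is a hypothetical helper, and the folder names are copied from the list above.

```python
from pathlib import Path

# Subfolder names copied from the list above.
EXPECTED_SUBFOLDERS = [
    "ISIC2018_Task1-2_Training_Input",
    "ISIC2018_Task1-2_Validation_Input",
    "ISIC2018_Task1-2_Test_Input",
    "ISIC2018_Task1_Training_GroundTruth",
    "ISIC2018_Task3_Training_GroundTruth",
    "ISIC2018_Task3_Training_Input",
    "ISIC2018_Task3_Validation_Input",
    "ISIC2018_Task3_Test_Input",
]


def missing_subfolders(data_dir):
    """Return the expected subfolders that do not exist under data_dir."""
    root = Path(data_dir)
    return [name for name in EXPECTED_SUBFOLDERS if not (root / name).is_dir()]


if __name__ == "__main__":
    # Assumption: run from the project root.
    for name in missing_subfolders("datasets/ISIC2018/data"):
        print("missing:", name)
```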

Data pre-processing

We resize all images to 224x224x3 and store them in NumPy files for ease and speed of processing. You can run datasets/ISIC2018/preprocess_data.py to do the pre-processing; otherwise it is done the first time you call a function that needs the pre-processed data. This can take a few hours to complete.

Data visualization

You can visualize the data by running misc_utils/visualization_utils.py. You should see figures like the one below:

Task 1 image

Training/Prediction

Task 1 (Segmentation)

Solution

The solution uses an encoder and a decoder in a U-Net type structure. The encoder can be one of the pretrained models such as vgg16. The default backbone, which trains reasonably well, is vgg16. Run the script runs/seg_train.py to train.

Set num_folds to 5 if you want to do 5-fold training. Set it to 1 if you want to use a single fold.

Task 1 results

Run the script runs/seg_eval.py to evaluate the network. On the validation set of about 400 images we get: mean Jaccard = 0.815 and thresholded Jaccard = 0.772, where the thresholded Jaccard sets any per-image score below 0.65 to zero before averaging.
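
The two metrics can be illustrated on toy masks. This is a sketch, not the code in runs/seg_eval.py: the function names are hypothetical, masks are flat lists of 0/1 for brevity, and the thresholding rule (scores below 0.65 count as zero) is assumed to follow the challenge's convention.

```python
def jaccard(pred_mask, true_mask):
    """Jaccard index (intersection over union) of two flat binary masks."""
    intersection = sum(1 for p, t in zip(pred_mask, true_mask) if p and t)
    union = sum(1 for p, t in zip(pred_mask, true_mask) if p or t)
    # Two empty masks agree perfectly; scoring that as 1.0 is a choice made here.
    return intersection / union if union else 1.0


def mean_and_thresholded_jaccard(scores, threshold=0.65):
    """Return (mean Jaccard, mean Jaccard with scores below threshold zeroed)."""
    mean = sum(scores) / len(scores)
    thresholded = sum(s if s >= threshold else 0.0 for s in scores) / len(scores)
    return mean, thresholded


scores = [
    jaccard([1, 1, 1, 0], [1, 1, 0, 0]),  # 2/3, kept
    jaccard([1, 0, 0, 0], [1, 1, 0, 0]),  # 1/2, zeroed by the threshold
    jaccard([1, 1, 0, 0], [1, 1, 0, 0]),  # 1, kept
]
print(mean_and_thresholded_jaccard(scores))
```

The thresholded score is never higher than the plain mean, which is consistent with 0.772 versus 0.815 above.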

Result Visualization

Task 1 prediction

Run runs/cls_predict.py to make predictions on the validation and test sets and generate a submission. The submission will be written to the submissions directory.

Settings:

- Set num_folds to 5 if you trained with 5 folds, or to 1 if you used a single fold.
- Set TTA = False if you do not want to use test-time augmentation (which averages predictions over rotations of the image).
- Set pred_set = 'test' to predict on the test set, or 'validation' to predict on the validation set.

Task 3 (Classification)

Solution

The solution uses transfer learning from one of the pretrained models such as vgg16. The default network, which trains reasonably well, is inception_v3. Run the script runs/cls_train.py to train.

Set num_folds to 5 if you want to do 5-fold training. Set it to 1 if you want to use a single fold.
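
As a rough illustration of what a 5-fold partition looks like, the sketch below deals shuffled sample indices into folds and holds one out for validation. It is not the repository's partitioning code: the function names are hypothetical, the sample count 10015 is used only as an example, and a plain shuffle ignores lesion groupings, which a real split would probably need to respect.

```python
import random


def kfold_indices(n_samples, k=5, seed=0):
    """Shuffle indices 0..n_samples-1 and deal them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def train_valid_split(folds, valid_fold):
    """Use one fold for validation and the remaining folds for training."""
    valid = folds[valid_fold]
    train = [i for j, fold in enumerate(folds) if j != valid_fold for i in fold]
    return train, valid


folds = kfold_indices(10015, k=5)
train, valid = train_valid_split(folds, valid_fold=0)
print(len(train), len(valid))  # 8012 2003
```

With k folds, training is repeated k times, each time holding out a different fold.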

Task 3 results

Run the script runs/cls_eval.py. Make sure its configuration matches the one used in runs/cls_train.py.

The result below is based on training a single InceptionV3 model for 30 epochs, and is based on roughly 2000 validation images.

Confusion Matrix:

True\Pred	MEL	NV	BCC	AKIEC	BKL	DF	VASC	TOTAL
MEL	0.58	0.34	0.01	0.00	0.06	0.00	0.00	231
NV	0.05	0.93	0.01	0.00	0.02	0.00	0.00	1324
BCC	0.07	0.15	0.63	0.11	0.03	0.01	0.00	89
AKIEC	0.07	0.10	0.04	0.55	0.22	0.00	0.00	67
BKL	0.08	0.10	0.02	0.00	0.79	0.00	0.00	240
DF	0.17	0.00	0.00	0.11	0.06	0.67	0.00	18
VASC	0.12	0.12	0.00	0.00	0.12	0.00	0.65	34
TOTAL	233	1352	74	53	253	16	22

Precision/Recall:

	MEL	NV	BCC	AKIEC	BKL	DF	VASC	MEAN
precision	0.575	0.907	0.757	0.698	0.751	0.750	1.000	0.777
recall	0.580	0.926	0.629	0.552	0.792	0.667	0.647	0.685
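
The MEAN column is the unweighted (macro) average over the seven classes, which can be checked directly from the table. The sketch below also shows how per-class precision and recall follow from a matrix of raw counts; the function names are hypothetical and this is not the code in runs/cls_eval.py.

```python
CLASSES = ["MEL", "NV", "BCC", "AKIEC", "BKL", "DF", "VASC"]
# Per-class values copied from the table above.
PRECISION = [0.575, 0.907, 0.757, 0.698, 0.751, 0.750, 1.000]
RECALL = [0.580, 0.926, 0.629, 0.552, 0.792, 0.667, 0.647]


def macro_mean(values):
    """Unweighted mean over classes, so rare classes count as much as common ones."""
    return sum(values) / len(values)


def per_class_precision_recall(counts):
    """counts[i][j] = number of samples of true class i predicted as class j."""
    n = len(counts)
    precision, recall = [], []
    for c in range(n):
        predicted_as_c = sum(counts[i][c] for i in range(n))  # column total
        truly_c = sum(counts[c])                               # row total
        precision.append(counts[c][c] / predicted_as_c if predicted_as_c else 0.0)
        recall.append(counts[c][c] / truly_c if truly_c else 0.0)
    return precision, recall


print(round(macro_mean(PRECISION), 3), round(macro_mean(RECALL), 3))  # 0.777 0.685
```

Because NV dominates the validation set, overall accuracy is much higher than the mean recall.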

Result Visualization

Correct predictions are in green and wrong predictions are in red.

Task 3 prediction

Run runs/cls_predict.py to make predictions on the validation and test sets and generate a submission. The submission will be written to the submissions directory.

Settings:

- Set num_folds to 5 if you trained with 5 folds, or to 1 if you used a single fold.
- Set TTA = False if you do not want to use test-time augmentation (which averages predictions over rotations of the image).
- Set pred_set = 'test' to predict on the test set, or 'validation' to predict on the validation set.
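
The idea behind test-time augmentation can be sketched as follows. This is an illustration under assumptions, not the repository's implementation: predict_with_tta and rotate90 are hypothetical names, the image is a plain list of rows, and four 90-degree rotations are assumed (the actual set of rotations may differ).

```python
def rotate90(image):
    """Rotate a 2-D image (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def predict_with_tta(predict_fn, image, num_rotations=4):
    """Average predict_fn's class scores over successive 90-degree rotations."""
    totals = None
    for _ in range(num_rotations):
        scores = list(predict_fn(image))
        totals = scores if totals is None else [a + b for a, b in zip(totals, scores)]
        image = rotate90(image)
    return [t / num_rotations for t in totals]
```

For classification the scores can be averaged directly. For segmentation, each predicted mask would first have to be rotated back to the original orientation before averaging.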

Miscellaneous

Backbones supported: inception_v3, vgg16, vgg19, resnet50, densenet121, densenet169, densenet201.

Model data and logs are written to the model_data directory.

isic2018's Issues

About the training CPU or GPU

Sorry, may I ask whether this training process uses the CPU or the GPU? I don't know how to train on the GPU with this code.

parent class Backbone

Where is this parent class inherited from?

ISIC2018_Task3_Training_LesionGroupings.csv not found

partition_task3_data tries to read a file

task3_sup_fname = 'ISIC2018_Task3_Training_LesionGroupings.csv' :

df = pd.read_csv(os.path.join(task3_gt_dir, task3_sup_fname))

Where can I find this file?

https://challenge.kitware.com/#phase/5abcbc6f56357d0139260e66

Only provides

ISIC2018_Task3_Training_GroundTruth.csv

Could not find a version that satisfies the requirement keras-contrib==2.0.8

I tried to install the requirements using

pip install -r requirements.txt

But I got

Collecting Keras==2.1.6 (from -r requirements.txt (line 26))
  Using cached https://files.pythonhosted.org/packages/54/e8/eaff7a09349ae9bd40d3ebaf028b49f5e2392c771f294910f75bb608b241/Keras-2.1.6-py2.py3-none-any.whl
Collecting keras-contrib==2.0.8 (from -r requirements.txt (line 27))
  Could not find a version that satisfies the requirement keras-contrib==2.0.8 (from -r requirements.txt (line 27)) (from versions: )
No matching distribution found for keras-contrib==2.0.8 (from -r requirements.txt (line 27))

IndexError: index 0 is out of bounds for axis 0 with size 0

Hi,
I want to ask about this error I encountered when trying to run seg_train.py:
Traceback (most recent call last):
  File "seg_train.py", line 107, in <module>
    if y_train[0].max() > 1:
IndexError: index 0 is out of bounds for axis 0 with size 0

I cannot proceed to train after loading the image.
Can you help me with this?
Thank you and best regards,
Duc

About preprocess_data.py

Hello, I have version 0.13.0 of scikit-image, but anti_aliasing isn't available in it yet. I get the error "TypeError: resize() got an unexpected keyword argument 'anti_aliasing'". How can I solve this? Thanks very much!

dataset

Hello, could you upload the ISIC 2018 dataset? I tried downloading it from the website, but the download always fails at about 50%.

Task 3 classification

For the Task 3 classification, what other experiments and improvements did you try beyond this? Thank you.

ModuleNotFoundError: No module named 'datasets.ISIC2018'

I am trying to follow your instructions, but when I run

C:\Users\me\GitHub\ISIC2018Challenge>python runs/cls_train.py

I get

Traceback (most recent call last):
  File "runs/cls_train.py", line 4, in <module>
    from datasets.ISIC2018 import *
ModuleNotFoundError: No module named 'datasets.ISIC2018'

The subfolder /datasets/ISIC2018 is there, but apparently the import statement in cls_train.py does not recognize the folder as a module.

Did I miss a step in the setup?

Have you published a paper?

Your code is really great! I wonder if you have published a paper?

Change request: Provide an evaluation setup for development

For development, it would be really convenient to have a setup running on only a few samples, to get a quicker feedback. I have already tried and removed the majority of the images and the corresponding rows in the csv for Task 3. But then I ran into

_init__.py", line 232, in partition_task3_data
    y_valid = y[valid_indices]
IndexError: boolean index did not match indexed array along dimension 0; dimension is 10015 but corresponding boolean dimension is 59

TypeError: can't pickle _thread.lock objects

Running cls_train.py with debug_visualize = True triggered the following error:

Epoch 1/50
Exception in thread Thread-2:
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\lib\threading.py", line 916, in _bootstrap_inner
    self.run()
  File "C:\ProgramData\Anaconda3\lib\threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\keras\utils\data_utils.py", line 548, in _run
    with closing(self.executor_fn(_SHARED_SEQUENCES)) as executor:
  File "C:\ProgramData\Anaconda3\lib\site-packages\keras\utils\data_utils.py", line 522, in <lambda>
    initargs=(seqs,))
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\context.py", line 119, in Pool
    context=self.get_context())
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\pool.py", line 174, in __init__
    self._repopulate_pool()
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\pool.py", line 239, in _repopulate_pool
    w.start()
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: can't pickle _thread.lock objects

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input

There may be a problem with

def save_model_config(run_name, **model_config):
    config_file = get_model_config_filename(run_name=run_name)
    with open(config_file, 'wb') as file:
        pickle.dump(model_config, file)

Apparently, using pickle to save Keras models is not recommended.

KeyError: 0 in keras\utils\data_utils.py

Running

cls_train.py

on a Windows 10 machine

triggers a KeyError in keras\utils\data_utils.py

with the following output log:

C:\ProgramData\Anaconda3\lib\site-packages\h5py\__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Using TensorFlow backend.
10015it [00:00, 22019.01it/s]
2018-06-17 18:26:57.254087: I C:\tf_jenkins\workspace\rel-win\M\windows\PY\36\tensorflow\core\platform\cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to  
==================================================================================================
input_1 (InputLayer)            (None, 224, 224, 3)  0                         
....
predictions (Dense)             (None, 7)            903         dropout_1[0][0]
__________________________________________________________________________________________________
outputs (Activation)            (None, 7)            0           predictions[0][0]
==================================================================================================
Total params: 22,065,959
Trainable params: 22,031,527
Non-trainable params: 34,432
__________________________________________________________________________________________________
num_dense_layers     : 1
num_dense_units      : 128
dropout_rate         : 0.0
pooling              : avg
class_wt_type        : ones
dense_layer_regularizer : L1
class_wt_type        : ones
learning_rate        : 0.0001
batch_size           : 32
use_data_aug         : True
horizontal_flip      : True
vertical_flip        : True
width_shift_range    : 0.1
height_shift_range   : 0.1
rotation_angle       : 180
n_samples_train      : 8012
n_samples_valid      : 2003
Epoch 1/50
C:\ProgramData\Anaconda3\lib\site-packages\h5py\__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
C:\ProgramData\Anaconda3\lib\site-packages\h5py\__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
C:\ProgramData\Anaconda3\lib\site-packages\h5py\__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
C:\ProgramData\Anaconda3\lib\site-packages\h5py\__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
C:\ProgramData\Anaconda3\lib\site-packages\h5py\__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
C:\ProgramData\Anaconda3\lib\site-packages\h5py\__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
C:\ProgramData\Anaconda3\lib\site-packages\h5py\__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Using TensorFlow backend.
Using TensorFlow backend.
Using TensorFlow backend.
Using TensorFlow backend.
Using TensorFlow backend.
C:\ProgramData\Anaconda3\lib\site-packages\h5py\__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Using TensorFlow backend.
Using TensorFlow backend.
Using TensorFlow backend.
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "C:\ProgramData\Anaconda3\lib\site-packages\keras\utils\data_utils.py", line 390, in get_index
    return _SHARED_SEQUENCES[uid][i]
KeyError: 0
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\lib\site-packages\keras\utils\data_utils.py", line 564, in get
    inputs = self.queue.get(block=True).get()
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\pool.py", line 644, in get
    raise self._value
KeyError: 0

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\Ralph\Documents\GitHub\ISIC2018Challenge\runs\cls_train.py", line 131, in <module>
    use_multiprocessing=True)
  File "C:\ProgramData\Anaconda3\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\keras\engine\training.py", line 2212, in fit_generator
    generator_output = next(output_generator)
  File "C:\ProgramData\Anaconda3\lib\site-packages\keras\utils\data_utils.py", line 570, in get
    six.raise_from(StopIteration(e), e)
  File "<string>", line 3, in raise_from
StopIteration: 0

About script runs/cls_train.py

Hello, when I run the script runs/cls_train.py, I get the following error. Can you help me solve it? Thanks very much!
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "D:\Anaconda3\envs\tensorflow_gpu\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "D:\Anaconda3\envs\tensorflow_gpu\lib\multiprocessing\spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input
Exception in thread Thread-2:
Traceback (most recent call last):
  File "D:\Anaconda3\envs\tensorflow_gpu\lib\threading.py", line 916, in _bootstrap_inner
    self.run()
  File "D:\Anaconda3\envs\tensorflow_gpu\lib\threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "D:\Anaconda3\envs\tensorflow_gpu\lib\site-packages\keras\utils\data_utils.py", line 548, in _run
    with closing(self.executor_fn(_SHARED_SEQUENCES)) as executor:
  File "D:\Anaconda3\envs\tensorflow_gpu\lib\site-packages\keras\utils\data_utils.py", line 522, in <lambda>
    initargs=(seqs,))
  File "D:\Anaconda3\envs\tensorflow_gpu\lib\multiprocessing\context.py", line 119, in Pool
    context=self.get_context())
  File "D:\Anaconda3\envs\tensorflow_gpu\lib\multiprocessing\pool.py", line 174, in __init__
    self._repopulate_pool()
  File "D:\Anaconda3\envs\tensorflow_gpu\lib\multiprocessing\pool.py", line 239, in _repopulate_pool
    w.start()
  File "D:\Anaconda3\envs\tensorflow_gpu\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "D:\Anaconda3\envs\tensorflow_gpu\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "D:\Anaconda3\envs\tensorflow_gpu\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
    reduction.dump(process_obj, to_child)
  File "D:\Anaconda3\envs\tensorflow_gpu\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
OverflowError: cannot serialize a bytes object larger than 4 GiB

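Best results achieved

Hello, what is the highest score you were able to reach using the newly partitioned dataset?
We held out 10% of the training set as a validation set; the final accuracy on the validation set reached about 90%, but balanced_acc was only about 80%.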

Task3 issues

@yuanqing811 Thanks for your code!
There are some issues when running your code:
1- When running cls_train.py, it uses only 32 instances, and there are no instances for some categories such as DF or VASC, as shown in the screenshots. In the evaluation phase, the results are not as good as what you have stated. I have attached the results I get when running cls_eval.py.
Could you please suggest a solution to this issue?

![screenshot from 2018-07-05 10-27-50](https://user-images.githubusercontent.com/38858584/42329213-280252e4-803e-11e8-82db-1e7b01c73f3d.png)

Submission to task3 only achieves overall score 0.558

Hi, thanks for your code.

I trained Task 3 following the instructions. The result below is based on training a single model for 30 epochs, and is based on roughly 2000 validation images.

precision [0.54385965 0.88107324 0.64893617 0.51136364 0.67420814 0.4
0.725 ]
recall [0.4025974 0.91767372 0.68539326 0.67164179 0.62083333 0.22222222
0.85294118]

The results are close to your reported results on the validation images.

However, when I run this model on the test images and submit the result, the overall score is only 0.558. I set TTA = False. This is much lower than the reported results.
How can I solve this problem?

Many thanks for your reply.
