friedunit / cnn-cancer-detection

Kaggle Histopathologic Cancer Detection Competition

Home Page: https://www.kaggle.com/competitions/histopathologic-cancer-detection/overview

Jupyter Notebook 100.00%

cancer-detection cnn cnn-keras deep-learning


CNN-Cancer-Detection

Overview

The goal of this project is to create an algorithm that identifies metastatic cancer in small image patches taken from larger digital pathology scans. The data is a slightly modified version of the PatchCamelyon (PCam) benchmark dataset.

This is a binary classification problem on digital images of 96 × 96 pixels.

The data set was downloaded from https://www.kaggle.com/c/histopathologic-cancer-detection/data
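Before exploring the labels, it can help to confirm that every labelled id has a matching image and vice versa. The sketch below uses only the standard library. It assumes the Kaggle layout: a labels CSV with 'id' and 'label' columns (named train_labels.csv in the competition download) and one image per id named `<id>.tif`. Both the file name and the `.tif` extension are assumptions to verify against your copy. The function name `check_train_folder` is hypothetical.

```python
import csv
from pathlib import Path


def check_train_folder(labels_csv, train_dir, ext=".tif"):
    """Compare ids in the labels CSV with image files in the train folder.

    Assumes one image per id named '<id><ext>' (ext defaults to '.tif',
    an assumption about the Kaggle download). Returns two sorted lists:
    ids that have no image, and images that have no label row.
    """
    with open(labels_csv, newline="") as f:
        ids = {row["id"] for row in csv.DictReader(f)}
    # Path.stem drops the extension, so 'abc.tif' -> 'abc'
    files = {p.stem for p in Path(train_dir).glob(f"*{ext}")}
    return sorted(ids - files), sorted(files - ids)
```

If both returned lists are empty, the 'train' folder and the label file line up one-to-one.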

Initial findings in the data:

The data frame description shows 220,025 observations, each with a unique id.
There are 2 columns: 'id' matches an image in the 'train' folder, and 'label' is a binary integer where 0 means false (non-cancerous) and 1 means true (cancerous).
'id' currently has the object data type and can be converted to a string. 'label' is currently an integer and can be converted to a categorical (factor) type.
The downloaded data set has 2 folders. 'train' contains 220,025 images corresponding to the ids in the data frame, and 'test' contains 57,458 images.
The value counts of 'label' show noticeably more non-cancerous (0) than cancerous (1) samples: 130,908 versus 89,117 (about 59.5% versus 40.5%).
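The dtype changes and class-balance check described in these findings can be sketched with pandas as follows. `summarize_labels` is a hypothetical helper, not code from the notebook. It assumes a DataFrame with 'id' and 'label' columns, such as the one read from the labels CSV.

```python
import pandas as pd


def summarize_labels(df: pd.DataFrame):
    """Convert dtypes as described above and report the class balance.

    Returns (converted copy of df, summary with a count and share per label).
    """
    out = df.copy()
    out["id"] = out["id"].astype("string")          # object -> pandas string dtype
    out["label"] = out["label"].astype("category")  # int -> categorical ("factor")
    if not out["id"].is_unique:
        raise ValueError("duplicate ids found in the label file")
    counts = out["label"].value_counts()
    summary = pd.DataFrame({"count": counts, "share": counts / counts.sum()})
    return out, summary
```

On the full label file, the 'share' column should show roughly 0.595 for label 0 and 0.405 for label 1, matching the counts reported above.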

Model Architecture

For the model structure, I experimented with a few different designs built on the Sequential model. The final version has 3 blocks of 3 convolutional layers each, all using 3x3 filters: the first block has 32 filters per layer, the second 64, and the third 128. Each block uses ReLU activation and includes 2x2 max pooling and batch normalization. After the 3 blocks, I added a flatten layer, 3 dropout layers, and 3 dense layers before the final dense output layer with sigmoid activation.
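The architecture described above can be sketched in tf.keras roughly as follows. This is a reconstruction, not the notebook's code. Several details the README does not specify are assumptions:

- 'same' padding.
- Filter counts of 32/64/128 applied per block.
- Batch normalization placed before pooling.
- Dense widths of 256/128/64.
- A dropout rate of 0.3.
- The Adam optimizer.
- 3-channel (RGB) 96 × 96 input.

```python
import tensorflow as tf


def build_model(input_shape=(96, 96, 3)):
    """Sketch of a 3-block CNN for binary patch classification.

    Hypothetical choices (not stated in the README): 'same' padding,
    BatchNormalization before MaxPooling, dense widths 256/128/64,
    dropout rate 0.3, Adam optimizer.
    """
    model = tf.keras.Sequential([tf.keras.Input(shape=input_shape)])
    for filters in (32, 64, 128):
        # Three 3x3 conv layers per block, all with the block's filter count
        for _ in range(3):
            model.add(tf.keras.layers.Conv2D(filters, (3, 3), padding="same",
                                             activation="relu"))
        model.add(tf.keras.layers.BatchNormalization())
        model.add(tf.keras.layers.MaxPooling2D((2, 2)))  # halves height and width
    model.add(tf.keras.layers.Flatten())  # 12 x 12 x 128 = 18,432 features
    # Three dense + dropout pairs (widths and rate are hypothetical)
    for units in (256, 128, 64):
        model.add(tf.keras.layers.Dense(units, activation="relu"))
        model.add(tf.keras.layers.Dropout(0.3))
    model.add(tf.keras.layers.Dense(1, activation="sigmoid"))  # P(cancerous)
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```

With 'same' padding, only the pooling layers shrink the image (96 → 48 → 24 → 12). With 'valid' padding, each 3x3 convolution would also trim 2 pixels per side pair, and the flatten size would be much smaller.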

Findings and Conclusion

I ran into many problems with TensorFlow versions, with training sessions getting stuck or freezing. I eventually reinstalled TensorFlow 2.15 with Python 3.9.16, after which these 12 epochs ran smoothly.

The accuracy and loss plots show that this model performed fairly well overall, but epoch 4 had very poor validation loss and accuracy. Training accuracy and loss kept improving while validation accuracy and loss jumped around. This suggests overfitting, with the model not generalizing well to unseen data. Accuracy had not yet flattened out after 12 epochs and loss was still declining. With more time, I would therefore train for more epochs and test different hyperparameters, such as the optimizer, the learning rate, and the layers used in the model.
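The overfitting signal discussed above (training loss falling while validation loss rises) can be checked directly from the training history. In Keras, `model.fit(..., validation_data=...)` returns a `History` object. Its `.history` dict holds per-epoch lists under keys such as "loss" and "val_loss". The helper `divergent_epochs` below is hypothetical and simply flags the epochs where the two curves move in opposite directions.

```python
def divergent_epochs(history):
    """Return 1-based epoch numbers where training loss decreased but
    validation loss increased compared with the previous epoch.

    `history` is a dict shaped like Keras' History.history, with
    equal-length lists under "loss" and "val_loss".
    """
    loss, val_loss = history["loss"], history["val_loss"]
    flagged = []
    for i in range(1, min(len(loss), len(val_loss))):
        if loss[i] < loss[i - 1] and val_loss[i] > val_loss[i - 1]:
            flagged.append(i + 1)  # convert 0-based index to epoch number
    return flagged
```

For example, `divergent_epochs(model_history.history)` should flag a spike like the one at epoch 4. Here `model_history` is a placeholder for whatever variable holds the return value of `fit`.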
