CNN-Cancer-Detection

Overview

The goal of this project is to create an algorithm that identifies metastatic cancer in small image patches taken from larger digital pathology scans. The data is a slightly modified version of the PatchCamelyon (PCam) benchmark set.

This is a binary classification problem on digital images of 96 x 96 px.

Initial findings in the data:

  • The data frame description shows 220,025 observations, all of them unique.
  • There are 2 columns: 'id' holds the id of the matching image in the 'train' folder, and 'label' is a binary integer where 0 means false (non-cancerous) and 1 means true (cancerous).
  • 'id' is currently an object data type and can be converted to a string. 'label' is currently an integer and can be converted to a factor (categorical).
  • Of the 2 folders in the downloaded data set, the 'train' folder holds 220,025 images corresponding to the ids in the data frame, and the 'test' folder holds 57,458.
  • The value counts of 'label' show more non-cancerous (0) labels than cancerous (1) labels: 130,908 to 89,117, respectively.
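To put the label counts above in proportion, here is a minimal sketch that turns them into class shares and a majority-class baseline. The function name `class_summary` is made up for illustration; only the two counts come from the findings above.

```python
# Label counts reported in the findings above (0 = non-cancerous, 1 = cancerous).
counts = {0: 130_908, 1: 89_117}


def class_summary(counts):
    """Return the total, the share of each class, and the majority-class baseline."""
    total = sum(counts.values())
    shares = {label: n / total for label, n in counts.items()}
    # Accuracy of a trivial model that always predicts the most common label.
    baseline_accuracy = max(shares.values())
    return total, shares, baseline_accuracy


total, shares, baseline_accuracy = class_summary(counts)
print(f"total observations: {total}")
for label, share in shares.items():
    print(f"label {label}: {share:.1%}")
print(f"majority-class baseline accuracy: {baseline_accuracy:.1%}")
```

Roughly 60% of the patches are non-cancerous, so a model's validation accuracy is only informative to the extent that it clears that baseline.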

image

Model Architecture

For the structure of the model, I tried a few different designs based on the Sequential model. The final version has 3 blocks with 3 CNN layers each. All filters are 3x3, with 32 filters in the first block, 64 in the second and 128 in the third. Each block uses 'ReLU' activation and includes max pooling with 2x2 filters and batch normalization. After the 3 blocks, I added a flatten layer, 3 dropout layers and 3 dense layers before the final sigmoid dense output layer.
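The description above could be sketched in Keras roughly as follows. This is a reconstruction under assumptions, not the project's actual code: the `"same"` padding, the 3-channel input, the dense layer widths, the dropout rate, the pairing of each dropout with a dense layer, and the Adam optimizer are all guesses that the text does not state. TensorFlow is imported inside `build_model` so that the shape arithmetic in `flattened_size` can be checked without it.

```python
BLOCK_FILTERS = (32, 64, 128)  # filters per block, from the description
CONVS_PER_BLOCK = 3            # "3 CNN layers each"


def flattened_size(side=96, padding="same"):
    """Length of the flatten layer's output for a square input of `side` px."""
    for _ in BLOCK_FILTERS:
        if padding == "valid":
            # Each unpadded 3x3 convolution trims 2 pixels from every side length.
            side -= 2 * CONVS_PER_BLOCK
        side //= 2  # 2x2 max pooling halves the side, rounding down
    return side * side * BLOCK_FILTERS[-1]


def build_model(input_shape=(96, 96, 3), dense_units=(256, 128, 64), dropout_rate=0.3):
    """Build the sketched CNN; dense_units and dropout_rate are assumed values."""
    from tensorflow import keras
    from tensorflow.keras import layers

    model = keras.Sequential()
    model.add(keras.Input(shape=input_shape))
    for filters in BLOCK_FILTERS:
        for _ in range(CONVS_PER_BLOCK):
            model.add(layers.Conv2D(filters, (3, 3), padding="same", activation="relu"))
        model.add(layers.MaxPooling2D(pool_size=(2, 2)))
        model.add(layers.BatchNormalization())
    model.add(layers.Flatten())
    # Three dropout/dense pairs (assumed ordering) before the output layer.
    for units in dense_units:
        model.add(layers.Dropout(dropout_rate))
        model.add(layers.Dense(units, activation="relu"))
    model.add(layers.Dense(1, activation="sigmoid"))  # binary output
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model
```

With TensorFlow installed, `build_model().summary()` prints the layer stack. Under the padding assumed here, the 96 px input shrinks to 12 px after the three pooling steps, so the flatten layer feeds 12 x 12 x 128 values into the dense layers.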

image

image

Findings and Conclusion

I ran into many problems with TensorFlow versions and with training sessions getting stuck or freezing. I finally reinstalled TensorFlow 2.15 with Python 3.9.16, and these 12 epochs then ran smoothly.

Looking at the accuracy and loss plots above, the model performed fairly well, but epoch 4 had very poor validation loss and accuracy. Training accuracy and loss improved steadily while validation accuracy and loss jumped around, which suggests overfitting and a model that does not generalize well to unseen data. With more time, I would extend the number of epochs and test different hyperparameters such as the optimizer, the learning rate, and different layers for the model. After 12 epochs the accuracy had not quite flattened out and the loss was still declining.
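The reading of the plots described above can also be done numerically. The sketch below assumes a dictionary shaped like the `history` attribute returned by Keras `model.fit` when validation data is supplied (keys `"loss"` and `"val_loss"`). The numbers are invented purely to illustrate a validation spike at one epoch; they are not this project's results, and `summarize_history` is a hypothetical helper.

```python
def summarize_history(history):
    """Locate the best and worst validation epochs and the train/validation gap."""
    train_loss = history["loss"]
    val_loss = history["val_loss"]
    epochs = range(len(val_loss))
    best = min(epochs, key=val_loss.__getitem__)
    worst = max(epochs, key=val_loss.__getitem__)
    # A large positive gap means validation loss is well above training loss.
    gaps = [v - t for t, v in zip(train_loss, val_loss)]
    return {
        "best_epoch": best + 1,  # epochs are reported 1-based
        "worst_epoch": worst + 1,
        "largest_gap_epoch": gaps.index(max(gaps)) + 1,
        "final_gap": gaps[-1],
    }


# Made-up values for illustration only (not the author's training run).
example_history = {
    "loss": [0.60, 0.48, 0.41, 0.36, 0.32, 0.29],
    "val_loss": [0.58, 0.50, 0.45, 1.20, 0.40, 0.37],
}

summary = summarize_history(example_history)
print(summary)
```

A single bad epoch followed by recovery, as in the made-up data, looks different from a gap that widens every epoch, so checking both the worst epoch and the final gap helps separate training instability from sustained overfitting.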

Contributors

friedunit
