Giter VIP home page Giter VIP logo

conv-neural-network's Introduction

Convolutional Neural Network

About the dataset


The CIFAR-10 dataset consists of 60000 32x32 color images in 10 classes, 6000 images per class. The training set is 50000 images and the test set is 10000 images.

Classes: Airplane, Automobile, Bird, Cat, Deer, Dog, Frog, Horse, Ship, Truck

Features: One image contains a vector of 3072 values. This corresponds to 3 values for each pixel. There are three values because these are color images. Labels: The labels are an integer between 0 and 9, representing the category.

The problem is a supervised mulitnomial classification problem. My goal is to minimize the error rate on the test set.

One Observation

ship image

Metrics


I will use accuracy on the test set as my final evaluation metric. I will use loss vs steps to determine a reasonable number of steps to trade off between training time and performance.

Baseline Models


Baseline model 1: an accuracy score for guessing a single class for every observation.

Baseline model 2: an accuracy score for using a random number generator to label randomly.

Accuracy on test set for all one class: 10%
Accuracy on test set for random class: 9.69%

Data Preprocessing


I started with the preprocessing shown in the Tensorflow Convolutional Neural Network tutorial:

  • Images cropped to 24x24 randomly for training and centrally for evaluation
  • Images approximately whitenened to make the model insensitive to dyanmic range
  • Dataset artificially increased by randomly flipping image from left to right
  • Dataset artificially increased by randomly distoring brightness and contrast of images

Conv Net Layer reference


http://cs231n.github.io/convolutional-networks/

Modeling


Simple Model Architecture:

  • Input: 128x24x24x3 [batch size, width, height, input depth]
  • Convolution Layer:
    • Number of Filters (K): 12
    • Stride (S): 1
    • Spatial extent (F): 3
    • Amount of Zero padding (P): 1
    • Out: 124x24x24x12
  • Activation: Relu
  • Pool1:
    • Spatial Extent (F): 2
    • Stride (S): 2
    • Depth (K): 12
    • Out: 128x12x12x12
  • FC:
    • Full connected Layer
    • Out: 128x10
  • Softmax: Out 128x10

Simple Arch

LeNet Architecture (little bit off but similar):

LeNet Arch

Cuda ConvNet Architechture:

  • Input: 128x24x24x3 [ batch size, width, height, depth ]
  • Conv Layer: Out = 128x24x24x64
    • Filters (K) = 64
    • Spatial Extent (F) = 5
    • Stride (S) = 1
    • Padding (P) = 2
      • padding for 'SAME' = [(out_height - 1)*S + F - in_height]/2
    • Out: 128x24x24x64
  • Pool 1: Out = 128x12x12x64
    • Max Pooling
    • Stride (S) = 2
    • Size (F) = 3
    • Filters (K) = 64
    • Out: 128x12x12x64
  • Norm 1:
    • Local Response Normalization
    • Depth radius = 4
    • Out: 128x12x12x64
  • Conv Layer 2: = Out 128x12x12x64
    • Filters (K) = 64
    • Spatial Extent (F) = 5
    • Stride (S) = 1
    • Padding (P) = 2
    • Out: 128x12x12x64
  • Norm 2: Out = 128x12x12x64
    • Local Response Normalization
    • Depth radius = 4
    • Out: 128x12x12x64
  • Pool 2: Out = 128x6x6x64
    • Max Pooling
    • Stride (S) = 2
    • Size (F) = 3
    • Filters (K) = 64
    • Out: 128x12x12x64
  • Local 3:
    • Relu activated NN layer of size 384
    • Flattens pooling shape, performs matrix multiplication and adds bias, relu activation
    • Out: 128x384
  • Local 4: Out = 128x192
    • Relu activated NN layer of size 192
    • Performas matrix multiplication and adds bias, relu activation
    • Out: 128x192
  • Softmax:
    • Performs matrix multiplication and adds bias, performs softmax function creating final 128x10 set of logits
    • Out: 128x10

Cuda Net

Entropy and Loss Vs Steps


Graphed the entropy and loss vs steps to find a somewhat low number of steps where performance would be different. I decided if I give a model about 10k steps, I should have a reasonable expectation of what the results will look like.

Entropy vs Steps for Cuda Net Entropy Graph

Loss vs Steps for Cuda Net Loss Graph

Performance


After checking how many predictions had the correct answer as the top choice, the following were the accuracy scores for the two models:

Simple architecture test accuracy after 10k steps: .543

LeNet architecture test accuracy after 10k steps: .626

Cuda convnet setup test accuracy after 118k steps: .8174

Conclusion


Server Setup

When originally planning this project, I thought I wouldn't have a lot of trouble setting up my AWS server with the GPU support. The big hiccup was some additional steps for the Cuda installation on EC2. After finding a blog explaining the step by step for the Cuda install, the remaining tensorflow installation instructions went smoothly.

Setting up the Conv Net

I started with the TensorFlow tutorial for ConvNets and found myself studying the code and trying to reverse engineer the functionality. Rather than just running the tutorial, I moved parts of the functionality into a Jupyter notebook server to ensure I knew what was happening. I eventually got the architecture to work, but realized I would learn more by making a simple architecture without a tutorial.

Training

TensorFlow and TensorBoard making training an algorithm very easy. I love the TensorBoard setup which takes a lot of the effort out of visualizing your results. I wish I had been able to try additional architectures and try to tune model architectures more. After the time it took to understand tensorflow, I down scoped my project to focus on getting two working architectures and scores for those architectures.

Evaluation

My biggest dissappointment was not getting a training score vs steps graph for the models. When I tried to run the script to get the periodic test scores for the graph, I ran out of memory. I ran out of time to deal with resource sharing on the server. The CudaNet architecture outperformed my simple architecture as expected. I think both models would have some improvements given more training time.

What would I do with this dataset given more time?

I would like to create a presentation that breaks down how convolutional neural networks function. I'd also like to compare different architectures and try to give an opinion on why one is outperforming the other. Typically when I run out of time on a machine learning problem, I want to pursue additional preprocessing of data and additional parameter tuning. I found the architecture aspect of deep learning conv nets really interesting and hope to study them more in the future.

If I were to start this project again today, I would focus on adjusting the architectures to see performance changes and getting better graphs to evaluate performance.

What went well, what did not?

It was mostly unforeseen challenges for this project. The server setup was more difficult than anticipated. Getting an understanding of conv nets in tensorflow took me longer than anticipated. There were a lot of learning curves I was on the wrong side of. My favorite part was switching my architecture to a simple conv net - that was the moment I started to understand tensorflow and conv nets better.

conv-neural-network's People

Contributors

ryrizo avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.