0. File description
1. Web app instructions
2. Project Introduction
3. Strategy
4. EDA
5. Modelling
6. Hyperparameter tuning
7. Results and conclusion
8. Improvements
9. Licensing and acknowledgements
data
|- dog_names.npy: NumPy-exported file containing the list of dog breed names
|- haarcascade_frontalface_alt.xml: OpenCV Haar cascade file for frontal face detection
|- dogmodel_output.h5: trained breed-classification model (transfer learning on ResNet50)
templates
|- index.html: main page of the web app
app.py: Flask script that runs the app
model.py: contains the functions needed for the classification task
requirements.txt: list of packages required to run the web app
README.md
- Clone the project, then create and activate a virtual environment and install the packages listed in requirements.txt (for example with pip install -r requirements.txt);
- Create an empty folder named uploads in the project's directory and start the web app by running python app.py;
- The app should be running at http://127.0.0.1:5000/;
- In the web app, select a picture from your computer and submit it to see the classification result.
This project builds a pipeline that processes and classifies real-world, user-supplied images. Given an image of a dog, the web app estimates the dog's breed; given an image of a human, it identifies the dog breed the person most resembles.
In the pipeline, ResNet50 (with ImageNet weights) and OpenCV are first used to detect whether the uploaded image contains a dog or a human face, respectively. If either is detected, a convolutional neural network built through transfer learning (base model: ResNet50) classifies the image and returns the result. Finally, the pipeline is deployed as a web app using Flask.
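The detect-then-classify routing described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code in model.py: the ResNet50 ImageNet prediction is assumed to be available as a 1000-way probability vector, the number of faces found by OpenCV is passed in as an integer, and predict_breed is a hypothetical stand-in for the transfer-learning classifier. The dog check uses the convention from the Udacity dog project that ImageNet class indices 151 through 268 (inclusive) are dog categories.

```python
import numpy as np

# ImageNet class indices 151..268 (inclusive) are dog categories.
IMAGENET_DOG_RANGE = range(151, 269)


def is_dog(imagenet_probs: np.ndarray) -> bool:
    """True if ResNet50's top ImageNet class is one of the dog categories."""
    return int(np.argmax(imagenet_probs)) in IMAGENET_DOG_RANGE


def classify_image(imagenet_probs, num_faces, predict_breed):
    """Route an image through the detectors, then the breed classifier.

    imagenet_probs: 1000-way softmax output of ResNet50 (ImageNet weights).
    num_faces: number of faces found by the OpenCV Haar cascade.
    predict_breed: hypothetical callable returning a breed name.
    """
    # Assumption: the dog check runs before the face check.
    if is_dog(imagenet_probs):
        return f"Dog detected, predicted breed: {predict_breed()}"
    if num_faces > 0:
        return f"Human detected, resembling breed: {predict_breed()}"
    return "Neither a dog nor a human face was detected."
```

In the app itself, num_faces could come from OpenCV, e.g. len(cv2.CascadeClassifier("data/haarcascade_frontalface_alt.xml").detectMultiScale(gray)) on a grayscale copy of the upload; that wiring is an assumption, not taken from app.py. Checking for a dog first means a photo containing both a dog and a person is reported as a dog.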
The dataset used to train the convolutional neural network is provided by Udacity:
The metric used to evaluate the performance of the pipeline is accuracy.
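Accuracy here is simply the fraction of images whose predicted breed matches the true breed. A minimal NumPy sketch (the function names are illustrative, not from model.py), working either from integer labels or from one-hot targets and softmax outputs:

```python
import numpy as np


def accuracy(y_true, y_pred) -> float:
    """Fraction of predictions equal to the true integer labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))


def accuracy_from_probs(y_true_onehot, y_prob) -> float:
    """Accuracy when targets are one-hot rows and predictions are softmax rows."""
    true_labels = np.argmax(y_true_onehot, axis=1)
    pred_labels = np.argmax(y_prob, axis=1)  # highest-probability breed
    return accuracy(true_labels, pred_labels)
```

For example, accuracy([0, 1, 2, 2], [0, 1, 1, 2]) returns 0.75; the ~83% test accuracy reported below corresponds to roughly 694 of the 836 test images being classified correctly.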
The images are loaded with scikit-learn and NumPy and split into training, validation, and test sets. In total, there are 133 dog categories and 8351 dog images: 6680 images in the training set, 835 in the validation set, and 836 in the test set.
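As a quick consistency check, the three split sizes add up to the stated total and correspond to roughly an 80/10/10 split. The sketch below also shows one-hot encoding of the 133 breed labels, which categorical cross-entropy expects. It is written in plain NumPy as an illustrative equivalent of keras.utils.to_categorical; with scikit-learn, integer labels are commonly read via the target field of sklearn.datasets.load_files, but whether this project does exactly that is an assumption.

```python
import numpy as np

NUM_CLASSES = 133
split_sizes = {"train": 6680, "valid": 835, "test": 836}

total = sum(split_sizes.values())  # 8351 images in total
# Share of the dataset in each split (roughly 0.80 / 0.10 / 0.10).
fractions = {name: n / total for name, n in split_sizes.items()}


def one_hot(labels, num_classes=NUM_CLASSES):
    """Turn integer breed labels into one-hot rows (like to_categorical)."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded
```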
The convolutional neural network is built through transfer learning on top of ResNet50. In the CNN architecture:
- a GlobalAveragePooling2D layer averages each feature map to a single value, shrinking the input and reducing the number of parameters and computations in the following layer;
- a Dropout layer sets randomly chosen input units to 0 during training, which helps prevent overfitting;
- a Dense (fully connected) layer with softmax activation calculates the probability of each label, and the label with the highest probability is picked.
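To make the role of each layer concrete, here is a NumPy forward pass of the classification head on top of ResNet50 feature maps. It is an illustrative re-implementation, not the project's Keras code (in Keras the same head would be GlobalAveragePooling2D(), Dropout(0.4) and Dense(133, activation="softmax")). The input shape (batch, height, width, 2048) assumes ResNet50's final convolutional block, and any weights passed in are placeholders.

```python
import numpy as np


def classifier_head(features, weights, bias, dropout_rate=0.4,
                    training=False, rng=None):
    """Forward pass of GlobalAveragePooling2D -> Dropout -> Dense(softmax).

    features: ResNet50 feature maps, shape (batch, height, width, channels).
    weights: Dense kernel, shape (channels, num_classes); bias: (num_classes,).
    Returns (probabilities, predicted label indices).
    """
    # GlobalAveragePooling2D: average each channel over the spatial grid.
    pooled = features.mean(axis=(1, 2))  # (batch, channels)

    # Inverted dropout, active only during training: each unit is zeroed
    # with probability dropout_rate, survivors are scaled by 1 / (1 - rate).
    if training and dropout_rate > 0:
        rng = rng if rng is not None else np.random.default_rng()
        keep = rng.random(pooled.shape) >= dropout_rate
        pooled = pooled * keep / (1.0 - dropout_rate)

    # Dense layer followed by a numerically stable softmax.
    logits = pooled @ weights + bias
    logits = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(logits)
    probs = exp / exp.sum(axis=1, keepdims=True)
    return probs, probs.argmax(axis=1)
```

Dropout is applied only when training=True, mirroring Keras, where it is disabled at inference time.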
The model is compiled with the Adam optimizer and categorical cross-entropy loss.
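The loss and optimizer can be written out in a few lines. This NumPy sketch shows categorical cross-entropy and the textbook Adam update with the tuned learning rate of 0.0003; beta1, beta2 and eps are the usual defaults and are assumptions here, and framework implementations may differ slightly in how epsilon is applied.

```python
import numpy as np


def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean cross-entropy between one-hot targets and softmax outputs."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(np.sum(y_true * np.log(y_prob), axis=1)))


def adam_step(param, grad, m, v, t, lr=3e-4, beta1=0.9, beta2=0.999, eps=1e-7):
    """One Adam update; t is the 1-based step count. Returns (param, m, v)."""
    m = beta1 * m + (1 - beta1) * grad       # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2  # second-moment estimate
    m_hat = m / (1 - beta1 ** t)             # bias correction
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v
```

On the first step, bias correction makes the update almost exactly lr * sign(grad), so each parameter moves by about 0.0003 regardless of the gradient's magnitude.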
The tuned hyperparameters are:
- learning rate for Adam = 0.0003 (tuned on a logarithmic scale, e.g. 0.01, 0.001, 0.0001, ...);
- Dropout probability = 0.4 (chosen based on validation accuracy);
- epochs = 20 (chosen based on how much the loss decreased in each epoch);
- batch size = 64 (chosen based on validation accuracy).
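The README does not say how the search was carried out; one straightforward way is an exhaustive grid search that keeps the setting with the best validation accuracy, sketched below. evaluate is a hypothetical callable that would train the model with the given settings and return its validation accuracy, and the candidate lists are illustrative, with learning rates spaced on a logarithmic scale as described above.

```python
import itertools
from typing import Callable, Dict, Iterable, Optional, Tuple


def grid_search(
    evaluate: Callable[[float, float, int], float],
    learning_rates: Iterable[float],
    dropout_rates: Iterable[float],
    batch_sizes: Iterable[int],
) -> Tuple[Optional[Dict[str, float]], float]:
    """Try every combination and keep the best validation accuracy.

    evaluate(lr, dropout, batch_size) is a hypothetical training routine
    that returns validation accuracy.
    """
    best_params, best_score = None, float("-inf")
    for lr, rate, batch in itertools.product(learning_rates, dropout_rates, batch_sizes):
        score = evaluate(lr, rate, batch)
        if score > best_score:
            best_params = {"learning_rate": lr, "dropout": rate, "batch_size": batch}
            best_score = score
    return best_params, best_score


# Illustrative candidates: log-scale learning rates plus one value in between.
LEARNING_RATES = [1e-2, 1e-3, 3e-4, 1e-4]
DROPOUT_RATES = [0.2, 0.4, 0.5]
BATCH_SIZES = [32, 64]
```

Because the number of runs grows multiplicatively (4 learning rates × 3 dropout rates × 2 batch sizes is already 24 trainings), the learning rate is often tuned first on its own before the other values are refined.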
The model's accuracy on the test set is ~83%. The pipeline can identify whether the uploaded image contains a dog, a human face, or neither, and then predict the dog's breed or the resembling dog breed.
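The training accuracy achieved is ~0.98, but the validation accuracy is 0.84 and the test accuracy is 0.83. This gap between training and validation/test accuracy indicates that the model overfits the training data (high variance), so there is room to reduce the variance.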
In order to reduce the variance:
- get more dog images for training, or use data augmentation to generate more images;
- adjust and tune the regularization to prevent overfitting;
- try a different base architecture for the task, such as an Inception network.
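Data augmentation can be as simple as random horizontal flips and small translations of each training image. The NumPy sketch below is illustrative (the function name and max_shift value are assumptions); Keras offers the same transforms through ImageDataGenerator(horizontal_flip=True, width_shift_range=..., height_shift_range=...). If the classification head is trained on precomputed ResNet50 features, those features would have to be recomputed for the augmented images for this to help.

```python
import numpy as np


def augment(image, rng, max_shift=16):
    """Return a randomly flipped and shifted copy of an (H, W, C) image.

    Applies a horizontal flip with probability 0.5, then a random
    translation of up to max_shift pixels per axis with zero padding.
    Assumes max_shift is smaller than the image height and width.
    """
    out = image[:, ::-1] if rng.random() < 0.5 else image
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    h, w = out.shape[:2]
    shifted = np.zeros_like(out)
    # Copy the region that stays inside the frame after shifting.
    src_y = slice(max(0, -dy), min(h, h - dy))
    dst_y = slice(max(0, dy), min(h, h + dy))
    src_x = slice(max(0, -dx), min(w, w - dx))
    dst_x = slice(max(0, dx), min(w, w + dx))
    shifted[dst_y, dst_x] = out[src_y, src_x]
    return shifted
```

Calling it with one shared generator, e.g. rng = np.random.default_rng(0), gives each epoch a different random variant of every training image.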
Thanks to Udacity for providing the dataset. Feel free to use any of the code.