Yolo-Object-Detection

Bounding Box detection using YOLO

09.05.2020 - Roshni Koli

Introduction

Object detection models are extremely powerful: from finding dogs in photos to improving healthcare, training computers to recognize which pixels constitute items unlocks near-limitless potential. However, one of the biggest blockers keeping new applications from being built is adapting state-of-the-art, open source, and free resources to custom problems.

When it comes to deep learning-based object detection, there are three primary families of object detectors:

  • R-CNN and its variants
  • Single Shot Detectors (SSDs)
  • YOLO

R-CNNs are among the first deep learning-based object detectors and are an example of a two-stage detector. While R-CNNs tend to be very accurate, the biggest problem with the R-CNN family of networks is speed: they are slow, obtaining only 5 FPS on a GPU.

To help increase the speed of deep learning-based object detectors, both Single Shot Detectors (SSDs) and YOLO use a one-stage detector strategy. These algorithms treat object detection as a regression problem, taking a given input image and simultaneously learning bounding box coordinates and the corresponding class label probabilities. As a result, they tend to be less accurate than two-stage detectors, but considerably faster.

For this project I am using YOLO, i.e. You Only Look Once. It was first introduced in 2015 by Redmon et al. Their paper, You Only Look Once: Unified, Real-Time Object Detection, details an object detector capable of real-time object detection, obtaining 45 FPS on a GPU. YOLO has gone through a number of iterations, including YOLO9000: Better, Faster, Stronger (i.e., YOLOv2), capable of detecting over 9,000 object classes, and YOLOv3: An Incremental Improvement. This project focuses on YOLOv3.
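To make the "regression" framing concrete, here is a minimal sketch of how large a YOLOv3 output is. It assumes the standard YOLOv3 layout: three detection scales with strides 32, 16, and 8, and three anchor boxes per grid cell, each predicting 4 box coordinates, 1 objectness score, and one score per class. The function name `yolo_v3_output_shapes` and the 416x416 input size are illustrative assumptions, not values taken from this project. The class count of 13 matches the dataset described below.

```python
def yolo_v3_output_shapes(input_size, num_classes,
                          boxes_per_cell=3, strides=(32, 16, 8)):
    """Return the (rows, cols, depth) shape of each YOLOv3 detection scale.

    Every box predicts 4 coordinates + 1 objectness score + num_classes
    class scores, so the depth per grid cell is boxes_per_cell * (5 + C).
    """
    depth = boxes_per_cell * (5 + num_classes)
    return [(input_size // s, input_size // s, depth) for s in strides]


# Hypothetical 416x416 network input with the 13 classes of this dataset.
shapes = yolo_v3_output_shapes(416, 13)
print(shapes)  # [(13, 13, 54), (26, 26, 54), (52, 52, 54)]
```

The depth value (54 for 13 classes) is also the number that darknet's custom-training instructions ask for as `filters` in the convolutional layer just before each `[yolo]` layer.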

Dataset description

The dataset comprises both shop images and user-posted images. The training set has 191,961 images, the validation set has 32,123 images, and the test set has 62,629 images. The images contain objects from 13 classes, and the distribution in the test and validation sets is shown in the figure below.

Dataset Source : https://github.com/switchablenorms/DeepFashion2

YOLOv3 description

YOLO stands for You Only Look Once. It is an object detector that uses features learned by a deep convolutional neural network to detect objects. YOLO uses only convolutional layers, making it a fully convolutional network (FCN). It has 75 convolutional layers, with skip connections and upsampling layers. No form of pooling is used; instead, a convolutional layer with stride 2 downsamples the feature maps. This helps prevent the loss of low-level features often attributed to pooling.

Being an FCN, YOLO is invariant to the size of the input image. I therefore kept the image sizes as they are, since resizing a dataset of this size on a CPU proved computationally expensive on my system.

Before v3, YOLO applied a softmax to the class scores. That design choice has been dropped in v3, and the authors opted for a sigmoid instead. The reason is that applying a softmax to class scores assumes the classes are mutually exclusive, which is rarely the case in a real-time object detection scenario.
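The difference between softmax and sigmoid class scores can be seen in a small sketch. The logits below are made-up values for a single predicted box, chosen so that two classes both fit the object well. This is plain Python for illustration only, not darknet's implementation.

```python
import math


def sigmoid(scores):
    """Score each class independently; the outputs need not sum to 1."""
    return [1.0 / (1.0 + math.exp(-s)) for s in scores]


def softmax(scores):
    """Normalize the scores into one distribution that sums to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw class scores: the first two classes both match strongly.
logits = [2.0, 2.0, -3.0]

sig = sigmoid(logits)
soft = softmax(logits)

print([round(p, 3) for p in sig])   # both matching classes score about 0.88
print([round(p, 3) for p in soft])  # the same two classes are capped near 0.5
```

With softmax the two matching classes compete for the same probability mass, so neither can exceed roughly 0.5. With sigmoid each class is scored on its own, so both can be confidently present at once.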

Reference - https://github.com/AlexeyAB/darknet

The output files from the EDA step are used as input to darknet. The darknet weights file obtained after training for one epoch is stored at -

https://drive.google.com/file/d/1Z8PcMMCz1sfiwRO1eKoei3rmD9LMfrSJ/view?usp=sharing
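The EDA outputs that feed darknet are not reproduced here. As a rough sketch of what such input looks like, darknet expects one text file per image in which each line is `<class_id> <x_center> <y_center> <width> <height>`, with a zero-based class index and coordinates normalized by the image width and height. The helper `to_darknet_label` below is hypothetical. It assumes the source annotation gives a pixel box as `(x1, y1, x2, y2)`, and that any one-based category ids have already been shifted to start at 0.

```python
def to_darknet_label(class_id, box, img_w, img_h):
    """Convert a pixel box (x1, y1, x2, y2) into a darknet label line.

    class_id must be zero-based; all coordinates are scaled to [0, 1].
    """
    x1, y1, x2, y2 = box
    x_center = (x1 + x2) / 2.0 / img_w
    y_center = (y1 + y2) / 2.0 / img_h
    width = (x2 - x1) / img_w
    height = (y2 - y1) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# Made-up example: a 200x400 pixel box centered in a 400x800 image.
line = to_darknet_label(0, (100, 200, 300, 600), 400, 800)
print(line)  # 0 0.500000 0.500000 0.500000 0.500000
```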

The output over train set can be viewed at - https://drive.google.com/drive/folders/1DQxWK__IhlSKLqytyNDPTHLjveWqCIVJ?usp=sharing

The output for the test set can be viewed at - https://drive.google.com/drive/folders/1PT4eq7O3RDNuR8eHjP94FbGY7TXwAHaF?usp=sharing

The code for the classification matrix is in the output folder.
