
Image-Captioning-via-YOLOv5-EncoderDecoderwithAttention

Archived: This code was just a fun project! It was neither properly tuned nor is it properly maintained, and no updates or corrections are expected. I am focusing on other areas and am not a vision expert. The code runs properly for most people; if you hit an error, please check that the images are being populated properly for you.

Use the original Flickr8K dataset: https://www.kaggle.com/datasets/adityajn105/flickr8k

Put the 'Images' directory and 'captions.txt' in the root of this repo.
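Before running anything, it can help to confirm that the dataset is laid out as described. The following is a minimal sketch, not part of this repo: it assumes 'captions.txt' is a CSV file with a header row and columns named `image` and `caption`, which is how the Kaggle copy appears to be distributed. The helper names `load_captions` and `missing_images` are hypothetical.

```python
import csv
from collections import defaultdict
from pathlib import Path


def load_captions(captions_path):
    """Group captions by image file name.

    Assumption: a CSV file with a header row and 'image' and 'caption' columns.
    """
    captions = defaultdict(list)
    with open(captions_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            captions[row["image"]].append(row["caption"].strip())
    return dict(captions)


def missing_images(captions, images_dir):
    """Return image names listed in the captions file but absent from images_dir."""
    images_dir = Path(images_dir)
    return sorted(name for name in captions if not (images_dir / name).is_file())


if __name__ == "__main__":
    root = Path(".")  # assumed: the script is run from the repo root
    captions_file = root / "captions.txt"
    if captions_file.is_file():
        captions = load_captions(captions_file)
        absent = missing_images(captions, root / "Images")
        print(f"{len(captions)} captioned images, {len(absent)} missing from Images/")
    else:
        print("captions.txt not found in the current directory")
```

If the second number printed is not zero, the images are probably not where the scripts expect them.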

An attempt at image captioning using a combination of object detection via YOLOv5 and an encoder-decoder LSTM model on the Flickr8K dataset.

  1. Run to make object crops via YOLOv5:
python detect_object.py
  2. Run to train. This uses only the ResNet embeddings of the object crops detected by YOLOv5, not any text from the YOLO labeller.
python train.py True
  3. Run to evaluate on the validation data:
python train.py False
  4. Sample predictions:

2.jpg

references - [This is a black dog splashing in the water, A black lab with tags frolicks in the water, A black dog running in the surf, The black dog runs through the water]

prediction - [['<SOS>'], ['a'], ['black'], ['dog'], ['is'], ['a'], ['a'], ['water'], ['.'], ['<EOS>']]

1.jpg

references - [A black dog and a spotted dog are fighting, A black dog and a tri-colored dog playing with each other on the road, A black dog and a white dog with brown spots are staring at each other in the street, Two dogs of different breeds looking at each other on the road]

prediction - [['<SOS>'], ['a'], ['black'], ['and'], ['white'], ['dog'], ['is'], ['running'], ['through'], ['a'], ['.'], ['<EOS>']]

  5. The mean BLEU-4 score on the validation data is quite low. Suggested improvements: use the Adam optimizer and shuffle the data; possibly add minibatching; increase the number of datapoints by combining other datasets, since 8k is quite a small sample size (number of parameters >> number of datapoints, which is not ideal for neural nets).
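To make the BLEU-4 remark concrete, here is a rough sketch of an unsmoothed, sentence-level BLEU-4 applied to the first sample prediction above. It is an illustration only and not necessarily how train.py computes its score; the function names, the lowercase whitespace tokenization of the references, and the choice to skip smoothing are all assumptions made for this example.

```python
import math
from collections import Counter


def flatten_prediction(prediction, specials=("<SOS>", "<EOS>")):
    """Turn [['<SOS>'], ['a'], ...] into ['a', ...], dropping special tokens."""
    return [tok for step in prediction for tok in step if tok not in specials]


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Return (clipped matches, total candidate n-grams) for order n."""
    cand_counts = ngrams(candidate, n)
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped, sum(cand_counts.values())


def sentence_bleu4(candidate, references):
    """Unsmoothed sentence-level BLEU-4 with uniform weights (assumed variant)."""
    if not candidate:
        return 0.0
    log_sum = 0.0
    for n in range(1, 5):
        matched, total = modified_precision(candidate, references, n)
        if matched == 0 or total == 0:
            return 0.0  # without smoothing, one empty n-gram order zeroes the score
        log_sum += 0.25 * math.log(matched / total)
    # Brevity penalty uses the reference length closest to the candidate length.
    c = len(candidate)
    r = min((len(ref) for ref in references), key=lambda length: (abs(length - c), length))
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_sum)


# Sample for 2.jpg, copied from the predictions listed above.
prediction = [['<SOS>'], ['a'], ['black'], ['dog'], ['is'], ['a'], ['a'], ['water'], ['.'], ['<EOS>']]
references = [
    "this is a black dog splashing in the water".split(),
    "a black lab with tags frolicks in the water".split(),
    "a black dog running in the surf".split(),
    "the black dog runs through the water".split(),
]

if __name__ == "__main__":
    print(sentence_bleu4(flatten_prediction(prediction), references))
```

For this sample the prediction shares no 4-gram with any reference, so the unsmoothed score is 0.0 even though shorter n-grams such as "a black dog" match. Averaging many such sentence scores is one way a mean BLEU-4 ends up low; a smoothed or corpus-level variant would give different numbers.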

Citation (Flickr8K Dataset)

Hodosh, Micah, Peter Young, and Julia Hockenmaier. "Framing image description as a ranking task: Data, models and evaluation metrics." Journal of Artificial Intelligence Research 47 (2013): 853-899.

Contributors

akjayant
