jnepal / image-captioning

This project is forked from tharindu326/image-captioning.

image-captioning

The aim of the project is to create a captioning model for medical images. The generative model has two encoders and a decoder. The first encoder extracts visual features from the images, while the second extracts semantic features. The visual and semantic features are then concatenated to represent each image. The decoder generates words, one after another, to construct the caption.
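
As a minimal sketch of the concatenation step described above, the snippet below joins a visual feature vector and a semantic feature vector in plain Python. The function name `fuse_features` and the toy vector sizes are assumptions for illustration only; the notebooks in this repository use PyTorch and may fuse the features differently.

```python
from typing import List, Sequence


def fuse_features(visual: Sequence[float], semantic: Sequence[float]) -> List[float]:
    """Concatenate visual and semantic features into one image representation."""
    # Hypothetical helper: the fused vector is the visual part followed by the semantic part.
    return list(visual) + list(semantic)


# Toy vectors standing in for the outputs of the two encoders (sizes are made up).
visual_features = [0.12, 0.80, 0.33, 0.05]
semantic_features = [0.90, 0.10]

fused = fuse_features(visual_features, semantic_features)
print(len(fused))  # 6
```

The fused vector has as many entries as the two inputs combined, so a decoder consuming it must expect that combined size.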

Methodology

We employed both generative and retrieval models to produce the output captions. Within the generative approach, two methodologies were utilized:

  1. A dual-encoder configuration involving one image encoder and a separate text encoder for captions, both linked to a decoder.
  2. A single encoder-decoder setup.

The outputs from the chosen encoder configuration were then fed into the decoder to generate provisional captions. These were subsequently compared with the captions obtained from the retrieval model to determine the final caption.
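
The retrieval and comparison steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the notebooks: token-level Jaccard similarity stands in for whichever similarity measure the project uses, and the names `jaccard`, `retrieve`, and `fuse` are hypothetical.

```python
from typing import List, Tuple


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity between two captions (an assumed measure)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def retrieve(generated: str, training_captions: List[str]) -> str:
    """Return the training caption most similar to the generated caption."""
    return max(training_captions, key=lambda caption: jaccard(generated, caption))


def fuse(ground_truth: str, generated: str, retrieved: str) -> Tuple[str, float]:
    """Keep whichever candidate is closer to the ground-truth caption."""
    scored = [(jaccard(ground_truth, c), c) for c in (generated, retrieved)]
    best_score, best_caption = max(scored, key=lambda pair: pair[0])
    return best_caption, best_score


# Made-up captions, for illustration only.
training = [
    "chest x-ray showing clear lungs",
    "mri of the brain with no lesion",
    "ct scan of the abdomen",
]
generated = "x-ray of chest showing lungs"
ground_truth = "chest x-ray showing clear lungs"

retrieved = retrieve(generated, training)
final_caption, score = fuse(ground_truth, generated, retrieved)
print(final_caption)  # chest x-ray showing clear lungs
```

When both candidates score the same, `max` keeps the first one, so this sketch prefers the generated caption on ties.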

Usage

  1. Install the requirements

    pip3 install -r requirements.txt
    
  2. Scripts

     process_dataset.ipynb             Task1: select the dataset and generate the JSON data object
     data_visualize.ipynb              Task1: visualize samples from the dataset
     vocabulary_builder.ipynb          Task2: build the vocabulary
     vacabulary_frequency.ipynb        Task3: plot the word occurrences
     word_embeddings.ipynb             Task4: generate word embeddings (using different methods) and plot them
     data_loader.ipynb                 Task5: PyTorch data loading functions for the generative (encoder-decoder) models
     train_duel_encoder.ipynb          Task5: fit data: train the dual-encoder model
     train_single_encoder.ipynb        Task5: fit data: train the single-encoder model
     similarity_single_encoder.ipynb   Task6 - Task9 using the single-encoder generative model
                                       Task6: get the generated captions and their similarities with the ground-truth captions (X) using different similarity matrices
                                       Task7: retrieval method: get the caption (Z) from the training set that is most similar to the generated caption (Y)
                                       Task8: model fusion: compare the ground truth (X) with (Y) and (Z), pick the better of (Y) and (Z), and assign it to the test image
                                       Task9: evaluation metrics
     similarity_duel_encoder.ipynb     Task6 - Task9 using the dual-encoder generative model
     inference.ipynb                   Task6: inference with the dual-encoder model
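
As a rough illustration of Task2 and Task3 in the list above, the sketch below counts word occurrences and builds a word-to-index vocabulary using only the standard library. The special tokens, the `min_count` threshold, the whitespace tokenization, and the function names are assumptions made for this example; the notebooks may do this differently.

```python
from collections import Counter
from typing import Dict, List

# Assumed special tokens; the notebooks may define their own.
SPECIAL_TOKENS = ["<pad>", "<start>", "<end>", "<unk>"]


def count_words(captions: List[str]) -> Counter:
    """Count lowercase, whitespace-separated tokens over all captions."""
    counts: Counter = Counter()
    for caption in captions:
        counts.update(caption.lower().split())
    return counts


def build_vocabulary(captions: List[str], min_count: int = 1) -> Dict[str, int]:
    """Map each sufficiently frequent word to an integer index."""
    vocab = {token: index for index, token in enumerate(SPECIAL_TOKENS)}
    for word, count in count_words(captions).most_common():
        if count >= min_count and word not in vocab:
            vocab[word] = len(vocab)
    return vocab


def encode(caption: str, vocab: Dict[str, int]) -> List[int]:
    """Turn a caption into indices, wrapped in start and end markers."""
    unk = vocab["<unk>"]
    body = [vocab.get(word, unk) for word in caption.lower().split()]
    return [vocab["<start>"]] + body + [vocab["<end>"]]


# Made-up captions, for illustration only.
captions = ["chest x-ray normal", "chest ct normal", "brain mri"]
vocab = build_vocabulary(captions, min_count=2)
print(encode("chest mri normal", vocab))  # [1, 4, 3, 5, 2]
```

The `Counter` returned by `count_words` can also be passed to a plotting library to draw the word-occurrence chart for Task3.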
    

Contributors

tharindu326
