Giter VIP home page Giter VIP logo

huse's Introduction


HUSE: Hierarchical Universal Semantic Embedding

Hierarchical Universal Semantic Embeddings
Pradyumna Narayana, Aniket Pednekar , Abishek Krishnamoorthy , Kazoo Sone , Sugato Basu

Abstract: There is a recent surge of interest in cross-modal representation learning corresponding to images and text. The main challenge lies in mapping images and text to a shared latent space where the embeddings corresponding to a similar semantic concept lie closer to each other than the embeddings corresponding to different semantic concepts, irrespective of the modality. Ranking losses are commonly used to create such shared latent space โ€” however, they do not impose any constraints on inter-class relationships resulting in neighboring clusters to be completely unrelated. The works in the domain of visual semantic embeddings address this problem by first constructing a semantic embedding space based on some external knowledge and projecting image embeddings onto this fixed semantic embedding space. These works are confined only to image domain and constraining the embeddings to a fixed space adds additional burden on learning. This paper proposes a novel method, HUSE, to learn crossmodal representation with semantic information. HUSE learns a shared latent space where the distance between any two universal embeddings is similar to the distance between their corresponding class embeddings in the semantic embedding space. HUSE also uses a classification objective with a shared classification layer to make sure that the image and text embeddings are in the same shared latent space. Experiments on UPMC Food-101 show our method outperforms previous state-of-the-art on retrieval, hierarchical precision and classification results.

Training your model

Dataset:

  • Images: Put the unzipped images in the /images folder
  • Create a data.csv file with 'img_name,text,class' columns. (See the example provided)

Config

  • The config.ini file can be modified to update training parameters such as learning_rate, graph_threshold, dataset_path, etc
  • To specify batch_size and number of epochs, specify them in the command line when calling train.py as follow python trian.py -b 512 -e 1500

Environment Setu

  • Works on
tensorflow               2.2
sentencepiece            0.1.86 

Saved Model

  • The model will be saved post training.
  • Checkpoints feature will be added soon

Model Description

model_high_level (Source: 2004.04467)

The model encorporates an architecture that leverages the semantic similarity between classes, and combines them to get image and text embeddings in a universal space. A pre-trained model is used to get image embeddings that is fed into an image tower consisting of fully connected dense layers. A pre-trained model is used to get text embeddings that are passed through a text tower also consisting of fully connected Dense layers. The output of these two towers are then passed through a single dense shared hidden layer. There are three losses that are used to train the model.

  • Softmax cross entropy loss when calculating Class level similarity
  • Graph regularization loss to calculate semantic similarity
  • Cross Modal Gap loss

The architecture in looks like this: model_schematic

To-do:

  • Add Bert model capable of running on GPU
  • Use argparse for arguments, add better arguments
  • Add example data and images
  • Save model weights half-way
  • Support for text that is longer than 512 words using TF-IDF
  • Fix batching issue - 1 EPOCH uses only one batch + Encode and store to ram

Credits and acknowledgements:

For constant support and guidance, Harsha Bommana
For helping in understanding how to tokenize inputs for BERT: https://medium.com/@vineet.mundhra/loading-bert-with-tensorflow-hub-7f5a1c722565
The hundreds of articles and blogs that explained various concepts for free

Citation

@misc{narayana2019huse,
title={HUSE: Hierarchical Universal Semantic Embeddings},
author={Pradyumna Narayana and Aniket Pednekar and Abishek Krishnamoorthy and Kazoo Sone and Sugato           Basu},
year={2019},
eprint={1911.05978},
archivePrefix={arXiv},
primaryClass={cs.CV}
 }

huse's People

Contributors

ankushmalaker avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar

huse's Issues

Request image zip file and csv file

Hi there
I would appreciate this fantastic code. I have a question can you upload the image data and its associated csv file that you used" netaporter_gb_images"
Appreciate it

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.