seq2emb's Introduction

From Seq2Seq to Handwritten Word Embeddings

This repository contains the code for generating word embeddings and reproducing the experiments of our work: G. Retsinas, G. Sfikas, C. Nikou & P. Maragos, "From Seq2Seq to Handwritten Word Embeddings", BMVC, 2021.

We provide PyTorch implementations of all the main components of the proposed DNN system that can generate discriminative word embeddings, ideal for Keyword Spotting. The core idea is to make use of the encoding output of a Sequence-to-Sequence architecture, trained for performing word recognition.

Overview of the overall system (two recognition branches: CTC & Seq2Seq):

Architectural Details of the different components:

Overview of the different setups of the proposed system. Evaluation covers recognition and three spotting cases: Query-by-Example (QbE), Query-by-String (QbS), and QbS by forced alignment (FA).
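To make the QbE case concrete, the following is a minimal sketch of how a query embedding can be ranked against a gallery of word embeddings and scored with average precision. It is not taken from this repository: the function names, the use of cosine similarity, and the toy vectors are assumptions made for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (lists of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def average_precision(query_emb, query_label, gallery):
    """Average precision of one query.

    gallery: list of (embedding, label) pairs that excludes the query itself.
    Items are ranked by decreasing cosine similarity to the query.
    """
    ranked = sorted(
        gallery,
        key=lambda item: cosine_similarity(query_emb, item[0]),
        reverse=True,
    )
    hits = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == query_label:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


# Hypothetical toy gallery: two instances of "the" and one of "and".
toy_gallery = [([1.0, 0.0], "the"), ([0.0, 1.0], "and"), ([0.9, 0.1], "the")]
toy_ap = average_precision([1.0, 0.05], "the", toy_gallery)
```

Averaging this value over all queries gives the mean average precision that is commonly reported for keyword spotting.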


Getting Started

Installation

git clone https://github.com/georgeretsi/Seq2Emb
cd Seq2Emb

Install PyTorch 1.7+ and the other dependencies (e.g., scikit-image), as listed in the requirements.txt file. For pip users, run the command pip install -r requirements.txt.

Initially tested on PyTorch 1.7 and subsequently on version 1.9.

Dataset Setup (IAM)

The provided code includes a loader for the IAM Handwriting Database (https://fki.tic.heia-fr.ch/databases/iam-handwriting-database). Registration is required for downloading (https://fki.tic.heia-fr.ch/databases/download-the-iam-handwriting-database).

To use the provided loader, the IAM dataset should be structured as follows (and the path should be provided as an argument):

IAMpath
│    
└───set_split
│   │   trainset.txt
│   │   testset.txt
│   │   ...
│   
└───ascii
│   │   words.txt
│   │   lines.txt
│   │   ...
│   
└───words
    │  ...
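Before training, it can help to check that a local copy matches the layout above. The sketch below is not part of the repository; the helper name `missing_iam_entries` and the `EXPECTED` table are hypothetical, and only the entries explicitly named in the tree are checked (the `...` entries are left alone).

```python
from pathlib import Path

# Folders and files named explicitly in the tree above.
# Entries shown as "..." in the tree are deliberately not checked.
EXPECTED = {
    "set_split": ["trainset.txt", "testset.txt"],
    "ascii": ["words.txt", "lines.txt"],
    "words": [],
}


def missing_iam_entries(root):
    """Return the expected folders/files that are absent under `root`."""
    root = Path(root)
    missing = []
    for folder, files in EXPECTED.items():
        folder_path = root / folder
        if not folder_path.is_dir():
            # The whole folder is missing, so its files are not listed one by one.
            missing.append(folder)
            continue
        for name in files:
            if not (folder_path / name).is_file():
                missing.append(f"{folder}/{name}")
    return missing
```

Calling `missing_iam_entries("IAMpath")` returns an empty list when every named entry is present, and otherwise lists what still needs to be put in place.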

Training and Configuration Options

Train the models with default configurations:

python train_words.py --dataset IAM --dataset_path IAMpath --gpu_id 0

The gpu_id argument corresponds to the ID of the GPU to be used (single-GPU implementation).

Extra arguments related to the proposed work are:

  • lowercase (bool) : use reduced lowercase character set (typically used in KWS)
  • path_ctc (bool) : add the CTC branch
  • path_s2s (bool) : add the Seq2Seq branch
  • path_autoencoder (bool) : add the character encoder branch (forming an autoencoder)
  • train_external_lm (bool) : use an external lexicon for training the autoencoder path
  • binarize (bool) : binarize the word embedding
  • feat_size (int) : define the size of word embeddings
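As an illustration of what a binarized embedding allows at retrieval time, the sketch below thresholds a real-valued vector and compares two binary codes with the Hamming distance. How the repository binarizes embeddings is not described here, so thresholding at zero is an assumption, and both function names are hypothetical.

```python
def binarize(embedding):
    """Map a real-valued embedding to a 0/1 code.

    Assumption: values strictly greater than zero become 1, all others 0.
    The repository may use a different rule.
    """
    return [1 if value > 0 else 0 for value in embedding]


def hamming_distance(code_a, code_b):
    """Number of positions at which two equal-length binary codes differ."""
    if len(code_a) != len(code_b):
        raise ValueError("codes must have the same length")
    return sum(a != b for a, b in zip(code_a, code_b))


# Hypothetical 4-dimensional embeddings of two word images.
code_1 = binarize([0.3, -1.2, 0.0, 2.0])
code_2 = binarize([0.7, 0.4, -0.1, -0.5])
distance = hamming_distance(code_1, code_2)
```

Binary codes of this kind are compact to store and cheap to compare, which is the usual motivation for binarizing embeddings in spotting systems.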

General configuration variables, such as training options (initial learning rate, batch size etc.) and architecture options (depth and configuration of the DNN components), can be set through the file config.py.

NOTE: example.py contains an alternative way to train the system using the function evaluate_setting of train_words.py


Citation:

@inproceedings{retsinas2021from,
  title={From Seq2Seq to Handwritten Word Embeddings},
  author={Retsinas, George and Sfikas, Giorgos and Nikou, Christophoros and Maragos, Petros},
  booktitle={British Machine Vision Conference (BMVC)},
  year={2021},
}

seq2emb's Issues

set_split

Hello,

I am able to download the IAM dataset, but I couldn't find the set_split folder. Could you please help me with the contents of the set_split text files?

thanks

ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 2 dimension(s) and the array at index 1 has 1 dimension(s)

Hello,

Thank you for your prompt response.
Can you please help me with this error?

cnn size:5710784
ctc head size:5710784
enc size:2497024
cdec size:1604636
cenc size:2504192
Training:
binarize: False
/users/noufcs/.local/lib/python3.6/site-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at  /pytorch/c10/core/TensorImpl.h:1156.)
  return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
Testing at epoch 5
Traceback (most recent call last):
  File "train_words.py", line 542, in <module>
    evaluate_setting(setting, dataset, gpu_id)
  File "train_words.py", line 451, in evaluate_setting
    test_funcs.test(epoch, test_loader, models, setting_dict, gpu_id, logger)
  File "/mydata/exp/Seq2Emb/test_funcs.py", line 41, in test
    tdecs = np.concatenate(tdecs)
  File "<__array_function__ internals>", line 6, in concatenate
ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 2 dimension(s) and the array at index 1 has 1 dimension(s)
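The message in the traceback is what NumPy raises when the arrays passed to np.concatenate do not all have the same number of dimensions. The sketch below reproduces that behaviour on toy arrays and shows one generic workaround using np.atleast_2d. Whether the tdecs list in test_funcs.py actually holds a mix of 2-D and 1-D arrays, and why, is an assumption here; the helper name `concatenate_batches` is hypothetical and this is not a confirmed fix for the repository.

```python
import numpy as np


def concatenate_batches(batches):
    """Stack per-batch arrays along the first axis.

    np.atleast_2d turns an array of shape (d,) into shape (1, d) and leaves
    arrays that already have two dimensions unchanged.
    """
    return np.concatenate([np.atleast_2d(b) for b in batches], axis=0)


# Toy data: one 2-D batch and one 1-D array, mimicking the reported mismatch.
toy_batches = [np.zeros((2, 3)), np.zeros(3)]

try:
    np.concatenate(toy_batches)
    mismatch_raised = False
except ValueError:
    # Same error class as in the traceback above.
    mismatch_raised = True

stacked = concatenate_batches(toy_batches)
```

Printing the shape of every element of the list right before the failing call is a quick way to confirm whether this kind of mismatch is really the cause.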
