Bridging the Gap between Synthetic and Authentic Images for Multimodal Machine Translation

This repository contains the code for the EMNLP'23 submission "Bridging the Gap between Synthetic and Authentic Images for Multimodal Machine Translation".

Get started

Text-to-image Generation Environment:
conda create -n stable python==3.8
# Activate the new environment so the packages below are installed into it
conda activate stable
pip install torch==2.0.1
pip install Pillow==9.5.0
pip install transformers==4.27.4
pip install diffusers==0.16.1
pip install scipy==1.10.1
pip install accelerate==0.18.0
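
To confirm that the environment matches the pins above, a small check like the following can be run inside the `stable` environment. This is only a sketch: `check_pins` is a hypothetical helper and not part of this repository, and it assumes Python 3.8 or later, where `importlib.metadata` is available.

```python
# Sketch: compare installed package versions against the pins listed above.
# check_pins is a hypothetical helper, not part of this repository.
from importlib import metadata  # standard library from Python 3.8

PINS = {
    "torch": "2.0.1",
    "Pillow": "9.5.0",
    "transformers": "4.27.4",
    "diffusers": "0.16.1",
    "scipy": "1.10.1",
    "accelerate": "0.18.0",
}


def check_pins(pins, get_version=metadata.version):
    """Return {package: (expected, found)} for each mismatch; found is None if missing."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        # Ignore local build tags such as "+cu117" that some torch wheels carry.
        if found is None or found.split("+")[0] != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in check_pins(PINS).items():
        print(f"{name}: expected {expected}, found {found}")
```

An empty result means every pinned package is installed at the listed version.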

Training environment:
conda create -n sammt python==3.6.7
# Activate the new environment so the packages below are installed into it
conda activate sammt
pip install -r requirements.txt
pip install --editable ./

Data

Multi30K texts and images can be downloaded here and here. The Multi30K text data is taken from fairseq_mmt.

cd fairseq_sammt
git clone https://github.com/multi30k/dataset.git
git clone https://github.com/BryanPlummer/flickr30k_entities.git
# Organize the downloaded dataset
flickr30k
├─ flickr30k-images
├─ test_2017_flickr
└─ test_2017_mscoco
multi30k-dataset
└─ data
    └─ task1
        ├─ tok
        └─ image_splits
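
Before moving on, it can help to verify that the organized layout matches the tree above. The sketch below lists any expected directory that is missing. `missing_dirs` is a hypothetical helper, and it assumes both top-level folders sit under one root directory (for example `fairseq_sammt`) and that the cloned Multi30K repository has been renamed to `multi30k-dataset` as shown in the tree.

```python
# Sketch: report which directories from the tree above are missing.
# missing_dirs is a hypothetical helper, not part of this repository.
from pathlib import Path

# Relative paths taken from the directory tree above.
EXPECTED_DIRS = [
    "flickr30k/flickr30k-images",
    "flickr30k/test_2017_flickr",
    "flickr30k/test_2017_mscoco",
    "multi30k-dataset/data/task1/tok",
    "multi30k-dataset/data/task1/image_splits",
]


def missing_dirs(root, expected=EXPECTED_DIRS):
    """Return the expected directories that do not exist under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_dir()]


if __name__ == "__main__":
    # Assumes the script is run from the directory that holds both folders.
    for rel in missing_dirs("."):
        print(f"missing: {rel}")
```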

Text-to-image Generation

conda activate stable
python train_stable_diffusion_step50.py train

script parameters:

  • dataset: $1: choices=['train', 'valid', 'test', 'test1', 'test2']

Extract Image Feature

conda activate sammt
python image_process.py train synth

script parameters:

  • dataset: $1: choices=['train', 'valid', 'test', 'test1', 'test2']
  • synthetic or authentic images: $2: choices=['synth', 'authe']
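
To extract features for more than one split, the two positional arguments above can be enumerated programmatically. The sketch below only builds the command lines. `feature_extraction_commands` is a hypothetical helper, and it assumes that every combination of split and image type is valid, which depends on the corresponding images (synthetic or authentic) already being available.

```python
# Sketch: build one image_process.py command per (split, image type) pair.
# feature_extraction_commands is a hypothetical helper, not part of this repository.
from itertools import product

SPLITS = ["train", "valid", "test", "test1", "test2"]
IMAGE_KINDS = ["synth", "authe"]


def feature_extraction_commands(splits=SPLITS, kinds=IMAGE_KINDS):
    """Return one argument list per (split, image kind) pair."""
    return [["python", "image_process.py", split, kind]
            for split, kind in product(splits, kinds)]


# Print the commands; nothing is executed here.
for cmd in feature_extraction_commands():
    print(" ".join(cmd))
```

Each argument list can be passed to `subprocess.run(cmd, check=True)` from within the `sammt` environment to run the extraction.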

The pre-extracted image features can also be downloaded here.

Train and Test

1. Preprocess

conda activate sammt
bash preprocess.sh

2. Train

bash train_mmt.sh

3. Test

# bash translate_mmt.sh $1 $2 $3
bash translate_mmt.sh clip test synth

script parameters:

  • image feature: $1: choices=['clip']
  • test set: $2: choices=['test', 'test1', 'test2']
  • inference with synthetic or authentic images: $3: choices=['synth', 'authe']
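
Because `translate_mmt.sh` takes three positional arguments, a typo in any of them is easy to miss. The sketch below checks the arguments against the choices listed above before building the command. `translate_command` and the key names in `CHOICES` are hypothetical and introduced only for illustration.

```python
# Sketch: validate the three positional arguments of translate_mmt.sh.
# translate_command and the CHOICES key names are hypothetical.
CHOICES = {
    "image_feature": ["clip"],
    "test_set": ["test", "test1", "test2"],
    "image_kind": ["synth", "authe"],
}


def translate_command(image_feature, test_set, image_kind):
    """Return the argument list for translate_mmt.sh, or raise ValueError."""
    given = {
        "image_feature": image_feature,
        "test_set": test_set,
        "image_kind": image_kind,
    }
    for key, value in given.items():
        if value not in CHOICES[key]:
            raise ValueError(f"{key}={value!r} is not one of {CHOICES[key]}")
    return ["bash", "translate_mmt.sh", image_feature, test_set, image_kind]


# Corresponds to: bash translate_mmt.sh clip test synth
print(" ".join(translate_command("clip", "test", "synth")))
```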

Acknowledgements

This project is built on several open-source repositories and codebases.
