
This repo contains the original implementation of VAuLT, the Vision-and-Augmented-Language Transformer. We provide instructions for downloading several multimodal social-media datasets, along with scripts to experiment with them. VAuLT is a stack of Transformers: a language model such as BERT preprocesses the text input of ViLT.

Home Page: https://arxiv.org/abs/2208.09021

License: MIT License


VAuLT: Vision-and-Augmented-Language Transformer

Code for VAuLT: Augmenting the Vision-and-Language Transformer with the Propagation of Deep Language Representations.

[VAuLT architecture diagram]


Installation

Experiments were conducted with Python 3.7.4. To install the necessary packages (preferably in a virtual environment):

pip install -e .

If you want to alter the implementation, we suggest you install with:

pip install -e .[dev]

(Note to zsh users: escape [ and ]).

Entity Linking

While our experiments showed no improvement from the implemented form of entity linking, you can use it for the TWITTER-1# datasets by following these instructions (their package is installed with pip in setup.py, i.e. by the commands above) and enabling it when running the scripts.


Using VAuLT

import requests
from PIL import Image

from vault.models.vault import VaultModel, VaultProcessor

# ViLT backbone and the language model to prepend to it
vilt_model_name_or_path = "dandelin/vilt-b32-mlm"
bert_model_name_or_path = "vinai/bertweet-base"

# Both the processor and the model are built from the two checkpoints
processor = VaultProcessor.from_pretrained(
    vilt_model_name_or_path, bert_model_name_or_path
)
model = VaultModel.from_pretrained(vilt_model_name_or_path, bert_model_name_or_path)

# Example image-text pair
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
text = "a bunch of [MASK] laying on a [MASK]."

# Encode the pair and run a forward pass
encoding = processor(image, text, return_tensors="pt")
outputs = model(**encoding)
outputs.keys()  # odict_keys(['last_hidden_state', 'pooler_output'])
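
The pooler_output above is the pooled multimodal embedding, which can be fed to a downstream head. Below is a minimal sketch (not part of the original README) of attaching a linear classifier on top of it; the choice of three output classes is an illustrative assumption.

import torch

# Hypothetical downstream head: project the pooled multimodal embedding to,
# e.g., three sentiment classes (illustrative choice, not from the repo).
pooled = outputs["pooler_output"]  # shape: (batch_size, hidden_size)
classifier = torch.nn.Linear(pooled.size(-1), 3)
logits = classifier(pooled)
print(logits.shape)  # torch.Size([1, 3])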

Datasets

TWITTER-1#

Follow the instructions here. We expect the final structure of the dataset to be:

/data/twitter-tmsc/
├──twitter2015
│  ├── dev.tsv
│  ├── dev.txt
│  ├── test.tsv
│  ├── test.txt
│  ├── train.tsv
│  └── train.txt
├──twitter2015_images
│  ├── 996.jpg
│  ├── 997.jpg
│  ├── 998.jpg
│  ├── ...
│  └── 123456789.jpg
├──twitter
│  ├── dev.tsv
.  .
.  .
.  .

but the dataset layout can be configured within the code (though not through the provided scripts).

MVSA

Find the links here. We expect the final structure of the dataset to be:

/data/mvsa/
├── MVSA
│   ├── corrupt_ids.txt
│   ├── data
│   │   ├── 10000.jpg
│   │   ├── 10000.txt
│   │   ├── 10001.jpg
│   │   ├── 10001.txt
│   │   ├── 10002.jpg
│   │   ├── 10002.txt
│   │   ├── ...
│   │   └── 9998.txt
│   └── labelResultAll.txt
└── MVSA_Single
    ├── data
    │   ├── 1000.jpg
    │   ├── 1000.txt
    │   ├── 1001.jpg
    │   ├── 1001.txt
    │   ├── 1002.jpg
    │   ├── 1002.txt
    │   ├── ...
    │   └── 9.txt
    └──  labelResultAll.txt

where corrupt_ids.txt contains 3151, 3910 and 5995 on separate lines.
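
If you need to create corrupt_ids.txt yourself, here is a minimal Python sketch, assuming the directory layout above:

# Write the three corrupt MVSA ids listed above, one per line.
# Adjust the path if your dataset root differs.
with open("/data/mvsa/MVSA/corrupt_ids.txt", "w") as f:
    f.write("3151\n3910\n5995\n")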

Bloomberg

Download annotations and text here and the images here (based on this). We expect the final structure of the dataset to be:

/data/bloomberg-twitter-text-image/
├── bloomberg-textimage.csv
└── Twitter_images
    ├── T1011681837276639236.jpg
    ├── T1011685052982415360.jpg
    ├── T1011695871761833987.jpg
    ├── T1011697874470653952.jpg
    ├── T1011698048647360512.jpg
    ├── T1011699257290739713.jpg
    ├── T1011700728883220482.jpg
    ├── ...

Running the scripts

To replicate all existing results, run:

chmod +x ./scripts/*

./scripts/test-results.sh -t /data/twitter-tmsc -b /data/bloomberg-twitter-text-image -m /data/mvsa -c 0 -r 5

./scripts/frozen-lms.sh -t /data/twitter-tmsc -b /data/bloomberg-twitter-text-image -c 0 -r 5

./scripts/toms.sh -t /data/twitter-tmsc -c 0 -r 5

The dataset directories are given based on the directory structures above, and you can change the device (-c) or the number of repetitions (-r) to your liking.

The final metrics will be available at ./experiment_logs under subfolders with descriptive names, e.g.

experiment_logs/
├── TomBERTTMSC
│   ├── None,twitter2015(train),bert-base-uncased,False_0
│   └── None,twitter(train),bert-base-uncased,False_0
├── TomViLTTMSC
│   ├── twitter2015(train),bert-base-uncased,vilt-b32-mlm,False_0
│   ├── twitter2015(train),bert-base-uncased,vilt-b32-mlm,False_1
│   ├── twitter(train),bert-base-uncased,vilt-b32-mlm,False_0
│   └── twitter(train),bert-base-uncased,vilt-b32-mlm,False_1
├── VaultTMSCBloomberg
│   ├── bert-base-uncased,bloomberg-twitter-text-image(train;dev),vilt-b32-mlm_0
│   ├── bert-base-uncased,bloomberg-twitter-text-image(train;dev),vilt-b32-mlm_1
│   ├── bertweet-base,bloomberg-twitter-text-image(train;dev),vilt-b32-mlm_0
│   ├── bertweet-base,bloomberg-twitter-text-image(train;dev),vilt-b32-mlm_1
│   └── None,bloomberg-twitter-text-image(train;dev),vilt-b32-mlm_1
├── VaultTMSCMVSA
│   ├── bert-base-uncased,MVSA_Single(train;dev),vilt-b32-mlm_0
│   ├── bert-base-uncased,MVSA(train;dev),vilt-b32-mlm_0
│   ├── bertweet-base,MVSA_Single(train;dev),vilt-b32-mlm_0
│   ├── bertweet-base,MVSA(train;dev),vilt-b32-mlm_0
│   ├── None,MVSA_Single(train;dev),vilt-b32-mlm_0
│   └── None,MVSA(train;dev),vilt-b32-mlm_0
└── VaultTMSCTwitter201X
    ├── bert-base-uncased,twitter2015(train;dev),vilt-b32-mlm,False_0
    ├── bert-base-uncased,twitter2017(train;dev),vilt-b32-mlm,False_0
    ├── bertweet-base,twitter2015(train;dev),vilt-b32-mlm,False_0
    ├── bertweet-base,twitter2015(train;dev),vilt-b32-mlm,False_1
    ├── bertweet-base,twitter(train;dev),vilt-b32-mlm,False_0
    ├── bertweet-base,twitter(train;dev),vilt-b32-mlm,False_1
    ├── None,twitter2015(train;dev),vilt-b32-mlm,False_0
    └── None,twitter(train;dev),vilt-b32-mlm,False_0

The name of each experiment subfolder encodes important information, such as which models and which splits were used. The final integer differentiates runs whose hyperparameters differ but do not appear in the name itself (running the same experiment twice will log the additional runs in the same subfolder). Each of the above "leaf" subfolders will look like:

experiment_logs/VaultTMSCTwitter201X/bert-base-uncased,twitter2015(train;dev),vilt-b32-mlm,False_0/
├── aggregated_metrics.yml
├── metrics.yml
├── obj.pkl
├── params.yml
└── plots
    └── train_loss.png

aggregated_metrics.yml contains the mean and standard deviation of all logged metrics across the multiple runs, computed at the epoch designated as the final one (e.g., plain training uses the last epoch, while early stopping uses the corresponding step; see the example below). metrics.yml contains all the logged metrics, and params.yml contains the hyperparameter configuration. plots contains plots of the metrics logged throughout training (not the final metrics, e.g. those of the test set). Finally, obj.pkl contains the logger object, which you can re-load with the load_existent method of vault.logging_utils.ExperimentHandler (a re-loading sketch follows the example below).

? ''
: best_train_loss: 0.0016+-0.0002
  test_eval_accuracy: 0.7563+-0.0084
  test_eval_loss: 1.5356+-0.0650
  test_macro_f1_score: 0.6997+-0.0171
