

Pre-training Text-to-Text Transformers for Concept-centric Common Sense

This code accompanies the ICLR 2021 paper Pre-training Text-to-Text Transformers for Concept-centric Common Sense. Check out our project website for details!

Installation

conda create -n calm python==3.7
conda activate calm
python setup.py install
cd CALM

Pre-processing for CALM

Wiki pre-processing

cat wiki.doc | tail -n +500000 | head -n 500000 > wiki/wiki.train.raw
cat wiki.doc | tail -n +1000000 | head -n 100000 > wiki/wiki.valid.raw

Generative Objective

python dataset_utils/concept_deshuffling_data_generation.py
python dataset_utils/keyword_lm_data_generation.py

Dataset creation for concept-order-recovering (COR) and concept-to-sentence (C2S).
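
The two objectives can be sketched as follows. This is an illustrative example, not the repo's actual code: the `make_cor_example` / `make_c2s_example` helpers and the word-level shuffling are assumptions, but they show the shape of the data — COR corrupts the order of concepts in a sentence and asks the model to restore it, while C2S asks the model to generate the sentence from the concept set alone.

```python
import random

def make_cor_example(sentence, concepts, seed=0):
    """Concept-order-recovering (COR): permute the concept words inside a
    sentence; the model must restore the original sentence."""
    rng = random.Random(seed)
    shuffled = concepts[:]
    rng.shuffle(shuffled)
    mapping = dict(zip(concepts, shuffled))
    corrupted = " ".join(mapping.get(tok, tok) for tok in sentence.split())
    return corrupted, sentence  # (source, target)

def make_c2s_example(sentence, concepts):
    """Concept-to-sentence (C2S): generate the sentence from its concepts."""
    return " ".join(concepts), sentence  # (source, target)

src, tgt = make_cor_example("the dog chased the ball across the yard",
                            ["dog", "chased", "ball", "yard"], seed=1)
```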

Contrastive Objective

python dataset_utils/generate_discriminative_dataset.py

Dataset creation for the generative question answering (QA) objective.
There are three types of contrastive objectives (See Table 4 (b) in the paper).

Option 1: Multi-choice QA
Option 2: Generative QA
Option 3: True/False

For CALM, we use Option 2 (Generative QA).
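
The Generative QA format can be sketched as follows. This is an illustrative example of the idea, not the repo's actual code: the prompt wording and the `make_generative_qa_example` helper are assumptions, but the structure matches the objective — a real sentence is paired with a corrupted distractor, and the target is the real sentence.

```python
import random

def make_generative_qa_example(true_sentence, distractor, seed=0):
    """Option 2 (Generative QA): show a real sentence and a distractor
    in random order; the model must generate the real one."""
    rng = random.Random(seed)
    candidates = [true_sentence, distractor]
    rng.shuffle(candidates)
    source = ("question: which sentence makes sense? "
              "option1: {} option2: {}".format(candidates[0], candidates[1]))
    return source, true_sentence  # (source, target)

src, tgt = make_generative_qa_example("dogs chase balls", "balls chase dogs")
```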

Mix the three datasets

python dataset_utils/mix_dataset.py
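
Conceptually, mixing pools the (source, target) pairs from the three objectives so that every training batch sees all three tasks. A minimal sketch of that idea (the `mix_datasets` helper is hypothetical; the repo's `mix_dataset.py` may sample or weight the datasets differently):

```python
import random

def mix_datasets(cor_examples, c2s_examples, qa_examples, seed=0):
    """Pool examples from COR, C2S, and the QA objective, then shuffle
    so the tasks are interleaved during training."""
    rng = random.Random(seed)
    mixed = list(cor_examples) + list(c2s_examples) + list(qa_examples)
    rng.shuffle(mixed)
    return mixed

mixed = mix_datasets([("cor-src", "cor-tgt")],
                     [("c2s-src", "c2s-tgt")],
                     [("qa-src", "qa-tgt")])
```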

Pre-training

Pre-train CALM_mix

First, train on the mixed dataset.

python finetune.py \
    --data_dir datasets/mix \
    --output_dir outputs/calm_mix_base \
    --model_name_or_path t5-base \
    --tokenizer_name_or_path t5-base \
    --max_seq_length 256 \
    --learning_rate 5e-4 \
    --num_train_epochs 2 \
    --train_batch_size 8 \
    --gradient_accumulation_steps 4 \
    --weight_decay 0.01 \
    --warmup_steps 10000 \
    --adam_epsilon 1e-6 \
    --n_gpu 4 \
    --gpu_nums 4,5,6,7 \
    --model_parallel

python finetune.py \
    --data_dir datasets/mix \
    --output_dir outputs/calm_mix_large_dp \
    --model_name_or_path t5-large \
    --tokenizer_name_or_path t5-large \
    --max_seq_length 256 \
    --learning_rate 5e-4 \
    --num_train_epochs 2 \
    --train_batch_size 8 \
    --gradient_accumulation_steps 4 \
    --weight_decay 0.01 \
    --warmup_steps 10000 \
    --adam_epsilon 1e-6
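
As a reading aid for the flags above: assuming the usual gradient-accumulation semantics (per-GPU batch size × accumulation steps × number of GPUs), the two runs imply the following effective batch sizes. This arithmetic is an interpretation, not code from the repo.

```python
# Effective batch size implied by the flags above, assuming the usual
# semantics: per-GPU batch * accumulation steps * number of GPUs.
train_batch_size = 8
gradient_accumulation_steps = 4

# t5-base run: 4 GPUs (--n_gpu 4, --gpu_nums 4,5,6,7)
effective_base = train_batch_size * gradient_accumulation_steps * 4

# t5-large run: single device (no --n_gpu flag)
effective_large = train_batch_size * gradient_accumulation_steps

print(effective_base, effective_large)  # 128 32
```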

Pre-train CALM

Then, train CALM from the CALM_mix checkpoint.

python finetune_generator_discriminator.py \
    --data_dir datasets/option2 \
    --checkpoint_dir outputs/calm_mix \
    --output_dir outputs/calm \
    --max_seq_length 256 \
    --learning_rate 5e-7 \
    --num_train_epochs 3 \
    --train_batch_size 8 \
    --gradient_accumulation_steps 32 \
    --fp_16 False \
    --weight_decay 0.01 \
    --warmup_steps 10000 \
    --adam_epsilon 1e-6 \
    --n_gpu 8 \
    --gpu_nums 0,1,2,3,4,5,6,7

python finetune_generator_discriminator.py \
    --data_dir datasets/option2 \
    --checkpoint_dir outputs/calm_mix_base_dp \
    --output_dir outputs/calm_base_dp \
    --max_seq_length 256 \
    --learning_rate 5e-7 \
    --num_train_epochs 3 \
    --train_batch_size 8 \
    --gradient_accumulation_steps 32 \
    --fp_16 False \
    --weight_decay 0.01 \
    --warmup_steps 10000 \
    --adam_epsilon 1e-6

Fine-tuning

Use the pre-trained checkpoint to fine-tune on the downstream tasks.

Model List

Our released models are listed below. You can load these models with HuggingFace's Transformers.

| Model | CSQA | OBQA | PIQA | aNLI | Description |
|---|---|---|---|---|---|
| danny911kr/calm-mix-base | 63.02 | 60.40 | 70.07 | 62.79 | Mix-Only |
| danny911kr/calm-base | 63.32 | 60.90 | 71.01 | 63.20 | |
| danny911kr/calm-mix-large | 70.26 | 62.50 | 73.70 | 75.99 | Mix-Only |
| danny911kr/calm-large | 71.31 | 66.00 | 75.11 | 77.12 | |
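
Since CALM is a T5-based model, the released checkpoints should load through the standard T5 classes. A minimal sketch (the `load_calm` helper is our own; the `transformers` import is deferred inside the function so the snippet can be defined without the library installed):

```python
# Released CALM checkpoints on the HuggingFace Hub (names from the table above).
MODEL_IDS = [
    "danny911kr/calm-mix-base",
    "danny911kr/calm-base",
    "danny911kr/calm-mix-large",
    "danny911kr/calm-large",
]

def load_calm(model_id):
    """Load a released CALM checkpoint (a T5 model) from the HuggingFace Hub."""
    from transformers import T5ForConditionalGeneration, T5Tokenizer
    tokenizer = T5Tokenizer.from_pretrained(model_id)
    model = T5ForConditionalGeneration.from_pretrained(model_id)
    return tokenizer, model

# Downloads weights on first use:
# tokenizer, model = load_calm("danny911kr/calm-base")
```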


Issues

How did you split the CSQA dataset?

As stated in your paper,
"Results for COMMONGEN are on the test set and others are on the official development set. We tune
the hyperparameters based on the models’ performance on a in-house split dev set.",
So for CSQA, you split the official train set into train and dev sets, right? Could you give more details about the split (ratio, IDs)?
Thank you in advance! Nice work!
