
MAC

Official PyTorch implementation of "Online Adaptation of Language Models with a Memory of Amortized Contexts".

Conda

conda create -n mac python=3.8 -y
conda activate mac

pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu121  # use cu118 for CUDA 11.8
pip install transformers==4.36.2 peft==0.7.1 accelerate==0.25.0 ipykernel==6.29.0 hydra-core==1.2.0 higher==0.2.1 pandas==2.0.3 datasets==2.16.1 spacy==3.7.2 Pillow==10.2.0 matplotlib==3.7.4 protobuf==4.25.2 einops==0.7.0 wandb==0.16.2 bitsandbytes==0.42.0 sentencepiece==0.1.99 deepspeed==0.13.1
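As a quick sanity check after installation, a small script like the following (not part of the repo; the version pins mirror the pip command above) can confirm that the pinned packages resolve in the activated environment:

```python
# Hedged sanity check: verify that a few of the pinned packages from the
# pip command above are installed at the expected versions.
from importlib import metadata

pins = {"torch": "2.1.2", "transformers": "4.36.2", "peft": "0.7.1"}
for pkg, want in pins.items():
    try:
        got = metadata.version(pkg)
        status = "OK" if got == want else f"MISMATCH (found {got})"
    except metadata.PackageNotFoundError:
        status = "MISSING"
    print(f"{pkg}=={want}: {status}")
```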

Prepare data

Download the data to the /data folder, or change data_dir in ./conf/dataset/<DATASET_NAME>.yaml.
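For reference, the dataset config might look like the following sketch; only data_dir is named in this README, and the other keys and values are hypothetical placeholders:

```yaml
# ./conf/dataset/streamingqa.yaml (sketch; only data_dir comes from this
# README -- name and the example path are assumptions)
name: streamingqa
data_dir: /data/streamingqa
```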

How to run

WANDB: To use Weights & Biases (wandb) logging

  • Create a wandb account and get your wandb API key
  • Set wandb_key in ./conf/config.yaml to your wandb key
  • wandb_project in ./conf/config.yaml is the name of your wandb project
  • wandb_entity in ./conf/config.yaml is your wandb entity name
  • Set wandb_log to false if you don't want wandb logging

DATA and CACHE: Some important paths

  • ./conf/dataset/streamingqa.yaml: dataset path
  • CACHE_DIR in ./conf/config.yaml: cache path for Hugging Face model downloads (e.g., GPT-2 and T5 model parameters and tokenizers)
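Putting the keys mentioned above together, a hedged sketch of the relevant part of ./conf/config.yaml (the key names come from this README; all values are placeholders):

```yaml
# ./conf/config.yaml (partial sketch; values are hypothetical)
wandb_log: true              # set to false to disable wandb logging
wandb_key: <YOUR_WANDB_KEY>
wandb_project: mac
wandb_entity: <YOUR_WANDB_ENTITY>
CACHE_DIR: /data/hf_cache    # Hugging Face download cache (models, tokenizers)
```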

BATCH_SIZE: We have verified that the batch sizes in the provided config files run on 2 GPUs (48GB each)

  • Actual batch size: update_batch_size * grad_acc_steps
  • update_batch_size: batch size for one iteration (summed across all GPUs)
  • grad_acc_steps: number of gradient accumulation steps
  • Batch size per GPU for one iteration: update_batch_size // number of GPUs
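The arithmetic above can be sketched as follows, using hypothetical values rather than the actual values from the config files:

```python
# Hedged illustration of the batch-size arithmetic; the numbers are
# made up for the example, not taken from the repo's configs.
update_batch_size = 32  # batch size for one iteration, summed across all GPUs
grad_acc_steps = 4      # gradient accumulation steps
num_gpus = 2

actual_batch_size = update_batch_size * grad_acc_steps  # samples per optimizer step
per_gpu_batch_size = update_batch_size // num_gpus      # samples per GPU per iteration

print(actual_batch_size, per_gpu_batch_size)  # 128 16
```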

Use bf16 for mixed-precision training, as fp16 does not work well with T5 (see huggingface/transformers#17978).

# train distillgpt2
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m accelerate.commands.launch --config_file ./conf/accelerate_config.yaml --num_processes=4 main.py mode=amortize_encdec_distillgpt2 dataset=streamingqa

# train gpt2-large
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m accelerate.commands.launch --config_file ./conf/accelerate_config.yaml --num_processes=4 main.py mode=amortize_encdec_gpt2large dataset=streamingqa mixed_precision=bf16 

# train gpt2-xl
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m accelerate.commands.launch --config_file ./conf/accelerate_config.yaml --num_processes=4 main.py mode=amortize_encdec_gpt2xl dataset=streamingqa mixed_precision=bf16 

# train llama2
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m accelerate.commands.launch --config_file ./conf/zero2_config.yaml --num_processes=4 main.py mode=amortize_encdec_llama2_7b dataset=streamingqa mixed_precision=bf16 quant_type=nf4 llama_cache_dir=<LLAMA_PATH>

Evaluation code

# Evaluate on StreamingQA
CUDA_VISIBLE_DEVICES=0 python eval.py mode_eval=amortize_encdec_distillgpt2 dataset=streamingqa load_path=<LOAD_PATH>

Contributors

jihoontack

Issues

A question about the use of "qa_id" of each dataset item

Thanks for your great work; I am new to this research area. While following your code, one question confused me. In common/dataset.py, the question and answer pair are simply concatenated to form the input_ids and attention_masks.
(screenshot omitted)
And in models/amortized_enddec.py, the "qa_ids" and "qa_attention" are fed into base_lm to compute the "qa_loss".
(screenshot omitted)
I wonder whether this leaks information to the LLM, since the answer is given to the model together with the question. As mentioned, I am new to this area and not familiar with the whole online adaptation pipeline. Could you kindly share some ideas? Looking forward to your reply.
