
InfiniTransformer

Unofficial PyTorch/🤗Transformers implementation of "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention", supporting Llama 3 and Gemma models (Llama 2 and Llama 1 are also supported).
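As a reading aid for the paper named above, here is a minimal single-head sketch in plain Python (no PyTorch) of one Infini-attention segment step as the paper describes it. The step first retrieves from a compressive memory M using σ(x) = ELU(x) + 1 and then runs ordinary causal dot-product attention inside the segment. It mixes the two results with a sigmoid gate on a learned scalar β. Finally, it applies the linear memory update M ← M + σ(K)ᵀV, z ← z + Σ σ(K_t). This is not the repository's code. The function names and the EPS term in the denominator are assumptions, and the sketch omits the paper's delta-rule variant, multi-head projections, and batching.

```python
import math
from typing import List

Matrix = List[List[float]]
EPS = 1e-6  # assumption: small constant so an empty memory does not divide by zero


def sigma(x: float) -> float:
    """ELU(x) + 1, the non-negative activation applied to queries and keys."""
    return x + 1.0 if x > 0 else math.exp(x)


def retrieve(Q: Matrix, M: Matrix, z: List[float]) -> Matrix:
    """A_mem = sigma(Q) M / (sigma(Q) z), computed row by row."""
    out = []
    for q in Q:
        sq = [sigma(v) for v in q]
        denom = sum(a * b for a, b in zip(sq, z)) + EPS
        out.append([sum(sq[i] * M[i][j] for i in range(len(sq))) / denom
                    for j in range(len(M[0]))])
    return out


def update(K: Matrix, V: Matrix, M: Matrix, z: List[float]) -> None:
    """Linear update, in place: M += sigma(K)^T V and z += sum over t of sigma(K_t)."""
    for k, v in zip(K, V):
        sk = [sigma(x) for x in k]
        for i, ski in enumerate(sk):
            z[i] += ski
            for j, vj in enumerate(v):
                M[i][j] += ski * vj


def local_attention(Q: Matrix, K: Matrix, V: Matrix) -> Matrix:
    """Causal softmax(QK^T / sqrt(d_k)) V restricted to the current segment."""
    d_k = len(Q[0])
    out = []
    for t, q in enumerate(Q):
        scores = [sum(a * b for a, b in zip(q, K[s])) / math.sqrt(d_k)
                  for s in range(t + 1)]
        m = max(scores)  # subtract the max for numerical stability
        w = [math.exp(s - m) for s in scores]
        total = sum(w)
        out.append([sum(w[s] * V[s][j] for s in range(t + 1)) / total
                    for j in range(len(V[0]))])
    return out


def infini_attention_segment(Q: Matrix, K: Matrix, V: Matrix,
                             M: Matrix, z: List[float], beta: float) -> Matrix:
    """Read memory from previous segments, gate it with local attention, then write this segment."""
    a_mem = retrieve(Q, M, z)          # uses M_{s-1}, z_{s-1}
    a_dot = local_attention(Q, K, V)
    g = 1.0 / (1.0 + math.exp(-beta))  # sigmoid(beta), a learned scalar gate
    out = [[g * m + (1.0 - g) * d for m, d in zip(rm, rd)]
           for rm, rd in zip(a_mem, a_dot)]
    update(K, V, M, z)                 # produces M_s, z_s for the next segment
    return out
```

The memory is a fixed d_k × d_v matrix plus a d_k vector, so its size stays the same however many segments have been processed. This is what lets the context grow without the attention state growing.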

Two types of implementation for Infini-Attention

Type I. Infini-Attention, Model-wise and Trainer-wise

  • Overrides the modeling and config Python files.
  • Full edit; not compatible with the standard HF Trainer.
  • Requires custom training code.
  • Memory usage is much lower than with SDPA (default) attention
    • can train Gemma-2B with a 32768 seq len (2048*16) on 2x H100 80GB (AdamW optimizer, no gradient checkpointing)
    • can train Llama-3-8B with a 1M seq len (2048*512) on 2x H100 80GB (Adafactor optimizer, no gradient checkpointing)
  • Can train with 'infinite' context; see train.gemma.infini.noclm.1Mseq.sh, which runs on 1x H100 80GB (AdamW optimizer, no gradient checkpointing)

Type II. Infini-Attention in the Attention Layer only

  • Overrides only the modeling Python file, specifically the attention layer.
  • Minimal edit; fully compatible with HF tooling (Trainer, etc.)
  • Memory usage is roughly equal to SDPA (default) attention
    • can train Gemma-2B with an 8192 seq len (128*64) on 2x H100 80GB (Adafactor optimizer + gradient checkpointing)

How to use Type I. Infini-Attention, Model-wise and Trainer-wise

1. Clone this repository

git clone https://github.com/Beomi/InfiniTransformer

2. Install dependencies

Install 🤗Transformers from source, pinned to commit b109257f4f (the latest version when this was written).

pip install -r requirements.txt
pip install -e git+https://github.com/huggingface/transformers.git@b109257f4f#egg=transformers
# or just pip install transformers

3. Run the example (inference and a simple forward/backward test)

python test_basic.infini.py

4. Train with your data

Train Llama-3 with a 1M seq len and 2K segment size on the MiniPile dataset

./train.llama.infini.noclm.1Mseq.sh

or

Train Gemma-2B with a 32K seq len and 2K segment size on the WikiText2 dataset

./train.gemma.infini.noclm.sh

or

Train Gemma-2B with a 1M seq len and 2K segment size on the MiniPile dataset

./train.gemma.infini.noclm.1Mseq.sh
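Each of the training setups above processes one long sequence in fixed-size segments. For example, 1M = 2048 * 512 tokens gives 512 segments of 2K, and 32K = 2048 * 16 gives 16. Below is a minimal sketch of such a segment loop in plain Python. It assumes a hypothetical `step_fn(segment, state) -> (output, state)` that stands in for a model forward pass carrying the Infini-attention memory from one segment to the next. The helper names are illustrative and are not taken from the repository's scripts.

```python
from typing import Any, Callable, Iterator, List, Sequence, Tuple


def num_segments(seq_len: int, segment_len: int) -> int:
    """Segments needed to cover seq_len tokens (ceiling division)."""
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    return -(-seq_len // segment_len)


def iter_segments(tokens: Sequence[int], segment_len: int) -> Iterator[Sequence[int]]:
    """Yield consecutive, non-overlapping segments; the last one may be shorter."""
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    for start in range(0, len(tokens), segment_len):
        yield tokens[start:start + segment_len]


def run_long_sequence(
    tokens: Sequence[int],
    segment_len: int,
    step_fn: Callable[[Sequence[int], Any], Tuple[Any, Any]],  # hypothetical model step
    init_state: Any,
) -> Tuple[List[Any], Any]:
    """Feed segments in order, threading the recurrent state (e.g. the memory) through."""
    state = init_state
    outputs = []
    for segment in iter_segments(tokens, segment_len):
        out, state = step_fn(segment, state)
        outputs.append(out)
    return outputs, state
```

For example, `run_long_sequence(list(range(5000)), 2048, lambda seg, s: (len(seg), s + len(seg)), 0)` returns `([2048, 2048, 904], 5000)`. Only one segment is attended to at a time, and everything earlier reaches it through the carried state.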

How to use Type II. Infini-Attention in the Attention Layer only

1. Clone this repository

git clone https://github.com/Beomi/InfiniTransformer

2. Install dependencies

Install 🤗Transformers from source, pinned to commit b109257f4f (the latest version when this was written).

pip install -r requirements.txt
pip install -e git+https://github.com/huggingface/transformers.git@b109257f4f#egg=transformers

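3. Replace the original modeling_gemma.py in the installed 🤗Transformers package with a symbolic link to the new modeling_gemma.py from this repository

rm <transformers-package-dir>/models/gemma/modeling_gemma.py
ln -s <path-to-new>/modeling_gemma.py <transformers-package-dir>/models/gemma/modeling_gemma.py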
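Step 3 (replacing the installed modeling_gemma.py with a symbolic link to the new one) can also be scripted. Below is a minimal, hypothetical helper that is not part of the repository. `installed_module_path` asks Python's import machinery where a module's source file lives, and `link_replacement` deletes that file and symlinks the replacement in its place. The location of the new modeling_gemma.py inside your clone is an assumption you must supply.

```python
import importlib.util
from pathlib import Path


def installed_module_path(module_name: str) -> Path:
    """Return the source file backing an importable module (parent packages get imported)."""
    spec = importlib.util.find_spec(module_name)
    if spec is None or spec.origin is None:
        raise ModuleNotFoundError(f"cannot locate {module_name!r}")
    return Path(spec.origin)


def link_replacement(target: Path, replacement: Path) -> None:
    """Delete `target` (file or existing symlink) and put a symlink to `replacement` there."""
    replacement = replacement.resolve()
    if not replacement.is_file():
        raise FileNotFoundError(replacement)
    if target.exists() or target.is_symlink():
        target.unlink()
    target.symlink_to(replacement)
```

With 🤗Transformers installed, the intended call would be `link_replacement(installed_module_path("transformers.models.gemma.modeling_gemma"), Path("path/to/new/modeling_gemma.py"))`, where the second path is a placeholder. If you want to be able to undo the change, rename the original file instead of deleting it.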

4. Run the example (inference and a simple forward/backward test)

python test_basic.py

5. Train with your data

python test_train.small.gemma.py

The example code uses wikitext-2-raw-v1 from https://huggingface.co/datasets/wikitext

The wandb log for this test run: https://api.wandb.ai/links/beomi2/1rsqrkfn

6. Inference

python test_basic.trained.py

Sample generation with a model trained for 1 epoch on WikiText2

Input:

This work introduces an efficient method to scale Transformer-based

Output1:

models for denoising , denoising denoising , and deep denoising of images of the U2 EPK model , using a coefficient that is a function of the depth of the image resolution . The paper experiments with image denoising by Turbo @-@ based filtering , denoising by generative adversarial networks , and video denoising by denoising each of the three elements of the video ( color of the pixels / frames ) . The results are considered fair . The video is not discussed . The paper is not considering an actual application in an industrial context , line is probably a 1 . It is built in the Nohmi…

Output2:

vision models across platforms using a custom architecture optimized for both vision ( 3D / 2D ) and vision and language . In other words , a single model can run on different types of devices , a feature that is critical for the development of general @-@ purpose and large-scale AI ( see also : The One @-@ Model @-@ for @-@ All @-@ Things @-@ AI Problem ) . The model is the first to reach a global scale ( 200 GPU + ) on a single GPU using the Transformer and its variants . The model can run at the end of 1967 . He had his family relocated to a house in a nearby neighborhood , where they lived for five years , before returning to their primary residence in St. Petersburg . Later comments of 1968 made by his fellow musician Bruce Hornsby made it clear that he had gone through a lot , both personally and professionally .

Contributors

beomi, eltociear, liberatedwinner
