
Comments (7)

Mooler0410 commented on August 11, 2024

We found that after 4.36, the default attention class of Llama changed from "LlamaAttention" to "LlamaSdpaAttention", so the replacement function no longer matches. Instead, you may try:

modify_method_of_instance(base_model, "LlamaAttention", "forward", self_extend_forward)
--> modify_method_of_instance(base_model, "LlamaSdpaAttention", "forward", self_extend_forward)

This might be the reason for the failure.
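
For reference, a minimal sketch of the instance-level patch on transformers >= 4.36, assuming the LongLM helpers (modify_method_of_instance, self_extend_forward) are importable as in the repo's example; the module paths, model path, and group/window sizes below are illustrative assumptions, not values from this thread:

```python
# Minimal sketch; import paths and hyperparameters follow the LongLM example
# script and are assumptions, not verified against the current repo layout.
from functools import partial

import torch
from transformers import AutoModelForCausalLM

import llama_self_extend_patch_4_36 as SE            # assumed module name
from modify_utils import modify_method_of_instance   # assumed helper location

base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.float16
)

# Illustrative group size / neighbor-window size.
se_forward = partial(SE.self_extend_forward, group_size_1=4, group_size_2=1024)

# On transformers >= 4.36 the default attention class is LlamaSdpaAttention,
# so target that class name instead of LlamaAttention.
modify_method_of_instance(base_model, "LlamaSdpaAttention", "forward", se_forward)
```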


YL-9 commented on August 11, 2024

We found that after 4.36, the default attention class of Llama changed from "LlamaAttention" to "LlamaSdpaAttention", so the replacement function no longer matches. Instead, you may try:

modify_method_of_instance(base_model, "LlamaAttention", "forward", self_extend_forward) --> modify_method_of_instance(base_model, "LlamaSdpaAttention", "forward", self_extend_forward)

This might be the reason for the failure.

It works, thank you.
I have another question. I want to add it here, but it still only runs correctly on 4.32; the results on 4.36 are still wrong.
I just added the following three pieces of code and ran them with this command: CUDA_VISIBLE_DEVICES=0,1 python eval/passkey.py --model /data/supry/models/llama-2/llama2-7b-hf --min-tokens 4096 --max-tokens 8192 --tokens-step 4096 --length-step 1024 --iterations 20 --serope
[Three screenshots of the added code were attached here.]


YL-9 commented on August 11, 2024

[Two screenshots of the run output were attached here.]


Mooler0410 commented on August 11, 2024

Hi YL-9! Could you please test whether self-extend works with instance-wise modification, like the example we provide? Sometimes a direct modification of the transformers class does not take effect, and the cause of the failure varies case by case. That's why we chose to modify the forward function of a model instance rather than its class. (This also avoids unexpected behavior, since the modification only applies to that specific model instance.)
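
To make the instance-wise idea concrete, here is a generic sketch (not the LongLM helper itself) of how a forward method can be rebound on matching attention instances only, leaving the transformers class definition untouched:

```python
import types

def patch_forward_on_instances(model, class_name, new_forward):
    """Rebind `forward` on every submodule whose class name matches.

    Only these instances are affected; the class in transformers and any
    other loaded models keep their original forward.
    """
    patched = 0
    for module in model.modules():
        if type(module).__name__ == class_name:
            module.forward = types.MethodType(new_forward, module)
            patched += 1
    return patched

# e.g. patch_forward_on_instances(base_model, "LlamaSdpaAttention", se_forward)
```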


YL-9 commented on August 11, 2024

ok, thank you!


ys-zong commented on August 11, 2024

Hi, thanks for the nice work! I see the current implementation in llama_self_extend_patch_4_36.py is regular PyTorch. I wonder if you plan to implement Flash Attention for transformers==4.36?


Mooler0410 commented on August 11, 2024

Hi, thanks for the nice work! I see the current implementation in llama_self_extend_patch_4_36.py is regular PyTorch. I wonder if you plan to implement Flash Attention for transformers==4.36?

Hi, thank you for your interest. The main difference between transformers==4.36 and transformers==4.38.2 is how RoPE is applied to KV; you may have a look. The self-attention computation is nearly the same, which means you can follow our 4.38.2 implementation to get a Flash Attention implementation for 4.36 with minor modifications.

One possible issue is the flash_attn version used by 4.36. In that case, you may use our Triton flash attention implementation instead of flash_attn. It's 10~20% slower than flash_attn.
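
If it helps, a small hypothetical pre-flight check for whether a usable flash_attn is installed before falling back to the Triton path; the minimum version below is an assumption, not a value from the LongLM repo:

```python
# Hypothetical check; adjust min_version to whatever your
# transformers==4.36 environment actually requires.
from importlib.metadata import PackageNotFoundError, version

from packaging.version import Version

def flash_attn_usable(min_version: str = "2.0.0") -> bool:
    try:
        return Version(version("flash_attn")) >= Version(min_version)
    except PackageNotFoundError:
        return False

# if not flash_attn_usable(): fall back to the Triton implementation.
```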

