Comments (7)

Mooler0410 avatar Mooler0410 commented on August 11, 2024

It's similar to Llama or Mistral. Just use the function modify_method_of_instance (in modify_utils.py) to replace the forward method of Phi-2's attention with the SelfExtend version.
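
For readers unfamiliar with this kind of patching, here is a minimal sketch of the general idea: rebinding the forward method on existing module instances. It is not the code in modify_utils.py, and ToyAttention, ToyModel, patched_forward and replace_forward are hypothetical names used only for illustration.

```python
import types


class ToyAttention:
    """Hypothetical stand-in for a model's attention module."""

    def forward(self, x):
        return f"original({x})"


class ToyModel:
    """Hypothetical stand-in for a model holding several attention layers."""

    def __init__(self):
        self.layers = [ToyAttention(), ToyAttention()]


def patched_forward(self, x):
    # Replacement forward; a real patch would implement SelfExtend attention here.
    return f"patched({x})"


def replace_forward(model, class_name, new_forward):
    """Bind new_forward to every layer whose class name matches; return the count."""
    count = 0
    for layer in model.layers:
        if type(layer).__name__ == class_name:
            # MethodType binds the function to this instance, so `self` works as usual.
            layer.forward = types.MethodType(new_forward, layer)
            count += 1
    return count


model = ToyModel()
n_patched = replace_forward(model, "ToyAttention", patched_forward)
result = model.layers[0].forward("q")
```

Only the instances inside `model` are changed; the class itself, and any other model built from it, keeps the original forward.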

from longlm.

JoanZhou avatar JoanZhou commented on August 11, 2024

Thank you for the reply. I am using phi_self_extend_patch_4_37.py with transformers 4.37.1, and I got an answer like this:
[image]
I tried a few parameter settings:
group_size_1=4, group_size_2=1024
group_size_1=4, group_size_2=512
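
As a rough sanity check on these settings, the sketch below estimates the maximum context length each one should allow. It rests on assumptions not stated in this thread: that group_size_1 is the group size and group_size_2 is the neighbor window, that the maximum extended length is (L - w_n) * G + w_n as described in the SelfExtend paper, and that Phi-2's pretraining window L is 2048. The function name is hypothetical.

```python
def max_extended_length(pretrain_window, group_size, neighbor_window):
    """Estimated maximum context length under the assumed SelfExtend formula."""
    return (pretrain_window - neighbor_window) * group_size + neighbor_window


# Assumed mapping: group_size_1 -> group_size, group_size_2 -> neighbor_window.
len_a = max_extended_length(2048, 4, 1024)
len_b = max_extended_length(2048, 4, 512)
```

Under these assumptions both settings give a limit well above 2048, so the limit itself should not be what breaks a prompt of a few thousand tokens.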

Mooler0410 avatar Mooler0410 commented on August 11, 2024

Thanks for the feedback!

As we noted in the comment in the patch for Phi-2 on 4.37:

"transformers version 4.37 (a future version)
Should work for 'microsoft/phi-2', an official HF version of microsoft/phi-2; check the details on the Hugging Face Hub.
It's different from the previous version for 'susnato/phi-2', which is the default version in transformers 4.36.2!
Haven't done a comprehensive test, but it should work."

It may have some bugs. I will check it once I'm back from spring break.

Mooler0410 avatar Mooler0410 commented on August 11, 2024

Hi, we found that Phi-2 itself cannot do passkey retrieval well 🤣. Without SelfExtend, you can ask vanilla Phi-2 to do a 1.5k passkey retrieval and it will fail like the example you provided.

SelfExtend only elicits an LLM's existing long-context capabilities; it does not equip the model with any new capability. This means that if the model cannot do a task well within its own pretraining window, it cannot do it on longer contexts with SelfExtend either.

With the existing patch for Phi-2, on other tasks, such as 'Needle in the haystack', SelfExtend works well. Have a try!

Mooler0410 avatar Mooler0410 commented on August 11, 2024

Hi, I happened to test Phi-2 with transformers 4.36.2 today. It seems that Phi-2 itself can do passkey retrieval with this version, while it cannot with transformers==4.38.2. You may want to take this into account.

Mooler0410 avatar Mooler0410 commented on August 11, 2024

Hi, I happened to test Phi-2 with 4.36.2 today. It seems that Phi-2 itself can do passkey with this version, while it cannot with transformers == 4.38.2. You may consider this.

My conclusion above is misleading. Only one of the variants I tested works well with transformers==4.36.2.

I tested more variants of Phi-2 and found that, with transformers==4.38.2, rhysjones/phi-2-orange-v2 works well, while "microsoft/phi-2" does not... It's super weird, considering that passkey retrieval is very simple.

All the tested models are vanilla (without SelfExtend). All the input sequences have a length of 1.5k, which is within the model's context window (2k).

Mooler0410 avatar Mooler0410 commented on August 11, 2024

I believe this will be my last response: I have figured out what happened. Phi-2 is sensitive to the prompt format. Unlike other models, Phi-2 requires the recommended template: "To encourage the model to write more concise answers, you can also try the following QA format using 'Instruct: <prompt>\nOutput:'" (https://huggingface.co/microsoft/phi-2). With this template, Phi-2 can now successfully do passkey retrieval with transformers==4.38.2.
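
A minimal sketch of wrapping a passkey-retrieval task in that QA format is shown below. The function names, the filler sentences and the exact task wording are hypothetical and are not taken from the longlm repository; only the "Instruct: ...\nOutput:" wrapper follows the template quoted above.

```python
def build_passkey_task(passkey, n_filler=50):
    """Build a toy passkey-retrieval task with the key hidden between filler text."""
    filler = "The grass is green. The sky is blue. The sun is yellow. "
    half = filler * (n_filler // 2)
    return (
        "There is a pass key hidden in the text below. Find it and memorize it.\n"
        + half
        + f"The pass key is {passkey}. Remember it. "
        + half
        + "\nWhat is the pass key?"
    )


def to_phi2_qa_format(prompt):
    """Wrap a prompt in the 'Instruct: <prompt>\\nOutput:' QA format."""
    return f"Instruct: {prompt}\nOutput:"


task = build_passkey_task(passkey=12345, n_filler=4)
prompt = to_phi2_qa_format(task)
```

The resulting string can then be tokenized and passed to the model as usual; increase n_filler to reach the sequence length you want to test.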
