Comments (7)
It's similar to Llama or Mistral. Just use the function modify_method_of_instance (in modify_utils.py) to replace the forward method of Phi-2's attention with the self-extend version.
from longlm.
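The patching mechanism can be sketched generically: bind a new function as a method on one specific instance, leaving the class itself untouched. This is only an illustration of the technique, not the repo's actual API; ToyAttention, self_extend_forward, and patch_instance_method are hypothetical stand-ins (the real modify_method_of_instance in modify_utils.py may take different arguments).

```python
import types

class ToyAttention:
    """Toy stand-in for an attention module; the real patch targets
    each Phi-2 attention instance inside the loaded model."""
    def forward(self, x):
        return x  # original behavior

def self_extend_forward(self, x):
    # Stand-in for the patched forward (the real one applies grouped
    # positional indices); here we just make the change observable.
    return x * 2

def patch_instance_method(instance, method_name, new_fn):
    # Bind new_fn to this one instance only, via types.MethodType.
    setattr(instance, method_name, types.MethodType(new_fn, instance))

attn = ToyAttention()
patch_instance_method(attn, "forward", self_extend_forward)
print(attn.forward(3))             # 6: this instance runs the patched forward
print(ToyAttention().forward(3))   # 3: other instances are unaffected
```

Patching per instance (rather than reassigning the class attribute) means only the modules you explicitly touch change behavior.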
Thank you for the reply. I am using phi_self_extend_patch_4_37.py with transformers 4.37.1, and I got an answer like this:
Tried a few parameters:
group_size_1=4, group_size_2=1024
group_size_1=4, group_size_2=512
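For context on what those two parameters do: in SelfExtend, group_size_1 is the group size and group_size_2 the neighbor-window size. A sketch of the position remapping, assuming the merged-position formula from the SelfExtend paper (the patch's internals may differ):

```python
def self_extend_rel_pos(rel_pos: int, group_size_1: int, group_size_2: int) -> int:
    """Remap a relative position the way SelfExtend's grouped attention
    does (assumed formula from the paper): positions inside the neighbor
    window (group_size_2) stay exact; farther positions are floor-divided
    by group_size_1 and shifted so the two ranges join without a gap."""
    if rel_pos < group_size_2:
        return rel_pos
    return rel_pos // group_size_1 + group_size_2 - group_size_2 // group_size_1

# With group_size_1=4, group_size_2=512:
print(self_extend_rel_pos(511, 4, 512))   # 511 (inside the neighbor window)
print(self_extend_rel_pos(512, 4, 512))   # 512 (the two ranges meet seamlessly)
print(self_extend_rel_pos(2047, 4, 512))  # 895 (far positions are compressed 4x)
```

Under this formula, a relative distance stays within Phi-2's 2k pretraining window as long as rel_pos // 4 + 384 <= 2047, i.e. roughly 6.6k tokens of effective context for these settings.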
Thanks for the feedback!
As we commented on the patch for Phi-2 on 4.37:
"transformers version 4.37 (a future version).
Should work for 'microsoft/phi-2', the official HF version of microsoft/phi-2; check the details on the Hugging Face Hub.
It's different from the previous version for 'susnato/phi-2', which is the default version in transformers 4.36.2!
Haven't done comprehensive tests, but it should work."
It may have some bugs. I will check it once I'm back from spring break.
Hi, we found that Phi-2 itself cannot do passkey retrieval well 🤣. Without SelfExtend, you may try asking vanilla Phi-2 to do a 1.5k passkey retrieval, and it will fail like the example you provided.
SelfExtend just elicits an LLM's inherent long-context capabilities and does not equip it with any new capability. This means that if the model cannot do a task well within its own pretraining window, it cannot do it on longer contexts with SelfExtend either.
With the existing patch for Phi-2, SelfExtend works well on other tasks, such as 'Needle in a Haystack'. Have a try!
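For anyone who wants to reproduce such a test, here is a sketch of the usual passkey-retrieval recipe: bury a random passkey inside repeated filler and ask the model to recall it. make_passkey_prompt is a hypothetical helper following the common setup, not necessarily the repo's exact script:

```python
import random

def make_passkey_prompt(passkey: str, n_filler: int = 300, seed: int = 0) -> str:
    """Build a passkey-retrieval prompt: the passkey sentence is inserted
    at a pseudo-random position inside repeated filler text."""
    rng = random.Random(seed)
    filler = "The grass is green. The sky is blue. The sun is yellow. "
    lines = [filler] * n_filler
    lines.insert(rng.randrange(n_filler), f"The pass key is {passkey}. Remember it. ")
    return "".join(lines) + "\nWhat is the pass key? The pass key is"

prompt = make_passkey_prompt("71432")
print("71432" in prompt)  # True: the needle is buried in the haystack
```

The prompt would then be tokenized and fed to the vanilla or patched model; the model "passes" if its completion contains the passkey. Adjusting n_filler controls the context length (e.g. ~1.5k tokens to stay inside Phi-2's 2k window).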
Hi, I happened to test Phi-2 with 4.36.2 today. It seems that Phi-2 itself can do passkey retrieval with this version, while it cannot with transformers==4.38.2. You may want to consider this.
This conclusion is misleading. Only one of the variants I tested works well with transformers==4.36.2.
I tested more variants of Phi-2 and found that with transformers==4.38.2, rhysjones/phi-2-orange-v2 works well, while 'microsoft/phi-2' does not... It's really weird, considering that passkey retrieval is a very simple task.
All the tested models are vanilla (without SelfExtend). All the input sequences have a length of 1.5k, which is within the 2k context window.
I believe this response will be the last one; I have figured out what happens. Phi-2 is sensitive to the prompt format. Unlike other models, Phi-2 requires the recommended template: "To encourage the model to write more concise answers, you can also try the following QA format using 'Instruct: \nOutput:'" (https://huggingface.co/microsoft/phi-2). With this template, with transformers==4.38.2, Phi-2 can now successfully do passkey retrieval.
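As code, the template from the Phi-2 model card can be applied like this; format_phi2_prompt is a hypothetical helper name for illustration:

```python
def format_phi2_prompt(instruction: str) -> str:
    """Wrap a query in the Instruct/Output QA template recommended
    on the microsoft/phi-2 model card."""
    return f"Instruct: {instruction}\nOutput:"

prompt = format_phi2_prompt("What is the pass key mentioned in the text?")
print(prompt)
# Instruct: What is the pass key mentioned in the text?
# Output:
```

The model's completion then follows "Output:", so a passkey check reduces to looking for the key in the generated continuation.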