Comments (7)
It's similar to Llama or Mistral. Just use the function modify_method_of_instance (in modify_utils.py) to replace the forward method of Phi-2's attention with the self-extend version.
from longlm.
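The patching mechanism can be sketched generically: bind a new function as a method on one specific instance, leaving the class itself untouched. This is only an illustration of the technique, not the repo's actual API; ToyAttention, self_extend_forward, and patch_instance_method are hypothetical stand-ins (the real modify_method_of_instance in modify_utils.py may take different arguments).

```python
import types

class ToyAttention:
    """Toy stand-in for an attention module; the real patch targets
    each Phi-2 attention instance inside the loaded model."""
    def forward(self, x):
        return x  # original behavior

def self_extend_forward(self, x):
    # Stand-in for the patched forward (the real one applies grouped
    # positional indices); here we just make the change observable.
    return x * 2

def patch_instance_method(instance, method_name, new_fn):
    # Bind new_fn to this one instance only, via types.MethodType.
    setattr(instance, method_name, types.MethodType(new_fn, instance))

attn = ToyAttention()
patch_instance_method(attn, "forward", self_extend_forward)
print(attn.forward(3))             # 6: this instance runs the patched forward
print(ToyAttention().forward(3))   # 3: other instances are unaffected
```

Patching per instance (rather than reassigning the class attribute) means only the modules you explicitly touch change behavior.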
Thank you for the reply. I am using phi_self_extend_patch_4_37.py with transformers 4.37.1, and I got an answer like this:
Tried a few parameters:
group_size_1=4, group_size_2=1024
group_size_1=4, group_size_2=512
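For context on what those two parameters do: in SelfExtend, group_size_1 is the group size and group_size_2 the neighbor-window size. A sketch of the position remapping, assuming the merged-position formula from the SelfExtend paper (the patch's internals may differ):

```python
def self_extend_rel_pos(rel_pos: int, group_size_1: int, group_size_2: int) -> int:
    """Remap a relative position the way SelfExtend's grouped attention
    does (assumed formula from the paper): positions inside the neighbor
    window (group_size_2) stay exact; farther positions are floor-divided
    by group_size_1 and shifted so the two ranges join without a gap."""
    if rel_pos < group_size_2:
        return rel_pos
    return rel_pos // group_size_1 + group_size_2 - group_size_2 // group_size_1

# With group_size_1=4, group_size_2=512:
print(self_extend_rel_pos(511, 4, 512))   # 511 (inside the neighbor window)
print(self_extend_rel_pos(512, 4, 512))   # 512 (the two ranges meet seamlessly)
print(self_extend_rel_pos(2047, 4, 512))  # 895 (far positions are compressed 4x)
```

Under this formula, a relative distance stays within Phi-2's 2k pretraining window as long as rel_pos // 4 + 384 <= 2047, i.e. roughly 6.6k tokens of effective context for these settings.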
Thanks for the feedback!
As we commented on the patch for Phi-2 on 4.37:
"transformers version 4.37 (a future version).
Should work for 'microsoft/phi-2', the official HF version of microsoft/phi-2; check the details on the Hugging Face Hub.
It's different from the previous version for 'susnato/phi-2', which is the default version in transformers 4.36.2!
Haven't done comprehensive tests, but it should work."
It may have some bugs. I will check it once I'm back from spring break.
Hi, we found that Phi-2 itself cannot do passkey retrieval well 🤣. Without SelfExtend, you may try asking vanilla Phi-2 to do a 1.5k passkey retrieval, and it will fail like the example you provided.
SelfExtend just elicits an LLM's inherent long-context capabilities and does not equip it with any new capability. This means that if the model cannot do a task well within its own pretraining window, it cannot do it on longer contexts with SelfExtend either.
With the existing patch for Phi-2, SelfExtend works well on other tasks, such as 'Needle in a Haystack'. Have a try!
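For anyone who wants to reproduce such a test, here is a sketch of the usual passkey-retrieval recipe: bury a random passkey inside repeated filler and ask the model to recall it. make_passkey_prompt is a hypothetical helper following the common setup, not necessarily the repo's exact script:

```python
import random

def make_passkey_prompt(passkey: str, n_filler: int = 300, seed: int = 0) -> str:
    """Build a passkey-retrieval prompt: the passkey sentence is inserted
    at a pseudo-random position inside repeated filler text."""
    rng = random.Random(seed)
    filler = "The grass is green. The sky is blue. The sun is yellow. "
    lines = [filler] * n_filler
    lines.insert(rng.randrange(n_filler), f"The pass key is {passkey}. Remember it. ")
    return "".join(lines) + "\nWhat is the pass key? The pass key is"

prompt = make_passkey_prompt("71432")
print("71432" in prompt)  # True: the needle is buried in the haystack
```

The prompt would then be tokenized and fed to the vanilla or patched model; the model "passes" if its completion contains the passkey. Adjusting n_filler controls the context length (e.g. ~1.5k tokens to stay inside Phi-2's 2k window).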
Hi, I happened to test Phi-2 with 4.36.2 today. It seems that Phi-2 itself can do passkey retrieval with this version, while it cannot with transformers==4.38.2. You may want to consider this.
This conclusion is misleading. Only one of the variants I tested works well with transformers==4.36.2.
I tested more variants of Phi-2 and found that with transformers==4.38.2, rhysjones/phi-2-orange-v2 works well, while 'microsoft/phi-2' does not... It's really weird, considering that passkey retrieval is a very simple task.
All the tested models are vanilla (without SelfExtend). All the input sequences have a length of 1.5k, which is within the 2k context window.
I believe this response will be the last one; I have figured out what happens. Phi-2 is sensitive to the prompt format. Unlike other models, Phi-2 requires the recommended template: "To encourage the model to write more concise answers, you can also try the following QA format using 'Instruct: \nOutput:'" (https://huggingface.co/microsoft/phi-2). With this template, with transformers==4.38.2, Phi-2 can now successfully do passkey retrieval.
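As code, the template from the Phi-2 model card can be applied like this; format_phi2_prompt is a hypothetical helper name for illustration:

```python
def format_phi2_prompt(instruction: str) -> str:
    """Wrap a query in the Instruct/Output QA template recommended
    on the microsoft/phi-2 model card."""
    return f"Instruct: {instruction}\nOutput:"

prompt = format_phi2_prompt("What is the pass key mentioned in the text?")
print(prompt)
# Instruct: What is the pass key mentioned in the text?
# Output:
```

The model's completion then follows "Output:", so a passkey check reduces to looking for the key in the generated continuation.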