Comments (3)
Did some additional legwork on this "su" scaling and here's what I came up with. Hope it helps, and hope that implementing it still allows someone to use the new flash attention. And, as I'm learning, when working with large language models it apparently helps to know a little thing called "math"...
Link to su rope scaling as a jumping-off point for ya...
Here's a summary of how it's implemented overall in the script, unless I'm mistaken...
Phi3SuScaledRotaryEmbedding Class
- Inheritance: inherits from `Phi3RotaryEmbedding`.
- Initialization:
  - Initializes `self.short_factor` and `self.long_factor` from `config.rope_scaling`.
    - These are per-dimension factors (one per rotary frequency) that rescale the frequencies of the rotary embeddings depending on the sequence length.
  - Initializes `self.original_max_position_embeddings` from `config.original_max_position_embeddings`.
    - This value is the threshold that decides whether the `short_factor` or the `long_factor` scaling is applied.
- Method overrides:
  - Overrides the `forward` method to apply "su" rope scaling based on the sequence length:
    - If the sequence length is greater than `self.original_max_position_embeddings`, it applies the `long_factor` scaling.
    - Otherwise, it applies the `short_factor` scaling.
    - The scaling is done by dividing each base inverse frequency by its factor, i.e. `inv_freq = 1 / (factor * base ** (2i / dim))`, and the result is stored in `self.inv_freq`.
    - The scaled inverse frequencies are then used to compute the rotary embeddings.
    - The cosine and sine embeddings are further multiplied by a `scaling_factor` that depends on the ratio of `max_position_embeddings` to `original_max_position_embeddings`.
    - The resulting scaled cosine and sine embeddings are returned.
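To make the `forward` logic above concrete, here is a minimal pure-Python sketch of how I read the "su" computation. It assumes the formulas from the Hugging Face Phi-3 code as far as I can tell: each factor divides the base inverse frequency, and the cos/sin magnitude correction is `sqrt(1 + log(max_pos / original_max_pos) / log(original_max_pos))` when the context is extended, else 1. The function names (`su_inv_freq`, `su_attention_scale`, `su_cos_sin`) are hypothetical, and the real implementation works on torch tensors rather than lists.

```python
import math
from typing import List, Sequence, Tuple


def su_inv_freq(dim: int, base: float, factors: Sequence[float]) -> List[float]:
    """Inverse frequencies for one head, each divided by its "su" factor.

    Computes 1 / (factors[i] * base ** (2 * i / dim)) for i in [0, dim // 2).
    """
    if len(factors) != dim // 2:
        raise ValueError("expected one factor per rotary frequency (dim // 2)")
    return [1.0 / (factors[i] * base ** (2 * i / dim)) for i in range(dim // 2)]


def su_attention_scale(max_pos: int, original_max_pos: int) -> float:
    """Magnitude correction applied to cos/sin when the context is extended."""
    scale = max_pos / original_max_pos
    if scale <= 1.0:
        return 1.0
    return math.sqrt(1.0 + math.log(scale) / math.log(original_max_pos))


def su_cos_sin(
    seq_len: int,
    dim: int,
    base: float,
    short_factor: Sequence[float],
    long_factor: Sequence[float],
    max_pos: int,
    original_max_pos: int,
) -> Tuple[List[List[float]], List[List[float]]]:
    """Return (cos, sin) tables of shape [seq_len][dim] for positions 0..seq_len-1."""
    # Pick the factor set by comparing the sequence length to the original window.
    factors = long_factor if seq_len > original_max_pos else short_factor
    inv_freq = su_inv_freq(dim, base, factors)
    mscale = su_attention_scale(max_pos, original_max_pos)
    cos_table, sin_table = [], []
    for pos in range(seq_len):
        angles = [pos * f for f in inv_freq]
        angles = angles + angles  # duplicate so each row covers the full head dim
        cos_table.append([math.cos(a) * mscale for a in angles])
        sin_table.append([math.sin(a) * mscale for a in angles])
    return cos_table, sin_table


if __name__ == "__main__":
    # Toy head: dim=4, so two frequencies and two factors per set.
    cos, sin = su_cos_sin(3, 4, 10000.0, [1.0, 1.0], [2.0, 4.0], 131072, 4096)
    print(len(cos), len(cos[0]))  # 3 4
```

Note that the factor set is chosen once per call from the sequence length, so a short prompt and a long prompt can see different frequencies for the same position.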
Phi3Attention Class
- Method details:
  - In the `_init_rope` method:
    - Checks whether `self.rope_scaling` is not `None`.
    - If a rope scaling configuration is provided, it determines the scaling type from `self.config.rope_scaling["type"]`.
    - If `scaling_type == "su"`:
      - Initializes `self.rotary_emb` as an instance of `Phi3SuScaledRotaryEmbedding`.
      - This ensures that the "su" rope scaling is applied to the rotary embeddings during the attention computation.
    - The `Phi3SuScaledRotaryEmbedding` instance is created with the appropriate configuration, including the `dim` (head dimension) and `config` (model configuration).
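The `_init_rope` dispatch described above boils down to a small factory keyed on `rope_scaling["type"]`. This is only a sketch: `ConfigStub`, `PlainRotary`, `SuScaledRotary` and `init_rope` are hypothetical stand-ins, not Hugging Face or CTranslate2 code, the real classes hold tensors, and other scaling types besides "su" exist. The default sizes are illustrative.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class ConfigStub:
    """Hypothetical stand-in for the model config; only the fields used here."""
    hidden_size: int = 3072
    num_attention_heads: int = 32
    rope_scaling: Optional[Dict[str, Any]] = None


class PlainRotary:
    """Placeholder for an unscaled rotary embedding."""

    def __init__(self, dim: int) -> None:
        self.dim = dim


class SuScaledRotary:
    """Placeholder for the "su"-scaled rotary embedding."""

    def __init__(self, dim: int, config: ConfigStub) -> None:
        self.dim = dim
        self.short_factor: List[float] = config.rope_scaling["short_factor"]
        self.long_factor: List[float] = config.rope_scaling["long_factor"]


def init_rope(config: ConfigStub):
    """Choose the rotary embedding class from the rope_scaling config."""
    head_dim = config.hidden_size // config.num_attention_heads
    if config.rope_scaling is None:
        # No scaling configured: plain rotary embedding.
        return PlainRotary(head_dim)
    scaling_type = config.rope_scaling["type"]
    if scaling_type == "su":
        # "su" scaling gets the head dimension plus the full config.
        return SuScaledRotary(head_dim, config)
    raise ValueError(f"Unknown RoPE scaling type: {scaling_type}")
```

Passing the whole config (not just the factors) mirrors the description above, where the embedding reads both factor lists and `original_max_position_embeddings` itself.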
from ctranslate2.
Thank you for the information. We don't have time to implement it now, but we will try to support su rope scaling in the future.
Closing, since this was successfully implemented in release 4.3.
Related Issues (20)
- Dynamic LoRA switching
- [SOLVED] Running Llama3 with Ctranslate2
- target_prefix latency
- Unexpected inference results from Flan-T5 XXL converted to ctranslate2 with version 4.2.1 and 4.1.1 (using tensor parallel)
- How to compile from source on windows 11?
- Can't hide GPUs to get_cuda_device_count()
- opus-mt-en-zh does not respect the end token
- I got invalid conversion error when compile on linux
- CTranslate2 cmake error when trying to build the code from source with cuda support enabled on Windows.
- libctranslate2-81fc0d88.so.4.2.1 in python package has executable stack flag
- Whisper encode roughly 4x slower than openai/pytorch
- Option --self_attn_type scaled-dot-flash is not supported (supported values are: scaled-dot)
- Doesn't build without docker. libiomp5 not found
- Clang unusual switches wrongly hardcoded in resulting setup.py
- Support for Phi3-Small, Medium, and Vision
- Different results when run with tensor parallelism
- CUDA DeviceAllocate segfault
- Converter not working for NLLB models
- Ctranslate2 Pypi exceeds limit 20GB
- Facing issues with Ctranslate2 when working with Intel built-in GPU and oneDNN