Comments (3)

BBC-Esq avatar BBC-Esq commented on June 14, 2024

Did some additional legwork on this "su" scaling and here's what I came up with. Hope it helps, and I hope that implementing it still allows someone to use the new flash attention. And as I'm learning, it's apparently useful when working with large language models to be knowledgeable about a little thing called "math"...

Link to su rope scaling as a jumping off point for ya...

https://huggingface.co/microsoft/Phi-3-mini-128k-instruct/blob/810c2e601a82110f6b709183a8ab5416d4a27f75/modeling_phi3.py#L142

Here's a summary of how it's implemented overall in the script, unless I'm mistaken...

Phi3SuScaledRotaryEmbedding Class

  • Inheritance: Inherits from Phi3RotaryEmbedding.
  • Initialization:
    • Initializes self.short_factor and self.long_factor from config.rope_scaling.
      • These factors are used to scale the frequency of the rotary embeddings based on the sequence length.
    • Initializes self.original_max_position_embeddings from config.original_max_position_embeddings.
      • This value is used as a threshold to determine whether to apply the short_factor or long_factor scaling.
  • Method Overrides:
    • Overrides the forward method to apply "su" rope scaling based on sequence length:
      • If the sequence length is greater than self.original_max_position_embeddings, it applies the long_factor scaling.
      • Otherwise, it applies the short_factor scaling.
      • The scaling is applied per dimension: the inverse frequency (self.inv_freq) is recomputed with the chosen factor in the denominator, so each unscaled inverse frequency is divided by its factor.
      • The scaled inverse frequency is then used to compute the rotary embeddings.
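      • The cosine and sine values are further multiplied by a scaling_factor that depends on the ratio of max_position_embeddings to original_max_position_embeddings: 1.0 when the ratio is at most 1, otherwise sqrt(1 + log(ratio) / log(original_max_position_embeddings)).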
      • The resulting scaled cosine and sine embeddings are returned.
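To make the steps above concrete, here's a rough pure-Python sketch of the same computation. It is not the actual modeling_phi3.py code (that one uses torch tensors); the function name su_scaled_rope and its argument layout are made up for illustration, and it assumes short_factor and long_factor are lists with dim / 2 entries each.

```python
import math


def su_scaled_rope(positions, dim, base, short_factor, long_factor,
                   max_position_embeddings, original_max_position_embeddings):
    """Return (cos, sin) tables, one row per position, for "su" scaled RoPE.

    Hypothetical helper for illustration only; names are not from the library.
    """
    positions = list(positions)
    seq_len = max(positions) + 1

    # Pick the per-dimension factor list based on the sequence length.
    if seq_len > original_max_position_embeddings:
        factors = long_factor
    else:
        factors = short_factor

    half = dim // 2
    if len(factors) != half:
        raise ValueError("expected dim / 2 scaling factors")

    # Each factor multiplies base**(2i/dim) in the denominator,
    # so it divides the unscaled inverse frequency.
    inv_freq = [1.0 / (factors[i] * base ** (2 * i / dim)) for i in range(half)]

    # Magnitude correction applied to both cos and sin.
    scale = max_position_embeddings / original_max_position_embeddings
    if scale <= 1.0:
        scaling_factor = 1.0
    else:
        scaling_factor = math.sqrt(
            1 + math.log(scale) / math.log(original_max_position_embeddings)
        )

    cos, sin = [], []
    for p in positions:
        angles = [p * f for f in inv_freq]
        angles = angles + angles  # both halves of the head dim share the angles
        cos.append([math.cos(a) * scaling_factor for a in angles])
        sin.append([math.sin(a) * scaling_factor for a in angles])
    return cos, sin


# Toy usage: 6 positions exceed the original window of 4, so long_factor is used.
cos, sin = su_scaled_rope(
    positions=range(6), dim=4, base=10000.0,
    short_factor=[1.0, 1.0], long_factor=[2.0, 2.0],
    max_position_embeddings=8, original_max_position_embeddings=4,
)
```

The toy numbers are tiny on purpose. In the real model the factor lists and both position limits come from the model config.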

Phi3Attention Class

  • Method Details:
    • In the _init_rope method:
      • Checks if self.rope_scaling is not None.
      • If rope scaling configuration is provided, it determines the scaling type based on self.config.rope_scaling["type"].
      • If scaling_type == "su":
        • Initializes self.rotary_emb as an instance of Phi3SuScaledRotaryEmbedding.
        • This ensures that the "su" rope scaling is applied to the rotary embeddings during the attention computation.
      • The Phi3SuScaledRotaryEmbedding instance is created with the appropriate configuration, including the dim (head dimension) and config (model configuration).
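The selection logic described above boils down to something like the sketch below. RopeConfig and select_rotary_embedding are hypothetical stand-ins I made up so the snippet runs on its own; it returns class names as strings instead of building real modules, and it only covers the "su" branch from the summary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RopeConfig:
    """Hypothetical stand-in for the model config; only the field used here."""
    rope_scaling: Optional[dict] = None


def select_rotary_embedding(config: RopeConfig) -> str:
    """Return the name of the rotary-embedding class that would be built."""
    if config.rope_scaling is None:
        # No scaling configured: plain rotary embedding.
        return "Phi3RotaryEmbedding"
    scaling_type = config.rope_scaling["type"]
    if scaling_type == "su":
        return "Phi3SuScaledRotaryEmbedding"
    # Other scaling types are left out of this sketch.
    raise ValueError(f"Unknown RoPE scaling type {scaling_type}")


chosen = select_rotary_embedding(RopeConfig(rope_scaling={"type": "su"}))
```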

from ctranslate2.

minhthuc2502 avatar minhthuc2502 commented on June 14, 2024

Thank you for the information. We don't have time to implement it right now, but we will try to support su rope scaling in the future.

BBC-Esq avatar BBC-Esq commented on June 14, 2024

Closing since this was successfully implemented in release 4.3.
