Comments (4)
You should run print(model); then you'll see:
GemmaForCausalLM(
  (model): GemmaModel(
    (embed_tokens): Embedding(256000, 2048, padding_idx=0)
    (layers): ModuleList(
      (0-17): 18 x GemmaDecoderLayer(
        (self_attn): GemmaInfiniAttention(
          (q_proj): Linear(in_features=2048, out_features=2048, bias=False)
          (k_proj): Linear(in_features=2048, out_features=256, bias=False)
          (v_proj): Linear(in_features=2048, out_features=256, bias=False)
          (o_proj): Linear(in_features=2048, out_features=2048, bias=False)
          (rotary_emb): GemmaRotaryEmbedding()
        )
        (mlp): GemmaMLP(
          (gate_proj): Linear(in_features=2048, out_features=16384, bias=False)
          (up_proj): Linear(in_features=2048, out_features=16384, bias=False)
          (down_proj): Linear(in_features=16384, out_features=2048, bias=False)
          (act_fn): PytorchGELUTanh()
        )
        (input_layernorm): GemmaRMSNorm()
        (post_attention_layernorm): GemmaRMSNorm()
      )
    )
    (norm): GemmaRMSNorm()
  )
  (lm_head): Linear(in_features=2048, out_features=256000, bias=False)
)
from infinitransformer.
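For reference, a minimal sketch of loading the model so that the printout above appears; the 2048-dim hidden size and 18 layers match google/gemma-2b, but the exact model id and loading flags here are assumptions, not taken verbatim from the repo:

from transformers import AutoModelForCausalLM

# Assumed model id: the shapes above (2048 hidden, 18 layers) correspond
# to the 2B Gemma checkpoint.
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2b",
    attn_implementation="eager",  # needed so the overridden class is picked up
)
print(model)  # the (self_attn) entries should read GemmaInfiniAttention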
Using attn_implementation='eager' and overriding GEMMA_ATTENTION_CLASSES like this is not the optimal way and is confusing -- but since HF does not allow registering custom attention classes, I currently overrode the eager entry, replacing the original GemmaAttention with GemmaInfiniAttention:
GEMMA_ATTENTION_CLASSES = {
    "eager": GemmaInfiniAttention,  # GemmaAttention,
    "flash_attention_2": GemmaFlashAttention2,
    "sdpa": GemmaSdpaAttention,
}
from infinitransformer.
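In case it helps, a hedged sketch of what that override looks like as a monkey patch; the body of GemmaInfiniAttention is elided because the actual implementation lives in this repo, and the module path assumes a transformers version that still ships GEMMA_ATTENTION_CLASSES:

from transformers.models.gemma import modeling_gemma

class GemmaInfiniAttention(modeling_gemma.GemmaAttention):
    # The repo replaces forward() with the Infini-attention
    # computation; elided here.
    pass

# Re-point the "eager" entry so GemmaModel instantiates the custom class
modeling_gemma.GEMMA_ATTENTION_CLASSES["eager"] = GemmaInfiniAttention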
oh BTW, the default value of attn_implementation is "sdpa" for HF Gemma.
from infinitransformer.
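So the override is silently bypassed unless eager attention is forced. One way to check which implementation was actually selected (the private attribute name is an assumption that holds for recent transformers versions):

# Prints "sdpa" by default; should print "eager" if the override is active
print(model.config._attn_implementation)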
thanks, I mistakenly used the transformers library from the miniconda environment, and now the problem has been solved
from infinitransformer.
Related Issues (20)
- Suggest to use the constant memory gradient computation in Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
- Model generating random sequence HOT 8
- Limitations of the method HOT 2
- Memory should be per layer
- Memory does not use PE
- Inference code (with Segments)
- Are there any trained InfinityTransformer weights available?
- Segment and block size error HOT 1
- mem and norm_term is nan! HOT 15
- What is the min GPU memory required to fine-tune the model?
- About memory missing location information HOT 5
- BitLinear
- Model loses information very quickly HOT 2
- Issue while running test_train.small.gemma.infini.py HOT 2
- question about activation function HOT 2
- Discord server for this?
- Code not running on GPU HOT 6
- question about norm_term_broadcastable HOT 5
- load model failed HOT 4