
keras-multi-head's Issues

When running MultiHeadAttention I get "AttributeError: module 'keras' has no attribute 'applications'"

When running the code I get the error: AttributeError: module 'keras' has no attribute 'applications'

Version Info

I am using Keras version 2.4.3

Minimal Codes To Reproduce

import keras
from keras.layers import Input
from keras_multi_head import MultiHeadAttention

print(keras.__version__)

input_layer = Input(
    shape=(2, 3),
    name='Input',
)
att_layer = MultiHeadAttention(
    head_num=3,
    name='Multi-Head',
)(input_layer)
model = keras.models.Model(inputs=input_layer, outputs=att_layer)
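One possible cause (an assumption, not confirmed by the report): Keras 2.4.x is a thin wrapper around tf.keras, and mixing imports from the standalone keras package with a library that resolves to tensorflow.keras can produce attribute errors like this one. If I remember the keras-multi-head README correctly, setting the TF_KERAS environment variable makes the library use tensorflow.keras, so importing everything consistently from TensorFlow may avoid the mismatch. A sketch:

```python
# Untested sketch: import layers consistently from tensorflow.keras and ask
# keras-multi-head to do the same via TF_KERAS. The variable must be set
# before keras_multi_head is imported.
import os
os.environ['TF_KERAS'] = '1'

from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model
from keras_multi_head import MultiHeadAttention

input_layer = Input(shape=(2, 3), name='Input')
att_layer = MultiHeadAttention(head_num=3, name='Multi-Head')(input_layer)
model = Model(inputs=input_layer, outputs=att_layer)
```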

Fitting MultiHeadAttention in memory for long sequences

I am trying to train my own sequence-tagging model based on this repository's implementation of MultiHeadAttention.

import keras.layers as ll
from keras import Model
from keras_pos_embd import TrigPosEmbedding
from keras_multi_head import MultiHeadAttention

inputs = ll.Input(shape=(None,))
x = ll.Embedding(10000, 1024)(inputs)
x = TrigPosEmbedding(mode='add')(x)
x = MultiHeadAttention(head_num=8)(x)
x = ll.Dense(units=512, activation='relu')(x)
x = ll.Dense(units=4, activation='softmax')(x)
outputs = x
model = Model(inputs, outputs)
model.summary()

I have one big problem: the sequences in my training set are quite long (up to 20,000 time steps), and when I attempt to train, I get an OOM error.

The OOM happens when TensorFlow tries to allocate a [16, 20000, 20000] tensor. Storing that tensor alone in float32 takes roughly 25 GB of RAM, and the real footprint is several times that once gradients and temporary copies are counted.
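A back-of-the-envelope check of that allocation (the shape comes from the error above; 4 bytes per float32 element):

```python
# Rough memory estimate for the attention score tensor that triggers the OOM.
# Attention scores have shape (batch * heads, seq_len, seq_len), so memory
# grows quadratically with sequence length.
batch_heads, seq_len = 16, 20000
n_elements = batch_heads * seq_len * seq_len   # 6,400,000,000 elements
bytes_fp32 = n_elements * 4                    # 25,600,000,000 bytes

print(bytes_fp32 / 1e9)   # 25.6 -- GB for a single float32 copy
```

A single copy is already ~25.6 GB, and training keeps several copies alive (activations, gradients, optimizer temporaries), so full self-attention over 20,000-step sequences will not fit on a typical GPU regardless of batch size.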

I was wondering if you have any suggestions on how to modify the code to process attention in chunks, keeping only a context window of a user-specified length in memory at a time.

I tried going to a lower level with keras_self_attention.SeqSelfAttention and its configurable attention width, but in the end it would still try to allocate a very large tensor on my GPU.

PS: Awesome repo!

How to load a keras_multi_head model?

I have trained a model using MultiHead layer.
When I try to load it, it raises ValueError: Unknown layer: MultiHead.
I guess I have to pass custom_objects, but I am not sure what it should contain.
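A sketch of what custom_objects usually looks like for this case, assuming the model was saved with model.save() (the path 'my_model.h5' is hypothetical): load_model needs a mapping from the saved layer's class name to the Python class that implements it.

```python
# Untested sketch: map the custom layer's class name to its class so the
# deserializer can reconstruct it. Include whichever of the two layers the
# model actually uses.
from keras.models import load_model
from keras_multi_head import MultiHead, MultiHeadAttention

model = load_model(
    'my_model.h5',
    custom_objects={
        'MultiHead': MultiHead,
        'MultiHeadAttention': MultiHeadAttention,
    },
)
```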

feature_dim in multi_head_attention

I wonder if 'feature_dim' could be assigned manually. In your code, 'feature_dim' is fixed by the input shape, so the shapes of 'Wq', 'Wk', and 'Wv' are fixed as well.

def build(self, input_shape):
    if isinstance(input_shape, list):
        q, k, v = input_shape
    else:
        q = k = v = input_shape
    feature_dim = int(v[-1])
    self.Wq = self.add_weight(shape=(int(q[-1]), feature_dim), ...)
    self.Wk = self.add_weight(shape=(int(k[-1]), feature_dim), ...)
    self.Wv = self.add_weight(shape=(int(v[-1]), feature_dim), ...)
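Since feature_dim is inferred from the input's last axis, one workaround (a sketch, not a library feature; the layer sizes here are illustrative) is to project the input to the desired dimension before the attention layer:

```python
# Untested sketch: projecting the input with a Dense layer first effectively
# lets you choose feature_dim, since the attention layer reads it from the
# last axis of its input. desired_dim must be divisible by head_num.
import keras.layers as ll
from keras_multi_head import MultiHeadAttention

desired_dim = 64
inputs = ll.Input(shape=(None, 300))
projected = ll.Dense(units=desired_dim)(inputs)   # last axis becomes 64
attended = MultiHeadAttention(head_num=8)(projected)
```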

Multi-head Attention with 2 Input Layers

Currently, if I use two layers as input to the Multi-Head Attention layer like so:
csr_atten = MultiHeadAttention(head_num=2)([csr_doc_layer, csr_intents_layer])
it throws the following error:

ValueError: not enough values to unpack (expected 3, got 2)

Is there a workaround for using two input layers? Or is an alternative under development?
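The error comes from the layer's build method, which unpacks a list input into exactly three tensors (query, key, value). So a list of three should be accepted; for example, repeating the second layer as both key and value, as in MultiHeadAttention(head_num=2)([csr_doc_layer, csr_intents_layer, csr_intents_layer]). A minimal sketch of the unpacking logic that produces the error:

```python
# Mirrors the unpacking in MultiHeadAttention.build: a list input must
# contain exactly three tensors (query, key, value).
def unpack_qkv(inputs):
    if isinstance(inputs, list):
        q, k, v = inputs  # raises ValueError for a 2-element list
    else:
        q = k = v = inputs  # self-attention: one tensor plays all roles
    return q, k, v

# Two inputs fail, three succeed:
try:
    unpack_qkv(['doc', 'intents'])
except ValueError as e:
    print(e)  # not enough values to unpack (expected 3, got 2)

q, k, v = unpack_qkv(['doc', 'intents', 'intents'])  # doc attends to intents
```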

Example in simple time-series

Hello! This layer is the heart of the Transformer, GPT, and BERT architectures. I have been trying to see how to apply it directly to time-series problems (not NLP problems): predicting the next value in a sequence and/or classifying a sequence of values.

It would be nice if you could provide a simple example of how to apply this block in a multivariate time-series scenario (raw values in a sequence, no embeddings, etc.), if possible.
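In the meantime, here is a minimal, untested sketch of what such a model could look like: raw float features go straight into the attention layer with no embedding, and a final Dense head predicts the next value of each variable. All sizes (n_features, layer widths) are illustrative.

```python
# Untested sketch for multivariate next-step prediction. Input shape is
# (batch, time, features); head_num must divide the feature dimension of
# the tensor entering the attention layer.
import keras.layers as ll
from keras import Model
from keras_multi_head import MultiHeadAttention

n_features = 5                                # variables per time step
inputs = ll.Input(shape=(None, n_features))   # raw values, no embedding
x = MultiHeadAttention(head_num=1)(inputs)    # 1 divides n_features
x = ll.Dense(units=32, activation='relu')(x)
outputs = ll.Dense(units=n_features)(x)       # next value of each variable
model = Model(inputs, outputs)
```

For sequence classification instead of next-step prediction, a pooling layer (e.g. GlobalAveragePooling1D) before a softmax Dense head would be the usual variation.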

Thanks in advance!
