iamrakesh28 / video-prediction

Implementation of Transformer Encoder Decoder Architecture for Video Predictions

Python 100.00%

transformer-architecture video-prediction conv-lstm encoder-decoder-architecture

video-prediction's Issues

Conv2D input shape is 4D (batch_size, rows, cols, depth) while this work input shape is 5D (batch_size, target_seq_len, rows, cols, depth)

Hello, I was trying to run this code. However, when I tried to train the model, an error occurred:

Input 0 of layer conv2d is incompatible with the layer: expected ndim=4, found ndim=5. Full shape received: [8, 5, 40, 40, 1]

Here 8 is the batch size, 5 is the target sequence length, 40x40 is rows x cols, and 1 is the depth.

I checked the source code and found that the "encoding" and "decoding" steps both call a Conv2D layer, which requires a 4D input [batch size, rows, cols, channels].

How can I tackle this problem?
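One common way to feed a 5D frame sequence to a layer that only accepts 4D input is to fold the time axis into the batch axis, apply the layer, and unfold the result. The sketch below illustrates that idea with NumPy only. It is not taken from this repository: `conv2d_like` and `apply_per_frame` are hypothetical names, and `conv2d_like` is a stand-in that mimics the 4D check rather than a real convolution. In Keras, the same effect is usually obtained with `tf.reshape` around the layer or by wrapping the layer in `tf.keras.layers.TimeDistributed`.

```python
import numpy as np


def conv2d_like(x, out_channels=4):
    """Hypothetical stand-in for a layer that only accepts
    4D input of shape (batch, rows, cols, depth)."""
    if x.ndim != 4:
        raise ValueError(f"expected ndim=4, found ndim={x.ndim}")
    # A 1x1 "convolution": a per-pixel linear map over the channel axis.
    kernel = np.ones((x.shape[-1], out_channels), dtype=x.dtype)
    return x @ kernel


def apply_per_frame(x, layer):
    """Fold the time axis into the batch axis, apply a 4D-only layer,
    then restore the (batch, seq_len, ...) layout."""
    batch, seq_len, rows, cols, depth = x.shape
    merged = x.reshape(batch * seq_len, rows, cols, depth)
    out = layer(merged)
    return out.reshape(batch, seq_len, *out.shape[1:])


# Same shape as in the error message: [8, 5, 40, 40, 1].
frames = np.zeros((8, 5, 40, 40, 1), dtype=np.float32)
features = apply_per_frame(frames, conv2d_like)
print(features.shape)
```

Every frame is processed independently with the same weights, so the temporal structure has to be handled by whatever comes after the per-frame layer.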

The channel dimension of a color picture

Hello, I am studying your code and have a question about how the model handles color images, because I can't find the RGB channel dimension when a frame sequence is fed into the model.
In multi_head_attention.py, at the beginning of the call method (after self.wq(q), where I understand self.wq is a conv layer), your comment says: # (batch_size, num_heads, seq_len_q, rows, cols, depth). Where is the channel dimension? My understanding of the six dimensions is: seq_len_q is the length of the frame sequence; num_heads × depth = d_model; rows is the height (H) of the image; cols is the width (W) of the image.
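The shape bookkeeping in this question can be made concrete with a small NumPy sketch. It rests on an assumption that the issue does not confirm: that a convolution first projects the 3 RGB channels to d_model feature channels, and that this feature axis is then split into num_heads × depth. Under that assumption the RGB axis no longer exists as a separate dimension after the projection. The names `split_heads` and `projection`, and the values d_model = 64 and num_heads = 8, are illustrative only.

```python
import numpy as np


def split_heads(x, num_heads):
    """Split the last (feature) axis into heads.

    (batch, seq_len, rows, cols, d_model)
      -> (batch, num_heads, seq_len, rows, cols, depth)
    """
    batch, seq_len, rows, cols, d_model = x.shape
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    depth = d_model // num_heads
    x = x.reshape(batch, seq_len, rows, cols, num_heads, depth)
    return x.transpose(0, 4, 1, 2, 3, 5)


# Hypothetical sizes, chosen only for illustration.
d_model, num_heads = 64, 8

# A batch of 2 sequences, each with 5 RGB frames of 40x40 pixels.
rgb = np.zeros((2, 5, 40, 40, 3), dtype=np.float32)

# Stand-in for a 1x1 convolution mapping 3 colour channels to d_model features.
projection = np.ones((3, d_model), dtype=np.float32)
projected = rgb @ projection

heads = split_heads(projected, num_heads)
print(projected.shape, heads.shape)
```

Read this way, the last axis named depth in the comment is a slice of the learned feature channels rather than the image's colour depth.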

I sincerely hope you can answer my question. If you do not mind, could I also ask you about the field of video prediction? I am trying to do some research on predicting image sequences with a Transformer.

module

Is there a module named transformer_video, as in from transformer_video import VideoPrediction?
Also, can I use this code in Jupyter?
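For a local module such as this one, a generic way to check from a script or a Jupyter cell whether the import would resolve is to put the cloned repository on sys.path and ask importlib. The sketch below assumes the repository was cloned into a directory called video-prediction and that transformer_video sits at its top level. Both are assumptions, and `make_importable` is a hypothetical helper, not part of the repository.

```python
import importlib.util
import sys
from pathlib import Path


def make_importable(repo_dir, module_name="transformer_video"):
    """Put repo_dir on sys.path and report whether module_name can be found."""
    repo_path = str(Path(repo_dir).resolve())
    if repo_path not in sys.path:
        sys.path.insert(0, repo_path)
    return importlib.util.find_spec(module_name) is not None


# Hypothetical clone location; adjust to wherever the repository lives.
found = make_importable("video-prediction")
print("transformer_video importable:", found)
```

If the check succeeds, the import quoted in the question should work in the same session, including inside a notebook.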
