Comments (2)
Hi,
The paper you mentioned made an improvement on LRW-1000. In that paper, I saw that they used 40-frame sequences for both training and testing, with the target word always located in the centre of each sequence.
In contrast, we segmented the sequences using the annotated word boundaries, without any padding. The average sequence duration is 0.3 seconds (8 frames), well below 40 frames. Comparing the two results suggests that the contextual information around the target word may be helpful.
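To make the difference between the two segmentation strategies concrete, here is a minimal Python sketch. It assumes each word annotation has already been converted to frame indices (`start` inclusive, `end` exclusive). The function names are hypothetical. Clamping out-of-range indices to the first/last frame is only an assumption about how a centred 40-frame window near the edge of a video might be filled; it is not what either paper documents.

```python
from typing import List, Sequence, TypeVar

Frame = TypeVar("Frame")


def segment_tight(frames: Sequence[Frame], start: int, end: int) -> List[Frame]:
    """Keep only the annotated frames [start, end), with no context padding."""
    return list(frames[start:end])


def segment_centred(frames: Sequence[Frame], start: int, end: int,
                    window: int = 40) -> List[Frame]:
    """Return a fixed-length window whose centre is the centre of the word.

    Indices that fall outside the video are clamped to the first/last frame
    (edge replication). This padding rule is an assumption for illustration.
    """
    centre = (start + end) // 2
    first = centre - window // 2
    last_index = len(frames) - 1
    return [frames[min(max(i, 0), last_index)] for i in range(first, first + window)]
```

With a 100-frame video and a word spanning frames 46 to 53, `segment_tight` returns those 8 frames, while `segment_centred` returns frames 30 to 69, giving the model 16 frames of context on each side of the word.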
Regarding pre-processing, we followed "LRW-1000: A Naturally-Distributed Large-Scale Benchmark for Lip Reading in the Wild" and resized the cropped mouth ROIs to 122x122.
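A minimal NumPy sketch of the resizing step follows. It assumes grayscale crops stored as (H, W) arrays and clips stored as (T, H, W). Bilinear interpolation with a half-pixel-centre mapping is an assumption, because the comment does not say which interpolation was used. In practice a library call such as OpenCV's `cv2.resize(roi, (122, 122))` would normally do this job (its `dsize` argument is given as width, height).

```python
import numpy as np


def resize_bilinear(image: np.ndarray, out_h: int = 122, out_w: int = 122) -> np.ndarray:
    """Bilinearly resize a 2-D grayscale mouth ROI to (out_h, out_w)."""
    in_h, in_w = image.shape
    img = image.astype(np.float64)
    # Map each output pixel centre back into input coordinates.
    ys = np.clip((np.arange(out_h) + 0.5) * in_h / out_h - 0.5, 0, in_h - 1)
    xs = np.clip((np.arange(out_w) + 0.5) * in_w / out_w - 0.5, 0, in_w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]  # vertical weights, shape (out_h, 1)
    wx = (xs - x0)[None, :]  # horizontal weights, shape (1, out_w)
    # Interpolate horizontally on the two neighbouring rows, then vertically.
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bottom = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    out = top * (1 - wy) + bottom * wy
    if np.issubdtype(image.dtype, np.integer):
        out = np.rint(out)  # round before casting back to e.g. uint8
    return out.astype(image.dtype)


def resize_sequence(frames: np.ndarray, size: int = 122) -> np.ndarray:
    """Resize every frame of a (T, H, W) clip to (T, size, size)."""
    return np.stack([resize_bilinear(f, size, size) for f in frames])
```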
For training, we used a batch size of 16 and a learning rate (lr) of 1.5e-4, not a batch size of 32 and lr=1e-3. We trained the model for 80 epochs with Adam (weight_decay=1e-4), decaying the learning rate with a cosine scheduler and no warmup stage.
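The learning-rate schedule described above can be written in closed form. This sketch assumes the scheduler is stepped once per epoch and decays to zero; the comment does not specify either. Under those assumptions it follows the same formula as PyTorch's `torch.optim.lr_scheduler.CosineAnnealingLR` with `T_max=80` and `eta_min=0`, used together with `torch.optim.Adam(params, lr=1.5e-4, weight_decay=1e-4)`.

```python
import math
from typing import List

BASE_LR = 1.5e-4  # learning rate reported in the comment
NUM_EPOCHS = 80   # training length reported in the comment


def cosine_lr(epoch: int, base_lr: float = BASE_LR, total_epochs: int = NUM_EPOCHS,
              min_lr: float = 0.0) -> float:
    """Cosine-annealed learning rate for a given epoch, with no warmup phase."""
    progress = epoch / total_epochs
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def schedule(total_epochs: int = NUM_EPOCHS) -> List[float]:
    """Learning rate used during each epoch 0 .. total_epochs - 1."""
    return [cosine_lr(e, total_epochs=total_epochs) for e in range(total_epochs)]
```

The learning rate starts at 1.5e-4, reaches half that value at epoch 40, and approaches zero by the final epoch. Because there is no warmup, the first epoch already trains at the full rate.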
from lipreading_using_temporal_convolutional_networks.
Could you please release the pretrained weights trained on the LRW-1000 dataset?
Related Issues (20)
- Can we do Sentence Prediction for the model?
- About variable length augmentation
- DC-TCN number of parameters and Hardest words list
- Must convert gray?
- ShuffleNet's Parameter
- Do this code in github include the part of data Augmentation?
- With the same data , why the result is so different on ms-tcn and dc-tcn ?
- Acc of resnet18_dctcn_video_boundary in my test is wrong
- about preprocessing
- cant process
- How to use pretrain model after download from Google drive
- what is the form of <ANNONATION-DIRECTORY> because I want applied it own my dataset , and landmark method.
- IndexError: index 28 is out of bounds for axis 0 with size 4 when run crop_mouth_from_video.py
- RuntimeError: CUDA error: device-side assert triggered
- Can you tell me how to get word boundary from real reasoning?
- Not able to evaluate visual-only performance using the pre-processed npz files
- KeyError: 'optimizer_state_dict' arise with Pretrined model
- Your work is excellent! How can I calculate lip reading loss L between the face my model reders and my ground truth image?
- How to create .pkl file for my own video
- How are you dealing with varied length of input - like some are 29,28,27.