Comments (3)
Hi, sentence-level speech recognition is quite different from word-level speech recognition. To be honest, I don't recommend doing it by segmenting sentences, since a sequence may have different labels. Besides, word-based classification has the problem of out-of-vocabulary words.
from lipreading_using_temporal_convolutional_networks.
@mpc001 Thank you very much for your reply. I have two more questions:
- Do you have any suggestions on sentence-level lipreading?
- What's the running time of the models listed in your Model Zoo? I tested the resnet18_mstcn_adamw_s3, snv1x_dsmstcn3x and snv1x_tcn1x models. On my machine their running times are 10 ms, 20 ms and 10 ms, respectively. Did you see a similar trend? I thought snv1x_tcn1x has the fewest FLOPs, so it should be the fastest model.
Thank you very much!
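For anyone comparing running times like this, here is a minimal timing sketch. It is not taken from the repository: `measure_latency_ms` and `dummy_forward` are hypothetical names, and `dummy_forward` only stands in for a real forward pass. It runs a few warm-up calls first and reports the median over several runs, which tends to be more stable than a single measurement.

```python
import statistics
import time


def measure_latency_ms(fn, warmup=5, runs=30):
    """Return the median wall-clock time of fn() in milliseconds."""
    # Warm-up calls are not timed; they absorb one-off costs such as caching.
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def dummy_forward():
    # Hypothetical stand-in for a model's forward pass on one input clip.
    return sum(i * i for i in range(10_000))


print(f"median latency: {measure_latency_ms(dummy_forward):.3f} ms")
```

When timing a PyTorch model on a GPU, the callable would also need to call `torch.cuda.synchronize()` before returning, since CUDA kernels launch asynchronously and the clock could otherwise stop too early. Note also that FLOPs count arithmetic only, so a model with fewer FLOPs is not guaranteed to have lower measured latency.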
Extending this to sentence-level lipreading is possible but not straightforward. You can think of what we've done here as learning a strong representation; you then need to add something on top that predicts continuous speech instead of a single label per segment. To learn how to do that, you could look into works that report experiments on LRS2, for example. The head of the architecture would change, and the loss is different. If you want to get the best out of it, you can also apply a language model.
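To illustrate what "predicting continuous speech" can look like on the output side, here is a small sketch of greedy CTC decoding. CTC is an assumption here: it is one common choice for sequence-level recognition, not something this thread or the repository prescribes. The names `ctc_greedy_decode`, `BLANK` and the toy vocabulary are hypothetical. The idea is that the head emits a score per character for every frame, and decoding merges repeated symbols and drops the blank symbol.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank symbol in the vocabulary


def ctc_greedy_decode(frame_scores, vocab):
    """Collapse per-frame scores into a string with greedy CTC decoding."""
    # 1. Pick the highest-scoring symbol index for every frame.
    best = [max(range(len(s)), key=s.__getitem__) for s in frame_scores]
    # 2. Merge consecutive repeats of the same symbol.
    collapsed = [index for index, _ in groupby(best)]
    # 3. Drop blanks and map the remaining indices to characters.
    return "".join(vocab[i] for i in collapsed if i != BLANK)


# Toy vocabulary and scores for five frames (made up for illustration).
vocab = ["-", "h", "i"]
frame_scores = [
    [0.1, 0.8, 0.1],    # h
    [0.2, 0.7, 0.1],    # h (repeat, merged)
    [0.9, 0.05, 0.05],  # blank
    [0.1, 0.1, 0.8],    # i
    [0.1, 0.2, 0.7],    # i (repeat, merged)
]
print(ctc_greedy_decode(frame_scores, vocab))  # prints "hi"
```

Because the output is built from characters rather than a fixed word list, this kind of head is not limited to a closed vocabulary, which is the out-of-vocabulary problem mentioned earlier in the thread for word-based classification.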