Comments (3)
Would you please tell me why you are using signal[:,0] format?
The error does not seem to come from the package. It's a Python indexing mismatch. In the test example of this repository, signal[:,0] is used because the example WAV files have two channels (stereo), so the signal is a 2-D array. Yours may have only a single channel, in which case the signal is already 1-D. Try removing signal = signal[:,0] and running the code again.
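A minimal sketch of a shape check that works for both cases, assuming the signal is a NumPy array shaped (num_samples,) for mono or (num_samples, num_channels) for multi-channel audio. The helper name first_channel is hypothetical and is not part of speechpy:

```python
import numpy as np


def first_channel(signal):
    """Return a 1-D signal: the array itself if mono, else its first channel.

    Hypothetical helper, not part of speechpy.
    """
    signal = np.asarray(signal)
    if signal.ndim == 1:
        # Mono audio: indexing with [:, 0] would raise an IndexError here.
        return signal
    if signal.ndim == 2:
        # Multi-channel audio: keep only the first channel.
        return signal[:, 0]
    raise ValueError(f"unexpected signal shape: {signal.shape}")


# Synthetic stand-ins for audio loaded from a WAV file.
mono = np.zeros(16000)
stereo = np.zeros((16000, 2))
print(first_channel(mono).shape, first_channel(stereo).shape)
```

Printing signal.shape right after loading the file is usually enough to tell which case applies.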
from speechpy.
Hey @astorfi. Ah, you were right. I was blindly copy-pasting the tutorial code and completely forgot to check the shape. Thank you.
Apologies in advance, but I'm new to audio processing. I'm trying to do this in SyncNet, where the audio inputs are MFCC features and the audio model takes this input (according to model.summary()):
conv1_audio (Conv2D) (None, 13, 20, 64) 640
According to the owner of syncnet repo,
I have used a library called speechpy to extract MFCC features. The function to extract MFCC features from a .wav file according to the instructions using speechpy is:
speechpy.feature.mfcc(signal, sampling_frequency, frame_length=0.010, frame_stride=0.010, num_cepstral=13)
Audio features are computed over a duration of audio. In the paper, it is mentioned that features are computed at 100 Hz => for every 0.010 seconds. Hence, frame_length=0.010, frame_stride=0.010 (no overlap).
According to the paper, audio features and video features are extracted for every 0.2 seconds.
Lip: 0.2 seconds => 0.2 * 25fps = 5 video frames
Audio: 0.2 seconds => 0.2 / 0.01 (frame duration) = 20 audio frames
Hence, a 112x112x5 matrix is input to the lips model, and a 13x20 matrix is input to the audio model.
Can you help me understand how to shape the speechpy.feature.mfcc return value into a 13x20 matrix? Or is this a simple .reshape()? (I was thinking that originally, but it is probably wrong, especially since I'm completely new to audio processing, even with all the tutorials I have read.)
PS: the original syncnet paper:
The input audio data is MFCC values. This is a representation of the short-term power spectrum of a sound on a non-linear mel scale of frequency. 13 mel frequency bands are used at each time step. The features are computed at a sampling rate of 100 Hz, giving 20 time steps for a 0.2-second input signal.
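One way the 13x20 shaping could be sketched, assuming the MFCC array has shape (num_frames, num_cepstral), which is what the speechpy documentation describes for speechpy.feature.mfcc. A plain .reshape(13, 20) on 20 frames would mix the time and coefficient axes, so the sketch groups consecutive frames first and then swaps the axes. The function name mfcc_to_windows is hypothetical, the windows are non-overlapping by assumption, and a random array stands in for real MFCC output:

```python
import numpy as np


def mfcc_to_windows(mfcc, frames_per_window=20):
    """Split (num_frames, num_cepstral) features into (n, num_cepstral, frames_per_window).

    Hypothetical helper: non-overlapping windows, trailing frames are dropped.
    """
    mfcc = np.asarray(mfcc)
    num_frames, num_cepstral = mfcc.shape
    n_windows = num_frames // frames_per_window
    # Drop frames that do not fill a whole window.
    trimmed = mfcc[: n_windows * frames_per_window]
    # Group consecutive frames: (n_windows, frames_per_window, num_cepstral).
    grouped = trimmed.reshape(n_windows, frames_per_window, num_cepstral)
    # Put coefficients first and time second: (n_windows, num_cepstral, frames_per_window).
    return grouped.transpose(0, 2, 1)


# Stand-in for speechpy output: 1 second at 100 frames/s, 13 coefficients.
fake_mfcc = np.random.rand(100, 13)
windows = mfcc_to_windows(fake_mfcc)
print(windows.shape)  # (5, 13, 20)
```

Each windows[i] is then one 13x20 matrix covering 0.2 seconds. Whether the model also expects a trailing channel axis (for example via windows[..., np.newaxis]) depends on the model definition and is not confirmed here.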
The package output is available in the official documentation. I think you should read more about the MFCC or speech features in general. A good tutorial is as follows:
Mel Frequency Cepstral Coefficients (mfccs)