Comments (5)
For higher sample rates you'll need more filterbanks; otherwise each filterbank will cover a wide frequency range. If you do use more filterbanks, the higher ones will contain very little information, since there is little speech information above 8 kHz. Sampling at around 16 kHz should work best.
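To make the first point concrete, here is a small pure-Python sketch (not the library's own code) that places triangular filters evenly on the mel scale between 0 Hz and the Nyquist frequency and counts how many are centred above 8 kHz. It assumes the common mel formula 2595 * log10(1 + f / 700), an assumed filter count of 26, and ignores the rounding to FFT bins that a real implementation performs; `filter_centers_hz` is a hypothetical helper.

```python
import math

def hz_to_mel(hz):
    """Mel scale using the common 2595 * log10(1 + f / 700) formula."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def filter_centers_hz(nfilt, samplerate, lowfreq=0.0):
    """Centre frequencies of nfilt triangular filters spaced evenly in mel
    between lowfreq and the Nyquist frequency (samplerate / 2).
    nfilt filters need nfilt + 2 edge points, hence nfilt + 1 mel steps."""
    low_mel = hz_to_mel(lowfreq)
    high_mel = hz_to_mel(samplerate / 2.0)
    step = (high_mel - low_mel) / (nfilt + 1)
    return [mel_to_hz(low_mel + i * step) for i in range(1, nfilt + 1)]

for fs in (16000, 44100):
    centers = filter_centers_hz(26, fs)
    above_8k = sum(c > 8000 for c in centers)
    print(f"fs={fs}: {above_8k} of 26 filters centred above 8 kHz")
```

Under these assumptions none of the 26 centres lie above 8 kHz at 16 kHz sampling, while 7 of 26 do at 44.1 kHz, and the spacing between neighbouring filters grows from about 105 mel to about 145 mel, so each filter also covers a wider band.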
from python_speech_features.
This ties in with another doubt I have. I am quite concerned about the pre-emphasis function: it seems to be a simple subtraction of a slightly delayed copy of the signal (multiplied by a constant near 1) from the original signal. This delay differs considerably depending on the sample rate (one sample, i.e. 62.5 µs at 16 kHz and about 22.7 µs at 44.1 kHz).
Do you have any suggestions for a paper or other technical reference on this specific approach to pre-emphasis?
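The operation described above can be written as y[n] = x[n] - a * x[n-1], so the delay is always exactly one sample, i.e. 1/fs seconds. A minimal pure-Python sketch, assuming the first sample is passed through unchanged and a default coefficient of 0.97 (both are assumptions for illustration, not quoted from the library source):

```python
def preemphasis(signal, coeff=0.97):
    """Pre-emphasis filter: y[0] = x[0], y[n] = x[n] - coeff * x[n - 1]."""
    if len(signal) == 0:
        return []
    out = [float(signal[0])]  # first sample has no predecessor, keep it as-is
    for n in range(1, len(signal)):
        out.append(signal[n] - coeff * signal[n - 1])
    return out

def one_sample_delay_us(samplerate):
    """Duration of the one-sample delay, in microseconds."""
    return 1e6 / samplerate

print(preemphasis([1.0, 1.0, 1.0, 0.0], coeff=0.5))  # [1.0, 0.5, 0.5, -0.5]
print(one_sample_delay_us(16000))                    # 62.5
print(round(one_sample_delay_us(44100), 1))          # 22.7
```

Because the delay is defined in samples rather than in seconds, the same coefficient shapes the spectrum differently (in Hz) at different sample rates.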
Pre-emphasis is used to 'flatten' the spectrum a little. Speech signals usually have more energy at low frequencies than at high frequencies, and the pre-emphasis filter is a high-pass filter that evens out the energy a bit. It was used a lot in the past, when Euclidean distances were commonly used in ASR systems. With GMMs or neural nets, pre-emphasis doesn't really matter; results will be essentially the same whether you use it or not. It was included in the code because every other MFCC library I have seen includes it. You can safely ignore it for most purposes. Alternatively, you can run some tests on a dev set with different preemph coefficients and see which, if any, works better.
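The high-pass behaviour can be checked from the filter's transfer function H(z) = 1 - a z^-1, whose magnitude at frequency f is |1 - a e^(-j 2 pi f / fs)|. The sketch below evaluates it in dB with an assumed coefficient of 0.97; `preemph_gain_db` is a hypothetical helper, not part of python_speech_features. Note that the response depends only on f / fs, which is why a fixed coefficient shifts the response in Hz when the sample rate changes.

```python
import cmath
import math

def preemph_gain_db(freq_hz, samplerate, coeff=0.97):
    """Magnitude response of H(z) = 1 - coeff * z^-1 at freq_hz, in dB."""
    omega = 2.0 * math.pi * freq_hz / samplerate  # normalized angular frequency (rad/sample)
    h = 1.0 - coeff * cmath.exp(-1j * omega)      # H evaluated on the unit circle
    return 20.0 * math.log10(abs(h))

for fs in (16000, 44100):
    gains = [round(preemph_gain_db(f, fs), 1) for f in (100, 1000, 4000)]
    print(fs, gains)
```

With a = 0.97 the gain is 20 * log10(0.03), about -30.5 dB, at DC and 20 * log10(1.97), about +5.9 dB, at Nyquist. At 4 kHz it is about +2.9 dB for 16 kHz sampling but about -5.1 dB for 44.1 kHz sampling. Sweeping `coeff` in the same way shows how the different preemph values reshape the spectrum before any dev-set comparison.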
Thanks, James. Your response was really helpful.
No problem, glad I could help