Pitch_Determiation_for_Speech_Signal

A pitch determination algorithm for speech signals, based on short-time autocorrelation and shortest-distance search.

1. Installation

git clone https://github.com/MorrisXu-Driving/Pitch_Determiation_for_Speech_Signal.git
Create a new project in your Python IDE and set mainvoid.py as the script path in the run configuration.
Make sure the test input file tone4_w.wav is in the same directory as mainvoid.py.

2. Algorithm Structure

a. Overall Flow
b. Preprocessing
c. Candidate Generation
d. Postprocessing

3. Parameter Setting

The algorithm uses the following parameters:

Parameters for input preprocessing
- wlen = int(0.03 * fs) # Frame length in samples; 0.03 is the frame length in seconds (30 ms)
- inc = int(0.01 * fs) # Frame increment in samples; 0.01 is the increment in seconds (10 ms)
- lf = 60 # Hz, lower passband edge of the band-pass denoising filter
- hf = 500 # Hz, upper passband edge of the band-pass denoising filter
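To illustrate how wlen, inc, lf and hf fit together, here is a minimal sketch of framing followed by a short-time autocorrelation pitch estimate. It is not the code in mainvoid.py: the helper names frame_signal and autocorr_f0, the 16 kHz sample rate and the synthetic 200 Hz tone are assumptions made for the example, and lf/hf are used here only to bound the lag search rather than to design a band-pass filter.

```python
import numpy as np

def frame_signal(x, wlen, inc):
    """Split a 1-D signal into overlapping frames (length wlen, hop inc)."""
    n_frames = 1 + (len(x) - wlen) // inc
    idx = np.arange(wlen)[None, :] + inc * np.arange(n_frames)[:, None]
    return x[idx]

def autocorr_f0(frame, fs, lf=60, hf=500):
    """Estimate F0 from the autocorrelation peak in the lag range [fs/hf, fs/lf]."""
    frame = frame - frame.mean()
    # Keep non-negative lags only: r[k] is the autocorrelation at lag k.
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(fs / hf)                    # shortest period of interest
    lag_max = min(int(fs / lf), len(r) - 1)   # longest period of interest
    lag = lag_min + np.argmax(r[lag_min:lag_max + 1])
    return fs / lag

# Hypothetical test input: one second of a 200 Hz sine at 16 kHz.
fs = 16000
wlen = int(0.03 * fs)   # 30 ms frames
inc = int(0.01 * fs)    # 10 ms hop
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 200 * t)

frames = frame_signal(x, wlen, inc)
f0 = np.array([autocorr_f0(f, fs) for f in frames])
print(frames.shape, f0[:3])
```

Restricting the lag search to fs/hf .. fs/lf keeps the estimate inside the 60-500 Hz band that the denoising filter passes. A 30 ms frame still holds almost two periods of a 60 Hz pitch.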
Parameters for pitch determination
- IS = 0.8 # Duration in seconds of the non-speech lead-in at the start of the input; set it by inspecting the waveform of the input audio in the above diagram
- r1 = 0.03 # Threshold coefficient for the energy threshold T1 (shown in the above diagram) used to detect speech segments: T1 = np.mean(H[:NIS]) * r1, where H[:NIS] is the energy of the frames between 0 and IS
- r2 = 0.26 # Threshold coefficient for detecting the main bodies within a speech segment; each speech segment has its own T2 (shown in the above diagram)
- ThrC = [10, 15] # Maximum F0 difference between adjacent frames during the shortest-distance search, to avoid unnatural jumps in the final result
- miniL = 10 # Minimum length of a speech segment
- mnlong = 3 # Minimum length of a main body within a speech segment
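As a rough illustration of how a minimum-length parameter such as miniL or mnlong can be applied, the sketch below keeps only runs of consecutive above-threshold frames that are long enough. The function name find_segments is hypothetical, the lengths are assumed to be counted in frames, and the boolean mask stands in for the result of comparing the frame energy with T1 (or T2); the repository's actual logic may differ.

```python
import numpy as np

def find_segments(mask, min_len):
    """Return (start, end) frame indices (end exclusive) of True runs
    that are at least min_len frames long."""
    mask = np.asarray(mask, dtype=bool)
    # Pad with False so runs touching either edge are still delimited.
    padded = np.concatenate(([False], mask, [False])).astype(int)
    d = np.diff(padded)
    starts = np.flatnonzero(d == 1)    # False -> True transitions
    ends = np.flatnonzero(d == -1)     # True -> False transitions
    return [(int(s), int(e)) for s, e in zip(starts, ends) if e - s >= min_len]

# Hypothetical mask: a 4-frame burst and a 12-frame segment.
mask = [0] * 5 + [1] * 4 + [0] * 3 + [1] * 12 + [0] * 2
speech_segments = find_segments(mask, min_len=10)  # miniL = 10 drops the short burst
main_bodies = find_segments(mask, min_len=3)       # mnlong = 3 keeps both runs
print(speech_segments, main_bodies)
```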

4. Result Demo

The above diagram shows the spectrogram of the input audio together with the pitch extracted from the input file. The extracted pitch (white line) closely follows the first harmonic visible in the STFT spectrogram, which indicates that the algorithm is working properly.
RMSE (in Hz) of the results for the wav files in speech_signal_for_test/.
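For reference, an RMSE in Hz between an estimated and a reference pitch track can be computed as in the sketch below. The function name f0_rmse and the convention that unvoiced frames are marked with 0 and excluded from the error are assumptions; the evaluation used for this repository may handle unvoiced frames differently.

```python
import numpy as np

def f0_rmse(f0_est, f0_ref):
    """RMSE in Hz over frames that both tracks mark as voiced (F0 > 0)."""
    f0_est = np.asarray(f0_est, dtype=float)
    f0_ref = np.asarray(f0_ref, dtype=float)
    voiced = (f0_est > 0) & (f0_ref > 0)   # assumption: 0 means unvoiced
    if not voiced.any():
        return float("nan")
    err = f0_est[voiced] - f0_ref[voiced]
    return float(np.sqrt(np.mean(err ** 2)))

# Made-up values, only to show the call.
rmse = f0_rmse([0, 100, 203], [0, 104, 200])
print(rmse)
```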

5. Conclusion

The algorithm does not adapt automatically to different types of audio signals.
- For inputs with low SNR (i.e. the background energy between 0 and IS is already very high), r1 needs to be set low.
- For inputs with low energy in each speech segment, r2 should be lowered so that the extended parts on either side of each main body are recognized better.
- Adaptive parameter setting is needed for a better user experience, since too many parameters must be tuned to achieve good performance on different types of speech audio.
Future Work
- Pitch extraction alone is of limited use for further research. It should be combined with forced alignment at the character level and word level.
