
pitch_determiation_and-endpoint_detection_for_speech_signal's Introduction

Pitch_Determiation_for_Speech_Signal

This is a pitch determination algorithm based on short-time autocorrelation and shortest-distance search.
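As a rough illustration of the short-time autocorrelation idea, the sketch below estimates F0 for a single frame by locating the autocorrelation peak inside a lag range. The function name `autocorr_f0` and the `fmin`/`fmax` bounds are assumptions made for this example, not names taken from mainvoid.py, and the repository's actual candidate generation may differ.

```python
import math

def autocorr_f0(frame, fs, fmin=60.0, fmax=500.0):
    """Estimate F0 (Hz) of one frame from the peak of its autocorrelation.

    fmin/fmax bound the lag search; they are illustrative defaults,
    not values confirmed by the repository.
    """
    n = len(frame)
    lag_min = max(1, int(fs / fmax))       # shortest period considered
    lag_max = min(n - 1, int(fs / fmin))   # longest period considered
    best_lag, best_val = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Unnormalized short-time autocorrelation at this lag
        r = sum(frame[i] * frame[i + lag] for i in range(n - lag))
        if r > best_val:
            best_lag, best_val = lag, r
    # No positive peak (e.g. a silent frame) is reported as 0.0
    return fs / best_lag if best_lag else 0.0

fs = 8000  # hypothetical sample rate
# 30 ms of a 200 Hz sine as a stand-in for a voiced frame
frame = [math.sin(2 * math.pi * 200 * t / fs) for t in range(240)]
f0 = autocorr_f0(frame, fs)
```

A pure-Python loop keeps the example dependency-free; on real audio a vectorized implementation would be the usual choice.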

1. Installation

  1. git clone https://github.com/MorrisXu-Driving/Pitch_Determiation_for_Speech_Signal.git
  2. Create a new project in a Python IDE and select mainvoid.py as the script path in the run configuration.
  3. Make sure the test input file tone4_w.wav is in the same directory as mainvoid.py.

2. Algorithm Structure

  • a. Overall Flow

  • b. Preprocessing

    [Figure: preprocessing diagram]
  • c. Candidate Generation

    [Figure: candidate generation diagram]
  • d. Postprocessing

    [Figure: postprocessing diagram]
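The shortest-distance search named in the introduction can be pictured as choosing, for each frame, the candidate closest to the previously accepted pitch. The sketch below is one possible reading of that step, not the repository's code: the function `shortest_distance_track`, its arguments, and the rule of carrying the previous value forward when every candidate jumps too far are all assumptions.

```python
def shortest_distance_track(candidates, start_f0, max_jump):
    """Pick, frame by frame, the candidate closest to the previous pitch.

    candidates: list of lists of F0 candidates (Hz), one list per frame.
    start_f0:   pitch assumed for the frame before the first one.
    max_jump:   largest allowed change between adjacent frames; when every
                candidate jumps further, the previous value is carried over
                (an assumption of this sketch).
    """
    track = []
    prev = start_f0
    for frame_cands in candidates:
        if frame_cands:
            nearest = min(frame_cands, key=lambda f: abs(f - prev))
            if abs(nearest - prev) <= max_jump:
                prev = nearest
        track.append(prev)
    return track

# Hypothetical candidates: frame 3 only offers octave errors
cands = [[201.0, 100.5], [99.0, 198.0], [400.0, 50.0], [205.0, 410.0]]
track = shortest_distance_track(cands, start_f0=200.0, max_jump=10.0)
# track == [201.0, 198.0, 198.0, 205.0]
```

Here `max_jump` plays the role that the ThrC parameter is described as having: it limits the F0 change between adjacent frames.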

3. Parameter Setting

In this algorithm we have:

  • Parameters for input preprocessing

    • wlen = int(0.03 * fs) # frame (window) length in samples; 0.03 s, i.e. 30 ms
    • inc = int(0.01 * fs) # frame shift in samples; 0.01 s, i.e. 10 ms
      [Figure]
    • lf = 60 # Hz; lower cutoff frequency of the band-pass denoising filter
    • hf = 500 # Hz; upper cutoff frequency of the band-pass denoising filter
  • Parameters for pitch determination

    [Figure: input waveform with thresholds T1 and T2]

    • IS = 0.8 # duration in seconds of the non-speech interval at the start of the input; set it by inspecting the input waveform in the diagram above
    • r1 = 0.03 # coefficient for the energy threshold T1 (shown in the diagram above) used to detect speech segments: T1 = np.mean(H[:NIS]) * r1, where H[:NIS] is the energy of the signal between 0 and IS
    • r2 = 0.26 # coefficient for the threshold T2 (shown in the diagram above) used to detect the main body within a speech segment; each speech segment has its own T2
    • ThrC = [10, 15] # maximum F0 difference between adjacent frames allowed during the shortest-distance search, to avoid unnatural jumps in the final result
    • miniL = 10 # minimum length of a speech segment
    • mnlong = 3 # minimum length of a main body within a speech segment
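To make the time-to-sample conversions concrete, here is a small worked example of how wlen, inc, IS and r1 turn into frame counts and the threshold T1. The sample rate, the signal length, the values in `H` and the formula for `NIS` are assumptions chosen for illustration; the script may derive them differently.

```python
fs = 8000                  # assumed sample rate in Hz (hypothetical)
n_samples = 2 * fs         # assumed 2 s input (hypothetical)

wlen = int(0.03 * fs)      # 30 ms frame length in samples
inc = int(0.01 * fs)       # 10 ms frame shift in samples
IS = 0.8                   # leading non-speech time in seconds
r1 = 0.03                  # coefficient for threshold T1

# Number of full frames that fit in the signal
n_frames = (n_samples - wlen) // inc + 1

# Number of frames lying entirely inside the leading non-speech interval
# (this formula is an assumption of the sketch)
NIS = (int(IS * fs) - wlen) // inc + 1

# Hypothetical per-frame energies: quiet lead-in, then louder frames
H = [0.5] * NIS + [20.0] * (n_frames - NIS)

# Same arithmetic as T1 = np.mean(H[:NIS]) * r1, written without NumPy
T1 = sum(H[:NIS]) / NIS * r1
```

With these assumed values, wlen is 240 samples, inc is 80 samples, the signal holds 198 frames, and the first 78 of them fall inside the non-speech interval.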

4. Result Demo

[Figure: spectrogram of the input audio with the extracted pitch overlaid]
The diagram above shows the spectrogram of the input audio together with the pitch extracted from the input file. The extracted pitch (white line) closely follows the first harmonic visible in the STFT spectrogram, which indicates that the algorithm is working properly.
[Figure: RMSE results]
RMSE in Hz of the results for the wav files in speech_signal_for_test/.
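For reference, an RMSE in Hz between an estimated and a reference pitch track can be computed as in the sketch below. It assumes both tracks cover the same frames and are compared frame by frame; how the repository aligns tracks or treats unvoiced frames is not stated, and the numbers used here are made up for illustration.

```python
import math

def rmse_hz(estimated, reference):
    """Root-mean-square error between two equal-length F0 tracks in Hz."""
    if len(estimated) != len(reference) or not estimated:
        raise ValueError("tracks must be non-empty and of equal length")
    squared = [(e - r) ** 2 for e, r in zip(estimated, reference)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical values, not results from the repository
est = [200.0, 202.0, 198.0, 205.0]
ref = [200.0, 200.0, 200.0, 201.0]
error = rmse_hz(est, ref)  # sqrt((0 + 4 + 4 + 16) / 4) = sqrt(6)
```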

5. Conclusion

  • The algorithm does not adapt to different types of audio signals.
    • For inputs with low SNR (i.e. the background energy between 0 and IS is already high), r1 needs to be set low.
    • For inputs with low energy in each speech segment, r2 should be lowered so that the extended parts on either side of each main body are recognized better.
    • Adaptive parameter setting is needed for a better user experience, since too many parameters currently have to be tuned to perform well on different types of speech audio.
  • Future Work
    • Pitch extraction alone is of limited use for further research. It needs to be combined with forced alignment at the character and word level.
