In this project, Linear Predictive Coding (LPC) has been implemented and studied through a MATLAB Graphical User Interface (GUI).
Linear predictive coding (LPC) is a widely used technique in audio signal processing, especially in speech signal processing. It is used in particular to estimate basic parameters of speech, such as pitch (via frequency) and intensity (via loudness).
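We can formulate the relation between the input and output as

$$x[n] = \sum_{k=1}^{p} a_k \, x[n-k] + e[n]$$

where $x[n]$ is the speech sample at time $n$, $a_k$ are the predictor coefficients, $p$ is the order of the predictor, and $e[n]$ is the residual. To find the coefficients $a_k$, the above equation is converted to matrix form, $\mathbf{x} = \mathbf{X}\mathbf{a} + \mathbf{e}$. Speech samples can then be approximated as a linear combination of the past samples by minimizing the error. The solution of choice for LPC is least squares, for which those values of $a_k$ are chosen that minimize $\lVert \mathbf{e} \rVert^2 = \sum_n e[n]^2$, the power of the error of estimation, or residual, $e$.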
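The least-squares fit can be illustrated with a short sketch. The code below is written in Python for illustration and is not the project's MATLAB code. It assumes the autocorrelation method with the Levinson-Durbin recursion, which is one common way to solve the least-squares problem; the function names `autocorr` and `lpc` are hypothetical.

```python
def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of the sequence x (no normalization)."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def lpc(x, order):
    """Least-squares predictor via the Levinson-Durbin recursion.

    Returns (a, err), where a[k-1] multiplies x[n-k] in the prediction
    x[n] ~= sum_k a[k-1] * x[n-k], and err is the residual energy.
    """
    r = autocorr(x, order)
    a = [0.0] * order
    err = r[0]
    for i in range(order):
        if err <= 0.0:
            break  # silent or perfectly predictable input: stop early
        # Reflection coefficient for the next model order.
        acc = r[i + 1] - sum(a[j] * r[i - j] for j in range(i))
        k = acc / err
        # Update the lower-order coefficients, then append the new one.
        prev = a[:i]
        for j in range(i):
            a[j] = prev[j] - k * prev[i - 1 - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


# Usage: the impulse response of a known second-order all-pole filter.
h = [1.0, 1.2]
for n in range(2, 400):
    h.append(1.2 * h[n - 1] - 0.72 * h[n - 2])
coeffs, residual_energy = lpc(h, 2)
```

For this test signal the recovered coefficients are approximately 1.2 and -0.72, the values used to generate it.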
The usual approach in audio processing for encoding an audio signal after it has been segmented is a technique called overlap-add (OLA). For OLA, each segment is multiplied by a window function that has the constant overlap-add (COLA) property, meaning that the shifted, overlapping copies of the window sum to a constant.
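The constant overlap-add property can be checked numerically. The following Python sketch assumes the periodic form of the Hann window with 50% overlap; the frame length of 240 samples and the helper names are illustrative choices, not values taken from the project.

```python
import math


def periodic_hann(n_len):
    """Periodic Hann window of length n_len."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / n_len)
            for n in range(n_len)]


def overlap_sum(window, hop, n_frames):
    """Sum n_frames copies of the window, each shifted by hop samples."""
    total = [0.0] * (hop * (n_frames - 1) + len(window))
    for f in range(n_frames):
        for n, w in enumerate(window):
            total[f * hop + n] += w
    return total


frame_len, hop = 240, 120          # assumed values: 50% overlap
total = overlap_sum(periodic_hann(frame_len), hop, n_frames=6)
steady = total[frame_len:-frame_len]   # skip the ramp-up and ramp-down edges
print(min(steady), max(steady))
```

Away from the edges the summed windows stay at 1 up to rounding error, so overlap-adding the windowed segments does not introduce amplitude ripple.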
The signal is decoded by running the filter coefficients through the LPC model, using the variance to control the source. The decoded windowed segments are then overlap-added to obtain the full signal.
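The decoding step can be sketched as follows. This Python example is an illustration under stated assumptions, not the project's implementation: the source is taken to be white Gaussian noise scaled by the square root of the variance, each frame is windowed with a periodic Hann window, and the names `synthesize_frame`, `overlap_add` and `decode` are hypothetical. A pitch-driven source (for example an impulse train) is not shown.

```python
import math
import random


def synthesize_frame(a, variance, n_samples, rng):
    """Drive the all-pole LPC filter with noise scaled by sqrt(variance).

    a[k] multiplies the output sample k+1 steps in the past.
    """
    gain = math.sqrt(variance)
    out = []
    for n in range(n_samples):
        excitation = gain * rng.gauss(0.0, 1.0)
        prediction = sum(a[k] * out[n - 1 - k]
                         for k in range(len(a)) if n - 1 - k >= 0)
        out.append(prediction + excitation)
    return out


def overlap_add(frames, hop):
    """Add equal-length frames into one signal, each shifted by hop samples."""
    frame_len = len(frames[0])
    y = [0.0] * (hop * (len(frames) - 1) + frame_len)
    for f, frame in enumerate(frames):
        for n, v in enumerate(frame):
            y[f * hop + n] += v
    return y


def decode(params, frame_len, hop, seed=0):
    """params is a list of (coefficients, variance) pairs, one per frame."""
    rng = random.Random(seed)
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for a, variance in params:
        raw = synthesize_frame(a, variance, frame_len, rng)
        frames.append([w * v for w, v in zip(window, raw)])
    return overlap_add(frames, hop)


# Usage with made-up parameters: four frames of a first-order model.
signal = decode([([0.5], 1.0)] * 4, frame_len=8, hop=4)
```

Only the coefficients and the variance of each frame are needed to rebuild the signal, which is where the compression comes from.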
A Graphical User Interface (GUI) in MATLAB was designed to study the working of LPC.
- The user is asked to enter the duration of input speech signal 'x' in seconds.
- When the 'Record' push button is pressed, the audio signal gets recorded and saved as a file.
- When the 'Play' push button is pressed, the original audio signal is played and gets plotted.
- The user is asked to enter the length of audio segment in ms to divide the recorded speech signal into smaller segments.
- The percentage of overlap is entered by the user.
- The user can choose any of the 4 window functions: Hanning, Hamming, Bartlett and Blackman.
- The user can choose from any of the 4 Linear Predictor Filters of orders 12, 48, 72 and 96.
- The sampled input speech signal is applied to an analyzer, which computes the parameters of the speech signal to be transmitted to the synthesizer, i.e., the filter coefficients of the LP filter and the pitch of each segment.
- The pitch of each segment of the speech signal is plotted.
- The speech synthesizer reconstructs the approximated speech signal. The input provided to the encoder is the comparison between the sampled signal and the signal approximated from the parameters.
- This encoder forms the digital signal known as LPC output.
- The LPC output is provided to the Low Pass Filter, which reconstructs the audio signal by performing the interpolation of samples in the input, with and without using the information of pitch.
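The segmentation and per-segment pitch steps in the list above can be sketched as follows. This is a Python illustration, not the project's MATLAB code. The sampling rate of 8000 Hz is an assumption, the pitch estimator uses a plain autocorrelation peak search (the source does not say which pitch detection method the GUI uses), and all function names are hypothetical.

```python
import math


def frame_params(fs, segment_ms, overlap_pct):
    """Convert segment length (ms) and overlap (%) to samples."""
    frame_len = int(round(fs * segment_ms / 1000.0))
    hop = int(round(frame_len * (1.0 - overlap_pct / 100.0)))
    return frame_len, max(hop, 1)


def split_frames(x, frame_len, hop):
    """Return the full-length frames of x taken every hop samples."""
    return [x[s:s + frame_len]
            for s in range(0, len(x) - frame_len + 1, hop)]


def estimate_pitch(frame, fs, f_min=60.0, f_max=400.0):
    """Pitch in Hz from the strongest autocorrelation peak (assumed method)."""
    lag_min = int(fs / f_max)
    lag_max = min(int(fs / f_min), len(frame) - 1)
    best_lag, best_val = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        val = sum(frame[i] * frame[i + lag]
                  for i in range(len(frame) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return fs / best_lag if best_lag else 0.0


fs = 8000                                    # assumed sampling rate in Hz
frame_len, hop = frame_params(fs, segment_ms=30, overlap_pct=50)
tone = [math.sin(2.0 * math.pi * 100.0 * n / fs) for n in range(600)]
frames = split_frames(tone, frame_len, hop)
pitches = [estimate_pitch(f, fs) for f in frames]
```

With these assumed settings a 30 ms segment is 240 samples long and consecutive segments start 120 samples apart. The pitch search range of 60 to 400 Hz is likewise an illustrative choice.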
With the LP filter of order 12, the quality of the reconstructed speech signal was relatively low. Because of the higher compression ratio, the output speech was distorted and less clear. The pitch of the output was also low: the output sounded deeper than the input speech.
The quality of the output speech improves as the order of the LP filter (the number of previous samples used for prediction) increases. With the LP filter of order 48, the lower compression ratio made the reconstructed speech less distorted and more intelligible than before. The pitch of the output was also marginally higher.
With the LP filter of order 72, the speech is more understandable: the compression ratio is lower, so there is very little distortion. Thus, the overall output quality is better for the LP filter of order 72.
The quality of the output speech is best for the LP filter of order 96. Since 96 previous samples are used to reconstruct the output, the compression ratio is lower, and the distortion of the output is lower as well. The output is clearer than with the lower-order filters.
Order of LP Filter | Compression Ratio Without Pitch Detection | Compression Ratio With Pitch Detection |
---|---|---|
12 | 9.3 to 1 | 8.6 to 1 |
48 | 2.45 to 1 | 2.35 to 1 |
72 | 1.65 to 1 | 1.57 to 1 |
96 | 1.24 to 1 | 1.20 to 1 |
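The table does not state how the compression ratios were computed. One plausible way to reason about the trend, offered here purely as an assumption, is to divide the number of new input samples per segment by the number of parameters transmitted per segment: the filter coefficients, the variance, and optionally the pitch. The hop size of 120 samples and the function name below are hypothetical, and the printed values are illustrative, not the project's measurements.

```python
def compression_ratio(hop, order, with_pitch):
    """Input samples per segment divided by parameters sent per segment.

    Assumed accounting: `order` coefficients plus one variance value,
    plus one pitch value when pitch detection is enabled.
    """
    n_params = order + 1 + (1 if with_pitch else 0)
    return hop / n_params


hop = 120  # hypothetical: 240-sample segments with 50% overlap
for order in (12, 48, 72, 96):
    without = compression_ratio(hop, order, with_pitch=False)
    with_p = compression_ratio(hop, order, with_pitch=True)
    print(f"order {order}: {without:.2f} to 1 without pitch, "
          f"{with_p:.2f} to 1 with pitch")
```

Under this accounting the ratio falls as the order grows, and sending the pitch as an extra parameter lowers it slightly further, which is the same direction as the figures in the table.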
- With pitch detection:
  - The compression ratio of the input speech signal was lower.
  - The distortion of the output was lower.
  - The quality of the reconstructed output speech signal was better.
  - The speech signal was clearer, and the nasal tone of the output was more prominent.
- Without pitch detection:
  - The depth/pitch of the output speech was considerably higher.
This observation held irrespective of the order of the LP filter used. However, as the order of the filter increased, the quality of the output speech also increased.
Linear Predictive Coding is All-Pole Resonance Modeling by Hyung-Suk Kim