astorfi / speechpy

:speech_balloon: SpeechPy - A Library for Speech Processing and Recognition: http://speechpy.readthedocs.io/en/latest/

License: Apache License 2.0

Python 83.18% Makefile 0.16% TeX 16.66%

speech-recognition python feature-extraction speechpy

speechpy's Issues

Speechpy Animation

Hey there,

Thanks for your awesome library! I have a very minor request.

The animation on the documentation page can't be stopped or hidden, so I feel like I'm getting eye cancer.

Not sure if others have the problem but maybe you can set the looping to False?

Thanks.

How about Delta and DeltaDelta

Hi,
I was wondering: when calculating MFCCs, how can I add delta and delta-delta coefficients to my features? Should I set num_cepstral to 39?
Thanks
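
One common way to reach 39 dimensions is to keep 13 static coefficients and append their first and second time derivatives, rather than requesting 39 cepstral coefficients directly. The sketch below uses plain NumPy and the usual regression formula over two frames on each side. The function name `delta`, the window size, and the random stand-in for the MFCC matrix are assumptions for illustration, not SpeechPy's own implementation.

```python
import numpy as np

def delta(features, window=2):
    """Regression-based time derivative of a (num_frames, num_coeffs) matrix.

    Hypothetical helper: edges are handled by repeating the first/last frame.
    """
    features = np.asarray(features, dtype=float)
    num_frames = features.shape[0]
    padded = np.pad(features, ((window, window), (0, 0)), mode="edge")
    denom = 2.0 * sum(n * n for n in range(1, window + 1))
    out = np.zeros_like(features)
    for n in range(1, window + 1):
        ahead = padded[window + n: window + n + num_frames]    # c[t + n]
        behind = padded[window - n: window - n + num_frames]   # c[t - n]
        out += n * (ahead - behind)
    return out / denom

# Stand-in for a real MFCC matrix: 50 frames x 13 coefficients.
mfcc = np.random.default_rng(0).standard_normal((50, 13))
d1 = delta(mfcc)   # delta
d2 = delta(d1)     # delta-delta
stacked = np.hstack([mfcc, d1, d2])   # 13 static + 13 delta + 13 delta-delta
```

SpeechPy's README also shows a derivative helper, speechpy.feature.extract_derivative_feature; it is worth checking the documentation for its exact output shape before relying on it.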

A feature request: how can I judge user intentions?

Hello,
I need speech recognition, and I have read much of this project's documentation, but I am still not sure whether the project can meet my need.

I have hundreds of thousands of WAV audio files, each only one to five seconds long, divided into two categories: positive answers and negative answers. I do not have the text corresponding to each WAV file. Can I use this project to judge intent?

For example, if I input an audio clip, I would like to get the intent it expresses, even though there is no text corresponding to the audio.

Any help will be greatly appreciated!

no module main

Hi, I'm trying to use this on Windows 7 64-bit with Python 3.5, installed via pip or via git.
There were no errors during the pip install.

When I import speechpy I get this:

import speechpy

ImportError Traceback (most recent call last)
in ()
----> 1 import speechpy

c:\anaconda3\lib\site-packages\speechpy\__init__.py in ()
----> 1 from main import *
2 from processing import *

ImportError: No module named 'main'

negative dimensions are not allowed

When I used speechpy.feature.lmfe to get the log mel filterbank energies, an error occurred: "negative dimensions are not allowed". Could you help me with the problem? Thanks.
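
One possible cause, offered as a guess rather than a diagnosis: if the signal is shorter than one frame, the frame-count expression quoted in the "stack frames calculation" issue on this page goes negative, and NumPy then refuses to allocate an array with a negative dimension. The sketch below reproduces that arithmetic. The helper name and the rounding of seconds to samples are assumptions.

```python
import math
import numpy as np

def num_frames_no_padding(length_signal, sampling_frequency,
                          frame_length, frame_stride):
    """Frame count using the expression quoted from processing.py.

    Hypothetical helper: lengths in seconds are rounded to whole samples.
    """
    frame_sample_length = int(round(sampling_frequency * frame_length))
    stride_samples = int(round(sampling_frequency * frame_stride))
    return int(math.ceil((length_signal - frame_sample_length) / stride_samples))

# 100 samples at 16 kHz is shorter than one 20 ms frame (320 samples).
short = num_frames_no_padding(100, 16000, 0.02, 0.01)
# One full second of audio gives a sensible positive count.
ok = num_frames_no_padding(16000, 16000, 0.02, 0.01)

try:
    np.zeros((short, 320))      # negative first dimension
except ValueError as err:
    message = str(err)
```

If this is what is happening, checking the signal length against frame_length * sampling_frequency before calling lmfe would confirm it.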

Remove animation from logo

Hi, when I'm working on a project I like to have the docs open on the side, but I would rather not have an animated logo flashing. :)

can speechpy be used to distinguish between speech and non speech in a given audio file?

pre-emphasis

Hi, it seems that pre-emphasis is not implemented. Will you add this in the future?

MFCC Feature

Respected Sir,
Greetings of the day!

Sir, first of all, thank you so much for sharing such an amazing library with us.

Sir, I am using the SpeechPy library to extract the MFCCs of an audio signal.

Sir, I have an audio signal: 16 kHz, 32-bit float PCM, mono. I am using a frame length of 100 ms with 50% overlap.

I used the code below to extract the MFCCs:

fs, signal = wav.read("b0.wav")
signal = signal / abs(max(signal)) #Convert into double
mfcc = speechpy.feature.mfcc (signal , sampling_frequency=fs, frame_length=0.1, frame_stride=0.05, num_filters=40, fft_length=2048, low_frequency=0, high_frequency=None)

Respected Sir, I am confused because I also used the python_speech_features library to extract MFCCs and verify my result, but the two libraries give different results.

mfcc1 = python_speech_features.base.mfcc(signal, samplerate=fs, winlen=0.1, winstep=0.05, numcep=13, nfilt=26, nfft=2048, lowfreq=0, highfreq=None, preemph=0.97, ceplifter=22, appendEnergy=True)

I want to know where I am making a mistake.

My questions are:

Is the above code sequence correct for extracting MFCCs with the speechpy library?
Is pre-emphasis not performed when using the speechpy.feature.mfcc function? Is that the reason the two libraries give different results?

Do we have to perform pre-emphasis separately using the code below, and then pass its output to mfcc?

signal_preemphasized = speechpy.processing.preemphasis(signal, cof=0.98)

Why are the two libraries giving different results?

It is my humble request, respected Sir: please respond to my query. I am not getting clarification on what to use and which is correct.

I am sorry for my poor English.
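
Judging only from the two calls quoted above, several settings differ besides pre-emphasis: num_filters=40 versus nfilt=26, and the python_speech_features call also passes preemph=0.97, ceplifter=22 and appendEnergy=True. Separately, abs(max(signal)) scales by the largest positive sample rather than the largest magnitude, so a signal whose biggest excursion is negative is not scaled into [-1, 1]. The NumPy sketch below illustrates both points. The helper names are hypothetical, and the filter y[n] = x[n] - a*x[n-1] is the textbook pre-emphasis form, not a copy of either library's code.

```python
import numpy as np

def preemphasize(signal, coefficient=0.97):
    """First-order pre-emphasis: y[n] = x[n] - coefficient * x[n-1]."""
    signal = np.asarray(signal, dtype=float)
    return np.append(signal[0], signal[1:] - coefficient * signal[:-1])

def peak_normalize(signal):
    """Scale so that the largest magnitude becomes 1 (no-op for silence)."""
    signal = np.asarray(signal, dtype=float)
    peak = np.max(np.abs(signal))
    return signal if peak == 0 else signal / peak

x = np.array([0.5, -2.0, 1.0, 0.25])
scaled_by_max = x / abs(max(x))      # the form used in the question
scaled_by_peak = peak_normalize(x)   # largest magnitude mapped to 1
emphasized = preemphasize(np.ones(3))
```

Matching the parameters on both sides first, and only then comparing outputs, should show how much of the gap comes from pre-emphasis alone.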

Installing release 2.3 appears to install 2.2

I have installed SpeechPy as part of my review of the package. Installing the package, I found a minor issue with the version number: I explicitly checked out the '2.3' release for installation whereas the install script output refers to version 2.2:

$ python setup.py develop
running develop
running egg_info
creating speechpy.egg-info
writing speechpy.egg-info/PKG-INFO
writing dependency_links to speechpy.egg-info/dependency_links.txt
writing requirements to speechpy.egg-info/requires.txt
writing top-level names to speechpy.egg-info/top_level.txt
writing manifest file 'speechpy.egg-info/SOURCES.txt'
reading manifest file 'speechpy.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'speechpy.egg-info/SOURCES.txt'
running build_ext
Creating /home/tha/.conda/envs/joss-review-tmp/lib/python3.6/site-packages/speechpy.egg-link (link to .)
Adding speechpy 2.2 to easy-install.pth file

Installed /home/tha/other-repo/speechpy
Processing dependencies for speechpy==2.2
Searching for numpy==1.14.3
Best match: numpy 1.14.3
Adding numpy 1.14.3 to easy-install.pth file

Using /home/tha/.conda/envs/joss-review-tmp/lib/python3.6/site-packages
Searching for scipy==1.1.0
Best match: scipy 1.1.0
Adding scipy 1.1.0 to easy-install.pth file

Using /home/tha/.conda/envs/joss-review-tmp/lib/python3.6/site-packages
Finished processing dependencies for speechpy==2.2

I suspect this is just some configuration text string that was not properly updated for the 2.3 release?

Bug: numframes needs to add 1, that is, numframes = 1 + math.ceil()

Extracting log mel filterbank features

Thanks very much for the great library! It's my default library for speech processing now.

Just want to double-check the following: I want to extract 40-dimensional log mel filterbank features by sliding a Hamming window of width 25 ms with an overlap of 10 ms. Does the code below extract the right features? I am a bit uncertain whether frame_stride=0.01 creates an overlap of 10 ms.

fs, signal = wav.read(file_path)
lmfe = speechpy.feature.lmfe(signal, sampling_frequency=fs, frame_length=0.025, frame_stride=0.01, num_filters=40, fft_length=512, low_frequency=0, high_frequency=None)

Thanks!
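
Assuming frame_stride is the hop between the starts of consecutive frames (an assumption about the parameter's meaning, worth checking against the docs), the overlap is the frame length minus the stride. A 25 ms frame with a 10 ms stride would then overlap by 15 ms, and a 10 ms overlap would need a 15 ms stride. The arithmetic, with hypothetical helper names and integer milliseconds to avoid rounding noise:

```python
def overlap_ms(frame_length_ms, frame_stride_ms):
    """Overlap between consecutive frames, in milliseconds."""
    return frame_length_ms - frame_stride_ms

def stride_for_overlap_ms(frame_length_ms, desired_overlap_ms):
    """Stride that yields the desired overlap, in milliseconds."""
    return frame_length_ms - desired_overlap_ms

current_overlap = overlap_ms(25, 10)             # settings from the question
needed_stride = stride_for_overlap_ms(25, 10)    # stride for a 10 ms overlap
needed_stride_seconds = needed_stride / 1000.0   # value to pass as frame_stride
```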

Possibly out of date citation request?

The README says to cite as "astorfi/speech_feature_extraction: SpeechPy", but the repo name is speechpy, so I'm thinking maybe this part of the README is out of date and should say to cite as "astorfi/speechpy: SpeechPy" or maybe "astorfi/speechpy: Speech recognition and feature extraction" or something?

filter bank shape

Hi, I found that in your feature.py line 44:

    # Initial definition
    filterbank = np.zeros([num_filter, fftpoints])

    # The triangular function for each filter
    for i in range(0, num_filter):
        left = int(freq_index[i])
        middle = int(freq_index[i + 1])
        right = int(freq_index[i + 2])
        z = np.linspace(left, right, num=right - left + 1)
        filterbank[i, left:right + 1] = functions.triangle(z, left=left, middle=middle, right=right)

    return filterbank

You use fftpoints directly to initialize the filterbank, but since the default FFT length is 512, I think the second dimension should be 257, matching the length of the FFT output.
I also checked other libraries such as python_speech_features and librosa; they all use NFFT//2 + 1.
I just want to confirm whether this difference is intentional. Thanks.
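
For reference, the NFFT//2 + 1 figure is the number of non-redundant bins in the FFT of a real signal, which is what numpy.fft.rfft returns. The sketch below only checks that count for a 512-point transform; it says nothing about how SpeechPy uses the filterbank downstream, and the 400-sample random frame is a stand-in.

```python
import numpy as np

fft_length = 512
# Stand-in for one 25 ms frame at 16 kHz (400 samples), zero-padded to 512.
frame = np.random.default_rng(0).standard_normal(400)

spectrum = np.fft.rfft(frame, n=fft_length)   # one-sided spectrum
num_bins = spectrum.shape[0]                  # fft_length // 2 + 1
```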

stack frames calculation

Hi Amirsina,

First of all, great project!

I noticed that in mfcc the last frame_length of the signal buffer is always missing. When the number of frames is calculated (in the function stack_frames), the sample buffer length is reduced by the frame length before it is divided into frames.

See snippet:

speechpy/speechpy/processing.py, lines 103 to 104 in 4ece793:

    numframes = (int(math.ceil((length_signal
                                  - frame_sample_length) / frame_stride)))

On a 1 second sample buffer this is hardly noticeable, but if we run the mfcc on smaller buffers this becomes significant.

If the calculation is done in this way:

    numframes = (int(math.ceil((length_signal
                                  - (frame_sample_length - frame_stride)) / frame_stride)))

The full sample buffer is used if frame_sample_length equals frame_stride, and the count adjusts correctly when frame_length and frame_stride differ.
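
The two expressions can be compared side by side. This is a minimal sketch with hypothetical function names; all lengths are in samples, and any padding the library may apply around this calculation is ignored.

```python
import math

def numframes_current(length_signal, frame_sample_length, frame_stride):
    """Expression quoted from processing.py."""
    return int(math.ceil((length_signal - frame_sample_length) / frame_stride))

def numframes_proposed(length_signal, frame_sample_length, frame_stride):
    """Expression proposed in this issue."""
    return int(math.ceil(
        (length_signal - (frame_sample_length - frame_stride)) / frame_stride))

# 1 s at 16 kHz, 20 ms frames (320 samples), 10 ms stride (160 samples).
one_second = (numframes_current(16000, 320, 160),
              numframes_proposed(16000, 320, 160))

# Short buffer holding exactly two non-overlapping frames.
short_buffer = (numframes_current(640, 320, 320),
                numframes_proposed(640, 320, 320))
```

Since ceil(x + 1) equals ceil(x) + 1, the proposed expression always gives exactly one more frame than the current one, which is the same as writing numframes = 1 + math.ceil(...).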

cmvnw: Division by zero

I encountered the following warning during the variance normalization of the speech features:

RuntimeWarning: divide by zero encountered in true_divide

This is probably not the desired behavior, though I don't know what the best solution is in this case.
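
One common workaround is to add a small constant to the standard deviation before dividing, so that a feature dimension with zero variance maps to zero instead of producing inf or nan. The sketch below shows the idea for global (not windowed) normalization. The function name, the eps value, and the global scope are assumptions; this is not the cmvnw implementation.

```python
import numpy as np

def cmvn_safe(features, eps=1e-8):
    """Global mean and variance normalization with a guarded division.

    Hypothetical helper: eps keeps the denominator away from zero.
    """
    features = np.asarray(features, dtype=float)
    centered = features - features.mean(axis=0, keepdims=True)
    std = features.std(axis=0, keepdims=True)
    return centered / (std + eps)

# Second column is constant, so its standard deviation is exactly zero.
feats = np.array([[1.0, 5.0],
                  [3.0, 5.0],
                  [5.0, 5.0]])
normalized = cmvn_safe(feats)
```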

raw spectrogram

First of all, thank you for your nice library. As raw features are becoming popular in DNN models, is there any plan to add raw spectrogram feature extraction?

I know there are equivalent implementations in both numpy and scipy, but I think it would be fairer to have all features (MFCC, LMFB and spectrogram) normalized in the same fashion, so that their relative advantages can be compared.

best regards

Redundant calculations slowing down performance

After reverting this PR the performance suffers due to redundant calculations. For the time being, I plan on uploading a fork with the change to PyPI so that it can be included in other projects via setup.py.

Correct wav format?

I'm trying to extract MFCC features from the audio of a video file.

I tried FFMPEG:

def extractAudioFromVideo(video, audio_out="out.wav"):
	cmd="ffmpeg -i {} -acodec pcm_s16le -ac 1 -ar 16000 {}".format(video, audio_out)
	os.system(cmd)
	return audio_out

def extractAudioMFCC(file_name="out.wav"):
	fs, signal = wav.read(file_name)
	signal = signal[:,0]

	############# Extract MFCC features #############
	mfcc = speechpy.feature.mfcc(signal, sampling_frequency=fs, frame_length=0.020, frame_stride=0.01,num_filters=40, fft_length=512, low_frequency=0, high_frequency=None)
	mfcc_cmvn = speechpy.processing.cmvnw(mfcc,win_size=301,variance_normalization=True)
	print('mfcc(mean + variance normalized) feature shape=', mfcc_cmvn.shape)


extractAudioFromVideo("test.mp4", audio_out="out.wav")
extractAudioMFCC("out.wav")

The error I get:

Traceback (most recent call last):
File "TWK.py", line 99, in <module>
extractAudioMFCC()
File "TWK.py", line 22, in extractAudioMFCC
signal = signal[:,0]
IndexError: too many indices for array

Am I using the wrong wav format?
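
The format looks fine. The likelier culprit is the indexing: the ffmpeg command above passes -ac 1, which writes a mono file, and scipy.io.wavfile.read returns a 1-D array for mono data, so signal[:, 0] asks for one index too many. A small guard handles both cases. The helper name first_channel is hypothetical, and the arrays below stand in for what wav.read would return.

```python
import numpy as np

def first_channel(signal):
    """Return a 1-D signal whether the input is mono (1-D) or multi-channel (2-D)."""
    signal = np.asarray(signal)
    return signal if signal.ndim == 1 else signal[:, 0]

# Stand-ins for wav.read output: mono is (n,), stereo is (n, 2).
mono = np.arange(5)
stereo = np.stack([np.arange(5), np.arange(5) * 10], axis=1)

from_mono = first_channel(mono)
from_stereo = first_channel(stereo)
```

In the function above, replacing signal = signal[:,0] with a guard like this (or simply dropping that line for mono files) should remove the IndexError.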

A typo in the documentation

On the test page of the docs, the parameter Filter in speechpy.processing.stack_frames should be filter. This has already been fixed in the latest code, both in the package and locally.

Animated logo makes it difficult to read documentation

Hi,

Thanks for sharing your work. It looks interesting and I'd like to dig into it more. However, I find it very difficult to work with the documentation here and on Read the Docs because of the blinking logo.

The animations in the graph are great. The logo however, is a serious drawback.

Thanks again.

