
disvoice's Introduction

Hi there, I'm Camilo 👋

Camilo Vasquez, machine learning researcher interested in signal and natural language processing

For the past five years I have carried out research and development in signal processing and machine learning for health-care and biometric applications, with both academic and industrial partners. I am passionate about machine learning, deep learning, speech processing, and natural language processing. Technologies I enjoy working with and am familiar with include PyTorch, Transformers, Sklearn, Pandas, FastAPI, and Docker, among others.


disvoice's People

Contributors

deepsource-autofix[bot], deepsourcebot, dependabot[bot], g-thor, jcvasquezc, luigiattorresi, neshvig10, nicanor5, nicanorgarcia, samuelcahyawijaya, tariasvergara


disvoice's Issues

Text File Not Found

I think it uses Praat internally to estimate the pitch. Or do I need to provide the text file when running prosody.py?

Processing audio 1 from 1 001_ddk1_PCGITA.wav
Error: Cannot open file “/home/shsheikh/clones/DisVoice/praat/001_ddk1_PCGITA.wav”.
Script line 14 not performed or completed:
« Read from file... 'fileName$' »
Script “/home/shsheikh/clones/DisVoice/prosody/../praat/vuv_praat.praat” not completed.
Praat: script command <<../praat/vuv_praat.praat 001_ddk1_PCGITA.wav /home/shsheikh/clones/DisVoice/prosody/../tempfiles/tempF0001_ddk1_PCGITA.txt ../tempfiles/tempVUV001_ddk1_PCGITA.txt 60 350 0.01 0.02 0.01>> not completed.

Traceback (most recent call last):
  File "prosody.py", line 392, in <module>
    feat_vec=prosody_static(audio_file, flag_plots, pitch_method='praat')
  File "prosody.py", line 271, in prosody_static
    F0,_=praat_functions.decodeF0(temp_filename_f0,len(data_audio)/float(fs),0.01)
  File "/home/shsheikh/clones/DisVoice/prosody/../praat/praat_functions.py", line 136, in decodeF0
    pitch_data=np.loadtxt(fileTxt)
  File "/usr/local/lib/python3.5/site-packages/numpy/lib/npyio.py", line 962, in loadtxt
    fh = np.lib._datasource.open(fname, 'rt', encoding=encoding)
  File "/usr/local/lib/python3.5/site-packages/numpy/lib/_datasource.py", line 266, in open
    return ds.open(path, mode, encoding=encoding, newline=newline)
  File "/usr/local/lib/python3.5/site-packages/numpy/lib/_datasource.py", line 624, in open
    raise IOError("%s not found." % path)
OSError: /home/shsheikh/clones/DisVoice/prosody/../tempfiles/tempF0001_ddk1_PCGITA.txt not found.

TypeError: plot_pros() takes 5 positional arguments but 7 were given

Thanks for this project.
I encountered some errors when running ./test_prosody.sh.
Could you tell me which plot function I should call at this line?
prosody.py#L247

Error Message:
Traceback (most recent call last):
  File "prosody.py", line 406, in <module>
    profeats = prosody_dynamic(audio_file)
  File "prosody.py", line 248, in prosody_dynamic
    plot_pros(data_audio, fs, F0, seg_voiced, Ev, featvec, f0v)

VisibleDeprecationWarning and TypeError: can't convert cuda:0 device type tensor to numpy.

Thanks for the awesome toolkit.

After installing all the required packages, I got the warning below when running glottal.py and articulation.py.

/usr/local/lib/python3.6/dist-packages/numpy/core/_asarray.py:136: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
  return array(a, dtype, copy=False, order=order, subok=True)
But I can get the desired output file.

When I run phonation.py or phonological.py, I can get the desired output file without any warning messages.

However, when I run the representation learning script (replearning.py), I get the error below.

root@198c2471ad59:/codes/m456_smk/DisVoice/replearning# ./test_replearning.sh
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py:1639: UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
  warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")
Traceback (most recent call last):
  File "replearning.py", line 225, in <module>
    script_manager(sys.argv, replearning)
  File "/codes/m456_smk/DisVoice/replearning/../script_mananger.py", line 31, in script_manager
    features=feature_method.extract_features_file(audio, static=static, plots=plots, fmt=fmt)
  File "replearning.py", line 110, in extract_features_file
    hb=self.AEspeech.compute_bottleneck_features(audio)
  File "/codes/m456_smk/DisVoice/replearning/AEspeech.py", line 177, in compute_bottleneck_features
    return bot.data.numpy()
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

I tried several solutions from similar issues on Stack Overflow, but none of them worked.

I am running this in Docker. The environment is shown below:
Ubuntu 20.04.1
Python 3.6.9
Package Version


chainer 7.7.0
chardet 3.0.4
click 7.1.2
cloudpickle 1.3.0
cntk-gpu 2.7
cupy 7.8.0
cycler 0.10.0
Cython 0.29.21
grpcio 1.32.0
h5py 2.10.0
httplib2 0.18.1
idna 2.10
imageio 2.9.0
importlib-metadata 1.7.0
ipykernel 5.3.4
ipython 7.16.1
ipython-genutils 0.2.0
ipywidgets 7.5.1
kaldi-io 0.9.0
Keras 2.4.3
Keras-Preprocessing 1.1.2
librosa 0.8.0
matplotlib 3.0.2
numba 0.51.2
numpy 1.19.5
pandas 1.1.2
pandocfilters 1.4.2
parso 0.7.1
pathlib 1.0.1
pickleshare 0.7.5
Pillow 7.2.0
pip 21.0.1
pooch 1.2.0
praat-parselmouth 0.3.3
ptyprocess 0.6.0
pyasn1 0.4.8
pyasn1-modules 0.2.8
pycparser 2.20
pydub 0.24.1
Pygments 2.6.1
pygobject 3.26.1
pygpu 0.7.6
pyparsing 2.4.7
pyrsistent 0.16.0
PySocks 1.7.1
pysptk 0.1.16
python-apt 1.6.5+ubuntu0.3
python-dateutil 2.8.1
python-distutils-extra 2.39
python-gflags 1.5.1
python-speech-features 0.6
pytz 2021.1
PyWavelets 1.1.1
PyYAML 5.3.1
pyzmq 19.0.2
qtconsole 4.7.6
QtPy 1.9.0
scikit-image 0.17.2
scikit-learn 0.23.2
scipy 1.5.2
seaborn 0.9.0
Send2Trash 1.5.0
setuptools 54.1.1
simplegeneric 0.8.1
six 1.15.0
SoundFile 0.10.3.post1
stopit 1.1.1
suds-jurko 0.6
tabulate 0.8.7
tensorboard 2.4.1
tensorboard-plugin-wit 1.7.0
tensorflow 2.4.1
tensorflow-estimator 2.4.0
tensorflow-gpu 2.3.0
tensorflow-probability 0.11.0
Theano 1.0.5
threadpoolctl 2.1.0
tifffile 2020.8.25
torch 1.7.0
torchaudio 0.7.0
torchvision 0.8.0.dev20200828+cu101
Werkzeug 1.0.1
wheel 0.36.2
widgetsnbextension 3.5.1
wrapt 1.12.1
zipp 3.1.0

Could you help me with these?
Many thanks
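The traceback ends at `return bot.data.numpy()` in AEspeech.py, and PyTorch's own message names the fix: a tensor on `cuda:0` has to be copied to host memory before `.numpy()` can see it. Below is a minimal sketch of a conversion helper, assuming `bot` is an ordinary `torch.Tensor`; the name `to_numpy` is hypothetical and not part of DisVoice. The equivalent one-line patch would be `return bot.detach().cpu().numpy()` at line 177, which has not been tested against this repository.

```python
def to_numpy(tensor):
    """Convert a PyTorch tensor, possibly living on a GPU, to a NumPy array.

    detach() drops the autograd graph (the modern replacement for .data),
    cpu() copies the values to host memory (no copy if already on the CPU),
    and numpy() then exposes that host buffer as an ndarray.
    """
    return tensor.detach().cpu().numpy()
```

In `compute_bottleneck_features`, `return to_numpy(bot)` would replace `return bot.data.numpy()`; the same change works unchanged on CPU-only machines.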

Feature Selection Algorithm

You mentioned some feature selection algorithms, such as LASSO and Relief-F, in your paper. Where are these feature selection algorithms implemented in your code?

Error in Articulation features

Hello,

I am trying to extract articulation features and I am getting the following error. How can I fix it? Thank you!


Error in prosody extraction

I can reproduce the plot in static mode, but I get the error after the plot is shown. Dynamic mode gives a similar error without producing the plot. I think the issue is that the wav file argument is resolved relative to the praat folder instead of the current directory.
Here is the complete error message:

$python prosody.py "./001_ddk1_PCGITA.wav" "featuresDDKdyn.txt" "static" "true"
Error: Cannot open file “/tmp/DisVoice/praat/./001_ddk1_PCGITA.wav”.
Script line 14 not performed or completed:
« Read from file... 'fileName$' »
Script “/tmp/DisVoice/prosody/../praat/vuv_praat.praat” not completed.
Praat: script command <</tmp/DisVoice/prosody/../praat/vuv_praat.praat ./001_ddk1_PCGITA.wav /tmp/DisVoice/prosody/../tempfiles/pitchtemp.txt /tmp/DisVoice/prosody/../tempfiles/voicetemp.txt 60 350 0.01 0.02 0.01>> not completed.

/tmp/DisVoice/prosody/../praat/praat_functions.py:135: UserWarning: loadtxt: Empty input file: "/tmp/DisVoice/prosody/../tempfiles/pitchtemp.txt"
  pitch_data=np.loadtxt(fileTxt)
Traceback (most recent call last):
  File "prosody.py", line 677, in <module>
    avgF0slopes,stdF0slopes,MSEF0, SVU,VU,UVU,VVU,VS,US,URD,VRD,URE,VRE,PR,maxvoicedlen,maxunvoicedlen,minvoicedlen,minunvoicedlen,rvuv,energyslope,RegCoefenergy,msqerrenergy,RegCoeff0,meanNeighborenergydiff,stdNeighborenergydiff, F0_rec, f0real, venergy, uenergy  = intonation_duration(audio_file, flag_plots=flag_plots)
  File "prosody.py", line 352, in intonation_duration
    pitch_z,ttotal = praat_functions.decodeF0(temp_filename_f0,len(data_audio)/fs,size_step)
  File "/tmp/DisVoice/prosody/../praat/praat_functions.py", line 140, in decodeF0
    time_voiced=pitch_data[0] # First datum is the time stamp
IndexError: index 0 is out of bounds for axis 0 with size 0

Praat scripts to compute the fundamental frequency do not work properly with relative paths

Errors occur when the wav file arguments are given as relative paths, because the Praat scripts resolve relative paths against their own folder (praat/) rather than the current working directory, as the error messages above show.

There are two options to fix the issue:

  1. Use absolute paths when you enter the audio file, or
  2. Change the default algorithm to compute the fundamental frequency from 'praat' to 'rapt', in phonation and prosody analyses
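The first option can be applied in Python before the path ever reaches Praat. A minimal sketch, assuming the caller passes the returned string as the audio argument (for example to `extract_features_file`); the helper name `resolve_audio_path` is hypothetical:

```python
import os


def resolve_audio_path(audio_file):
    """Return an absolute, normalized path for an audio file argument.

    In the logs above, Praat resolved './001_ddk1_PCGITA.wav' against the
    praat/ folder; an absolute path leaves nothing for Praat to reinterpret.
    """
    path = os.path.abspath(os.path.expanduser(audio_file))
    if not os.path.isfile(path):
        # Fail early with a clear message instead of an empty pitch file later.
        raise FileNotFoundError(f"Audio file not found: {path}")
    return path
```

Checking that the file exists up front also turns the later, indirect "tempfiles/...txt not found" or empty-input errors into a direct message about the audio file.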

How to use dynamic phonation features together with MFCC/fbank features as DNN input

Hi @jcvasquezc ,

I would like to combine dynamic phonation features with MFCC/fbank features as the input to a DNN.

The relevant code is:
phonafeature = phonation.extract_features_file(filename, static=False, plots=False, fmt="npy")
fs, signal = scipy.io.wavfile.read(filename)  # fbank takes the sample array, not a file name
fbankfeature, energies = python_speech_features.fbank(signal, samplerate=16000, nfilt=40, nfft=768, winlen=0.04, winstep=0.02, winfunc=np.hamming)

Because the dynamic phonation features use winlen=0.04 and winstep=0.02, I set the same parameter values in the fbank function.
However, len(phonafeature) and len(fbankfeature) differ for the same input file.
For example, demo.wav is 15 s long with a 16000 Hz sample rate: phonafeature has shape (430, 7), while fbankfeature has shape (749, 40).

To concatenate them, I have to pad phonafeature with zeros to match fbankfeature, i.e., from (430, 7) to (749, 7). Then I get the concatenated phonation plus fbank features with shape (749, 47) for demo.wav.

But I don't think this is the correct way to combine dynamic phonation features with MFCC/fbank features as DNN input.

Could you help me with this issue?
Also, why do the phonation and fbank outputs differ in length when they use the same winlen and winstep?

Many thanks
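The fbank length follows directly from the framing arithmetic: with a 640-sample window and a 320-sample hop, 15 s at 16 kHz (240000 samples) gives 1 + (240000 - 640) / 320 = 749 frames, matching the (749, 40) shape. A sketch of that count, assuming the zero-padded last frame convention of `python_speech_features.sigproc.framesig` (the helper `num_frames` is hypothetical):

```python
import math


def num_frames(duration_s, fs, winlen_s, winstep_s):
    """Frames produced by sliding a window over a signal with a fixed hop,
    zero-padding the final partial frame (framesig's convention)."""
    n_samples = int(round(duration_s * fs))
    frame_len = int(round(winlen_s * fs))
    frame_step = int(round(winstep_s * fs))
    if n_samples <= frame_len:
        return 1
    return 1 + math.ceil((n_samples - frame_len) / frame_step)


print(num_frames(15, 16000, 0.04, 0.02))  # 749
```

The phonation matrix being shorter (430 rows) suggests that DisVoice computes dynamic phonation features only over voiced frames, which is an assumption to confirm in phonation.py. If so, the two streams are not frame-aligned, and zero-padding the phonation matrix at the end would pair phonation rows with the wrong fbank frames.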

Cannot extract glottal features

I ran into a problem when extracting glottal features: winShift(=5)/1000*fs evaluates to 0, which causes the error.

So I changed the code as follows:
#Calculate LP-residual and extract N maxima per mean-based signal determined intervals
res = utils_gci.GetLPCresidual(x, winLen*fs/1000, winShift*fs/1000, LPC_ord, VUV_inter)

However, the code still does not work. Can someone tell me how to fix this problem?

I used Python 2.7 on Ubuntu 16.04 x64.

By the way, the pysptk package cannot be installed under a Python 3.6 virtualenv.

Thanks and Regards
XU SHIHAO
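In Python 2.7, `/` between two integers is floor division, so `winShift/1000` is already 0 before it is multiplied by `fs`; multiplying first, as in the change above, avoids that. A minimal sketch of a conversion helper that behaves the same under Python 2 and 3 (the name `ms_to_samples` is hypothetical, and the assumption that `GetLPCresidual` expects whole sample counts is not confirmed by the source):

```python
def ms_to_samples(ms, fs):
    """Convert a duration in milliseconds to a whole number of samples.

    Dividing by the float 1000.0 forces true division even on Python 2,
    so short durations such as 5 ms do not collapse to zero.
    """
    return int(round(ms * fs / 1000.0))


fs = 16000
win_shift_ms = 5
print(win_shift_ms // 1000 * fs)        # 0, what Python 2 computes for 5/1000*fs
print(ms_to_samples(win_shift_ms, fs))  # 80
```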

Error with Glottal Features

Glottal feature extraction worked for most of my audio files, but one of them raised this error:

File "/Users/sruthikurada/PycharmProjects/ML-Parkinson-Disease/DisVoice/glottal/glottal.py", line 287, in extract_features_file
    df[k]=[feat_st[e]]
IndexError: index 20 is out of bounds for axis 0 with size 20

Will a parselmouth-praat version be released?

I'm running my code on a shared commercial server where it is difficult to install Praat, so I've been relying on parselmouth, the Python interface to Praat. Is this something you would consider implementing? Thanks!

Preprocessing before feature extraction

Hi @jcvasquezc thanks again for the great lib!

I am wondering whether I should preprocess the audio before feeding it to extract_features_file. My audio files are utterances (> 2 s), mostly one speaker per file (sometimes a second speaker says "yes" or "um"), but the loudness differs between the two speakers. Do you suggest scaling the waveforms to (-1, +1), saving the audio files, and then feeding them to the feature extractors?

The downstream task is classification, so I didn't want to complicate things with more advanced preprocessing. Min-max scaling seems sufficient; do you agree?
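One detail worth knowing about the proposed scaling: min-max mapping the minimum to -1 and the maximum to +1 shifts the zero line whenever the waveform is not symmetric, which adds a DC offset. Peak normalization (dividing by the largest absolute sample) keeps silence at 0. A minimal sketch, assuming the audio is already loaded as a float array; `peak_normalize` and the 0.99 target are illustrative choices, not something DisVoice provides or recommends:

```python
import numpy as np


def peak_normalize(x, target_peak=0.99):
    """Scale a waveform so its largest absolute sample equals target_peak.

    Unlike min-max scaling to [-1, 1], this keeps 0 at 0 (no DC shift) and
    only changes the gain. Silent (all-zero) input is returned unchanged.
    """
    x = np.asarray(x, dtype=np.float64)
    peak = np.max(np.abs(x)) if x.size else 0.0
    if peak == 0:
        return x
    return x * (target_peak / peak)
```

Either way, a single gain per file does not equalize two speakers inside the same file, and features that depend on absolute energy will change with the gain while F0-based features should not.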

About the phonological and replearning problems

Thank you for your outstanding work! I have two problems.
First, for the phonological features, I input my own wav files (English and Chinese), but the plot is empty. How can I fix this?
Second, for replearning, I get: TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
Sorry to bother you.

Error while extracting articulation features

Unable to extract features:
FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/lib/python3.10/dist-packages/disvoice/articulation/../../tempfiles/tempFormantsartic4065_v.txt'

Thank you
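The missing path ends in `disvoice/articulation/../../tempfiles/`, which normalizes to `dist-packages/tempfiles`, a folder outside the installed package. One plausible cause, not confirmed here, is that this folder does not exist in a pip install, so the Praat script has nowhere to write its formant file. A hedged check that resolves the folder and creates it if needed; `ensure_temp_dir` is hypothetical, and writing into dist-packages may require elevated permissions:

```python
import os


def ensure_temp_dir(module_dir, relative="../../tempfiles"):
    """Resolve the temp-file folder relative to a module's directory,
    create it if it is missing, and return the normalized absolute path."""
    temp_dir = os.path.normpath(os.path.join(os.path.abspath(module_dir), relative))
    os.makedirs(temp_dir, exist_ok=True)  # no error if it already exists
    return temp_dir
```

Assuming the package is importable as `disvoice.articulation`, the call would be `ensure_temp_dir(os.path.dirname(disvoice.articulation.__file__))`.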

Prosodic Features

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
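No traceback is given, so where this arises in the prosody code is unknown. The message itself is NumPy's: it appears whenever a multi-element array is used where Python needs a single bool (`if`, `while`, `and`, `or`). A minimal reproduction with a made-up F0 contour, and the explicit forms NumPy asks for:

```python
import numpy as np

f0 = np.array([0.0, 112.5, 118.0, 0.0])  # made-up F0 contour in Hz, 0 = unvoiced

try:
    if f0 > 0:            # f0 > 0 is a boolean array, not a single bool
        print("voiced")
except ValueError as err:
    print("ValueError:", err)

# State explicitly which reduction is meant:
if (f0 > 0).any():        # at least one voiced frame
    print("contains voiced frames")
if not (f0 > 0).all():    # not every frame is voiced
    print("contains unvoiced frames")
```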

ValueError in glottal feature extraction

Thanks for the library! I am trying to extract glottal features from my audio files. The ValueError below showed up in two of my feature extraction pipelines. Do you have any idea what causes it? Feature extraction takes a long time, so I would really like to keep it error-free if possible. Thanks again!

Traceback (most recent call last):
    feats = glottalf.extract_features_file(file_audio, static=False, plots=False, fmt="npy")
  File "/mnt/sdb/Tools/DisVoice/glottal/glottal.py", line 194, in extract_features_file
    g_iaif=IAIF(data_frame,fs,GCI)
  File "/mnt/sdb/Tools/DisVoice/glottal/GCI.py", line 147, in IAIF
    residual1=calc_residual(x_filt,x_filt,ord_lpc2,GCI)
  File "/mnt/sdb/Tools/DisVoice/glottal/utils_gci.py", line 470, in calc_residual
    vector_res[start:stop]=vector_res[start:stop]+residual_win
ValueError: operands could not be broadcast together with shapes (20,) (2,)

and

  File "/mnt/sdb/Tools/DisVoice/glottal/glottal.py", line 194, in extract_features_file
    g_iaif=IAIF(data_frame,fs,GCI)
  File "/mnt/sdb/Tools/DisVoice/glottal/GCI.py", line 147, in IAIF
    residual1=calc_residual(x_filt,x_filt,ord_lpc2,GCI)
  File "/mnt/sdb/Tools/DisVoice/glottal/utils_gci.py", line 470, in calc_residual
    vector_res[start:stop]=vector_res[start:stop]+residual_win
ValueError: operands could not be broadcast together with shapes (24,) (26,) 
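Both tracebacks fail at `vector_res[start:stop]=vector_res[start:stop]+residual_win`, where the slice and the window have different lengths ((20,) vs (2,), and (24,) vs (26,)). That pattern is typical when a window near the start or end of the signal gets clipped, since NumPy slicing silently truncates at the array end. Below is a hypothetical bounds-safe overlap-add (`overlap_add` is not a DisVoice function) that trims both sides to their actual overlap; whether trimming, rather than skipping GCIs near the edges, is the right fix for `calc_residual` has not been checked against the repository.

```python
import numpy as np


def overlap_add(buffer, start, segment):
    """Add `segment` into float array `buffer` starting at index `start`, in place.

    Samples of the segment that would land before index 0 or past the end
    of the buffer are dropped instead of triggering a broadcasting error.
    """
    segment = np.asarray(segment)
    seg_lo = max(0, -start)   # segment samples that fall before the buffer
    buf_lo = max(0, start)
    length = min(len(segment) - seg_lo, len(buffer) - buf_lo)
    if length <= 0:
        return buffer         # no overlap at all
    buffer[buf_lo:buf_lo + length] += segment[seg_lo:seg_lo + length]
    return buffer
```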

Minimum length of input audio segment

Hi, this is a really useful library for extracting interpretable speech features! Thanks!!

I want to ask about the minimum length of the input audio for each feature extraction function. It seems that for the prosody features the input has to be longer than 0.6 s?

        pitchON = np.where(F0!=0)[0]
        dchange = np.diff(pitchON)
        change = np.where(dchange>1)[0]
        iniV = pitchON[0]

Is the same true for the phonation features?

Thanks again.
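In the quoted snippet, `pitchON[0]` raises an IndexError whenever the F0 contour has no voiced (non-zero) frames, for example when a segment is very short or fully unvoiced. So the failure tracks "no voiced frames" rather than a fixed duration, and the 0.6 s figure is not confirmed here. A hedged guard that mirrors the snippet (the function name `first_voiced_frame` is hypothetical):

```python
import numpy as np


def first_voiced_frame(f0):
    """Mirror the prosody snippet, but return None instead of failing
    when the F0 contour contains no voiced frames."""
    f0 = np.asarray(f0)
    pitch_on = np.where(f0 != 0)[0]   # indices of voiced frames
    if pitch_on.size == 0:
        return None                   # nothing voiced: skip or pad this segment
    dchange = np.diff(pitch_on)
    change = np.where(dchange > 1)[0] # positions where a voiced run ends
    return pitch_on, change, pitch_on[0]
```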

FileNotFoundError working with Articulation Features

Hi, I have pulled the latest version of this repository. I am having trouble extracting the articulation features from my own audio. I was able to successfully run all of the provided IPython Notebooks.

  File "PycharmProjects/ML-Parkinson-Disease/DisVoice/articulation/articulation.py", line 251, in extract_features_file
    F0,_=praat_functions.decodeF0(temp_filename_f0,len(data_audio)/float(fs),self.step)
  File "PycharmProjects/ML-Parkinson-Disease/DisVoice/articulation/../praat/praat_functions.py", line 139, in decodeF0
    if os.stat(fileTxt).st_size==0:
FileNotFoundError: [Errno 2] No such file or directory: 'PycharmProjects/ML-Parkinson-Disease/DisVoice/articulation/../tempfiles/tempF0articulationID17_pd__12_2_1_0.txt'

If it is relevant, I am running this script in my ML-Parkinson-Disease folder, which contains the DisVoice folder within it.

Is there a simpler way to obtain glottal flow signal?

Hello, my apologies for opening this issue. I just need to extract the glottal flow signal from a *.wav file. Is there a simple way to do this? Ideally, my function's signature would look something like this:

def glottal_Flow(file_id):
    # some actions
    return time, glottal_flow

I have been looking at glottal.py, and it is very complete indeed; I thought you might have done this before.

Thanks in advance.
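DisVoice's glottal module uses IAIF (called as `IAIF(data_frame,fs,GCI)` from GCI.py in the tracebacks above), which is the better estimator. If only a quick, rough signal is needed, a much cruder stand-in is single-pass LPC inverse filtering followed by integration: removing the vocal-tract resonances leaves roughly the glottal flow derivative, and a leaky integrator turns that back into a flow-like waveform. This is a swapped-in simplification, not DisVoice's method; the function names, the `fs/1000 + 2` LPC order rule of thumb, and the 0.99 leak factor are all assumptions.

```python
import numpy as np


def lpc_coefficients(x, order):
    """Autocorrelation-method LPC via the Levinson-Durbin recursion.
    Returns a = [1, a1, ..., ap], the coefficients of the inverse filter A(z)."""
    n = len(x)
    r = np.array([np.dot(x[:n - k], x[k:]) for k in range(order + 1)])  # lags 0..order
    a = np.zeros(order + 1)
    a[0] = 1.0
    if r[0] == 0:
        return a                      # silent input: identity filter
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a


def glottal_flow_sketch(x, fs, order=None, leak=0.99):
    """Rough glottal-flow estimate: one LPC inverse filter over the whole
    signal, then a leaky integrator. Far cruder than frame-wise IAIF."""
    x = np.asarray(x, dtype=np.float64)
    x = x - np.mean(x)                        # remove DC offset
    if order is None:
        order = int(fs / 1000) + 2            # common rule of thumb
    a = lpc_coefficients(x, order)
    dflow = np.convolve(x, a)[:len(x)]        # inverse-filtered: ~flow derivative
    flow = np.empty_like(dflow)
    acc = 0.0
    for n, v in enumerate(dflow):             # y[n] = d[n] + leak * y[n-1]
        acc = v + leak * acc
        flow[n] = acc
    time = np.arange(len(x)) / fs
    return time, flow
```

To match the requested signature, the file can be read first, e.g. `fs, x = scipy.io.wavfile.read(file_id)` followed by `time, flow = glottal_flow_sketch(x, fs)`. Because one LPC filter is fitted to the whole file, results on long or varied utterances will be poor; short sustained vowels suit it best.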

Unable to install disvoice on Mac M1 chip, ERROR: Could not find a version that satisfies the requirement kaldi_iotqdmmatplotlibnumpytorchlibrosapandaspysptkphonetscipyscikit_learn

I have a miniforge Python environment on a Mac M1 chip. I use this environment because it is the only way I can successfully install TensorFlow on my Mac. When I try to install disvoice with pip, I get the error:
ERROR: Could not find a version that satisfies the requirement kaldi_iotqdmmatplotlibnumpytorchlibrosapandaspysptkphonetscipyscikit_learn

Any help will be appreciated.
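The "requirement" in the error is eleven dependency names run together: kaldi_io, tqdm, matplotlib, numpy, torch, librosa, pandas, pysptk, phonet, scipy, scikit_learn. One plausible cause, not verified against the published package metadata, is a requirements list whose string literals are missing commas, because Python silently concatenates adjacent string literals (another would be requirement lines joined without separators). If so, it is a packaging problem independent of the M1 chip. A small demonstration using a hypothetical list:

```python
# Hypothetical requirements list with the commas missing: Python joins
# adjacent string literals, so this is ONE string, not eleven.
broken = ["kaldi_io" "tqdm" "matplotlib" "numpy" "torch" "librosa"
          "pandas" "pysptk" "phonet" "scipy" "scikit_learn"]

fixed = ["kaldi_io", "tqdm", "matplotlib", "numpy", "torch", "librosa",
         "pandas", "pysptk", "phonet", "scipy", "scikit_learn"]

print(len(broken), broken[0])  # 1 kaldi_iotqdmmatplotlibnumpytorchlibrosapandaspysptkphonetscipyscikit_learn
print(len(fixed))              # 11
```

A possible workaround under that assumption is to install the dependencies individually and then run `pip install --no-deps disvoice`, so pip never tries to resolve the merged name.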
