ina-foss / inaspeechsegmenter Goto Github PK

CNN-based audio segmentation toolkit. Allows to detect speech, music, noise and speaker gender. Has been designed for large scale gender equality studies based on speech time per gender.

License: MIT License

Python 91.13% Jupyter Notebook 8.27% Dockerfile 0.60%

speech-segmentation audio-analysis music-detection speech-music gender-equality gender-classification speaker-gender speech music voice-activity-detection

inaspeechsegmenter's Introduction

inaSpeechSegmenter

inaSpeechSegmenter is a CNN-based audio segmentation toolkit.

It splits audio signals into homogeneous zones of speech, music and noise. Speech zones are split into segments tagged using speaker gender (male or female). Male and female classification models are optimized for French language since they were trained using French speakers (accoustic correlates of speaker gender are language dependent). Zones corresponding to speech over music or speech over noise are tagged as speech.

inaSpeechSegmenter has been designed in order to perform large-scale gender equality studies based on men and women speech-time percentage estimation.

Installation

inaSpeechSegmenter works with Python 3.7 to Python 3.11. It is based on Tensorflow which does not yet support Python 3.12+.

It is available on Python Package Index inaSpeechSegmenter and packaged as a docker image inafoss/inaspeechsegmenter.

Prerequisites

inaSpeechSegmenter requires ffmpeg for decoding any type of format. Installation of ffmpeg for ubuntu can be done using the following commandline:

$ sudo apt-get install ffmpeg

PIP installation

# create a python 3 virtual environement and activate it
$ virtualenv -p python3 env
$ source env/bin/activate
# install framework and dependencies
$ pip install inaSpeechSegmenter

Installing from from sources

# clone git repository
$ git clone https://github.com/ina-foss/inaSpeechSegmenter.git
# create a python 3 virtual environement and activate it
$ virtualenv -p python3 env
$ source env/bin/activate
# install framework and dependencies
# you should use pip instead of setup.py for installing from source
$ cd inaSpeechSegmenter
$ pip install .
# check program behavior
$ python setup.py test

Using inaSpeechSegmenter

Speech Segmentation Program

Binary program ina_speech_segmenter.py may be used to segment multimedia archives encoded in any format supported by ffmpeg. It requires input media and output csv files corresponding to the segmentation. Corresponding csv may be visualised using softwares such as https://www.sonicvisualiser.org/

# get help
$ ina_speech_segmenter.py --help
usage: ina_speech_segmenter.py [-h] -i INPUT [INPUT ...] -o OUTPUT_DIRECTORY [-d {sm,smn}] [-g {true,false}] [-b FFMPEG_BINARY] [-e {csv,textgrid}]

Do Speech/Music(/Noise) and Male/Female segmentation and store segmentations into CSV files. Segments labelled 'noEnergy' are discarded from music, noise, speech and gender
analysis. 'speech', 'male' and 'female' labels include speech over music and speech over noise. 'music' and 'noise' labels are pure segments that are not supposed to contain speech.

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT [INPUT ...], --input INPUT [INPUT ...]
                        Input media to analyse. May be a full path to a media (/home/david/test.mp3), a list of full paths (/home/david/test.mp3 /tmp/mymedia.avi), a regex input
                        pattern ("/home/david/myaudiobooks/*.mp3"), an url with http protocol (http://url_of_the_file)
  -o OUTPUT_DIRECTORY, --output_directory OUTPUT_DIRECTORY
                        Directory used to store segmentations. Resulting segmentations have same base name as the corresponding input media, with csv extension. Ex: mymedia.MPG will
                        result in mymedia.csv
  -d {sm,smn}, --vad_engine {sm,smn}
                        Voice activity detection (VAD) engine to be used (default: 'smn'). 'smn' split signal into 'speech', 'music' and 'noise' (better). 'sm' split signal into
                        'speech' and 'music' and do not take noise into account, which is either classified as music or speech. Results presented in ICASSP were obtained using 'sm'
                        option
  -g {true,false}, --detect_gender {true,false}
                        (default: 'true'). If set to 'true', segments detected as speech will be splitted into 'male' and 'female' segments. If set to 'false', segments
                        corresponding to speech will be labelled as 'speech' (faster)
  -b FFMPEG_BINARY, --ffmpeg_binary FFMPEG_BINARY
                        Your custom binary of ffmpeg
  -e {csv,textgrid}, --export_format {csv,textgrid}
                        (default: 'csv'). If set to 'csv', result will be exported in csv. If set to 'textgrid', results will be exported to praat Textgrid

Detailled description of this framework is presented in the following study: Doukhan, D., Carrive, J., Vallet, F., Larcher, A., & Meignier, S. (2018, April). An open-source speaker
gender detection framework for monitoring gender equality. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 5214-5218). IEEE.

Using Speech Segmentation API

InaSpeechSegmentation API is intended to be very simple to use, and is illustrated by these 2 notebooks :

Google colab tutorial: use API online
Jupyter notebook tutorial : to be used offline

The class allowing to perform segmentations is called Segmenter. It is the only class that you need to import in a program. Class constructor accept 3 optional arguments:

vad_engine (default: 'smn'). Allows to choose between 2 voice activity detection engines.
- 'smn' is the more recent engine and splits signal into speech, music and noise segments
- 'sm' was not trained with noise examples, and split signal into speech and music segments. Noise segments are either considered as speech or music. This engine was used in ICASSP study, and won MIREX 2018 speech detection challenge.
detect_gender (default: True): if set to True, performs gender segmentation on speech segment and outputs labels 'female' or 'male'. Otherwise, outputs labels 'speech' (faster).
ffmpeg: allows to provide a specific binary of ffmpeg instead of default system installation

Citing

inaSpeechSegmenter has been presented at the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2018 conference in Calgary, Canada. If you use this toolbox in your research, you can cite the following work in your publications :

@inproceedings{ddoukhanicassp2018,
  author = {Doukhan, David and Carrive, Jean and Vallet, Félicien and Larcher, Anthony and Meignier, Sylvain},
  title = {An Open-Source Speaker Gender Detection Framework for Monitoring Gender Equality},
  year = {2018},
  organization={IEEE},
  booktitle={Acoustics Speech and Signal Processing (ICASSP), 2018 IEEE International Conference on}
}

inaSpeechSegmenter won MIREX 2018 speech detection challenge.
http://www.music-ir.org/mirex/wiki/2018:Music_and_or_Speech_Detection_Results
Details on the speech detection submodule can be found bellow:

@inproceedings{ddoukhanmirex2018,
  author = {Doukhan, David and Lechapt, Eliott and Evrard, Marc and Carrive, Jean},
  title = {INA’S MIREX 2018 MUSIC AND SPEECH DETECTION SYSTEM},
  year = {2018},
  booktitle={Music Information Retrieval Evaluation eXchange (MIREX 2018)}
}

CREDITS

This work has been partially funded by the French National Research Agency (project GEM : Gender Equality Monitor : ANR-19-CE38-0012) and by European Union's Horizon 2020 research and innovation programme (project MeMAD : H2020 grant agreement No 780069).

Some optimization within inaSpeechSegmenter code were realized by Cyril Lashkevich https://github.com/notorca

The code used to extract mel bands features is copy-pasted from sidekit project: https://git-lium.univ-lemans.fr/Larcher/sidekit

Relevant contributions to the project were done by:

Eliott Lechapt : https://github.com/elechapt
Rémi Uro : https://github.com/r-uro

inaspeechsegmenter's People

Contributors

Stargazers

Watchers

Forkers

elechapt lvaleriu entn-at maggie0830 usuyama puhoy larsoncs 0x0ab papiahs saonam molanischen jdc08161063 huguanglong amoliu alongwithyou hongpeng1992 chuanma lephasme sheenchi whu933314 remorses ruohoruotsi usccolumbia jonewei wikipedia2008 sunilsivadas federicafregolent miguel-negrao wy2609 hlng2002 uebergeek999 shihyu databill86 ishine hitman567 thanhtunggggg sasiarivukalanjiam crrflying owen864720655 18376672766666 axchanda alexcannan notorca xulikui123321 vvishal0 qyou ygongny chizhang0814 ireneu precsys joe-nano bilalcorbacioglu alegzandra xuhaoi fakhraddin wy192 hassanamin994 osama-hamed oussema-azzebi hyperaudio r-uro wenwanchen kitisak phattharachon llmhao samantha-fu jason-lee-lxx vancdk oucxlw benhuang2018 dp-aixball mbc-noh dale610 mxuer tann9949 helemanc zerrojs cceyda cdevelop neobrainz oscarliau au-deps oletok taalua mahdiesrafili saifhasan62 ericchlee vinace lyticamx andrewmk ygy12345678 00mjk yifei1010 aisyahizzah wuxiuzhi738 harsh188 ikraduya letsjustfixit simond3v lxfz

inaspeechsegmenter's Issues

memory leak

Line # Mem usage Increment Line Contents

 6    363.0 MiB    363.0 MiB   @profile
 7                             def test():
 8                                 
 9    363.0 MiB      0.0 MiB       media = '8560_00.mp4'
10                             
11    710.4 MiB    347.4 MiB       segmentation = seg(media)
12                                 
13    710.4 MiB      0.0 MiB       print(segmentation)

Line # Mem usage Increment Line Contents

 6    937.0 MiB    937.0 MiB   @profile
 7                             def test():
 8                                 
 9    937.0 MiB      0.0 MiB       media = '8560_00.mp4'
10                             
11   1086.5 MiB    149.6 MiB       segmentation = seg(media)
12                                 
13   1086.5 MiB      0.0 MiB       print(segmentation)

Line # Mem usage Increment Line Contents

 6   1498.8 MiB   1498.8 MiB   @profile
 7                             def test():
 8                                 
 9   1498.8 MiB      0.0 MiB       media = '8560_00.mp4'
10                             
11   1613.3 MiB    114.5 MiB       segmentation = seg(media)
12                                 
13   1613.3 MiB      0.0 MiB       print(segmentation)

`
from inaSpeechSegmenter import Segmenter, seg2csv
from memory_profiler import profile

seg = Segmenter(vad_engine='smn',detect_gender=False)

@Profile
def test():

media = '8560_00.mp4'

segmentation = seg(media)

print(segmentation)

for i in range(10):
test()
`

python 3.7.3
tensorflow 2.2.0
keras 2.4.3

Error running docker image

System information

OS Platform and Distribution: Debian 10, 4.19.0-4-amd64
TensorFlow version: docker image
Python version: docker image
Running on GPU or CPU: GPU GeForce RTX 2060 Driver Version: 460.39
CUDA/cuDNN version (if GPU is used): CUDA 10.1.243-1 CUDNN 7.6.5.32-1+cuda10.1
Using Docker: yes

Expected Behavior

Calling ina_speech_segmenter.py inside the docker container would run.

Current Behavior

does not run:

2021-02-15 18:51:19.309054: E tensorflow/stream_executor/cuda/cuda_dnn.cc:328] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR

Steps to Reproduce

docker build --build-arg username=$USER --build-arg uid=`id -u $USER` .

docker run -it --gpus all -v "/home/miguel/Development/IPL/investigacao/teste_sons/voice/test_script/:/stuff" 0bc007651a00 bash

pip install matplotlib==3.2
Defaulting to user installation because normal site-packages is not writeable
Collecting matplotlib==3.2
  Downloading matplotlib-3.2.0-cp36-cp36m-manylinux1_x86_64.whl (12.4 MB)
     |████████████████████████████████| 12.4 MB 5.7 MB/s 
Requirement already satisfied: numpy>=1.11 in /usr/local/lib/python3.6/dist-packages (from matplotlib==3.2) (1.18.5)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.6/dist-packages (from matplotlib==3.2) (0.10.0)
Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.6/dist-packages (from matplotlib==3.2) (1.3.1)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /usr/local/lib/python3.6/dist-packages (from matplotlib==3.2) (2.4.7)
Requirement already satisfied: python-dateutil>=2.1 in /usr/local/lib/python3.6/dist-packages (from matplotlib==3.2) (2.8.1)
Requirement already satisfied: six in /usr/local/lib/python3.6/dist-packages (from cycler>=0.10->matplotlib==3.2) (1.15.0)
Installing collected packages: matplotlib
Successfully installed matplotlib-3.2.0
WARNING: You are using pip version 20.2.4; however, version 21.0.1 is available.
You should consider upgrading via the '/usr/bin/python3 -m pip install --upgrade pip' command.
miguel@17cb13f800b8:/tf$ ina_speech_segmenter.py -i dn-1-44.1-10.mp3 -o .
Traceback (most recent call last):
  File "/usr/local/bin/ina_speech_segmenter.py", line 61, in <module>
    assert len(input_files) > 0, 'No existing media selected for analysis! Bad values provided to -i (%s)' % args.input
AssertionError: No existing media selected for analysis! Bad values provided to -i (['dn-1-44.1-10.mp3'])
miguel@17cb13f800b8:/tf$ ina_speech_segmenter.py -i /stuff/dn-1-44.1-10.mp3 -o /stuff
2021-02-15 18:51:11.205203: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
/usr/local/lib/python3.6/dist-packages/sidekit/bosaris/detplot.py:40: MatplotlibDeprecationWarning: The 'warn' parameter of use() is deprecated since Matplotlib 3.1 and will be removed in 3.3.  If any parameter follows 'warn', they should be pass as keyword, not positionally.
  matplotlib.use('PDF', warn=False, force=True)
2021-02-15 18:51:14.204519: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2021-02-15 18:51:14.204641: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-15 18:51:14.205093: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce RTX 2060 computeCapability: 7.5
coreClock: 1.2GHz coreCount: 30 deviceMemorySize: 5.79GiB deviceMemoryBandwidth: 312.97GiB/s
2021-02-15 18:51:14.205113: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2021-02-15 18:51:14.212524: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2021-02-15 18:51:14.217429: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2021-02-15 18:51:14.221732: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2021-02-15 18:51:14.232363: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2021-02-15 18:51:14.235654: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2021-02-15 18:51:14.262948: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2021-02-15 18:51:14.263099: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-15 18:51:14.264029: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-15 18:51:14.264314: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2021-02-15 18:51:14.264533: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-02-15 18:51:14.270064: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2208000000 Hz
2021-02-15 18:51:14.270530: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7b78790 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-02-15 18:51:14.270549: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2021-02-15 18:51:14.372068: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-15 18:51:14.372458: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7b7ac60 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-02-15 18:51:14.372480: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce RTX 2060, Compute Capability 7.5
2021-02-15 18:51:14.374467: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-15 18:51:14.374920: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce RTX 2060 computeCapability: 7.5
coreClock: 1.2GHz coreCount: 30 deviceMemorySize: 5.79GiB deviceMemoryBandwidth: 312.97GiB/s
2021-02-15 18:51:14.374957: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2021-02-15 18:51:14.375008: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2021-02-15 18:51:14.375041: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2021-02-15 18:51:14.375069: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2021-02-15 18:51:14.375094: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2021-02-15 18:51:14.375121: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2021-02-15 18:51:14.375147: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2021-02-15 18:51:14.375240: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-15 18:51:14.375710: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-15 18:51:14.376106: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2021-02-15 18:51:14.376140: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2021-02-15 18:51:15.070454: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-02-15 18:51:15.070483: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]      0 
2021-02-15 18:51:15.070492: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0:   N 
2021-02-15 18:51:15.070782: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-15 18:51:15.071222: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-15 18:51:15.071576: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4904 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2060, pci bus id: 0000:01:00.0, compute capability: 7.5)
batch_processing 1 files
1/1 [('/stuff/dn-1-44.1-10.csv', 0, 'ok')]
2021-02-15 18:51:18.071394: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2021-02-15 18:51:18.339300: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2021-02-15 18:51:19.309054: E tensorflow/stream_executor/cuda/cuda_dnn.cc:328] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2021-02-15 18:51:19.319104: E tensorflow/stream_executor/cuda/cuda_dnn.cc:328] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
Traceback (most recent call last):
  File "/usr/local/bin/ina_speech_segmenter.py", line 77, in <module>
    seg.batch_process(input_files, output_files, verbose=True)
  File "/usr/local/lib/python3.6/dist-packages/inaSpeechSegmenter/segmenter.py", line 288, in batch_process
    lseg = self.segment_feats(mspec, loge, difflen, 0)
  File "/usr/local/lib/python3.6/dist-packages/inaSpeechSegmenter/segmenter.py", line 239, in segment_feats
    lseg = self.vad(mspec, lseg, difflen)
  File "/usr/local/lib/python3.6/dist-packages/inaSpeechSegmenter/segmenter.py", line 138, in __call__
    rawpred = self.nn.predict(batch, batch_size=self.batch_size)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py", line 130, in _method_wrapper
    return method(self, *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py", line 1599, in predict
    tmp_batch_outputs = predict_function(iterator)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 780, in __call__
    result = self._call(*args, **kwds)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 846, in _call
    return self._concrete_stateful_fn._filtered_call(canon_args, canon_kwds)  # pylint: disable=protected-access
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1848, in _filtered_call
    cancellation_manager=cancellation_manager)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1924, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 550, in call
    ctx=ctx)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.UnknownError:  Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
	 [[node sequential_3/conv2d_12/Conv2D (defined at /lib/python3.6/dist-packages/inaSpeechSegmenter/segmenter.py:138) ]] [Op:__inference_predict_function_2269]

Function call stack:
predict_function

note:

pip install matplotlib==3.2is needed otherwise the program does not start.
I get the same error when installing from source outsude the docker container.

something about training features

hi,
i have read the paper which the project cited ， but in the paper it tells me you trained the model using mfcc features ，you use melspec in the segment.py 's _wav2feats ,so what exactly features you’ve trained the models

ValueError

I use the command line as below
Ina_speech_segmenter.py -i 0021.mp3 -o out/

The 0021.mp3 file download URL below.
https://www.mediafire.com/file/krfuk8wshq5jr2o/0021.mp3/file

May I know why I got this error message?

Using TensorFlow backend. 2019-10-11 21:54:43.913936: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3392295000 Hz 2019-10-11 21:54:43.914400: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55f69cdadd70 executing computations on platform Host. Devices: 2019-10-11 21:54:43.914417: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): Host, Default Version /home/shihyu/anaconda3/lib/python3.7/site-packages/keras/engine/saving.py:384: UserWarning: Error in loading the saved optimizer state. As a result, your model is starting with a freshly initialized optimizer. warnings.warn('Error in loading the saved optimizer ' /home/shihyu/anaconda3/lib/python3.7/site-packages/keras/engine/saving.py:384: UserWarning: Error in loading the saved optimizer state. As a result, your model is starting with a freshly initialized optimizer. warnings.warn('Error in loading the saved optimizer ' processing file 1/1: 0021.mp3 Traceback (most recent call last): File "/home/shihyu/anaconda3/bin/ina_speech_segmenter.py", line 63, in <module> seg2csv(seg(e), '%s/%s.csv' % (odir, base)) File "/home/shihyu/anaconda3/lib/python3.7/site-packages/inaSpeechSegmenter/segmenter.py", line 174, in __call__ return self.segmentwav(tmpwav) File "/home/shihyu/anaconda3/lib/python3.7/site-packages/inaSpeechSegmenter/segmenter.py", line 148, in segmentwav data21, finite = _get_patches(mspec[:, :21], 68, 2) File "/home/shihyu/anaconda3/lib/python3.7/site-packages/inaSpeechSegmenter/segmenter.py", line 69, in _get_patches data = vaw(mspec, (w,h), step=step) File "/home/shihyu/anaconda3/lib/python3.7/site-packages/skimage/util/shape.py", line 240, in view_as_windows raise ValueError("window_shapeis too large") ValueError:window_shape` is too large

warning management

Do some modifications in script ina_speech_segmenter.py in order to filter the warning messages that should be displayed from those that should be ignored.

Matplotlib TypeError: use() got an unexpected keyword argument 'warn'

System information

Matplotlib version: 3.3.2
Python version: 3.8

Expected Behavior

A successful analysis of the inputted media:
1/1 [('/Users/vanessachaddouk/Desktop/Voice_Gender_Output/musanmix_test-voice.csv', 0, 'ok')]

Current Behavior

When I input a media to analyze, inaSpeechSegmenter returns this error:
TypeError: use() got an unexpected keyword argument 'warn'

Steps to Reproduce

Install inaSpeechSegmenter with the latest version of matplotlib (currently 3.3.2)
Input a media to analyse.

Additional infos

A quick fix is to install an earlier version of matplotlib.
In my case I set it to 3.2 in order to use the current code:
pip install matplotlib==3.2

For reference, this is the full traceback:

improvements in pyroserver

some minor improvements may be done within the sever.

Drop duplicate lines
Strip strings

Please fill out this template for a bug report.

Make sure you ran the unit tests before submitting an issue and tell us if and where they fail.

System information

OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
TensorFlow version:
Python version:
Running on GPU or CPU:
CUDA/cuDNN version (if GPU is used):
Using Docker:

Expected Behavior

Current Behavior

Steps to Reproduce

Additional infos

sementation fault?

System information

OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 20.04
TensorFlow version: 2.4.1
Python version: 3.8.5
Running on GPU or CPU: CPU (AMD Ryzen 5 3600)
output of command nvidia-smi (if GPU is used)
CUDA/cuDNN version (if GPU is used):
Using Docker: no

Tried to install both ways: pip install and building from source

Install worked.
ina-speech-segmenter.py help produces the correct help text.

Building from source: python setup.py test failed;
Pip install version: running a .mp3 failed with the same error message:

running test
WARNING: Testing via this command is deprecated and will be removed in a future version. Users looking for a generic test entry point independent of test runner are encouraged to use tox.
running egg_info
writing inaSpeechSegmenter.egg-info/PKG-INFO
writing dependency_links to inaSpeechSegmenter.egg-info/dependency_links.txt
writing requirements to inaSpeechSegmenter.egg-info/requires.txt
writing top-level names to inaSpeechSegmenter.egg-info/top_level.txt
reading manifest file 'inaSpeechSegmenter.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching 'scripts'
writing manifest file 'inaSpeechSegmenter.egg-info/SOURCES.txt'
running build_ext
2021-03-27 16:26:01.207346: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-03-27 16:26:01.207367: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Segmentation fault (core dumped)

Sidekit

I have installed sidekit using pip and tried running it on spyder(Anaconda). Still it is showing - No module named 'sidekit'. Can you help?

where can i download traning datasets

Hi , I want to retrain this model , where can i download training datasets which the article cited, or can you please offer the download link

AttributeError: module 'distutils' has no attribute 'util

I am using the latest version with an error message

    detect_gender = bool(distutils.util.strtobool(args.detect_gender))
AttributeError: module 'distutils' has no attribute 'util

Add below code to solve in ina_speech_segmenter.py
from distutils import util

What if gender identification is time consuming

Not able to process files over HTTP

Not able to process files over HTTP.

System information

OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04
TensorFlow version: 2.3.0
Python version: 3.7.7
Running on GPU or CPU: GPU
CUDA/cuDNN version (if GPU is used): 11.0
Using Docker: No

Expected Behavior

inaspeechsegmenter.py -i https://domain.tld/file.mp3 -o .
should download and process the file and output to ./file.csv.
At some point, it should print
1/1 [('./file.csv', 0, "ok")]

Current Behavior

Instead, it fails with
1/1 [('./file.csv', 2, "error: <class 'AssertionError'>")]

Additional infos

I work behind an HTTP proxy, but it's correctly set in my environment and I can curl the file with no issue.
I have not been able to reproduce this on another computer. I tested on Archlinux with the same configuration and it works fine. (but is not behind a proxy)

Could I specify the GPU ??

hi,dear
I want to specify the GPU for the seg?
could you help me ?
thx

run detect very slow

i run inaSpeechSegmenter python code to detect voice in file. it take a long time (about 10min/file).
so how can improve speed for detect file

thanks!

Error in loading saved optimizer

i am using windows 10, and when i ran the command, it works fine and then gives me this error

"Error in loading the saved optimizer state. As a result, your model is starting with a freshly initialized optimizer."

It segments the audio but its incorrect, i mean i am able to get csv file but the detection is wrong, even with english audio with no background noise or overlap. why i get this error ?

update doc with more details on label meaning

energy threshold based on local average

could I input waveform data?

hi,dear
I see the input is the mp3 file,but could I set the waveform data ?
Or wav file is Ok??[tried,but not success]

could you help me ?
thx

Dockerfile issue while installing pyroomcoustics

Hi , I'm getting an error while building the dockerfile. This issue appeared recently I havent had this problem in the past couple days. any help would be greatly appreciated. Thank you.

Heres the error

` Building wheel for pyroomacoustics (PEP 517): started
Building wheel for pyroomacoustics (PEP 517): still running...
Building wheel for pyroomacoustics (PEP 517): finished with status 'error'
ERROR: Command errored out with exit status 1:
command: /usr/bin/python3 /usr/local/lib/python3.6/dist-packages/pip/_vendor/pep517/_in_process.py build_wheel /tmp/tmpi4sxv4jv
cwd: /tmp/pip-install-yb90rja0/pyroomacoustics
Complete output (119 lines):
running bdist_wheel
running build
running build_py
creating build
creating build/lib.linux-x86_64-3.6
creating build/lib.linux-x86_64-3.6/pyroomacoustics
copying pyroomacoustics/recognition.py -> build/lib.linux-x86_64-3.6/pyroomacoustics
copying pyroomacoustics/utilities.py -> build/lib.linux-x86_64-3.6/pyroomacoustics
copying pyroomacoustics/windows.py -> build/lib.linux-x86_64-3.6/pyroomacoustics
copying pyroomacoustics/metrics.py -> build/lib.linux-x86_64-3.6/pyroomacoustics
copying pyroomacoustics/version.py -> build/lib.linux-x86_64-3.6/pyroomacoustics
copying pyroomacoustics/init.py -> build/lib.linux-x86_64-3.6/pyroomacoustics
copying pyroomacoustics/soundsource.py -> build/lib.linux-x86_64-3.6/pyroomacoustics
copying pyroomacoustics/beamforming.py -> build/lib.linux-x86_64-3.6/pyroomacoustics
copying pyroomacoustics/multirate.py -> build/lib.linux-x86_64-3.6/pyroomacoustics
copying pyroomacoustics/room.py -> build/lib.linux-x86_64-3.6/pyroomacoustics
copying pyroomacoustics/parameters.py -> build/lib.linux-x86_64-3.6/pyroomacoustics
copying pyroomacoustics/sync.py -> build/lib.linux-x86_64-3.6/pyroomacoustics
copying pyroomacoustics/acoustics.py -> build/lib.linux-x86_64-3.6/pyroomacoustics
creating build/lib.linux-x86_64-3.6/pyroomacoustics/doa
copying pyroomacoustics/doa/grid.py -> build/lib.linux-x86_64-3.6/pyroomacoustics/doa
copying pyroomacoustics/doa/frida.py -> build/lib.linux-x86_64-3.6/pyroomacoustics/doa
copying pyroomacoustics/doa/doa.py -> build/lib.linux-x86_64-3.6/pyroomacoustics/doa
copying pyroomacoustics/doa/waves.py -> build/lib.linux-x86_64-3.6/pyroomacoustics/doa
copying pyroomacoustics/doa/srp.py -> build/lib.linux-x86_64-3.6/pyroomacoustics/doa
copying pyroomacoustics/doa/utils.py -> build/lib.linux-x86_64-3.6/pyroomacoustics/doa
copying pyroomacoustics/doa/init.py -> build/lib.linux-x86_64-3.6/pyroomacoustics/doa
copying pyroomacoustics/doa/tops.py -> build/lib.linux-x86_64-3.6/pyroomacoustics/doa
copying pyroomacoustics/doa/detect_peaks.py -> build/lib.linux-x86_64-3.6/pyroomacoustics/doa
copying pyroomacoustics/doa/music.py -> build/lib.linux-x86_64-3.6/pyroomacoustics/doa
copying pyroomacoustics/doa/tools_fri_doa_plane.py -> build/lib.linux-x86_64-3.6/pyroomacoustics/doa
copying pyroomacoustics/doa/plotters.py -> build/lib.linux-x86_64-3.6/pyroomacoustics/doa
copying pyroomacoustics/doa/cssm.py -> build/lib.linux-x86_64-3.6/pyroomacoustics/doa
creating build/lib.linux-x86_64-3.6/pyroomacoustics/adaptive
copying pyroomacoustics/adaptive/util.py -> build/lib.linux-x86_64-3.6/pyroomacoustics/adaptive
copying pyroomacoustics/adaptive/rls.py -> build/lib.linux-x86_64-3.6/pyroomacoustics/adaptive
copying pyroomacoustics/adaptive/adaptive_filter.py -> build/lib.linux-x86_64-3.6/pyroomacoustics/adaptive
copying pyroomacoustics/adaptive/init.py -> build/lib.linux-x86_64-3.6/pyroomacoustics/adaptive
copying pyroomacoustics/adaptive/subband_lms.py -> build/lib.linux-x86_64-3.6/pyroomacoustics/adaptive
copying pyroomacoustics/adaptive/data_structures.py -> build/lib.linux-x86_64-3.6/pyroomacoustics/adaptive
copying pyroomacoustics/adaptive/lms.py -> build/lib.linux-x86_64-3.6/pyroomacoustics/adaptive
creating build/lib.linux-x86_64-3.6/pyroomacoustics/transform
copying pyroomacoustics/transform/init.py -> build/lib.linux-x86_64-3.6/pyroomacoustics/transform
copying pyroomacoustics/transform/stft.py -> build/lib.linux-x86_64-3.6/pyroomacoustics/transform
copying pyroomacoustics/transform/dft.py -> build/lib.linux-x86_64-3.6/pyroomacoustics/transform
creating build/lib.linux-x86_64-3.6/pyroomacoustics/experimental
copying pyroomacoustics/experimental/delay_calibration.py -> build/lib.linux-x86_64-3.6/pyroomacoustics/experimental
copying pyroomacoustics/experimental/init.py -> build/lib.linux-x86_64-3.6/pyroomacoustics/experimental
copying pyroomacoustics/experimental/physics.py -> build/lib.linux-x86_64-3.6/pyroomacoustics/experimental
copying pyroomacoustics/experimental/deconvolution.py -> build/lib.linux-x86_64-3.6/pyroomacoustics/experimental
copying pyroomacoustics/experimental/measure_ir.py -> build/lib.linux-x86_64-3.6/pyroomacoustics/experimental
copying pyroomacoustics/experimental/localization.py -> build/lib.linux-x86_64-3.6/pyroomacoustics/experimental
copying pyroomacoustics/experimental/point_cloud.py -> build/lib.linux-x86_64-3.6/pyroomacoustics/experimental
copying pyroomacoustics/experimental/rt60.py -> build/lib.linux-x86_64-3.6/pyroomacoustics/experimental
copying pyroomacoustics/experimental/signals.py -> build/lib.linux-x86_64-3.6/pyroomacoustics/experimental
creating build/lib.linux-x86_64-3.6/pyroomacoustics/datasets
copying pyroomacoustics/datasets/utils.py -> build/lib.linux-x86_64-3.6/pyroomacoustics/datasets
copying pyroomacoustics/datasets/cmu_arctic.py -> build/lib.linux-x86_64-3.6/pyroomacoustics/datasets
copying pyroomacoustics/datasets/init.py -> build/lib.linux-x86_64-3.6/pyroomacoustics/datasets
copying pyroomacoustics/datasets/timit.py -> build/lib.linux-x86_64-3.6/pyroomacoustics/datasets
copying pyroomacoustics/datasets/base.py -> build/lib.linux-x86_64-3.6/pyroomacoustics/datasets
copying pyroomacoustics/datasets/google_speech_commands.py -> build/lib.linux-x86_64-3.6/pyroomacoustics/datasets
creating build/lib.linux-x86_64-3.6/pyroomacoustics/bss
copying pyroomacoustics/bss/ilrma.py -> build/lib.linux-x86_64-3.6/pyroomacoustics/bss
copying pyroomacoustics/bss/auxiva.py -> build/lib.linux-x86_64-3.6/pyroomacoustics/bss
copying pyroomacoustics/bss/init.py -> build/lib.linux-x86_64-3.6/pyroomacoustics/bss
copying pyroomacoustics/bss/trinicon.py -> build/lib.linux-x86_64-3.6/pyroomacoustics/bss
copying pyroomacoustics/bss/fastmnmf.py -> build/lib.linux-x86_64-3.6/pyroomacoustics/bss
copying pyroomacoustics/bss/sparseauxiva.py -> build/lib.linux-x86_64-3.6/pyroomacoustics/bss
copying pyroomacoustics/bss/common.py -> build/lib.linux-x86_64-3.6/pyroomacoustics/bss
creating build/lib.linux-x86_64-3.6/pyroomacoustics/denoise
copying pyroomacoustics/denoise/init.py -> build/lib.linux-x86_64-3.6/pyroomacoustics/denoise
copying pyroomacoustics/denoise/spectral_subtraction.py -> build/lib.linux-x86_64-3.6/pyroomacoustics/denoise
copying pyroomacoustics/denoise/subspace.py -> build/lib.linux-x86_64-3.6/pyroomacoustics/denoise
copying pyroomacoustics/denoise/iterative_wiener.py -> build/lib.linux-x86_64-3.6/pyroomacoustics/denoise
creating build/lib.linux-x86_64-3.6/pyroomacoustics/phase
copying pyroomacoustics/phase/init.py -> build/lib.linux-x86_64-3.6/pyroomacoustics/phase
copying pyroomacoustics/phase/gl.py -> build/lib.linux-x86_64-3.6/pyroomacoustics/phase
copying pyroomacoustics/build_rir.pyx -> build/lib.linux-x86_64-3.6/pyroomacoustics
creating build/lib.linux-x86_64-3.6/pyroomacoustics/data
copying pyroomacoustics/data/materials.json -> build/lib.linux-x86_64-3.6/pyroomacoustics/data
running build_ext
creating tmp
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.6m -c /tmp/tmpx2iw39cf.cpp -o tmp/tmpx2iw39cf.o -std=c++14
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.6m -c /tmp/tmp96z8m1wg.cpp -o tmp/tmp96z8m1wg.o -fvisibility=hidden
cythoning pyroomacoustics/build_rir.pyx to pyroomacoustics/build_rir.c
building 'pyroomacoustics.libroom' extension
creating build/temp.linux-x86_64-3.6
creating build/temp.linux-x86_64-3.6/pyroomacoustics
creating build/temp.linux-x86_64-3.6/pyroomacoustics/libroom_src
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I. -Ipyroomacoustics/libroom_src -I/tmp/pip-build-env-2qu99ur1/overlay/lib/python3.6/site-packages/pybind11/include -I/tmp/pip-build-env-2qu99ur1/overlay/lib/python3.6/site-packages/pybind11/include -Ipyroomacoustics/libroom_src/ext/eigen -I/usr/include/python3.6m -c pyroomacoustics/libroom_src/libroom.cpp -o build/temp.linux-x86_64-3.6/pyroomacoustics/libroom_src/libroom.o -DEIGEN_MPL2_ONLY -Wall -O3 -DEIGEN_NO_DEBUG -DVERSION_INFO="0.4.1" -std=c++14 -fvisibility=hidden
In file included from pyroomacoustics/libroom_src/ext/eigen/Eigen/Core:450:0,
from /tmp/pip-build-env-2qu99ur1/overlay/lib/python3.6/site-packages/pybind11/include/pybind11/eigen.h:36,
from pyroomacoustics/libroom_src/libroom.cpp:30:
pyroomacoustics/libroom_src/ext/eigen/Eigen/src/Core/DenseStorage.h: In member function ‘std::tuple<Eigen::Matrix<float, D, 1, (AutoAlign | (((((int)D) == 1) && (1 != 1)) ? RowMajor : (((1 == 1) && (((int)D) != 1)) ? ColMajor : ColMajor))), D, 1>, int, float> Room::next_wall_hit(Vectorf&, Vectorf&, bool) [with long unsigned int D = 2]’:
pyroomacoustics/libroom_src/ext/eigen/Eigen/src/Core/DenseStorage.h:194:66: warning: ‘result’ may be used uninitialized in this function [-Wmaybe-uninitialized]
DenseStorage(const DenseStorage& other) : m_data(other.m_data) {
^
pyroomacoustics/libroom_src/ext/eigen/Eigen/src/Core/DenseStorage.h:194:66: warning: ‘((void)& result +4)’ may be used uninitialized in this function [-Wmaybe-uninitialized]
pyroomacoustics/libroom_src/ext/eigen/Eigen/src/Core/DenseStorage.h: In member function ‘std::tuple<Eigen::Matrix<float, D, 1, (AutoAlign | (((((int)D) == 1) && (1 != 1)) ? RowMajor : (((1 == 1) && (((int)D) != 1)) ? ColMajor : ColMajor))), D, 1>, int, float> Room::next_wall_hit(Vectorf&, Vectorf&, bool) [with long unsigned int D = 3]’:
pyroomacoustics/libroom_src/ext/eigen/Eigen/src/Core/DenseStorage.h:194:66: warning: ‘result’ may be used uninitialized in this function [-Wmaybe-uninitialized]
DenseStorage(const DenseStorage& other) : m_data(other.m_data) {
^
pyroomacoustics/libroom_src/ext/eigen/Eigen/src/Core/DenseStorage.h:194:66: warning: ‘((void)& result +4)’ may be used uninitialized in this function [-Wmaybe-uninitialized]
pyroomacoustics/libroom_src/ext/eigen/Eigen/src/Core/DenseStorage.h:194:66: warning: ‘((void)& result +8)’ may be used uninitialized in this function [-Wmaybe-uninitialized]
In file included from pyroomacoustics/libroom_src/room.hpp:254:0,
from pyroomacoustics/libroom_src/libroom.cpp:38:
pyroomacoustics/libroom_src/room.cpp: In member function ‘bool Room::is_visible_dfs(Vectorf&, ImageSource&) [with long unsigned int D = 2]’:
pyroomacoustics/libroom_src/room.cpp:374:18: warning: ‘((void)& intersection +4)’ may be used uninitialized in this function [-Wmaybe-uninitialized]
Vectorf intersection;
^~~~~~~~~~~~
pyroomacoustics/libroom_src/room.cpp:374:18: warning: ‘intersection’ may be used uninitialized in this function [-Wmaybe-uninitialized]
x86_64-linux-gnu-gcc: internal compiler error: Killed (program cc1plus)
Please submit a full bug report,
with preprocessed source if appropriate.
See file:///usr/share/doc/gcc-7/README.Bugs for instructions.
/tmp/pip-build-env-2qu99ur1/overlay/lib/python3.6/site-packages/Cython/Compiler/Main.py:369: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: /tmp/pip-install-yb90rja0/pyroomacoustics/pyroomacoustics/build_rir.pyx
tree = Parsing.p_module(s, pxd, full_module_name)
error: command 'x86_64-linux-gnu-gcc' failed with exit status 4

ERROR: Failed building wheel for pyroomacoustics
Building wheel for future (setup.py): started
Building wheel for future (setup.py): finished with status 'done'
Created wheel for future: filename=future-0.18.2-py3-none-any.whl size=493275 sha256=2e43fd37134587de97adc2a0c36c2a03b47e208366a9a71896a0c3736dbf2f16
Stored in directory: /root/.cache/pip/wheels/6e/9c/ed/4499c9865ac1002697793e0ae05ba6be33553d098f3347fb94
Successfully built sidekit pyyaml docopt future
Failed to build pyroomacoustics
ERROR: Could not build wheels for pyroomacoustics which use PEP 517 and cannot be installed directly
WARNING: You are using pip version 20.1.1; however, version 20.2.3 is available.
You should consider upgrading via the '/usr/bin/python3 -m pip install --upgrade pip' command.
The command '/bin/bash -c apt-get update && apt-get install -y ffmpeg && pip install inaspeechsegmenter' returned a non-zero code: 1
Here's my dockerfile
FROM tensorflow/tensorflow:2.3.0-gpu-jupyter

RUN apt-get update &&
apt-get install -y ffmpeg &&
pip install inaspeechsegmenter
`

Regarding its use in other domains

how we can use that for other purpose

Any way to detect sentences?

I am on about speakers (more specifically narrators) that do not pay attention to phrase break durations; I would only like to split on full sentences and these are almost always longer. Does a higher bitrate sound file help?

how to classify speech and speech over music？

add issue template

telling to launch the unit tests
list all required informations : os, python version, tensorflowversion, etc...

GPU usage ?

How can I run this segmentation on GPU instead of CPU for faster processing ?
Is there an API for this ?

update setup.py -> pyannote

Due to last changes in pyannote, setup should be changed in order to use an older version of pyannote

pyannote.algorithms==0.8
pyannote.core==3.0
pyannote.parser==0.7.1

unit test docker image

A test of the docker image would be nice

Cuda 11 ?

Please fill out this template for a bug report.

Make sure you ran the unit tests before submitting an issue and tell us if and where they fail.

System information

OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04.5 LTS
TensorFlow version: tensorflow-gpu 2.3.1
Python version: Python 3.6.9
Running on GPU or CPU: Trying GPU
CUDA/cuDNN version (if GPU is used): CUDA 11.1.1-1
Using Docker: NO

Expected Behavior

Shoudl use GPU

Current Behavior

ina_speech_segmenter.py  -i *.mp3 -d smn -g true -o .

but it complains about missing libs, which maybe Cuda 11 does not have ?

inaSpeechSegEnv) root@timemachine:~/radio1# ina_speech_segmenter.py  -i *.mp3 -d smn -g true -o .
2020-12-06 09:47:53.035831: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
2020-12-06 09:47:53.035899: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2020-12-06 09:48:00.912205: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-12-06 09:48:00.917185: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-06 09:48:00.918312: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:00:05.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.582GHz coreCount: 28 deviceMemorySize: 10.92GiB deviceMemoryBandwidth: 451.17GiB/s
2020-12-06 09:48:00.918560: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
2020-12-06 09:48:00.919860: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcublas.so.10'; dlerror: libcublas.so.10: cannot open shared object file: No such file or directory
2020-12-06 09:48:00.949916: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-12-06 09:48:00.962797: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-12-06 09:48:00.964212: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcusolver.so.10'; dlerror: libcusolver.so.10: cannot open shared object file: No such file or directory
2020-12-06 09:48:00.964355: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcusparse.so.10'; dlerror: libcusparse.so.10: cannot open shared object file: No such file or directory
2020-12-06 09:48:00.965640: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudnn.so.7'; dlerror: libcudnn.so.7: cannot open shared object file: No such file or directory
2020-12-06 09:48:00.965672: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1753] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...

Steps to Reproduce

Install Ubuntu 18
Install via PIP via virtualenv
Install CUDA via Network Package
Reboot
Run

Additional infos

Memory grows with each execution and is not freed

Training on custom dataset

Hello,
I want to train your awesome model on custom data set. can you help me and guide me.?

Any improvements plan?

I forgot to congratulate you for this nice project! So 'im doing it right now. You've done a very nice job.
Is there any plan to improve this project so that contributers can help?

Control granularity of speech segments

I wonder if it would be possible to make the speech segments more granular or have control of the granularity? At present it seems that lengthy periods of speech (e.g. sentences) are bundled up into one. I'm looking for segmentation at the word level.

add ffmpeg options in command line program and API

ffmpeg option management, may help, for instance, to process subclips of a soundfile (-ss, -to

Multithreading issue

The code works well when run iteratively. However, for applications that require multithreading (CPU) and segmenting multiples files in parallel, there is a lock on the process. Is there way to fix this ? In other words, is there a way to load the model once and use it on multiple threads to segment several files at once?
I think this has to do with the tensorflow backend.

Win10 Py 3.5: Impossible to use PyTorch

Hi,

Here are my specifications:
Win10 64-Bit
Python 3.5.0
pip 19.0.3

I am trying to use inaSpeechSegmenter but I am unable to do so because of PyTorch.
I've searched on the internet to manually install, which I finally did, but it is still not working.

The error is :

    import torch
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python35\lib\site-packages\torch\__init__.py", line 102, in <module>
    from torch._C import *
ImportError: DLL load failed: Le module spécifié est introuvable.

Does anyone have any idea what to do?

Thanks for the help.

with which tensorflow version it is compatible with ? unable to import package

Traceback (most recent call last):
File "Z:\anaconda\lib\site-packages\tensorflow_core\python\pywrap_tensorflow.py", line 58, in
from tensorflow.python.pywrap_tensorflow_internal import *
File "Z:\anaconda\lib\site-packages\tensorflow_core\python\pywrap_tensorflow_internal.py", line 28, in
_pywrap_tensorflow_internal = swig_import_helper()
File "Z:\anaconda\lib\site-packages\tensorflow_core\python\pywrap_tensorflow_internal.py", line 24, in swig_import_helper
_mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
File "Z:\anaconda\lib\imp.py", line 242, in load_module
return load_dynamic(name, filename, file)
File "Z:\anaconda\lib\imp.py", line 342, in load_dynamic
return _load(spec)
ImportError: DLL load failed: The specified module could not be found.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "", line 1, in
File "Z:\anaconda\lib\site-packages\inaSpeechSegmenter_init_.py", line 26, in
from .segmenter import Segmenter, seg2csv
File "Z:\anaconda\lib\site-packages\inaSpeechSegmenter\segmenter.py", line 31, in
import keras
File "Z:\anaconda\lib\site-packages\keras_init_.py", line 3, in
from . import utils
File "Z:\anaconda\lib\site-packages\keras\utils_init_.py", line 6, in
from . import conv_utils
File "Z:\anaconda\lib\site-packages\keras\utils\conv_utils.py", line 9, in
from .. import backend as K
File "Z:\anaconda\lib\site-packages\keras\backend_init_.py", line 1, in
from .load_backend import epsilon
File "Z:\anaconda\lib\site-packages\keras\backend\load_backend.py", line 90, in
from .tensorflow_backend import *
File "Z:\anaconda\lib\site-packages\keras\backend\tensorflow_backend.py", line 5, in
import tensorflow as tf
File "Z:\anaconda\lib\site-packages\tensorflow_init_.py", line 101, in
from tensorflow_core import *
File "Z:\anaconda\lib\site-packages\tensorflow_core_init_.py", line 40, in
from tensorflow.python.tools import module_util as module_util
File "Z:\anaconda\lib\site-packages\tensorflow_init.py", line 50, in getattr
module = self.load()
File "Z:\anaconda\lib\site-packages\tensorflow_init.py", line 44, in _load
module = importlib.import_module(self.name)
File "Z:\anaconda\lib\importlib_init.py", line 127, in import_module
return _bootstrap.gcd_import(name[level:], package, level)
File "Z:\anaconda\lib\site-packages\tensorflow_core\python_init.py", line 49, in
from tensorflow.python import pywrap_tensorflow
File "Z:\anaconda\lib\site-packages\tensorflow_core\python\pywrap_tensorflow.py", line 74, in
raise ImportError(msg)
ImportError: Traceback (most recent call last):
File "Z:\anaconda\lib\site-packages\tensorflow_core\python\pywrap_tensorflow.py", line 58, in
from tensorflow.python.pywrap_tensorflow_internal import *
File "Z:\anaconda\lib\site-packages\tensorflow_core\python\pywrap_tensorflow_internal.py", line 28, in
_pywrap_tensorflow_internal = swig_import_helper()
File "Z:\anaconda\lib\site-packages\tensorflow_core\python\pywrap_tensorflow_internal.py", line 24, in swig_import_helper
_mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
return load_dynamic(name, filename, file)
File "Z:\anaconda\lib\imp.py", line 342, in load_dynamic
return _load(spec)
ImportError: DLL load failed: The specified module could not be found.

Failed to load the native TensorFlow runtime.

See https://www.tensorflow.org/install/errors

for some common reasons and solutions. Include the entire stack trace
above this error message when asking for help.

update readme

add batch_processing examples
add references to unit tests
update notebook tutorial

Training CNN

Dear Concerned,

Is there any way to get access to the script used for training CNN rather than using the pretrained CNNs for French speakers? This would help the researchers evaluate the model for other native speakers. Thank you.

pip install error

hi,dear
OS:win10
when I use pip install the inaSpeechSegmenter,
ups error

ERROR: Could not find a version that satisfies the requirement torch>=1.0 (from sidekit) (from versions: 0.1.2, 0.1.2.post1, 0.1.2.post2)
ERROR: No matching distribution found for torch>=1.0 (from sidekit)

so how to solve it?
thx

add noise detection

Batch Processing

The batch_process method appears to behave arbitrarily. I have hundreds of audio recordings (avg duration: 45s). The batch_process method hangs arbitrarily after processing 3/10+/30+ audio files. Even when I passed single audio files in the input_files list, it paused arbitrarily without producing the output csv file of the last audio even though the method printed its ok message. What might go wrong here? I have tensorflow-gpu support in my pc. So, there should not be any problem in this regard!

VAD to detect simultaneous music and voice

This is more of a feature request - is it possible to detect simultaneous music and voice?

Segmentation takes too long ?

Is there any way to make this segmenter load/run faster? It takes a long time to segment music even when the input file is small.

Retraining this project

Hi,
I need to retrain this project in order to test with Italian speech. In particular, I need the instructions to do this and the training files that you used. Can you help me?
Thanks!

Problems in MS Windows

Hi!

I have problems with inaspeechsegmenter in Windows, because the input path cannot be found.

How can I fix this problem?

Thanks in advance,
Rita

add detection score pour each segment

Comprehensive list of supported audio formats?

Would it be possible to get a comprehensive list of all audio file formats supported by this tool?

Is it just a matter of "whatever ffmpeg is able to transcode to WAV"?

Add tests

It'd be nice to see some pytest (or whichever your favourite python testing framework might be) tests added to the repo.

Dockerfile do not work

Even using the provided docker I am getting below error. Can you please provide requirement.txt for this repo?
:

Traceback (most recent call last): │··············································································································
File "", line 1, in │··············································································································
File "/usr/local/lib/python3.6/dist-packages/inaSpeechSegmenter/init.py", line 26, in │··············································································································
from .segmenter import Segmenter, seg2csv │··············································································································
File "/usr/local/lib/python3.6/dist-packages/inaSpeechSegmenter/segmenter.py", line 31, in │··············································································································
import keras │··············································································································
File "/usr/local/lib/python3.6/dist-packages/keras/init.py", line 6, in │··············································································································
'Keras requires TensorFlow 2.2 or higher. ' │··············································································································
ImportError: Keras requires TensorFlow 2.2 or higher. Install TensorFlow via pip install tensorflow

Model training

How do we train a model to be used for the inaSpeechSegmenter

ina-foss / inaspeechsegmenter Goto Github PK

inaspeechsegmenter's Introduction

inaSpeechSegmenter

Installation

Prerequisites

PIP installation

Installing from from sources

Using inaSpeechSegmenter

Speech Segmentation Program

Using Speech Segmentation API

Citing

CREDITS

inaspeechsegmenter's People

Contributors

Stargazers

Watchers

Forkers

inaspeechsegmenter's Issues

Line # Mem usage Increment Line Contents

Line # Mem usage Increment Line Contents

Line # Mem usage Increment Line Contents

System information

Expected Behavior

Current Behavior

Steps to Reproduce

System information

Expected Behavior

Current Behavior

Steps to Reproduce

Additional infos

System information

Expected Behavior

Current Behavior

Steps to Reproduce

Additional infos

System information

System information

Expected Behavior

Current Behavior

Additional infos

System information

Expected Behavior

Current Behavior

Steps to Reproduce

Additional infos

Recommend Projects

Recommend Topics

Recommend Org