hetpandya / youtube_tts_data_generator

A Python library to generate speech datasets from YouTube videos

License: Apache License 2.0

Python 100.00%

speech-dataset text-to-speech youtube youtube-dataset youtube-dataset-generator python-library dataset-generator text-to-speech-dataset tts tts-dataset

youtube_tts_data_generator's People

jimzers davincidreams dtemnov yaboiksa ccsourcecode flaviogoncalves akshatgarg99

youtube_tts_data_generator's Issues

Change default Sample Rate and Signal to Standard for TTS dataset

Hi, I have found two more issues that I have fixed in your code. I will be sending PRs for those too.

Webvtt dependency missing on `pip install`

setup.py does not declare the webvtt-py package as a dependency, so it is missing after `pip install`. Added the dependency.

Punctuation missing in downloaded CSV

Hi Pandya, thanks for creating a great resource. Everything works well, except that this package removes punctuation from the subtitles. As you can imagine, this is very bad for training: without punctuation the attention model fails to converge, which leads to bad output. Do you know of any fix for that?

I am generating a dataset for a Spanish video with `lang='es'`.

@hetpandya

Update to version 0.2.0

Changelog:

Fixed #1 by adding support for downloading subtitles with punctuation.
Added an option to change the default sample rate (#2).
Fixed a bug that caused duplicate text entries in the metadata.
The default subtitle format has been changed to JSON. If SRT or VTT subtitles are detected, they are automatically converted to JSON.
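
The VTT-to-JSON conversion in the changelog can be illustrated with the minimal sketch below. It makes several assumptions: WebVTT input and a hypothetical output schema of `start`/`end`/`text` entries. The names `vtt_to_json` and `to_seconds` are illustrative only and are not the library's API. The sketch keeps the cue text as-is apart from stripping inline tags such as `<c>`, so punctuation survives. It also skips a cue whose text repeats the previous cue, which is one possible way to avoid duplicate metadata entries and not necessarily how the library does it.

```python
import json
import re

# Hypothetical sketch, not the youtube_tts_data_generator API.
# Matches a WebVTT cue timing line such as "00:00:01.000 --> 00:00:04.500";
# the hours field is optional in WebVTT, so "00:01.000" is accepted too.
TIMING = re.compile(
    r"(?P<start>(?:\d+:)?\d{2}:\d{2}\.\d{3})\s+-->\s+"
    r"(?P<end>(?:\d+:)?\d{2}:\d{2}\.\d{3})"
)
# Inline cue tags such as <c> or <00:00:01.500> that captions may contain.
INLINE_TAG = re.compile(r"<[^>]+>")


def to_seconds(stamp: str) -> float:
    """Convert 'HH:MM:SS.mmm' or 'MM:SS.mmm' to seconds."""
    parts = stamp.split(":")
    hours = int(parts[0]) if len(parts) == 3 else 0
    return hours * 3600 + int(parts[-2]) * 60 + float(parts[-1])


def vtt_to_json(vtt_text: str) -> str:
    """Convert WebVTT subtitles to a JSON list of {start, end, text} cues."""
    cues = []
    # Cue blocks in WebVTT are separated by blank lines.
    for block in re.split(r"\n\s*\n", vtt_text.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if not match:
                continue  # "WEBVTT" header, cue identifier, or NOTE line
            # Join the cue's text lines, dropping inline tags but keeping punctuation.
            text = " ".join(part.strip() for part in lines[i + 1:])
            text = INLINE_TAG.sub("", text).strip()
            # Skip empty cues and cues that repeat the previous cue's text.
            if text and (not cues or cues[-1]["text"] != text):
                cues.append({
                    "start": to_seconds(match.group("start")),
                    "end": to_seconds(match.group("end")),
                    "text": text,
                })
            break  # one timing line per cue block
    return json.dumps(cues, ensure_ascii=False)
```

Filtering only consecutive duplicates, rather than every repeated string, keeps legitimately repeated phrases that occur later in the video.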

[CONTRIBUTION] Speech Dataset Generator

Hi everyone!

I have just published this project on GitHub: https://github.com/davidmartinrius/speech-dataset-generator/

You can now create datasets automatically from any audio file or list of audio files.

I hope you can find it useful.

Here are the key functionalities of the project:

Dataset Generation: Creation of multilingual datasets with Mean Opinion Score (MOS).
Silence Removal: It includes a feature to remove silences from audio files, enhancing the overall quality.
Sound Quality Improvement: It improves the quality of the audio when needed.
Audio Segmentation: It can segment audio files within specified second ranges.
Transcription: The project transcribes the segmented audio, providing a textual representation.
Gender Identification: It identifies the gender of each speaker in the audio.
Pyannote Embeddings: Utilizes pyannote embeddings for speaker detection across multiple audio files.
Automatic Speaker Naming: Automatically assigns names to speakers detected in multiple audios.
Multiple Speaker Detection: Capable of detecting multiple speakers within each audio file.
Store speaker embeddings: The speakers are detected and stored in a Chroma database, so you do not need to assign a speaker name.
Syllabic and words-per-minute metrics
Multiple input sources: You can either use your own files or download content by pasting URLs from sources such as YouTube, LibriVox and TED Talks.
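
The post does not specify how the words-per-minute metric is computed. One plausible reading is the number of words in a segment's transcript divided by the segment duration in minutes. The hypothetical sketch below assumes that definition. The function name and the word-splitting regex are assumptions, not the project's implementation. Syllable counting is language-dependent and is left out.

```python
import re

# Hypothetical sketch; not taken from speech-dataset-generator.
# A "word" is a run of Unicode word characters, optionally joined by
# apostrophes, so "don't" counts as one word.
WORD = re.compile(r"\w+(?:'\w+)*")


def words_per_minute(transcript: str, duration_seconds: float) -> float:
    """Words spoken per minute in a segment of the given duration."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    word_count = len(WORD.findall(transcript))
    # words / (seconds / 60) is computed as words * 60 / seconds
    return word_count * 60.0 / duration_seconds
```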

Feel free to explore the project at https://github.com/davidmartinrius/speech-dataset-generator
