hetpandya / youtube_tts_data_generator
A Python library to generate speech datasets from YouTube videos
License: Apache License 2.0
Hi, I have found two more issues that I have fixed in your code. I will be sending PRs for those too.
Missing webvtt-py dependency: the webvtt-py package is not declared in setup.py, so it is not installed along with the library. I added the dependency.
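Until the dependency is declared (for setuptools, normally in the install_requires list of setup.py), a quick check like the one below can show whether the module is importable in the current environment. This is only a sketch: missing_modules is a hypothetical helper and not part of youtube_tts_data_generator. The one library detail it relies on is that the webvtt-py distribution is imported as webvtt.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # webvtt-py installs a top-level module called "webvtt".
    missing = missing_modules(["webvtt"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

If webvtt is reported as missing, installing webvtt-py by hand with pip should work around the problem until the fix is merged.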
Hi Pandya, thanks for creating this great resource. Everything works well, except that the package removes punctuation from the subtitles. As you can probably understand, that is very bad for training: without punctuation, an attention model will fail to converge, which leads to bad output. Do you know of any fix for that?
I am generating a dataset for a Spanish video with lang='es'
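One possible workaround, sketched here without knowing how the library cleans its captions, is to normalize the subtitle text yourself and keep sentence punctuation. The helper clean_caption and the KEEP_PUNCT set are hypothetical and not part of youtube_tts_data_generator. The set includes the Spanish opening marks ¿ and ¡.

```python
import re
import unicodedata

# Hypothetical helper, not part of youtube_tts_data_generator.
# Punctuation kept in addition to letters, digits and whitespace.
KEEP_PUNCT = set(".,;:!?¿¡'\"-")


def clean_caption(text: str) -> str:
    """Normalize one caption line while keeping sentence punctuation."""
    # Compose accented characters so that "ó" is a single code point.
    text = unicodedata.normalize("NFC", text)
    # Drop inline markup such as <c> ... </c> tags found in some VTT captions.
    text = re.sub(r"<[^>]+>", "", text)
    kept = [ch for ch in text if ch.isalnum() or ch.isspace() or ch in KEEP_PUNCT]
    # Collapse whitespace runs left behind by removed characters.
    return re.sub(r"\s+", " ", "".join(kept)).strip()


if __name__ == "__main__":
    print(clean_caption("¿Cómo   estás, <c>amigo</c>? #bien"))
```

Which characters to keep depends on the symbol set of the TTS model being trained, so KEEP_PUNCT would need adjusting per language and model.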
json. If srt or vtt subtitles are detected, they will automatically be converted to json.
Hi everyone!
I have just published this project on GitHub: https://github.com/davidmartinrius/speech-dataset-generator/
Now you can create datasets automatically from any audio file or list of audio files.
I hope you can find it useful.
Dataset Generation: Creation of multilingual datasets with Mean Opinion Score (MOS).
Silence Removal: It includes a feature to remove silences from audio files, enhancing the overall quality.
Sound Quality Improvement: It improves the quality of the audio when needed.
Audio Segmentation: It can segment audio files within specified second ranges.
Transcription: The project transcribes the segmented audio, providing a textual representation.
Gender Identification: It identifies the gender of each speaker in the audio.
Pyannote Embeddings: Utilizes pyannote embeddings for speaker detection across multiple audio files.
Automatic Speaker Naming: Automatically assigns names to speakers detected in multiple audios.
Multiple Speaker Detection: Capable of detecting multiple speakers within each audio file.
Store speaker embeddings: The speakers are detected and stored in a Chroma database, so you do not need to assign a speaker name.
Syllabic and words-per-minute metrics
Multiple input sources: You can either use your own files or download content by pasting URLs from sources such as YouTube, LibriVox and TED Talks.
Feel free to explore the project at https://github.com/davidmartinrius/speech-dataset-generator