
youtube_tts_data_generator's Issues

Punctuation missing in downloaded CSV

Hi Pandya, thanks for creating a great resource. Everything works well, except that this package removes punctuation from the subtitles. That is a serious problem for training: without punctuation, the attention model will fail to converge, which leads to bad output. Do you know of a fix for that?

I am generating a dataset from a Spanish video with lang='es'.

@hetpandya

[CONTRIBUTION] Speech Dataset Generator

Hi everyone!

I have just published this project on GitHub: https://github.com/davidmartinrius/speech-dataset-generator/

Now you can create datasets automatically from any audio file or list of audio files.

I hope you can find it useful.

Here are the key functionalities of the project:

  1. Dataset Generation: Creation of multilingual datasets with Mean Opinion Score (MOS).

  2. Silence Removal: It includes a feature to remove silences from audio files, enhancing the overall quality.

  3. Sound Quality Improvement: It improves the quality of the audio when needed.

  4. Audio Segmentation: It can segment audio files within specified second ranges.

  5. Transcription: The project transcribes the segmented audio, providing a textual representation.

  6. Gender Identification: It identifies the gender of each speaker in the audio.

  7. Pyannote Embeddings: Utilizes pyannote embeddings for speaker detection across multiple audio files.

  8. Automatic Speaker Naming: Automatically assigns names to speakers detected in multiple audios.

  9. Multiple Speaker Detection: Capable of detecting multiple speakers within each audio file.

  10. Store speaker embeddings: The speakers are detected and stored in a Chroma database, so you do not need to assign a speaker name.

  11. Syllabic and words-per-minute metrics

  12. Multiple input sources: You can either use your own files or download content by pasting URLs from sources such as YouTube, LibriVox and TED Talks.
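As an illustration of the words-per-minute metric in item 11, here is a minimal sketch of how such a value could be computed for one transcribed segment. The function name `words_per_minute` and its parameters are hypothetical and are not taken from the project. Splitting on whitespace is an assumed, simplified definition of a word.

```python
def words_per_minute(text, start_seconds, end_seconds):
    """Return the speaking rate of one segment in words per minute.

    Hypothetical helper: words are counted by splitting on whitespace,
    which is a simplification and may not match the project's own metric.
    """
    duration = end_seconds - start_seconds
    if duration <= 0:
        raise ValueError("segment must have a positive duration")
    return len(text.split()) * 60.0 / duration


# Three words spoken over 2.5 seconds.
print(words_per_minute("Muy bien, gracias.", 3.5, 6.0))
```

A syllable-based rate could follow the same shape, with a language-specific syllable counter in place of `text.split()`.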

Feel free to explore the project at https://github.com/davidmartinrius/speech-dataset-generator

Update to version 0.2.0

Changelog:

  • Fixed #1 by adding support for downloading subtitles with punctuation.
  • Added an option to change the default sample rate (#2).
  • Fixed a bug that caused duplicate text entries in metadata.
  • The default subtitle format has been changed to json. If srt or vtt subtitles are detected, they will automatically be converted to json.
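
To make the srt-to-json conversion concrete, here is a rough sketch of what such a step might look like. It is not the package's actual implementation: the function name `srt_to_entries` and the output keys `start`, `end` and `text` are assumptions chosen for illustration, and the real JSON layout may differ. Note that punctuation in the subtitle text is left untouched.

```python
import json
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:03,500".
TIMESTAMP = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def _to_seconds(hours, minutes, seconds, millis):
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def srt_to_entries(srt_text):
    """Parse SRT text into a list of dicts (assumed schema: start, end, text)."""
    entries = []
    normalized = srt_text.replace("\r\n", "\n").strip()
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        for i, line in enumerate(lines):
            match = TIMESTAMP.match(line)
            if match:
                groups = match.groups()
                entries.append({
                    "start": _to_seconds(*groups[:4]),
                    "end": _to_seconds(*groups[4:]),
                    # Everything after the timing line is the cue text.
                    "text": " ".join(lines[i + 1:]),
                })
                break
    return entries


sample = """1
00:00:01,000 --> 00:00:03,500
Hola, ¿cómo estás?

2
00:00:03,500 --> 00:00:06,000
Muy bien, gracias.
"""

# ensure_ascii=False keeps characters such as "¿" and "ó" readable in the output.
print(json.dumps(srt_to_entries(sample), ensure_ascii=False, indent=2))
```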
