Topic: wav2vec2

Something interesting about wav2vec2

👇 Here are 105 public repositories matching this topic...

amirabaskohi / automatic-speech-recognition-for-speech-assessment-of-persian-preschool-children

wav2vec2,Preschool evaluation is crucial because it gives teachers and parents influential knowledge about children's growth and development. The COVID-19 pandemic has highlighted the necessity of online assessment for preschool children. One of the areas that should be tested is their ability to speak. Employing an Automatic Speech Recognition (ASR) system would not help, since such systems are pre-trained on voices that differ from children's in terms of frequency and amplitude. Because most of them are pre-trained with data in a specific range of amplitude, their objectives do not prepare them for voices at different amplitudes. To overcome this issue, we added a new objective, called Random Frequency Pitch (RFP), to the masking objective of the Wav2Vec 2.0 model. In addition, we used our newly introduced dataset to fine-tune our model for the Meaningless Words (MW) and Rapid Automatic Naming (RAN) tests. Using masking in concatenation with RFP outperforms the masking objective of Wav2Vec 2.0 by reaching a Word Error Rate (WER) of 1.35. Our new approach reaches a WER of 6.45 on the Persian section of the CommonVoice dataset. Furthermore, our novel methodology produces positive outcomes in zero- and few-shot scenarios.

User: amirabaskohi

Home Page: https://maghzineh.com/CognitiveTests/TestEnter.aspx

asr speech-recognition wav2vec2 dataset deep-learning

aryanxxvii / lark

wav2vec2,Speech Assessment API in NextJS

User: aryanxxvii

Home Page: https://larkapi.vercel.app/

huggingface llm machine-learning nextjs phoneme-recognition prisma wav2vec2 pronunciation speech-recognition

audeering / w2v2-how-to

wav2vec2,How to use our public wav2vec2 dimensional emotion model

Organization: audeering

speech-emotion-recognition deep-learning wav2vec2 transformer-models arousal dominance valence msp-podcast onnx

daanzu / wav2vec2_stt_python

wav2vec2,Simple Python library, distributed via binary wheels with few direct dependencies, for easily using wav2vec 2.0 models for speech recognition

User: daanzu

speech-recognition speech-to-text speech python pytorch wav2vec2 wav2vec

ecnu-cross-innovation-lab / ent

wav2vec2,[ICASSP 2024] Emotion Neural Transducer for Fine-Grained Speech Emotion Recognition

Organization: ecnu-cross-innovation-lab

Home Page: https://arxiv.org/abs/2403.19224

automatic-speech-recognition speech-emotion-recognition wav2vec2

ecnu-cross-innovation-lab / shiftser

wav2vec2,[ICASSP 2023] Mingling or Misalignment? Temporal Shift for Speech Emotion Recognition with Pre-trained Representations

Organization: ecnu-cross-innovation-lab

Home Page: https://www.researchgate.net/publication/371101522

hubert speech-emotion-recognition wav2vec2

egorsmkv / asr-corpus-creator

wav2vec2,This app is intended to automatically create a corpus for ASR systems using pseudo-labeling.

User: egorsmkv

asr audio audio-processing automatic-speech-recognition nemo speech-recognition wav2vec2 whisper

fernandolpz / speechrecognition

wav2vec2,This repository contains the implementation of an Automatic Speech Recognition system in Python, using a client-server architecture with WebSockets.

User: fernandolpz

Home Page: https://youtu.be/gdSUyI1z50o

automatic-speech-recognition python speech-recognition speech-to-text transformers wav2vec2 websockets

habla-liaa / ser-with-w2v2

wav2vec2,Official implementation of INTERSPEECH 2021 paper 'Emotion Recognition from Speech Using Wav2vec 2.0 Embeddings'

Organization: habla-liaa

speech-emotion-recognition wav2vec2 deep-learning tensorflow speech

hammaad2002 / asradversarialattacks

wav2vec2,An ASR (Automatic Speech Recognition) adversarial attack repository.

User: hammaad2002

adversarial-attacks adversarial-machine-learning asr carlini-wagner carlini-wagner-attack fgsm-attack pgd-adversarial-attacks pgd-attack projected-gradient-desent wav2vec2

hamtech-ai / wav2vec2-fa

wav2vec2,Fine-tune Wav2vec2, an ASR model released by Facebook

Organization: hamtech-ai

Home Page: https://huggingface.co/masoudmzb/wav2vec2-xlsr-multilingual-53-fa/tree/main

asr asr-model huggingface nlp speech-to-text transformer wav2vec2

harunorikawano / wav2vec2.0

wav2vec2,Implementation of the paper "wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations" in PyTorch.

User: harunorikawano

Home Page: https://arxiv.org/abs/2006.11477

pytorch speech-recognition wav2vec2

inboxpraveen / llm-minutes-of-meeting

wav2vec2,🎤📄 An innovative tool that transforms audio or video files into text transcripts and generates concise meeting minutes. Stay organized and efficient in your meetings, and get ready for Phase 2 where we'll be open for contributions to enable real-time meeting transcription! 🚀

User: inboxpraveen

huggingface huggingface-transformers llm llm-inference meeting-minutes minutes-of-meeting natural-language-processing nlp python speech-recognition

jmaczan / asr-dysarthria

wav2vec2,Research on Automatic Speech Recognition for dysarthric speech

User: jmaczan

Home Page: https://huggingface.co/jmaczan/wav2vec2-large-xls-r-300m-dysarthria

asr automatic-speech-recognition dysarthria dysarthric-speech wav2vec2 deep-learning self-supervised-learning

juju2181 / automatic-nepali-speech-recognition-and-summarizer

wav2vec2,A system capable of converting Nepali speech to text and generating a summary of the text

User: juju2181

Home Page: https://client-sudips413.vercel.app/

abstractive-summarization deep-learning extractive-summarization machine-learning nepali nepali-nlp python speech-recognition wav2vec2 cnn-resnet-bilstm

jvel07 / wav2vec2_patho

wav2vec2,Fine-tuning wav2vec2 for Pathological Speech Processing

User: jvel07

computational-paralinguistics deep-learning dnn-embeddings emotion-recognition pytorch sound-processing speech-embeddings speech-processing speech-recognition utterance

khanld / asr-wav2vec-finetune

wav2vec2,⚡ Fine-tune Wav2vec 2.0 for Speech Recognition

User: khanld

asr finetune-wav2vec huggingface pytorch speech-recognition speech-to-text vietnamese-speech-recognition wav2vec2

khanld / wav2vec2-pretraining

wav2vec2,Wav2vec 2.0 Self-Supervised Pretraining

User: khanld

asr contrastive-learning pretraining quantization self-supervised speech-processing speech-recognition speech-to-text wav2vec2

kingabzpro / wolof-asr-wav2vec2

wav2vec2,Audio Preprocessing and finetuning of wav2vec2-large-xlsr model on AI4D Baamtu Datamation - Automatic Speech Recognition in WOLOF Data.

User: kingabzpro

Home Page: https://www.kaggle.com/kingabzpro/fine-tuning-xlsr-wav2vec2-for-wolof-asr-with

asr-model wav2vec2 wolof africa audio-processing audio facebook transcription

louisbrulenaudet / balena

wav2vec2,BALanced Execution through Natural Activation: a human-computer interaction methodology for code running.

User: louisbrulenaudet

Home Page: https://lemone.io

execution python3 sentence-similarity sentence-transformers speech-recognition speech-to-function speech-to-text terminal transformers wav2vec2

lstrgar / self-supervised-phone-segmentation

wav2vec2,Phoneme segmentation using pre-trained speech models

User: lstrgar

hubert speech-segmentation wav2vec2 deep-learning self-supervised-learning speech-technology

lucasgris / wav2vec4bp

wav2vec2,Wav2vec resources and models for Brazilian Portuguese

User: lucasgris

automatic-speech-recognition brazilian-portuguese dataset portuguese speech-to-text wav2vec wav2vec2

mikezzb / lyrics-sync

wav2vec2,A deep learning lyrics-to-audio alignment system, generating synchronized lyrics from a song and its lyrics

User: mikezzb

ai deep-learning demucs jupyter-notebook lyrics machine-learning music music-information-retrieval python wav2vec2

mmakiuchi / multimodal_emotion_recognition

wav2vec2,Scripts used in the research described in the paper "Multimodal Emotion Recognition with High-level Speech and Text Features", accepted at the ASRU 2021 conference.

User: mmakiuchi

emotion-recognition speech-emotion-recognition text-emotion-detection wav2vec2 disentanglement-learning asru2021

mpoyraz / wav2vec2-turkish

wav2vec2,Turkish Speech Recognition using Facebook's Wav2vec 2.0 models

User: mpoyraz

asr speech-recognition speech-to-text turkish wav2vec2

msparihar / transcriber

wav2vec2,An AI tool that automatically generates captions and transcripts for YouTube videos in 67 languages and can generate summarized texts in 133 languages.

User: msparihar

Home Page: https://github.com/Msparihar/Transcriber

audio-processing deep-neural-networks kenlm nlp wav2vec2

mt-upc / shas

wav2vec2,SHAS: Approaching optimal Segmentation for End-to-End Speech Translation

Organization: mt-upc

audio-segmentation speech-translation speech-to-text speech wav2vec2

notai-tech / indicasr

wav2vec2,Speech Recognition for Indic languages.

Organization: notai-tech

speech-recognition telugu transformers pytorch wav2vec wav2vec2 asr indian-language speech-to-text

oliverguhr / wav2vec2-live

wav2vec2,Live speech recognition using Facebook's wav2vec 2.0 model.

User: oliverguhr

speech-recognition wav2vec2 pyaudio wav2vec speech-to-text asr speech

paddlepaddle / paddlespeech

wav2vec2,Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.

Organization: paddlepaddle

Home Page: https://paddlespeech.readthedocs.io

transformer conformer speech-translation streaming-asr speech-alignment punctuation-restoration streaming-tts speech-synthesis tts asr

parvatijay2901 / hindi-asr-and-tts

wav2vec2,EC499: Major Project

User: parvatijay2901

asr espnetv2 parallel-wavegan spoken-converation-module tacotron2 tts wav2vec2

pooya-mohammadi / audio-classification-pytorch

wav2vec2,In this project, several approaches for training/fine-tuning an audio gender recognition model are provided. The code can be used for any other audio classification task by simply changing the number of classes and the input dataset.

User: pooya-mohammadi

audio-classification deep-learning deep-utils python pytorch lstm transformers wav2vec2

pradeepbatchu / speechtotext

wav2vec2,Speech to Text with Wav2Vec2 using torchaudio

User: pradeepbatchu

wav2vec2 torch speech-to-text flask torchaudio wav torchtext torchlight

pszemraj / vid2cleantxt

wav2vec2,Python API & command-line tool to easily transcribe speech-based video files into clean text

User: pszemraj

nlp audio audio-processing transcription speech-to-text speech-recognition speech spelling-correction keyword-extraction keyword

rubenszimbres / repo-2022

wav2vec2,Python code for PyTorch, TensorFlow, Keras, Wav2Vec2 fine-tuning, and Google Cloud

User: rubenszimbres

googlecloudplatform keras-tensorflow wav2vec2

s3prl / s3prl

wav2vec2,Self-Supervised Speech Pre-training and Representation Learning Toolkit

Organization: s3prl

Home Page: https://s3prl.github.io/s3prl/

speech-representation mockingjay representation-learning apc tera self-supervised-learning speech-pretraining vq-apc wav2vec vq-wav2vec

sakshirathi77 / hindispeechpro-automatic-speech-recognization

wav2vec2,This project, the final project for the KaggleX BIPOC Mentorship Program, aims to train two separate Hindi ASR models using the Facebook Wav2Vec2 (300M parameters) and OpenAI Whisper-Small models, respectively. The goal is to compare their performance, with a target WER of less than 13%, across various Hindi accents and dialects.

User: sakshirathi77

Home Page: https://huggingface.co/spaces/SakshiRathi77/SakshiRathi77-Wishper-Hi-Kagglex

asr common-voice-dataset gradio hindi-language huggingface kagglexbipoc mp3-to-wav speech-recognition transformer wav2vec2

scottykwok / cantonese-selfish-project

wav2vec2,Cantonese Selfish Project 廣東話自肥企劃 at PYCON HK 2021

User: scottykwok

pycon pyconhk cantonese deepspeech wav2vec2 speechrecognition cantonese-speech-recognition

skit-ai / map-mix

wav2vec2,The official implementation of the method discussed in the paper "Improving Spoken Language Identification with Map-Mix" (work accepted at ICASSP 2023)

Organization: skit-ai

datamaps hubert language-identification mixup speech-processing spoken-language-identification spoken-language-recognition wav2vec2 xlsr confidence-labels