reriiasu / speech-to-text

Real-time transcription using faster-whisper

License: MIT License

Python 28.18% HTML 35.73% JavaScript 26.42% CSS 9.67%

faster-whisper speech-recognition whisper voice-recognition openai speech-to-text

speech-to-text's Issues

PY_SSIZE_T_CLEAN error

I run into this error while the program is listening:

Enter the target DeviceIndex: 13
Listening...
SystemError: PY_SSIZE_T_CLEAN macro must be defined for '#' formats

How may I resolve this issue?

Transcription is working badly without internet connection

I used the project to transcribe speech without an internet connection. Even with OpenAI API proofreading turned off, I don't get the full text of my speech after stopping transcription. It returns only the first part, only the last part, or unrelated text. Is there anything I can do about this? I am using the local medium model.

Transcription accuracy is low and time to translate is long

I can run your app easily, but the transcription accuracy is quite low. I tested with Vietnamese and English on my Windows laptop (Core i5 CPU, 16 GB RAM).
Processing time was also long, nowhere near real time.

Maybe my settings are wrong?

Input device selection

Hello,

I am running into an issue with the second step, selecting the input device.
Is there a way to check my device list without exiting the program? Or, what format is the prompt expecting? I have tried different combinations, and every one caused an error that exited the program. For reference, I am using Ubuntu.

Thank you
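
A minimal sketch for checking the device list outside the program. It assumes the prompt expects the integer index that the host audio API assigns to a device, and it uses the `sounddevice` package as one way to enumerate devices; the project itself may use a different audio backend, in which case the indexes could differ. The helper names `input_devices` and `list_system_input_devices` are hypothetical.

```python
from typing import Iterable, List, Mapping, Tuple


def input_devices(devices: Iterable[Mapping]) -> List[Tuple[int, str]]:
    """Return (index, name) for every device that can record audio."""
    return [
        (index, str(dev["name"]))
        for index, dev in enumerate(devices)
        # Output-only devices report zero input channels, so skip them.
        if dev.get("max_input_channels", 0) > 0
    ]


def list_system_input_devices() -> List[Tuple[int, str]]:
    """Enumerate the real devices on this machine."""
    # Assumption: sounddevice is installed (pip install sounddevice).
    import sounddevice as sd

    return input_devices(sd.query_devices())
```

Calling `list_system_input_devices()` in a separate Python session and printing the result shows which indexes belong to microphones, so a single integer from that list can be typed at the prompt.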

Permission denied on Windows OS

When I run the python -m speech_to_text command, the console shows the following error:
[Errno 13] Permission denied: 'D:\Works\Whisper\Faster_Whisper\models--guillaumekln--faster-whisper-base\refs\main'

I am running on Windows 11.

Acknowledgments

Your implementation is very nice!
I thought I would post the issue as a thank you.

I found your implementation very helpful and would like to Dockerize it to help me implement it on my robot.
Very cool how VAD and buffering works.

I immediately quoted a large part of your implementation and imported it into the Docker environment in my repository.
https://github.com/PINTO0309/faster-whisper-env
Operation is very good!

This issue is published only to express our gratitude and you are free to close it of your own free will.
Again, thank you very much. 😸

Incidentally, at the risk of meddling, the current version of faster-whisper v0.6.0 works fine when weights are run over the network.

Could not locate cudnn_ops_infer64_8.dll

Hello,
I'm encountering an error "Could not locate cudnn_ops_infer64_8.dll. Please make sure it is in your library path!"
when running the model with CUDA. Using CPU works fine.

Setup:
• GeForce RTX 2070 + 8G
• CUDA Toolkit 11.4 (downloaded from https://developer.nvidia.com/cuda-11-4-0-download-archive)
• cuDNN 11.4-windows-x64-v8.2.2.26 (downloaded from https://developer.download.nvidia.com/compute/redist/cudnn/v8.2.2/)
Extracted cudnn DLLs placed in: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.4\bin\

I confirmed that the PATH environment variable includes the DLL location:

  where cudnn_ops_infer64_8.dll
  C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.4\bin\cudnn_ops_infer64_8.dll

Would you have any suggestions to fix this error? In case of compatibility issues, could you recommend compatible CUDA and cuDNN versions with download links?
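
One thing that can be checked from inside Python is whether the interpreter that runs the model sees the same PATH as the shell where `where` was run. The sketch below is only a diagnostic under that assumption; it does not confirm which CUDA and cuDNN versions are compatible. The helper name `find_on_path` is hypothetical.

```python
import os
from pathlib import Path
from typing import List, Optional


def find_on_path(filename: str, path: Optional[str] = None) -> List[str]:
    """Return every PATH directory that contains `filename`."""
    # Default to the PATH of the current Python process, which may differ
    # from the PATH of the terminal used for the `where` check.
    raw = os.environ.get("PATH", "") if path is None else path
    hits = []
    for entry in raw.split(os.pathsep):
        # Empty entries come from stray separators; ignore them.
        if entry and (Path(entry) / filename).is_file():
            hits.append(entry)
    return hits
```

Running `find_on_path("cudnn_ops_infer64_8.dll")` in the same environment that launches the app returns an empty list if that process cannot see the DLL directory, which would point to an environment difference rather than a missing file.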

Increasing speed

Fantastic work. Was able to get it up and running easily.

I'm hoping to increase the speed of transcription.

The transcription speed once it hits Whisper seems fine, but I think the lag comes in before that. Do you have any recommendations on how I might improve the speed?

ModuleNotFoundError: No module named 'speech_to_text.openai_api'

C:\Users\Administrator\speech-to-text>python -m speech_to_text
Traceback (most recent call last):
  File "C:\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\Administrator\speech-to-text\speech_to_text\__main__.py", line 7, in <module>
    from .audio_transcriber import AppOptions
  File "C:\Users\Administrator\speech-to-text\speech_to_text\audio_transcriber.py", line 15, in <module>
    from .openai_api import OpenAIAPI
ModuleNotFoundError: No module named 'speech_to_text.openai_api'

Can you tell me how to fix this? Thank you.
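
The traceback says that `audio_transcriber.py` imports a sibling module `openai_api` that Python cannot find in the `speech_to_text` package. A quick way to see whether the file is actually missing from the local checkout is sketched below; the helper name `missing_submodules` is hypothetical, and a missing file is only one possible cause.

```python
from pathlib import Path
from typing import Iterable, List


def missing_submodules(package_dir: str, names: Iterable[str]) -> List[str]:
    """Return the names that exist neither as name.py nor as a sub-package."""
    pkg = Path(package_dir)
    return [
        name
        for name in names
        if not (pkg / f"{name}.py").is_file()
        and not (pkg / name / "__init__.py").is_file()
    ]
```

For the checkout in the traceback, `missing_submodules(r"C:\Users\Administrator\speech-to-text\speech_to_text", ["audio_transcriber", "openai_api"])` would list `openai_api` if the file is absent, in which case the local copy of the repository is probably incomplete or out of date.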

A tutorial video is needed.

At first glance this is a very useful program, but I cannot get it to work at all. It starts, yes. But I can't select the right Audio Device option. It looks like recording is running, but nothing happens. I have tried every possible option; what should I do?

Sorry if the question is too silly, but what else can I do? :)

Can the code translate into other languages?

As the title says, I hope the output can be translated into other languages, not just English. But it's too hard for me. How do I change the code?
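
A minimal sketch of how the language options look when calling faster-whisper directly. `WhisperModel.transcribe` accepts a `language` code and a `task`; `task="transcribe"` keeps the text in the spoken language, while Whisper's `task="translate"` produces English only, so translation into other target languages would need a separate tool. How the app passes these options internally is not shown in this issue, so the helper names `transcribe_options` and `transcribe_file` and the `"base"` model choice are assumptions for illustration.

```python
from typing import Dict, Optional


def transcribe_options(language: Optional[str], to_english: bool = False) -> Dict[str, object]:
    """Build keyword arguments for WhisperModel.transcribe."""
    options: Dict[str, object] = {"task": "translate" if to_english else "transcribe"}
    if language is not None:
        # Language code of the speech, e.g. "ja" or "vi"; None lets the model detect it.
        options["language"] = language
    return options


def transcribe_file(path: str, language: Optional[str] = None, to_english: bool = False) -> str:
    """Transcribe one audio file and return the joined text."""
    # Assumption: faster-whisper is installed (pip install faster-whisper).
    from faster_whisper import WhisperModel

    model = WhisperModel("base", device="cpu", compute_type="int8")
    segments, _info = model.transcribe(path, **transcribe_options(language, to_english))
    return "".join(segment.text for segment in segments)
```

For example, `transcribe_file("speech.wav", language="vi")` would return Vietnamese text for Vietnamese speech, and adding `to_english=True` would return an English translation instead.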

About runtime errors

Nice to meet you. When I tried using the tool, the following error appeared in the UI. What should I do? Thank you.

[Error number 2] No such file or directory: 'C:\Users\username\.cache\huggingface\hub\models--guillaumekln--faster-whisper-medium\refs\ \main'
