speech-to-text's Issues

PY_SSIZE_T_CLEAN error

I am running into this error while the program is listening:

対象のDeviceIndexを入力してください: 13
Listening...
SystemError: PY_SSIZE_T_CLEAN macro must be defined for '#' formats

How may I resolve this issue?
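A possible starting point, offered as a sketch rather than a confirmed diagnosis: since Python 3.10, C extensions must define `PY_SSIZE_T_CLEAN` before using `#` format codes, so this error usually points at a compiled dependency that was built without it. The snippet below only reports the interpreter version and the installed versions of a few packages. The names in `CANDIDATES` are assumptions chosen for illustration, not a list taken from this project.

```python
import sys
from importlib import metadata

# Hypothetical list of compiled audio packages to inspect; adjust to your environment.
CANDIDATES = ["PyAudio", "sounddevice"]


def installed_versions(names):
    """Return {name: version or None} for each distribution name."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def needs_ssize_t_clean(version_info=sys.version_info):
    """True on Python 3.10+, where '#' formats require PY_SSIZE_T_CLEAN."""
    return tuple(version_info[:2]) >= (3, 10)


if __name__ == "__main__":
    print("Python", sys.version.split()[0], "| 3.10+:", needs_ssize_t_clean())
    for name, version in installed_versions(CANDIDATES).items():
        print(f"{name}: {version or 'not installed'}")
```

If the interpreter is 3.10 or newer and one of the compiled packages is an old release, upgrading that package or trying an older Python version are the usual things to test.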

Transcription is working badly without internet connection

I've used the project to transcribe speech without an internet connection. Even with OpenAI API proofreading turned off, I don't get the full text of my speech after stopping transcription. It returns only the first part, only the last part, or unrelated text. Is there anything I can do about this? I am using the local medium model.

Transcription accuracy is low and time to translate is long

I can run your app easily, but the transcription accuracy is quite low. I tested with Vietnamese and English on my Windows laptop (Core i5 CPU, 16 GB RAM).
Processing time was also long, nowhere near real time.

Maybe my settings are wrong?

Input device selection

Hello,

I am running into an issue with the second step: selecting the input device.
Is there a way to check my device list without exiting the program? Or, what format is the prompt looking for? I have tried different combinations, and all of them caused an error that exited the program. For reference, I am using Ubuntu.

Thank you

Permission denied on Windows OS

When I run the python -m speech_to_text command, the console shows the following error:
[Errno 13] Permission denied: 'D:\Works\Whisper\Faster_Whisper\models--guillaumekln--faster-whisper-base\refs\main'

I am running on Windows 11.

Acknowledgments

Your implementation is very nice!
I thought I would post the issue as a thank you.

I found your implementation very helpful and would like to Dockerize it to help me implement it on my robot.
It is very cool how the VAD and buffering work.

I immediately quoted a large part of your implementation and imported it into the Docker environment in my repository.
https://github.com/PINTO0309/faster-whisper-env
Operation is very good!

I opened this issue only to express my gratitude, so feel free to close it whenever you like.
Again, thank you very much. 😸

Incidentally, at the risk of meddling, the current version of faster-whisper v0.6.0 works fine when weights are run over the network.

Could not locate cudnn_ops_infer64_8.dll

Hello,
I'm encountering an error "Could not locate cudnn_ops_infer64_8.dll. Please make sure it is in your library path!"
when running the model with CUDA. Using CPU works fine.

Setup:
• GeForce RTX 2070 + 8G
• CUDA Toolkit 11.4 (downloaded from https://developer.nvidia.com/cuda-11-4-0-download-archive)
• cuDNN 11.4-windows-x64-v8.2.2.26 (downloaded from https://developer.download.nvidia.com/compute/redist/cudnn/v8.2.2/)
Extracted cudnn DLLs placed in: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.4\bin\

I confirmed that the PATH environment variable includes the DLL location:

  where cudnn_ops_infer64_8.dll
  C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.4\bin\cudnn_ops_infer64_8.dll

Would you have any suggestions to fix this error? In case of compatibility issues, could you recommend compatible CUDA and cuDNN versions with download links?
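One thing that may be worth ruling out, as a sketch and not a confirmed cause: `where` reports what the current shell sees, while the Python process may have been started with a different PATH (for example from an IDE, or from a terminal opened before PATH was changed). The helper below checks the PATH as seen from inside Python. The function name `find_on_path` is made up for this example.

```python
import os
from pathlib import Path


def find_on_path(filename, path_value=None):
    """Return every directory listed in PATH that contains `filename`."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    hits = []
    for entry in path_value.split(os.pathsep):
        # Skip empty entries produced by doubled or trailing separators.
        if entry and (Path(entry) / filename).is_file():
            hits.append(entry)
    return hits


if __name__ == "__main__":
    dll = "cudnn_ops_infer64_8.dll"
    print(dll, "->", find_on_path(dll) or "not found on PATH")
```

Running it with the same interpreter and launch method used for the app shows whether that process can see the directory at all. If it can, the problem is more likely a version mismatch between the installed libraries than a missing path entry.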

Increasing speed

Fantastic work. Was able to get it up and running easily.

I'm hoping to increase the speed of transcription.

The transcription speed once it hits Whisper seems fine, but I think the lag comes in before that. Do you have any recommendations on how I might improve the speed?

ModuleNotFoundError: No module named 'speech_to_text.openai_api'

C:\Users\Administrator\speech-to-text>python -m speech_to_text
Traceback (most recent call last):
  File "C:\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\Administrator\speech-to-text\speech_to_text\__main__.py", line 7, in <module>
    from .audio_transcriber import AppOptions
  File "C:\Users\Administrator\speech-to-text\speech_to_text\audio_transcriber.py", line 15, in <module>
    from .openai_api import OpenAIAPI
ModuleNotFoundError: No module named 'speech_to_text.openai_api'
Can you tell me how to fix this? Thank you.
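The traceback shows a relative import of `openai_api` failing inside the `speech_to_text` package, which usually means the file is absent from the local copy (for instance an incomplete or outdated checkout). That is an assumption, not something the log proves. A minimal sketch for checking which expected modules are actually present on disk, with `missing_modules` being a hypothetical helper name:

```python
from pathlib import Path


def missing_modules(package_dir, module_names):
    """Return names that have neither <name>.py nor a <name>/ package in package_dir."""
    package_dir = Path(package_dir)
    missing = []
    for name in module_names:
        has_file = (package_dir / f"{name}.py").is_file()
        has_package = (package_dir / name / "__init__.py").is_file()
        if not (has_file or has_package):
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Directory and module names are taken from the traceback in this issue.
    print(missing_modules("speech_to_text", ["openai_api", "audio_transcriber"]))
```

Run it from the repository root. If `openai_api` is listed as missing, fetching a fresh copy of the repository is the first thing to try.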

A tutorial video is needed.

At first glance this is a very useful program, but I cannot get it to work at all. It starts, yes. But I cannot select the right Audio Device option. It appears that recording is in progress, but nothing happens. I have tried every available option. What should I do?

Sorry if the question is too silly, but how else can I ask? :)

About runtime errors

Nice to meet you. When I tried using the tool, the following error appeared in the UI. What should I do? Thank you.

[Error number 2] No such file or directory: 'C:\Users\username\.cache\huggingface\hub\models--guillaumekln--faster-whisper-medium\refs\ \main'
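The path in the error follows the Hugging Face hub cache layout, where `refs/main` inside a model directory holds a revision identifier and `snapshots/<revision>/` holds the downloaded files. A missing or dangling `refs/main` often indicates an interrupted download, though that is a guess here. The sketch below inspects one model directory; `check_model_cache` is a hypothetical helper, and the default path in the usage part is an assumption based on the error message.

```python
from pathlib import Path


def check_model_cache(model_dir):
    """Report whether refs/main exists and points at an existing snapshot."""
    model_dir = Path(model_dir)
    ref_file = model_dir / "refs" / "main"
    report = {"ref_exists": ref_file.is_file(), "revision": None, "snapshot_exists": False}
    if report["ref_exists"]:
        revision = ref_file.read_text(encoding="utf-8").strip()
        report["revision"] = revision
        # An empty ref file cannot point at a snapshot.
        report["snapshot_exists"] = bool(revision) and (model_dir / "snapshots" / revision).is_dir()
    return report


if __name__ == "__main__":
    # Assumed location, derived from the path shown in the error message.
    cache = Path.home() / ".cache" / "huggingface" / "hub"
    print(check_model_cache(cache / "models--guillaumekln--faster-whisper-medium"))
```

If `ref_exists` or `snapshot_exists` is false, deleting that model directory and letting the model download again is a low-risk thing to try.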
