
whisper-webui's Introduction

Whisper-WebUI

A Gradio-based browser interface for Whisper. You can use it as an Easy Subtitle Generator!


Notebook

If you wish to try this on Colab, you can do it here!

Features

  • Generate subtitles from various sources, including:
    • Files
    • YouTube
    • Microphone
  • Currently supported subtitle formats (see the SRT sketch below for the layout):
    • SRT
    • WebVTT
    • TXT (plain text without a timeline)
  • Speech-to-Text Translation
    • From other languages to English (this is Whisper's end-to-end speech-to-text translation feature)
  • Text-to-Text Translation
    • Translate subtitle files using Facebook NLLB models
    • Translate subtitle files using the DeepL API
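As a point of reference for the formats above, an SRT file is just numbered cues with start/end timestamps. Below is a minimal, illustrative Python sketch of serializing Whisper-style segments to SRT; the helper names are hypothetical, not the project's actual writer.

import pathlib

def format_timestamp(seconds: float) -> str:
    # SRT uses HH:MM:SS,mmm with a comma before the milliseconds.
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments) -> str:
    # segments: iterable of dicts with "start", "end", and "text" keys.
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(f"{i}\n{format_timestamp(seg['start'])} --> "
                      f"{format_timestamp(seg['end'])}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)

pathlib.Path("output.srt").write_text(
    segments_to_srt([{"start": 0.0, "end": 5.5, "text": "안녕하세요"}]),
    encoding="utf-8",
)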

Installation and Running

  • On Windows OS

Prerequisite

To run this WebUI, you need git, Python 3.8–3.10, CUDA 12.0 or above, and FFmpeg.

Please install each of these from its official download page.

After installing FFmpeg, make sure to add the FFmpeg/bin folder to your system PATH!

Automatic Installation

If you have satisfied the prerequisites listed above, you are now ready to start Whisper-WebUI.

  1. Run Install.bat from Windows Explorer as a regular, non-administrator user.
  2. After installation, run the start-webui.bat.
  3. Open your web browser and go to http://localhost:7860

(If you're already running another web UI, this one will be hosted on a different port, such as localhost:7861, localhost:7862, and so on.)

You can also run the project with command-line arguments by running user-start-webui.bat; see the wiki for a guide to the available arguments.

  • Docker (on other OSes)

  1. Build the image:

docker build -t whisper-webui:latest .

  2. Run the container:

  • For bash:
docker run --gpus all -d \
-v /path/to/models:/Whisper-WebUI/models \
-v /path/to/outputs:/Whisper-WebUI/outputs \
-p 7860:7860 \
-it \
whisper-webui:latest --server_name 0.0.0.0 --server_port 7860
  • For PowerShell:
docker run --gpus all -d `
-v /path/to/models:/Whisper-WebUI/models `
-v /path/to/outputs:/Whisper-WebUI/outputs `
-p 7860:7860 `
-it `
whisper-webui:latest --server_name 0.0.0.0 --server_port 7860

VRAM Usages

This project integrates faster-whisper by default, for lower VRAM usage and faster transcription.

According to faster-whisper, the efficiency of the optimized whisper model is as follows:

Implementation | Precision | Beam size | Time  | Max. GPU memory | Max. CPU memory
openai/whisper | fp16      | 5         | 4m30s | 11325MB         | 9439MB
faster-whisper | fp16      | 5         | 54s   | 4755MB          | 3244MB

If you want to use the original OpenAI Whisper implementation instead of the optimized one, set the command-line argument DISABLE_FASTER_WHISPER to True. See the wiki for more information.
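For reference, this is roughly the faster-whisper API the project wraps; a minimal sketch assuming faster-whisper is installed (the model size, file name, and options here are examples, not the project's defaults):

from faster_whisper import WhisperModel

# fp16 on GPU, matching the benchmark row above; use compute_type="int8"
# or device="cpu" on machines without efficient float16 support.
model = WhisperModel("large-v2", device="cuda", compute_type="float16")
segments, info = model.transcribe("audio.mp3", beam_size=5)
print(f"Detected language: {info.language} ({info.language_probability:.2f})")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")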

Available models

This is Whisper's original VRAM usage table for models.

Size   | Parameters | English-only model | Multilingual model | Required VRAM | Relative speed
tiny   | 39 M       | tiny.en            | tiny               | ~1 GB         | ~32x
base   | 74 M       | base.en            | base               | ~1 GB         | ~16x
small  | 244 M      | small.en           | small              | ~2 GB         | ~6x
medium | 769 M      | medium.en          | medium             | ~5 GB         | ~2x
large  | 1550 M     | N/A                | large              | ~10 GB        | 1x

The .en models are English-only, and the cool thing is that you can use the Translate to English option with the multilingual models, up to "large"!
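A minimal sketch of that Translate to English option using the openai-whisper API directly (this assumes openai-whisper is installed; the file name is an example):

import whisper

model = whisper.load_model("large")  # a multilingual model; .en models can't translate
result = model.transcribe("audio.mp3", task="translate")  # any supported language -> English
print(result["text"])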

whisper-webui's People

Contributors

aierlma, damho1104, jhj0517, jsboige, koro33


whisper-webui's Issues

Memory

Hi!
It works very well with Japanese, but I have a question: every time subtitles are generated, there is less and less free space on the C drive, about 1–3 GB each time.
I would like to know where these files are located.

feature request - Translate into different languages

I tested the project today and it is working really nicely. I just thought it would be really useful if you could add translation into different languages, like a small drop-down menu next to "Translate to English?".

Model v3 selection

Which OS are you using?

  • OS: Windows 10

Error: Invalid model size 'large-v3', expected one of: tiny.en, tiny, base.en, base, small.en, small, medium.en, medium, large-v1, large-v2, large

Error transcribing file on line Requested float16 compute type

Which OS are you using?

  • OS: Windows 11

I always get this problem:

Error transcribing file on line Requested float16 compute type, but the target device or backend do not support efficient float16 computation.

What should I do?

error report:reset_max_memory_allocated

Which OS are you using?

  • OS: Windows
  • G:\Soft\Whisper-WebUI\venv\lib\site-packages\torch\cuda\memory.py:329: FutureWarning: torch.cuda.reset_max_memory_allocated now calls torch.cuda.reset_peak_memory_stats, which resets /all/ peak memory stats.
    warnings.warn(

What does this mean? How to solve it? Thank you.

Strange space before each sentence in some languages

In some languages there is a strange space before each sentence.
For example, when I transcribe Korean, it looks like this:

1
00:00:00,000 --> 00:00:05,500
 안녕하세요

2
00:00:05,500 --> 00:00:12,439
 반갑습니다 

The strange space is generated by the whisper model, not by my WebUI.
So a post-process would be desirable.
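A hypothetical post-process along the lines suggested here could be as simple as stripping the segment text before writing the subtitle file (a sketch only, not the project's actual code):

def strip_segment_whitespace(segments):
    # Whisper sometimes emits a leading space before each segment in some
    # languages (e.g. Korean); trim it before serializing to SRT/WebVTT.
    for seg in segments:
        seg["text"] = seg["text"].strip()
    return segments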

  • Languages currently monitored:
    • Korean

Which OS are you using?

  • OS: Windows

Load files from local

Is it possible to load videos from local storage instead of uploading?
I use Colab and it takes a long time to upload large videos. I want a direct way to use them.

ffmpeg easy install for users

Describe the feature you'd like

From SadTalker:

Install Python 3.10.6, checking "Add Python to PATH".
Install git manually (OR scoop install git via scoop).
Install ffmpeg, following this instruction (OR using scoop install ffmpeg via scoop).
Download our SadTalker repository, for example by running git clone https://github.com/Winfredy/SadTalker.git.
Download the checkpoint and gfpgan below↓.
Run start.bat from Windows Explorer as a normal, non-administrator user; a Gradio WebUI demo will be started.

Can't run T2T Translation

Which OS are you using?

  • OS: Windows 10

I wanted to tinker with T2T Translation to see what it's capable of, but upon loading any NLLB model, the process errors out with this output in the console:

Traceback (most recent call last):
  File "I:\Whisper-WebUI\venv\lib\site-packages\gradio\queueing.py", line 495, in call_prediction
    output = await route_utils.call_process_api(
  File "I:\Whisper-WebUI\venv\lib\site-packages\gradio\route_utils.py", line 232, in call_process_api
    output = await app.get_blocks().process_api(
  File "I:\Whisper-WebUI\venv\lib\site-packages\gradio\blocks.py", line 1570, in process_api
    data = self.postprocess_data(fn_index, result["prediction"], state)
  File "I:\Whisper-WebUI\venv\lib\site-packages\gradio\blocks.py", line 1397, in postprocess_data
    self.validate_outputs(fn_index, predictions)  # type: ignore
  File "I:\Whisper-WebUI\venv\lib\site-packages\gradio\blocks.py", line 1371, in validate_outputs
    raise ValueError(
ValueError: An event handler (translate_file) didn't receive enough output values (needed: 2, received: 1).
Wanted outputs:
    [textbox, file]
Received outputs:
    [None]
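For context, the T2T tab uses Facebook's NLLB models (per the README above). A minimal, hedged sketch of NLLB translation via Hugging Face transformers, independent of this WebUI (the model name and language codes are examples):

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "facebook/nllb-200-distilled-600M", src_lang="kor_Hang")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-600M")

inputs = tokenizer("안녕하세요", return_tensors="pt")
generated = model.generate(
    **inputs,
    # NLLB selects the target language by forcing its language token first.
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("eng_Latn"),
)
print(tokenizer.decode(generated[0], skip_special_tokens=True))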

Having problem with installation, venv folder issue?

Windows 11

After I run Install.bat it says the installation was successful, but I think there's something wrong with it.

When I try to run start-webui.bat, the error shows an "E:\Whisper-WebUI\venv" path with two backslashes instead of one.
Maybe it's my PC's fault, but it seems "%~dp0" is not working as intended?
Which is weird, because other batch scripts with %~dp0 work correctly.

Also, I noticed that when I reinstalled from scratch and ran Install.bat, it still said the installation was successful, but it also said it couldn't locate the venv folder.

keep having problems with wav files

I had some problems with a wav file and tried to reinstall everything.
I still have a problem with ffmpeg, I guess, so I deleted everything, reinstalled, and set the system PATH correctly, but the problem remains.
Can I get any help to solve it?

Traceback (most recent call last):
File "C:\Users\Ha\AppData\Local\Programs\Python\Python310\lib\site-packages\whisper\audio.py", line 42, in load_audio
ffmpeg.input(file, threads=0)
AttributeError: module 'ffmpeg' has no attribute 'input'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:\Users\Ha\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\routes.py", line 384, in run_predict
output = await app.get_blocks().process_api(
File "C:\Users\Ha\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\blocks.py", line 1024, in process_api
result = await self.call_function(
File "C:\Users\Ha\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\blocks.py", line 836, in call_function
prediction = await anyio.to_thread.run_sync(
File "C:\Users\Ha\AppData\Local\Programs\Python\Python310\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "C:\Users\Ha\AppData\Local\Programs\Python\Python310\lib\site-packages\anyio_backends_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "C:\Users\Ha\AppData\Local\Programs\Python\Python310\lib\site-packages\anyio_backends_asyncio.py", line 867, in run
result = context.run(func, *args)
File "D:\Whisper-WebUI-master\modules\whisper_Inference.py", line 38, in transcribe_file
audio = whisper.load_audio(fileobj.name)
File "C:\Users\Ha\AppData\Local\Programs\Python\Python310\lib\site-packages\whisper\audio.py", line 46, in load_audio
except ffmpeg.Error as e:
AttributeError: module 'ffmpeg' has no attribute 'Error'

...

Doh! I got the wrong WebUI project. Sorry, you can close/ignore this.

need help with install

Which OS are you using?

  • OS: Ubuntu 20.04.1 64Bit.

Installing on Ubuntu I got this error message. I'm not very good with Python and I was wondering if someone could lend a hand.

./start-webui.sh
venv venv/bin/python

Traceback (most recent call last):
File "app.py", line 5, in <module>
from modules.whisper_Inference import WhisperInference
File "/home/whisper/Scaricati/Whisper-WebUI-master/modules/whisper_Inference.py", line 17, in <module>
class WhisperInference(BaseInterface):
File "/home/whisper/Scaricati/Whisper-WebUI-master/modules/whisper_Inference.py", line 288, in WhisperInference
) -> Tuple[list[dict], float]:
TypeError: 'type' object is not subscriptable

thanks for any help
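For what it's worth, the traceback points at the annotation Tuple[list[dict], float]: subscripting built-in types like list[dict] only works on Python 3.9+ (PEP 585), and this project targets Python 3.8–3.10. A hedged sketch of the 3.8-compatible spelling (the signature below is hypothetical, inferred from the traceback rather than taken from the project):

from typing import Dict, List, Tuple

# On Python 3.8, use typing.List/typing.Dict instead of list[dict].
def transcribe_file(file_path: str) -> Tuple[List[Dict], float]:
    return [], 0.0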

run error

venv "D:\Downloads\SOFTWARE\Whisper\Whsiper-WebUI-master\venv\Scripts\Python.exe"

Initializing Model..

Running on local URL: http://127.0.0.1:7860

To create a public link, set share=True in launch().
Traceback (most recent call last):
File "D:\Downloads\SOFTWARE\Whisper\Whsiper-WebUI-master\venv\lib\site-packages\gradio\routes.py", line 384, in run_predict
output = await app.get_blocks().process_api(
File "D:\Downloads\SOFTWARE\Whisper\Whsiper-WebUI-master\venv\lib\site-packages\gradio\blocks.py", line 1024, in process_api
result = await self.call_function(
File "D:\Downloads\SOFTWARE\Whisper\Whsiper-WebUI-master\venv\lib\site-packages\gradio\blocks.py", line 836, in call_function
prediction = await anyio.to_thread.run_sync(
File "D:\Downloads\SOFTWARE\Whisper\Whsiper-WebUI-master\venv\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "D:\Downloads\SOFTWARE\Whisper\Whsiper-WebUI-master\venv\lib\site-packages\anyio_backends_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "D:\Downloads\SOFTWARE\Whisper\Whsiper-WebUI-master\venv\lib\site-packages\anyio_backends_asyncio.py", line 867, in run
result = context.run(func, *args)
File "D:\Downloads\SOFTWARE\Whisper\Whsiper-WebUI-master\modules\model_Inference.py", line 34, in transcribe_file
audio = whisper.load_audio(fileobj.name)
File "D:\Downloads\SOFTWARE\Whisper\Whsiper-WebUI-master\venv\lib\site-packages\whisper\audio.py", line 42, in load_audio
ffmpeg.input(file, threads=0)
File "D:\Downloads\SOFTWARE\Whisper\Whsiper-WebUI-master\venv\lib\site-packages\ffmpeg_run.py", line 313, in run
process = run_async(
File "D:\Downloads\SOFTWARE\Whisper\Whsiper-WebUI-master\venv\lib\site-packages\ffmpeg_run.py", line 284, in run_async
return subprocess.Popen(
File "C:\Program Files\Python\lib\subprocess.py", line 969, in init
self._execute_child(args, executable, preexec_fn, close_fds,
File "C:\Program Files\Python\lib\subprocess.py", line 1438, in _execute_child
hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the specified file.

Host the WebUI on HuggingFace Spaces for a Larger Audience

Congratulations on building a cool web UI! It seems like a very handy tool. Well done!

Have you considered showcasing it on HuggingFace Spaces as well? There could be a few benefits to doing so:

  • Expanded Reach: Introduce your demo to a larger, AI-focused audience.
  • Seamless Integration: The HuggingFace platform offers great compatibility with popular STT/TTS/T-to-T libraries.
  • Engage the Community: Your project can benefit from real-time community feedback and interaction via the 'community' section on your Space.

Additionally, our Community GPU Grants might be of interest for resource support.
We have a step-by-step guide explaining the process of creating a Gradio SDK Space, in case you're interested. 😊 Here are our docs on Community GPU Grants.

VRAM?

venv "C:\Users\pc\Downloads\Compressed\Whisper-WebUI-master\Whisper-WebUI-master\venv\Scripts\Python.exe"

Initializing Model..

Traceback (most recent call last):
File "C:\Users\pc\Downloads\Compressed\Whisper-WebUI-master\Whisper-WebUI-master\app.py", line 24, in
whisper_inf = WhisperInference()
File "C:\Users\pc\Downloads\Compressed\Whisper-WebUI-master\Whisper-WebUI-master\modules\whisper_Inference.py", line 15, in init
self.model = whisper.load_model(name=DEFAULT_MODEL_SIZE, download_root="models/Whisper")
File "C:\Users\pc\Downloads\Compressed\Whisper-WebUI-master\Whisper-WebUI-master\venv\lib\site-packages\whisper_init_.py", line 122, in load_model
return model.to(device)
File "C:\Users\pc\Downloads\Compressed\Whisper-WebUI-master\Whisper-WebUI-master\venv\lib\site-packages\torch\nn\modules\module.py", line 1145, in to
return self._apply(convert)
File "C:\Users\pc\Downloads\Compressed\Whisper-WebUI-master\Whisper-WebUI-master\venv\lib\site-packages\torch\nn\modules\module.py", line 797, in _apply
module._apply(fn)
File "C:\Users\pc\Downloads\Compressed\Whisper-WebUI-master\Whisper-WebUI-master\venv\lib\site-packages\torch\nn\modules\module.py", line 797, in _apply
module._apply(fn)
File "C:\Users\pc\Downloads\Compressed\Whisper-WebUI-master\Whisper-WebUI-master\venv\lib\site-packages\torch\nn\modules\module.py", line 797, in _apply
module._apply(fn)
[Previous line repeated 2 more times]
File "C:\Users\pc\Downloads\Compressed\Whisper-WebUI-master\Whisper-WebUI-master\venv\lib\site-packages\torch\nn\modules\module.py", line 820, in _apply
param_applied = fn(param)
File "C:\Users\pc\Downloads\Compressed\Whisper-WebUI-master\Whisper-WebUI-master\venv\lib\site-packages\torch\nn\modules\module.py", line 1143, in convert
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 8.00 GiB total capacity; 7.00 GiB already allocated; 0 bytes free; 7.34 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Would I have to manually configure the use of another model? Maybe medium? Or is there another solution?
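Besides switching to a smaller model, the error text itself points at one knob worth trying; a hedged sketch (the value is an example, and the variable must be set before CUDA is initialized):

import os

# Hint PyTorch's CUDA caching allocator to split large blocks, which can
# reduce fragmentation per the error message above.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # imported only after the variable is set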

Issue with large-v3

Which OS are you using?

  • OS: Windows 10 (NVIDIA GPU)

When I select the 'large-v3' model and press the generate button, the following message is displayed, and the process does not continue.

Error transcribing file on line Invalid model size 'large-v3', expected one of: tiny.en, tiny, base.en, base, small.en, small, medium.en, medium, large-v1, large-v2, large

C:\Whisper-WebUI\venv\lib\site-packages\torch\cuda\memory.py:329: FutureWarning: torch.cuda.reset_max_memory_allocated now calls torch.cuda.reset_peak_memory_stats, which resets /all/ peak memory stats.
warnings.warn(

Non-Windows install instructions?

Describe the feature you'd like

I was looking forward to trying this until I got to the part in the README that says:

Run Install.bat from Windows Explorer as a regular, non-administrator user.

Request: Would love to see installation instructions for non-Windows users.

Reason: I'm on a Mac. I also have access to a Linux machine. No Windows.

Timings are off on the first few lines

On the first few lines it seems to output several lines in rapid succession, then hold one line for 30 seconds; after that everything is normal. Note that these early lines fall exactly on the second, which already suggests a bug.

Example:

1
00:00:00,000 --> 00:00:01,000
 Okay.

2
00:00:01,000 --> 00:00:02,000
 Delete a flight plan.

3
00:00:02,000 --> 00:00:03,000
 Autopilot off.

4
00:00:03,000 --> 00:00:04,000
 Altitude 3000.

5
00:00:04,000 --> 00:00:05,000
 Okay.

6
00:00:05,000 --> 00:00:32,500
 Contact surface until aboard.

7
00:00:32,500 --> 00:00:38,500
 ♪♪

8
00:00:38,500 --> 00:00:41,039
 -♪♪

9
00:00:41,039 --> 00:00:43,079
 Who is that?

Which OS are you using?

  • OS: Windows 11

Deactivate Advanced_Parameters

Is there a way to hide the "Advanced_Parameters" section? My goal is to leave it on the default settings, so that people who use the WebUI don't get confused.

Collapsible Advanced Options & Remember Last Settings

  1. Collapsible Advanced Options: Add a foldable section for advanced prompt settings (see the sketch after this list).

Why: The prompt settings can significantly impact the quality of the generated content. A collapsible section would make it easier to fine-tune these settings without cluttering the UI.

  2. Remember Last Settings: Save last-used settings (model, language) as defaults for the next time.

Why: When I need to queue multiple tasks, I have to open multiple windows and input the same settings repeatedly. This feature would save time and effort. Alternatively, adding a queue feature or a default settings option could also solve this issue.
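Gradio already ships a collapsible container that would fit the first request; a minimal sketch (the component names and default values below are placeholders, not the project's actual UI):

import gradio as gr

with gr.Blocks() as demo:
    with gr.Accordion("Advanced Parameters", open=False):  # folded by default
        beam_size = gr.Number(value=1, label="Beam Size")
        no_speech_threshold = gr.Slider(0.0, 1.0, value=0.6,
                                        label="No Speech Threshold")
demo.launch()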

Doesn't use GPU

Which OS are you using?

  • OS: Windows 11
  • Python: 3.10
  • CUDA: 12

When I click to generate subtitles, it doesn't use the GPU at all, only the CPU. Are there other prerequisites or settings that I don't know about?


AMD GPU (GFX803) RX 580

Here is a short description of how to use Whisper-WebUI with older AMD cards (GFX803, RX 580) and ROCm:
https://github.com/viebrix/pytorch-gfx803-for-Whisper

Yesterday I managed to get Whisper (or Whisper-WebUI) running with the GPU (GFX803) RX 580 (8GB).

I tested transcribing a 45-minute German mp4 video to SRT on my local computer (the language setting was also preset to German):

  1. with faster-whisper in CPU mode
  2. with whisper and the RX 580

Results (not really a clean test, because other programs such as Firefox were running during it):
1: faster-whisper with CPU: 27 minutes 44 seconds
2: whisper with GPU RX 580 8GB (gfx803): 22 minutes 54 seconds

So this is not really a bug report; it's more a solution to a bug report I didn't post some days ago, because it wasn't easy to use these old AMD GPUs with your fantastic software. The openai/whisper repository hasn't found a solution for them either.

T2T translate "Something went wrong Connection errored out."

  1. I cloned this repo and ran it in a Colab environment.
  2. Generating an SRT subtitle file from the YouTube tab worked fine.
  3. In the T2T translate tab, uploading a subtitle file and running it fails with the "Something went wrong / Connection errored out." message. It doesn't print any error message either, so I can't tell what the problem is.

Translated srt timestamps are rounded to the nearest second

In a translated output SRT file, all timings fall exactly on the second (000 ms), while in the original language they do not.

This happens when checking the "Translate to English" checkbox, not when using T2T translation.

Which OS are you using?

  • OS: Windows 11

No event data

I followed the guide to install, and after converting video or subtitle files I got an error.

Can you tell me how to fix it?

Task exception was never retrieved
future: <Task finished name='93dp2rexo86_0' coro=<Queue.process_events() done, defined at D:\project\subtitle2\Whisper-WebUI\venv\lib\site-packages\gradio\queueing.py:329> exception=AssertionError('No event data')>
Traceback (most recent call last):
File "D:\project\subtitle2\Whisper-WebUI\venv\lib\site-packages\gradio\queueing.py", line 343, in process_events
response = await self.call_prediction(awake_events, batch)
File "D:\project\subtitle2\Whisper-WebUI\venv\lib\site-packages\gradio\queueing.py", line 303, in call_prediction
assert data is not None, "No event data"
AssertionError: No event data

what about hebrew

Hi, I would like you to add an option for the Hebrew language, directly or indirectly.

Silero VAD

First of all, thanks for this project; it's very easy to set up and run locally.

Transcribing on this WebUI, the large-v2 model skips the first three sentences in a file I tested, just like what happens with Silero VAD turned off over here: https://huggingface.co/spaces/aadnk/faster-whisper-webui

I guess the VAD is included here (silero_vad.onnx). Is it on by default? Are there any settings I could tweak?
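For reference, faster-whisper itself exposes Silero VAD through the vad_filter option; a hedged sketch of tweaking it directly (the parameter values are examples, and whether the WebUI forwards these settings is exactly what this issue is asking):

from faster_whisper import WhisperModel

model = WhisperModel("large-v2", device="cuda", compute_type="float16")
segments, info = model.transcribe(
    "audio.mp3",
    vad_filter=True,  # drop non-speech with Silero VAD before decoding
    vad_parameters={"min_silence_duration_ms": 500},
)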

Large file recognition error

Which OS are you using?

  • OS: Windows 11

Sometimes, when you add a large file or a large number of files, the download fails and it just keeps uploading.
Can you help me with this issue?

FP16

Which OS are you using?

  • OS: Windows 10

While running, I keep getting fp16 issues.

Tesla M40 / Tesla P40 / NVIDIA 1080 Ti for testing purposes. It might be good to tell the user that these cards are not good at fp16.

AVX2 may also play an important role? AMD 5/9 series.

update:

int8
worked as intended :)

int8_float32
worked as intended :)

Unable to save the transcript for a long file name

Colab

By default, the transcript file is saved under the same name as the video title. However, it can't be saved when the video clip has a long title.

Which OS are you using?

  • OS: Colab
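A hypothetical workaround sketch for this: truncate and sanitize the title before using it as a file name (the helper name and length limit below are mine, not the project's):

def safe_filename(title: str, max_len: int = 100) -> str:
    # Keep filesystem-safe characters and cap the length.
    cleaned = "".join(c for c in title if c.isalnum() or c in " ._-").strip()
    return cleaned[:max_len] or "output"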

Recording with mic doesn't work

I'm using Ubuntu 22.04 LTS.

Once I click the "Record from microphone" button, I can't stop the recording by pressing the button again (nothing happens).
It also doesn't record, and I get the following error:

Running on local URL: http://127.0.0.1:7860

To create a public link, set share=True in launch().
Traceback (most recent call last):
File "/home/b/Whisper-WebUI/venv/lib/python3.10/site-packages/gradio/routes.py", line 439, in run_predict
output = await app.get_blocks().process_api(
File "/home/b/Whisper-WebUI/venv/lib/python3.10/site-packages/gradio/blocks.py", line 1389, in process_api
result = await self.call_function(
File "/home/b/Whisper-WebUI/venv/lib/python3.10/site-packages/gradio/blocks.py", line 1094, in call_function
prediction = await anyio.to_thread.run_sync(
File "/home/b/Whisper-WebUI/venv/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "/home/b/Whisper-WebUI/venv/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2134, in run_sync_in_worker_thread
return await future
File "/home/b/Whisper-WebUI/venv/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 851, in run
result = context.run(func, *args)
File "/home/b/Whisper-WebUI/venv/lib/python3.10/site-packages/gradio/utils.py", line 704, in wrapper
response = f(*args, **kwargs)
File "/home/b/Whisper-WebUI/modules/faster_whisper_inference.py", line 291, in transcribe_mic
self.remove_input_files([micaudio])
File "/home/b/Whisper-WebUI/modules/base_interface.py", line 19, in remove_input_files
if not os.path.exists(file_path):
File "/usr/lib/python3.10/genericpath.py", line 19, in exists
os.stat(path)
TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType

Thanks for the help.

Can't download the file (folder button)

I'm using Ubuntu 22.04.3 LTS.

Everything works except the download: when I press the folder button in the bottom-right corner (I'm guessing it's for downloading the file), I get the following error in the terminal: "sh: 1: start: not found"
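"start" is a Windows-only shell command, which would explain the "sh: 1: start: not found" error on Ubuntu. A hedged sketch of a cross-platform replacement (the helper name is hypothetical, not the project's actual code):

import platform
import subprocess

def open_folder(path: str) -> None:
    system = platform.system()
    if system == "Windows":
        subprocess.Popen(["explorer", path])
    elif system == "Darwin":
        subprocess.Popen(["open", path])
    else:
        subprocess.Popen(["xdg-open", path])  # most Linux desktops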

[FR] REST API support

Hi, I'm trying to build a workflow that can manage YouTube info and transcribe videos in bulk while I'm not using my PC.
I'd also like to be able to modify the output via script.

Automatically Download the Output

Is there a possibilty to configure the script or change something, so that the Output which comes out down below, to automatically download? I want the Output file automatically got created after generating the subititle file.

Thank you.

Download Subtitle Issue

Which OS are you using?

  • OS: Colab

Not able to download the subtitle after clicking the folder button.

How to change specific "buttons"

I see that strings like "No microphone found" or the "Record" button can be found here: Whisper-WebUI/venv/lib/python3.10/site-packages/gradio/templates/frontend/assets/Indexyxyxyxyx

But after changing those, the change won't apply. Can you help me, or am I maybe looking in the wrong folder?

Option for Output File to be named the same as Input Video

If the subtitle and the video file have the same name, most video players automatically detect it as a subtitle for the video. It would be convenient to have an option for the output subtitle file to automatically be given the exact same name as the video file used to generate the subtitles (a sketch follows below).
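A sketch of the requested behavior, assuming the extension is configurable (the helper name is hypothetical):

from pathlib import Path

def subtitle_path_for(video_path: str, ext: str = ".srt") -> Path:
    # movie.mp4 -> movie.srt, so players can auto-detect the subtitle.
    return Path(video_path).with_suffix(ext)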
