dsymbol / decipher
Effortlessly add AI-generated transcription subtitles to your videos
License: MIT License
Thanks a lot for this tool!!
I installed it by manually git-cloning the repo and running pip3 install .
The first error I received was:
TypeError: WhisperForConditionalGeneration.__init__() got an unexpected keyword argument 'attn_implementation'
I "fixed" it by removing model_kwargs={"attn_implementation": "sdpa"},
from this line in action.py
The next problem I encountered was AttributeError: module 'torch' has no attribute 'mps'.
I "fixed" that one by commenting out these lines in the same file
and then everything worked and I was able to burn the subtitles into the video file! :-) woo
If it helps, I'm using Python 3.11.2 on macOS 12.7.
thanks again!
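For anyone hitting the same pair of errors: both come from running a newer decipher against an older torch/transformers combination. A version-guarded sketch of the idea (this is not the actual action.py code, just an illustration; the model choice is hypothetical):

import torch
import transformers
from packaging import version
from transformers import pipeline

model_kwargs = {}
# attn_implementation only exists in transformers >= 4.36, and SDPA itself
# needs torch >= 2.1.1, so only request it when the runtime supports it.
if (version.parse(transformers.__version__) >= version.parse("4.36.0")
        and version.parse(torch.__version__) >= version.parse("2.1.1")):
    model_kwargs["attn_implementation"] = "sdpa"

# Older torch builds have no torch.mps / torch.backends.mps at all, so probe defensively.
if getattr(torch.backends, "mps", None) and torch.backends.mps.is_available():
    device = "mps"
elif torch.cuda.is_available():
    device = "cuda"
else:
    device = "cpu"

pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-medium",
    device=device,
    model_kwargs=model_kwargs or None,
)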
Running something like:
decipher transcribe -i ~/video\ with\ spaces.mp4 --model medium
will return an error, since in L#22 there are no single quotes around "input".
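The robust fix on the caller side is to build the ffmpeg command as an argument list rather than one interpolated string, so paths with spaces need no quoting at all. A minimal sketch (file name hypothetical):

import subprocess
from pathlib import Path

video = Path("~/video with spaces.mp4").expanduser()
# Each argument is its own list element, so nothing ever re-splits the path.
subprocess.run(
    ["ffmpeg", "-y", "-i", str(video), "-vn", "-acodec", "copy",
     str(video.with_suffix(".aac"))],
    check=True,
)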
Is it possible to generate an SRT file with a custom file name?
Right now I can't change the generated SRT file's name during the process.
perform X->X speech recognition
Would it be possible to add an option to bulk-translate into all 111 languages and add the subtitle streams to the container?
I have videos that I'd like to upload to YouTube with as many subtitle tracks as possible. As for containers, both mp4 and mkv support separate streams.
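For the muxing half, ffmpeg can already attach multiple SRT files as separate streams. A sketch driving it from Python (file names and language tags are hypothetical, and it assumes the source has no subtitle streams of its own):

import subprocess

subs = {"eng": "video.en.srt", "ger": "video.de.srt"}  # one entry per translated SRT
cmd = ["ffmpeg", "-y", "-i", "video.mkv"]
for path in subs.values():
    cmd += ["-i", path]
cmd += ["-map", "0"]  # keep everything from the source container
for i in range(len(subs)):
    cmd += ["-map", str(i + 1)]  # append each SRT input as its own stream
cmd += ["-c", "copy", "-c:s", "srt"]  # for mp4 output use "-c:s", "mov_text" instead
for i, lang in enumerate(subs):
    cmd += [f"-metadata:s:s:{i}", f"language={lang}"]  # tag each subtitle stream
cmd.append("video.subbed.mkv")
subprocess.run(cmd, check=True)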
Running decipher transcribe -i E02.mp4 --model small
results in an error:
(base) adam@192 decipher % decipher transcribe -i E02.mp4 --model small
Extracting audio file...
Traceback (most recent call last):
  File "/Users/adam/opt/anaconda3/bin/decipher", line 8, in <module>
    sys.exit(main())
  File "/Users/adam/opt/anaconda3/lib/python3.9/site-packages/decipher/__main__.py", line 92, in main
    transcribe(
  File "/Users/adam/opt/anaconda3/lib/python3.9/site-packages/decipher/decipher.py", line 18, in transcribe
    run(
  File "/Users/adam/opt/anaconda3/lib/python3.9/site-packages/decipher/decipher.py", line 83, in run
    p = subprocess.run(command, text=True)
  File "/Users/adam/opt/anaconda3/lib/python3.9/subprocess.py", line 505, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/Users/adam/opt/anaconda3/lib/python3.9/subprocess.py", line 951, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/Users/adam/opt/anaconda3/lib/python3.9/subprocess.py", line 1821, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'ffmpeg -y -i /Users/adam/Git/decipher/E02.mp4 -vn -acodec copy E02.aac'
Running on MacBook Pro M1
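The last line shows the whole ffmpeg command being treated as a single executable name, which is what happens on macOS/Linux when subprocess.run() receives one string without shell=True. Assuming run() in decipher.py passes the command string straight through, either fix below would work (the command here is hypothetical):

import shlex
import subprocess

command = "ffmpeg -y -i E02.mp4 -vn -acodec copy E02.aac"

# Option 1: split the string into an argv list before running it.
subprocess.run(shlex.split(command), text=True, check=True)

# Option 2 (shown elsewhere in this thread): build the command as a list
# from the start and never concatenate paths into one string.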
You can use
from static_ffmpeg import add_paths
add_paths(weak=True)
to add ffmpeg to your project; it will download a platform-specific ffmpeg binary if ffmpeg is not already on the system PATH.
My own package 'transcribe-anything' does this to great effect.
Generally, when making a video, the narration often isn't coherent, so the meaningless parts need to be cut out. If the video could be edited according to the generated subtitle file, it would greatly reduce the playback time. That would be a really useful tool.
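A rough sketch of that idea, keeping only the time spans that actually carry subtitles (uses the third-party srt package; file names hypothetical):

import subprocess
import srt

with open("video.srt", encoding="utf-8") as f:
    spans = [(s.start.total_seconds(), s.end.total_seconds()) for s in srt.parse(f.read())]

# Select only subtitled frames and rebuild timestamps so the gaps close up.
expr = "+".join(f"between(t,{a:.3f},{b:.3f})" for a, b in spans)
subprocess.run([
    "ffmpeg", "-y", "-i", "video.mp4",
    "-vf", f"select='{expr}',setpts=N/FRAME_RATE/TB",
    "-af", f"aselect='{expr}',asetpts=N/SR/TB",
    "cut.mp4",
], check=True)  # very long subtitle files may hit command-length limits; -filter_script avoids that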
AppData\Local\Programs\Python\Python310\lib\site-packages\whisper\timing.py:58: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
I'm trying to generate an SRT file for a video but I get the following error:
RuntimeError: Failed to load audio: ffmpeg version 5.1.1-1ubuntu1 Copyright (c) 2000-2022 the FFmpeg developers
built with gcc 12 (Ubuntu 12.2.0-1ubuntu1)
configuration: --prefix=/usr --extra-version=1ubuntu1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libglslang --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librabbitmq --enable-librist --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libsvtav1 --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --disable-sndio --enable-pocketsphinx --enable-librsvg --enable-libmfx --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-libplacebo --enable-shared
libavutil 57. 28.100 / 57. 28.100
libavcodec 59. 37.100 / 59. 37.100
libavformat 59. 27.100 / 59. 27.100
libavdevice 59. 7.100 / 59. 7.100
libavfilter 8. 44.100 / 8. 44.100
libswscale 6. 7.100 / 6. 7.100
libswresample 4. 7.100 / 4. 7.100
libpostproc 56. 6.100 / 56. 6.100
[aac @ 0x55f18b9979c0] Format aac detected only with low score of 1, misdetection possible!
chuck.s01e02.bluray.1080p.DD5.1.H265-d3g.aac: End of file
I'm not really sure why; to be clear, I can watch the video itself without any audio trouble.
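Judging by the filename, DD5.1 is Dolby Digital, i.e. AC-3 rather than AAC, so the extraction step most likely stream-copied a non-AAC track into a .aac file, which ffmpeg then can't parse (hence the "detected only with low score of 1" warning). Re-encoding the audio instead of copying it should avoid this; see the codec-aware extraction sketch after the ADTS error report further down.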
When I was burning the generated subtitles into the video, an error occurred with the message 'Conversion failed!'. The generated mp4 file is 0 KB.
Description:
Currently, Decipher generates subtitles automatically, one SRT block per segment, without any restriction on length or word count. To improve the user experience and readability, I would like to suggest a max_length or max_words option, ensuring that the generated subtitles comply with the specified limit.
Something like openai/whisper#314.
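A rough sketch of such a post-processing step, splitting each overlong block and apportioning its time span proportionally (uses the third-party srt package; max_words is the hypothetical knob):

import srt

def enforce_max_words(subs, max_words=8):
    out = []
    for sub in subs:
        words = sub.content.split()
        n_chunks = max(1, -(-len(words) // max_words))  # ceiling division
        span = (sub.end - sub.start) / n_chunks
        for i in range(n_chunks):
            out.append(srt.Subtitle(
                index=0,  # sort_and_reindex assigns the real indices below
                start=sub.start + i * span,
                end=sub.start + (i + 1) * span,
                content=" ".join(words[i * max_words:(i + 1) * max_words]),
            ))
    return list(srt.sort_and_reindex(out))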
Hi, just used this to transcribe a video - it worked great. How do I enable translation (into English)?
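Not certain of the exact flag, but the transcribe() signature in the Colab traceback further down takes a task argument, so Whisper's translate task appears to be exposed - check decipher --help for the corresponding option.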
Hello,
This is very similar to something I was planning to do.
Wouldn't it be better to generate .srt files instead? That way one can manually fix errors, adjust the size, disable them, and so on.
If you want a single file, you can separately make a .mkv file and add a subtitle track.
Really love the idea! However, for newbies like me, a GUI would be much preferred. :P Is it possible? (If you need design help, feel free to contact me.)
Thank you!
Tried to transcribe an episode of The Rookie:
Input #0, matroska,webm, from 'C:\Users\Duckers\Downloads\rookie.mkv':
Metadata:
encoder : libebml v1.4.4 + libmatroska v1.7.1
Duration: 00:43:03.65, start: 0.000000, bitrate: 10197 kb/s
Stream #0:0: Video: h264 (High), yuv420p(tv, bt709, progressive), 1920x1080 [SAR 1:1 DAR 16:9], 23.98 fps, 23.98 tbr, 1k tbn (default)
Metadata:
BPS : 9555108
DURATION : 00:43:03.623000000
NUMBER_OF_FRAMES: 61945
NUMBER_OF_BYTES : 3085849879
_STATISTICS_WRITING_APP: mkvmerge v74.0.0 ('You Oughta Know') 64-bit
_STATISTICS_TAGS: BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES
Stream #0:1(eng): Audio: eac3, 48000 Hz, 5.1(side), fltp, 640 kb/s (default)
Metadata:
BPS : 640000
DURATION : 00:43:03.648000000
NUMBER_OF_FRAMES: 80739
NUMBER_OF_BYTES : 206691840
_STATISTICS_WRITING_APP: mkvmerge v74.0.0 ('You Oughta Know') 64-bit
_STATISTICS_TAGS: BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES
Stream #0:2(eng): Subtitle: subrip
Metadata:
BPS : 105
DURATION : 00:42:58.507000000
NUMBER_OF_FRAMES: 1178
NUMBER_OF_BYTES : 34122
_STATISTICS_WRITING_APP: mkvmerge v74.0.0 ('You Oughta Know') 64-bit
_STATISTICS_TAGS: BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES
Stream #0:3(eng): Subtitle: subrip (hearing impaired)
Metadata:
title : SDH
BPS : 113
DURATION : 00:42:58.507000000
NUMBER_OF_FRAMES: 1299
NUMBER_OF_BYTES : 36505
_STATISTICS_WRITING_APP: mkvmerge v74.0.0 ('You Oughta Know') 64-bit
_STATISTICS_TAGS: BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES
[adts @ 0000026a2c79e600] Only AAC streams can be muxed by the ADTS muxer
[out#0/adts @ 0000026a2c785c80] Could not write header (incorrect codec parameters ?): Invalid argument
Error opening output file rookie.aac.
Error opening output files: Invalid argument
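The probe output above explains it: the audio stream is E-AC-3 (Stream #0:1), and the extraction step stream-copies it into a .aac output, but the ADTS muxer only accepts real AAC. A hedged sketch of a codec-aware extraction (function name and layout are illustrative, not decipher's actual code):

import json
import subprocess

def extract_audio(video: str, out_base: str) -> str:
    # Ask ffprobe (ships with ffmpeg) for the first audio stream's codec.
    probe = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "a:0",
         "-show_entries", "stream=codec_name", "-of", "json", video],
        capture_output=True, text=True, check=True,
    )
    codec = json.loads(probe.stdout)["streams"][0]["codec_name"]
    if codec == "aac":
        # Safe to stream-copy straight into an ADTS .aac file.
        args, out = ["-acodec", "copy"], out_base + ".aac"
    else:
        # eac3/ac3/dts/... must be re-encoded before an AAC output will accept them.
        args, out = ["-c:a", "aac"], out_base + ".m4a"
    subprocess.run(["ffmpeg", "-y", "-i", video, "-vn", *args, out], check=True)
    return out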
So I have a 3060 and a Core i9-12900KF, and when I'm using decipher I'm getting the error: UserWarning: FP16 is not supported on CPU; using FP32 instead
warnings.warn("FP16 is not supported on CPU; using FP32 instead")
But also, this software is absolutely amazing - you have done an absolutely amazing job!
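That warning isn't actually an error: it means Whisper is running on the CPU, which with a 3060 present usually points to a CPU-only PyTorch build rather than a decipher bug. A quick check:

import torch
# False here despite an NVIDIA GPU means a CPU-only torch build is installed;
# reinstalling a CUDA-enabled build from pytorch.org should fix it.
print(torch.cuda.is_available())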
Hello! Whenever I try to run Decipher on Google Colab, this error message pops up. I think it might be related to your implementation of insanely-fast-whisper, but don't quote me on that (I'm not a coder at all lol):
ImportError Traceback (most recent call last)
in <cell line: 18>()
16 dir = os.getcwd()
17
---> 18 transcribe(
19 input,
20 output_dir if output_dir else "result",
7 frames
/usr/local/lib/python3.10/dist-packages/decipher/action.py in transcribe(video_in, output_dir, model, language, task, batch_size, subs)
75
76 temp_srt = mktemp(suffix=".srt", dir=os.getcwd())
---> 77 audio_to_srt(audio_file, temp_srt, model, task, language, batch_size)
78 os.remove(audio_file)
79 srt_filename = video_in.stem + ".srt"
/usr/local/lib/python3.10/dist-packages/decipher/action.py in audio_to_srt(audio_file, temp_srt, model, task, language, batch_size)
35 print(f"{device.upper()} is being used for this transcription, this process may take a while.")
36
---> 37 pipe = pipeline(
38 "automatic-speech-recognition",
39 model=f"openai/whisper-{model}",
/usr/local/lib/python3.10/dist-packages/transformers/pipelines/__init__.py in pipeline(task, model, config, tokenizer, feature_extractor, image_processor, framework, revision, use_fast, token, device, device_map, torch_dtype, trust_remote_code, model_kwargs, pipeline_class, **kwargs)
903 if isinstance(model, str) or framework is None:
904 model_classes = {"tf": targeted_task["tf"], "pt": targeted_task["pt"]}
--> 905 framework, model = infer_framework_load_model(
906 model,
907 model_classes=model_classes,
/usr/local/lib/python3.10/dist-packages/transformers/pipelines/base.py in infer_framework_load_model(model, config, model_classes, task, framework, **model_kwargs)
277
278 try:
--> 279 model = model_class.from_pretrained(model, **kwargs)
280 if hasattr(model, "eval"):
281 model = model.eval()
/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py in from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
559 elif type(config) in cls._model_mapping.keys():
560 model_class = _get_model_class(config, cls._model_mapping)
--> 561 return model_class.from_pretrained(
562 pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs, **kwargs
563 )
/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py in from_pretrained(cls, pretrained_model_name_or_path, config, cache_dir, ignore_mismatched_sizes, force_download, local_files_only, token, revision, use_safetensors, *model_args, **kwargs)
3367
3368 config = copy.deepcopy(config) # We do not want to modify the config inplace in from_pretrained.
-> 3369 config = cls._autoset_attn_implementation(
3370 config, use_flash_attention_2=use_flash_attention_2, torch_dtype=torch_dtype, device_map=device_map
3371 )
/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py in _autoset_attn_implementation(cls, config, use_flash_attention_2, torch_dtype, device_map, check_device_map)
1367 elif requested_attn_implementation in [None, "sdpa"]:
1368 # use_flash_attention_2 takes priority over SDPA, hence SDPA treated in this elif.
-> 1369 config = cls._check_and_enable_sdpa(
1370 config,
1371 hard_check_only=False if requested_attn_implementation is None else True,
/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py in _check_and_enable_sdpa(cls, config, hard_check_only)
1529 )
1530 if not is_torch_sdpa_available():
-> 1531 raise ImportError(
1532 "PyTorch SDPA requirements in Transformers are not met. Please install torch>=2.1.1."
1533 )
ImportError: PyTorch SDPA requirements in Transformers are not met. Please install torch>=2.1.1.
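The last line names the fix: the Colab runtime's preinstalled torch predates 2.1.1, so upgrading it in a cell before importing decipher - e.g. !pip install -U "torch>=2.1.1" - and then restarting the runtime should clear the ImportError.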