
auto-synced-translated-dubs's Introduction

Auto Synced & Translated Dubs

Automatically translates the text of a video into your chosen languages based on a subtitle file, and uses AI voices to dub the video, keeping it properly synced to the original video using the subtitle's timings.

How It Works

If you already have a human-made SRT subtitles file for a video, this will:

  1. Use Google Cloud/DeepL to automatically translate the text, and create new translated SRT files
  2. Use the timings of the subtitle lines to calculate the correct duration of each spoken audio clip
  3. Create text-to-speech audio clips of the translated text (using more realistic neural voices)
  4. Stretch or shrink each translated audio clip to be exactly the same length as the original speech.
    • Optional (On by Default): Instead of stretching the audio clips, you can do a second pass at synthesizing each clip through the API, using the proper speaking speed calculated during the first pass. This slightly improves audio quality.
    • If using Azure TTS, this entire step is unnecessary because Azure allows specifying the desired duration of the speech before synthesis
  5. Build the audio track by inserting the new audio clips at their correct time points, so the translated speech remains perfectly in sync with the original video (a rough sketch follows below).
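As a rough illustration of step 5, here is a minimal sketch of how the clips can be placed onto a silent track with pydub (which the project uses for audio handling); the file names and timings are made up:

    from pydub import AudioSegment

    # Illustrative inputs: each subtitle line has a synthesized clip and a start time.
    subs = [
        {"tts_file": "workingFolder/1.mp3", "start_ms": 100},
        {"tts_file": "workingFolder/2.mp3", "start_ms": 6300},
    ]
    total_audio_length_ms = 60_000  # duration of the original video's audio

    # Start from silence exactly as long as the original audio, then overlay
    # each clip at its subtitle start time so the dub stays in sync.
    canvas = AudioSegment.silent(duration=total_audio_length_ms)
    for line in subs:
        clip = AudioSegment.from_file(line["tts_file"])
        canvas = canvas.overlay(clip, position=line["start_ms"])

    canvas.export("dubbed_track.mp3", format="mp3")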

More Key Features

  • Creates translated versions of the SRT subtitle file
  • Batch processing of multiple languages in sequence
  • Config files to save translation, synthesis, and language settings for re-use
  • Allows detailed control over how the text is translated and synthesized
    • Including: A "Don't Translate" phrase list, a manual translation list, a phoneme pronunciation list, and more

Additional Included Tools

  • TrackAdder.py: Adds all language audio tracks to a video file
    • With ability to merge a sound effects track into each language track
  • TitleTranslator.py: Translates a YouTube video Title and Description to multiple languages
  • TitleDescriptionUpdater.py: Uses the YouTube API to update the localized titles and descriptions of a YouTube video using the output of TitleTranslator.py
  • SubtitleTrackRemover.py: Uses the YouTube API to remove a specific subtitle track from a YouTube video
  • TranscriptTranslator.py: Translates an entire transcript of text
  • TranscriptAutoSyncUploader.py: Using YouTube API, it lets you upload a transcript for a video, then have YouTube sync the text to the video
    • You can also upload multiple pre-translated transcripts and have YouTube sync them, assuming the language is supported
  • YouTube_Synced_Translations_Downloader.py: Using YouTube API, translate the captions of a video into the specified languages, then download the auto-synced subtitle file created by YouTube

Instructions

External Requirements:

  • ffmpeg: used for audio processing and stretching (and required by TrackAdder.py); install it and make sure it is available on your system's PATH

Optional External Tools:

  • Optional: Instead of ffmpeg for audio stretching, you can use the program 'rubberband'
    • I've actually found ffmpeg works better, but I'll still leave the rubberband option if you want it.
    • If using rubberband, you'll need the rubberband binaries. Specifically, on [this page](https://breakfastquay.com/rubberband/), find the download link for "Rubber Band Library v3.3.0 command-line utility" (pick the Windows or macOS version as appropriate). Then extract the archive to find:
      • On Windows: rubberband.exe, rubberband-r3.exe, and sndfile.dll
      • On macOS: rubberband and rubberband-r3
    • It doesn't need to be installed; just put the above-mentioned files in the same directory as main.py (a usage sketch follows below)
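For reference, stretching a clip with the rubberband CLI looks roughly like this (a sketch; the 1.25 stretch factor is arbitrary, and the exact invocation the script uses may differ):

    import subprocess

    # Make the output 1.25x the input's duration (values below 1 shrink the clip).
    subprocess.run(
        ["rubberband", "--time", "1.25", "clip.wav", "clip_stretched.wav"],
        check=True,
    )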

Setup & Configuration

  1. Download or clone the repo and install the requirements using pip install -r requirements.txt
    • I wrote this using Python 3.9 but it will probably work with earlier versions too
  2. Install the programs mentioned in the 'External Requirements' above.
  3. Set up your Google Cloud (see Wiki), Microsoft Azure API access, and/or DeepL API token, and set the variables in cloud_service_settings.ini.
    • I recommend Azure for TTS voice synthesis because, in my opinion, they have newer and better voices at higher quality (Azure supports sample rates up to 48 kHz vs 24 kHz with Google).
    • Google Cloud is faster, cheaper, and supports more languages for text translation, but you can also use DeepL.
  4. Set up your configuration settings in config.ini. The default settings should work in most cases, but read through them especially if you are using Azure for TTS because there are more applicable options you may want to customize.
    • This config includes options such as the ability to skip text translation, setting formats and sample rate, and using two-pass voice synthesizing
  5. Finally open batch.ini to set the language and voice settings that will be used for each run.
    • In the top [SETTINGS] section you will enter the path to the original video file (used to get the correct audio length), and the original subtitle file path
    • You can also use the enabled_languages variable to list all the languages to translate and synthesize in one run. The numbers correspond to the [LANGUAGE-#] sections in the same config file; the program will process only the languages listed in this variable.
    • This lets you add as many language presets as you want (such as the preferred voice per language) and choose which languages to use (or not use) for any given run. A sample batch.ini is sketched after this list.
    • Make sure to check supported languages and voices for each service in their respective documentation.
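For illustration, a minimal batch.ini could look like the following. The [SETTINGS] keys match ones mentioned elsewhere in this document; the keys inside the [LANGUAGE-#] sections are illustrative guesses, so check the bundled batch.ini for the exact names.

    [SETTINGS]
    original_video_file_path = E:\VideosTranslation\video.mp4
    srt_file_path = E:\VideosTranslation\subtitles.srt
    enabled_languages = 1,2

    [LANGUAGE-1]
    ; key names below are illustrative - see the bundled batch.ini
    translation_target_language = es
    synth_voice_name = es-ES-ElviraNeural

    [LANGUAGE-2]
    translation_target_language = de
    synth_voice_name = de-DE-KatjaNeural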

Usage Instructions

  • How to Run: After configuring the config files, simply run the main.py script using python main.py and let it run to completion
    • Resulting translated subtitle files and dubbed audio tracks will be placed in a folder called 'output'
  • Optional: You can use the separate TrackAdder.py script to automatically add the resulting language tracks to an mp4 video file. Requires ffmpeg to be installed (a rough ffmpeg sketch follows this list).
    • Open the script file with a text editor and change the values in the "User Settings" section at the top.
    • This will label the tracks so the video file is ready to be uploaded to YouTube. HOWEVER, the multiple audio tracks feature is only available to a limited number of channels. You will most likely need to contact YouTube creator support to ask for access, but there is no guarantee they will grant it.
  • Optional: You can use the separate TitleTranslator.py script if uploading to YouTube, which lets you enter a video's Title and Description; the text will be translated into all the languages enabled in batch.ini. They will be placed together in a single text file in the "output" folder.
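To make the track-adding step concrete, here is a minimal sketch of how one labeled audio track can be appended with ffmpeg. This is a hedged illustration, not necessarily the exact command TrackAdder.py builds, and the file names are placeholders.

    import subprocess

    # Copy all original streams, append the Spanish dub as an extra audio
    # stream, and tag its language (ISO 639-2 code) so players/YouTube can label it.
    subprocess.run([
        "ffmpeg", "-i", "video.mp4", "-i", "Spanish.mp3",
        "-map", "0", "-map", "1:a",         # keep everything from the video, add the new audio
        "-c", "copy",                       # no re-encoding
        "-metadata:s:a:1", "language=spa",  # label the second audio stream
        "output.mp4",
    ], check=True)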

Additional Notes:

  • This works best with subtitles that do not remove gaps between sentences and lines.
  • For now, the process assumes there is only one speaker. However, if you can make separate SRT files for each speaker, you could generate each TTS track separately using different voices, then combine them afterwards.
  • It supports both Google Translate API and DeepL for text translation, and Google, Azure, and Eleven Labs for Text-To-Speech with neural voices.
  • This script was written with my own personal workflow in mind. That is:
    • I use OpenAI Whisper to transcribe the videos locally, then use Descript to sync that transcription and touch it up with corrections.
    • Then I export the SRT file with Descript, which is ideal because it does not just butt the start and end times of each subtitle line next to each other. This means the resulting dub will preserve the pauses between sentences from the original speech. If you use subtitles from another program, you might find the pauses between lines are too short.
    • The SRT export settings in Descript that seem to work decently for dubbing are 150 max characters per line, and 1 max line per card.
  • The "Two Pass" synthesizing feature (can be enabled in the config) will drastically improve the quality of the final result, but will require synthesizing each clip twice, therefore doubling any API costs.

Currently Supported Text-To-Speech Services:

  • Microsoft Azure
  • Google Cloud
  • Eleven Labs

Currently Supported Translation Services:

  • Google Translate
  • DeepL

For more information on the languages supported by each service, see the services' own documentation.


For Result Examples See: Examples Wiki Page

For Planned Features See: Planned Features Wiki Page

For Google Cloud Project Setup Instructions See: Instructions Wiki Page

For Microsoft Azure Setup Instructions See: Azure Instructions Wiki Page

auto-synced-translated-dubs's People

Contributors

sofiadparamo, switchalpha, thiojoe


auto-synced-translated-dubs's Issues

Having a large subtitles file causes a crash (chunk translation)

The usage of batch translation causes the program to crash due to a list index out of range error, as seen here:

...
  File "Auto-Synced-Translated-Dubs\main.py", line 425, in translate_dictionary
    inputSubsDict[key]['translated_text'] = translatedTexts[i]
IndexError: list index out of range

This is due to the program iterating through the chunks of translated text while indexing over the range of the whole list of texts to translate, causing it to crash if there are more than 100 texts in the SRT file.

The translated list only ever contains 100 texts at a time (one chunk), but the for loop indexes beyond that.

main.py (the problem is on line 425):

424| for i, key in enumerate(inputSubsDict):
425|                    inputSubsDict[key]['translated_text'] = translatedTexts[i]
426|                    # Print progress, overwrite the same line
427|                    print(f' Translated: {key} of {len(inputSubsDict)}', end='\r')
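A hedged sketch of one possible fix, reusing the names from the snippet above (translate_chunk is a hypothetical stand-in for the per-chunk API call): accumulate every chunk's results first, so the index range matches the whole dictionary.

    # Sketch of a possible fix, not the repo's actual code:
    allTranslatedTexts = []
    for chunk in textChunks:  # the batches of <=100 lines sent to the API
        allTranslatedTexts.extend(translate_chunk(chunk))  # hypothetical per-chunk call

    # Now the index range below matches the full dictionary:
    for i, key in enumerate(inputSubsDict):
        inputSubsDict[key]['translated_text'] = allTranslatedTexts[i]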

Does not work on machines without a web browser

Since I am running this in a VM without a GUI, it can't start a browser.
The program should show a URL that can be pasted into a browser on the laptop.

--- the dialog looks like this ---

/opt/git/Auto-Synced-Translated-Dubs.local# python3 ./main.py

Please login using the browser window that opened just now.
Waiting for authorization. See message above.

Restrict maximum and minimum speed factor

I just watched your YouTube video with the Hindi audio track, and there were parts of the video where the AI spoke incredibly fast or incredibly slow. I think there needs to be a compromise between millisecond accuracy and speech speed.

Maybe during the second pass the script can create an array of speed factors, and if an element is too high or too low, it can try to average it with its neighbours. E.g., [ ... 1.05, 1.5, 1.1 ...] would get averaged out to [ ... 1.2, 1.2, 1.2 ...] or something like that.
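A minimal sketch of that smoothing idea (the clamping thresholds are made-up values for illustration):

    def smooth_speed_factors(factors, low=0.8, high=1.3):
        """Average outlier speed factors with their neighbours, then clamp."""
        smoothed = list(factors)
        for i, f in enumerate(factors):
            if f < low or f > high:
                window = factors[max(0, i - 1):i + 2]  # the element and its neighbours
                smoothed[i] = sum(window) / len(window)
        return [min(max(f, low), high) for f in smoothed]

    print(smooth_speed_factors([1.05, 1.5, 1.1]))  # the 1.5 outlier gets pulled toward ~1.2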

There should be compatibility with local and open source programs

Why?

The reason is to stop relying on these cloud services entirely. It would also reduce API costs (local tools do not require credits, and some open-source APIs are self-hostable with configurable servers, which would help).

Examples of these types of projects

Translation

LibreTranslate is an open-source, self-hostable API. Some servers (including the official one) require credits, while others do not.
TranslateLocally and Firefox Translations are local tools that are similar to each other and perform a little better.

Note: DO NOT CONFUSE "Firefox Translations" WITH "To Google Translate".

Text To Speech

Coqui.ai's open-source engine works similarly to the cloud TTS engines you use. It is a self-hostable API that you could host locally. There are more accurate TTS engines out there (I listed this one as an example).

Implementation:

Translation

For LibreTranslate, you can use LibreTranslate-py for the Python API.
For TranslateLocally, you could probably run the commands from its website via Python.

TTS

For Coqui.ai, you could use the examples provided.
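For LibreTranslate specifically, a minimal sketch with the libretranslatepy package (the server URL is an example; public instances may require an API key):

    from libretranslatepy import LibreTranslateAPI

    lt = LibreTranslateAPI("https://libretranslate.com/")  # or a self-hosted instance
    print(lt.translate("Hello world", "en", "es"))  # source "en", target "es"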

Sections of translated audio slightly overlap.

I'm using the Google APIs for both translation and dubbing. I've tested using multiple languages, including Spanish, Portuguese, and Arabic.

In all of my tests, there is some overlapping of some of the words. It's almost as if sections of audio were combined together, but each section starts before the previous one finishes.

Is there some setting I'm missing?

Here is a 4-second example. The Spanish should be "Hay mucha sintaxis. Hay muchas cosas pequeñas que si te equivocas..." In the subtitle file between the first sentence of that text and the second sentence is a section break. That is the part where the overlap occurs.

overlap_spanish.mov

Program crashes if no output/workingFolder dirs

The program crashes if the output or workingFolder directories don't exist at execution time, a common problem on first-time runs.

...
  File "Auto-Synced-Translated-Dubs\main.py", line 310, in translate_dictionary
    with open(translatedSrtFileName, 'w', encoding='utf-8') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'output\\video - Spanish - es.srt'

After creating the output directory, it crashes with:

...
  File "Auto-Synced-Translated-Dubs\TTS.py", line 237, in synthesize_text_azure_batch
    for filename in os.listdir('workingFolder'):
FileNotFoundError: [WinError 3] El sistema no puede encontrar la ruta especificada: 'workingFolder'
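A hedged sketch of the usual fix: create both folders up front so neither write fails (the folder names come from the tracebacks above):

    import os

    # Create the folders if missing; exist_ok avoids an exception when they already exist.
    for folder in ("output", "workingFolder"):
        os.makedirs(folder, exist_ok=True)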

HTTP 400 from Google (googleapiclient.errors.HttpError:)

Hello!

I've tried using it, but I'm getting error 400 from Google.

(screenshot of the error attached)

Any ideas? I've tried resetting my client secret, but that didn't work.

Also here is the SRT file:
subtitles.zip

Translating text using Google...
Traceback (most recent call last):
 File "D:\auto-dub\main.py", line 262, in <module>
   process_language(langData)
 File "D:\auto-dub\main.py", line 240, in process_language
   individualLanguageSubsDict = translate.translate_dictionary(individualLanguageSubsDict, langDict, skipTranslation=skipTranslation)
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 File "D:\auto-dub\translate.py", line 169, in translate_dictionary
   ).execute()
     ^^^^^^^^^
 File "C:\Users\pdv\AppData\Local\Programs\Python\Python311\Lib\site-packages\googleapiclient\_helpers.py", line 130, in positional_wrapper
   return wrapped(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^
 File "C:\Users\pdv\AppData\Local\Programs\Python\Python311\Lib\site-packages\googleapiclient\http.py", line 938, in execute
   raise HttpError(resp, content, uri=self.uri)
googleapiclient.errors.HttpError: <HttpError 400 when requesting https://translation.googleapis.com/v3beta1/projects/your-project-name:translateText?alt=json returned "Invalid 'parent'.; Invalid resource name. Missing locations/ part.;  Resource type: location". Details: "Invalid 'parent'.; Invalid resource name. Missing locations/ part.;  Resource type: location">

Apparently doesn't work with free tier + issue with encoding the tts audio files

I tried to run main.py using the F0 (free) tier of Azure, but I get this error message for a split second: "Failed to submit batch synthesis job: Only 'Standard' subscriptions for this region of the called service are valid." What is the problem here? Removing unnecessary folders then got me this error: "no such file or directory".

[Feature Request] Add a configuration option for the output directory

Why

People like to customize the output directory to fit their workflow and to have everything cleanly in one place.

How to implement

Just add an option in config.ini, parse the variable in the script, and pass it in.

If you are very busy, or have no idea how to implement it, I (or another contributor) could do it for you...

Is this feature bloat?

Depends. It could be implemented; see if people use it, then remove it if it's unpopular. Also, it doesn't really slow down the script...

MacOS support?

You'll need the binaries for a program called 'rubberband' ( https://breakfastquay.com/rubberband/ ) . Doesn't need to be installed, just put both exe's and the dll file in the same directory as the scripts.

Given this information, what should I do on macOS about this dependency?

break_until_next

I got the following error while executing the program; how can I fix it?
subsDict[str(int(line)-1)]['break_until_next'] = processedTime1 - int(subsDict[str(int(line) - 1)]['end_ms'])

KeyError: '1'

Combine characters does not work on pre-translated subtitles

I always use pre-translated SRT files because I want full control over the dub. I had been using version 0.10.0 until yesterday, when I updated to version 0.14.1 and noticed that the number of audio clips processed was the same as the number of subtitles, resulting in unwanted pauses between two or more subtitles that make up a single sentence.

Can this setting be reintegrated?

Cloud IAM permission 'cloudtranslate.generalModels.predict' denied

Hi,
I'm getting this error even after I successfully logged in to my Google account and 'token.pickle' was created.

    raise HttpError(resp, content, uri=self.uri)
googleapiclient.errors.HttpError: <HttpError 403 when requesting https://translation.googleapis.com/v3beta1/projects/sasa:translateText?alt=json returned "Cloud IAM permission 'cloudtranslate.generalModels.predict' denied.". Details: "Cloud IAM permission 'cloudtranslate.generalModels.predict' denied. "


Thank you.

Problem using the Batch processing (Azure)

When I try to Batch Process with Azure, it gives me this error.

Translating text using Google...

Waiting for Azure batch synthesis job to finish. Status: [NotStarted]
ERROR: Batch synthesis job failed!
Reason:OK
Traceback (most recent call last):
  File "F:\Auto-Synced-Translated-Dubs-0.14.1\main.py", line 281, in <module>
    process_language(langData, processedCount, totalLanguages)
  File "F:\Auto-Synced-Translated-Dubs-0.14.1\main.py", line 267, in process_language
    individualLanguageSubsDict = audio_builder.build_audio(individualLanguageSubsDict, langDict, totalAudioLength, config['two_pass_voice_synth'])
  File "F:\Auto-Synced-Translated-Dubs-0.14.1\Scripts\audio_builder.py", line 76, in build_audio
    rawClip = AudioSegment.from_file(value['TTS_FilePath'], format="mp3", frame_rate=int(config['synth_sample_rate']))
KeyError: 'TTS_FilePath'

F:\Auto-Synced-Translated-Dubs-0.14.1>

Just for information, I am on the standard subscription, not the free one. So, I tried to deactivate the Azure Batch process, and it shows me the error:

Traceback (most recent call last):
  File "F:\Auto-Synced-Translated-Dubs-0.14.1\main.py", line 281, in <module>
    process_language(langData, processedCount, totalLanguages)
  File "F:\Auto-Synced-Translated-Dubs-0.14.1\main.py", line 267, in process_language
    individualLanguageSubsDict = audio_builder.build_audio(individualLanguageSubsDict, langDict, totalAudioLength, config['two_pass_voice_synth'])
  File "F:\Auto-Synced-Translated-Dubs-0.14.1\Scripts\audio_builder.py", line 76, in build_audio
    rawClip = AudioSegment.from_file(value['TTS_FilePath'], format="mp3", frame_rate=int(config['synth_sample_rate']))
  File "C:\Users\Brown\AppData\Local\Programs\Python\Python310\lib\site-packages\pydub\audio_segment.py", line 773, in from_file
    raise CouldntDecodeError(
pydub.exceptions.CouldntDecodeError: Decoding failed. ffmpeg returned error code: 1

Output from ffmpeg/avlib:

ffmpeg version 6.0-full_build-www.gyan.dev Copyright (c) 2000-2023 the FFmpeg developers
  built with gcc 12.2.0 (Rev10, Built by MSYS2 project)
  configuration: --enable-gpl --enable-version3 --enable-static --disable-w32threads --disable-autodetect --enable-fontconfig --enable-iconv --enable-gnutls --enable-libxml2 --enable-gmp --enable-bzlib --enable-lzma --enable-libsnappy --enable-zlib --enable-librist --enable-libsrt --enable-libssh --enable-libzmq --enable-avisynth --enable-libbluray --enable-libcaca --enable-sdl2 --enable-libaribb24 --enable-libdav1d --enable-libdavs2 --enable-libuavs3d --enable-libzvbi --enable-librav1e --enable-libsvtav1 --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxavs2 --enable-libxvid --enable-libaom --enable-libjxl --enable-libopenjpeg --enable-libvpx --enable-mediafoundation --enable-libass --enable-frei0r --enable-libfreetype --enable-libfribidi --enable-liblensfun --enable-libvidstab --enable-libvmaf --enable-libzimg --enable-amf --enable-cuda-llvm --enable-cuvid --enable-ffnvcodec --enable-nvdec --enable-nvenc --enable-d3d11va --enable-dxva2 --enable-libvpl --enable-libshaderc --enable-vulkan --enable-libplacebo --enable-opencl --enable-libcdio --enable-libgme --enable-libmodplug --enable-libopenmpt --enable-libopencore-amrwb --enable-libmp3lame --enable-libshine --enable-libtheora --enable-libtwolame --enable-libvo-amrwbenc --enable-libilbc --enable-libgsm --enable-libopencore-amrnb --enable-libopus --enable-libspeex --enable-libvorbis --enable-ladspa --enable-libbs2b --enable-libflite --enable-libmysofa --enable-librubberband --enable-libsoxr --enable-chromaprint
  libavutil      58.  2.100 / 58.  2.100
  libavcodec     60.  3.100 / 60.  3.100
  libavformat    60.  3.100 / 60.  3.100
  libavdevice    60.  1.100 / 60.  1.100
  libavfilter     9.  3.100 /  9.  3.100
  libswscale      7.  1.100 /  7.  1.100
  libswresample   4. 10.100 /  4. 10.100
  libpostproc    57.  1.100 / 57.  1.100
[mp3 @ 000001d67b755900] Failed to read frame size: Could not seek to 1026.
workingFolder\1.mp3: Invalid argument


F:\Auto-Synced-Translated-Dubs-0.14.1>python main.py

What ends up happening is that Azure creates 0 KB .mp3 audio files, and the error that was fixed in #17 reappears. For information, I am using the latest version of ASTD, 0.14.1.


I tried changing add_line_buffer_milliseconds = 0 to add_line_buffer_milliseconds = 1 because I saw it was a possible cause of the error, but it still didn't work, and the issue with empty audio files persists.

HTTPError while running main.py

Hi!
Could someone help me with this error please?

----- Beginning Processing of Languages -----
----- Beginning Processing of Language: es-MX -----
Translating text using Google...

raise HttpError(resp, content, uri=self.uri)

googleapiclient.errors.HttpError: <HttpError 400 when requesting https://translation.googleapis.com/v3beta1/projects/sasa:translateText?alt=json returned "Empty request.". Details: "[{'@type': 'type.googleapis.com/google.rpc.BadRequest', 'fieldViolations': [{'field': 'contents', 'description': 'No text contents provided.'}]}]">


add voice cloning

Would be cool.
I think they have an API, and they write some stuff about different languages on their web page; you probably have to ask, but the voices sound extremely good: https://beta.elevenlabs.io/
The demo has 10,000 free characters of voice cloning.

skip translation

I don't want to translate the subtitles; I only want the project to read out the subtitles with Azure. So, I set skip_translation = True in the config.ini file.
I expected that the project would then not ask me to provide the Google API key (cloud_secrets_file.json) or the DeepL API key, but it still requires one of them.
Would the author of this project please help me?

HTTP Error 403

I got the following problem:

googleapiclient.errors.HttpError: <HttpError 403 when requesting https://translation.googleapis.com/v3beta1/projects/third-framing-374214:translateText?alt=json returned "Cloud Translation API has not been used in project 234603848874 before or it is disabled. Enable it by visiting https://console.developers.google.com/apis/api/translate.googleapis.com/overview?project=234603848874 then retry. If you enabled this API recently, wait a few minutes for the action to propagate to our systems and retry.". Details: "[{'@type': 'type.googleapis.com/google.rpc.Help', 'links': [{'description': 'Google developers console API activation', 'url': 'https://console.developers.google.com/apis/api/translate.googleapis.com/overview?project=234603848874'}]}, {'@type': 'type.googleapis.com/google.rpc.ErrorInfo', 'reason': 'SERVICE_DISABLED', 'domain': 'googleapis.com', 'metadata': {'service': 'translate.googleapis.com', 'consumer': 'projects/234603848874'}}]">

Consider adapting the translation to the duration

Often what is done when translating something to be dubbed is adjusting the translation - not the speed - to cater to audio constraints. It's often possible to say things in multiple ways - meme for reference - still keeping the original meaning or something close to it.

There are articles about controlling the output length of a machine translation, such as this one (I remember reading another but could not find it; I did find a better one, in the Edit below), which can be researched for that.

One idea I had and tested is using the fact that many libraries - such as Hugging Face's Transformers - support returning multiple candidate translations. An example: I used the "Helsinki-NLP/opus-mt-tc-big-itc-itc" model, set num_return_sequences=5 to make it return 5 translations, and translated "Ok" (like the meme) to Spanish. It returned "De acuerdo.", "Está bien.", "Bien.", "Muy bien." and "De acuerdo", which are mostly correct translations (well, at least from what I remember of studying Spanish a long time ago; by the way, the last translation is just the first without a period).

One downside of this idea is that it restricts the choice to models supported by the library, and someone might prefer a proprietary translation model instead. In that case, one possibility is using a summarization model, at least to avoid having to speed up the dub voice to read an overly long translation. Note that I haven't tried this yet, and there is a chance those models might not work well summarizing short sentences.

Edit: this 2021 paper from Amazon AI addresses a lot of things related to this project. Its references are quite good too.
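The multiple-candidates experiment described above can be reproduced roughly like this (a sketch; this OPUS model expects a target-language token such as >>spa<<, and the beam settings are illustrative):

    from transformers import pipeline

    translator = pipeline("translation", model="Helsinki-NLP/opus-mt-tc-big-itc-itc")

    # Ask for 5 candidate translations; a dubbing tool could pick whichever
    # candidate best fits the clip's duration.
    candidates = translator(">>spa<< Ok", num_beams=5, num_return_sequences=5)
    for c in candidates:
        print(c["translation_text"])  # e.g. "De acuerdo.", "Está bien.", "Bien.", ...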

Failed to submit batch synthesis job

[04/28/2023 11:05:10 PM Central Daylight Time] Failed to submit batch synthesis job: {
"statusCode": 401,
"value": {
"code": "Unauthorized",
"message": "Authentication is required to access the resource."
}
}
Traceback (most recent call last):
  File "c:\Laurence\main.py", line 282, in <module>
    process_language(langData, processedCount, totalLanguages)
  File "c:\Laurence\main.py", line 268, in process_language
    individualLanguageSubsDict = audio_builder.build_audio(individualLanguageSubsDict, langDict, totalAudioLength, config['two_pass_voice_synth'])
  File "c:\Laurence\Scripts\audio_builder.py", line 76, in build_audio
    rawClip = AudioSegment.from_file(value['TTS_FilePath'], format="mp3", frame_rate=int(config['synth_sample_rate']))
KeyError: 'TTS_FilePath'

Skipping translation and doing synthesis only is still impossible

I set skip_translation=true because I only want it to synthesize with Azure. When I run main.py, it gives the following message:
------- 'Auto Synced Translated Dubs' script by ThioJoe - Release version 0.13.0 -------

     ----- [!] Error: client_secrets.json file not found -----

----- Did you create a Google Cloud Platform Project to access the API? -----

Press Enter to Exit...

The dubbing is not respecting the pauses that exist in the subtitles.

I'm a Brazilian content creator and I'm using Google; I couldn't get Azure to work.

I generated 3 voiceovers (English, Spanish, and Chinese) to test. The quality is pretty cool, but when checking the editor's timeline, in some moments the subtitle pause time is respected, while in others it is completely ignored and very out of sync with what is being shown in the video. There are several moments in the video where there is no narration, and the dubbing did not respect these times.

In Chinese, for example, it respected the pauses only 3 times in the entire video, which was totally out of sync.

Where should I put the pre-translated SRT file (or what else am I doing wrong)?

I put the pre-translated SRT file in the same directory as main.py, and also copied it into workingFolder. Then it gives me the following message:
------- 'Auto Synced Translated Dubs' script by ThioJoe - Release version 0.13.1 -------

----- Beginning Processing of Languages -----

----- Beginning Processing of Language (1/1): zh-CN -----
Skip translation enabled. Checking for pre-translated subtitles...
Pre-translated subtitles not found for language: zh-CN. Skipping.

Please login using the browser window that opened just now. Error: Client secrets must be for a web or installed app.

Please login using the browser window that opened just now.

Traceback (most recent call last):
  File "C:\Users\lzb\Auto-Synced-Translated-Dubs-0.14.1\Scripts\auth.py", line 147, in first_authentication
    GOOGLE_TTS_API, GOOGLE_TRANSLATE_API = get_authenticated_service() # Create authentication object
  File "C:\Users\lzb\Auto-Synced-Translated-Dubs-0.14.1\Scripts\auth.py", line 105, in get_authenticated_service
    flow = InstalledAppFlow.from_client_secrets_file(secrets_file, scopes=API_SCOPES)
  File "C:\Users\lzb\AppData\Local\Programs\Python\Python39\lib\site-packages\google_auth_oauthlib\flow.py", line 201, in from_client_secrets_file
    return cls.from_client_config(client_config, scopes=scopes, **kwargs)
  File "C:\Users\lzb\AppData\Local\Programs\Python\Python39\lib\site-packages\google_auth_oauthlib\flow.py", line 159, in from_client_config
    raise ValueError("Client secrets must be for a web or installed app.")
ValueError: Client secrets must be for a web or installed app.

[!!!] Error: Client secrets must be for a web or installed app.

Error: Something went wrong during authentication. Try deleting the token.pickle file.
Press Enter to Exit...

[Feature Idea] overlay generated voices on original track

A lot of German media does translation in an interesting way:
the original audio gets reduced in volume and the translated audio gets overlaid on top of it.

I really like this method, and it could save a lot of API calls to the TTS services.

This method would even keep some of the background sounds of the original audio, at least to some extent.

Example

You have an original audio section and a translated audio clip in the video that look like this:

O: Original
T: Translation

O: |----------------------------|
T: |--------------|

Merging these tracks without using the two-pass method could look something like this:

O: |---v------------------^----|
T:       |--------------|

v: turn down the volume
^: turn the audio back up

Possible problems

One problem could be when the translated audio is longer than the original section.
That case would still need a second pass to the TTS API, but overall this would reduce the number of API calls, saving money.

Config

Some of the configuration options could be something like this:

  • whether the generated audio is centered on the original audio
  • what offset the start of this audio has from the start of the section (most of the time it has a short delay when I have seen this method)
  • behaviour for when the translation is longer than the original section
  • volume of the turned down audio

Existing examples

This was done for shows like Top Gear, Pawn Stars, and many documentaries; at least those come to mind for me.
One example is something like this: https://www.youtube.com/watch?v=121t4E3EM48 (first thing I found while searching)
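A minimal sketch of the ducking idea with pydub (the gain reduction and timings are illustrative guesses):

    from pydub import AudioSegment

    original = AudioSegment.from_file("original.wav")
    dub = AudioSegment.from_file("translated_clip.wav")

    start_ms = 5_000              # where the translated section begins
    end_ms = start_ms + len(dub)  # len() of a segment is its duration in ms

    # Turn the original down by 12 dB for the duration of the dub (the "v ... ^"
    # from the diagram above), then overlay the translation on top.
    ducked = original[:start_ms] + (original[start_ms:end_ms] - 12) + original[end_ms:]
    mixed = ducked.overlay(dub, position=start_ms)

    mixed.export("mixed.wav", format="wav")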

Some Azure dll error

Good day. I get this when I try to run the script (I tried to follow all the requirements and instructions as closely as possible). My Python version is 3.9, as the readme.md of this repo suggested.

------- 'Auto Synced Translated Dubs' script by ThioJoe - Release version 0.7.0 -------
Traceback (most recent call last):
  File "C:\Users\Andrey\Downloads\Auto-Synced-Translated-Dubs-main\main.py", line 13, in <module>
    import TTS
  File "C:\Users\Andrey\Downloads\Auto-Synced-Translated-Dubs-main\TTS.py", line 6, in <module>
    import azure.cognitiveservices.speech as speechsdk
  File "C:\Users\Andrey\AppData\Local\Programs\Python\Python39\lib\site-packages\azure\cognitiveservices\speech\__init__.py", line 8, in <module>
    from .speech import *
  File "C:\Users\Andrey\AppData\Local\Programs\Python\Python39\lib\site-packages\azure\cognitiveservices\speech\speech.py", line 13, in <module>
    from .interop import (
  File "C:\Users\Andrey\AppData\Local\Programs\Python\Python39\lib\site-packages\azure\cognitiveservices\speech\interop.py", line 20, in <module>
    _sdk_lib = load_library.LoadLibrary(lib_path)
  File "C:\Users\Andrey\AppData\Local\Programs\Python\Python39\lib\ctypes\__init__.py", line 452, in LoadLibrary
    return self._dlltype(name)
  File "C:\Users\Andrey\AppData\Local\Programs\Python\Python39\lib\ctypes\__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
FileNotFoundError: Could not find module 'C:\Users\Andrey\AppData\Local\Programs\Python\Python39\lib\site-packages\azure\cognitiveservices\speech\Microsoft.CognitiveServices.Speech.core.dll' (or one of its dependencies). Try using the full path with constructor syntax.

What can I do about this?

Problem with synchronization

Hello!

I'm having a problem with synchronization between the original audio and the translated audio.

Some languages are much more concise than others. For example, English vs Spanish. Normally more words are needed to express something in Spanish than it takes to say the same thing in English.

Here's a concrete example:

1
00:00:00,100 --> 00:00:06,195
¿Qué creéis que ocurriría si mezclamos los personajes de Poppy Playtime con los personajes de Rainbow Friends?

As you can see, this phrase needs 6 seconds to be spoken. However, the translation would be this:

1
00:00:00,100 --> 00:00:06,195
What do you think would happen if we mix Poppy Playtime with the Rainbow Friends characters?

And the resulting unsynced audio takes 4 seconds:

1.mp3.zip

When synchronizing, the English audio is forced to 6 seconds, which produces a very unsatisfactory result (very slow).

1.mp3.zip

Would there be a way to tell the program that, when there is such a difference between the durations in the two languages, it should apply a maximum difference of X milliseconds between the original and the translation? For example, max_diff = 1000 ms. That way the English audio would last 6 seconds instead of 7, and it would not sound so weird. A sketch of this idea follows below.

This is especially useful in videos where there is no camera and synchronization is not strictly necessary (if there are empty gaps without audio, the video could be clipped).

Best!
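A sketch of the proposed cap (max_diff is a hypothetical config option, and the numbers follow the SRT example above):

    max_diff_ms = 1000  # hypothetical config value: allowed stretch past the natural length

    def capped_duration(natural_ms: int, slot_ms: int) -> int:
        """Stretch toward the subtitle slot, but never more than max_diff_ms
        past the clip's natural length, so speech doesn't drag unnaturally."""
        return min(slot_ms, natural_ms + max_diff_ms)

    # English clip naturally takes ~4000 ms, but the subtitle slot is ~6095 ms:
    print(capped_duration(4000, 6095))  # -> 5000 instead of the full 6095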

Error after setting batch_tts_synthesize = True

Failed to submit batch synthesis job: {
"code": "Forbidden",
"message": "Only "Standard" subscriptions for the region of the called service are valid."
}
Traceback (most recent call last):
  File "E:\01project\12youtube\02program\Auto-Synced-Translated-Dubs\main.py", line 262, in <module>
    process_language(langData)
  File "E:\01project\12youtube\02program\Auto-Synced-Translated-Dubs\main.py", line 253, in process_language
    individualLanguageSubsDict = audio_builder.build_audio(individualLanguageSubsDict, langDict, totalAudioLength, twoPassVoiceSynth)
  File "E:\01project\12youtube\02program\Auto-Synced-Translated-Dubs\audio_builder.py", line 100, in build_audio
    rawClip = AudioSegment.from_file(value['TTS_FilePath'], format="mp3", frame_rate=nativeSampleRate)
KeyError: 'TTS_FilePath'

I am using the latest code. The project works well when batch_tts_synthesize is set to False.
Please check it.
Thank you.

KeyError: 'google_project_id'

(screenshot attached)

In "cloud service settings" I changed the value of "google_project_id" from "your-project-name" to my own project ID. I think I did it right, but I don't know why I'm getting this error.

Empty audio clips

The biggest issue: Azure creates empty audio clips, though not all of them. I updated the code according to #17, disabled batch_tts_synthesize, and set add_line_buffer_milliseconds to 0.
On a side note, all the clips are downloaded outside the workingFolder and are never deleted.

Screenshots of the terminal and of the app folder afterwards are attached.

Can we simplify the install instructions on the front page?

On the external requirements:
"You'll need the binaries for a program called 'rubberband' ( http://breakfastquay.com/rubberband/ ). Doesn't need to be installed, just put both exe's and the dll file in the same directory as the scripts.

I need more context than what is given. First, there are no exes or dll files in the download provided here, and I don't know which scripts they are meant to go with, if I'm overlooking them.


"Don't Translate" list does not work for Hindi and other Indian languages

Hello. When I put English text in dont_translate_phrases.txt, it correctly does not get translated into any language. But when I translate to Hindi and other Indian languages and put Hindi text in the dont_translate_phrases.txt file, the translation is not skipped. Is there something you can help with?

Created a Google Cloud Platform project but it is still returning an error

I have created a Google Cloud Platform project in the Google Cloud console, but it is returning an error. I pasted the project ID but was unable to set up billing. I don't know how to correct this.

    ----- [!] Error: client_secrets.json file not found -----

----- Did you create a Google Cloud Platform Project to access the API? -----

Access blocked: This app's request is invalid

After running main.py, it takes me to the Google login screen with the following error: "You cannot sign in because this app has sent an invalid request. You can try again later or contact the developer to fix the problem. Learn more about this error. Error 400: redirect_uri_mismatch"

Error in subtitles English translation

Hi,

I'm trying to generate audios from Spanish to English, Portuguese, Russian and more and I'm having issues with the English translation. Only with the English version.

The first lines of the original subtitles are not translated into the first lines of the English .srt, which causes two problems: the first TTS clip starts at an incorrect timestamp, and the last TTS clip finishes at the original timestamp. Let me show you (errors marked with asterisks):

**00:00:07,958** --> 00:00:11,773
and while we were trying to steal the cheese from Mickey.exe

2
00:00:11,993 --> 00:00:17,078
Today while browsing Roblox, this game by Hungry Nora was recommended to me

(...)

115
00:09:54,599 --> 00:09:56,349
remember that if you have arrived new, subscribe and activate

116
00:09:56,349 --> **00:00:07,738**
the bell. a big hug and see you in the next videos guys Goodbye! **In the last days we have had enough complications while we were trying to steal Peppa Pig's food**

"Goodbye" is the last word in the original version, and "In the last days..." is the first sentence.

Original:

1
00:00:00,221 --> 00:00:04,194
En los últimos días hemos tenido bastantes complicaciones

2
00:00:04,454 --> 00:00:07,738
mientras intentábamos robarle la comida a Peppa Pig

(...)

261
00:09:57,793 --> 00:09:59,396
y nos vemos en los siguientes videos chicos

262
00:10:00,210 --> 00:10:01,407
Adios!

Am I doing something wrong?

Thanks!

Clone Voice?

I have to admit this program is absolutely gold.
I also discovered an AI website that, in addition to dubbing the audio, clones the original voice in order to maintain tone and even pronunciation quirks (I don't know if I can leave the link here).
What about adding this feature too?

Subprocess.CalledProcessError

Good morning!

Sorry for what is surely a simple question.

I would like to know what is wrong here. I think the problem is with "totalAudioLength = get_duration(originalVideoFile)".

So, I assume that I inserted the wrong path to the file.

I tried in these ways:
original_video_file_path = E:\VideosTranslation\video.mp4
srt_file_path = E:\VideosTranslation\subtitles.srt

original_video_file_path = "E:\VideosTranslation\video.mp4"
srt_file_path = "E:\VideosTranslation\subtitles.srt"

or even leaving the files in the same folder as main.py and simply putting:

original_video_file_path = video.mp4
srt_file_path = subtitles.srt

But I always have the same error. Is it a problem with some other configuration?


Just to mention, I don't know anything about programming or GitHub either. Possibly I'm thinking it's one kind of error when in reality it's something else entirely.
