manimcommunity / manim-voiceover

Manim plugin for all things voiceover

Home Page: https://voiceover.manim.community/en/stable

License: MIT License

Python 100.00%

ai manim math-animations speech-synthesis text-to-speech tts voice-synthesis voiceover

manim-voiceover's Issues

Remove bookmark tags from subcaptions automatically

Description of proposed feature

The <bookmark> tags used in the voiceover text appear verbatim in the generated subtitles.

It would be good to automatically strip these bookmark tags.

How can the new feature be used?

manim-voiceover should automatically strip any <bookmark> tag before generating the subcaptions.

Additional comments

A workaround is to use the subcaption parameter to add a caption that does not include the bookmark tag.
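
Until such a feature exists, the stripping could be done by hand before passing a caption. The following is a minimal sketch, assuming the tags have the self-closing form `<bookmark mark='A'/>`; `BOOKMARK_RE` and `strip_bookmarks` are hypothetical names and not part of manim-voiceover.

```python
import re

# Assumed tag shape: <bookmark mark='A'/> with single or double quotes.
BOOKMARK_RE = re.compile(r"<bookmark\s+mark\s*=\s*['\"][^'\"]*['\"]\s*/>")


def strip_bookmarks(text: str) -> str:
    """Remove bookmark tags and collapse the whitespace they leave behind.

    Hypothetical helper for illustration; not part of manim-voiceover.
    """
    without_tags = BOOKMARK_RE.sub("", text)
    # A tag surrounded by spaces leaves a double space; squeeze it to one.
    return re.sub(r"\s{2,}", " ", without_tags).strip()


narration = "This is <bookmark mark='A'/>a circle and <bookmark mark=\"B\"/> a square."
print(strip_bookmarks(narration))
```

The cleaned string could then be supplied through the subcaption parameter mentioned above, while the original text with its tags is still used for the voiceover itself.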

Manim crashes when using voiceover with option -s (==save_last_frame)

Description of bug / unexpected behavior

I have a scene file, manim-voiceover-issue.py, that uses voiceover. If I render it with the -s option (== save_last_frame), an exception is thrown (see the Logs section).

Root Cause

The configuration item config.output_file is not yet set when write_subcaption_file is invoked. With -s, it is set only at the very end of the processing chain. If you render a plain Scene instead of a VoiceoverScene, self.subcaptions is empty, so write_subcaption_file is never called and the issue does not occur.
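
The failure can be reproduced in isolation with the standard library alone. This is only a sketch of the mechanism, under the assumption that config.output_file is still None at that point; `subcaption_path` is a hypothetical helper that shows one possible guard and is not manim's code.

```python
from pathlib import Path
from typing import Optional


def subcaption_path(output_file: Optional[str]) -> Optional[Path]:
    """Return the .srt path next to output_file, or None if it is unset.

    Hypothetical helper for illustration; not part of manim.
    """
    if output_file is None:
        # Path(None) would raise a TypeError like the one in the traceback.
        return None
    return Path(output_file).with_suffix(".srt")


# Passing None straight to Path fails in the same way as in the report.
try:
    Path(None)
except TypeError as exc:
    print(f"TypeError: {exc}")

print(subcaption_path(None))
print(subcaption_path("media/videos/VoiceScene.mp4"))
```

A guard of this kind would skip the subcaption file when no output file is known, whereas the proposal in the next section avoids collecting subcaptions in the first place.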

BTW: I am wondering how config.output_file can safely be configured by the user, since it is simply overwritten by the code.

Solution Proposal

I have changed the VoiceoverScene as follows, and this change seems to fix the issue for me.

class VoiceoverScene(Scene):
    """A scene class that can be used to add voiceover to a scene."""

    speech_service: SpeechService
    current_tracker: Optional[VoiceoverTracker]
    create_subcaption: bool
    create_script: bool

    def set_speech_service(
        self,
        speech_service: SpeechService,
        create_subcaption: bool = True,
    ) -> None:
        """Sets the speech service to be used for the voiceover. This method
        should be called before adding any voiceover to the scene.

        Args:
            speech_service (SpeechService): The speech service to be used.
            create_subcaption (bool, optional): Whether to create subcaptions
                for the scene. Defaults to True. If `config.save_last_frame`
                is True, the argument is ignored and no subcaptions are created.
        """
        self.speech_service = speech_service
        self.current_tracker = None
        if config.save_last_frame:
            self.create_subcaption = False
        else:
            self.create_subcaption = create_subcaption

Expected behavior

No exception should be thrown, and the final image should be rendered.

It is also questionable whether the mp3 should be created at all in this mode.

(.venv-3.9) [mk@archlinux media]$ tree
.
|-- images
|   `-- manim-voiceover-issue
`-- voiceovers
    |-- cache.json
    `-- this-circle-is-drawn-as-i-speak-73a82f70.mp3

4 directories, 2 files

How to reproduce the issue

Run the following scene with manim -pql manim-voiceover-issue.py VoiceScene -s -v DEBUG.

Code for reproducing the problem

from manim import *
from manim_voiceover import VoiceoverScene
from manim_voiceover.services.gtts import GTTSService

config.disable_caching = True

class VoiceScene(VoiceoverScene):
    def construct(self):
        self.set_speech_service(GTTSService())
        with self.voiceover(text="This circle is drawn as I speak."):
            self.play(Create(Circle()))
        self.wait()

class StandardScene(Scene):
    def construct(self):
        config.save_last_frame = True
        self.play(Create(Circle()))
        self.wait()

Additional media files

Images/GIFs

Logs

Terminal output

Render with -s, exception occurs

(.venv-3.9) [mk@archlinux gist]$ manim -pql manim-voiceover-issue.py VoiceScene -s -v DEBUG
Manim Community v0.17.3
[09/04/23 15:43:41] DEBUG    Skipping animation 0                                                                                                                                                                                                                                                                            cairo_renderer.py:63
                    DEBUG    List of the first few animation hashes of the scene: [None]                                                                                                                                                                                                                                     cairo_renderer.py:87
                    DEBUG    Animation with empty mobject                                                                                                                                                                                                                                                                        animation.py:174
                    DEBUG    Skipping animation 1                                                                                                                                                                                                                                                                            cairo_renderer.py:63
                    DEBUG    List of the first few animation hashes of the scene: [None, None]                                                                                                                                                                                                                               cairo_renderer.py:87
                    DEBUG    Animation with empty mobject                                                                                                                                                                                                                                                                        animation.py:174
                    DEBUG    Skipping animation 2                                                                                                                                                                                                                                                                            cairo_renderer.py:63
                    DEBUG    List of the first few animation hashes of the scene: [None, None, None]                                                                                                                                                                                                                         cairo_renderer.py:87
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/mk/dev/manim/.venv-3.9/lib/python3.9/site-packages/manim/cli/render/commands.py:115 in     │
│ render                                                                                           │
│                                                                                                  │
│   112 │   │   │   try:                                                                           │
│   113 │   │   │   │   with tempconfig({}):                                                       │
│   114 │   │   │   │   │   scene = SceneClass()                                                   │
│ ❱ 115 │   │   │   │   │   scene.render()                                                         │
│   116 │   │   │   except Exception:                                                              │
│   117 │   │   │   │   error_console.print_exception()                                            │
│   118 │   │   │   │   sys.exit(1)                                                                │
│                                                                                                  │
│ /home/mk/dev/manim/.venv-3.9/lib/python3.9/site-packages/manim/scene/scene.py:233 in render      │
│                                                                                                  │
│    230 │   │   │   return True                                                                   │
│    231 │   │   self.tear_down()                                                                  │
│    232 │   │   # We have to reset these settings in case of multiple renders.                    │
│ ❱  233 │   │   self.renderer.scene_finished(self)                                                │
│    234 │   │                                                                                     │
│    235 │   │   # Show info only if animations are rendered or to get image                       │
│    236 │   │   if (                                                                              │
│                                                                                                  │
│ /home/mk/dev/manim/.venv-3.9/lib/python3.9/site-packages/manim/renderer/cairo_renderer.py:259 in │
│ scene_finished                                                                                   │
│                                                                                                  │
│   256 │   def scene_finished(self, scene):                                                       │
│   257 │   │   # If no animations in scene, render an image instead                               │
│   258 │   │   if self.num_plays:                                                                 │
│ ❱ 259 │   │   │   self.file_writer.finish()                                                      │
│   260 │   │   elif config.write_to_movie:                                                        │
│   261 │   │   │   config.save_last_frame = True                                                  │
│   262 │   │   │   config.write_to_movie = False                                                  │
│                                                                                                  │
│ /home/mk/dev/manim/.venv-3.9/lib/python3.9/site-packages/manim/scene/scene_file_writer.py:468 in │
│ finish                                                                                           │
│                                                                                                  │
│   465 │   │   │   target_dir = self.image_file_path.parent / self.image_file_path.stem           │
│   466 │   │   │   logger.info("\n%i images ready at %s\n", self.frame_count, str(target_dir))    │
│   467 │   │   if self.subcaptions:                                                               │
│ ❱ 468 │   │   │   self.write_subcaption_file()                                                   │
│   469 │                                                                                          │
│   470 │   def open_movie_pipe(self, file_path=None):                                             │
│   471 │   │   """                                                                                │
│                                                                                                  │
│ /home/mk/dev/manim/.venv-3.9/lib/python3.9/site-packages/manim/scene/scene_file_writer.py:729 in │
│ write_subcaption_file                                                                            │
│                                                                                                  │
│   726 │                                                                                          │
│   727 │   def write_subcaption_file(self):                                                       │
│   728 │   │   """Writes the subcaption file."""                                                  │
│ ❱ 729 │   │   subcaption_file = Path(config.output_file).with_suffix(".srt")                     │
│   730 │   │   subcaption_file.write_text(srt.compose(self.subcaptions), encoding="utf-8")        │
│   731 │   │   logger.info(f"Subcaption file has been written as {subcaption_file}")              │
│   732                                                                                            │
│                                                                                                  │
│ /usr/lib/python3.9/pathlib.py:1082 in __new__                                                    │
│                                                                                                  │
│   1079 │   def __new__(cls, *args, **kwargs):                                                    │
│   1080 │   │   if cls is Path:                                                                   │
│   1081 │   │   │   cls = WindowsPath if os.name == 'nt' else PosixPath                           │
│ ❱ 1082 │   │   self = cls._from_parts(args, init=False)                                          │
│   1083 │   │   if not self._flavour.is_supported:                                                │
│   1084 │   │   │   raise NotImplementedError("cannot instantiate %r on your system"              │
│   1085 │   │   │   │   │   │   │   │   │     % (cls.__name__,))                                  │
│                                                                                                  │
│ /usr/lib/python3.9/pathlib.py:707 in _from_parts                                                 │
│                                                                                                  │
│    704 │   │   # We need to call _parse_args on the instance, so as to get the                   │
│    705 │   │   # right flavour.                                                                  │
│    706 │   │   self = object.__new__(cls)                                                        │
│ ❱  707 │   │   drv, root, parts = self._parse_args(args)                                         │
│    708 │   │   self._drv = drv                                                                   │
│    709 │   │   self._root = root                                                                 │
│    710 │   │   self._parts = parts                                                               │
│                                                                                                  │
│ /usr/lib/python3.9/pathlib.py:691 in _parse_args                                                 │
│                                                                                                  │
│    688 │   │   │   if isinstance(a, PurePath):                                                   │
│    689 │   │   │   │   parts += a._parts                                                         │
│    690 │   │   │   else:                                                                         │
│ ❱  691 │   │   │   │   a = os.fspath(a)                                                          │
│    692 │   │   │   │   if isinstance(a, str):                                                    │
│    693 │   │   │   │   │   # Force-cast str subclasses to str (issue #21127)                     │
│    694 │   │   │   │   │   parts.append(str(a))                                                  │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
TypeError: expected str, bytes or os.PathLike object, not NoneType
(.venv-3.9) [mk@archlinux gist]$

Render without -s, no exception, as expected.

(.venv-3.9) [mk@archlinux gist]$ manim -pql manim-voiceover-issue.py VoiceScene -v DEBUG
Manim Community v0.17.3
[09/04/23 15:46:26] INFO     Log file will be saved in /home/mk/dev/manim/ecc/gist/media/logs/manim-voiceover-issue_VoiceScene.log                                                                                                                                                                                            logger_utils.py:170
                    INFO     Caching disabled.                                                                                                                                                                                                                                                                               cairo_renderer.py:68
                    DEBUG    List of the first few animation hashes of the scene: ['uncached_00000']                                                                                                                                                                                                                         cairo_renderer.py:87
[09/04/23 15:46:27] INFO     Animation 0 : Partial movie file written in '/home/mk/dev/manim/ecc/gist/media/videos/manim-voiceover-issue/480p15/partial_movie_files/VoiceScene/uncached_00000.mp4'                                                                                                                       scene_file_writer.py:527
                    DEBUG    Animation with empty mobject                                                                                                                                                                                                                                                                        animation.py:174
                    INFO     Caching disabled.                                                                                                                                                                                                                                                                               cairo_renderer.py:68
                    DEBUG    List of the first few animation hashes of the scene: ['uncached_00000', 'uncached_00001']                                                                                                                                                                                                       cairo_renderer.py:87
                    INFO     Animation 1 : Partial movie file written in '/home/mk/dev/manim/ecc/gist/media/videos/manim-voiceover-issue/480p15/partial_movie_files/VoiceScene/uncached_00001.mp4'                                                                                                                       scene_file_writer.py:527
                    DEBUG    Animation with empty mobject                                                                                                                                                                                                                                                                        animation.py:174
                    INFO     Caching disabled.                                                                                                                                                                                                                                                                               cairo_renderer.py:68
                    DEBUG    List of the first few animation hashes of the scene: ['uncached_00000', 'uncached_00001', 'uncached_00002']                                                                                                                                                                                     cairo_renderer.py:87
                    INFO     Animation 2 : Partial movie file written in '/home/mk/dev/manim/ecc/gist/media/videos/manim-voiceover-issue/480p15/partial_movie_files/VoiceScene/uncached_00002.mp4'                                                                                                                       scene_file_writer.py:527
                    INFO     Combining to Movie file.                                                                                                                                                                                                                                                                    scene_file_writer.py:617
                    DEBUG    Partial movie files to combine (3 files): ['/home/mk/dev/manim/ecc/gist/media/videos/manim-voiceover-issue/480p15/partial_movie_files/VoiceScene/uncached_00000.mp4',                                                                                                                       scene_file_writer.py:561
                             '/home/mk/dev/manim/ecc/gist/media/videos/manim-voiceover-issue/480p15/partial_movie_files/VoiceScene/uncached_00001.mp4', '/home/mk/dev/manim/ecc/gist/media/videos/manim-voiceover-issue/480p15/partial_movie_files/VoiceScene/uncached_00002.mp4']                                                               
                    DEBUG    Setting config.output_file: '/home/mk/dev/manim/ecc/gist/media/videos/manim-voiceover-issue/480p15/VoiceScene.mp4'                                                                                                                                                                                      utils.py:336
                    INFO                                                                                                                                                                                                                                                                                                 scene_file_writer.py:736
                             File ready at '/home/mk/dev/manim/ecc/gist/media/videos/manim-voiceover-issue/480p15/VoiceScene.mp4'                                                                                                                                                                                                                
                    INFO     Subcaption file has been written as /home/mk/dev/manim/ecc/gist/media/videos/manim-voiceover-issue/480p15/VoiceScene.srt                                                                                                                                                                    scene_file_writer.py:731
                    INFO     Rendered VoiceScene                                                                                                                                                                                                                                                                                     scene.py:241
                             Played 3 animations                                                                                                                                                                                                                                                                                                 
                    INFO     Previewed File at: '/home/mk/dev/manim/ecc/gist/media/videos/manim-voiceover-issue/480p15/VoiceScene.mp4'                                                                                                                                                                                            file_ops.py:227
kf.service.services: KApplicationTrader: mimeType "x-scheme-handler/file" not found
(.venv-3.9) [mk@archlinux gist]$ VLC media player 3.0.18 Vetinari (revision 3.0.13-8-g41878ff4f2)

System specifications

System Details

OS Arch Linux, x86_64 Linux 6.4.12-arch1-1
RAM: 32GB
Python version: Python 3.9.18
Installed modules (provide output from pip list):

Package                        Version
------------------------------ ----------
absl-py                        1.4.0
accelerate                     0.22.0
aiohttp                        3.8.5
aiosignal                      1.3.1
anyascii                       0.3.2
appdirs                        1.4.4
async-timeout                  4.0.3
attrs                          23.1.0
audioread                      3.0.0
azure-cognitiveservices-speech 1.31.0
Babel                          2.12.1
bangla                         0.0.2
blinker                        1.6.2
bnnumerizer                    0.0.2
bnunicodenormalizer            0.1.1
boltons                        23.0.0
cachetools                     5.3.1
certifi                        2023.7.22
cffi                           1.15.1
charset-normalizer             3.2.0
clean-fid                      0.1.35
click                          8.1.7
click-default-group            1.2.4
clip-anytorch                  2.5.2
cloup                          0.13.1
cmake                          3.27.2
colour                         0.1.5
contourpy                      1.1.0
coqpit                         0.0.17
cycler                         0.11.0
Cython                         0.29.30
dateparser                     1.1.8
decorator                      5.1.1
deepl                          1.15.0
docker-pycreds                 0.4.0
docopt                         0.6.2
einops                         0.6.1
encodec                        0.1.1
evdev                          1.6.1
ffmpeg-python                  0.2.0
filelock                       3.12.3
Flask                          2.3.3
fonttools                      4.42.1
frozenlist                     1.4.0
fsspec                         2023.9.0
ftfy                           6.1.1
future                         0.18.3
g2pkk                          0.1.2
gitdb                          4.0.10
GitPython                      3.1.34
glcontext                      2.4.0
google-auth                    2.22.0
google-auth-oauthlib           1.0.0
grpcio                         1.57.0
gruut                          2.2.3
gruut-ipa                      0.13.0
gruut-lang-de                  2.0.0
gruut-lang-en                  2.0.0
gruut-lang-es                  2.0.0
gruut-lang-fr                  2.0.2
gTTS                           2.3.2
huggingface-hub                0.16.4
idna                           3.4
imageio                        2.31.3
importlib-metadata             6.8.0
importlib-resources            6.0.1
inflect                        5.6.0
isosurfaces                    0.1.0
itsdangerous                   2.1.2
jamo                           0.4.1
jieba                          0.42.1
Jinja2                         3.1.2
joblib                         1.3.2
jsonlines                      1.2.0
jsonmerge                      1.9.2
jsonschema                     4.19.0
jsonschema-specifications      2023.7.1
k-diffusion                    0.0.16
kiwisolver                     1.4.5
kornia                         0.7.0
lazy_loader                    0.3
librosa                        0.10.0
lit                            16.0.6
llvmlite                       0.40.1
manim                          0.17.3
manim-voiceover                0.3.4
ManimPango                     0.4.3
mapbox-earcut                  1.0.1
Markdown                       3.4.4
markdown-it-py                 3.0.0
MarkupSafe                     2.1.3
matplotlib                     3.7.2
mdurl                          0.1.2
moderngl                       5.8.2
moderngl-window                2.4.4
more-itertools                 10.1.0
mpmath                         1.3.0
msgpack                        1.0.5
multidict                      6.0.4
multipledispatch               1.0.0
mutagen                        1.47.0
networkx                       2.8.8
nltk                           3.8.1
num2words                      0.5.12
numba                          0.57.0
numpy                          1.22.0
nvidia-cublas-cu11             11.10.3.66
nvidia-cuda-cupti-cu11         11.7.101
nvidia-cuda-nvrtc-cu11         11.7.99
nvidia-cuda-runtime-cu11       11.7.99
nvidia-cudnn-cu11              8.5.0.96
nvidia-cufft-cu11              10.9.0.58
nvidia-curand-cu11             10.2.10.91
nvidia-cusolver-cu11           11.4.0.1
nvidia-cusparse-cu11           11.7.4.91
nvidia-nccl-cu11               2.14.3
nvidia-nvtx-cu11               11.7.91
oauthlib                       3.2.2
openai-whisper                 20230314
packaging                      23.1
pandas                         2.0.3
pathtools                      0.1.2
Pillow                         9.5.0
pip                            23.2.1
platformdirs                   3.10.0
pooch                          1.7.0
protobuf                       4.24.2
psutil                         5.9.5
pyasn1                         0.5.0
pyasn1-modules                 0.3.0
PyAudio                        0.2.13
pycairo                        1.24.0
pycparser                      2.21
pydub                          0.25.1
pyglet                         2.0.9
Pygments                       2.16.1
pynndescent                    0.5.10
pynput                         1.7.6
pyparsing                      3.0.9
pypinyin                       0.49.0
pyrr                           0.10.3
pysbd                          0.3.4
python-crfsuite                0.9.9
python-dateutil                2.8.2
python-dotenv                  0.21.1
python-slugify                 8.0.1
python-xlib                    0.33
pyttsx3                        2.90
pytz                           2023.3
PyWavelets                     1.4.1
PyYAML                         6.0.1
referencing                    0.30.2
regex                          2023.8.8
requests                       2.31.0
requests-oauthlib              1.3.1
resize-right                   0.0.2
rich                           13.5.2
rpds-py                        0.10.0
rsa                            4.9
safetensors                    0.3.3
scikit-image                   0.21.0
scikit-learn                   1.3.0
scipy                          1.11.2
screeninfo                     0.8.1
sentry-sdk                     1.30.0
setproctitle                   1.3.2
setuptools                     59.8.0
six                            1.16.0
skia-pathops                   0.7.4
smmap                          5.0.0
soundfile                      0.12.1
sox                            1.4.1
soxr                           0.3.6
srt                            3.5.3
stable-ts                      2.9.0
svgelements                    1.9.6
sympy                          1.12
tensorboard                    2.14.0
tensorboard-data-server        0.7.1
text-unidecode                 1.3
threadpoolctl                  3.2.0
tifffile                       2023.8.30
tiktoken                       0.3.1
tokenizers                     0.13.3
torch                          2.0.1
torchaudio                     2.0.2
torchdiffeq                    0.2.3
torchsde                       0.2.5
torchvision                    0.15.2
tqdm                           4.66.1
trainer                        0.0.31
trampoline                     0.1.2
transformers                   4.32.1
triton                         2.0.0
TTS                            0.16.5
typing_extensions              4.7.1
tzdata                         2023.3
tzlocal                        5.0.1
umap-learn                     0.5.1
urllib3                        1.26.16
wandb                          0.15.9
watchdog                       2.3.1
wcwidth                        0.2.6
Werkzeug                       2.3.7
wheel                          0.41.0
yarl                           1.9.2
zipp                           3.16.2

LaTeX details

LaTeX distribution (e.g. TeX Live 2020):
Installed LaTeX packages:

FFMPEG

Output of ffmpeg -version:

built with gcc 13.1.1 (GCC) 20230429
configuration: --prefix=/usr --disable-debug --disable-static --disable-stripping --enable-amf --enable-avisynth --enable-cuda-llvm --enable-lto --enable-fontconfig --enable-gmp --enable-gnutls --enable-gpl --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libdav1d --enable-libdrm --enable-libfreetype --enable-libfribidi --enable-libgsm --enable-libiec61883 --enable-libjack --enable-libjxl --enable-libmfx --enable-libmodplug --enable-libmp3lame --enable-libopencore_amrnb --enable-libopencore_amrwb --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librav1e --enable-librsvg --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libsvtav1 --enable-libtheora --enable-libv4l2 --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxcb --enable-libxml2 --enable-libxvid --enable-libzimg --enable-nvdec --enable-nvenc --enable-opencl --enable-opengl --enable-shared --enable-version3 --enable-vulkan
libavutil      58.  2.100 / 58.  2.100
libavcodec     60.  3.100 / 60.  3.100
libavformat    60.  3.100 / 60.  3.100
libavdevice    60.  1.100 / 60.  1.100
libavfilter     9.  3.100 /  9.  3.100
libswscale      7.  1.100 /  7.  1.100
libswresample   4. 10.100 /  4. 10.100
libpostproc    57.  1.100 / 57.  1.100

Additional comments

Recording crashes due to possible libc++ error?

Description of bug / unexpected behavior

Manim crashes as soon as the "r" key is pressed to start recording.

Expected behavior

Recording should start without crashing.

How to reproduce the issue

Code for reproducing the problem

from manim import *
from manim_voiceover import VoiceoverScene
from manim_voiceover.services.gtts import GTTSService
from manim_voiceover.services.recorder import RecorderService

class GTTSExample(VoiceoverScene):
    def construct(self):
        #self.set_speech_service(GTTSService(lang="en", tld="com"))
        self.set_speech_service(RecorderService())

        circle = Circle()
        square = Square().shift(2 * RIGHT)

        with self.voiceover(text="here is the first text it is really long and it goes on and on forever and ever and ever.") as tracker:
            self.play(Create(circle))
            
        with self.voiceover(text="here is more text it is really long and it goes on and on forever and ever and ever.") as tracker:
            self.play(circle.animate.shift(2 * LEFT))

        self.wait()

Logs

Terminal output

(base) mnica@Mihais-MBP manimCLT % manim -pql manim-vo.py --disable_caching
Manim Community v0.17.3

-------------------------device list-------------------------
Input Device id  1  -  MacBook Pro Microphone
Input Device id  3  -  Microsoft Teams Audio
-------------------------------------------------------------
Please select an input device id to record from:
1
Selected device: MacBook Pro Microphone
╔══════════════════════════════════════════════════════════════════════════════════╗
║ Voiceover:                                                                       ║
║ here is the first text it is really long and it goes on and on forever and ever  ║
║ and ever.                                                                        ║
╚══════════════════════════════════════════════════════════════════════════════════╝
Press and hold the 'r' key to begin recording
Wait for 1 second, then start speaking.
Wait for at least 1 second after you finish speaking.
This is to eliminate any sounds that may come from your keyboard.
The silence at the beginning and end will be trimmed automatically.
You can adjust this setting using the `trim_silence_threshold` argument.
These instructions are only shown once.
Release the 'r' key to end recording
This process is not trusted! Input event monitoring will not be possible until it is added to accessibility clients.
rStream active: True
start Stream
libc++abi: terminating
zsh: abort      manim -pql manim-vo.py --disable_caching

System specifications

System Details

macOS Ventura 13.4 (M1 Mac)
Python version (python/py/python3 --version): 3.9.12
Installed modules (provide output from pip list):

aiohttp                              3.8.1
aiosignal                            1.2.0
alabaster                            0.7.12
anaconda-client                      1.9.0
anaconda-navigator                   2.2.0
anaconda-project                     0.10.2
anyio                                3.5.0
appdirs                              1.4.4
applaunchservices                    0.2.1
appnope                              0.1.2
appscript                            1.1.2
argon2-cffi                          21.3.0
argon2-cffi-bindings                 21.2.0
arrow                                1.2.2
astroid                              2.6.6
astropy                              5.0.4
asttokens                            2.0.5
async-timeout                        4.0.1
atomicwrites                         1.4.0
attrs                                21.4.0
Automat                              20.2.0
autopep8                             1.6.0
azure-cognitiveservices-speech       1.31.0
Babel                                2.9.1
backcall                             0.2.0
backports.functools-lru-cache        1.6.4
backports.tempfile                   1.0
backports.weakref                    1.0.post1
bcrypt                               3.2.0
beautifulsoup4                       4.11.1
binaryornot                          0.4.4
bitarray                             2.4.1
bkcharts                             0.2
black                                19.10b0
bleach                               4.1.0
bokeh                                2.4.2
boto3                                1.21.32
botocore                             1.24.32
Bottleneck                           1.3.4
brotlipy                             0.7.0
cachetools                           4.2.2
certifi                              2021.10.8
cffi                                 1.15.0
chardet                              4.0.0
charset-normalizer                   2.0.4
click                                8.0.4
click-default-group                  1.2.2
cloudpickle                          2.0.0
cloup                                0.13.1
clyent                               1.2.2
colorama                             0.4.4
colorcet                             2.0.6
colour                               0.1.5
commonmark                           0.9.1
conda                                22.11.1
conda-build                          3.21.8
conda-content-trust                  0+unknown
conda-pack                           0.6.0
conda-package-handling               1.8.1
conda-repo-cli                       1.0.4
conda-token                          0.3.0
conda-verify                         3.4.2
constantly                           15.1.0
cookiecutter                         1.7.3
cryptography                         3.4.8
cssselect                            1.1.0
cycler                               0.11.0
Cython                               0.29.28
cytoolz                              0.11.0
daal4py                              2021.5.0
dask                                 2022.2.1
datashader                           0.13.0
datashape                            0.5.4
debugpy                              1.5.1
decorator                            5.1.1
defusedxml                           0.7.1
diff-match-patch                     20200713
distributed                          2022.2.1
docutils                             0.17.1
entrypoints                          0.4
et-xmlfile                           1.1.0
executing                            0.8.3
fastjsonschema                       2.15.1
ffmpeg-python                        0.2.0
filelock                             3.6.0
flake8                               3.9.2
Flask                                1.1.2
fonttools                            4.25.0
frozenlist                           1.2.0
fsspec                               2022.2.0
future                               0.18.2
gensim                               4.1.2
glcontext                            2.3.6
glob2                                0.7
gmpy2                                2.1.2
google-api-core                      1.25.1
google-auth                          1.33.0
google-cloud-core                    1.7.1
google-cloud-storage                 1.31.0
google-crc32c                        1.1.2
google-resumable-media               1.3.1
googleapis-common-protos             1.53.0
greenlet                             1.1.1
grpcio                               1.42.0
gTTS                                 2.3.2
h5py                                 3.6.0
HeapDict                             1.0.1
holoviews                            1.14.8
huggingface-hub                      0.16.4
hvplot                               0.7.3
hyperlink                            21.0.0
idna                                 3.3
imagecodecs                          2021.8.26
imageio                              2.9.0
imagesize                            1.3.0
importlib-metadata                   4.11.3
incremental                          21.3.0
inflection                           0.5.1
iniconfig                            1.1.1
intake                               0.6.5
intervaltree                         3.1.0
ipykernel                            6.9.1
ipython                              8.2.0
ipython-genutils                     0.2.0
ipywidgets                           7.6.5
isort                                5.9.3
isosurfaces                          0.1.0
itemadapter                          0.3.0
itemloaders                          1.0.4
itsdangerous                         2.0.1
jdcal                                1.4.1
jedi                                 0.18.1
Jinja2                               2.11.3
jinja2-time                          0.2.0
jmespath                             0.10.0
joblib                               1.1.0
json5                                0.9.6
jsonschema                           4.4.0
jupyter                              1.0.0
jupyter-client                       6.1.12
jupyter-console                      6.4.0
jupyter-core                         4.9.2
jupyter-server                       1.13.5
jupyterlab                           3.3.2
jupyterlab-pygments                  0.1.2
jupyterlab-server                    2.10.3
jupyterlab-widgets                   1.0.0
keyring                              23.4.0
kiwisolver                           1.3.2
lazy-object-proxy                    1.6.0
libarchive-c                         2.9
llvmlite                             0.38.0
locket                               0.2.1
lxml                                 4.8.0
manim                                0.17.3
manim-voiceover                      0.3.4
ManimPango                           0.4.1
mapbox-earcut                        1.0.1
Markdown                             3.3.4
markdown-it-py                       3.0.0
MarkupSafe                           2.0.1
matplotlib                           3.5.1
matplotlib-inline                    0.1.2
mccabe                               0.6.1
mdurl                                0.1.2
mistune                              0.8.4
mkl-fft                              1.3.1
mkl-random                           1.2.2
mkl-service                          2.4.0
mock                                 4.0.3
moderngl                             5.6.4
moderngl-window                      2.4.1
more-itertools                       10.1.0
mpmath                               1.2.1
msgpack                              1.0.2
multidict                            5.2.0
multipledispatch                     0.6.0
munkres                              1.1.4
mutagen                              1.46.0
mypy-extensions                      0.4.3
navigator-updater                    0.2.1
nbclassic                            0.3.5
nbclient                             0.5.13
nbconvert                            6.4.4
nbformat                             5.3.0
nest-asyncio                         1.5.5
networkx                             2.7.1
nltk                                 3.7
nose                                 1.3.7
notebook                             6.4.8
numba                                0.55.1
numexpr                              2.8.1
numpy                                1.21.6
numpydoc                             1.2
olefile                              0.46
openai-whisper                       20230314
openpyxl                             3.0.9
packaging                            21.3
pandas                               1.4.2
pandocfilters                        1.5.0
panel                                0.13.0
param                                1.12.0
parsel                               1.6.0
parso                                0.8.3
partd                                1.2.0
pathspec                             0.7.0
patsy                                0.5.2
pep8                                 1.7.1
pexpect                              4.8.0
pickleshare                          0.7.5
Pillow                               9.2.0
pip                                  23.2.1
pkginfo                              1.8.2
plotly                               5.6.0
pluggy                               1.0.0
poyo                                 0.5.0
prometheus-client                    0.13.1
prompt-toolkit                       3.0.20
Protego                              0.1.16
protobuf                             3.19.1
psutil                               5.8.0
ptyprocess                           0.7.0
pure-eval                            0.2.2
py                                   1.11.0
pyasn1                               0.4.8
pyasn1-modules                       0.2.8
PyAudio                              0.2.11
pycairo                              1.24.0
pycodestyle                          2.7.0
pycosat                              0.6.3
pycparser                            2.21
pyct                                 0.4.6
pycurl                               7.44.1
PyDispatcher                         2.0.5
pydocstyle                           6.1.1
pydub                                0.25.1
pyerfa                               2.0.0
pyflakes                             2.3.1
pyglet                               1.5.26
Pygments                             2.11.2
PyHamcrest                           2.0.2
PyJWT                                2.1.0
pylint                               2.9.6
pyls-spyder                          0.4.0
pynput                               1.7.6
pyobjc-core                          9.2
pyobjc-framework-ApplicationServices 9.2
pyobjc-framework-Cocoa               9.2
pyobjc-framework-Quartz              9.2
pyodbc                               4.0.32
pyOpenSSL                            21.0.0
pyparsing                            3.0.4
pyrr                                 0.10.3
pyrsistent                           0.18.0
PySocks                              1.7.1
pytest                               7.1.1
python-dateutil                      2.8.2
python-dotenv                        0.21.1
python-lsp-black                     1.0.0
python-lsp-jsonrpc                   1.0.0
python-lsp-server                    1.2.4
python-slugify                       8.0.1
python-snappy                        0.6.0
pytz                                 2021.3
pyviz-comms                          2.0.2
PyWavelets                           1.3.0
PyYAML                               6.0
pyzmq                                22.3.0
QDarkStyle                           3.0.2
qstylizer                            0.1.10
QtAwesome                            1.0.3
qtconsole                            5.3.0
QtPy                                 2.0.1
queuelib                             1.5.0
regex                                2022.3.15
requests                             2.27.1
requests-file                        1.5.1
rich                                 12.5.1
rope                                 0.22.0
rsa                                  4.7.2
Rtree                                0.9.7
ruamel.yaml                          0.17.32
ruamel.yaml.clib                     0.2.7
ruamel-yaml-conda                    0.15.100
s3transfer                           0.5.0
safetensors                          0.3.2
scikit-image                         0.19.2
scikit-learn                         1.0.2
scikit-learn-intelex                 2021.20220215.132722
scipy                                1.11.1
Scrapy                               2.6.1
screeninfo                           0.8.1
seaborn                              0.11.2
Send2Trash                           1.8.0
service-identity                     18.1.0
setuptools                           68.0.0
sip                                  4.19.13
six                                  1.16.0
skia-pathops                         0.7.2
smart-open                           5.1.0
sniffio                              1.2.0
snowballstemmer                      2.2.0
sortedcollections                    2.1.0
sortedcontainers                     2.4.0
soupsieve                            2.3.1
sox                                  1.4.1
Sphinx                               4.4.0
sphinxcontrib-applehelp              1.0.2
sphinxcontrib-devhelp                1.0.2
sphinxcontrib-htmlhelp               2.0.0
sphinxcontrib-jsmath                 1.0.1
sphinxcontrib-qthelp                 1.0.3
sphinxcontrib-serializinghtml        1.1.5
spyder                               5.1.5
spyder-kernels                       2.1.3
SQLAlchemy                           1.4.32
srt                                  3.5.2
stable-ts                            2.8.1
stack-data                           0.2.0
statsmodels                          0.13.2
svgelements                          1.9.5
sympy                                1.10.1
tables                               3.6.1
tabulate                             0.8.9
TBB                                  0.2
tblib                                1.7.0
tenacity                             8.0.1
terminado                            0.13.1
testpath                             0.5.0
text-unidecode                       1.3
textdistance                         4.2.1
threadpoolctl                        2.2.0
three-merge                          0.1.1
tifffile                             2021.7.2
tiktoken                             0.3.1
tinycss                              0.4
tldextract                           3.2.0
tokenizers                           0.13.3
toml                                 0.10.2
tomli                                1.2.2
toolz                                0.11.2
torch                                2.0.1
torchaudio                           2.0.2
tornado                              6.1
tqdm                                 4.64.0
traitlets                            5.1.1
transformers                         4.31.0
Twisted                              22.2.0
typed-ast                            1.4.3
typing_extensions                    4.1.1
ujson                                5.1.0
Unidecode                            1.2.0
urllib3                              1.26.9
w3lib                                1.21.0
watchdog                             2.1.6
wcwidth                              0.2.5
webencodings                         0.5.1
websocket-client                     0.58.0
Werkzeug                             2.0.3
wheel                                0.41.1
widgetsnbextension                   3.5.2
wrapt                                1.12.1
wurlitzer                            3.0.2
xarray                               0.20.1
xlrd                                 2.0.1
XlsxWriter                           3.0.3
xlwings                              0.24.9
yapf                                 0.31.0
yarl                                 1.6.3
zict                                 2.0.0
zipp                                 3.7.0
zope.interface                       5.4.0

LaTeX details

LaTeX distribution (e.g. TeX Live 2020):
Installed LaTeX packages:

FFMPEG

Output of ffmpeg -version:

built with Apple clang version 14.0.3 (clang-1403.0.22.14.1)
configuration: --prefix=/opt/homebrew/Cellar/ffmpeg/6.0 --enable-shared --enable-pthreads --enable-version3 --cc=clang --host-cflags= --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libaribb24 --enable-libbluray --enable-libdav1d --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librist --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libsvtav1 --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libspeex --enable-libsoxr --enable-libzmq --enable-libzimg --disable-libjack --disable-indev=jack --enable-videotoolbox --enable-audiotoolbox --enable-neon
libavutil      58.  2.100 / 58.  2.100
libavcodec     60.  3.100 / 60.  3.100
libavformat    60.  3.100 / 60.  3.100
libavdevice    60.  1.100 / 60.  1.100
libavfilter     9.  3.100 /  9.  3.100
libswscale      7.  1.100 /  7.  1.100
libswresample   4. 10.100 /  4. 10.100
libpostproc    57.  1.100 / 57.  1.100

Additional comments

The only error it throws is "libc++abi: terminating". Any help or ideas to try and get it to work are much appreciated!

Matcha-TTS

Could you add Matcha-TTS to manim-voiceover?

Problem using RecorderService: cannot install on Python 3.12.0; only versions >=3.8,<3.12 are supported

Preliminaries

I have followed the latest version of the installation instructions.

Description of error

I have a working sample that uses the Azure voiceover service, and I am trying to record my own voice with RecorderService instead, like this:

from manim import *
from manim_voiceover import VoiceoverScene
#from manim_voiceover.services.azure import AzureService
from manim_voiceover.services.recorder import RecorderService

class TriangleArea(VoiceoverScene):
    def construct(self):

        self.set_speech_service(
          RecorderService()
        )

        # self.set_speech_service(
        #    AzureService(

When I run my test, it asks me to install manim-voiceover[transcribe], which I do not actually need. I accept, but the installation fails with this error:

Cannot install on Python version 3.12.0; only versions >=3.8,<3.12 are supported.
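The version range quoted in this error can be checked before attempting the install. The following is a minimal sketch that assumes only the `>=3.8,<3.12` range from the message above; the helper name `is_supported` is hypothetical and is not part of manim-voiceover or any of its dependencies.

```python
import sys

# Bounds taken from the error message: >=3.8,<3.12
SUPPORTED_MIN = (3, 8)
SUPPORTED_MAX_EXCLUSIVE = (3, 12)


def is_supported(version_info=sys.version_info):
    """Return True if (major, minor) falls inside the supported range.

    Hypothetical helper for illustration only.
    """
    major_minor = tuple(version_info[:2])
    return SUPPORTED_MIN <= major_minor < SUPPORTED_MAX_EXCLUSIVE


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if is_supported():
        print(f"Python {current} is inside the supported range.")
    else:
        print(f"Python {current} is outside >=3.8,<3.12; the build is expected to fail.")
```

If the check fails, running the project in a virtual environment created with an interpreter inside that range is one way to avoid the build error.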

Installation logs

$ manim -pql test.py Test --disable_caching
Manim Community v0.18.0


[01/04/24 19:54:20] ERROR    Missing packages. Run `pip install "manim-voiceover[transcribe]"` to be able to transcribe voiceovers.       base.py:145

                    INFO     The extra packages required by SpeechService.set_transcription() are not installed. Shall I install them   helper.py:165

                             for you? [Y/n]                                                                                                          

[01/04/24 19:58:54] INFO     Installing transcribe...                                                                                   helper.py:175

WARNING: pip is being invoked by an old script wrapper. This will fail in a future version of pip.

Please see https://github.com/pypa/pip/issues/5599 for advice on fixing the underlying issue.

To avoid this problem you can invoke Python with '-m pip' instead of running pip directly.

Requirement already satisfied: manim-voiceover[transcribe] in /Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages (0.3.4.post1)

Requirement already satisfied: manim in /Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages (from manim-voiceover[transcribe]) (0.18.0)

Requirement already satisfied: mutagen<2.0.0,>=1.46.0 in /Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages (from manim-voiceover[transcribe]) (1.47.0)

Collecting openai-whisper<20230315,>=20230314 (from manim-voiceover[transcribe])

  Using cached openai-whisper-20230314.tar.gz (792 kB)

  Installing build dependencies ... done

  Getting requirements to build wheel ... done

  Preparing metadata (pyproject.toml) ... done

Requirement already satisfied: pip>=21.0.1 in /Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages (from manim-voiceover[transcribe]) (23.3.2)

Requirement already satisfied: pydub<0.26.0,>=0.25.1 in /Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages (from manim-voiceover[transcribe]) (0.25.1)

Requirement already satisfied: python-dotenv<0.22.0,>=0.21.0 in /Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages (from manim-voiceover[transcribe]) (0.21.1)

Requirement already satisfied: python-slugify<9.0.0,>=8.0.1 in /Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages (from manim-voiceover[transcribe]) (8.0.1)

Requirement already satisfied: sox<2.0.0,>=1.4.1 in /Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages (from manim-voiceover[transcribe]) (1.4.1)

Collecting stable-ts<3.0.0,>=2.6.2 (from manim-voiceover[transcribe])

  Using cached stable-ts-2.14.2.tar.gz (121 kB)

  Installing build dependencies ... done

  Getting requirements to build wheel ... done

  Preparing metadata (pyproject.toml) ... done

Collecting numba (from openai-whisper<20230315,>=20230314->manim-voiceover[transcribe])

  Using cached numba-0.58.1.tar.gz (2.6 MB)

  Installing build dependencies ... done

  Getting requirements to build wheel ... error

  error: subprocess-exited-with-error

  

  × Getting requirements to build wheel did not run successfully.

  │ exit code: 1

  ╰─> [21 lines of output]

      Traceback (most recent call last):

        File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>

          main()

        File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main

          json_out['return_val'] = hook(**hook_input['kwargs'])

                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^

        File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 118, in get_requires_for_build_wheel

          return hook(config_settings)

                 ^^^^^^^^^^^^^^^^^^^^^

        File "/private/var/folders/dn/t93q8nhd5zdfpvn62xkhxx1r0000gn/T/pip-build-env-l8xzv1z6/overlay/lib/python3.12/site-packages/setuptools/build_meta.py", line 325, in get_requires_for_build_wheel

          return self._get_build_requires(config_settings, requirements=['wheel'])

                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

        File "/private/var/folders/dn/t93q8nhd5zdfpvn62xkhxx1r0000gn/T/pip-build-env-l8xzv1z6/overlay/lib/python3.12/site-packages/setuptools/build_meta.py", line 295, in _get_build_requires

          self.run_setup()

        File "/private/var/folders/dn/t93q8nhd5zdfpvn62xkhxx1r0000gn/T/pip-build-env-l8xzv1z6/overlay/lib/python3.12/site-packages/setuptools/build_meta.py", line 480, in run_setup

          super(_BuildMetaLegacyBackend, self).run_setup(setup_script=setup_script)

        File "/private/var/folders/dn/t93q8nhd5zdfpvn62xkhxx1r0000gn/T/pip-build-env-l8xzv1z6/overlay/lib/python3.12/site-packages/setuptools/build_meta.py", line 311, in run_setup

          exec(code, locals())

        File "<string>", line 51, in <module>

        File "<string>", line 48, in _guard_py_ver

      RuntimeError: Cannot install on Python version 3.12.0; only versions >=3.8,<3.12 are supported.

      [end of output]

  

  note: This error originates from a subprocess, and is likely not a problem with pip.

error: subprocess-exited-with-error



× Getting requirements to build wheel did not run successfully.

│ exit code: 1

╰─> See above for output.



note: This error originates from a subprocess, and is likely not a problem with pip.

[04.01.2024 19:59:04] INFO     Installed missing extras. Please run Manim again.                                                        helper.py:177

Installed missing extras. Please run Manim again.

System specifications

System Details

OS (with version, e.g., Windows 10 v2004 or macOS 10.15 (Catalina)): macOS Ventura 13.6.1
Python version (python/py/python3 --version): 3.12.0
Installed modules (provide output from pip list):

Package                              Version
------------------------------------ -----------
azure-cognitiveservices-speech       1.32.1
certifi                              2023.7.22
charset-normalizer                   3.3.2
click                                8.1.7
click-default-group                  1.2.4
cloup                                2.1.2
Cython                               3.0.5
decorator                            5.1.1
glcontext                            2.5.0
gTTS                                 2.4.0
idna                                 3.4
isosurfaces                          0.1.0
manim                                0.18.0
manim-voiceover                      0.3.4.post1
ManimPango                           0.5.0
mapbox-earcut                        1.0.1
markdown-it-py                       3.0.0
mdurl                                0.1.2
moderngl                             5.8.2
moderngl-window                      2.4.4
multipledispatch                     1.0.0
mutagen                              1.47.0
networkx                             3.2.1
numpy                                1.26.1
Pillow                               9.5.0
pip                                  23.3.2
PyAudio                              0.2.14
pycairo                              1.25.1
pydub                                0.25.1
pyglet                               2.0.10
Pygments                             2.16.1
pynput                               1.7.6
pyobjc-core                          10.1
pyobjc-framework-ApplicationServices 10.1
pyobjc-framework-Cocoa               10.1
pyobjc-framework-CoreText            10.1
pyobjc-framework-Quartz              10.1
pyrr                                 0.10.3
python-dotenv                        0.21.1
python-slugify                       8.0.1
requests                             2.31.0
rich                                 13.6.0
scipy                                1.11.3
screeninfo                           0.8.1
setuptools                           68.2.2
six                                  1.16.0
skia-pathops                         0.8.0.post1
sox                                  1.4.1
srt                                  3.5.3
svgelements                          1.9.6
text-unidecode                       1.3
tqdm                                 4.66.1
typing_extensions                    4.8.0
urllib3                              2.0.7
watchdog                             3.0.0

Additional comments

Add support for other transcription services

Description of proposed feature

Currently, manim-voiceover only supports Whisper as a transcription service, and it is hard-coded for all SpeechService backends. I propose that the manim_voiceover.services module be modified so that the transcription backend can be chosen.

How can the new feature be used?

Not only will this give users a choice in which transcription service to use, but it will also make it much easier for users to add transcription services that are not yet covered.
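One way such a pluggable design could look is sketched below. This is only an illustration under stated assumptions: `WordTiming`, `TranscriptionBackend`, `register_backend`, `get_backend`, and `DummyBackend` are all hypothetical names and do not exist in manim-voiceover.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Protocol


@dataclass
class WordTiming:
    """Hypothetical word-level timestamp returned by a backend."""
    word: str
    start: float  # seconds
    end: float    # seconds


class TranscriptionBackend(Protocol):
    """Hypothetical interface that every transcription backend would implement."""

    def transcribe(self, audio_path: str) -> List[WordTiming]:
        ...


# Hypothetical registry mapping a backend name to a factory that builds it.
_BACKENDS: Dict[str, Callable[[], TranscriptionBackend]] = {}


def register_backend(name: str, factory: Callable[[], TranscriptionBackend]) -> None:
    """Make a backend available under the given name."""
    _BACKENDS[name] = factory


def get_backend(name: str) -> TranscriptionBackend:
    """Instantiate a registered backend; raises KeyError for unknown names."""
    return _BACKENDS[name]()


class DummyBackend:
    """Stand-in backend that returns fixed timings instead of calling a real service."""

    def transcribe(self, audio_path: str) -> List[WordTiming]:
        return [WordTiming("hello", 0.0, 0.4), WordTiming("world", 0.5, 0.9)]


register_backend("dummy", DummyBackend)
```

With a registry like this, a speech service could look up its backend by name rather than importing Whisper directly, and third-party backends could be registered without changes to the core package.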

Additional comments

Whisper is no longer the best transcription service available, being beaten by other services such as AssemblyAI, which also supports word-level timestamps among other features.

manim-voiceover xtts AttributeError and voice cloning

I'm using Coqui TTS with the XTTS model.


class test(VoiceoverScene):
    def construct(self):
        self.set_speech_service(CoquiService(model_name="tts_models/multilingual/multi-dataset/xtts_v2",gpu=True))

This raises AttributeError: 'TTS' object has no attribute 'languages', even though the TTS object does appear to have that attribute. Other models do not trigger this error.
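One possible explanation, offered as an assumption rather than a diagnosis: in Python, when a property raises AttributeError internally and the class also defines `__getattr__`, the fallback reports the property itself as missing. The sketch below reproduces that general behaviour with stand-in classes (`Inner` and `Wrapper` are hypothetical and are not Coqui TTS code).

```python
class Inner:
    """Stand-in for a model object that lacks the data the property needs."""


class Wrapper:
    """Stand-in for an object exposing a `languages` property."""

    def __init__(self):
        self.inner = Inner()

    @property
    def languages(self):
        # The property exists, but this lookup fails with AttributeError
        # because Inner has no `language_ids` attribute.
        return self.inner.language_ids

    def __getattr__(self, name):
        # Python calls this fallback whenever normal lookup ends in
        # AttributeError, including one raised inside a property.
        raise AttributeError(
            f"'{type(self).__name__}' object has no attribute '{name}'"
        )


try:
    Wrapper().languages
except AttributeError as exc:
    message = str(exc)

print(message)
```

The printed message names `languages` on `Wrapper`, although the attribute that is really missing is `language_ids` on `Inner`. Whether this is what happens with the XTTS model would need to be confirmed against the installed TTS version.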

Also, where can I set speaker_wav=["speaker.wav"] for voice cloning?

Temp file not found when running GTTSExample

Description of bug / unexpected behavior

After installing manim-voiceover, I tested the installation by running gtts-example.py:

manim -pql gtts-example.py --disable_caching GTTSExample

but got an error saying that the file GTTSExample_temp.mp4 could not be found.
The voiceover audio files and the video are created, but the video has no voiceover.

Expected behavior

Manim should produce the video with a voiceover and report no errors. Indeed, if I remove the manim-voiceover code from the Python file and use a run_time of 2 seconds, everything works smoothly.

How to reproduce the issue

Code for reproducing the problem

See the distributed gtts-example.py file.

Additional media files

Images/GIFs

Logs

Terminal output

Manim Community v0.17.2

[03/19/23 17:32:19] INFO     Caching disabled.                                                     cairo_renderer.py:68
                    DEBUG    List of the first few animation hashes of the scene:                  cairo_renderer.py:87
                             ['uncached_00000']
                    INFO     Animation 0 : Partial movie file written in                       scene_file_writer.py:527
                             'C:\Users\ale\Python\media\videos\gtts-example\480p15\partial_mov
                             ie_files\GTTSExample\uncached_00000.mp4'
[03/19/23 17:32:20] INFO     Caching disabled.                                                     cairo_renderer.py:68
                    DEBUG    List of the first few animation hashes of the scene:                  cairo_renderer.py:87
                             ['uncached_00000', 'uncached_00001']
                    INFO     Animation 1 : Partial movie file written in                       scene_file_writer.py:527
                             'C:\Users\ale\Python\media\videos\gtts-example\480p15\partial_mov
                             ie_files\GTTSExample\uncached_00001.mp4'
                    INFO     Caching disabled.                                                     cairo_renderer.py:68
                    DEBUG    List of the first few animation hashes of the scene:                  cairo_renderer.py:87
                             ['uncached_00000', 'uncached_00001', 'uncached_00002']
[03/19/23 17:32:21] INFO     Animation 2 : Partial movie file written in                       scene_file_writer.py:527
                             'C:\Users\ale\Python\media\videos\gtts-example\480p15\partial_mov
                             ie_files\GTTSExample\uncached_00002.mp4'
                    INFO     Caching disabled.                                                     cairo_renderer.py:68
                    DEBUG    List of the first few animation hashes of the scene:                  cairo_renderer.py:87
                             ['uncached_00000', 'uncached_00001', 'uncached_00002',
                             'uncached_00003']
                    INFO     Animation 3 : Partial movie file written in                       scene_file_writer.py:527
                             'C:\Users\ale\Python\media\videos\gtts-example\480p15\partial_mov
                             ie_files\GTTSExample\uncached_00003.mp4'
                    DEBUG    Animation with empty mobject                                              animation.py:174
                    INFO     Caching disabled.                                                     cairo_renderer.py:68
                    DEBUG    List of the first few animation hashes of the scene:                  cairo_renderer.py:87
                             ['uncached_00000', 'uncached_00001', 'uncached_00002',
                             'uncached_00003', 'uncached_00004']
[03/19/23 17:32:22] INFO     Animation 4 : Partial movie file written in                       scene_file_writer.py:527
                             'C:\Users\ale\Python\media\videos\gtts-example\480p15\partial_mov
                             ie_files\GTTSExample\uncached_00004.mp4'
                    DEBUG    Animation with empty mobject                                              animation.py:174
                    INFO     Caching disabled.                                                     cairo_renderer.py:68
                    DEBUG    List of the first few animation hashes of the scene:                  cairo_renderer.py:87
                             ['uncached_00000', 'uncached_00001', 'uncached_00002',
                             'uncached_00003', 'uncached_00004']
                    INFO     Animation 5 : Partial movie file written in                       scene_file_writer.py:527
                             'C:\Users\ale\Python\media\videos\gtts-example\480p15\partial_mov
                             ie_files\GTTSExample\uncached_00005.mp4'
                    INFO     Combining to Movie file.                                          scene_file_writer.py:617
                    DEBUG    Partial movie files to combine (6 files):                         scene_file_writer.py:561
                             ['C:\\Users\\ale\\Python\\media\\videos\\gtts-example\\480p15\\pa
                             rtial_movie_files\\GTTSExample\\uncached_00000.mp4',
                             'C:\\Users\\ale\\Python\\media\\videos\\gtts-example\\480p15\\par
                             tial_movie_files\\GTTSExample\\uncached_00001.mp4',
                             'C:\\Users\\ale\\Python\\media\\videos\\gtts-example\\480p15\\par
                             tial_movie_files\\GTTSExample\\uncached_00002.mp4',
                             'C:\\Users\\ale\\Python\\media\\videos\\gtts-example\\480p15\\par
                             tial_movie_files\\GTTSExample\\uncached_00003.mp4',
                             'C:\\Users\\ale\\Python\\media\\videos\\gtts-example\\480p15\\par
                             tial_movie_files\\GTTSExample\\uncached_00004.mp4']
[mp3 @ 0000022C2CC8FE00] Failed to read frame size: Could not seek to 1026.
C:\Users\ale\Python\media\videos\gtts-example\480p15\GTTSExample.wav: Invalid argument
┌─────────────────────────────── Traceback (most recent call last) ────────────────────────────────┐
│ C:\Users\ale\anaconda3\envs\my-manim-environment\lib\shutil.py:816 in move                       │
│                                                                                                  │
│    813 │   │   if os.path.exists(real_dst):                                                      │
│    814 │   │   │   raise Error("Destination path '%s' already exists" % real_dst)                │
│    815 │   try:                                                                                  │
│ >  816 │   │   os.rename(src, real_dst)                                                          │
│    817 │   except OSError:                                                                       │
│    818 │   │   if os.path.islink(src):                                                           │
│    819 │   │   │   linkto = os.readlink(src)                                                     │
└──────────────────────────────────────────────────────────────────────────────────────────────────┘
FileNotFoundError: [WinError 2] The system cannot find the file specified:
'C:\\Users\\ale\\Python\\media\\videos\\gtts-example\\480p15\\GTTSExample_temp.mp4' ->
'C:\\Users\\ale\\Python\\media\\videos\\gtts-example\\480p15\\GTTSExample.mp4'

During handling of the above exception, another exception occurred:

┌─────────────────────────────── Traceback (most recent call last) ────────────────────────────────┐
│ C:\Users\ale\anaconda3\envs\my-manim-environment\lib\site-packages\manim\cli\render\commands.py: │
│ 115 in render                                                                                    │
│                                                                                                  │
│   112 │   │   │   try:                                                                           │
│   113 │   │   │   │   with tempconfig({}):                                                       │
│   114 │   │   │   │   │   scene = SceneClass()                                                   │
│ > 115 │   │   │   │   │   scene.render()                                                         │
│   116 │   │   │   except Exception:                                                              │
│   117 │   │   │   │   error_console.print_exception()                                            │
│   118 │   │   │   │   sys.exit(1)                                                                │
│                                                                                                  │
│ C:\Users\ale\anaconda3\envs\my-manim-environment\lib\site-packages\manim\scene\scene.py:233 in   │
│ render                                                                                           │
│                                                                                                  │
│    230 │   │   │   return True                                                                   │
│    231 │   │   self.tear_down()                                                                  │
│    232 │   │   # We have to reset these settings in case of multiple renders.                    │
│ >  233 │   │   self.renderer.scene_finished(self)                                                │
│    234 │   │                                                                                     │
│    235 │   │   # Show info only if animations are rendered or to get image                       │
│    236 │   │   if (                                                                              │
│                                                                                                  │
│ C:\Users\ale\anaconda3\envs\my-manim-environment\lib\site-packages\manim\renderer\cairo_renderer │
│ .py:259 in scene_finished                                                                        │
│                                                                                                  │
│   256 │   def scene_finished(self, scene):                                                       │
│   257 │   │   # If no animations in scene, render an image instead                               │
│   258 │   │   if self.num_plays:                                                                 │
│ > 259 │   │   │   self.file_writer.finish()                                                      │
│   260 │   │   elif config.write_to_movie:                                                        │
│   261 │   │   │   config.save_last_frame = True                                                  │
│   262 │   │   │   config.write_to_movie = False                                                  │
│                                                                                                  │
│ C:\Users\ale\anaconda3\envs\my-manim-environment\lib\site-packages\manim\scene\scene_file_writer │
│ .py:457 in finish                                                                                │
│                                                                                                  │
│   454 │   │   if write_to_movie():                                                               │
│   455 │   │   │   if hasattr(self, "writing_process"):                                           │
│   456 │   │   │   │   self.writing_process.terminate()                                           │
│ > 457 │   │   │   self.combine_to_movie()                                                        │
│   458 │   │   │   if config.save_sections:                                                       │
│   459 │   │   │   │   self.combine_to_section_videos()                                           │
│   460 │   │   │   if config["flush_cache"]:                                                      │
│                                                                                                  │
│ C:\Users\ale\anaconda3\envs\my-manim-environment\lib\site-packages\manim\scene\scene_file_writer │
│ .py:664 in combine_to_movie                                                                      │
│                                                                                                  │
│   661 │   │   │   │   str(temp_file_path),                                                       │
│   662 │   │   │   ]                                                                              │
│   663 │   │   │   subprocess.call(commands)                                                      │
│ > 664 │   │   │   shutil.move(str(temp_file_path), str(movie_file_path))                         │
│   665 │   │   │   sound_file_path.unlink()                                                       │
│   666 │   │                                                                                      │
│   667 │   │   self.print_file_ready_message(str(movie_file_path))                                │
│                                                                                                  │
│ C:\Users\ale\anaconda3\envs\my-manim-environment\lib\shutil.py:836 in move                       │
│                                                                                                  │
│    833 │   │   │   │   │    symlinks=True)                                                       │
│    834 │   │   │   rmtree(src)                                                                   │
│    835 │   │   else:                                                                             │
│ >  836 │   │   │   copy_function(src, real_dst)                                                  │
│    837 │   │   │   os.unlink(src)                                                                │
│    838 │   return real_dst                                                                       │
│    839                                                                                           │
│                                                                                                  │
│ C:\Users\ale\anaconda3\envs\my-manim-environment\lib\shutil.py:434 in copy2                      │
│                                                                                                  │
│    431 │   """                                                                                   │
│    432 │   if os.path.isdir(dst):                                                                │
│    433 │   │   dst = os.path.join(dst, os.path.basename(src))                                    │
│ >  434 │   copyfile(src, dst, follow_symlinks=follow_symlinks)                                   │
│    435 │   copystat(src, dst, follow_symlinks=follow_symlinks)                                   │
│    436 │   return dst                                                                            │
│    437                                                                                           │
│                                                                                                  │
│ C:\Users\ale\anaconda3\envs\my-manim-environment\lib\shutil.py:254 in copyfile                   │
│                                                                                                  │
│    251 │   if not follow_symlinks and _islink(src):                                              │
│    252 │   │   os.symlink(os.readlink(src), dst)                                                 │
│    253 │   else:                                                                                 │
│ >  254 │   │   with open(src, 'rb') as fsrc:                                                     │
│    255 │   │   │   try:                                                                          │
│    256 │   │   │   │   with open(dst, 'wb') as fdst:                                             │
│    257 │   │   │   │   │   # macOS                                                               │
└──────────────────────────────────────────────────────────────────────────────────────────────────┘
FileNotFoundError: [Errno 2] No such file or directory:
'C:\\Users\\ale\\Python\\media\\videos\\gtts-example\\480p15\\GTTSExample_temp.mp4'

System specifications

System Details

OS (with version, e.g., Windows 10 v2004 or macOS 10.15 (Catalina)): Windows 10 Pro version 22H2 build 19045.2728
RAM: 8 GB
Python version (python/py/python3 --version): Python 3.10.9
Installed modules (provide output from pip list):

Package                        Version
------------------------------ -----------
alabaster                      0.7.13
arrow                          1.2.3
astroid                        2.15.0
asttokens                      2.2.1
atomicwrites                   1.4.1
attrs                          22.2.0
autopep8                       1.6.0
azure-cognitiveservices-speech 1.26.0
Babel                          2.12.1
backcall                       0.2.0
backports.functools-lru-cache  1.6.4
bcrypt                         3.2.2
beautifulsoup4                 4.11.2
binaryornot                    0.4.4
black                          23.1.0
bleach                         6.0.0
brotlipy                       0.7.0
CacheControl                   0.12.11
certifi                        2022.12.7
cffi                           1.15.1
chardet                        5.1.0
charset-normalizer             2.1.1
cleo                           2.0.1
click                          8.1.3
click-default-group            1.2.2
cloudpickle                    2.2.1
cloup                          0.13.1
colorama                       0.4.6
colour                         0.1.5
comm                           0.1.2
comtypes                       1.1.14
cookiecutter                   2.1.1
crashtest                      0.4.1
cryptography                   39.0.2
cycler                         0.11.0
Cython                         0.29.33
dataclasses                    0.8
debugpy                        1.6.6
decorator                      5.1.1
deepl                          1.14.0
defusedxml                     0.7.1
diff-match-patch               20200713
dill                           0.3.6
distlib                        0.3.6
docstring-to-markdown          0.11
docutils                       0.19
dulwich                        0.20.50
entrypoints                    0.4
executing                      1.2.0
fastjsonschema                 2.16.3
filelock                       3.10.0
flake8                         6.0.0
fonttools                      4.39.2
future                         0.18.3
glcontext                      2.3.7
gTTS                           2.3.1
html5lib                       1.1
humanhash3                     0.0.6
idna                           3.4
imagesize                      1.4.1
importlib-metadata             6.1.0
importlib-resources            5.12.0
inflection                     0.5.1
intervaltree                   3.0.2
ipykernel                      6.21.3
ipython                        8.11.0
ipython-genutils               0.2.0
isort                          5.12.0
isosurfaces                    0.1.0
jaraco.classes                 3.2.3
jedi                           0.18.2
jellyfish                      0.9.0
Jinja2                         3.1.2
jinja2-time                    0.2.0
jsonschema                     4.17.3
jupyter_client                 7.4.9
jupyter_core                   5.3.0
jupyterlab-pygments            0.2.2
keyring                        23.13.1
kiwisolver                     1.4.4
lazy-object-proxy              1.9.0
lockfile                       0.12.2
manim                          0.17.2
manim-voiceover                0.3.0
ManimPango                     0.4.3
mapbox-earcut                  1.0.0
markdown-it-py                 2.2.0
MarkupSafe                     2.1.2
matplotlib                     3.5.3
matplotlib-inline              0.1.6
mccabe                         0.7.0
mdurl                          0.1.0
mistune                        2.0.5
mkl-service                    2.4.0
moderngl                       5.8.1
moderngl-window                2.4.1
more-itertools                 9.1.0
mpmath                         1.3.0
msgpack                        1.0.5
multipledispatch               0.6.0
munkres                        1.1.4
mutagen                        1.46.0
mypy-extensions                1.0.0
nbclient                       0.7.2
nbconvert                      7.2.9
nbformat                       5.7.3
nest-asyncio                   1.5.6
networkx                       2.8.8
numpy                          1.24.2
numpydoc                       1.5.0
packaging                      23.0
pandas                         1.5.3
pandocfilters                  1.5.0
paramiko                       3.1.0
parso                          0.8.3
pathspec                       0.11.1
pexpect                        4.8.0
pickleshare                    0.7.5
Pillow                         9.4.0
pip                            23.0.1
pkginfo                        1.9.6
pkgutil_resolve_name           1.3.10
platformdirs                   2.6.2
playsound                      1.3.0
pluggy                         1.0.0
ply                            3.11
poetry                         1.3.1
poetry-core                    1.4.0
poetry-plugin-export           1.3.0
pooch                          1.7.0
prompt-toolkit                 3.0.38
psutil                         5.9.4
ptyprocess                     0.7.0
pure-eval                      0.2.2
PyAudio                        0.2.13
pycairo                        1.23.0
pycodestyle                    2.10.0
pycparser                      2.21
pydocstyle                     6.2.3
pydub                          0.25.1
pyflakes                       3.0.1
pyglet                         1.5.27
Pygments                       2.14.0
pylint                         2.17.0
pylint-venv                    3.0.1
pyls-spyder                    0.4.0
PyNaCl                         1.5.0
pynput                         1.7.6
pyOpenSSL                      23.0.0
pyparsing                      3.0.9
pypiwin32                      223
PyQt5                          5.15.7
PyQt5-sip                      12.11.0
PyQtWebEngine                  5.15.4
pyrr                           0.10.3
pyrsistent                     0.19.3
PySocks                        1.7.1
python-dateutil                2.8.2
python-dotenv                  0.21.1
python-lsp-black               1.2.1
python-lsp-jsonrpc             1.0.0
python-lsp-server              1.7.1
python-slugify                 8.0.1
pytoolconfig                   1.2.5
pyttsx3                        2.90
pytz                           2022.7.1
pywin32                        304
pywin32-ctypes                 0.2.0
PyYAML                         6.0
pyzmq                          25.0.1
QDarkStyle                     3.0.3
qstylizer                      0.2.2
QtAwesome                      1.2.3
qtconsole                      5.4.1
QtPy                           2.3.0
rapidfuzz                      2.13.7
requests                       2.28.2
requests-toolbelt              0.10.1
rich                           13.3.2
rope                           1.7.0
Rtree                          1.0.1
scipy                          1.10.1
screeninfo                     0.8.1
setuptools                     67.6.0
shellingham                    1.5.1
sip                            6.7.7
six                            1.16.0
skia-pathops                   0.7.4
snowballstemmer                2.2.0
sortedcontainers               2.4.0
soupsieve                      2.3.2.post1
sox                            1.4.1
Sphinx                         6.1.3
sphinxcontrib-applehelp        1.0.4
sphinxcontrib-devhelp          1.0.2
sphinxcontrib-htmlhelp         2.0.1
sphinxcontrib-jsmath           1.0.1
sphinxcontrib-qthelp           1.0.3
sphinxcontrib-serializinghtml  1.1.5
spyder                         5.4.2
spyder-kernels                 2.4.2
srt                            3.5.2
stack-data                     0.6.2
svgelements                    1.9.1
sympy                          1.11.1
text-unidecode                 1.3
textdistance                   4.5.0
three-merge                    0.1.1
tinycss2                       1.2.1
toml                           0.10.2
tomli                          2.0.1
tomlkit                        0.11.6
torch                          1.12.1
tornado                        6.2
tqdm                           4.65.0
traitlets                      5.9.0
trove-classifiers              2023.3.9
typing_extensions              4.5.0
ujson                          5.7.0
unicodedata2                   15.0.0
Unidecode                      1.3.6
urllib3                        1.26.15
virtualenv                     20.21.0
watchdog                       2.3.1
wcwidth                        0.2.6
webencodings                   0.5.1
whatthepatch                   1.0.4
wheel                          0.40.0
win-inet-pton                  1.1.0
wrapt                          1.15.0
yapf                           0.32.0
zipp                           3.15.0

LaTeX details

LaTeX distribution (e.g. TeX Live 2020):
Installed LaTeX packages:

FFMPEG

Output of ffmpeg -version:

ffmpeg version 5.1.2 Copyright (c) 2000-2022 the FFmpeg developers
built with clang version 15.0.7
configuration: --prefix=/d/bld/ffmpeg_1674566436592/_h_env/Library --cc=clang.exe --cxx=clang++.exe --nm=llvm-nm --ar=llvm-ar --disable-doc --disable-openssl --enable-demuxer=dash --enable-hardcoded-tables --enable-libfreetype --enable-libfontconfig --enable-libopenh264 --ld=lld-link --target-os=win64 --enable-cross-compile --toolchain=msvc --host-cc=clang.exe --extra-libs=ucrt.lib --extra-libs=vcruntime.lib --extra-libs=oldnames.lib --strip=llvm-strip --disable-stripping --host-extralibs= --enable-gpl --enable-libx264 --enable-libx265 --enable-libaom --enable-libsvtav1 --enable-libxml2 --enable-pic --enable-shared --disable-static --enable-version3 --enable-zlib --enable-libopus --pkg-config=/d/bld/ffmpeg_1674566436592/_build_env/Library/bin/pkg-config
libavutil      57. 28.100 / 57. 28.100
libavcodec     59. 37.100 / 59. 37.100
libavformat    59. 27.100 / 59. 27.100
libavdevice    59.  7.100 / 59.  7.100
libavfilter     8. 44.100 /  8. 44.100
libswscale      6.  7.100 /  6.  7.100
libswresample   4.  7.100 /  4.  7.100
libpostproc    56.  6.100 / 56.  6.100

Additional comments

Integration for Google Cloud Platform

Description of proposed feature

This library already has an integration for Azure, but some people are more comfortable with GCP (Google Cloud Platform). It would be useful to add a GCP integration as well.

Additional comments

I have already written code for this. Please assign this issue to me; I would love to open a pull request and contribute.

Missing error message when trying to use the TTS package

Currently, when importing the Coqui service fails, it only prints that the package is not installed, even though the underlying exception carries its own error message.

try:
    from TTS.api import TTS
except ImportError as e:
    logger.error(e)
    logger.error("Missing packages. Run `pip install TTS` to use CoquiService.")

This would solve the problem and report the actual error to the user, for example:

 WARNING  Japanese requires mecab-python3 and unidic-lite.

That was the message in my case. I struggled for a while before finding out that a missing TTS installation was not the actual error.
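The same idea can be written as a small reusable helper. This is a minimal sketch, not code from manim-voiceover: import_optional and the logger name are hypothetical, and the only point is that the original exception text is logged before the installation hint.

```python
import importlib
import logging

logger = logging.getLogger("optional_import_demo")


def import_optional(module_name: str, install_hint: str):
    """Import a module; on failure, log the real error before the hint."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Show the underlying message first, since it may not be about
        # a missing package at all (e.g. a missing language dependency).
        logger.error("Import of %r failed: %s", module_name, exc)
        logger.error("If the package is missing, run `%s`.", install_hint)
        return None
```

A caller would then check the return value for None before using the module, for example `tts_api = import_optional("TTS.api", "pip install TTS")`.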

Automatic stretching of animations to voiceover duration

Description of proposed feature

I suggest proportionally stretching all animations to match their voiceover duration.

How can the new feature be used?

Manually tuning animation durations can be fiddly. Matching them to the voiceover duration is an additional complication that can be automated.
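The core of proportional stretching is a single scale factor. The sketch below assumes the nominal run times of the animations and the voiceover duration are already known; stretch_run_times is a hypothetical helper, not part of manim-voiceover or of the implementation linked in this issue.

```python
def stretch_run_times(run_times: list[float], target_duration: float) -> list[float]:
    """Scale run times proportionally so that they sum to target_duration."""
    total = sum(run_times)
    if total <= 0:
        raise ValueError("run_times must sum to a positive value")
    factor = target_duration / total
    return [t * factor for t in run_times]
```

For example, three animations planned as 1 s, 2 s and 1 s under a 6-second voiceover would each be scaled by 1.5, and the resulting values could be passed as run_time to the corresponding self.play calls.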

Additional comments

I reimplemented the voiceover generator to do everything behind the scenes with no additional complexity:
https://github.com/pesho-ivanov/thesis-manim

I have used my implementation successfully, but it is probably not complete. It would be great if someone more experienced with the libraries could integrate it into manim-voiceover or even manim.

Use Pydantic for serializing/deserializing voiceover data

Description of proposed feature

TBD

How can the new feature be used?

Additional comments

"TimeInterpolator received weird input" warning for certain Azure voices

Description of bug / unexpected behavior

The Azure voice de-DE-ConradNeural produces the following warning during compilation:

TimeInterpolator received weird input, there may be something wrong with the word boundaries.

Bookmark-based timing is off in the resulting video.
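To illustrate why irregular word boundaries can break bookmark timing, here is a minimal sketch of linear interpolation from text offsets to audio times. It is an assumption about the general technique, not the actual TimeInterpolator code; the function names and the strictly-increasing check are hypothetical.

```python
from bisect import bisect_right


def is_strictly_increasing(values: list[float]) -> bool:
    """Return True if every value is larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


def interpolate_time(text_offsets: list[float], audio_offsets: list[float], query: float) -> float:
    """Linearly interpolate the audio time for a given text offset."""
    if len(text_offsets) != len(audio_offsets) or len(text_offsets) < 2:
        raise ValueError("need at least two matching boundary points")
    if not is_strictly_increasing(text_offsets):
        # Duplicate or out-of-order offsets make the mapping ambiguous.
        raise ValueError("text offsets must be strictly increasing")
    # Clamp queries that fall outside the known boundaries.
    if query <= text_offsets[0]:
        return audio_offsets[0]
    if query >= text_offsets[-1]:
        return audio_offsets[-1]
    i = bisect_right(text_offsets, query) - 1
    x0, x1 = text_offsets[i], text_offsets[i + 1]
    y0, y1 = audio_offsets[i], audio_offsets[i + 1]
    return y0 + (y1 - y0) * (query - x0) / (x1 - x0)
```

If a voice reports word boundaries whose text offsets repeat or go backwards, a mapping like this is no longer well defined, which would be consistent with bookmarks landing at the wrong time.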

Expected behavior

Correct timing via bookmarks, as with the voice en-US-AriaNeural.

How to reproduce the issue

Use the voice de-DE-ConradNeural instead of en-US-AriaNeural in bookmark-example.py.

Code for reproducing the problem

from manim import *
from manim_voiceover import VoiceoverScene

# from manim_voiceover.services.coqui import CoquiService

from manim_voiceover.services.azure import AzureService


class BookmarkExample(VoiceoverScene):
    def construct(self):
        # self.set_speech_service(CoquiService())
        self.set_speech_service(
            AzureService(
                voice="de-DE-ConradNeural"
                # style="newscast-casual",
            )
        )

        blist = BulletedList(
            "Trigger animations", "At any word", "Bookmarks", font_size=64
        )

        with self.voiceover(
            text="""Manim-Voiceover allows you to <bookmark mark='A'/>trigger
            animations <bookmark mark='B'/>at any word in the middle of a sentence by
            adding simple <bookmark mark='C'/>bookmarks to your text."""
        ) as tracker:
            self.wait_until_bookmark("A")

            self.play(
                Write(blist[0]), run_time=tracker.time_until_bookmark("B", limit=1)
            )
            self.wait_until_bookmark("B")
            self.play(
                Write(blist[1]), run_time=tracker.time_until_bookmark("C", limit=1)
            )
            self.wait_until_bookmark("C")
            self.play(Write(blist[2]))

        self.play(FadeOut(blist))

        sentence = Tex(
            r"\texttt{``The quick brown fox <bookmark mark=\textquotesingle A\textquotesingle/>jumps\\over the lazy dog.''}"
        )
        xml_tag = sentence[0][18:37]
        xml_tag_box = SurroundingRectangle(xml_tag, color=MAROON)

        with self.voiceover(
            text="You simply add an <bookmark mark='A'/>XML tag to where you want to trigger the animation."
        ) as tracker:
            self.play(Write(sentence), run_time=tracker.time_until_bookmark("A"))
            self.play(xml_tag.animate.set_color(MAROON), Create(xml_tag_box))

        fox = Text("Fox")
        fox = VGroup(fox, SurroundingRectangle(fox, color=WHITE)).shift(
            3 * DOWN + 2 * LEFT
        )

        dog = Text("Dog")
        dog = VGroup(dog, SurroundingRectangle(dog, color=WHITE)).shift(3 * DOWN)

        path_arc = Arc(radius=2, angle=TAU / 2, arc_center=dog.get_center()).flip()
        with self.voiceover(
            text="Let's see it in action. The quick brown fox <bookmark mark='A'/>jumps <bookmark mark='B'/>over the lazy dog."
        ) as tracker:
            self.play(FadeIn(fox, dog))
            self.wait_until_bookmark("A")
            self.play(
                MoveAlongPath(fox, path_arc), run_time=tracker.time_until_bookmark("B")
            )

        with self.voiceover(
            text="The timing of that animation was computed implicitly using the output from the text-to-speech engine."
        ) as tracker:
            pass

        s32s_text = Tex("Supercalifragilisticexpialidocious", font_size=72)
        super_text = s32s_text[0][:5]
        cali_text = s32s_text[0][5:9]
        fragilistic_text = s32s_text[0][9:20]
        expiali_text = s32s_text[0][20:27]
        docious_text = s32s_text[0][27:]

        with self.voiceover(
            text="But we can go even finer than that, down to the syllable level. <bookmark mark='A'/>See how we sync the animations as we recite the word that you see on your screen."
        ) as tracker:
            self.play(FadeOut(fox, dog, sentence, xml_tag_box))
            self.wait_until_bookmark("A")
            self.play(Write(s32s_text), run_time=tracker.get_remaining_duration())

        with self.voiceover(
            text="Super<bookmark mark='A'/>cali<bookmark mark='B'/>fragilistic<bookmark mark='C'/>expiali<bookmark mark='D'/>docious."
        ) as tracker:
            self.play(
                super_text.animate.set_color(RED),
                run_time=tracker.time_until_bookmark("A"),
            )
            self.play(
                cali_text.animate.set_color(ORANGE),
                run_time=tracker.time_until_bookmark("B"),
            )
            self.play(
                fragilistic_text.animate.set_color(YELLOW),
                run_time=tracker.time_until_bookmark("C"),
            )
            self.play(
                expiali_text.animate.set_color(GREEN),
                run_time=tracker.time_until_bookmark("D"),
            )
            self.play(
                docious_text.animate.set_color(BLUE),
                run_time=tracker.get_remaining_duration(),
            )

        with self.voiceover(
            text="""To sync animations with syllables, we do linear interpolation,
            as the output from the text-to-speech engine is not that fine yet."""
        ) as tracker:
            self.safe_wait(tracker.get_remaining_duration() - 1)
            self.play(FadeOut(s32s_text))

        self.wait()


with tempconfig(
    {
        "quality": "low_quality",
        "preview": False,
        "progress_bar": "none",
        "disable_caching": True,
    }
):
    scene = BookmarkExample()
    scene.render()
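
For context on the warning in the log below: the narration in the scene says that syllable-level bookmarks are placed by linear interpolation between the word boundaries reported by the text-to-speech engine. The snippet below is only a minimal sketch of that idea, not the plugin's actual `TimeInterpolator`. The function name `interpolate_time` and the `(text_offset, audio_time)` boundary format are assumptions made for illustration.

```python
from bisect import bisect_right


def interpolate_time(text_offset, boundaries):
    """Estimate the audio time (in seconds) for a character offset.

    `boundaries` is a hypothetical list of (text_offset, audio_time) pairs,
    one per word start, sorted by strictly increasing text_offset.
    """
    offsets = [offset for offset, _ in boundaries]
    times = [time for _, time in boundaries]

    # Clamp to the first/last known boundary instead of extrapolating.
    if text_offset <= offsets[0]:
        return times[0]
    if text_offset >= offsets[-1]:
        return times[-1]

    # Find the two boundaries that enclose the requested offset.
    i = bisect_right(offsets, text_offset)
    o0, o1 = offsets[i - 1], offsets[i]
    t0, t1 = times[i - 1], times[i]

    # Linear interpolation between the enclosing boundaries.
    return t0 + (t1 - t0) * (text_offset - o0) / (o1 - o0)


# Example with made-up boundaries: a bookmark at offset 5 sits halfway
# between the words starting at offsets 0 and 10.
example_boundaries = [(0, 0.0), (10, 1.0), (20, 3.0)]
bookmark_time = interpolate_time(5, example_boundaries)
```

If the boundaries handed to such an interpolator were empty, unsorted, or contained duplicate offsets, the estimate would be meaningless, which may be the kind of input the warning is pointing at.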

Additional media files

Images/GIFs

BookmarkExample.mp4

Logs

Terminal output

Manim Community v0.17.3

[05/12/23 12:21:32] WARNING  TimeInterpolator received weird input,    tracker.py:30
                             there may be something wrong with the                  
                             word boundaries.                                       
                    DEBUG    Animation with empty mobject           animation.py:174
                    INFO     Caching disabled.                  cairo_renderer.py:68
                    DEBUG    List of the first few animation    cairo_renderer.py:87
                             hashes of the scene:                                   
                             ['uncached_00000']                                     
                    INFO     Animation 0 : Partial movie    scene_file_writer.py:527
                             file written in                                        
                             '/Users/bantje/manim-test/medi                         
                             a/videos/480p15/partial_movie_                         
                             files/BookmarkExample/uncached                         
                             _00000.mp4'                                            
                    INFO     Caching disabled.                  cairo_renderer.py:68
                    DEBUG    List of the first few animation    cairo_renderer.py:87
                             hashes of the scene:                                   
                             ['uncached_00000',                                     
                             'uncached_00001']                                      
[05/12/23 12:21:33] INFO     Animation 1 : Partial movie    scene_file_writer.py:527
                             file written in                                        
                             '/Users/bantje/manim-test/medi                         
                             a/videos/480p15/partial_movie_                         
                             files/BookmarkExample/uncached                         
                             _00001.mp4'                                            
                    DEBUG    Animation with empty mobject           animation.py:174
                    INFO     Caching disabled.                  cairo_renderer.py:68
                    DEBUG    List of the first few animation    cairo_renderer.py:87
                             hashes of the scene:                                   
                             ['uncached_00000',                                     
                             'uncached_00001',                                      
                             'uncached_00002']                                      
                    INFO     Animation 2 : Partial movie    scene_file_writer.py:527
                             file written in                                        
                             '/Users/bantje/manim-test/medi                         
                             a/videos/480p15/partial_movie_                         
                             files/BookmarkExample/uncached                         
                             _00002.mp4'                                            
                    INFO     Caching disabled.                  cairo_renderer.py:68
                    DEBUG    List of the first few animation    cairo_renderer.py:87
                             hashes of the scene:                                   
                             ['uncached_00000',                                     
                             'uncached_00001',                                      
                             'uncached_00002',                                      
                             'uncached_00003']                                      
                    INFO     Animation 3 : Partial movie    scene_file_writer.py:527
                             file written in                                        
                             '/Users/bantje/manim-test/medi                         
                             a/videos/480p15/partial_movie_                         
                             files/BookmarkExample/uncached                         
                             _00003.mp4'                                            
                    DEBUG    Animation with empty mobject           animation.py:174
                    INFO     Caching disabled.                  cairo_renderer.py:68
                    DEBUG    List of the first few animation    cairo_renderer.py:87
                             hashes of the scene:                                   
                             ['uncached_00000',                                     
                             'uncached_00001',                                      
                             'uncached_00002',                                      
                             'uncached_00003',                                      
                             'uncached_00004']                                      
                    INFO     Animation 4 : Partial movie    scene_file_writer.py:527
                             file written in                                        
                             '/Users/bantje/manim-test/medi                         
                             a/videos/480p15/partial_movie_                         
                             files/BookmarkExample/uncached                         
                             _00004.mp4'                                            
                    INFO     Caching disabled.                  cairo_renderer.py:68
                    DEBUG    List of the first few animation    cairo_renderer.py:87
                             hashes of the scene:                                   
                             ['uncached_00000',                                     
                             'uncached_00001',                                      
                             'uncached_00002',                                      
                             'uncached_00003',                                      
                             'uncached_00004']                                      
                    INFO     Animation 5 : Partial movie    scene_file_writer.py:527
                             file written in                                        
                             '/Users/bantje/manim-test/medi                         
                             a/videos/480p15/partial_movie_                         
                             files/BookmarkExample/uncached                         
                             _00005.mp4'                                            
                    INFO     Caching disabled.                  cairo_renderer.py:68
                    DEBUG    List of the first few animation    cairo_renderer.py:87
                             hashes of the scene:                                   
                             ['uncached_00000',                                     
                             'uncached_00001',                                      
                             'uncached_00002',                                      
                             'uncached_00003',                                      
                             'uncached_00004']                                      
[05/12/23 12:21:34] INFO     Animation 6 : Partial movie    scene_file_writer.py:527
                             file written in                                        
                             '/Users/bantje/manim-test/medi                         
                             a/videos/480p15/partial_movie_                         
                             files/BookmarkExample/uncached                         
                             _00006.mp4'                                            
                    INFO     Caching disabled.                  cairo_renderer.py:68
                    DEBUG    List of the first few animation    cairo_renderer.py:87
                             hashes of the scene:                                   
                             ['uncached_00000',                                     
                             'uncached_00001',                                      
                             'uncached_00002',                                      
                             'uncached_00003',                                      
                             'uncached_00004']                                      
[05/12/23 12:21:35] INFO     Animation 7 : Partial movie    scene_file_writer.py:527
                             file written in                                        
                             '/Users/bantje/manim-test/medi                         
                             a/videos/480p15/partial_movie_                         
                             files/BookmarkExample/uncached                         
                             _00007.mp4'                                            
                    INFO     Caching disabled.                  cairo_renderer.py:68
                    DEBUG    List of the first few animation    cairo_renderer.py:87
                             hashes of the scene:                                   
                             ['uncached_00000',                                     
                             'uncached_00001',                                      
                             'uncached_00002',                                      
                             'uncached_00003',                                      
                             'uncached_00004']                                      
[05/12/23 12:21:36] INFO     Animation 8 : Partial movie    scene_file_writer.py:527
                             file written in                                        
                             '/Users/bantje/manim-test/medi                         
                             a/videos/480p15/partial_movie_                         
                             files/BookmarkExample/uncached                         
                             _00008.mp4'                                            
                    DEBUG    Animation with empty mobject           animation.py:174
                    INFO     Caching disabled.                  cairo_renderer.py:68
                    DEBUG    List of the first few animation    cairo_renderer.py:87
                             hashes of the scene:                                   
                             ['uncached_00000',                                     
                             'uncached_00001',                                      
                             'uncached_00002',                                      
                             'uncached_00003',                                      
                             'uncached_00004']                                      
                    INFO     Animation 9 : Partial movie    scene_file_writer.py:527
                             file written in                                        
                             '/Users/bantje/manim-test/medi                         
                             a/videos/480p15/partial_movie_                         
                             files/BookmarkExample/uncached                         
                             _00009.mp4'                                            
                    WARNING  TimeInterpolator received weird input,    tracker.py:30
                             there may be something wrong with the                  
                             word boundaries.                                       
                    WARNING  TimeInterpolator received weird input,    tracker.py:30
                             there may be something wrong with the                  
                             word boundaries.                                       
                    INFO     Caching disabled.                  cairo_renderer.py:68
                    DEBUG    List of the first few animation    cairo_renderer.py:87
                             hashes of the scene:                                   
                             ['uncached_00000',                                     
                             'uncached_00001',                                      
                             'uncached_00002',                                      
                             'uncached_00003',                                      
                             'uncached_00004']                                      
                    INFO     Animation 10 : Partial movie   scene_file_writer.py:527
                             file written in                                        
                             '/Users/bantje/manim-test/medi                         
                             a/videos/480p15/partial_movie_                         
                             files/BookmarkExample/uncached                         
                             _00010.mp4'                                            
                    DEBUG    Animation with empty mobject           animation.py:174
                    INFO     Caching disabled.                  cairo_renderer.py:68
                    DEBUG    List of the first few animation    cairo_renderer.py:87
                             hashes of the scene:                                   
                             ['uncached_00000',                                     
                             'uncached_00001',                                      
                             'uncached_00002',                                      
                             'uncached_00003',                                      
                             'uncached_00004']                                      
                    INFO     Animation 11 : Partial movie   scene_file_writer.py:527
                             file written in                                        
                             '/Users/bantje/manim-test/medi                         
                             a/videos/480p15/partial_movie_                         
                             files/BookmarkExample/uncached                         
                             _00011.mp4'                                            
                    INFO     Caching disabled.                  cairo_renderer.py:68
                    DEBUG    List of the first few animation    cairo_renderer.py:87
                             hashes of the scene:                                   
                             ['uncached_00000',                                     
                             'uncached_00001',                                      
                             'uncached_00002',                                      
                             'uncached_00003',                                      
                             'uncached_00004']                                      
                    INFO     Animation 12 : Partial movie   scene_file_writer.py:527
                             file written in                                        
                             '/Users/bantje/manim-test/medi                         
                             a/videos/480p15/partial_movie_                         
                             files/BookmarkExample/uncached                         
                             _00012.mp4'                                            
                    DEBUG    Animation with empty mobject           animation.py:174
                    INFO     Caching disabled.                  cairo_renderer.py:68
                    DEBUG    List of the first few animation    cairo_renderer.py:87
                             hashes of the scene:                                   
                             ['uncached_00000',                                     
                             'uncached_00001',                                      
                             'uncached_00002',                                      
                             'uncached_00003',                                      
                             'uncached_00004']                                      
[05/12/23 12:21:37] INFO     Animation 13 : Partial movie   scene_file_writer.py:527
                             file written in                                        
                             '/Users/bantje/manim-test/medi                         
                             a/videos/480p15/partial_movie_                         
                             files/BookmarkExample/uncached                         
                             _00013.mp4'                                            
                    DEBUG    Animation with empty mobject           animation.py:174
                    INFO     Caching disabled.                  cairo_renderer.py:68
                    DEBUG    List of the first few animation    cairo_renderer.py:87
                             hashes of the scene:                                   
                             ['uncached_00000',                                     
                             'uncached_00001',                                      
                             'uncached_00002',                                      
                             'uncached_00003',                                      
                             'uncached_00004']                                      
                    INFO     Animation 14 : Partial movie   scene_file_writer.py:527
                             file written in                                        
                             '/Users/bantje/manim-test/medi                         
                             a/videos/480p15/partial_movie_                         
                             files/BookmarkExample/uncached                         
                             _00014.mp4'                                            
                    INFO     Caching disabled.                  cairo_renderer.py:68
                    DEBUG    List of the first few animation    cairo_renderer.py:87
                             hashes of the scene:                                   
                             ['uncached_00000',                                     
                             'uncached_00001',                                      
                             'uncached_00002',                                      
                             'uncached_00003',                                      
                             'uncached_00004']                                      
[05/12/23 12:21:38] INFO     Animation 15 : Partial movie   scene_file_writer.py:527
                             file written in                                        
                             '/Users/bantje/manim-test/medi                         
                             a/videos/480p15/partial_movie_                         
                             files/BookmarkExample/uncached                         
                             _00015.mp4'                                            
                    DEBUG    Animation with empty mobject           animation.py:174
                    INFO     Caching disabled.                  cairo_renderer.py:68
                    DEBUG    List of the first few animation    cairo_renderer.py:87
                             hashes of the scene:                                   
                             ['uncached_00000',                                     
                             'uncached_00001',                                      
                             'uncached_00002',                                      
                             'uncached_00003',                                      
                             'uncached_00004']                                      
                    INFO     Animation 16 : Partial movie   scene_file_writer.py:527
                             file written in                                        
                             '/Users/bantje/manim-test/medi                         
                             a/videos/480p15/partial_movie_                         
                             files/BookmarkExample/uncached                         
                             _00016.mp4'                                            
                    INFO     Caching disabled.                  cairo_renderer.py:68
                    DEBUG    List of the first few animation    cairo_renderer.py:87
                             hashes of the scene:                                   
                             ['uncached_00000',                                     
                             'uncached_00001',                                      
                             'uncached_00002',                                      
                             'uncached_00003',                                      
                             'uncached_00004']                                      
[05/12/23 12:21:39] INFO     Animation 17 : Partial movie   scene_file_writer.py:527
                             file written in                                        
                             '/Users/bantje/manim-test/medi                         
                             a/videos/480p15/partial_movie_                         
                             files/BookmarkExample/uncached                         
                             _00017.mp4'                                            
                    WARNING  TimeInterpolator received weird input,    tracker.py:30
                             there may be something wrong with the                  
                             word boundaries.                                       
                    WARNING  TimeInterpolator received weird input,    tracker.py:30
                             there may be something wrong with the                  
                             word boundaries.                                       
                    WARNING  TimeInterpolator received weird input,    tracker.py:30
                             there may be something wrong with the                  
                             word boundaries.                                       
                    WARNING  TimeInterpolator received weird input,    tracker.py:30
                             there may be something wrong with the                  
                             word boundaries.                                       
                    INFO     Caching disabled.                  cairo_renderer.py:68
                    DEBUG    List of the first few animation    cairo_renderer.py:87
                             hashes of the scene:                                   
                             ['uncached_00000',                                     
                             'uncached_00001',                                      
                             'uncached_00002',                                      
                             'uncached_00003',                                      
                             'uncached_00004']                                      
[05/12/23 12:21:40] INFO     Animation 18 : Partial movie   scene_file_writer.py:527
                             file written in                                        
                             '/Users/bantje/manim-test/medi                         
                             a/videos/480p15/partial_movie_                         
                             files/BookmarkExample/uncached                         
                             _00018.mp4'                                            
                    INFO     Caching disabled.                  cairo_renderer.py:68
                    DEBUG    List of the first few animation    cairo_renderer.py:87
                             hashes of the scene:                                   
                             ['uncached_00000',                                     
                             'uncached_00001',                                      
                             'uncached_00002',                                      
                             'uncached_00003',                                      
                             'uncached_00004']                                      
                    INFO     Animation 19 : Partial movie   scene_file_writer.py:527
                             file written in                                        
                             '/Users/bantje/manim-test/medi                         
                             a/videos/480p15/partial_movie_                         
                             files/BookmarkExample/uncached                         
                             _00019.mp4'                                            
                    INFO     Caching disabled.                  cairo_renderer.py:68
                    DEBUG    List of the first few animation    cairo_renderer.py:87
                             hashes of the scene:                                   
                             ['uncached_00000',                                     
                             'uncached_00001',                                      
                             'uncached_00002',                                      
                             'uncached_00003',                                      
                             'uncached_00004']                                      
                    INFO     Animation 20 : Partial movie   scene_file_writer.py:527
                             file written in                                        
                             '/Users/bantje/manim-test/medi                         
                             a/videos/480p15/partial_movie_                         
                             files/BookmarkExample/uncached                         
                             _00020.mp4'                                            
                    INFO     Caching disabled.                  cairo_renderer.py:68
                    DEBUG    List of the first few animation    cairo_renderer.py:87
                             hashes of the scene:                                   
                             ['uncached_00000',                                     
                             'uncached_00001',                                      
                             'uncached_00002',                                      
                             'uncached_00003',                                      
                             'uncached_00004']                                      
                    INFO     Animation 21 : Partial movie   scene_file_writer.py:527
                             file written in                                        
                             '/Users/bantje/manim-test/medi                         
                             a/videos/480p15/partial_movie_                         
                             files/BookmarkExample/uncached                         
                             _00021.mp4'                                            
                    INFO     Caching disabled.                  cairo_renderer.py:68
                    DEBUG    List of the first few animation    cairo_renderer.py:87
                             hashes of the scene:                                   
                             ['uncached_00000',                                     
                             'uncached_00001',                                      
                             'uncached_00002',                                      
                             'uncached_00003',                                      
                             'uncached_00004']                                      
                    INFO     Animation 22 : Partial movie   scene_file_writer.py:527
                             file written in                                        
                             '/Users/bantje/manim-test/medi                         
                             a/videos/480p15/partial_movie_                         
                             files/BookmarkExample/uncached                         
                             _00022.mp4'                                            
[05/12/23 12:21:41] DEBUG    Animation with empty mobject           animation.py:174
                    INFO     Caching disabled.                  cairo_renderer.py:68
                    DEBUG    List of the first few animation    cairo_renderer.py:87
                             hashes of the scene:                                   
                             ['uncached_00000',                                     
                             'uncached_00001',                                      
                             'uncached_00002',                                      
                             'uncached_00003',                                      
                             'uncached_00004']                                      
                    INFO     Animation 23 : Partial movie   scene_file_writer.py:527
                             file written in                                        
                             '/Users/bantje/manim-test/medi                         
                             a/videos/480p15/partial_movie_                         
                             files/BookmarkExample/uncached                         
                             _00023.mp4'                                            
                    INFO     Caching disabled.                  cairo_renderer.py:68
                    DEBUG    List of the first few animation    cairo_renderer.py:87
                             hashes of the scene:                                   
                             ['uncached_00000',                                     
                             'uncached_00001',                                      
                             'uncached_00002',                                      
                             'uncached_00003',                                      
                             'uncached_00004']                                      
                    INFO     Animation 24 : Partial movie   scene_file_writer.py:527
                             file written in                                        
                             '/Users/bantje/manim-test/medi                         
                             a/videos/480p15/partial_movie_                         
                             files/BookmarkExample/uncached                         
                             _00024.mp4'                                            
                    DEBUG    Animation with empty mobject           animation.py:174
                    INFO     Caching disabled.                  cairo_renderer.py:68
                    DEBUG    List of the first few animation    cairo_renderer.py:87
                             hashes of the scene:                                   
                             ['uncached_00000',                                     
                             'uncached_00001',                                      
                             'uncached_00002',                                      
                             'uncached_00003',                                      
                             'uncached_00004']                                      
                    INFO     Animation 25 : Partial movie   scene_file_writer.py:527
                             file written in                                        
                             '/Users/bantje/manim-test/medi                         
                             a/videos/480p15/partial_movie_                         
                             files/BookmarkExample/uncached                         
                             _00025.mp4'                                            
                    INFO     Combining to Movie file.       scene_file_writer.py:617
                    DEBUG    Partial movie files to combine scene_file_writer.py:561
                             (26 files):                                            
                             ['/Users/bantje/manim-test/med                         
                             ia/videos/480p15/partial_movie                         
                             _files/BookmarkExample/uncache                         
                             d_00000.mp4',                                          
                             '/Users/bantje/manim-test/medi                         
                             a/videos/480p15/partial_movie_                         
                             files/BookmarkExample/uncached                         
                             _00001.mp4',                                           
                             '/Users/bantje/manim-test/medi                         
                             a/videos/480p15/partial_movie_                         
                             files/BookmarkExample/uncached                         
                             _00002.mp4',                                           
                             '/Users/bantje/manim-test/medi                         
                             a/videos/480p15/partial_movie_                         
                             files/BookmarkExample/uncached                         
                             _00003.mp4',                                           
                             '/Users/bantje/manim-test/medi                         
                             a/videos/480p15/partial_movie_                         
                             files/BookmarkExample/uncached                         
                             _00004.mp4']                                           
[05/12/23 12:21:44] INFO                                    scene_file_writer.py:736
                             File ready at                                          
                             '/Users/bantje/manim-test/medi                         
                             a/videos/480p15/BookmarkExampl                         
                             e.mp4'                                                 
                                                                                    
                    INFO     Subcaption file has been       scene_file_writer.py:731
                             written as                                             
                             /Users/bantje/manim-test/media                         
                             /videos/480p15/BookmarkExample                         
                             .srt                                                   
                    INFO     Rendered BookmarkExample                   scene.py:241
                             Played 26 animations                                   

System specifications

System Details

macOS 10.15 (Catalina)
RAM: 32 GB
Python 3.11
Installed modules (provide output from pip list):

Package                              Version
------------------------------------ --------
appnope                              0.1.3
asttokens                            2.2.1
azure-cognitiveservices-speech       1.28.0
backcall                             0.2.0
certifi                              2023.5.7
charset-normalizer                   3.1.0
click                                8.1.3
click-default-group                  1.2.2
cloup                                0.13.1
colour                               0.1.5
comm                                 0.1.3
Cython                               0.29.34
debugpy                              1.6.7
decorator                            5.1.1
executing                            1.2.0
ffmpeg-python                        0.2.0
filelock                             3.12.0
fsspec                               2023.5.0
future                               0.18.3
glcontext                            2.3.7
gTTS                                 2.3.2
huggingface-hub                      0.14.1
humanhash3                           0.0.6
idna                                 3.4
ipykernel                            6.23.0
ipython                              8.12.2
isosurfaces                          0.1.0
jedi                                 0.18.2
Jinja2                               3.1.2
jupyter_client                       8.2.0
jupyter_core                         5.3.0
manim                                0.17.3
manim-voiceover                      0.3.0
ManimPango                           0.4.3
mapbox-earcut                        1.0.1
markdown-it-py                       2.2.0
MarkupSafe                           2.1.2
matplotlib-inline                    0.1.6
mdurl                                0.1.2
moderngl                             5.8.2
moderngl-window                      2.4.3
more-itertools                       9.1.0
mpmath                               1.3.0
multipledispatch                     0.6.0
mutagen                              1.46.0
nest-asyncio                         1.5.6
networkx                             2.8.8
numpy                                1.24.3
openai-whisper                       20230117
packaging                            23.1
parso                                0.8.3
pexpect                              4.8.0
pickleshare                          0.7.5
Pillow                               9.5.0
pip                                  23.1.2
platformdirs                         3.5.0
prompt-toolkit                       3.0.38
psutil                               5.9.5
ptyprocess                           0.7.0
pure-eval                            0.2.2
PyAudio                              0.2.13
pycairo                              1.23.0
pydub                                0.25.1
pyglet                               2.0.7
Pygments                             2.15.1
pynput                               1.7.6
pyobjc-core                          9.1.1
pyobjc-framework-ApplicationServices 9.1.1
pyobjc-framework-Cocoa               9.1.1
pyobjc-framework-Quartz              9.1.1
pyrr                                 0.10.3
python-dateutil                      2.8.2
python-dotenv                        0.21.1
PyYAML                               6.0
pyzmq                                25.0.2
regex                                2023.5.5
requests                             2.30.0
rich                                 13.3.5
scipy                                1.10.1
screeninfo                           0.8.1
setuptools                           67.6.1
six                                  1.16.0
skia-pathops                         0.7.4
sox                                  1.4.1
srt                                  3.5.3
stable-ts                            1.1.0
stack-data                           0.6.2
svgelements                          1.9.3
sympy                                1.12
tokenizers                           0.13.3
torch                                2.0.1
torchaudio                           2.0.2
tornado                              6.3.1
tqdm                                 4.65.0
traitlets                            5.9.0
transformers                         4.29.0
typing_extensions                    4.5.0
urllib3                              2.0.2
watchdog                             2.3.1
wcwidth                              0.2.6
wheel                                0.40.0

LaTeX details

LaTeX distribution: TeX Live 2023

FFMPEG

Output of ffmpeg -version:

ffmpeg version 6.0 Copyright (c) 2000-2023 the FFmpeg developers
built with Apple clang version 14.0.3 (clang-1403.0.22.14.1)
configuration: --prefix=/usr/local/Cellar/ffmpeg/6.0 --enable-shared --enable-pthreads --enable-version3 --cc=clang --host-cflags= --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libaribb24 --enable-libbluray --enable-libdav1d --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librist --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libsvtav1 --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libspeex --enable-libsoxr --enable-libzmq --enable-libzimg --disable-libjack --disable-indev=jack --enable-videotoolbox
libavutil      58.  2.100 / 58.  2.100
libavcodec     60.  3.100 / 60.  3.100
libavformat    60.  3.100 / 60.  3.100
libavdevice    60.  1.100 / 60.  1.100
libavfilter     9.  3.100 /  9.  3.100
libswscale      7.  1.100 /  7.  1.100
libswresample   4. 10.100 /  4. 10.100
libpostproc    57.  1.100 / 57.  1.100

Additional comments

This behaviour hints at inconsistencies on the Azure side of things or an issue in parsing its output.

I would also be interested in possible workarounds:

A list of voices that work properly, if this information exists – I am not that keen on using my free account to try them all out 😬
Some way to use whisper to override the word boundaries provided by Azure(?) – adding transcription_model='base' had no effect though.

Installing multiple stable-ts versions and stuck in a long loop

Preliminaries

I have followed the latest version of the
installation instructions.

Description of error

The following statements are repeated many times, with some INFO and WARNING messages in between.

INFO: pip is looking at multiple versions of stable-ts to determine which version is compatible with other requirements. This could take a while.
Collecting stable-ts<3.0.0,>=2.6.2 (from manim-voiceover[transcribe])
Using cached stable-ts-2.15.10.tar.gz (144 kB)
Installing build dependencies ... done
Getting requirements to build wheel ... done
Installing backend dependencies ... done
Preparing metadata (pyproject.toml) ... done
Using cached stable-ts-2.15.9.tar.gz (142 kB)
Installing build dependencies ... done
Getting requirements to build wheel ... done
Installing backend dependencies ... done
Preparing metadata (pyproject.toml) ... done
Using cached stable-ts-2.15.8.tar.gz (142 kB)
Installing build dependencies ... done
Getting requirements to build wheel ... done
Installing backend dependencies ... done
Preparing metadata (pyproject.toml) ... done
Using cached stable-ts-2.15.7.tar.gz (142 kB)
Installing build dependencies ... done
Getting requirements to build wheel ... done
Installing backend dependencies ... done
Preparing metadata (pyproject.toml) ... done
Using cached stable-ts-2.15.6.tar.gz (139 kB)
Installing build dependencies ... done
Getting requirements to build wheel ... done
Installing backend dependencies ... done
Preparing metadata (pyproject.toml) ... done
Using cached stable-ts-2.15.5.tar.gz (138 kB)
Installing build dependencies ... done
Getting requirements to build wheel ... done
Installing backend dependencies ... done
Preparing metadata (pyproject.toml) ... done
Using cached stable-ts-2.15.4.tar.g

"INFO: This is taking longer than usual. You might need to provide the dependency resolver with stricter constraints to reduce runtime. "

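To gauge how far the resolver backtracked, the versions it downloaded can be pulled out of the pip output. The following is only a minimal sketch: the helper name `versions_tried` and the sample text are illustrative assumptions, and the pattern matches only the "Using cached stable-ts-….tar.gz" lines shown in this report.

```python
import re

# Matches lines such as "Using cached stable-ts-2.15.10.tar.gz (144 kB)".
CACHED_SDIST = re.compile(r"Using cached stable-ts-(\d+(?:\.\d+)*)\.tar\.gz")


def versions_tried(log_text):
    """Return the stable-ts versions pip downloaded, in log order, without duplicates.

    Hypothetical helper for reading the log; it is not part of pip or manim-voiceover.
    """
    seen = []
    for match in CACHED_SDIST.finditer(log_text):
        version = match.group(1)
        if version not in seen:
            seen.append(version)
    return seen


# Sample text copied from the first lines of the output above.
sample_log = """\
Collecting stable-ts<3.0.0,>=2.6.2 (from manim-voiceover[transcribe])
Using cached stable-ts-2.15.10.tar.gz (144 kB)
Using cached stable-ts-2.15.9.tar.gz (142 kB)
Using cached stable-ts-2.15.8.tar.gz (142 kB)
"""

print(versions_tried(sample_log))
```

A long list here suggests that the allowed range for stable-ts is wide and that pip is trying each release in turn, which is the situation the "stricter constraints" hint refers to.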
Installation logs

Terminal output

PASTE HERE OR PROVIDE LINK TO https://pastebin.com/ OR SIMILAR

System specifications

Using WSL2 on Windows

OS: Ubuntu 20.04.6 LTS
RAM: 16GB
Python version (python/py/python3 --version): Python 3.8.10
Installed modules (provide output from pip list):

Package                        Version
------------------------------ -----------
annotated-types                0.6.0
asttokens                      2.4.1
azure-cognitiveservices-speech 1.36.0
backcall                       0.2.0
certifi                        2024.2.2
charset-normalizer             3.3.2
click                          8.1.7
click-default-group            1.2.4
cloup                          2.1.2
decorator                      5.1.1
elevenlabs                     0.2.27
executing                      2.0.1
glcontext                      2.5.0
gTTS                           2.5.1
idna                           3.6
ipython                        8.12.3
isosurfaces                    0.1.0
jedi                           0.19.1
lit                            18.1.2
manim                          0.18.0
manim-voiceover                0.3.6.post0
ManimPango                     0.5.0
mapbox-earcut                  1.0.1
markdown-it-py                 3.0.0
matplotlib-inline              0.1.6
mdurl                          0.1.2
moderngl                       5.10.0
moderngl-window                2.4.4
mpmath                         1.3.0
multipledispatch               1.0.0
mutagen                        1.47.0
networkx                       3.1
numpy                          1.24.4
nvidia-cuda-nvrtc-cu11         11.7.99
nvidia-cufft-cu11              10.9.0.58
nvidia-nccl-cu11               2.14.3
packaging                      24.0
parso                          0.8.4
pexpect                        4.9.0
pickleshare                    0.7.5
Pillow                         9.5.0
pip                            24.0
pkg_resources                  0.0.0
prompt-toolkit                 3.0.43
ptyprocess                     0.7.0
pure-eval                      0.2.2
PyAudio                        0.2.14
pycairo                        1.26.0
pydantic                       2.6.4
pydantic_core                  2.16.3
pydub                          0.25.1
pyglet                         2.0.15
Pygments                       2.17.2
pyrr                           0.10.3
python-dotenv                  0.21.1
python-slugify                 8.0.4
PyYAML                         6.0.1
regex                          2023.12.25
requests                       2.31.0
rich                           13.7.1
safetensors                    0.4.2
scipy                          1.10.1
screeninfo                     0.8.1
setuptools                     44.0.0
six                            1.16.0
skia-pathops                   0.7.4
sox                            1.5.0
srt                            3.5.3
stack-data                     0.6.3
svgelements                    1.9.6
sympy                          1.12
text-unidecode                 1.3
tqdm                           4.66.2
traitlets                      5.14.2
typing_extensions              4.11.0
urllib3                        2.2.1
watchdog                       3.0.0
wcwidth                        0.2.13
websockets                     12.0
wheel                          0.43.0
zipp                           3.18.1

LaTeX details

LaTeX distribution (e.g. TeX Live 2020):
Installed LaTeX packages:

FFMPEG

Output of ffmpeg -version:

ffmpeg version 4.2.7-0ubuntu0.1 Copyright (c) 2000-2022 the FFmpeg developers
built with gcc 9 (Ubuntu 9.4.0-1ubuntu1~20.04.1)
configuration: --prefix=/usr --extra-version=0ubuntu0.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librsvg --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-nvenc --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
libavutil      56. 31.100 / 56. 31.100
libavcodec     58. 54.100 / 58. 54.100
libavformat    58. 29.100 / 58. 29.100
libavdevice    58.  8.100 / 58.  8.100
libavfilter     7. 57.100 /  7. 57.100
libavresample   4.  0.  0 /  4.  0.  0
libswscale      5.  5.100 /  5.  5.100
libswresample   3.  5.100 /  3.  5.100
libpostproc    55.  5.100 / 55.  5.100

Additional comments

Incompatibilities due to new openai-whisper version

Preliminaries

I have followed the latest version of the
installation instructions.

Description of error

I believe that it has become impossible to use transcription with openai-whisper on macOS (and presumably Windows) in the latest version of manim-voiceover.

Let me explain: the transcription module was recently updated to be compatible with openai-whisper>20230306. This whisper version adds a dependency on OpenAI's triton, which unfortunately ships only manylinux wheels, as you can see here. Hence, on non-Linux machines, manually building triton (version >=2.0.0) is the only way to get the transcription part of manim-voiceover to work.

For me personally, this means that I will stick to the Azure service or even use two different Python environments (I wanted to use gTTS for early versions of my next video because it's free).
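The platform dependence described above can be expressed as a small check. This is a minimal sketch, not part of manim-voiceover: the function name `transcription_likely_available` is hypothetical, and it relies on the assumption stated in this report that prebuilt triton wheels exist only for Linux.

```python
import importlib.util
import platform


def transcription_likely_available(system=None, has_triton=None):
    """Guess whether whisper-based transcription can be installed or used.

    Assumption taken from the report: prebuilt triton wheels exist only for Linux.
    Both arguments default to values detected on the current machine.
    """
    if system is None:
        system = platform.system()  # "Linux", "Darwin", or "Windows"
    if has_triton is None:
        has_triton = importlib.util.find_spec("triton") is not None
    # An already-installed triton (for example a manual build) is enough on any OS.
    return has_triton or system == "Linux"


print(transcription_likely_available())
```

A script could use such a check to fall back to a speech service that supplies its own word boundaries when transcription is unlikely to work.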

Installation logs

Terminal output

poetry error when setting up the environment:

Unable to find installation candidates for triton (2.0.0.post1)

System specifications

System Details

OS: macOS 13.3.1
RAM: 64 GB
Python version (python/py/python3 --version): 3.11

LaTeX details

LaTeX distribution (e.g. TeX Live 2020):
Installed LaTeX packages:

FFMPEG

Output of ffmpeg -version:

PASTE HERE

Additional comments

CouldntEncodeError: Encoding failed

Description of bug / unexpected behavior

I took the following example from the VoiceOver Website:


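from manim import *
from manim_voiceover import VoiceoverScene
from manim_voiceover.services.recorder import RecorderService


class MyScene(VoiceoverScene):

    def construct(self):
        self.set_speech_service(RecorderService())
        circle = Circle()  # the original snippet used `circle` without defining it
        with self.voiceover(text="This circle is drawn as I speak.") as tracker:
            self.play(Create(circle), run_time=tracker.duration)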

I then ran manim -pqh myfile.py MyScene --disable_caching. I was asked to choose which input device to record from, and I chose "default" (13). I recorded my voice as instructed, holding the 'r' key.

Upon finishing my recording, the following message appeared on the console:

Finished recording, saving to media/voiceovers/alaska-venus-montana-robin.mp3
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/santiago/.local/share/venvs/83241af131ae2bea5e060955fe3fb67f/venv/lib/python3.11/site-pack │
│ ages/manim/cli/render/commands.py:115 in render                                                  │
│                                                                                                  │
│   112 │   │   │   try:                                                                           │
│   113 │   │   │   │   with tempconfig({}):                                                       │
│   114 │   │   │   │   │   scene = SceneClass()                                                   │
│ ❱ 115 │   │   │   │   │   scene.render()                                                         │
│   116 │   │   │   except Exception:                                                              │
│   117 │   │   │   │   error_console.print_exception()                                            │
│   118 │   │   │   │   sys.exit(1)                                                                │
│                                                                                                  │
│ /home/santiago/.local/share/venvs/83241af131ae2bea5e060955fe3fb67f/venv/lib/python3.11/site-pack │
│ ages/manim/scene/scene.py:223 in render                                                          │
│                                                                                                  │
│    220 │   │   """                                                                               │
│    221 │   │   self.setup()                                                                      │
│    222 │   │   try:                                                                              │
│ ❱  223 │   │   │   self.construct()                                                              │
│    224 │   │   except EndSceneEarlyException:                                                    │
│    225 │   │   │   pass                                                                          │
│    226 │   │   except RerunSceneException as e:                                                  │
│                                                                                                  │
│ /home/santiago/repos/manim/intro.py:29 in construct                                              │
│                                                                                                  │
│    26 │                                                                                          │
│    27 │   def construct(self):                                                                   │
│    28 │   │   self.set_speech_service(RecorderService(format=1, channels=128, chunk=1024, tran   │
│ ❱  29 │   │   with self.voiceover(text="This circle is drawn as I speak.") as tracker:           │
│    30 │   │   │   self.play(Create(circle), run_time=tracker.duration)                           │
│    31 │   │   v = [r"\{a\}", r"\{b\}", r"\{a, b\}", r"\{a, b, c\}", r"\{a, b, c, f, g\}",        │
│    32 │   │   │    r"\{f\}", r"\{f, g\}"]                                                        │
│                                                                                                  │
│ /usr/lib/python3.11/contextlib.py:137 in __enter__                                               │
│                                                                                                  │
│   134 │   │   # they are only needed for recreation, which is not possible anymore               │
│   135 │   │   del self.args, self.kwds, self.func                                                │
│   136 │   │   try:                                                                               │
│ ❱ 137 │   │   │   return next(self.gen)                                                          │
│   138 │   │   except StopIteration:                                                              │
│   139 │   │   │   raise RuntimeError("generator didn't yield") from None                         │
│   140                                                                                            │
│                                                                                                  │
│ /home/santiago/.local/share/venvs/83241af131ae2bea5e060955fe3fb67f/venv/lib/python3.11/site-pack │
│ ages/manim_voiceover/voiceover_scene.py:180 in voiceover                                         │
│                                                                                                  │
│   177 │   │                                                                                      │
│   178 │   │   try:                                                                               │
│   179 │   │   │   if text is not None:                                                           │
│ ❱ 180 │   │   │   │   yield self.add_voiceover_text(text, **kwargs)                              │
│   181 │   │   │   elif ssml is not None:                                                         │
│   182 │   │   │   │   yield self.add_voiceover_ssml(ssml, **kwargs)                              │
│   183 │   │   finally:                                                                           │
│                                                                                                  │
│ /home/santiago/.local/share/venvs/83241af131ae2bea5e060955fe3fb67f/venv/lib/python3.11/site-pack │
│ ages/manim_voiceover/voiceover_scene.py:63 in add_voiceover_text                                 │
│                                                                                                  │
│    60 │   │   │   │   "You need to call init_voiceover() before adding a voiceover."             │
│    61 │   │   │   )                                                                              │
│    62 │   │                                                                                      │
│ ❱  63 │   │   dict_ = self.speech_service._wrap_generate_from_text(text, **kwargs)               │
│    64 │   │   tracker = VoiceoverTracker(self, dict_, self.speech_service.cache_dir)             │
│    65 │   │   self.add_sound(str(Path(self.speech_service.cache_dir) / dict_["final_audio"]))    │
│    66 │   │   self.current_tracker = tracker                                                     │
│                                                                                                  │
│ /home/santiago/.local/share/venvs/83241af131ae2bea5e060955fe3fb67f/venv/lib/python3.11/site-pack │
│ ages/manim_voiceover/services/base.py:85 in _wrap_generate_from_text                             │
│                                                                                                  │
│    82 │   │   # Replace newlines with lines, reduce multiple consecutive spaces to single        │
│    83 │   │   text = " ".join(text.split())                                                      │
│    84 │   │                                                                                      │
│ ❱  85 │   │   dict_ = self.generate_from_text(text, cache_dir=None, path=path, **kwargs)         │
│    86 │   │   original_audio = dict_["original_audio"]                                           │
│    87 │   │                                                                                      │
│    88 │   │   # Check whether word boundaries exist and if not run stt                           │
│                                                                                                  │
│ /home/santiago/.local/share/venvs/83241af131ae2bea5e060955fe3fb67f/venv/lib/python3.11/site-pack │
│ ages/manim_voiceover/services/recorder/__init__.py:101 in generate_from_text                     │
│                                                                                                  │
│    98 │   │                                                                                      │
│    99 │   │   self.recorder._trigger_set_device()                                                │
│   100 │   │   box = msg_box("Voiceover:\n\n" + input_text)                                       │
│ ❱ 101 │   │   self.recorder.record(str(Path(cache_dir) / audio_path), box)                       │
│   102 │   │                                                                                      │
│   103 │   │   json_dict = {                                                                      │
│   104 │   │   │   "input_text": text,                                                            │
│                                                                                                  │
│ /home/santiago/.local/share/venvs/83241af131ae2bea5e060955fe3fb67f/venv/lib/python3.11/site-pack │
│ ages/manim_voiceover/services/recorder/utility.py:225 in record                                  │
│                                                                                                  │
│   222 │   def record(self, path: str, message: str = None):                                      │
│   223 │   │   if message is not None:                                                            │
│   224 │   │   │   print(message)                                                                 │
│ ❱ 225 │   │   self._record(path)                                                                 │
│   226 │   │                                                                                      │
│   227 │   │   while True:                                                                        │
│   228 │   │   │   print(                                                                         │
│                                                                                                  │
│ /home/santiago/.local/share/venvs/83241af131ae2bea5e060955fe3fb67f/venv/lib/python3.11/site-pack │
│ ages/manim_voiceover/services/recorder/utility.py:110 in _record                                 │
│                                                                                                  │
│   107 │   │   self.event = self.task.enter(                                                      │
│   108 │   │   │   self.callback_delay, 1, self._record_task, ([path])                            │
│   109 │   │   )                                                                                  │
│ ❱ 110 │   │   self.task.run()                                                                    │
│   111 │   │                                                                                      │
│   112 │   │   return                                                                             │
│   113                                                                                            │
│                                                                                                  │
│ /usr/lib/python3.11/sched.py:151 in run                                                          │
│                                                                                                  │
│   148 │   │   │   │   │   return time - now                                                      │
│   149 │   │   │   │   delayfunc(time - now)                                                      │
│   150 │   │   │   else:                                                                          │
│ ❱ 151 │   │   │   │   action(*argument, **kwargs)                                                │
│   152 │   │   │   │   delayfunc(0)   # Let other threads run                                     │
│   153 │                                                                                          │
│   154 │   @property                                                                              │
│                                                                                                  │
│ /home/santiago/.local/share/venvs/83241af131ae2bea5e060955fe3fb67f/venv/lib/python3.11/site-pack │
│ ages/manim_voiceover/services/recorder/utility.py:208 in _record_task                            │
│                                                                                                  │
│   205 │   │   │   │   buffer_start=self.trim_buffer_start,                                       │
│   206 │   │   │   │   buffer_end=self.trim_buffer_end,                                           │
│   207 │   │   │   ).export(wav_path, format="wav")                                               │
│ ❱ 208 │   │   │   wav2mp3(wav_path)                                                              │
│   209 │   │   │                                                                                  │
│   210 │   │   │   for e in self.task._queue:                                                     │
│   211 │   │   │   │   self.task.cancel(e)                                                        │
│                                                                                                  │
│ /home/santiago/.local/share/venvs/83241af131ae2bea5e060955fe3fb67f/venv/lib/python3.11/site-pack │
│ ages/manim_voiceover/helper.py:31 in wav2mp3                                                     │
│                                                                                                  │
│    28 │   │   mp3_path = Path(wav_path).with_suffix(".mp3")                                      │
│    29 │                                                                                          │
│    30 │   # Convert to mp3                                                                       │
│ ❱  31 │   AudioSegment.from_wav(wav_path).export(mp3_path, format="mp3", bitrate=bitrate)        │
│    32 │                                                                                          │
│    33 │   if remove_wav:                                                                         │
│    34 │   │   # Remove the .wav file                                                             │
│                                                                                                  │
│ /home/santiago/.local/share/venvs/83241af131ae2bea5e060955fe3fb67f/venv/lib/python3.11/site-pack │
│ ages/pydub/audio_segment.py:970 in export                                                        │
│                                                                                                  │
│    967 │   │   log_subprocess_output(p_err)                                                      │
│    968 │   │                                                                                     │
│    969 │   │   if p.returncode != 0:                                                             │
│ ❱  970 │   │   │   raise CouldntEncodeError(                                                     │
│    971 │   │   │   │   "Encoding failed. ffmpeg/avlib returned error code: {0}\n\nCommand:{1}\n  │
│    972 │   │   │   │   │   p.returncode, conversion_command, p_err.decode(errors='ignore') ))    │
│    973                                                                                           │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
CouldntEncodeError: Encoding failed. ffmpeg/avlib returned error code: 1

Command:['ffmpeg', '-y', '-f', 'wav', '-i', '/tmp/tmp8ska799i', '-b:a', '312k', '-f', 'mp3', '/tmp/tmpnp4_0fh8']

Output from ffmpeg/avlib:

ffmpeg version 5.1.2-3ubuntu1 Copyright (c) 2000-2022 the FFmpeg developers
  built with gcc 12 (Ubuntu 12.2.0-14ubuntu2)
  configuration: --prefix=/usr --extra-version=3ubuntu1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-gnutls --enable-ladspa
--enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libglslang
--enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librabbitmq --enable-librist --enable-librubberband
--enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libsvtav1 --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp
--enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --disable-sndio --enable-libjxl
--enable-pocketsphinx --enable-librsvg --enable-libmfx --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-libplacebo --enable-librav1e --enable-shared
  WARNING: library configuration mismatch
  avfilter    configuration: --prefix=/usr --extra-version=3ubuntu1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-gnutls --enable-ladspa
--enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libglslang
--enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librabbitmq --enable-librist --enable-librubberband
--enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libsvtav1 --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp
--enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --disable-sndio --enable-libjxl
--enable-pocketsphinx --enable-librsvg --enable-libmfx --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-libplacebo --enable-librav1e --enable-shared --enable-version3
--disable-doc --disable-programs --enable-libaribb24 --enable-libopencore_amrnb --enable-libopencore_amrwb --enable-libtesseract --enable-libvo_amrwbenc --enable-libsmbclient
  libavutil      57. 28.100 / 57. 28.100
  libavcodec     59. 37.100 / 59. 37.100
  libavformat    59. 27.100 / 59. 27.100
  libavdevice    59.  7.100 / 59.  7.100
  libavfilter     8. 44.100 /  8. 44.100
  libswscale      6.  7.100 /  6.  7.100
  libswresample   4.  7.100 /  4.  7.100
  libpostproc    56.  6.100 / 56.  6.100
Input #0, wav, from '/tmp/tmp8ska799i':
  Duration: 00:00:01.67, bitrate: 90317 kb/s
  Stream #0:0: Audio: pcm_s32le ([1][0][0][0] / 0x0001), 44100 Hz, 64 channels, s32, 90316 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (pcm_s32le (native) -> mp3 (libmp3lame))
Press [q] to stop, [?] for help
[auto_aresample_0 @ 0x559814942200] [SWR @ 0x559814942380] Rematrix is needed between 64 channels and stereo but there is not enough information to do it
[auto_aresample_0 @ 0x559814942200] Failed to configure output pad on auto_aresample_0
Error reinitializing filters!
Failed to inject frame into filter network: Invalid argument
Error while processing the decoded data for stream #0:0
Conversion failed!

I run Ubuntu 23.04. All dependencies were installed and are up-to-date.

Expected behavior

After executing manim -pqh myfile.py MyScene --disable_caching and recording my voice with my (functioning) microphone, I expected the recording to be successfully embedded in the video and an .mp4 file containing the recording to be written.

How to reproduce the issue

Code for reproducing the problem

from manim import *
from manim_voiceover import VoiceoverScene
from manim_voiceover.services.recorder import RecorderService

class MyScene(VoiceoverScene):
    def construct(self):
        self.set_speech_service(RecorderService())

        # Define the circle before animating it
        circle = Circle()

        with self.voiceover(text="This circle is drawn as I speak.") as tracker:
            self.play(Create(circle), run_time=tracker.duration)

System specifications

System Details

OS: Ubuntu 23.04
RAM: 16 GB
Python version 3.11.2
Installed modules (provide output from pip list):

Package                        Version
------------------------------ ----------
azure-cognitiveservices-speech 1.28.0
build                          0.10.0
certifi                        2022.12.7
charset-normalizer             3.1.0
click                          8.1.3
click-default-group            1.2.2
cloup                          0.13.1
cmake                          3.26.3
colour                         0.1.5
decorator                      5.1.1
docstring-to-markdown          0.12
evdev                          1.6.1
ffmpeg-python                  0.2.0
filelock                       3.12.0
fsspec                         2023.4.0
future                         0.18.3
glcontext                      2.3.7
greenlet                       2.0.2
gTTS                           2.3.2
huggingface-hub                0.14.1
humanhash3                     0.0.6
idna                           3.4
isosurfaces                    0.1.0
jedi                           0.17.2
Jinja2                         3.1.2
lit                            16.0.2
llvmlite                       0.40.0
manim                          0.17.3
manim-voiceover                0.3.0
ManimPango                     0.4.3
mapbox-earcut                  1.0.1
markdown-it-py                 2.2.0
MarkupSafe                     2.1.2
mdurl                          0.1.2
moderngl                       5.8.2
moderngl-window                2.4.3
more-itertools                 9.1.0
mpmath                         1.3.0
msgpack                        1.0.5
multipledispatch               0.6.0
mutagen                        1.46.0
networkx                       2.8.8
numba                          0.57.0
numpy                          1.24.3
nvidia-cublas-cu11             11.10.3.66
nvidia-cuda-cupti-cu11         11.7.101
nvidia-cuda-nvrtc-cu11         11.7.99
nvidia-cuda-runtime-cu11       11.7.99
nvidia-cudnn-cu11              8.5.0.96
nvidia-cufft-cu11              10.9.0.58
nvidia-curand-cu11             10.2.10.91
nvidia-cusolver-cu11           11.4.0.1
nvidia-cusparse-cu11           11.7.4.91
nvidia-nccl-cu11               2.14.3
nvidia-nvtx-cu11               11.7.91
openai-whisper                 20230314
packaging                      23.1
parso                          0.7.1
Pillow                         9.5.0
pip                            23.1.2
pip-tools                      6.13.0
playsound                      1.3.0
pluggy                         1.0.0
PyAudio                        0.2.13
pycairo                        1.23.0
pydub                          0.25.1
pyglet                         2.0.5
Pygments                       2.15.1
pynput                         1.7.6
pynvim                         0.4.3
pyproject_hooks                1.0.0
pyrr                           0.10.3
python-dotenv                  0.21.1
python-jsonrpc-server          0.4.0
python-language-server         0.36.2
python-lsp-jsonrpc             1.0.0
python-lsp-server              1.7.2
python-xlib                    0.33
PyYAML                         6.0
regex                          2023.5.5
requests                       2.29.0
rich                           13.3.5
scipy                          1.10.1
screeninfo                     0.8.1
setuptools                     66.1.1
six                            1.16.0
skia-pathops                   0.7.4
sox                            1.4.1
srt                            3.5.3
stable-ts                      2.5.3
svgelements                    1.9.3
sympy                          1.11.1
tiktoken                       0.3.1
tokenizers                     0.13.3
torch                          2.0.0
torchaudio                     2.0.1
tqdm                           4.65.0
transformers                   4.28.1
triton                         2.0.0
typing_extensions              4.5.0
ujson                          5.7.0
urllib3                        1.26.15
watchdog                       2.3.1
wheel                          0.40.0

FFMPEG

Output of ffmpeg -version:

ffmpeg version 5.1.2-3ubuntu1 Copyright (c) 2000-2022 the FFmpeg developers
built with gcc 12 (Ubuntu 12.2.0-14ubuntu2)
configuration: --prefix=/usr --extra-version=3ubuntu1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libglslang --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librabbitmq --enable-librist --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libsvtav1 --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --disable-sndio --enable-libjxl --enable-pocketsphinx --enable-librsvg --enable-libmfx --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-libplacebo --enable-librav1e --enable-shared
libavutil      57. 28.100 / 57. 28.100
libavcodec     59. 37.100 / 59. 37.100
libavformat    59. 27.100 / 59. 27.100
libavdevice    59.  7.100 / 59.  7.100
libavfilter     8. 44.100 /  8. 44.100
libswscale      6.  7.100 /  6.  7.100
libswresample   4.  7.100 /  4.  7.100
libpostproc    56.  6.100 / 56.  6.100

Additional comments

Choosing HDA Intel PCH: ALC897 Analog or HDA Intel PCH: ALC897 Alt Analog as the input device, instead of default, did not produce the same issue. However, the recordings were of terrible quality. This is not a microphone issue: the same microphone gave good quality when tested with an online recorder.
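The ffmpeg output above reports a 64-channel input and fails because it cannot rematrix 64 channels to stereo. One possible workaround, sketched below, is to collapse the recorded frames to mono before encoding. This is only an illustration using the standard library: the helper name downmix_to_mono is hypothetical, it is not part of manim-voiceover or pydub, and it assumes the samples are available as interleaved signed integers.

```python
from array import array


def downmix_to_mono(interleaved: array, channels: int) -> array:
    """Average each frame of interleaved PCM samples into one mono sample.

    Hypothetical helper for illustration; not part of manim-voiceover.
    """
    if channels < 1 or len(interleaved) % channels != 0:
        raise ValueError("sample count must be a multiple of the channel count")

    mono = array(interleaved.typecode)
    # Walk the buffer one frame (one sample per channel) at a time.
    for start in range(0, len(interleaved), channels):
        frame = interleaved[start:start + channels]
        mono.append(sum(frame) // channels)
    return mono


# Two stereo frames: (10, 20) and (30, 40)
stereo = array("i", [10, 20, 30, 40])
mono = downmix_to_mono(stereo, channels=2)
```

Averaging is a simple choice that needs no channel layout information, which is exactly what ffmpeg is missing here. Taking only the first channel of each frame would be an alternative if the other channels carry no signal.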

Do not add sounds when skip_animations is true

Description of proposed feature

When adding

self.next_section(skip_animations=True)

before a voiceover, the voiceover is still generated and played, which produces odd results.

When working on an animation, I use sections to focus on the part of the scene I am currently editing. Adding a voiceover breaks this workflow.

Rendering the Azure example gives this error

When I run the following command, I get the error below:
manim -pql azure-example.py --disable_caching

Error details: USP error: timeout waiting for the first audio chunk
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ C:\Python39\lib\site-packages\manim\cli\render\commands.py:121 in render                         │
│                                                                                                  │
│   118 │   │   │   try:                                                                           │
│   119 │   │   │   │   with tempconfig(config):                                                   │
│   120 │   │   │   │   │   scene = SceneClass()                                                   │
│ ❱ 121 │   │   │   │   │   scene.render()                                                         │
│   122 │   │   │   except Exception:                                                              │
│   123 │   │   │   │   error_console.print_exception()                                            │
│   124 │   │   │   │   sys.exit(1)                                                                │
│                                                                                                  │
│ C:\Python39\lib\site-packages\manim\scene\scene.py:222 in render                                 │
│                                                                                                  │
│    219 │   │   """                                                                               │
│    220 │   │   self.setup()                                                                      │
│    221 │   │   try:                                                                              │
│ ❱  222 │   │   │   self.construct()                                                              │
│    223 │   │   except EndSceneEarlyException:                                                    │
│    224 │   │   │   pass                                                                          │
│    225 │   │   except RerunSceneException as e:                                                  │
│                                                                                                  │
│ G:\1-PHD MAIN\manim-speech\new manim voiceover\manim-voiceover\examples\azure-example.py:18 in   │
│ construct                                                                                        │
│                                                                                                  │
│   15 │   │   circle = Circle()                                                                   │
│   16 │   │   square = Square().shift(2 * RIGHT)                                                  │
│   17 │   │                                                                                       │
│ ❱ 18 │   │   with self.voiceover(text="This circle is drawn as I speak.") as tracker:            │
│   19 │   │   │   self.play(Create(circle), run_time=tracker.duration)                            │
│   20 │   │                                                                                       │
│   21 │   │   with self.voiceover(text="Let's shift it to the left 2 units.") as tracker:         │
│                                                                                                  │
│ C:\Python39\lib\contextlib.py:117 in __enter__                                                   │
│                                                                                                  │
│   114 │   │   # they are only needed for recreation, which is not possible anymore               │
│   115 │   │   del self.args, self.kwds, self.func                                                │
│   116 │   │   try:                                                                               │
│ ❱ 117 │   │   │   return next(self.gen)                                                          │
│   118 │   │   except StopIteration:                                                              │
│   119 │   │   │   raise RuntimeError("generator didn't yield") from None                         │
│   120                                                                                            │
│                                                                                                  │
│ C:\Python39\lib\site-packages\manim_voiceover\__init__.py:207 in voiceover                       │
│                                                                                                  │
│   204 │   │                                                                                      │
│   205 │   │   try:                                                                               │
│   206 │   │   │   if text is not None:                                                           │
│ ❱ 207 │   │   │   │   yield self.add_voiceover_text(text, **kwargs)                              │
│   208 │   │   │   elif ssml is not None:                                                         │
│   209 │   │   │   │   yield self.add_voiceover_ssml(ssml, **kwargs)                              │
│   210 │   │   finally:                                                                           │
│                                                                                                  │
│ C:\Python39\lib\site-packages\manim_voiceover\__init__.py:118 in add_voiceover_text              │
│                                                                                                  │
│   115 │   │   │   │   "You need to call init_voiceover() before adding a voiceover."             │
│   116 │   │   │   )                                                                              │
│   117 │   │                                                                                      │
│ ❱ 118 │   │   dict_ = self.speech_service.synthesize_from_text(text, **kwargs)                   │
│   119 │   │   tracker = VoiceoverTracker(self, dict_["json_path"])                               │
│   120 │   │   self.add_sound(dict_["final_audio"])                                               │
│   121 │   │   self.current_tracker = tracker                                                     │
│                                                                                                  │
│ C:\Python39\lib\site-packages\manim_voiceover\services\base.py:27 in synthesize_from_text        │
│                                                                                                  │
│   24 │   │   # Replace newlines with lines, reduce multiple consecutive spaces to single         │
│   25 │   │   text = " ".join(text.split())                                                       │
│   26 │   │                                                                                       │
│ ❱ 27 │   │   dict_ = self.generate_from_text(text, output_dir=None, path=path, **kwargs)         │
│   28 │   │   # path = dict_["original_audio"]                                                    │
│   29 │   │   # import ipdb; ipdb.set_trace()                                                     │
│   30                                                                                             │
│                                                                                                  │
│ C:\Python39\lib\site-packages\manim_voiceover\services\azure.py:159 in generate_from_text        │
│                                                                                                  │
│   156 │   │   │   │   │   print(                                                                 │
│   157 │   │   │   │   │   │   "Error details: {}".format(cancellation_details.error_details)     │
│   158 │   │   │   │   │   )                                                                      │
│ ❱ 159 │   │   │   raise Exception("Speech synthesis failed")                                     │
│   160 │   │                                                                                      │
│   161 │   │   return json_dict                                                                   │
│   162                                                                                            │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
Exception: Speech synthesis failed

The gTTS example works when I run it.

Global Pitch

Description of proposed feature

Sometimes we want to set a global pitch that applies to every voiceover in a scene.

How can the new feature be used?

Setting the pitch globally would avoid repeating the prosody argument in each voiceover step.
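Until such a feature exists, a small helper can supply the shared value. The sketch below is an assumption-laden illustration, not the plugin's API: it assumes the speech service in use accepts a prosody dictionary as a keyword argument to voiceover, and the names DEFAULT_PROSODY and voiceover_kwargs are hypothetical.

```python
from typing import Any, Dict

# Hypothetical project-wide default; adjust the value to taste.
DEFAULT_PROSODY = {"pitch": "+10%"}


def voiceover_kwargs(text: str, **overrides: Any) -> Dict[str, Any]:
    """Build keyword arguments for one voiceover call.

    The shared prosody is filled in first; per-call prosody entries win.
    """
    prosody = {**DEFAULT_PROSODY, **overrides.pop("prosody", {})}
    return {"text": text, "prosody": prosody, **overrides}


# Intended use inside a scene (assumes the service accepts `prosody`):
#     with self.voiceover(**voiceover_kwargs("Hello")) as tracker:
#         ...
example = voiceover_kwargs("Hello", prosody={"rate": "-5%"})
```

Merging into a new dictionary leaves DEFAULT_PROSODY untouched, so one call that overrides the pitch does not affect later calls.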

Numpy ValueError while running RecorderService example

Description of bug / unexpected behavior

I am trying to run the basic usage example of manim-voiceover given at https://docs.manim.community/en/stable/guides/add_voiceovers.html . When I try to run it, I get the following error:

$ manim -pql voice_over.py --disable_caching
Manim Community v0.17.2

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/pranjal/PycharmProjects/CipherCompute/venv/lib/python3.10/site-packages/manim/cli/render/c │
│ ommands.py:115 in render                                                                         │
│                                                                                                  │
│   112 │   │   │   try:                                                                           │
│   113 │   │   │   │   with tempconfig({}):                                                       │
│   114 │   │   │   │   │   scene = SceneClass()                                                   │
│ ❱ 115 │   │   │   │   │   scene.render()                                                         │
│   116 │   │   │   except Exception:                                                              │
│   117 │   │   │   │   error_console.print_exception()                                            │
│   118 │   │   │   │   sys.exit(1)                                                                │
│                                                                                                  │
│ /home/pranjal/PycharmProjects/CipherCompute/venv/lib/python3.10/site-packages/manim/scene/scene. │
│ py:223 in render                                                                                 │
│                                                                                                  │
│    220 │   │   """                                                                               │
│    221 │   │   self.setup()                                                                      │
│    222 │   │   try:                                                                              │
│ ❱  223 │   │   │   self.construct()                                                              │
│    224 │   │   except EndSceneEarlyException:                                                    │
│    225 │   │   │   pass                                                                          │
│    226 │   │   except RerunSceneException as e:                                                  │
│                                                                                                  │
│ /home/pranjal/PycharmProjects/CipherCompute/circle_test/voice_over.py:17 in construct            │
│                                                                                                  │
│   14 │   │   circle = Circle()                                                                   │
│   15 │   │                                                                                       │
│   16 │   │   # Surround animation sections with with-statements:                                 │
│ ❱ 17 │   │   with self.voiceover(text="This circle is drawn as I speak.") as tracker:            │
│   18 │   │   │   self.play(Create(circle), run_time=tracker.duration)                            │
│   19 │   │   │   # The duration of the animation is received from the audio file                 │
│   20 │   │   │   # and passed to the tracker automatically.                                      │
│                                                                                                  │
│ /usr/lib/python3.10/contextlib.py:135 in __enter__                                               │
│                                                                                                  │
│   132 │   │   # they are only needed for recreation, which is not possible anymore               │
│   133 │   │   del self.args, self.kwds, self.func                                                │
│   134 │   │   try:                                                                               │
│ ❱ 135 │   │   │   return next(self.gen)                                                          │
│   136 │   │   except StopIteration:                                                              │
│   137 │   │   │   raise RuntimeError("generator didn't yield") from None                         │
│   138                                                                                            │
│                                                                                                  │
│ /home/pranjal/PycharmProjects/CipherCompute/venv/lib/python3.10/site-packages/manim_voiceover/vo │
│ iceover_scene.py:180 in voiceover                                                                │
│                                                                                                  │
│   177 │   │                                                                                      │
│   178 │   │   try:                                                                               │
│   179 │   │   │   if text is not None:                                                           │
│ ❱ 180 │   │   │   │   yield self.add_voiceover_text(text, **kwargs)                              │
│   181 │   │   │   elif ssml is not None:                                                         │
│   182 │   │   │   │   yield self.add_voiceover_ssml(ssml, **kwargs)                              │
│   183 │   │   finally:                                                                           │
│                                                                                                  │
│ /home/pranjal/PycharmProjects/CipherCompute/venv/lib/python3.10/site-packages/manim_voiceover/vo │
│ iceover_scene.py:64 in add_voiceover_text                                                        │
│                                                                                                  │
│    61 │   │   │   )                                                                              │
│    62 │   │                                                                                      │
│    63 │   │   dict_ = self.speech_service._wrap_generate_from_text(text, **kwargs)               │
│ ❱  64 │   │   tracker = VoiceoverTracker(self, dict_, self.speech_service.cache_dir)             │
│    65 │   │   self.add_sound(str(Path(self.speech_service.cache_dir) / dict_["final_audio"]))    │
│    66 │   │   self.current_tracker = tracker                                                     │
│    67                                                                                            │
│                                                                                                  │
│ /home/pranjal/PycharmProjects/CipherCompute/venv/lib/python3.10/site-packages/manim_voiceover/tr │
│ acker.py:58 in __init__                                                                          │
│                                                                                                  │
│    55 │   │   self.end_t = last_t + self.duration                                                │
│    56 │   │                                                                                      │
│    57 │   │   if "word_boundaries" in self.data:                                                 │
│ ❱  58 │   │   │   self._process_bookmarks()                                                      │
│    59 │                                                                                          │
│    60 │   def _process_bookmarks(self) -> None:                                                  │
│    61 │   │   self.bookmark_times = {}                                                           │
│                                                                                                  │
│ /home/pranjal/PycharmProjects/CipherCompute/venv/lib/python3.10/site-packages/manim_voiceover/tr │
│ acker.py:63 in _process_bookmarks                                                                │
│                                                                                                  │
│    60 │   def _process_bookmarks(self) -> None:                                                  │
│    61 │   │   self.bookmark_times = {}                                                           │
│    62 │   │   self.bookmark_distances = {}                                                       │
│ ❱  63 │   │   self.time_interpolator = TimeInterpolator(self.data["word_boundaries"])            │
│    64 │   │   net_text_len = len(remove_bookmarks(self.data["input_text"]))                      │
│    65 │   │   if "transcribed_text" in self.data:                                                │
│    66 │   │   │   transcribed_text_len = len(self.data["transcribed_text"].strip())              │
│                                                                                                  │
│ /home/pranjal/PycharmProjects/CipherCompute/venv/lib/python3.10/site-packages/manim_voiceover/tr │
│ acker.py:24 in __init__                                                                          │
│                                                                                                  │
│    21 │   │   │   self.x.append(wb["text_offset"])                                               │
│    22 │   │   │   self.y.append(wb["audio_offset"] / AUDIO_OFFSET_RESOLUTION)                    │
│    23 │   │                                                                                      │
│ ❱  24 │   │   self.f = interp1d(self.x, self.y)                                                  │
│    25 │                                                                                          │
│    26 │   def interpolate(self, distance: int) -> np.ndarray:                                    │
│    27 │   │   try:                                                                               │
│                                                                                                  │
│ /home/pranjal/PycharmProjects/CipherCompute/venv/lib/python3.10/site-packages/scipy/interpolate/ │
│ _interpolate.py:484 in __init__                                                                  │
│                                                                                                  │
│    481 │   │                                                                                     │
│    482 │   │   # Interpolation goes internally along the first axis                              │
│    483 │   │   self.y = y                                                                        │
│ ❱  484 │   │   self._y = self._reshape_yi(self.y)                                                │
│    485 │   │   self.x = x                                                                        │
│    486 │   │   del y, x  # clean up namespace to prevent misuse; use attributes                  │
│    487 │   │   self._kind = kind                                                                 │
│                                                                                                  │
│ /home/pranjal/PycharmProjects/CipherCompute/venv/lib/python3.10/site-packages/scipy/interpolate/ │
│ _polyint.py:110 in _reshape_yi                                                                   │
│                                                                                                  │
│   107 │   │   │   ok_shape = "%r + (N,) + %r" % (self._y_extra_shape[-self._y_axis:],            │
│   108 │   │   │   │   │   │   │   │   │   │      self._y_extra_shape[:-self._y_axis])            │
│   109 │   │   │   raise ValueError("Data must be of shape %s" % ok_shape)                        │
│ ❱ 110 │   │   return yi.reshape((yi.shape[0], -1))                                               │
│   111 │                                                                                          │
│   112 │   def _set_yi(self, yi, xi=None, axis=None):                                             │
│   113 │   │   if axis is None:                                                                   │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ValueError: cannot reshape array of size 0 into shape (0,newaxis)

Expected behavior

The example should run without errors: it should prompt me to select a recording device and then record the audio.

How to reproduce the issue

Code for reproducing the problem

from manim import *
from manim_voiceover import VoiceoverScene
from manim_voiceover.services.recorder import RecorderService


# Simply inherit from VoiceoverScene instead of Scene to get all the
# voiceover functionality.
class RecorderExample(VoiceoverScene):
    def construct(self):
        # You can choose from a multitude of TTS services,
        # or in this example, record your own voice:
        self.set_speech_service(RecorderService())

        circle = Circle()

        # Surround animation sections with with-statements:
        with self.voiceover(text="This circle is drawn as I speak.") as tracker:
            self.play(Create(circle), run_time=tracker.duration)
            # The duration of the animation is received from the audio file
            # and passed to the tracker automatically.

        # This part will not start playing until the previous voiceover is finished.
        with self.voiceover(text="Let's shift it to the left 2 units.") as tracker:
            self.play(circle.animate.shift(2 * LEFT), run_time=tracker.duration)

Additional media files

Images/GIFs

Logs

Terminal output

$ manim -v DEBUG -pql voice_over.py --disable_caching
Manim Community v0.17.2

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/pranjal/PycharmProjects/CipherCompute/venv/lib/python3.10/site-packages/manim/cli/render/c │
│ ommands.py:115 in render                                                                         │
│                                                                                                  │
│   112 │   │   │   try:                                                                           │
│   113 │   │   │   │   with tempconfig({}):                                                       │
│   114 │   │   │   │   │   scene = SceneClass()                                                   │
│ ❱ 115 │   │   │   │   │   scene.render()                                                         │
│   116 │   │   │   except Exception:                                                              │
│   117 │   │   │   │   error_console.print_exception()                                            │
│   118 │   │   │   │   sys.exit(1)                                                                │
│                                                                                                  │
│ /home/pranjal/PycharmProjects/CipherCompute/venv/lib/python3.10/site-packages/manim/scene/scene. │
│ py:223 in render                                                                                 │
│                                                                                                  │
│    220 │   │   """                                                                               │
│    221 │   │   self.setup()                                                                      │
│    222 │   │   try:                                                                              │
│ ❱  223 │   │   │   self.construct()                                                              │
│    224 │   │   except EndSceneEarlyException:                                                    │
│    225 │   │   │   pass                                                                          │
│    226 │   │   except RerunSceneException as e:                                                  │
│                                                                                                  │
│ /home/pranjal/PycharmProjects/CipherCompute/circle_test/voice_over.py:17 in construct            │
│                                                                                                  │
│   14 │   │   circle = Circle()                                                                   │
│   15 │   │                                                                                       │
│   16 │   │   # Surround animation sections with with-statements:                                 │
│ ❱ 17 │   │   with self.voiceover(text="This circle is drawn as I speak.") as tracker:            │
│   18 │   │   │   self.play(Create(circle), run_time=tracker.duration)                            │
│   19 │   │   │   # The duration of the animation is received from the audio file                 │
│   20 │   │   │   # and passed to the tracker automatically.                                      │
│                                                                                                  │
│ /usr/lib/python3.10/contextlib.py:135 in __enter__                                               │
│                                                                                                  │
│   132 │   │   # they are only needed for recreation, which is not possible anymore               │
│   133 │   │   del self.args, self.kwds, self.func                                                │
│   134 │   │   try:                                                                               │
│ ❱ 135 │   │   │   return next(self.gen)                                                          │
│   136 │   │   except StopIteration:                                                              │
│   137 │   │   │   raise RuntimeError("generator didn't yield") from None                         │
│   138                                                                                            │
│                                                                                                  │
│ /home/pranjal/PycharmProjects/CipherCompute/venv/lib/python3.10/site-packages/manim_voiceover/vo │
│ iceover_scene.py:180 in voiceover                                                                │
│                                                                                                  │
│   177 │   │                                                                                      │
│   178 │   │   try:                                                                               │
│   179 │   │   │   if text is not None:                                                           │
│ ❱ 180 │   │   │   │   yield self.add_voiceover_text(text, **kwargs)                              │
│   181 │   │   │   elif ssml is not None:                                                         │
│   182 │   │   │   │   yield self.add_voiceover_ssml(ssml, **kwargs)                              │
│   183 │   │   finally:                                                                           │
│                                                                                                  │
│ /home/pranjal/PycharmProjects/CipherCompute/venv/lib/python3.10/site-packages/manim_voiceover/vo │
│ iceover_scene.py:64 in add_voiceover_text                                                        │
│                                                                                                  │
│    61 │   │   │   )                                                                              │
│    62 │   │                                                                                      │
│    63 │   │   dict_ = self.speech_service._wrap_generate_from_text(text, **kwargs)               │
│ ❱  64 │   │   tracker = VoiceoverTracker(self, dict_, self.speech_service.cache_dir)             │
│    65 │   │   self.add_sound(str(Path(self.speech_service.cache_dir) / dict_["final_audio"]))    │
│    66 │   │   self.current_tracker = tracker                                                     │
│    67                                                                                            │
│                                                                                                  │
│ /home/pranjal/PycharmProjects/CipherCompute/venv/lib/python3.10/site-packages/manim_voiceover/tr │
│ acker.py:58 in __init__                                                                          │
│                                                                                                  │
│    55 │   │   self.end_t = last_t + self.duration                                                │
│    56 │   │                                                                                      │
│    57 │   │   if "word_boundaries" in self.data:                                                 │
│ ❱  58 │   │   │   self._process_bookmarks()                                                      │
│    59 │                                                                                          │
│    60 │   def _process_bookmarks(self) -> None:                                                  │
│    61 │   │   self.bookmark_times = {}                                                           │
│                                                                                                  │
│ /home/pranjal/PycharmProjects/CipherCompute/venv/lib/python3.10/site-packages/manim_voiceover/tr │
│ acker.py:63 in _process_bookmarks                                                                │
│                                                                                                  │
│    60 │   def _process_bookmarks(self) -> None:                                                  │
│    61 │   │   self.bookmark_times = {}                                                           │
│    62 │   │   self.bookmark_distances = {}                                                       │
│ ❱  63 │   │   self.time_interpolator = TimeInterpolator(self.data["word_boundaries"])            │
│    64 │   │   net_text_len = len(remove_bookmarks(self.data["input_text"]))                      │
│    65 │   │   if "transcribed_text" in self.data:                                                │
│    66 │   │   │   transcribed_text_len = len(self.data["transcribed_text"].strip())              │
│                                                                                                  │
│ /home/pranjal/PycharmProjects/CipherCompute/venv/lib/python3.10/site-packages/manim_voiceover/tr │
│ acker.py:24 in __init__                                                                          │
│                                                                                                  │
│    21 │   │   │   self.x.append(wb["text_offset"])                                               │
│    22 │   │   │   self.y.append(wb["audio_offset"] / AUDIO_OFFSET_RESOLUTION)                    │
│    23 │   │                                                                                      │
│ ❱  24 │   │   self.f = interp1d(self.x, self.y)                                                  │
│    25 │                                                                                          │
│    26 │   def interpolate(self, distance: int) -> np.ndarray:                                    │
│    27 │   │   try:                                                                               │
│                                                                                                  │
│ /home/pranjal/PycharmProjects/CipherCompute/venv/lib/python3.10/site-packages/scipy/interpolate/ │
│ _interpolate.py:484 in __init__                                                                  │
│                                                                                                  │
│    481 │   │                                                                                     │
│    482 │   │   # Interpolation goes internally along the first axis                              │
│    483 │   │   self.y = y                                                                        │
│ ❱  484 │   │   self._y = self._reshape_yi(self.y)                                                │
│    485 │   │   self.x = x                                                                        │
│    486 │   │   del y, x  # clean up namespace to prevent misuse; use attributes                  │
│    487 │   │   self._kind = kind                                                                 │
│                                                                                                  │
│ /home/pranjal/PycharmProjects/CipherCompute/venv/lib/python3.10/site-packages/scipy/interpolate/ │
│ _polyint.py:110 in _reshape_yi                                                                   │
│                                                                                                  │
│   107 │   │   │   ok_shape = "%r + (N,) + %r" % (self._y_extra_shape[-self._y_axis:],            │
│   108 │   │   │   │   │   │   │   │   │   │      self._y_extra_shape[:-self._y_axis])            │
│   109 │   │   │   raise ValueError("Data must be of shape %s" % ok_shape)                        │
│ ❱ 110 │   │   return yi.reshape((yi.shape[0], -1))                                               │
│   111 │                                                                                          │
│   112 │   def _set_yi(self, yi, xi=None, axis=None):                                             │
│   113 │   │   if axis is None:                                                                   │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ValueError: cannot reshape array of size 0 into shape (0,newaxis)

System specifications

System Details

OS (with version, e.g., Windows 10 v2004 or macOS 10.15 (Catalina)): Linux Ubuntu 22.04.1
RAM: 16GB
Python version (python/py/python3 --version): 3.10.6
Installed modules (provide output from pip list):

Package                  Version
------------------------ -----------
certifi                  2022.12.7
charset-normalizer       2.1.1
click                    8.1.3
click-default-group      1.2.2
cloup                    0.13.1
colour                   0.1.5
commonmark               0.9.1
decorator                5.1.1
evdev                    1.6.0
ffmpeg-python            0.2.0
filelock                 3.9.0
future                   0.18.2
glcontext                2.3.7
huggingface-hub          0.11.1
humanhash3               0.0.6
idna                     3.4
isosurfaces              0.1.0
manim                    0.17.2
manim-voiceover          0.2.1.post1
ManimPango               0.4.3
mapbox-earcut            1.0.1
moderngl                 5.7.4
moderngl-window          2.4.2
more-itertools           9.0.0
multipledispatch         0.6.0
mutagen                  1.46.0
networkx                 2.8.8
numpy                    1.24.1
nvidia-cublas-cu11       11.10.3.66
nvidia-cuda-nvrtc-cu11   11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11        8.5.0.96
packaging                22.0
Pillow                   9.3.0
pip                      22.3.1
playsound                1.3.0
PyAudio                  0.2.13
pycairo                  1.23.0
pydub                    0.25.1
pyglet                   2.0.2.1
Pygments                 2.13.0
PyGObject                3.42.2
pynput                   1.7.6
pyrr                     0.10.3
python-dotenv            0.21.0
python-xlib              0.33
PyYAML                   6.0
regex                    2022.10.31
requests                 2.28.1
rich                     12.6.0
scipy                    1.9.3
screeninfo               0.8.1
setuptools               60.2.0
six                      1.16.0
skia-pathops             0.7.4
sox                      1.4.1
srt                      3.5.2
stable-ts                1.0.1
svgelements              1.9.0
tokenizers               0.13.2
torch                    1.13.1
tqdm                     4.64.1
transformers             4.25.1
typing_extensions        4.4.0
urllib3                  1.26.13
watchdog                 2.2.0
wheel                    0.37.1
whisper                  1.0

LaTeX details

LaTeX distribution (e.g. TeX Live 2020):
Installed LaTeX packages:

FFMPEG

Output of ffmpeg -version:

ffmpeg version 4.4.2-0ubuntu0.22.04.1 Copyright (c) 2000-2021 the FFmpeg developers
built with gcc 11 (Ubuntu 11.2.0-19ubuntu1)
configuration: --prefix=/usr --extra-version=0ubuntu0.22.04.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librabbitmq --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-pocketsphinx --enable-librsvg --enable-libmfx --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
libavutil      56. 70.100 / 56. 70.100
libavcodec     58.134.100 / 58.134.100
libavformat    58. 76.100 / 58. 76.100
libavdevice    58. 13.100 / 58. 13.100
libavfilter     7.110.100 /  7.110.100
libswscale      5.  9.100 /  5.  9.100
libswresample   3.  9.100 /  3.  9.100
libpostproc    55.  9.100 / 55.  9.100

Additional comments
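
One possible reading of the traceback above: TimeInterpolator.__init__ fills its x and y lists from the word_boundaries entries and passes them to interp1d, and the final ValueError about an array of size 0 suggests that list was empty when this ran. Below is a minimal sketch of a guard, assuming that diagnosis is right. SafeTimeInterpolator and make_interpolator are hypothetical names, not part of manim-voiceover; the constant's value is a placeholder; and plain linear interpolation stands in for scipy's interp1d.

```python
from bisect import bisect_right
from typing import Optional

# Placeholder value for illustration only; the real constant is defined in manim_voiceover.
AUDIO_OFFSET_RESOLUTION = 10_000_000


class SafeTimeInterpolator:
    """Hypothetical stand-in for TimeInterpolator (not the library's class)."""

    def __init__(self, word_boundaries: list) -> None:
        # Fail early with a readable message instead of a reshape error deep in scipy.
        if len(word_boundaries) < 2:
            raise ValueError(
                f"need at least 2 word boundaries, got {len(word_boundaries)}"
            )
        self.x = [wb["text_offset"] for wb in word_boundaries]
        self.y = [wb["audio_offset"] / AUDIO_OFFSET_RESOLUTION for wb in word_boundaries]

    def interpolate(self, distance: float) -> float:
        # Clamp to the known range, then interpolate linearly between neighbours.
        if distance <= self.x[0]:
            return self.y[0]
        if distance >= self.x[-1]:
            return self.y[-1]
        i = bisect_right(self.x, distance)
        x0, x1 = self.x[i - 1], self.x[i]
        y0, y1 = self.y[i - 1], self.y[i]
        return y0 + (y1 - y0) * (distance - x0) / (x1 - x0)


def make_interpolator(data: dict) -> Optional[SafeTimeInterpolator]:
    """Return an interpolator, or None when there is nothing to interpolate."""
    boundaries = data.get("word_boundaries") or []
    if len(boundaries) < 2:
        return None
    return SafeTimeInterpolator(boundaries)
```

With a guard like this, a recording that yields no word boundaries would skip bookmark processing instead of crashing. Whether skipping is the right behavior for RecorderService is a design question for the maintainers.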

Setting transcript language in RecorderService

Whisper accepts a language option, so in principle it does not have to detect the language itself. Setting it explicitly would help because some languages are misdetected more often than others.
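
A minimal sketch of how this could look from the user's side. A traceback further down this page shows the speech service forwarding self.transcription_kwargs to the Whisper model's transcribe call, so one option is to put the language into that dictionary. The helper with_language is a hypothetical name, and passing transcription_kwargs to the RecorderService constructor is an assumption that has not been checked against a specific manim-voiceover release.

```python
from typing import Any, Dict, Optional


def with_language(
    transcription_kwargs: Optional[Dict[str, Any]] = None,
    language: Optional[str] = None,
) -> Dict[str, Any]:
    """Return a copy of the kwargs with `language` set, if one was given.

    Hypothetical helper, not part of manim-voiceover.
    """
    merged = dict(transcription_kwargs or {})
    if language is not None:
        # Whisper's transcribe accepts `language` as a decoding option.
        merged["language"] = language
    return merged


kwargs = with_language({"temperature": 0.0}, language="de")

# Assumed usage (unverified constructor parameter):
# self.set_speech_service(
#     RecorderService(transcription_model="base", transcription_kwargs=kwargs)
# )
```

Copying the dictionary rather than mutating it keeps a shared default from being changed by accident.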

Azure tts is not working

AttributeError: 'NoneType' object has no attribute 'reason'

The error occurs whenever I use with self.voiceover("..."); the example scene has the same issue.

Assertion error when trying to run with a transcription model

Description of bug / unexpected behavior

After installing the packages required to run a transcription model, using the model throws an assertion error.

Expected behavior

The transcription model should run without errors.

How to reproduce the issue

Code for reproducing the problem

from manim import *
from manim_voiceover import VoiceoverScene
from manim_voiceover.services.gtts import GTTSService

class BugScene(VoiceoverScene):
    def construct(self):
        self.set_speech_service(GTTSService(transcription_model="base"))
        with self.voiceover("Voice") as trk:
            pass

Additional media files

Images/GIFs

Logs

Terminal output

(venv) oz@Ozz:~/repos/GPU_Programming$ manim -pql manim_scripts/temp.py -v DEBUG
Manim Community v0.18.1

Detected language: english
  0%|          | 0/0.96 [00:00<?, ?sec/s]
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/oz/repos/GPU_Programming/venv/lib/python3.11/site-packages/manim/cli/render/commands.py:12 │
│ 0 in render                                                                                      │
│                                                                                                  │
│   117 │   │   │   try:                                                                           │
│   118 │   │   │   │   with tempconfig({}):                                                       │
│   119 │   │   │   │   │   scene = SceneClass()                                                   │
│ ❱ 120 │   │   │   │   │   scene.render()                                                         │
│   121 │   │   │   except Exception:                                                              │
│   122 │   │   │   │   error_console.print_exception()                                            │
│   123 │   │   │   │   sys.exit(1)                                                                │
│                                                                                                  │
│ /home/oz/repos/GPU_Programming/venv/lib/python3.11/site-packages/manim/scene/scene.py:229 in     │
│ render                                                                                           │
│                                                                                                  │
│    226 │   │   """                                                                               │
│    227 │   │   self.setup()                                                                      │
│    228 │   │   try:                                                                              │
│ ❱  229 │   │   │   self.construct()                                                              │
│    230 │   │   except EndSceneEarlyException:                                                    │
│    231 │   │   │   pass                                                                          │
│    232 │   │   except RerunSceneException as e:                                                  │
│                                                                                                  │
│ /home/oz/repos/GPU_Programming/manim_scripts/temp.py:39 in construct                             │
│                                                                                                  │
│   36 │   self.set_speech_service(                                                                │
│   37 │   │   GTTSService(transcription_model="base")                                             │
│   38 │   │   )                                                                                   │
│ ❱ 39 │   with self.voiceover("Voice") as trk:                                                    │
│   40 │     pass                                                                                  │
│   41                                                                                             │
│                                                                                                  │
│ /usr/lib/python3.11/contextlib.py:137 in __enter__                                               │
│                                                                                                  │
│   134 │   │   # they are only needed for recreation, which is not possible anymore               │
│   135 │   │   del self.args, self.kwds, self.func                                                │
│   136 │   │   try:                                                                               │
│ ❱ 137 │   │   │   return next(self.gen)                                                          │
│   138 │   │   except StopIteration:                                                              │
│   139 │   │   │   raise RuntimeError("generator didn't yield") from None                         │
│   140                                                                                            │
│                                                                                                  │
│ /home/oz/repos/GPU_Programming/venv/lib/python3.11/site-packages/manim_voiceover/voiceover_scene │
│ .py:186 in voiceover                                                                             │
│                                                                                                  │
│   183 │   │                                                                                      │
│   184 │   │   try:                                                                               │
│   185 │   │   │   if text is not None:                                                           │
│ ❱ 186 │   │   │   │   yield self.add_voiceover_text(text, **kwargs)                              │
│   187 │   │   │   elif ssml is not None:                                                         │
│   188 │   │   │   │   yield self.add_voiceover_ssml(ssml, **kwargs)                              │
│   189 │   │   finally:                                                                           │
│                                                                                                  │
│ /home/oz/repos/GPU_Programming/venv/lib/python3.11/site-packages/manim_voiceover/voiceover_scene │
│ .py:69 in add_voiceover_text                                                                     │
│                                                                                                  │
│    66 │   │   │   │   "You need to call init_voiceover() before adding a voiceover."             │
│    67 │   │   │   )                                                                              │
│    68 │   │                                                                                      │
│ ❱  69 │   │   dict_ = self.speech_service._wrap_generate_from_text(text, **kwargs)               │
│    70 │   │   tracker = VoiceoverTracker(self, dict_, self.speech_service.cache_dir)             │
│    71 │   │   self.add_sound(str(Path(self.speech_service.cache_dir) / dict_["final_audio"]))    │
│    72 │   │   self.current_tracker = tracker                                                     │
│                                                                                                  │
│ /home/oz/repos/GPU_Programming/venv/lib/python3.11/site-packages/manim_voiceover/services/base.p │
│ y:95 in _wrap_generate_from_text                                                                 │
│                                                                                                  │
│    92 │   │                                                                                      │
│    93 │   │   # Check whether word boundaries exist and if not run stt                           │
│    94 │   │   if "word_boundaries" not in dict_ and self._whisper_model is not None:             │
│ ❱  95 │   │   │   transcription_result = self._whisper_model.transcribe(                         │
│    96 │   │   │   │   str(Path(self.cache_dir) / original_audio), **self.transcription_kwargs    │
│    97 │   │   │   )                                                                              │
│    98 │   │   │   logger.info("Transcription: " + transcription_result.text)                     │
│                                                                                                  │
│ /home/oz/repos/GPU_Programming/venv/lib/python3.11/site-packages/stable_whisper/whisper_word_lev │
│ el.py:575 in transcribe_stable                                                                   │
│                                                                                                  │
│    572 │   │   │   if word_timestamps:                                                           │
│    573 │   │   │   │   if end_timestamp_pos > 0:                                                 │
│    574 │   │   │   │   │   num_samples = min(round(end_timestamp_pos * N_SAMPLES_PER_TOKEN), nu  │
│ ❱  575 │   │   │   │   add_word_timestamps_stable(                                               │
│    576 │   │   │   │   │   segments=current_segments,                                            │
│    577 │   │   │   │   │   model=model,                                                          │
│    578 │   │   │   │   │   tokenizer=tokenizer,                                                  │
│                                                                                                  │
│ /home/oz/repos/GPU_Programming/venv/lib/python3.11/site-packages/stable_whisper/timing.py:259 in │
│ add_word_timestamps_stable                                                                       │
│                                                                                                  │
│   256 │   │   │   │   │   )                                                                      │
│   257 │   │   │   │   )                                                                          │
│   258 │                                                                                          │
│ ❱ 259 │   align()                                                                                │
│   260 │   if (                                                                                   │
│   261 │   │   │   gap_padding is not None and                                                    │
│   262 │   │   │   any(                                                                           │
│                                                                                                  │
│ /home/oz/repos/GPU_Programming/venv/lib/python3.11/site-packages/stable_whisper/timing.py:225 in │
│ align                                                                                            │
│                                                                                                  │
│   222 │   │   text_tokens, token_split, seg_indices = split_word_tokens(segments, tokenizer,     │
│   223 │   │   │   │   │   │   │   │   │   │   │   │   │   │   │   │     padding=gap_padding, s   │
│   224 │   │                                                                                      │
│ ❱ 225 │   │   alignment = find_alignment_stable(model, tokenizer, text_tokens, mel, num_sample   │
│   226 │   │   │   │   │   │   │   │   │   │     **kwargs,                                        │
│   227 │   │   │   │   │   │   │   │   │   │     token_split=token_split,                         │
│   228 │   │   │   │   │   │   │   │   │   │     audio_features=audio_features,                   │
│                                                                                                  │
│ /home/oz/repos/GPU_Programming/venv/lib/python3.11/site-packages/stable_whisper/timing.py:79 in  │
│ find_alignment_stable                                                                            │
│                                                                                                  │
│    76 │   weights = (weights * qk_scale).softmax(dim=-1)                                         │
│    77 │   std, mean = torch.std_mean(weights, dim=-2, keepdim=True, unbiased=False)              │
│    78 │   weights = (weights - mean) / std                                                       │
│ ❱  79 │   weights = median_filter(weights, medfilt_width)                                        │
│    80 │                                                                                          │
│    81 │   matrix = weights.mean(axis=0)                                                          │
│    82 │   matrix = matrix[len(tokenizer.sot_sequence): -1]                                       │
│                                                                                                  │
│ /home/oz/repos/GPU_Programming/venv/lib/python3.11/site-packages/whisper/timing.py:38 in         │
│ median_filter                                                                                    │
│                                                                                                  │
│    35 │   x = F.pad(x, (filter_width // 2, filter_width // 2, 0, 0), mode="reflect")             │
│    36 │   if x.is_cuda:                                                                          │
│    37 │   │   try:                                                                               │
│ ❱  38 │   │   │   from .triton_ops import median_filter_cuda                                     │
│    39 │   │   │                                                                                  │
│    40 │   │   │   result = median_filter_cuda(x, filter_width)                                   │
│    41 │   │   except (RuntimeError, subprocess.CalledProcessError):                              │
│                                                                                                  │
│ /home/oz/repos/GPU_Programming/venv/lib/python3.11/site-packages/whisper/triton_ops.py:7 in      │
│ <module>                                                                                         │
│                                                                                                  │
│     4 import torch                                                                               │
│     5                                                                                            │
│     6 try:                                                                                       │
│ ❱   7 │   import triton                                                                          │
│     8 │   import triton.language as tl                                                           │
│     9 except ImportError:                                                                        │
│    10 │   raise RuntimeError("triton import failed; try `pip install --pre triton`")             │
│                                                                                                  │
│ /home/oz/repos/GPU_Programming/venv/lib/python3.11/site-packages/triton/__init__.py:20 in        │
│ <module>                                                                                         │
│                                                                                                  │
│   17 │   reinterpret,                                                                            │
│   18 │   TensorWrapper,                                                                          │
│   19 )                                                                                           │
│ ❱ 20 from .runtime import (                                                                      │
│   21 │   autotune,                                                                               │
│   22 │   Config,                                                                                 │
│   23 │   heuristics,                                                                             │
│                                                                                                  │
│ /home/oz/repos/GPU_Programming/venv/lib/python3.11/site-packages/triton/runtime/__init__.py:1 in │
│ <module>                                                                                         │
│                                                                                                  │
│ ❱  1 from .autotuner import Config, Heuristics, autotune, heuristics                             │
│    2 from .jit import JITFunction, KernelInterface, version_key                                  │
│    3                                                                                             │
│    4 __all__ = [                                                                                 │
│                                                                                                  │
│ /home/oz/repos/GPU_Programming/venv/lib/python3.11/site-packages/triton/runtime/autotuner.py:7   │
│ in <module>                                                                                      │
│                                                                                                  │
│     4 import time                                                                                │
│     5 from typing import Dict                                                                    │
│     6                                                                                            │
│ ❱   7 from ..compiler import OutOfResources                                                      │
│     8 from ..testing import do_bench                                                             │
│     9 from .jit import KernelInterface                                                           │
│    10                                                                                            │
│                                                                                                  │
│ /home/oz/repos/GPU_Programming/venv/lib/python3.11/site-packages/triton/compiler.py:22 in        │
│ <module>                                                                                         │
│                                                                                                  │
│     19 from sysconfig import get_paths                                                           │
│     20 from typing import Any, Callable, Dict, Tuple, Union                                      │
│     21                                                                                           │
│ ❱   22 import setuptools                                                                         │
│     23 import torch                                                                              │
│     24 from filelock import FileLock                                                             │
│     25                                                                                           │
│                                                                                                  │
│ /home/oz/repos/GPU_Programming/venv/lib/python3.11/site-packages/setuptools/__init__.py:8 in     │
│ <module>                                                                                         │
│                                                                                                  │
│     5 import re                                                                                  │
│     6 import warnings                                                                            │
│     7                                                                                            │
│ ❱   8 import _distutils_hack.override  # noqa: F401                                              │
│     9                                                                                            │
│    10 import distutils.core                                                                      │
│    11 from distutils.errors import DistutilsOptionError                                          │
│                                                                                                  │
│ /home/oz/repos/GPU_Programming/venv/lib/python3.11/site-packages/_distutils_hack/override.py:1   │
│ in <module>                                                                                      │
│                                                                                                  │
│ ❱ 1 __import__('_distutils_hack').do_override()                                                  │
│   2                                                                                              │
│                                                                                                  │
│ /home/oz/repos/GPU_Programming/venv/lib/python3.11/site-packages/_distutils_hack/__init__.py:77  │
│ in do_override                                                                                   │
│                                                                                                  │
│    74 │   """                                                                                    │
│    75 │   if enabled():                                                                          │
│    76 │   │   warn_distutils_present()                                                           │
│ ❱  77 │   │   ensure_local_distutils()                                                           │
│    78                                                                                            │
│    79                                                                                            │
│    80 class _TrivialRe:                                                                          │
│                                                                                                  │
│ /home/oz/repos/GPU_Programming/venv/lib/python3.11/site-packages/_distutils_hack/__init__.py:64  │
│ in ensure_local_distutils                                                                        │
│                                                                                                  │
│    61 │                                                                                          │
│    62 │   # check that submodules load as expected                                               │
│    63 │   core = importlib.import_module('distutils.core')                                       │
│ ❱  64 │   assert '_distutils' in core.__file__, core.__file__                                    │
│    65 │   assert 'setuptools._distutils.log' not in sys.modules                                  │
│    66                                                                                            │
│    67                                                                                            │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
AssertionError: /usr/lib/python3.11/distutils/core.py
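
A possible workaround, not verified on this setup: the assertion is raised by the setuptools `_distutils_hack` shim, which is only active while the environment variable `SETUPTOOLS_USE_DISTUTILS` is unset or set to `local`. The sketch below builds an environment with the variable set to `stdlib` so that a render can be launched without the shim. The helper name `env_with_stdlib_distutils` is hypothetical, and whether triton then imports cleanly on this system is an assumption.

```python
import os
from typing import Mapping, Optional


def env_with_stdlib_distutils(base: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of the environment with the setuptools distutils shim disabled.

    Hypothetical helper: it only sets SETUPTOOLS_USE_DISTUTILS=stdlib and
    leaves every other variable untouched.
    """
    env = dict(os.environ if base is None else base)
    env["SETUPTOOLS_USE_DISTUTILS"] = "stdlib"
    return env


# Example use (assumes `manim` is on PATH and scene.py exists):
#   import subprocess
#   subprocess.run(["manim", "-pql", "scene.py"], env=env_with_stdlib_distutils(), check=True)
```

Since whisper's `median_filter` only catches `RuntimeError` and `subprocess.CalledProcessError` around the triton import, an `AssertionError` from the shim is not handled by its CPU fallback, which would explain why the render aborts here.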

System specifications

System Details

OS (with version, e.g., Windows 10 v2004 or macOS 10.15 (Catalina)): Debian 12, kernel 6.1.0-22-amd64
RAM: 64 GB DDR5
Python version (python/py/python3 --version): Python 3.11.2
Installed modules (provide output from pip list):

Package                  Version
------------------------ -----------
attrs                    23.2.0
basedpyright             1.13.3
cattrs                   23.2.3
certifi                  2024.7.4
charset-normalizer       3.3.2
click                    8.1.7
cloup                    3.0.5
cmake                    3.30.1
decorator                5.1.1
docstring-to-markdown    0.15
evdev                    1.7.1
ffmpeg-python            0.2.0
filelock                 3.15.4
fsspec                   2024.6.1
future                   1.0.0
glcontext                2.5.0
gTTS                     2.5.1
huggingface-hub          0.24.1
idna                     3.7
isosurfaces              0.1.2
jedi                     0.19.1
jedi-language-server     0.41.4
Jinja2                   3.1.4
lit                      18.1.8
llvmlite                 0.43.0
lsprotocol               2023.0.1
manim                    0.18.1
manim-ml                 0.0.24
manim-voiceover          0.3.6.post0
ManimPango               0.5.0
mapbox-earcut            1.0.1
markdown-it-py           3.0.0
MarkupSafe               2.1.5
mdurl                    0.1.2
moderngl                 5.10.0
moderngl-window          2.4.6
more-itertools           10.3.0
mpmath                   1.3.0
multipledispatch         1.0.0
mutagen                  1.47.0
networkx                 3.3
nodejs-wheel-binaries    20.15.1
numba                    0.60.0
numpy                    1.26.4
nvidia-cublas-cu11       11.10.3.66
nvidia-cuda-cupti-cu11   11.7.101
nvidia-cuda-nvrtc-cu11   11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11        8.5.0.96
nvidia-cufft-cu11        10.9.0.58
nvidia-curand-cu11       10.2.10.91
nvidia-cusolver-cu11     11.4.0.1
nvidia-cusparse-cu11     11.7.4.91
nvidia-nccl-cu11         2.14.3
nvidia-nvtx-cu11         11.7.91
openai-whisper           20230314
packaging                24.1
pandas                   2.2.2
parso                    0.8.4
pillow                   10.4.0
pip                      23.0.1
PyAudio                  0.2.14
pycairo                  1.26.1
pydub                    0.25.1
pyglet                   2.0.15
pygls                    1.3.1
Pygments                 2.18.0
pynput                   1.7.7
pyrr                     0.10.3
python-dateutil          2.9.0.post0
python-dotenv            0.21.1
python-slugify           8.0.4
python-xlib              0.33
pytz                     2024.1
PyYAML                   6.0.1
regex                    2024.5.15
requests                 2.32.3
rich                     13.7.1
safetensors              0.4.3
scipy                    1.14.0
screeninfo               0.8.1
setuptools               66.1.1
six                      1.16.0
skia-pathops             0.8.0.post1
sox                      1.5.0
srt                      3.5.3
stable-ts                2.11.1
svgelements              1.9.6
sympy                    1.13.1
text-unidecode           1.3
tiktoken                 0.3.1
tokenizers               0.19.1
torch                    2.0.1
torchaudio               2.0.2
tqdm                     4.66.4
transformers             4.43.1
triton                   2.0.0
typing_extensions        4.12.2
tzdata                   2024.1
urllib3                  2.2.2
watchdog                 4.0.1
wheel                    0.43.0

JSONDecodeError: Expecting value: line 283735 column 21 (char 7596101)

Description of bug / unexpected behavior

I was adjusting the timeline of my animation when, all of a sudden, the error below appeared.
The same error occurs when I run the test example below. I have made several videos with Manim Voiceover before, and everything worked smoothly.
wget https://github.com/ManimCommunity/manim-voiceover/raw/main/examples/gtts-example.py
manim -pql gtts-example.py --disable_caching

JSONDecodeError: Expecting value: line 283735 column 21 (char 7596101)
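
The error position (line 283735, character 7596101) suggests a large JSON file that is truncated or otherwise corrupt. One guess is that this is a cache file written by manim-voiceover. The sketch below checks whether a JSON file parses and, if not, renames it so that it can be regenerated on the next render. The path `media/voiceovers/cache.json` in the usage comment is an assumption, and both function names are hypothetical.

```python
import json
from pathlib import Path


def is_valid_json_file(path: Path) -> bool:
    """Return True if the file can be opened and parsed as JSON."""
    try:
        with path.open(encoding="utf-8") as fh:
            json.load(fh)
    except (OSError, ValueError):  # JSONDecodeError is a ValueError subclass
        return False
    return True


def quarantine_if_corrupt(path: Path) -> bool:
    """Rename a corrupt JSON file to '<name>.corrupt'; return True if it was moved."""
    if not path.exists() or is_valid_json_file(path):
        return False
    path.rename(path.with_name(path.name + ".corrupt"))
    return True


# Example use (the cache location is an assumption, adjust to your project):
#   quarantine_if_corrupt(Path("media/voiceovers/cache.json"))
```

Renaming instead of deleting keeps the broken file around for inspection.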

How to use StitcherService to directly play an mp3 file as a voiceover?

As you can see from the screenshot, a demonstration video VoiceoverDemo.mp4 explains at 2:20 that with StitcherService it is possible to directly play an mp3 file as a voiceover.

However, when I try to use it, a NameError occurs: name 'StitcherService' is not defined

from manim import *
from manim_voiceover import VoiceoverScene

class AzureExample(VoiceoverScene):
    def construct(self):
        # self.set_speech_service(
        #     AzureService(
        #         voice="ru-RU-DmitryNeural",
        #         style="newscast-casual",
        #         global_speed=1.15
        #     )
        # )
        self.set_speech_service(
            StitcherService("test.mp3")
        )

When I tried to fix it by importing StitcherService, an ImportError occurred: cannot import name 'StitcherService' from 'manim_voiceover.services.stitcher' (/usr/local/lib/python3.11/site-packages/manim_voiceover/services/stitcher.py)

from manim import *
from manim_voiceover import VoiceoverScene
from manim_voiceover.services.stitcher import StitcherService

class AzureExample(VoiceoverScene):
    def construct(self):
        # self.set_speech_service(
        #     AzureService(
        #         voice="ru-RU-DmitryNeural",
        #         style="newscast-casual",
        #         global_speed=1.15
        #     )
        # )
        self.set_speech_service(
            StitcherService("test.mp3")
        )

So how can I directly play an mp3 file as a voiceover?

Allow other audio codecs to be used other than mp3

Description of proposed feature

Currently, rendering to .webm doesn't work with manim-voiceover because all audio files are stored as .mp3, which the WebM format does not support (it only supports Vorbis or Opus).

The MP3 codec is also old and noticeably lossy compared to more modern formats like AAC, Vorbis, or Opus. For that reason, there should be a way to encode audio in another format (which could be done by running FFmpeg on the source files).

How can the new feature be used?

It would re-enable WebM output without spending extra time re-encoding (and losing fidelity), which some people might prefer over dealing with the patented H.264/H.265 codecs found in .mp4 files.

Additional comments

As I said before, it could easily be done using FFmpeg.
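
To illustrate the FFmpeg route, here is a minimal sketch that transcodes a cached .mp3 file to Opus. It assumes an `ffmpeg` binary built with the `libopus` encoder is on PATH. The function names and the 96k bitrate are placeholders, not part of manim-voiceover.

```python
import subprocess
from pathlib import Path


def build_transcode_command(
    src: Path, dst: Path, codec: str = "libopus", bitrate: str = "96k"
) -> list:
    """Build an ffmpeg command line that re-encodes the audio of src into dst."""
    return [
        "ffmpeg",
        "-y",            # overwrite dst if it already exists
        "-i", str(src),  # input file
        "-c:a", codec,   # audio encoder (libopus is an assumption about the build)
        "-b:a", bitrate, # target audio bitrate
        str(dst),
    ]


def transcode(src: Path, dst: Path, **kwargs) -> None:
    """Run ffmpeg; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_transcode_command(src, dst, **kwargs), check=True)


# Example use:
#   transcode(Path("voiceover.mp3"), Path("voiceover.opus"))
```

Note that converting an existing .mp3 is itself a lossy re-encode. Avoiding that would require the speech service to produce the target format directly.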

OpenGL

Description of bug / unexpected behavior

The gTTS example crashes when using the OpenGL renderer with the GUI enabled.

Expected behavior

It should work properly with the GUI, or at least run without sound.

How to reproduce the issue

Code for reproducing the problem

from manim import *
from manim_voiceover import VoiceoverScene
from manim_voiceover.services.gtts import GTTSService

class GTTSExample(VoiceoverScene):
    def construct(self):
        self.interactive_embed()

        self.set_speech_service(GTTSService(lang="en", tld="com"))

        circle = Circle()
        square = Square().shift(2 * RIGHT)

        with self.voiceover(text="This circle is drawn as I speak.") as tracker:
            self.play(Create(circle), run_time=tracker.duration)

        with self.voiceover(text="Let's shift it to the left 2 units.") as tracker:
            self.play(circle.animate.shift(2 * LEFT), run_time=tracker.duration)

        with self.voiceover(text="Now, let's transform it into a square.") as tracker:
            self.play(Transform(circle, square), run_time=tracker.duration)

        with self.voiceover(text="Thank you for watching."):
            self.play(Uncreate(circle))

        self.wait()


# Use GTTS with another language:
class GTTSExampleVietnamese(VoiceoverScene):
    def construct(self):
        self.interactive_embed()

        # Set the lang argument to another language code.
        self.set_speech_service(GTTSService(lang="vi"))

        circle = Circle()
        square = Square().shift(2 * RIGHT)

        with self.voiceover(text="Vòng tròn này được vẽ khi tôi nói.") as tracker:
            self.play(Create(circle), run_time=tracker.duration)

        with self.voiceover(text="Hãy chuyển nó sang bên trái 2 đơn vị.") as tracker:
            self.play(circle.animate.shift(2 * LEFT), run_time=tracker.duration)

        with self.voiceover(
            text="Bây giờ hãy biến nó thành một hình vuông."
        ) as tracker:
            self.play(Transform(circle, square), run_time=tracker.duration)

        with self.voiceover(text="Cảm ơn vì đã xem."):
            self.play(Uncreate(circle))

        self.wait()

Additional media files

Images/GIFs

Logs

Terminal output

$ manim -pql scene.py GTTSExample --renderer=opengl --enable_gui --disable_caching
Manim Community v0.17.3

Python 3.11.2 (main, May 30 2023, 17:45:26) [GCC 12.2.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.0.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: 

╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /home/hypatia/.local/lib/python3.11/site-packages/manim/cli/render/commands. │
│ py:97 in render                                                              │
│                                                                              │
│    94 │   │   │   │   for SceneClass in scene_classes_from_file(file):       │
│    95 │   │   │   │   │   with tempconfig({}):                               │
│    96 │   │   │   │   │   │   scene = SceneClass(renderer)                   │
│ ❱  97 │   │   │   │   │   │   rerun = scene.render()                         │
│    98 │   │   │   │   │   if rerun or config["write_all"]:                   │
│    99 │   │   │   │   │   │   renderer.num_plays = 0                         │
│   100 │   │   │   │   │   │   continue                                       │
│                                                                              │
│ /home/hypatia/.local/lib/python3.11/site-packages/manim/scene/scene.py:233   │
│ in render                                                                    │
│                                                                              │
│    230 │   │   │   return True                                               │
│    231 │   │   self.tear_down()                                              │
│    232 │   │   # We have to reset these settings in case of multiple renders │
│ ❱  233 │   │   self.renderer.scene_finished(self)                            │
│    234 │   │                                                                 │
│    235 │   │   # Show info only if animations are rendered or to get image   │
│    236 │   │   if (                                                          │
│                                                                              │
│ /home/hypatia/.local/lib/python3.11/site-packages/manim/renderer/opengl_rend │
│ erer.py:483 in scene_finished                                                │
│                                                                              │
│   480 │   │   # When num_plays is 0, no images have been output, so output a │
│   481 │   │   # image in this case                                           │
│   482 │   │   if self.num_plays > 0:                                         │
│ ❱ 483 │   │   │   self.file_writer.finish()                                  │
│   484 │   │   elif self.num_plays == 0 and config.write_to_movie:            │
│   485 │   │   │   config.write_to_movie = False                              │
│   486                                                                        │
│                                                                              │
│ /home/hypatia/.local/lib/python3.11/site-packages/manim/scene/scene_file_wri │
│ ter.py:468 in finish                                                         │
│                                                                              │
│   465 │   │   │   target_dir = self.image_file_path.parent / self.image_file │
│   466 │   │   │   logger.info("\n%i images ready at %s\n", self.frame_count, │
│   467 │   │   if self.subcaptions:                                           │
│ ❱ 468 │   │   │   self.write_subcaption_file()                               │
│   469 │                                                                      │
│   470 │   def open_movie_pipe(self, file_path=None):                         │
│   471 │   │   """                                                            │
│                                                                              │
│ /home/hypatia/.local/lib/python3.11/site-packages/manim/scene/scene_file_wri │
│ ter.py:729 in write_subcaption_file                                          │
│                                                                              │
│   726 │                                                                      │
│   727 │   def write_subcaption_file(self):                                   │
│   728 │   │   """Writes the subcaption file."""                              │
│ ❱ 729 │   │   subcaption_file = Path(config.output_file).with_suffix(".srt") │
│   730 │   │   subcaption_file.write_text(srt.compose(self.subcaptions), enco │
│   731 │   │   logger.info(f"Subcaption file has been written as {subcaption_ │
│   732                                                                        │
│                                                                              │
│ /usr/lib/python3.11/pathlib.py:872 in __new__                                │
│                                                                              │
│    869 │   def __new__(cls, *args, **kwargs):                                │
│    870 │   │   if cls is Path:                                               │
│    871 │   │   │   cls = WindowsPath if os.name == 'nt' else PosixPath       │
│ ❱  872 │   │   self = cls._from_parts(args)                                  │
│    873 │   │   if not self._flavour.is_supported:                            │
│    874 │   │   │   raise NotImplementedError("cannot instantiate %r on your  │
│    875 │   │   │   │   │   │   │   │   │     % (cls.__name__,))              │
│                                                                              │
│ /usr/lib/python3.11/pathlib.py:510 in _from_parts                            │
│                                                                              │
│    507 │   │   # We need to call _parse_args on the instance, so as to get t │
│    508 │   │   # right flavour.                                              │
│    509 │   │   self = object.__new__(cls)                                    │
│ ❱  510 │   │   drv, root, parts = self._parse_args(args)                     │
│    511 │   │   self._drv = drv                                               │
│    512 │   │   self._root = root                                             │
│    513 │   │   self._parts = parts                                           │
│                                                                              │
│ /usr/lib/python3.11/pathlib.py:494 in _parse_args                            │
│                                                                              │
│    491 │   │   │   if isinstance(a, PurePath):                               │
│    492 │   │   │   │   parts += a._parts                                     │
│    493 │   │   │   else:                                                     │
│ ❱  494 │   │   │   │   a = os.fspath(a)                                      │
│    495 │   │   │   │   if isinstance(a, str):                                │
│    496 │   │   │   │   │   # Force-cast str subclasses to str (issue #21127) │
│    497 │   │   │   │   │   parts.append(str(a))                              │
╰──────────────────────────────────────────────────────────────────────────────╯
TypeError: expected str, bytes or os.PathLike object, not NoneType
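
The traceback ends in `Path(config.output_file)`, so `config.output_file` appears to be `None` at the time the subcaption file is written. The sketch below reproduces that failure mode in isolation and shows one way to guard against it with a fallback name. The helper `subcaption_path` and the idea of falling back to the scene name are assumptions for illustration, not Manim's actual code.

```python
from pathlib import Path
from typing import Optional, Union

PathLike = Union[str, Path]


def subcaption_path(output_file: Optional[PathLike], fallback_stem: str) -> Path:
    """Return the .srt path next to output_file, or '<fallback_stem>.srt' if it is None.

    Hypothetical helper: avoids calling Path(None), which raises TypeError.
    """
    base = Path(output_file) if output_file is not None else Path(fallback_stem)
    return base.with_suffix(".srt")


# Example use:
#   subcaption_path(None, "GTTSExample")            # falls back to the scene name
#   subcaption_path("media/videos/scene.mp4", "x")  # uses the configured output file
```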

System specifications

System Details

OS (with version, e.g., Windows 10 v2004 or macOS 10.15 (Catalina)): Ubuntu 23
RAM: 16GB
Python version (python/py/python3 --version): Python 3.11.2
Installed modules (provide output from pip list):

Package                        Version
------------------------------ -------------------------
appdirs                        1.4.4
arandr                         0.1.11
archivebox                     0.6.2
argcomplete                    2.0.0
asgiref                        3.6.0
asttokens                      2.2.1
attrs                          22.2.0
azure-cognitiveservices-speech 1.29.0
Babel                          2.10.3
backcall                       0.2.0
bcrypt                         3.2.2
beautifulsoup4                 4.11.2
beniget                        0.4.1
black                          23.3.0
blinker                        1.5
Brlapi                         0.8.4
Brotli                         1.0.9
bytecode                       0.14.0
cachetools                     5.2.0
certifi                        2022.9.24
cffi                           1.15.1
chardet                        5.1.0
charset-normalizer             3.1.0
click                          8.1.3
click-default-group            1.2.2
cloud-init                     23.1.2
cloup                          0.13.1
colorama                       0.4.6
coloredlogs                    15.0.1
colour                         0.1.5
command-not-found              0.3
configobj                      5.0.8
coverage                       6.5.0
croniter                       1.3.14
cryptography                   38.0.4
cssselect                      1.2.0
cupshelpers                    1.0
cycler                         0.11.0
dateparser                     1.1.8
dbus-python                    1.3.2
dearpygui                      1.9.1
debugpy                        1.6.3+git20221103.a2a3328
decorator                      5.1.1
defer                          1.0.6
Deprecated                     1.2.14
deprecation                    2.0.7
distro                         1.8.0
distro-info                    1.5
Django                         3.1.14
django-extensions              3.1.5
dropbox                        11.36.0
duplicity                      0.8.22
entrypoints                    0.4
epc                            0.0.5
evdev                          1.6.1
executing                      1.2.0
exif                           1.6.0
fasteners                      0.17.3
feedparser                     6.0.10
fonttools                      4.38.0
fs                             2.4.16
future                         0.18.2
gast                           0.5.2
geographiclib                  2.0
geopy                          2.3.0
ghp-import                     2.1.0
git-remote-dropbox             2.0.0
giturlparse                    0.10.0
glcontext                      2.3.7
google-api-python-client       1.7.12
google-auth                    1.5.1
google-auth-httplib2           0.1.0
greenlet                       2.0.1
gTTS                           2.3.2
gyp                            0.1
html5lib                       1.1
httplib2                       0.20.4
humanfriendly                  10.0
idna                           3.3
img2pdf                        0.4.4
importlib-metadata             4.12.0
iniconfig                      1.1.1
input-remapper                 2.0.0
ipykernel                      6.17.0
ipython                        8.0.1
ipython_genutils               0.2.0
isosurfaces                    0.1.0
jaraco.classes                 3.2.1
jedi                           0.18.2
jeepney                        0.8.0
Jinja2                         3.1.2
joblib                         1.2.0
jsonpatch                      1.32
jsonpointer                    2.0
jsonschema                     4.6.0
jupyter_client                 7.4.9
jupyter_core                   4.12.0
keyring                        23.9.3
kiwisolver                     0.0.0
language-selector              0.1
launchpadlib                   1.11.0
lazr.restfulclient             0.14.5
lazr.uri                       1.0.6
libevdev                       0.5
livereload                     2.6.3
lockfile                       0.12.2
louis                          3.24.0
lunr                           0.6.2
lxml                           4.9.2
lz4                            4.0.2+dfsg
Mako                           1.2.4.dev0
manim                          0.17.3
manim-voiceover                0.3.3.post0
ManimPango                     0.4.3
mapbox-earcut                  1.0.1
Markdown                       3.4.3
markdown-it-py                 2.1.0
MarkupSafe                     2.1.2
matplotlib                     3.5.2
matplotlib-inline              0.1.6
mdurl                          0.1.2
mergedeep                      1.3.4
meson                          1.0.1
mkdocs                         1.4.2
moderngl                       5.8.2
moderngl-window                2.4.4
monotonic                      1.6
more-itertools                 8.10.0
mpmath                         0.0.0
multipledispatch               0.6.0
mutagen                        1.46.0
mypy-extensions                1.0.0
nest-asyncio                   1.5.4
netifaces                      0.11.0
networkx                       2.8.8
nltk                           3.8
numpy                          1.24.2
oauth2client                   4.1.3
oauthlib                       3.2.2
ocrmypdf                       14.0.1+dfsg1
olefile                        0.46
packaging                      23.1
paramiko                       2.12.0
parso                          0.8.3
pathspec                       0.11.1
pdfminer.six                   20221105
pexpect                        4.8.0
pickleshare                    0.7.5
pikepdf                        6.0.0+dfsg
Pillow                         9.4.0
pip                            23.0.1
pipx                           1.1.0
platformdirs                   3.8.0
playsound                      1.3.0
pluggy                         1.0.0+repack
plum-py                        0.8.6
ply                            3.11
prompt-toolkit                 3.0.38
psutil                         5.9.5
ptyprocess                     0.7.0
pure-eval                      0.2.2
py                             1.11.0
pyasn1                         0.4.8
pyasn1-modules                 0.2.8
pycairo                        1.24.0
pycparser                      2.21
pycryptodomex                  3.11.0
pycups                         2.0.1
pydantic                       1.10.4
pydbus                         0.6.0
pydevd                         2.9.5
PyDrive2                       0.0.0
pydub                          0.25.1
pyenchant                      3.2.2
pygit2                         1.12.1
pyglet                         2.0.8
Pygments                       2.15.1
PyGObject                      3.44.1
pyinotify                      0.9.6
PyJWT                          2.6.0
PyMuPDF                        1.22.3
PyNaCl                         1.5.0
pyOpenSSL                      23.0.0
pyparsing                      3.0.9
pypinyin                       0.49.0
pypng                          0.20220715.0
PyQt5                          5.15.9
PyQt5-sip                      12.11.1
PyQt6                          6.5.0
PyQt6-Qt6                      6.5.0
PyQt6-sip                      13.5.1
PyQt6-WebEngine                6.5.0
PyQt6-WebEngine-Qt6            6.5.0
PyQtWebEngine                  5.15.6
pyquery                        2.0.0
pyrr                           0.10.3
pyrsistent                     0.18.1
pyserial                       3.5
PySocks                        1.7.1
pytaglib                       2.0.0
pyte                           0.8.1
pytest                         7.2.1
python-apt                     2.5.3+ubuntu1
python-crontab                 2.7.1
python-dateutil                2.8.2
python-debian                  0.1.49+ubuntu2
python-dotenv                  0.21.1
python-slugify                 8.0.1
python-tsp                     0.3.1
python-xlib                    0.33
pythran                        0.11.0
pytz                           2022.7.1
pyudev                         0.24.0
pyxattr                        0.8.0
pyxdg                          0.28
PyYAML                         6.0
pyyaml_env_tag                 0.1
pyzmq                          24.0.1
qrcode                         7.4.2
qtconsole                      5.4.0
QtPy                           2.3.0
regex                          2022.10.31
reportlab                      3.6.12
requests                       2.28.1
retrying                       1.3.4
rich                           13.3.1
rsa                            4.8
scipy                          1.10.1
screen-resolution-extra        0.0.0
screeninfo                     0.8.1
SecretStorage                  3.3.3
setuptools                     66.1.1
sexpdata                       1.0.1
sgmllib3k                      1.0.0
simplejson                     3.18.3
six                            1.16.0
skia-pathops                   0.7.4
soupsieve                      2.4
sox                            1.4.1
SQLAlchemy                     1.4.46
sqlparse                       0.4.4
srt                            3.5.3
stack-data                     0.6.2
stone                          3.3.1
svgelements                    1.9.5
sympy                          1.11.1
systemd-python                 235
tabulate                       0.8.10
text-unidecode                 1.3
tld                            0.13
tornado                        6.2
tqdm                           4.64.1
traitlets                      5.9.0
tsplib95                       0.7.1
typing_extensions              4.4.0
tzlocal                        5.0.1
ubuntu-advantage-tools         8001
ubuntu-drivers-common          0.0.0
ufoLib2                        0.14.0
ufw                            0.36.1
unattended-upgrades            0.1
unidiff                        0.7.5
uritemplate                    4.1.1
urllib3                        1.26.12
usb-creator                    0.3.16
userpath                       1.8.0
variety                        0.8.10
vimura-server                  1.3
w3lib                          2.1.1
wadllib                        1.3.6
watchdog                       2.2.1
wcwidth                        0.2.6
webencodings                   0.5.1
WebOb                          1.8.6
websockets                     10.4
wheel                          0.38.4
wrapt                          1.15.0
xdg                            5
xkit                           0.0.0
youtube-dl                     2021.12.17
yt-dlp                         2023.3.4
zipp                           1.0.0

LaTeX details

LaTeX distribution (e.g. TeX Live 2020): TeX Live 2022
Installed LaTeX packages:

texlive-full

FFMPEG

Output of ffmpeg -version:

ffmpeg version 5.1.2-3ubuntu1 Copyright (c) 2000-2022 the FFmpeg developers
built with gcc 12 (Ubuntu 12.2.0-14ubuntu2)
configuration: --prefix=/usr --extra-version=3ubuntu1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libglslang --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librabbitmq --enable-librist --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libsvtav1 --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --disable-sndio --enable-libjxl --enable-pocketsphinx --enable-librsvg --enable-libmfx --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-libplacebo --enable-librav1e --enable-shared
libavutil      57. 28.100 / 57. 28.100
libavcodec     59. 37.100 / 59. 37.100
libavformat    59. 27.100 / 59. 27.100
libavdevice    59.  7.100 / 59.  7.100
libavfilter     8. 44.100 /  8. 44.100
libswscale      6.  7.100 /  6.  7.100
libswresample   4.  7.100 /  4.  7.100
libpostproc    56.  6.100 / 56.  6.100

Additional comments

Humanhash3 causes installation errors on Chinese locale

Preliminaries

I have followed the latest version of the
installation instructions.

Description of error

When I try to install the manim-voiceover library with pip install --upgrade "manim-voiceover[azure,gtts]", the installation fails while installing humanhash3. The output log is attached below.

Installation logs

Collecting manim-voiceover[azure,gtts]
Using cached manim_voiceover-0.3.0-py3-none-any.whl (37 kB)
Requirement already satisfied: manim in c:\tools\manim\lib\site-packages (from manim-voiceover[azure,gtts]) (0.17.2)
Collecting humanhash3<0.0.7,>=0.0.6
Using cached humanhash3-0.0.6.tar.gz (5.4 kB)
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [6 lines of output]
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "C:\Users\user\AppData\Local\Temp\pip-install-dq4c4am1\humanhash3_9cdb2df319f347f9a53bc7df772b27db\setup.py", line 7, in <module>
long_description = f.read()
UnicodeDecodeError: 'cp950' codec can't decode byte 0xe2 in position 1081: illegal multibyte sequence
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
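
The UnicodeDecodeError suggests that setup.py opens its README without an explicit encoding, so Python falls back to the locale code page (cp950 here) and fails on a UTF-8 byte sequence. Below is a small sketch that reproduces the effect on any platform. The sample text and the helper try_decode are made up for illustration and are not taken from humanhash3:

```python
# Sample README text containing a typographic apostrophe (U+2019),
# which UTF-8 encodes as the bytes 0xe2 0x80 0x99.
RAW = "It\u2019s a README".encode("utf-8")


def try_decode(raw: bytes, encoding: str) -> str:
    """Decode raw with the given encoding, reporting failures as a string."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"failed: {exc.reason}"


print(try_decode(RAW, "utf-8"))   # decodes cleanly
print(try_decode(RAW, "cp950"))   # fails on the 0xe2 byte
```

If the locale default is indeed the cause, enabling Python's UTF-8 mode before installing (for example by setting the environment variable PYTHONUTF8=1) may work as a workaround, since it makes open() default to UTF-8.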

System specifications

System Details

OS: Windows 11 22H2
RAM: 8GB
Python version: 3.10.8

Error installing manim-voiceover[recorder]

Preliminaries

I have followed the latest version of the
installation instructions.

Description of error

When I followed the steps in https://voiceover.manim.community/en/latest/quickstart.html (the "Record your own voiceover" part) and ran manim -pql my_awesome_scene.py --disable_caching, it told me to install manim-voiceover[recorder] first:

[05/15/23 18:14:21] ERROR    Missing packages. Run `pip install  __init__.py:14
                             "manim-voiceover[recorder]"` to use
                             RecorderService.
                    INFO     The extra packages required by       helper.py:165
                             RecorderService are not installed.
                             Shall I install them for you? [Y/n]

But when I ran pip install "manim-voiceover[recorder]", I got an error (see below):

Installation logs

Terminal output

$ pip install "manim-voiceover[recorder]"
Requirement already satisfied: manim-voiceover[recorder] in c:\users\_sayon\appdata\local\programs\python\python311\lib\site-packages (0.3.1)
Requirement already satisfied: PyAudio<0.3.0,>=0.2.12 in c:\users\_sayon\appdata\local\programs\python\python311\lib\site-packages (from manim-voiceover[recorder]) (0.2.13)
Requirement already satisfied: humanhash3<0.0.7,>=0.0.6 in c:\users\_sayon\appdata\local\programs\python\python311\lib\site-packages (from manim-voiceover[recorder]) (0.0.6)
Requirement already satisfied: manim in c:\users\_sayon\appdata\local\programs\python\python311\lib\site-packages (from manim-voiceover[recorder]) (0.17.3)
Requirement already satisfied: mutagen<2.0.0,>=1.46.0 in c:\users\_sayon\appdata\local\programs\python\python311\lib\site-packages (from manim-voiceover[recorder]) (1.46.0)
Requirement already satisfied: pip>=21.0.1 in c:\users\_sayon\appdata\local\programs\python\python311\lib\site-packages (from manim-voiceover[recorder]) (23.1.2)
Collecting playsound<2.0.0,>=1.3.0 (from manim-voiceover[recorder])
  Using cached playsound-1.3.0.tar.gz (7.7 kB)
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'error'
  error: subprocess-exited-with-error

  Getting requirements to build wheel did not run successfully.
  exit code: 1

  [29 lines of output]
  Traceback (most recent call last):
    File "C:\Users\_Sayon\AppData\Local\Programs\Python\Python311\Lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 353, in <module>
      main()
    File "C:\Users\_Sayon\AppData\Local\Programs\Python\Python311\Lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 335, in main
      json_out['return_val'] = hook(**hook_input['kwargs'])
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "C:\Users\_Sayon\AppData\Local\Programs\Python\Python311\Lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 118, in get_requires_for_build_wheel
      return hook(config_settings)
             ^^^^^^^^^^^^^^^^^^^^^
    File "C:\Users\_Sayon\AppData\Local\Temp\pip-build-env-h68he88l\overlay\Lib\site-packages\setuptools\build_meta.py", line 341, in get_requires_for_build_wheel
      return self._get_build_requires(config_settings, requirements=['wheel'])
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "C:\Users\_Sayon\AppData\Local\Temp\pip-build-env-h68he88l\overlay\Lib\site-packages\setuptools\build_meta.py", line 323, in _get_build_requires
      self.run_setup()
    File "C:\Users\_Sayon\AppData\Local\Temp\pip-build-env-h68he88l\overlay\Lib\site-packages\setuptools\build_meta.py", line 488, in run_setup
      self).run_setup(setup_script=setup_script)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "C:\Users\_Sayon\AppData\Local\Temp\pip-build-env-h68he88l\overlay\Lib\site-packages\setuptools\build_meta.py", line 338, in run_setup
      exec(code, locals())
    File "<string>", line 6, in <module>
    File "C:\Users\_Sayon\AppData\Local\Programs\Python\Python311\Lib\inspect.py", line 1262, in getsource
      lines, lnum = getsourcelines(object)
                    ^^^^^^^^^^^^^^^^^^^^^^
    File "C:\Users\_Sayon\AppData\Local\Programs\Python\Python311\Lib\inspect.py", line 1244, in getsourcelines
      lines, lnum = findsource(object)
                    ^^^^^^^^^^^^^^^^^^
    File "C:\Users\_Sayon\AppData\Local\Programs\Python\Python311\Lib\inspect.py", line 1081, in findsource
      raise OSError('could not get source code')
  OSError: could not get source code
  [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

Getting requirements to build wheel did not run successfully.
exit code: 1

See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

System specifications

System Details

OS (with version, e.g., Windows 10 v2004 or macOS 10.15 (Catalina)): Windows 11 22H2 22621.1702
RAM: 16GB
Python version (python/py/python3 --version): Python 3.11.2
Installed modules (provide output from pip list):

Package                        Version
------------------------------ --------
altgraph                       0.17.3
azure-cognitiveservices-speech 1.28.0
certifi                        2023.5.7
charset-normalizer             3.1.0
click                          8.1.3
click-default-group            1.2.2
cloup                          0.13.1
colorama                       0.4.6
colour                         0.1.5
contourpy                      1.0.7
cycler                         0.11.0
decorator                      5.1.1
filelock                       3.12.0
fonttools                      4.39.3
glcontext                      2.3.7
gTTS                           2.3.2
humanhash3                     0.0.6
idna                           3.4
isosurfaces                    0.1.0
Jinja2                         3.1.2
kiwisolver                     1.4.4
manim                          0.17.3
manim-voiceover                0.3.1
ManimPango                     0.4.3
mapbox-earcut                  1.0.1
markdown-it-py                 2.2.0
MarkupSafe                     2.1.2
matplotlib                     3.7.1
mdurl                          0.1.2
moderngl                       5.8.2
moderngl-window                2.4.3
mpmath                         1.3.0
multipledispatch               0.6.0
mutagen                        1.46.0
networkx                       2.8.8
numpy                          1.24.3
packaging                      23.1
pandas                         2.0.1
pefile                         2023.2.7
Pillow                         9.5.0
pip                            23.1.2
PyAudio                        0.2.13
pycairo                        1.23.0
pydub                          0.25.1
pyglet                         2.0.7
Pygments                       2.15.1
pyinstaller                    5.10.1
pyinstaller-hooks-contrib      2023.2
pyparsing                      3.0.9
pyrr                           0.10.3
python-dateutil                2.8.2
python-dotenv                  0.21.1
pytz                           2023.3
pywin32-ctypes                 0.2.0
requests                       2.30.0
rich                           13.3.5
scipy                          1.10.1
screeninfo                     0.8.1
setuptools                     65.5.0
six                            1.16.0
skia-pathops                   0.7.4
sox                            1.4.1
srt                            3.5.3
svgelements                    1.9.4
svgwrite                       1.4.3
sympy                          1.11.1
torch                          2.0.0
tqdm                           4.65.0
typing_extensions              4.5.0
tzdata                         2023.3
urllib3                        2.0.2
watchdog                       2.3.1

LaTeX details

LaTeX distribution (e.g. TeX Live 2020): TeX Live 2022
Installed LaTeX packages: pretty much everything; the full list exceeds the word limit

FFMPEG

Output of ffmpeg -version:

ffmpeg version 5.0.1-essentials_build-www.gyan.dev Copyright (c) 2000-2022 the FFmpeg developers
  built with gcc 11.2.0 (Rev7, Built by MSYS2 project)
  configuration: --enable-gpl --enable-version3 --enable-static --disable-w32threads --disable-autodetect --enable-fontconfig --enable-iconv --enable-gnutls --enable-libxml2 --enable-gmp --enable-lzma --enable-zlib --enable-libsrt --enable-libssh --enable-libzmq --enable-avisynth --enable-sdl2 --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxvid --enable-libaom --enable-libopenjpeg --enable-libvpx --enable-libass --enable-libfreetype --enable-libfribidi --enable-libvidstab --enable-libvmaf --enable-libzimg --enable-amf --enable-cuda-llvm --enable-cuvid --enable-ffnvcodec --enable-nvdec --enable-nvenc --enable-d3d11va --enable-dxva2 --enable-libmfx --enable-libgme --enable-libopenmpt --enable-libopencore-amrwb --enable-libmp3lame --enable-libtheora --enable-libvo-amrwbenc --enable-libgsm --enable-libopencore-amrnb --enable-libopus --enable-libspeex --enable-libvorbis --enable-librubberband
  libavutil      57. 17.100 / 57. 17.100
  libavcodec     59. 18.100 / 59. 18.100
  libavformat    59. 16.100 / 59. 16.100
  libavdevice    59.  4.100 / 59.  4.100
  libavfilter     8. 24.100 /  8. 24.100
  libswscale      6.  4.100 /  6.  4.100
  libswresample   4.  3.100 /  4.  3.100
  libpostproc    56.  3.100 / 56.  3.100

Additional comments

Bookmarks do not work with Coqui speech service

Description of bug / unexpected behavior

I tried to run the bookmark example with Coqui, but the animations do not wait for the bookmarks correctly.

Expected behavior

The scene should play the same way as it does with the Azure speech service.

How to reproduce the issue

Code for reproducing the problem

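from TTS.api import TTS
from manim_voiceover.services.coqui import CoquiService

# scene is the VoiceoverScene instance from the bookmark example
scene.set_speech_service(CoquiService(model_name=TTS.list_models()[17], speaker_idx=TTS(TTS.list_models()[17]).speakers[47]))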

Additional media files

Images/GIFs

Logs

Terminal output


System specifications

System Details

OS: Windows 10
Python version: 3.9.13
Installed modules (provide output from pip list):


LaTeX details

LaTeX distribution (e.g. TeX Live 2020):
Installed LaTeX packages:

FFMPEG

Output of ffmpeg -version:

ffmpeg version 6.0-essentials_build-www.gyan.dev Copyright (c) 2000-2023 the FFmpeg developers
built with gcc 12.2.0 (Rev10, Built by MSYS2 project)
configuration: --enable-gpl --enable-version3 --enable-static --disable-w32threads --disable-autodetect --enable-fontconfig --enable-iconv --enable-gnutls --enable-libxml2 --enable-gmp --enable-lzma --enable-zlib --enable-libsrt --enable-libssh --enable-libzmq --enable-avisynth --enable-sdl2 --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxvid --enable-libaom --enable-libopenjpeg --enable-libvpx --enable-libass --enable-libfreetype --enable-libfribidi --enable-libvidstab --enable-libvmaf --enable-libzimg --enable-amf --enable-cuda-llvm --enable-cuvid --enable-ffnvcodec --enable-nvdec --enable-nvenc --enable-d3d11va --enable-dxva2 --enable-libmfx --enable-libgme --enable-libopenmpt --enable-libopencore-amrwb --enable-libmp3lame --enable-libtheora --enable-libvo-amrwbenc --enable-libgsm --enable-libopencore-amrnb --enable-libopus --enable-libspeex --enable-libvorbis --enable-librubberband
libavutil      58.  2.100 / 58.  2.100
libavcodec     60.  3.100 / 60.  3.100
libavformat    60.  3.100 / 60.  3.100
libavdevice    60.  1.100 / 60.  1.100
libavfilter     9.  3.100 /  9.  3.100
libswscale      7.  1.100 /  7.  1.100
libswresample   4. 10.100 /  4. 10.100
libpostproc    57.  1.100 / 57.  1.100

Additional comments

Add Matcha-TTS

https://github.com/shivammehta25/Matcha-TTS

TypeError: 'NoneType' object is not subscriptable after pressing 'a' during recording

Description of bug / unexpected behavior

I pressed 'r' to record my voice. After that, when I pressed 'a' to accept the recording and continue, the error occurred.

Expected behavior

It should save my recording and let me record the next one.

How to reproduce the issue

Run manim -pql recording.py --disable_caching, record a voiceover, then press 'a' at the following prompt:

Press...
l to [l]isten to the recording
r to [r]e-record
a to [a]ccept the recording

Code for reproducing the problem

manim -pql recording.py --disable_caching

Additional media files

Images/GIFs

Logs

Terminal output

/home/semikernel/anaconda3/envs/manimexp/lib/python3.11/site-packages/stable_whisper/whisper_word_level.py:235: UserWarning: FP16 is not supported on CPU; using FP32 instead
  warnings.warn("FP16 is not supported on CPU; using FP32 instead")
  0%|                                                 | 0/4.59 [00:00<?, ?sec/s]
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /home/semikernel/anaconda3/envs/manimexp/lib/python3.11/site-packages/manim/ │
│ cli/render/commands.py:115 in render                                         │
│                                                                              │
│   112 │   │   │   try:                                                       │
│   113 │   │   │   │   with tempconfig({}):                                   │
│   114 │   │   │   │   │   scene = SceneClass()                               │
│ ❱ 115 │   │   │   │   │   scene.render()                                     │
│   116 │   │   │   except Exception:                                          │
│   117 │   │   │   │   error_console.print_exception()                        │
│   118 │   │   │   │   sys.exit(1)                                            │
│                                                                              │
│ /home/semikernel/anaconda3/envs/manimexp/lib/python3.11/site-packages/manim/ │
│ scene/scene.py:223 in render                                                 │
│                                                                              │
│    220 │   │   """                                                           │
│    221 │   │   self.setup()                                                  │
│    222 │   │   try:                                                          │
│ ❱  223 │   │   │   self.construct()                                          │
│    224 │   │   except EndSceneEarlyException:                                │
│    225 │   │   │   pass                                                      │
│    226 │   │   except RerunSceneException as e:                              │
│                                                                              │
│ /home/semikernel/Documents/manim/gtts.py:13 in construct                     │
│                                                                              │
│   10 │   │   circle = Circle()                                               │
│   11 │   │   square = Square().shift(2 * RIGHT)                              │
│   12 │   │                                                                   │
│ ❱ 13 │   │   with self.voiceover(text="This circle is drawn as I speak.") as │
│   14 │   │   │   self.play(Create(circle), run_time=tracker.duration)        │
│   15 │   │                                                                   │
│   16 │   │   with self.voiceover(text="Let's shift it to the left 2 units.") │
│                                                                              │
│ /home/semikernel/anaconda3/envs/manimexp/lib/python3.11/contextlib.py:137 in │
│ __enter__                                                                    │
│                                                                              │
│   134 │   │   # they are only needed for recreation, which is not possible a │
│   135 │   │   del self.args, self.kwds, self.func                            │
│   136 │   │   try:                                                           │
│ ❱ 137 │   │   │   return next(self.gen)                                      │
│   138 │   │   except StopIteration:                                          │
│   139 │   │   │   raise RuntimeError("generator didn't yield") from None     │
│   140                                                                        │
│                                                                              │
│ /home/semikernel/anaconda3/envs/manimexp/lib/python3.11/site-packages/manim_ │
│ voiceover/voiceover_scene.py:186 in voiceover                                │
│                                                                              │
│   183 │   │                                                                  │
│   184 │   │   try:                                                           │
│   185 │   │   │   if text is not None:                                       │
│ ❱ 186 │   │   │   │   yield self.add_voiceover_text(text, **kwargs)          │
│   187 │   │   │   elif ssml is not None:                                     │
│   188 │   │   │   │   yield self.add_voiceover_ssml(ssml, **kwargs)          │
│   189 │   │   finally:                                                       │
│                                                                              │
│ /home/semikernel/anaconda3/envs/manimexp/lib/python3.11/site-packages/manim_ │
│ voiceover/voiceover_scene.py:69 in add_voiceover_text                        │
│                                                                              │
│    66 │   │   │   │   "You need to call init_voiceover() before adding a voi │
│    67 │   │   │   )                                                          │
│    68 │   │                                                                  │
│ ❱  69 │   │   dict_ = self.speech_service._wrap_generate_from_text(text, **k │
│    70 │   │   tracker = VoiceoverTracker(self, dict_, self.speech_service.ca │
│    71 │   │   self.add_sound(str(Path(self.speech_service.cache_dir) / dict_ │
│    72 │   │   self.current_tracker = tracker                                 │
│                                                                              │
│ /home/semikernel/anaconda3/envs/manimexp/lib/python3.11/site-packages/manim_ │
│ voiceover/services/base.py:95 in _wrap_generate_from_text                    │
│                                                                              │
│    92 │   │                                                                  │
│    93 │   │   # Check whether word boundaries exist and if not run stt       │
│    94 │   │   if "word_boundaries" not in dict_ and self._whisper_model is n │
│ ❱  95 │   │   │   transcription_result = self._whisper_model.transcribe(     │
│    96 │   │   │   │   str(Path(self.cache_dir) / original_audio), **self.tra │
│    97 │   │   │   )                                                          │
│    98 │   │   │   logger.info("Transcription: " + transcription_result.text) │
│                                                                              │
│ /home/semikernel/anaconda3/envs/manimexp/lib/python3.11/site-packages/stable │
│ _whisper/whisper_word_level.py:492 in transcribe_stable                      │
│                                                                              │
│    489 │   │   │   │   │   │   continue                                      │
│    490 │   │   │   │   │   ts_token_mask = pad_or_trim(ts_token_mask, 1501)  │
│    491 │   │   │                                                             │
│ ❱  492 │   │   │   detect_language()                                         │
│    493 │   │   │   decode_options["prompt"] = all_tokens[prompt_reset_since: │
│    494 │   │   │   result: DecodingResult = decode_with_fallback(mel_segment │
│    495 │   │   │   tokens = torch.tensor(result.tokens)                      │
│                                                                              │
│ /home/semikernel/anaconda3/envs/manimexp/lib/python3.11/site-packages/stable │
│ _whisper/whisper_word_level.py:305 in detect_language                        │
│                                                                              │
│    302 │   │   │   │   │   │   print("Detecting language using up to 30 seco │
│    303 │   │   │   │   │   │   │     "Use `--language` to specify the langua │
│    304 │   │   │   │   │   timing_mask = np.logical_and(                     │
│ ❱  305 │   │   │   │   │   │   segment_silence_timing[0] <= time_offset,     │
│    306 │   │   │   │   │   │   segment_silence_timing[1] >= time_offset      │
│    307 │   │   │   │   │   )                                                 │
│    308 │   │   │   │   │   start_sample = (                                  │
╰──────────────────────────────────────────────────────────────────────────────╯
TypeError: 'NoneType' object is not subscriptable

System specifications

System Details

OS (with version, e.g., Windows 10 v2004 or macOS 10.15 (Catalina)): Red Hat Enterprise Linux 9.3 (Plow)
RAM: 16 GB
Python version (python/py/python3 --version): Python 3.11.7
Installed modules (provide output from pip list):

Package                        Version
------------------------------ ------------
absl-py                        2.1.0
aiohttp                        3.9.1
aiosignal                      1.3.1
annotated-types                0.6.0
anyascii                       0.3.2
asttokens                      2.4.1
attrs                          23.2.0
audioread                      3.0.1
azure-cognitiveservices-speech 1.34.1
Babel                          2.14.0
bangla                         0.0.2
blinker                        1.7.0
blis                           0.7.11
bnnumerizer                    0.0.2
bnunicodenormalizer            0.1.6
Brotli                         1.1.0
build                          1.0.3
CacheControl                   0.13.1
cachetools                     5.3.2
catalogue                      2.0.10
certifi                        2023.11.17
cffi                           1.16.0
charset-normalizer             3.3.2
cleo                           2.1.0
click                          8.1.7
click-default-group            1.2.4
cloudpathlib                   0.16.0
cloup                          2.1.2
cmake                          3.28.1
colorama                       0.4.6
confection                     0.1.4
contourpy                      1.2.0
coqpit                         0.0.17
crashtest                      0.4.1
cryptography                   42.0.1
cycler                         0.12.1
cymem                          2.0.8
Cython                         3.0.8
dateparser                     1.1.8
decorator                      5.1.1
distlib                        0.3.8
docopt                         0.6.2
dulwich                        0.21.7
einops                         0.7.0
encodec                        0.1.1
evdev                          1.6.1
executing                      2.0.1
fastjsonschema                 2.19.1
ffmpeg-python                  0.2.0
filelock                       3.13.1
Flask                          3.0.1
fonttools                      4.47.2
frozenlist                     1.4.1
fsspec                         2023.12.2
future                         0.18.3
g2pkk                          0.1.2
glcontext                      2.5.0
google-auth                    2.27.0
google-auth-oauthlib           1.2.0
grpcio                         1.60.0
gruut                          2.2.3
gruut-ipa                      0.13.0
gruut_lang_de                  2.0.0
gruut_lang_en                  2.0.0
gruut_lang_es                  2.0.0
gruut_lang_fr                  2.0.2
gTTS                           2.5.0
hangul-romanize                0.1.0
huggingface-hub                0.20.3
idna                           3.6
importlib-metadata             7.0.1
inflect                        7.0.0
installer                      0.7.0
ipython                        8.20.0
isosurfaces                    0.1.0
itsdangerous                   2.1.2
jamo                           0.4.1
jaraco.classes                 3.3.0
jedi                           0.19.1
jeepney                        0.8.0
jieba                          0.42.1
Jinja2                         3.1.3
joblib                         1.3.2
jsonlines                      1.2.0
keyring                        24.3.0
kiwisolver                     1.4.5
langcodes                      3.3.0
lazy_loader                    0.3
librosa                        0.10.1
lit                            17.0.6
llvmlite                       0.41.1
manim                          0.18.0
manim-voiceover                0.3.4.post1
ManimPango                     0.5.0
mapbox-earcut                  1.0.1
Markdown                       3.5.2
markdown-it-py                 3.0.0
MarkupSafe                     2.1.4
matplotlib                     3.8.2
matplotlib-inline              0.1.6
mdurl                          0.1.2
moderngl                       5.9.0
moderngl-window                2.4.1
more-itertools                 10.2.0
mpmath                         1.3.0
msgpack                        1.0.7
multidict                      6.0.4
multipledispatch               0.6.0
murmurhash                     1.0.10
mutagen                        1.47.0
networkx                       2.8.8
nltk                           3.8.1
num2words                      0.5.13
numba                          0.58.1
numpy                          1.26.3
nvidia-cublas-cu11             11.10.3.66
nvidia-cublas-cu12             12.1.3.1
nvidia-cuda-cupti-cu11         11.7.101
nvidia-cuda-cupti-cu12         12.1.105
nvidia-cuda-nvrtc-cu11         11.7.99
nvidia-cuda-nvrtc-cu12         12.1.105
nvidia-cuda-runtime-cu11       11.7.99
nvidia-cuda-runtime-cu12       12.1.105
nvidia-cudnn-cu11              8.5.0.96
nvidia-cudnn-cu12              8.9.2.26
nvidia-cufft-cu11              10.9.0.58
nvidia-cufft-cu12              11.0.2.54
nvidia-curand-cu11             10.2.10.91
nvidia-curand-cu12             10.3.2.106
nvidia-cusolver-cu11           11.4.0.1
nvidia-cusolver-cu12           11.4.5.107
nvidia-cusparse-cu11           11.7.4.91
nvidia-cusparse-cu12           12.1.0.106
nvidia-nccl-cu11               2.14.3
nvidia-nccl-cu12               2.18.1
nvidia-nvjitlink-cu12          12.3.101
nvidia-nvtx-cu11               11.7.91
nvidia-nvtx-cu12               12.1.105
oauthlib                       3.2.2
openai-whisper                 20230314
packaging                      23.2
pandas                         1.5.3
parso                          0.8.3
pexpect                        4.9.0
Pillow                         9.5.0
pip                            23.3.2
pkginfo                        1.9.6
platformdirs                   3.11.0
poetry                         1.7.1
poetry-core                    1.8.1
poetry-plugin-export           1.6.0
pooch                          1.8.0
preshed                        3.0.9
prompt-toolkit                 3.0.43
protobuf                       4.23.4
psutil                         5.9.8
ptyprocess                     0.7.0
pure-eval                      0.2.2
pyasn1                         0.5.1
pyasn1-modules                 0.3.0
PyAudio                        0.2.14
pycairo                        1.25.1
pycparser                      2.21
pydantic                       2.5.3
pydantic_core                  2.14.6
pydub                          0.25.1
pyglet                         1.5.27
Pygments                       2.17.2
pynndescent                    0.5.11
pynput                         1.7.6
pyparsing                      3.1.1
pypinyin                       0.50.0
pyproject_hooks                1.0.0
pyrr                           0.10.3
pysbd                          0.3.4
PySocks                        1.7.1
python-crfsuite                0.9.10
python-dateutil                2.8.2
python-dotenv                  0.21.1
python-slugify                 8.0.2
python-xlib                    0.33
pytz                           2023.3.post1
PyYAML                         6.0.1
rapidfuzz                      3.6.1
regex                          2023.12.25
requests                       2.31.0
requests-oauthlib              1.3.1
requests-toolbelt              1.0.0
rich                           13.7.0
rsa                            4.9
safetensors                    0.4.2
scikit-learn                   1.4.0
scipy                          1.12.0
screeninfo                     0.8.1
SecretStorage                  3.3.3
setuptools                     69.0.3
shellingham                    1.5.4
six                            1.16.0
skia-pathops                   0.8.0.post1
smart-open                     6.4.0
soundfile                      0.12.1
sox                            1.4.1
soxr                           0.3.7
spacy                          3.7.2
spacy-legacy                   3.0.12
spacy-loggers                  1.0.5
srsly                          2.4.8
srt                            3.5.3
stable-ts                      2.11.1
stack-data                     0.6.3
SudachiDict-core               20240109
SudachiPy                      0.6.8
svgelements                    1.9.6
sympy                          1.12
tensorboard                    2.15.1
tensorboard-data-server        0.7.2
text-unidecode                 1.3
thinc                          8.2.2
threadpoolctl                  3.2.0
tiktoken                       0.3.1
tokenizers                     0.15.1
tomli                          2.0.1
tomlkit                        0.12.3
torch                          2.1.2
torchaudio                     2.1.2
tqdm                           4.66.1
trainer                        0.0.36
traitlets                      5.14.1
transformers                   4.37.1
triton                         2.1.0
trove-classifiers              2024.1.8
TTS                            0.22.0
typer                          0.9.0
typing_extensions              4.9.0
tzlocal                        5.2
umap-learn                     0.5.5
Unidecode                      1.3.8
urllib3                        2.1.0
virtualenv                     20.25.0
wasabi                         1.1.2
watchdog                       2.3.1
wcwidth                        0.2.13
weasel                         0.3.4
Werkzeug                       3.0.1
wheel                          0.42.0
yarl                           1.9.4
zipp                           3.17.0

LaTeX details

LaTeX distribution (e.g. TeX Live 2020): TeX Live 2020
Installed LaTeX packages:

FFMPEG

Output of ffmpeg -version:

ffmpeg version 5.1.2 Copyright (c) 2000-2022 the FFmpeg developers
built with gcc 12.3.0 (conda-forge gcc 12.3.0-2)
configuration: --prefix=/home/conda/feedstock_root/build_artifacts/ffmpeg_1696213708285/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac --cc=/home/conda/feedstock_root/build_artifacts/ffmpeg_1696213708285/_build_env/bin/x86_64-conda-linux-gnu-cc --cxx=/home/conda/feedstock_root/build_artifacts/ffmpeg_1696213708285/_build_env/bin/x86_64-conda-linux-gnu-c++ --nm=/home/conda/feedstock_root/build_artifacts/ffmpeg_1696213708285/_build_env/bin/x86_64-conda-linux-gnu-nm --ar=/home/conda/feedstock_root/build_artifacts/ffmpeg_1696213708285/_build_env/bin/x86_64-conda-linux-gnu-ar --disable-doc --disable-openssl --enable-demuxer=dash --enable-hardcoded-tables --enable-libfreetype --enable-libfontconfig --enable-libopenh264 --enable-libdav1d --enable-gnutls --enable-libmp3lame --enable-libvpx --enable-libass --enable-pthreads --enable-vaapi --enable-gpl --enable-libx264 --enable-libx265 --enable-libaom --enable-libsvtav1 --enable-libxml2 --enable-pic --enable-shared --disable-static --enable-version3 --enable-zlib --enable-libopus --pkg-config=/home/conda/feedstock_root/build_artifacts/ffmpeg_1696213708285/_build_env/bin/pkg-config
libavutil      57. 28.100 / 57. 28.100
libavcodec     59. 37.100 / 59. 37.100
libavformat    59. 27.100 / 59. 27.100
libavdevice    59.  7.100 / 59.  7.100
libavfilter     8. 44.100 /  8. 44.100
libswscale      6.  7.100 /  6.  7.100
libswresample   4.  7.100 /  4.  7.100
libpostproc    56.  6.100 / 56.  6.100

Additional comments

Error running manim voice over examples

Preliminaries

I have followed the latest version of the installation instructions.

Description of error

manim-voiceover suddenly started raising this error:

JSONDecodeError: Invalid control character at line 4906 column 19 (char 147456)

It worked fine at first, then at some point began raising this error. I have reinstalled the package and tried to run the examples from the repo, but without success.

[02/11/23 10:20:15] ERROR module_ops.py:90
whatIsAVector is not in the script

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /Users/jorgebrasil/PycharmProjects/manim/venv/lib/python3.9/site-packages/manim/cli/render/comma │
│ nds.py:115 in render │
│ │
│ 112 │ │ │ try: │
│ 113 │ │ │ │ with tempconfig({}): │
│ 114 │ │ │ │ │ scene = SceneClass() │
│ ❱ 115 │ │ │ │ │ scene.render() │
│ 116 │ │ │ except Exception: │
│ 117 │ │ │ │ error_console.print_exception() │
│ 118 │ │ │ │ sys.exit(1) │
│ │
│ /Users/jorgebrasil/PycharmProjects/manim/venv/lib/python3.9/site-packages/manim/scene/scene.py:2 │
│ 23 in render │
│ │
│ 220 │ │ """ │
│ 221 │ │ self.setup() │
│ 222 │ │ try: │
│ ❱ 223 │ │ │ self.construct() │
│ 224 │ │ except EndSceneEarlyException: │
│ 225 │ │ │ pass │
│ 226 │ │ except RerunSceneException as e: │
│ │
│ /Users/jorgebrasil/PycharmProjects/manim/main.py:21 in construct │
│ │
│ 18 │ │ │
│ 19 │ │ question = Text("What is a vector ?", weight=BOLD, color='ORANGE') │
│ 20 │ │ │
│ ❱ 21 │ │ with self.voiceover(text="""You can think of a vector in simple terms as a list │
│ 22 │ │ │ │ │ │ each item in this structure matters. │
│ 23 │ │ │ │ │ │ In machine learning, this will often be the case.""") as trac │
│ 24 │ │ │ self.play(Write(question)) │
│ │
│ /opt/homebrew/Caskroom/miniforge/base/lib/python3.9/contextlib.py:119 in __enter__ │
│ │
│ 116 │ │ # they are only needed for recreation, which is not possible anymore │
│ 117 │ │ del self.args, self.kwds, self.func │
│ 118 │ │ try: │
│ ❱ 119 │ │ │ return next(self.gen) │
│ 120 │ │ except StopIteration: │
│ 121 │ │ │ raise RuntimeError("generator didn't yield") from None │
│ 122 │
│ │
│ /Users/jorgebrasil/PycharmProjects/manim/venv/lib/python3.9/site-packages/manim_voiceover/voiceo │
│ ver_scene.py:180 in voiceover │
│ │
│ 177 │ │ │
│ 178 │ │ try: │
│ 179 │ │ │ if text is not None: │
│ ❱ 180 │ │ │ │ yield self.add_voiceover_text(text, **kwargs) │
│ 181 │ │ │ elif ssml is not None: │
│ 182 │ │ │ │ yield self.add_voiceover_ssml(ssml, **kwargs) │
│ 183 │ │ finally: │
│ │
│ /Users/jorgebrasil/PycharmProjects/manim/venv/lib/python3.9/site-packages/manim_voiceover/voiceo │
│ ver_scene.py:63 in add_voiceover_text │
│ │
│ 60 │ │ │ │ "You need to call init_voiceover() before adding a voiceover." │
│ 61 │ │ │ ) │
│ 62 │ │ │
│ ❱ 63 │ │ dict_ = self.speech_service._wrap_generate_from_text(text, **kwargs) │
│ 64 │ │ tracker = VoiceoverTracker(self, dict_, self.speech_service.cache_dir) │
│ 65 │ │ self.add_sound(str(Path(self.speech_service.cache_dir) / dict_["final_audio"])) │
│ 66 │ │ self.current_tracker = tracker │
│ │
│ /Users/jorgebrasil/PycharmProjects/manim/venv/lib/python3.9/site-packages/manim_voiceover/servic │
│ es/base.py:85 in _wrap_generate_from_text │
│ │
│ 82 │ │ # Replace newlines with lines, reduce multiple consecutive spaces to single │
│ 83 │ │ text = " ".join(text.split()) │
│ 84 │ │ │
│ ❱ 85 │ │ dict_ = self.generate_from_text(text, cache_dir=None, path=path, **kwargs) │
│ 86 │ │ original_audio = dict_["original_audio"] │
│ 87 │ │ │
│ 88 │ │ # Check whether word boundaries exist and if not run stt │
│ │
│ /Users/jorgebrasil/PycharmProjects/manim/venv/lib/python3.9/site-packages/manim_voiceover/servic │
│ es/azure.py:135 in generate_from_text │
│ │
│ 132 │ │ │ }, │
│ 133 │ │ } │
│ 134 │ │ │
│ ❱ 135 │ │ cached_result = self.get_cached_result(input_data, cache_dir) │
│ 136 │ │ if cached_result is not None: │
│ 137 │ │ │ return cached_result │
│ 138 │
│ │
│ /Users/jorgebrasil/PycharmProjects/manim/venv/lib/python3.9/site-packages/manim_voiceover/servic │
│ es/base.py:166 in get_cached_result │
│ │
│ 163 │ def get_cached_result(self, input_data, cache_dir): │
│ 164 │ │ json_path = os.path.join(cache_dir / DEFAULT_VOICEOVER_CACHE_JSON_FILENAME) │
│ 165 │ │ if os.path.exists(json_path): │
│ ❱ 166 │ │ │ json_data = json.load(open(json_path, "r")) │
│ 167 │ │ │ for entry in json_data: │
│ 168 │ │ │ │ if entry["input_data"] == input_data: │
│ 169 │ │ │ │ │ return entry │
│ │
│ /opt/homebrew/Caskroom/miniforge/base/lib/python3.9/json/__init__.py:293 in load │
│ │
│ 290 │ To use a custom JSONDecoder subclass, specify it with the cls │
│ 291 │ kwarg; otherwise JSONDecoder is used. │
│ 292 │ """ │
│ ❱ 293 │ return loads(fp.read(), │
│ 294 │ │ cls=cls, object_hook=object_hook, │
│ 295 │ │ parse_float=parse_float, parse_int=parse_int, │
│ 296 │ │ parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw) │
│ │
│ /opt/homebrew/Caskroom/miniforge/base/lib/python3.9/json/__init__.py:346 in loads │
│ │
│ 343 │ if (cls is None and object_hook is None and │
│ 344 │ │ │ parse_int is None and parse_float is None and │
│ 345 │ │ │ parse_constant is None and object_pairs_hook is None and not kw): │
│ ❱ 346 │ │ return _default_decoder.decode(s) │
│ 347 │ if cls is None: │
│ 348 │ │ cls = JSONDecoder │
│ 349 │ if object_hook is not None: │
│ │
│ /opt/homebrew/Caskroom/miniforge/base/lib/python3.9/json/decoder.py:337 in decode │
│ │
│ 334 │ │ containing a JSON document). │
│ 335 │ │ │
│ 336 │ │ """ │
│ ❱ 337 │ │ obj, end = self.raw_decode(s, idx=_w(s, 0).end()) │
│ 338 │ │ end = _w(s, end).end() │
│ 339 │ │ if end != len(s): │
│ 340 │ │ │ raise JSONDecodeError("Extra data", s, end) │
│ │
│ /opt/homebrew/Caskroom/miniforge/base/lib/python3.9/json/decoder.py:353 in raw_decode │
│ │
│ 350 │ │ │
│ 351 │ │ """ │
│ 352 │ │ try: │
│ ❱ 353 │ │ │ obj, end = self.scan_once(s, idx) │
│ 354 │ │ except StopIteration as err: │
│ 355 │ │ │ raise JSONDecodeError("Expecting value", s, err.value) from None │
│ 356 │ │ return obj, end │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
JSONDecodeError: Invalid control character at: line 4906 column 19 (char 147456)
(venv) (base) jorgebrasil@jorges-air manim % manim -pql --disable_caching main.py GTTSExample
Manim Community v0.17.2

[02/11/23 10:22:00] ERROR module_ops.py:90
GTTSExample is not in the script
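
The traceback above fails inside `json.load` while `get_cached_result` reads the voiceover cache file, which suggests the cache JSON itself is damaged. A minimal sketch for locating the offending spot is shown below. It uses only the standard library; the helper names `describe_json_error` and `check_cache_file` are hypothetical and not part of manim-voiceover, and the path to the cache file is left for the caller to supply.

```python
import json
from pathlib import Path
from typing import Optional


def describe_json_error(text: str, context: int = 20) -> Optional[str]:
    """Return None if `text` parses as JSON, else a short description of the failure."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # err.pos is the character offset reported in the error message ("char N").
        start = max(err.pos - context, 0)
        snippet = text[start : err.pos + context]
        return f"{err.msg} line {err.lineno} column {err.colno}: {snippet!r}"
    return None


def check_cache_file(path: Path) -> Optional[str]:
    """Hypothetical helper: run the check on a cache file chosen by the caller."""
    return describe_json_error(path.read_text(encoding="utf-8"))
```

If the check reports a problem, the snippet shows the characters around the failing offset, which may help decide whether the entry can be repaired by hand or whether the cache file should be moved aside so it is regenerated.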

System specifications

System Details

MacBook Air M1 (2020), macOS Big Sur
RAM: 16 GB
Python version: 3.9
Installed modules (provide output from pip list):

Additional comments

Type error when using RecorderService

Description of bug / unexpected behavior

After recording my voice successfully, I encounter this error:

/home/rehnertz/manim/lib/python3.10/site-packages/stable_whisper/whisper_word_level.py:190: UserWarning: FP16 is not supported on CPU; using FP32 instead
  warnings.warn("FP16 is not supported on CPU; using FP32 instead")
Detected language: english
100%|██████████████████████████████████████████████████████████████████████████████| 0.7/0.7 [00:03<00:00,  4.87s/sec]
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/rehnertz/manim/lib/python3.10/site-packages/manim/cli/render/commands.py:115 in render     │
│                                                                                                  │
│   112 │   │   │   try:                                                                           │
│   113 │   │   │   │   with tempconfig({}):                                                       │
│   114 │   │   │   │   │   scene = SceneClass()                                                   │
│ ❱ 115 │   │   │   │   │   scene.render()                                                         │
│   116 │   │   │   except Exception:                                                              │
│   117 │   │   │   │   error_console.print_exception()                                            │
│   118 │   │   │   │   sys.exit(1)                                                                │
│                                                                                                  │
│ /home/rehnertz/manim/lib/python3.10/site-packages/manim/scene/scene.py:223 in render             │
│                                                                                                  │
│    220 │   │   """                                                                               │
│    221 │   │   self.setup()                                                                      │
│    222 │   │   try:                                                                              │
│ ❱  223 │   │   │   self.construct()                                                              │
│    224 │   │   except EndSceneEarlyException:                                                    │
│    225 │   │   │   pass                                                                          │
│    226 │   │   except RerunSceneException as e:                                                  │
│                                                                                                  │
│ /home/rehnertz/manim/scenes/Demo.py:10 in construct                                              │
│                                                                                                  │
│    7 │   def construct(self):                                                                    │
│    8 │   │   self.set_speech_service(RecorderService())                                          │
│    9 │   │                                                                                       │
│ ❱ 10 │   │   with self.voiceover(text="Test") as tracker:                                        │
│   11 │   │   │   self.play(Create(Circle()), run_time=tracker.duration)                          │
│   12 │   │   self.wait(1)                                                                        │
│   13                                                                                             │
│                                                                                                  │
│ /usr/lib/python3.10/contextlib.py:135 in __enter__                                               │
│                                                                                                  │
│   132 │   │   # they are only needed for recreation, which is not possible anymore               │
│   133 │   │   del self.args, self.kwds, self.func                                                │
│   134 │   │   try:                                                                               │
│ ❱ 135 │   │   │   return next(self.gen)                                                          │
│   136 │   │   except StopIteration:                                                              │
│   137 │   │   │   raise RuntimeError("generator didn't yield") from None                         │
│   138                                                                                            │
│                                                                                                  │
│ /home/rehnertz/manim/lib/python3.10/site-packages/manim_voiceover/voiceover_scene.py:180 in      │
│ voiceover                                                                                        │
│                                                                                                  │
│   177 │   │                                                                                      │
│   178 │   │   try:                                                                               │
│   179 │   │   │   if text is not None:                                                           │
│ ❱ 180 │   │   │   │   yield self.add_voiceover_text(text, **kwargs)                              │
│   181 │   │   │   elif ssml is not None:                                                         │
│   182 │   │   │   │   yield self.add_voiceover_ssml(ssml, **kwargs)                              │
│   183 │   │   finally:                                                                           │
│                                                                                                  │
│ /home/rehnertz/manim/lib/python3.10/site-packages/manim_voiceover/voiceover_scene.py:63 in       │
│ add_voiceover_text                                                                               │
│                                                                                                  │
│    60 │   │   │   │   "You need to call init_voiceover() before adding a voiceover."             │
│    61 │   │   │   )                                                                              │
│    62 │   │                                                                                      │
│ ❱  63 │   │   dict_ = self.speech_service._wrap_generate_from_text(text, **kwargs)               │
│    64 │   │   tracker = VoiceoverTracker(self, dict_, self.speech_service.cache_dir)             │
│    65 │   │   self.add_sound(str(Path(self.speech_service.cache_dir) / dict_["final_audio"]))    │
│    66 │   │   self.current_tracker = tracker                                                     │
│                                                                                                  │
│ /home/rehnertz/manim/lib/python3.10/site-packages/manim_voiceover/services/base.py:93 in         │
│ _wrap_generate_from_text                                                                         │
│                                                                                                  │
│    90 │   │   │   transcription_result = self._whisper_model.transcribe(                         │
│    91 │   │   │   │   str(Path(self.cache_dir) / original_audio), **self.transcription_kwargs    │
│    92 │   │   │   )                                                                              │
│ ❱  93 │   │   │   logger.info("Transcription: " + transcription_result["text"])                  │
│    94 │   │   │   word_boundaries = timestamps_to_word_boundaries(                               │
│    95 │   │   │   │   transcription_result["segments"]                                           │
│    96 │   │   │   )                                                                              │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
TypeError: 'WhisperResult' object is not subscriptable

This looks like a type error caused by a change in what Whisper's transcribe call returns. I tried modifying manim_voiceover/services/base.py as follows:

def timestamps_to_word_boundaries(segments):
    word_boundaries = []
    current_text_offset = 0
    for segment in segments:
        # ===== MODIFIED: Whisper result structure changed
        # for dict_ in segment["word_timestamps"]:
        for dict_ in segment["words"]:                  # <====== Key modified
        # =====
            word = dict_["word"]
            word_boundaries.append(
                {
                    # ===== MODIFIED: Whisper result structure changed
                    # "audio_offset": int(dict_["timestamp"] * AUDIO_OFFSET_RESOLUTION),
                    "audio_offset": int(dict_["start"] * AUDIO_OFFSET_RESOLUTION),     # <====== Key modified
                    # =====
      ................

    def _wrap_generate_from_text(self, text: str, path: str = None, **kwargs) -> dict:
        # Replace newlines with spaces, reduce multiple consecutive spaces to single
        text = " ".join(text.split())

        dict_ = self.generate_from_text(text, cache_dir=None, path=path, **kwargs)
        original_audio = dict_["original_audio"]

        # Check whether word boundaries exist and if not run stt
        if "word_boundaries" not in dict_ and self._whisper_model is not None:
            transcription_result = self._whisper_model.transcribe(
                str(Path(self.cache_dir) / original_audio), **self.transcription_kwargs
            )
            # ==== MODIFIED: Whisper result structure changed
            transcription_result = transcription_result.ori_dict   # <====== Use original data(?)
            # ====
      ...........................

It seems to work.
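
The same idea can be written so that it tolerates both result shapes. The sketch below is not the library's code: `transcription_to_dict` is a hypothetical helper, and it assumes only what the workaround above relies on, namely that the newer result object carries the raw Whisper output in an `ori_dict` attribute.

```python
from typing import Any, Dict


def transcription_to_dict(result: Any) -> Dict[str, Any]:
    """Return a plain dict for a transcription result of either shape."""
    # Older behaviour: transcribe() already returned a dict.
    if isinstance(result, dict):
        return result
    # Assumption taken from the workaround above: the result object
    # exposes the original dict-style output as `ori_dict`.
    ori = getattr(result, "ori_dict", None)
    if isinstance(ori, dict):
        return ori
    raise TypeError(f"Unsupported transcription result: {type(result).__name__}")
```

With a helper like this, the subscript accesses such as `transcription_result["text"]` would operate on the returned dict regardless of which Whisper wrapper version produced the result. The key rename inside each segment (`word_timestamps` to `words`, `timestamp` to `start`) still has to be handled separately.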

Expected behavior

Successfully output the video with recorded voice.

How to reproduce the issue

Code for reproducing the problem

from manim import *
from manim_voiceover import VoiceoverScene
from manim_voiceover.services.gtts import GTTSService
from manim_voiceover.services.recorder import RecorderService

class Demo(VoiceoverScene):
    def construct(self):
        self.set_speech_service(RecorderService())

        with self.voiceover(text="Test") as tracker:
            self.play(Create(Circle()), run_time=tracker.duration)
        self.wait(1)

Then call

manim -pql Demo.py --disable_caching

Additional media files

Images/GIFs

Logs

Terminal output

manim -v DEBUG scenes/Demo.py --disable_caching
Manim Community v0.17.2

ALSA lib pcm_dmix.c:1032:(snd_pcm_dmix_open) unable to open slave
ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
ALSA lib pcm_route.c:877:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_oss.c:397:(_snd_pcm_oss_open) Cannot open device /dev/dsp
ALSA lib pcm_oss.c:397:(_snd_pcm_oss_open) Cannot open device /dev/dsp
ALSA lib confmisc.c:160:(snd_config_get_card) Invalid field card
ALSA lib pcm_usb_stream.c:482:(_snd_pcm_usb_stream_open) Invalid card 'card'
ALSA lib confmisc.c:160:(snd_config_get_card) Invalid field card
ALSA lib pcm_usb_stream.c:482:(_snd_pcm_usb_stream_open) Invalid card 'card'
ALSA lib pcm_dmix.c:1032:(snd_pcm_dmix_open) unable to open slave
-------------------------device list-------------------------
Input Device id  0  -  HDA Intel PCH: ALC256 Analog (hw:0,0)
Input Device id  18  -  Samson Go Mic: USB Audio (hw:2,0)
Input Device id  19  -  sysdefault
Input Device id  21  -  samplerate
Input Device id  22  -  speexrate
Input Device id  23  -  pulse
Input Device id  24  -  upmix
Input Device id  25  -  vdownmix
Input Device id  26  -  default
-------------------------------------------------------------
Please select an input device id to record from:
18
Selected device: Samson Go Mic: USB Audio (hw:2,0)
╔════════════╗
║ Voiceover: ║
║            ║
║ Test       ║
╚════════════╝
Press and hold the 'r' key to begin recording
Wait for 1 second, then start speaking.
Wait for at least 1 second after you finish speaking.
This is to eliminate any sounds that may come from your keyboard.
The silence at the beginning and end will be trimmed automatically.
You can adjust this setting using the `trim_silence_threshold` argument.
These instructions are only shown once.
Release the 'r' key to end recording
rStream active: True
start Stream
rrrrrrrrrrrrrrrrrrrrrrrFinished recording, saving to media/voiceovers/charlie-summer-virginia-salami.mp3
[03/29/23 04:58:48] INFO     Saved media/voiceovers/charlie-summer-virginia-salami.mp3                    helper.py:36
Press...
 l to [l]isten to the recording
 r to [r]e-record
 a to [a]ccept the recording

a
/home/rehnertz/manim/lib/python3.10/site-packages/stable_whisper/whisper_word_level.py:190: UserWarning: FP16 is not supported on CPU; using FP32 instead
  warnings.warn("FP16 is not supported on CPU; using FP32 instead")
Detected language: english
100%|████████████████████████████████████████████████████████████████████████████| 0.65/0.65 [00:09<00:00, 14.56s/sec]
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/rehnertz/manim/lib/python3.10/site-packages/manim/cli/render/commands.py:115 in render     │
│                                                                                                  │
│   112 │   │   │   try:                                                                           │
│   113 │   │   │   │   with tempconfig({}):                                                       │
│   114 │   │   │   │   │   scene = SceneClass()                                                   │
│ ❱ 115 │   │   │   │   │   scene.render()                                                         │
│   116 │   │   │   except Exception:                                                              │
│   117 │   │   │   │   error_console.print_exception()                                            │
│   118 │   │   │   │   sys.exit(1)                                                                │
│                                                                                                  │
│ /home/rehnertz/manim/lib/python3.10/site-packages/manim/scene/scene.py:223 in render             │
│                                                                                                  │
│    220 │   │   """                                                                               │
│    221 │   │   self.setup()                                                                      │
│    222 │   │   try:                                                                              │
│ ❱  223 │   │   │   self.construct()                                                              │
│    224 │   │   except EndSceneEarlyException:                                                    │
│    225 │   │   │   pass                                                                          │
│    226 │   │   except RerunSceneException as e:                                                  │
│                                                                                                  │
│ /home/rehnertz/manim/scenes/Demo.py:10 in construct                                              │
│                                                                                                  │
│    7 │   def construct(self):                                                                    │
│    8 │   │   self.set_speech_service(RecorderService())                                          │
│    9 │   │                                                                                       │
│ ❱ 10 │   │   with self.voiceover(text="Test") as tracker:                                        │
│   11 │   │   │   self.play(Create(Circle()), run_time=tracker.duration)                          │
│   12 │   │   self.wait(1)                                                                        │
│   13                                                                                             │
│                                                                                                  │
│ /usr/lib/python3.10/contextlib.py:135 in __enter__                                               │
│                                                                                                  │
│   132 │   │   # they are only needed for recreation, which is not possible anymore               │
│   133 │   │   del self.args, self.kwds, self.func                                                │
│   134 │   │   try:                                                                               │
│ ❱ 135 │   │   │   return next(self.gen)                                                          │
│   136 │   │   except StopIteration:                                                              │
│   137 │   │   │   raise RuntimeError("generator didn't yield") from None                         │
│   138                                                                                            │
│                                                                                                  │
│ /home/rehnertz/manim/lib/python3.10/site-packages/manim_voiceover/voiceover_scene.py:180 in      │
│ voiceover                                                                                        │
│                                                                                                  │
│   177 │   │                                                                                      │
│   178 │   │   try:                                                                               │
│   179 │   │   │   if text is not None:                                                           │
│ ❱ 180 │   │   │   │   yield self.add_voiceover_text(text, **kwargs)                              │
│   181 │   │   │   elif ssml is not None:                                                         │
│   182 │   │   │   │   yield self.add_voiceover_ssml(ssml, **kwargs)                              │
│   183 │   │   finally:                                                                           │
│                                                                                                  │
│ /home/rehnertz/manim/lib/python3.10/site-packages/manim_voiceover/voiceover_scene.py:63 in       │
│ add_voiceover_text                                                                               │
│                                                                                                  │
│    60 │   │   │   │   "You need to call init_voiceover() before adding a voiceover."             │
│    61 │   │   │   )                                                                              │
│    62 │   │                                                                                      │
│ ❱  63 │   │   dict_ = self.speech_service._wrap_generate_from_text(text, **kwargs)               │
│    64 │   │   tracker = VoiceoverTracker(self, dict_, self.speech_service.cache_dir)             │
│    65 │   │   self.add_sound(str(Path(self.speech_service.cache_dir) / dict_["final_audio"]))    │
│    66 │   │   self.current_tracker = tracker                                                     │
│                                                                                                  │
│ /home/rehnertz/manim/lib/python3.10/site-packages/manim_voiceover/services/base.py:93 in         │
│ _wrap_generate_from_text                                                                         │
│                                                                                                  │
│    90 │   │   │   transcription_result = self._whisper_model.transcribe(                         │
│    91 │   │   │   │   str(Path(self.cache_dir) / original_audio), **self.transcription_kwargs    │
│    92 │   │   │   )                                                                              │
│ ❱  93 │   │   │   logger.info("Transcription: " + transcription_result["text"])                  │
│    94 │   │   │   word_boundaries = timestamps_to_word_boundaries(                               │
│    95 │   │   │   │   transcription_result["segments"]                                           │
│    96 │   │   │   )                                                                              │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
TypeError: 'WhisperResult' object is not subscriptable

System specifications

System Details

OS (with version, e.g., Windows 10 v2004 or macOS 10.15 (Catalina)): Ubuntu 22.04
RAM: 16GB
Python version (python/py/python3 --version): 3.10.6
Installed modules (provide output from pip list):

Package                  Version
------------------------ ----------
autopep8                 2.0.2
certifi                  2022.12.7
charset-normalizer       3.1.0
click                    8.1.3
click-default-group      1.2.2
cloup                    0.13.1
cmake                    3.26.1
colour                   0.1.5
decorator                5.1.1
evdev                    1.6.1
ffmpeg-python            0.2.0
filelock                 3.10.7
future                   0.18.3
glcontext                2.3.7
gTTS                     2.3.1
huggingface-hub          0.13.3
humanhash3               0.0.6
idna                     3.4
isosurfaces              0.1.0
Jinja2                   3.1.2
lit                      16.0.0
llvmlite                 0.39.1
manim                    0.17.2
manim-voiceover          0.3.0
ManimPango               0.4.3
mapbox-earcut            1.0.1
markdown-it-py           2.2.0
MarkupSafe               2.1.2
mdurl                    0.1.2
moderngl                 5.8.1
moderngl-window          2.4.3
more-itertools           9.1.0
mpmath                   1.3.0
multipledispatch         0.6.0
mutagen                  1.46.0
networkx                 2.8.8
numba                    0.56.4
numpy                    1.23.5
nvidia-cublas-cu11       11.10.3.66
nvidia-cuda-cupti-cu11   11.7.101
nvidia-cuda-nvrtc-cu11   11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11        8.5.0.96
nvidia-cufft-cu11        10.9.0.58
nvidia-curand-cu11       10.2.10.91
nvidia-cusolver-cu11     11.4.0.1
nvidia-cusparse-cu11     11.7.4.91
nvidia-nccl-cu11         2.14.3
nvidia-nvtx-cu11         11.7.91
openai-whisper           20230314
packaging                23.0
Pillow                   9.4.0
pip                      22.0.2
playsound                1.3.0
PyAudio                  0.2.13
pycairo                  1.23.0
pycodestyle              2.10.0
pydub                    0.25.1
pyglet                   2.0.5
Pygments                 2.14.0
PyGObject                3.44.1
pynput                   1.7.6
pyrr                     0.10.3
python-dotenv            0.21.1
python-xlib              0.33
PyYAML                   6.0
regex                    2023.3.23
requests                 2.28.2
rich                     13.3.2
scipy                    1.10.1
screeninfo               0.8.1
setuptools               59.6.0
six                      1.16.0
skia-pathops             0.7.4
sox                      1.4.1
srt                      3.5.2
stable-ts                2.1.2
svgelements              1.9.1
sympy                    1.11.1
tiktoken                 0.3.1
tokenizers               0.13.2
tomli                    2.0.1
torch                    2.0.0
torchaudio               2.0.1
tqdm                     4.65.0
transformers             4.27.3
triton                   2.0.0
typing_extensions        4.5.0
urllib3                  1.26.15
watchdog                 2.3.1
wheel                    0.40.0

LaTeX details

LaTeX distribution (e.g. TeX Live 2020):
Installed LaTeX packages:

FFMPEG

Output of ffmpeg -version:

ffmpeg version 4.4.2-0ubuntu0.22.04.1 Copyright (c) 2000-2021 the FFmpeg developers
built with gcc 11 (Ubuntu 11.2.0-19ubuntu1)
configuration: --prefix=/usr --extra-version=0ubuntu0.22.04.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librabbitmq --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-pocketsphinx --enable-librsvg --enable-libmfx --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
libavutil      56. 70.100 / 56. 70.100
libavcodec     58.134.100 / 58.134.100
libavformat    58. 76.100 / 58. 76.100
libavdevice    58. 13.100 / 58. 13.100
libavfilter     7.110.100 /  7.110.100
libswscale      5.  9.100 /  5.  9.100
libswresample   3.  9.100 /  3.  9.100
libpostproc    55.  9.100 / 55.  9.100

Additional comments

Using RecorderService manim-voiceover hangs (before or after releasing the 'r' key?)

Description of bug / unexpected behavior

I render a file with CoquiService, and both manim and manim-voiceover work correctly. Then I switch to RecorderService, select the input device (e.g. 13 - default), and start recording by pressing the 'r' key. When I release the 'r' key, manim-voiceover hangs as if stuck in an infinite idle loop. Inspecting the folder ./media/voiceovers, I see that no file has been produced, so I suspect that manim-voiceover is still waiting for the user to press 'r'.

Expected behavior

Manim-voiceover should start recording the voiceover as soon as the user presses 'r'. After releasing the 'r' key manim-voiceover should ask to choose from the following options:

l to [l]isten to the recording
r to [r]e-record
a to [a]ccept the recording

How to reproduce the issue

Code for reproducing the problem

from manim import *
from manim_voiceover import VoiceoverScene
# from manim_voiceover.services.gtts import GTTSService
# from manim_voiceover.services.coqui import CoquiService
from manim_voiceover.services.recorder import RecorderService
from math import *

class RecordVoiceover(VoiceoverScene):
    
    def construct(self):
        circle = Circle()

        # self.set_speech_service(GTTSService(lang='it',tld='it',transcription_model='base'))
        self.set_speech_service(RecorderService())
        # self.set_speech_service(CoquiService(
        #                                         model_name='tts_models/it/mai_male/glow-tts',
        #                                         transcription_model='base'
        #                                     )
        #                        )

        with self.voiceover(
                text='''Ora creo un <bookmark mark="A"/> cerchio,
                        poi lo muovo a <bookmark mark="B"/> destra
                        e infine lo <bookmark mark="C"/> elimino.
                     ''') as tracker:
            self.wait_until_bookmark('A')
            self.play(Create(circle,run_time=0.5))
            self.wait_until_bookmark('B')
            self.play(circle.animate.shift(RIGHT),run_time=0.5)
            self.wait_until_bookmark('C')
            self.play(FadeOut(circle),run_time=0.5)

Additional media files

Images/GIFs

Logs

Terminal output


System specifications

System Details

OS (with version, e.g., Windows 10 v2004 or macOS 10.15 (Catalina)): Linux Ubuntu 23.04
RAM: 32 GB
Python version (python/py/python3 --version): 3.10.11
Installed modules (provide output from pip list):

Package                        Version
------------------------------ ------------
accelerate                     0.19.0
aiohttp                        3.8.4
aiosignal                      1.3.1
anyascii                       0.3.2
appdirs                        1.4.4
async-timeout                  4.0.2
attrs                          23.1.0
audioread                      3.0.0
azure-cognitiveservices-speech 1.29.0
Babel                          2.12.1
backports.cached-property      1.0.2
bangla                         0.0.2
blinker                        1.6.2
bnnumerizer                    0.0.2
bnunicodenormalizer            0.1.1
boltons                        23.0.0
brotlipy                       0.7.0
build                          0.10.0
CacheControl                   0.12.11
certifi                        2023.5.7
cffi                           1.15.1
charset-normalizer             3.1.0
clean-fid                      0.1.35
cleo                           2.0.1
click                          8.1.3
click-default-group            1.2.2
clip-anytorch                  2.5.2
cloup                          0.13.1
cmake                          3.26.3
colorama                       0.4.6
colour                         0.1.5
contourpy                      1.0.7
coqpit                         0.0.17
crashtest                      0.4.1
cryptography                   41.0.1
cycler                         0.11.0
Cython                         0.29.28
dataclasses                    0.8
dateparser                     1.1.8
decorator                      5.1.1
deepl                          1.14.0
distlib                        0.3.6
docker-pycreds                 0.4.0
docopt                         0.6.2
dulwich                        0.21.5
einops                         0.6.1
evdev                          1.6.1
ffmpeg-python                  0.2.0
filelock                       3.12.0
Flask                          2.3.2
fonttools                      4.39.4
frozenlist                     1.3.3
fsspec                         2023.5.0
ftfy                           6.1.1
future                         0.18.3
g2pkk                          0.1.2
gitdb                          4.0.10
GitPython                      3.1.31
glcontext                      2.3.7
gruut                          2.2.3
gruut-ipa                      0.13.0
gruut-lang-de                  2.0.0
gruut-lang-en                  2.0.0
gruut-lang-es                  2.0.0
gruut-lang-fr                  2.0.2
gTTS                           2.3.2
html5lib                       1.1
huggingface-hub                0.15.1
idna                           3.4
imageio                        2.31.0
importlib-metadata             6.6.0
importlib-resources            5.12.0
inflect                        5.6.0
installer                      0.7.0
isosurfaces                    0.1.0
itsdangerous                   2.1.2
jamo                           0.4.1
jaraco.classes                 3.2.3
jeepney                        0.8.0
jieba                          0.42.1
Jinja2                         3.1.2
joblib                         1.2.0
jsonlines                      1.2.0
jsonmerge                      1.9.0
jsonschema                     4.17.3
k-diffusion                    0.0.15
keyring                        23.13.1
kiwisolver                     1.4.4
kornia                         0.6.12
lazy_loader                    0.2
librosa                        0.10.0.post2
lit                            16.0.5.post0
llvmlite                       0.39.1
lockfile                       0.12.2
manim                          0.17.3
manim-voiceover                0.3.3.post0
ManimPango                     0.4.3
mapbox-earcut                  1.0.0
markdown-it-py                 2.2.0
MarkupSafe                     2.1.3
matplotlib                     3.7.1
mdurl                          0.1.0
mecab-python3                  1.0.5
moderngl                       5.8.2
moderngl-window                2.4.1
more-itertools                 9.1.0
mpmath                         1.3.0
msgpack                        1.0.5
multidict                      6.0.4
multipledispatch               0.6.0
mutagen                        1.46.0
networkx                       2.8.8
nltk                           3.8.1
num2words                      0.5.12
numba                          0.56.4
numpy                          1.23.5
nvidia-cublas-cu11             11.10.3.66
nvidia-cuda-cupti-cu11         11.7.101
nvidia-cuda-nvrtc-cu11         11.7.99
nvidia-cuda-runtime-cu11       11.7.99
nvidia-cudnn-cu11              8.5.0.96
nvidia-cufft-cu11              10.9.0.58
nvidia-curand-cu11             10.2.10.91
nvidia-cusolver-cu11           11.4.0.1
nvidia-cusparse-cu11           11.7.4.91
nvidia-nccl-cu11               2.14.3
nvidia-nvtx-cu11               11.7.91
openai-whisper                 20230314
packaging                      23.1
pandas                         2.0.2
pathtools                      0.1.2
pexpect                        4.8.0
Pillow                         9.5.0
pip                            23.1.2
pkginfo                        1.9.6
pkgutil_resolve_name           1.3.10
platformdirs                   3.5.1
poetry                         1.5.1
poetry-core                    1.6.1
poetry-plugin-export           1.4.0
pooch                          1.6.0
protobuf                       3.19.6
psutil                         5.9.5
ptyprocess                     0.7.0
PyAudio                        0.2.13
pycairo                        1.23.0
pycparser                      2.21
pydub                          0.25.1
pyglet                         1.5.27
Pygments                       2.15.1
pynndescent                    0.5.10
pynput                         1.7.6
pyOpenSSL                      23.2.0
pyparsing                      3.0.9
pypinyin                       0.49.0
pyproject_hooks                1.0.0
pyrr                           0.10.3
pyrsistent                     0.19.3
pysbd                          0.3.4
PySocks                        1.7.1
python-crfsuite                0.9.9
python-dateutil                2.8.2
python-dotenv                  0.21.1
python-slugify                 8.0.1
python-xlib                    0.33
pyttsx3                        2.90
pytz                           2023.3
PyWavelets                     1.4.1
PyYAML                         6.0
rapidfuzz                      2.15.1
regex                          2023.6.3
requests                       2.31.0
requests-toolbelt              1.0.0
resize-right                   0.0.2
rich                           13.4.1
scikit-image                   0.21.0
scikit-learn                   1.2.2
scipy                          1.10.1
screeninfo                     0.8.1
SecretStorage                  3.3.3
sentry-sdk                     1.25.0
setproctitle                   1.3.2
setuptools                     67.7.2
shellingham                    1.5.1
six                            1.16.0
skia-pathops                   0.7.4
smmap                          5.0.0
soundfile                      0.12.1
sox                            1.4.1
soxr                           0.3.5
srt                            3.5.2
stable-ts                      2.6.2
svgelements                    1.9.5
sympy                          1.12
tensorboardX                   2.6
text-unidecode                 1.3
threadpoolctl                  3.1.0
tifffile                       2023.4.12
tiktoken                       0.3.1
tokenizers                     0.13.3
tomli                          2.0.1
tomlkit                        0.11.8
torch                          2.0.1
torchaudio                     2.0.2
torchdiffeq                    0.2.3
torchsde                       0.2.5
torchvision                    0.15.2
tqdm                           4.65.0
trainer                        0.0.20
trampoline                     0.1.2
transformers                   4.29.2
triton                         2.0.0
trove-classifiers              2023.5.24
TTS                            0.14.3
typing_extensions              4.6.3
tzdata                         2023.3
tzlocal                        5.0.1
umap-learn                     0.5.1
unidic-lite                    1.0.8
urllib3                        1.26.15
virtualenv                     20.23.0
wandb                          0.15.4
watchdog                       2.2.1
wcwidth                        0.2.6
webencodings                   0.5.1
Werkzeug                       2.3.4
wheel                          0.40.0
yarl                           1.9.2
zipp                           3.15.0

LaTeX details

LaTeX distribution (e.g. TeX Live 2020): TeX Live 2022/Debian
Installed LaTeX packages:

FFMPEG

Output of ffmpeg -version:

ffmpeg version 4.2.2 Copyright (c) 2000-2019 the FFmpeg developers
built with gcc 7.3.0 (crosstool-NG 1.23.0.449-a04d0)
configuration: --prefix=/tmp/build/80754af9/ffmpeg_1587154242452/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeho --cc=/tmp/build/80754af9/ffmpeg_1587154242452/_build_env/bin/x86_64-conda_cos6-linux-gnu-cc --disable-doc --enable-avresample --enable-gmp --enable-hardcoded-tables --enable-libfreetype --enable-libvpx --enable-pthreads --enable-libopus --enable-postproc --enable-pic --enable-pthreads --enable-shared --enable-static --enable-version3 --enable-zlib --enable-libmp3lame --disable-nonfree --enable-gpl --enable-gnutls --disable-openssl --enable-libopenh264 --enable-libx264
libavutil      56. 31.100 / 56. 31.100
libavcodec     58. 54.100 / 58. 54.100
libavformat    58. 29.100 / 58. 29.100
libavdevice    58.  8.100 / 58.  8.100
libavfilter     7. 57.100 /  7. 57.100
libavresample   4.  0.  0 /  4.  0.  0
libswscale      5.  5.100 /  5.  5.100
libswresample   3.  5.100 /  3.  5.100
libpostproc    55.  5.100 / 55.  5.100

Additional comments

Coqui Bark support

Currently there is no way to use Bark through Coqui, because you need to specify voice_dir in the parameters and speakers can be strings. I don't quite know how I would change the Service to make this easy to use, but for now you can get it to work by changing it manually in the source.

Here is a link for usage:
https://tts.readthedocs.io/en/dev/models/bark.html#bark-model

Ability to reuse the recording for a low-quality video in a high-quality video

Description of proposed feature

Ability to reuse the recording for a low-quality video in a high-quality video

How can the new feature be used?

A user records voice for a low quality video (manim -ql), then they can reuse that voice for a high-quality video (manim -qh), perhaps by a command line argument.
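One conceivable way to support this is to identify a recording only by things that do not change between -ql and -qh, such as the service and the normalized text. The sketch below is an illustration under that assumption; `recording_cache_key` is a hypothetical helper and is not taken from manim-voiceover's source. The whitespace normalization mirrors the `" ".join(text.split())` step that the speech service applies to the text.

```python
import hashlib
import json


def recording_cache_key(text: str, service: str = "recorder") -> str:
    """Build a key that ignores render quality (hypothetical helper)."""
    # Collapse newlines and repeated spaces so formatting changes
    # in the scene file do not invalidate an existing recording.
    normalized = " ".join(text.split())
    # Render quality is deliberately left out of the payload.
    payload = json.dumps({"service": service, "text": normalized}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    low = recording_cache_key("Ora creo un cerchio")
    high = recording_cache_key("Ora   creo un\ncerchio")
    print(low == high)
```

A recording stored under such a key during a low-quality render could then be looked up again during a high-quality render of the same scene.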

Additional comments

wait_until_bookmark (Word boundaries for any audio)

When using the wait_until_bookmark function with GTTSService, the following error occurs in the time_until_bookmark function in the "manim_voiceover/__init__.py" file:

AttributeError: 'VoiceoverTracker' object has no attribute 'bookmark_times'
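As the issue title suggests, bookmark timing depends on word boundaries: without them there is nothing to map a position in the text to a time in the audio. The following sketch shows that mapping with plain linear interpolation. It assumes word boundaries are given as `(text_offset, seconds)` pairs with strictly increasing text offsets; `bookmark_times_from_boundaries` is a hypothetical name, and this is not the plugin's actual implementation.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple


def bookmark_times_from_boundaries(
    bookmark_offsets: Dict[str, int],
    word_boundaries: List[Tuple[int, float]],
) -> Dict[str, float]:
    """Interpolate a time in seconds for each bookmark text offset."""
    if not word_boundaries:
        # This is the situation behind the AttributeError: no boundaries, no times.
        raise ValueError("Word boundaries are required to time bookmarks.")
    offsets = [offset for offset, _ in word_boundaries]
    seconds = [time for _, time in word_boundaries]
    times: Dict[str, float] = {}
    for mark, position in bookmark_offsets.items():
        i = bisect_right(offsets, position)
        if i == 0:
            times[mark] = seconds[0]  # before the first word
        elif i == len(offsets):
            times[mark] = seconds[-1]  # at or after the last word
        else:
            # offsets[i - 1] <= position < offsets[i], so the span is non-zero.
            x0, x1 = offsets[i - 1], offsets[i]
            t0, t1 = seconds[i - 1], seconds[i]
            times[mark] = t0 + (t1 - t0) * (position - x0) / (x1 - x0)
    return times


if __name__ == "__main__":
    boundaries = [(0, 0.0), (5, 0.4), (7, 0.6), (14, 1.2)]
    print(bookmark_times_from_boundaries({"A": 7}, boundaries))
```

Raising a clear error when no boundaries are available is one way to replace the AttributeError with a message that points at the missing transcription step.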

Print a warning when trying to render a Scene with >1 voiceovers without --disable_caching

People often forget to run manim with --disable_caching. Print a warning at the second and every subsequent voiceover if --disable_caching is not set.
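A minimal sketch of the proposed behavior, kept independent of manim so it can be run on its own. `VoiceoverCachingGuard` and `register_voiceover` are hypothetical names; in the plugin, the caching flag would have to come from manim's configuration, which is not shown here.

```python
import warnings


class VoiceoverCachingGuard:
    """Warn from the second voiceover onward when caching is still enabled."""

    def __init__(self, caching_disabled: bool) -> None:
        self.caching_disabled = caching_disabled
        self.voiceover_count = 0

    def register_voiceover(self) -> None:
        """Call once per voiceover added to the scene."""
        self.voiceover_count += 1
        if self.voiceover_count > 1 and not self.caching_disabled:
            warnings.warn(
                "This scene has more than one voiceover but caching is enabled; "
                "consider rendering with --disable_caching.",
                UserWarning,
                stacklevel=2,
            )


if __name__ == "__main__":
    guard = VoiceoverCachingGuard(caching_disabled=False)
    guard.register_voiceover()  # first voiceover: silent
    guard.register_voiceover()  # second voiceover: emits a UserWarning
```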

"There is no bookmark"

Description of bug / unexpected behavior

I keep getting Exception: There is no <bookmark mark='BookmarkName' />, with no obvious pattern to when it happens.

Expected behavior

Bookmarks are properly picked up

How to reproduce the issue

I don't know why it happens sometimes but not other times.

Code for reproducing the problem

self.set_speech_service(GTTSService(lang="en", tld="com", transcription_model='base', global_speed=1.2))
with self.voiceover(text="""With <bookmark mark='A'/>this result, 
                                    we can now go <bookmark mark='B'/>back and derive our formula for updating the kernel weights.""") as tracker:
            self.wait_until_bookmark("A")
            self.play(Indicate(s1_text), Unwrite(s2_text), Unwrite(s3_text), Unwrite(s4_text))
            self.wait_until_bookmark("B")
            self.play(Write(new_weights), s1_text.animate.to_edge(UP).to_edge(RIGHT)), s1_text.animate.scale(0.5)
            self.wait_for_voiceover()

doesn't work, but

self.set_speech_service(GTTSService(lang="en", tld="com", transcription_model='base', global_speed=1.2))
with self.voiceover(text="""The kernel weights are updated, or trained, using the following formula<bookmark mark='A'/>, 
                                    where <bookmark mark='B' />alpha is the learning rate hyperparameter 
                                    and <bookmark mark='C'/>W is the kernel weight matrix.""") as tracker:
            self.play(Write(new_weights), run_time=tracker.time_until_bookmark("A"))
            self.wait_until_bookmark("B")
            self.play(Indicate(new_weights[0][7]))
            self.wait_until_bookmark("C")
            self.play(Indicate(new_weights[0][0:3]), Indicate(new_weights[0][4:6]))

does.
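When narrowing down a case like this, it can help to check which bookmark tags a simple parser finds in the voiceover text and compare them with the marks passed to wait_until_bookmark. The sketch below is a standalone diagnostic; the regular expression is an assumption and not the plugin's own parser, and `find_bookmarks` and `missing_marks` are hypothetical helpers. It only rules out typos in the tags; it says nothing about timing.

```python
import re
from typing import List, Tuple

# Accepts single or double quotes and optional whitespace before "/>".
BOOKMARK_TAG = re.compile(r"<bookmark\s+mark\s*=\s*(['\"])(.*?)\1\s*/>")


def find_bookmarks(text: str) -> List[Tuple[str, int]]:
    """Return (mark, character offset of the tag) for every bookmark tag."""
    return [(m.group(2), m.start()) for m in BOOKMARK_TAG.finditer(text)]


def missing_marks(text: str, requested: List[str]) -> List[str]:
    """Return the requested marks that have no matching tag in the text."""
    found = {mark for mark, _ in find_bookmarks(text)}
    return [mark for mark in requested if mark not in found]


if __name__ == "__main__":
    text = (
        "With <bookmark mark='A'/>this result, "
        "we can now go <bookmark mark='B'/>back and derive our formula."
    )
    print(find_bookmarks(text))
    print(missing_marks(text, ["A", "B"]))
```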

Logs

Terminal output

Nothing useful/relevant in the debug log.

Here's the error traceback: https://pastebin.com/pUGT9LF2

System specifications

System Details

OS (with version, e.g., Windows 10 v2004 or macOS 10.15 (Catalina)): Windows 10
RAM: 16GB
Python version (python/py/python3 --version): 3.10.12
Installed modules (provide output from pip list): https://pastebin.com/0YZ0fC7m

LaTeX details

LaTeX distribution (e.g. TeX Live 2020): MikTex
Installed LaTeX packages: (A lot, don't wanna take hundreds of screenshots)

FFMPEG

Output of ffmpeg -version:

ffmpeg version 4.2.3 Copyright (c) 2000-2020 the FFmpeg developers
built with gcc 9.3.1 (GCC) 20200523
configuration: --enable-gpl --enable-version3 --enable-sdl2 --enable-fontconfig --enable-gnutls --enable-iconv --enable-libass --enable-libdav1d --enable-libbluray --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libtheora --enable-libtwolame --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libzimg --enable-lzma --enable-zlib --enable-gmp --enable-libvidstab --enable-libvorbis --enable-libvo-amrwbenc --enable-libmysofa --enable-libspeex --enable-libxvid --enable-libaom --enable-libmfx --enable-amf --enable-ffnvcodec --enable-cuvid --enable-d3d11va --enable-nvenc --enable-nvdec --enable-dxva2 --enable-avisynth --enable-libopenmpt
libavutil      56. 31.100 / 56. 31.100
libavcodec     58. 54.100 / 58. 54.100
libavformat    58. 29.100 / 58. 29.100
libavdevice    58.  8.100 / 58.  8.100
libavfilter     7. 57.100 /  7. 57.100
libswscale      5.  5.100 /  5.  5.100
libswresample   3.  5.100 /  3.  5.100
libpostproc    55.  5.100 / 55.  5.100

Additional comments

run_time=tracker.duration causes some voice messages to be skipped

Description of bug / unexpected behavior

Passing the "run_time=tracker.duration" argument (in either of the two places in the attached code example) causes the second voiceover to be missing from the final mp4.

P.S. In other cases, even without run_time=tracker.duration, the second voiceover is lost unless I add a "wait(5)" between the first and the second.

Expected behavior

Both voiceovers should be present in the final mp4.

How to reproduce the issue

Code for reproducing the problem

from manim import *
from manim_voiceover import VoiceoverScene
from manim_voiceover.services.gtts import GTTSService

class TestScene(VoiceoverScene):
    def construct(self):
        self.set_speech_service(GTTSService())

        circle = Circle()
        with self.voiceover(text="First") as tracker:
            self.play(Create(circle), run_time=tracker.duration)

        square = Square()
        with self.voiceover(text="Second") as tracker:
            self.play(Create(square)) #, run_time=tracker.duration)

Additional media files

Images/GIFs

TestScene.mp4

Logs

Terminal output

 *  Executing task: python -m manim -ql /home/pesho/work/manim/minimal_example.py TestScene -v DEBUG 

Manim Community v0.16.0.post0

/usr/lib/python3.8/runpy.py:127: RuntimeWarning: 'manim.__main__' found in sys.modules after import of package 'manim', but prior to execution of 'manim.__main__'; this may result in unpredictable behaviour
  warn(RuntimeWarning(msg))
Manim Community v0.16.0.post0

/bin/sh: 1: sox: not found
SoX could not be found!

    If you do not have SoX, proceed here:
     - - - http://sox.sourceforge.net/ - - -

    If you do (or think that you should) have SoX, double-check your
    path variables.
    
[11/02/22 00:27:54] DEBUG    Hashing ...                                                                                                                                  hashing.py:343
                    DEBUG    Hashing done in 0.009107 s.                                                                                                                  hashing.py:355
                    DEBUG    Hash generated :  3163782288_2539344468_223132457                                                                                            hashing.py:358
                    INFO     Animation 0 : Using cached data (hash : 3163782288_2539344468_223132457)                                                               cairo_renderer.py:75
                    DEBUG    List of the first few animation hashes of the scene: ['3163782288_2539344468_223132457']                                               cairo_renderer.py:84
                    DEBUG    Animation with empty mobject                                                                                                               animation.py:173
                    DEBUG    Hashing ...                                                                                                                                  hashing.py:343
                    DEBUG    Hashing done in 0.004769 s.                                                                                                                  hashing.py:355
                    DEBUG    Hash generated :  2201830969_2464823014_440505530                                                                                            hashing.py:358
                    INFO     Animation 1 : Using cached data (hash : 2201830969_2464823014_440505530)                                                               cairo_renderer.py:75
                    DEBUG    List of the first few animation hashes of the scene: ['3163782288_2539344468_223132457', '2201830969_2464823014_440505530']            cairo_renderer.py:84
                    DEBUG    Hashing ...                                                                                                                                  hashing.py:343
                    DEBUG    Hashing done in 0.005147 s.                                                                                                                  hashing.py:355
                    DEBUG    Hash generated :  2201830969_757100158_2261372289                                                                                            hashing.py:358
                    INFO     Animation 2 : Using cached data (hash : 2201830969_757100158_2261372289)                                                               cairo_renderer.py:75
                    DEBUG    List of the first few animation hashes of the scene: ['3163782288_2539344468_223132457', '2201830969_2464823014_440505530',            cairo_renderer.py:84
                             '2201830969_757100158_2261372289']                                                                                                                         
                    DEBUG    Animation with empty mobject                                                                                                               animation.py:173
                    DEBUG    Hashing ...                                                                                                                                  hashing.py:343
                    DEBUG    Hashing done in 0.004034 s.                                                                                                                  hashing.py:355
                    DEBUG    Hash generated :  2201830969_138116783_2603944447                                                                                            hashing.py:358
                    INFO     Animation 3 : Using cached data (hash : 2201830969_138116783_2603944447)                                                               cairo_renderer.py:75
                    DEBUG    List of the first few animation hashes of the scene: ['3163782288_2539344468_223132457', '2201830969_2464823014_440505530',            cairo_renderer.py:84
                             '2201830969_757100158_2261372289', '2201830969_138116783_2603944447']                                                                                      
                    INFO     Combining to Movie file.                                                                                                           scene_file_writer.py:607
                    DEBUG    Partial movie files to combine (4 files):                                                                                          scene_file_writer.py:548
                             ['/home/pesho/work/manim/media/videos/minimal_example/480p15/partial_movie_files/TestScene/3163782288_2539344468_223132457.mp4',                           
                             '/home/pesho/work/manim/media/videos/minimal_example/480p15/partial_movie_files/TestScene/2201830969_2464823014_440505530.mp4',                            
                             '/home/pesho/work/manim/media/videos/minimal_example/480p15/partial_movie_files/TestScene/2201830969_757100158_2261372289.mp4',                            
                             '/home/pesho/work/manim/media/videos/minimal_example/480p15/partial_movie_files/TestScene/2201830969_138116783_2603944447.mp4']                            
                    INFO                                                                                                                                        scene_file_writer.py:728
                             File ready at '/home/pesho/work/manim/media/videos/minimal_example/480p15/TestScene.mp4'                                                                   
                                                                                                                                                                                        
                    INFO     Subcaption file has been written as /home/pesho/work/manim/media/videos/minimal_example/480p15/TestScene.srt                       scene_file_writer.py:723
                    INFO     Rendered TestScene                                                                                                                             scene.py:240
                             Played 4 animations

TestScene.srt:
1
00:00:00,000 --> 00:00:01,004
First

2
00:00:06,133 --> 00:00:07,113
Second
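The terminal output above reports that the sox binary could not be found. Whether that is related to the missing audio is not established in this report, but it is cheap to rule out. A minimal sketch, using only the standard library, that checks whether the external tools are on PATH (the helper name `find_audio_tools` is made up for illustration):

```python
import shutil


def find_audio_tools(names=("sox", "ffmpeg")):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_audio_tools().items():
        print(f"{name}: {path or 'NOT FOUND'}")
```

If sox shows up as NOT FOUND, installing it through the system package manager and re-rendering with --disable_caching would show whether the warning is relevant to this bug.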

System specifications

System Details

OS (with version, e.g., Windows 10 v2004 or macOS 10.15 (Catalina)): Ubuntu 20.04.5 LTS
RAM: 16 GB
Python version (python/py/python3 --version): Python 3.8.10
Installed modules (provide output from pip list):

Package                Version             
---------------------- --------------------
anyio                  3.4.0               
appdirs                1.4.4               
argon2-cffi            21.2.0              
argon2-cffi-bindings   21.2.0              
attrs                  19.3.0              
Babel                  2.9.1               
backcall               0.1.0               
bcrypt                 3.1.7               
bleach                 3.1.1               
blinker                1.4                 
Brlapi                 0.7.0               
catfish                1.4.13              
certifi                2019.11.28          
cffi                   1.15.0              
chardet                3.0.4               
charset-normalizer     2.1.1               
click                  8.1.3               
click-default-group    1.2.2               
cloup                  0.13.1              
colorama               0.4.3               
colour                 0.1.5               
command-not-found      0.3                 
commonmark             0.9.1               
ConfigArgParse         1.5.3               
connection-pool        0.0.3               
cryptography           2.8                 
cupshelpers            1.0                 
cycler                 0.11.0              
datrie                 0.8.2               
dbus-python            1.2.16              
decorator              5.1.1               
defer                  1.0.6               
defusedxml             0.6.0               
distro                 1.4.0               
distro-info            0.23ubuntu1         
dnspython              1.16.0              
docutils               0.18.1              
entrypoints            0.3                 
filelock               3.4.2               
fonttools              4.28.3              
gitdb                  4.0.9               
GitPython              3.1.26              
glcontext              2.3.7               
gTTS                   2.2.4               
html5lib               1.0.1               
httplib2               0.14.0              
idna                   2.8                 
importlib-metadata     1.5.0               
instaloader            4.9.1               
ipykernel              5.2.0               
ipython                7.13.0              
ipython-genutils       0.2.0               
ipywidgets             6.0.0               
isosurfaces            0.1.0               
jedi                   0.15.2              
Jinja2                 2.10.1              
json5                  0.9.6               
jsonschema             3.2.0               
jupyter-client         6.1.2               
jupyter-console        6.0.0               
jupyter-core           4.6.3               
jupyter-server         1.13.0              
jupyterlab             3.2.4               
jupyterlab-server      2.8.2               
keyring                18.0.1              
kiwisolver             1.3.2               
language-selector      0.1                 
launchpadlib           1.10.13             
lazr.restfulclient     0.14.2              
lazr.uri               1.0.3               
louis                  3.12.0              
macaroonbakery         1.3.1               
Magnus                 1.0.3               
Mako                   1.1.0               
manim                  0.16.0.post0        
manim-voiceover        0.1.1               
ManimPango             0.4.1               
mapbox-earcut          0.12.11             
MarkupSafe             1.1.0               
matplotlib             3.5.0               
menulibre              2.2.1               
mistune                0.8.4               
moderngl               5.7.0               
moderngl-window        2.4.2               
more-itertools         4.2.0               
multipledispatch       0.6.0               
mutagen                1.46.0              
nbclassic              0.3.4               
nbconvert              5.6.1               
nbformat               5.0.4               
netifaces              0.10.4              
networkx               2.8.7               
notebook               6.0.3               
notify2                0.3                 
numpy                  1.21.4              
oauthlib               3.1.0               
olefile                0.46                
onboard                1.4.1               
openshot-qt            2.4.3               
packaging              21.3                
pandas                 1.3.4               
pandocfilters          1.4.2               
parso                  0.5.2               
pexpect                4.6.0               
pickleshare            0.7.5               
Pillow                 9.3.0               
pip                    20.0.2              
prometheus-client      0.7.1               
prompt-toolkit         2.0.10              
protobuf               3.6.1               
proton-client          0.7.1               
protonvpn-cli          3.13.0              
protonvpn-gui          1.11.0              
protonvpn-nm-lib       3.13.0              
psutil                 5.5.1               
ptyprocess             0.7.0               
PuLP                   2.6.0               
pycairo                1.21.0              
pycosat                0.6.3               
pycparser              2.21                
pycups                 1.9.73              
pydub                  0.25.1              
pyglet                 2.0b2               
Pygments               2.13.0              
PyGObject              3.36.0              
PyJWT                  1.7.1               
pymacaroons            0.13.0              
PyNaCl                 1.3.0               
pyOpenSSL              19.0.0              
pyparsing              3.0.6               
PyQt5                  5.14.1              
pyRFC3339              1.1                 
pyrr                   0.10.3              
pyrsistent             0.15.5              
python-apt             2.0.0+ubuntu0.20.4.8
python-dateutil        2.7.3               
python-debian          0.1.36ubuntu1       
python-dotenv          0.21.0              
python-gnupg           0.4.5               
python-xapp            1.8.1               
pythondialog           3.4.0               
pytz                   2019.3              
pyudev                 0.21.0              
pyxattr                0.6.1               
pyxdg                  0.26                
PyYAML                 5.3.1               
pyzmq                  18.1.1              
ratelimiter            1.2.0.post0         
reportlab              3.5.34              
requests               2.28.1              
requests-unixsocket    0.2.0               
rich                   12.6.0              
ruamel.yaml            0.17.21             
ruamel.yaml.clib       0.2.6               
scipy                  1.7.3               
screeninfo             0.8.1               
seaborn                0.11.2              
SecretStorage          2.3.1               
Send2Trash             1.5.0               
setproctitle           1.1.10              
setuptools             45.2.0              
setuptools-scm         6.3.2               
simplejson             3.16.0              
sip                    4.19.21             
six                    1.14.0              
skia-pathops           0.7.3               
smart-open             5.2.1               
smmap                  5.0.0               
snakemake              6.13.1              
sniffio                1.2.0               
sox                    1.4.1               
srt                    3.5.2               
stopit                 1.1.2               
systemd-python         234                 
tabulate               0.8.9               
terminado              0.12.1              
testpath               0.4.4               
tomli                  1.2.2               
toposort               1.7                 
tornado                6.1                 
tqdm                   4.64.1              
traitlets              4.3.3               
typing-extensions      4.4.0               
ubuntu-advantage-tools 27.11.2             
ubuntu-drivers-common  0.0.0               
ufw                    0.36                
unattended-upgrades    0.1                 
UpSetPlot              0.6.0               
urllib3                1.25.8              
usb-creator            0.3.7               
vboxapi                1.0                 
wadllib                1.3.3               
watchdog               2.1.9               
wcwidth                0.1.8               
webencodings           0.5.1               
websocket-client       1.2.3               
wheel                  0.34.2              
widgetsnbextension     2.0.0               
wrapt                  1.13.3              
xkit                   0.0.0               
youtube-dl             2020.3.24           
zipp                   1.0.0

LaTeX details

LaTeX distribution (e.g. TeX Live 2020):
Installed LaTeX packages:

FFMPEG

Output of ffmpeg -version:

ffmpeg version 4.2.7-0ubuntu0.1 Copyright (c) 2000-2022 the FFmpeg developers
built with gcc 9 (Ubuntu 9.4.0-1ubuntu1~20.04.1)
configuration: --prefix=/usr --extra-version=0ubuntu0.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librsvg --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-nvenc --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
libavutil      56. 31.100 / 56. 31.100
libavcodec     58. 54.100 / 58. 54.100
libavformat    58. 29.100 / 58. 29.100
libavdevice    58.  8.100 / 58.  8.100
libavfilter     7. 57.100 /  7. 57.100
libavresample   4.  0.  0 /  4.  0.  0
libswscale      5.  5.100 /  5.  5.100
libswresample   3.  5.100 /  3.  5.100
libpostproc    55.  5.100 / 55.  5.100

Additional comments

Both voice mp3 files are successfully created in tts/. I suspect the issue is that manim silently drops sounds that overlap each other.
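One way to probe the overlap hypothesis is to check whether the cue intervals in TestScene.srt (listed in the logs above) intersect. The sketch below parses SRT timestamps with the standard library only; the function names are illustrative, and it inspects subcaption timings only, which may differ from where the audio clips are actually placed.

```python
import re
from datetime import timedelta

# Matches "HH:MM:SS,mmm --> HH:MM:SS,mmm" timing lines of an SRT file.
CUE_RE = re.compile(
    r"(\d+):(\d+):(\d+),(\d+)\s*-->\s*(\d+):(\d+):(\d+),(\d+)"
)


def parse_cues(srt_text):
    """Return a list of (start, end) timedeltas, one per SRT cue."""
    cues = []
    for match in CUE_RE.finditer(srt_text):
        h1, m1, s1, ms1, h2, m2, s2, ms2 = map(int, match.groups())
        start = timedelta(hours=h1, minutes=m1, seconds=s1, milliseconds=ms1)
        end = timedelta(hours=h2, minutes=m2, seconds=s2, milliseconds=ms2)
        cues.append((start, end))
    return cues


def overlapping_pairs(cues):
    """Return index pairs (i, j) with i < j whose time intervals intersect."""
    pairs = []
    for i in range(len(cues)):
        for j in range(i + 1, len(cues)):
            if cues[i][0] < cues[j][1] and cues[j][0] < cues[i][1]:
                pairs.append((i, j))
    return pairs


SRT = """1
00:00:00,000 --> 00:00:01,004
First

2
00:00:06,133 --> 00:00:07,113
Second
"""

cues = parse_cues(SRT)
print(overlapping_pairs(cues))  # prints [] for the two cues above
```

For the file in this report the two cues do not intersect, so if the second voiceover is still missing, the subcaption timings alone do not explain it.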

I can't see "start Stream" after pressing the 'r' key and can't record my voice normally

Description of bug / unexpected behavior

I tried to use the RecorderService of manim-voiceover on my Huawei computer running Ubuntu 22.04. After installation, I tested it with the example code from the tutorial, but it did not work.

Expected behavior

I watched the demonstration video and found that my output differs from it.
Input:
manim -pql recording.py --disable_caching

My output looks like:

Manim Community v0.18.0

/home/semikernel/anaconda3/envs/manim/lib/python3.11/site-packages/whisper/timing.py:57: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
  @numba.jit
ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.front
ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.surround21
ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.surround21
ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.surround40
ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.surround41
ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.surround50
ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.surround51
ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.surround71
ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.iec958
ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.iec958
ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.iec958
ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.hdmi
ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.hdmi
ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.modem
ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.modem
ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.phoneline
ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.phoneline
ALSA lib pcm_oss.c:397:(_snd_pcm_oss_open) Cannot open device /dev/dsp
ALSA lib pcm_oss.c:397:(_snd_pcm_oss_open) Cannot open device /dev/dsp
ALSA lib confmisc.c:160:(snd_config_get_card) Invalid field card
ALSA lib pcm_usb_stream.c:482:(_snd_pcm_usb_stream_open) Invalid card 'card'
ALSA lib confmisc.c:160:(snd_config_get_card) Invalid field card
ALSA lib pcm_usb_stream.c:482:(_snd_pcm_usb_stream_open) Invalid card 'card'
-------------------------device list-------------------------
Input Device id  0  -  sof-hda-dsp: - (hw:0,0)
Input Device id  4  -  sof-hda-dsp: - (hw:0,6)
Input Device id  5  -  sof-hda-dsp: - (hw:0,7)
Input Device id  6  -  sysdefault
Input Device id  7  -  samplerate
Input Device id  8  -  speexrate
Input Device id  9  -  pulse
Input Device id  10  -  upmix
Input Device id  11  -  vdownmix
Input Device id  13  -  default
-------------------------------------------------------------
Please select an input device id to record from:
5
Selected device: sof-hda-dsp: - (hw:0,7)
╔══════════════════════════════════╗
║ Voiceover:                       ║
║                                  ║
║ This circle is drawn as I speak. ║
╚══════════════════════════════════╝
Press and hold the 'r' key to begin recording
Wait for 1 second, then start speaking.
Wait for at least 1 second after you finish speaking.
This is to eliminate any sounds that may come from your keyboard.
The silence at the beginning and end will be trimmed automatically.
You can adjust this setting using the `trim_silence_threshold` argument.
These instructions are only shown once.
Release the 'r' key to end recording
rrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr(I kept pressing 'r')
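One thing that may be worth ruling out (an assumption, not something confirmed in this report): if the recorder's key listener relies on pynput, which appears in the pip list for this environment, then the display session type matters. pynput's documentation notes that under Wayland its X-based listener only receives events from applications running under Xwayland, and Ubuntu 22.04 commonly starts a Wayland session by default. A minimal sketch that reports the session type (the helper name is illustrative):

```python
import os


def display_session_type(env=None):
    """Return the lowercase XDG session type ('x11', 'wayland', ...) or 'unknown'."""
    env = os.environ if env is None else env
    # An unset or empty variable is reported as 'unknown'.
    return env.get("XDG_SESSION_TYPE", "").lower() or "unknown"


if __name__ == "__main__":
    print("Session type:", display_session_type())
```

If this prints wayland, logging in with an Xorg session and retrying the recording would show whether the key listener is the cause.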

How to reproduce the issue

My test code:

Code for reproducing the problem

from manim import *
from manim_voiceover import VoiceoverScene
from manim_voiceover.services.recorder import RecorderService

# Simply inherit from VoiceoverScene instead of Scene to get all the
# voiceover functionality.
class RecorderExample(VoiceoverScene):
    def construct(self):
        # You can choose from a multitude of TTS services,
        # or in this example, record your own voice:
        self.set_speech_service(RecorderService())

        circle = Circle()

        # Surround animation sections with with-statements:
        with self.voiceover(text="This circle is drawn as I speak.") as tracker:
            self.play(Create(circle), run_time=tracker.duration)
            # The duration of the animation is received from the audio file
            # and passed to the tracker automatically.

        # This part will not start playing until the previous voiceover is finished.
        with self.voiceover(text="Let's shift it to the left 2 units.") as tracker:
            self.play(circle.animate.shift(2 * LEFT), run_time=tracker.duration)

Additional media files

Images/GIFs

Logs

Terminal output


System specifications

System Details

OS Ubuntu22.04.3 LTS
RAM:16GB
Python version Python 3.11.7
Installed modules (provide output from pip list):

Package                        Version
------------------------------ -----------
azure-cognitiveservices-speech 1.34.1
Brotli                         1.1.0
build                          1.0.3
CacheControl                   0.13.1
certifi                        2023.11.17
cffi                           1.16.0
charset-normalizer             3.3.2
cleo                           2.1.0
click                          8.1.7
click-default-group            1.2.4
cloup                          2.1.2
cmake                          3.28.1
colorama                       0.4.6
crashtest                      0.4.1
cryptography                   42.0.1
decorator                      5.1.1
deepl                          1.16.1
distlib                        0.3.8
dulwich                        0.21.7
evdev                          1.6.1
fastjsonschema                 2.19.1
ffmpeg-python                  0.2.0
filelock                       3.13.1
fsspec                         2023.12.2
future                         0.18.3
glcontext                      2.5.0
gTTS                           2.5.0
huggingface-hub                0.20.3
idna                           3.6
importlib-metadata             7.0.1
installer                      0.7.0
isosurfaces                    0.1.0
jaraco.classes                 3.3.0
jeepney                        0.8.0
Jinja2                         3.1.3
keyring                        24.3.0
lit                            17.0.6
llvmlite                       0.41.1
manim                          0.18.0
manim-voiceover                0.3.4.post1
ManimPango                     0.5.0
mapbox-earcut                  1.0.1
markdown-it-py                 3.0.0
MarkupSafe                     2.1.4
mdurl                          0.1.2
moderngl                       5.9.0
moderngl-window                2.4.1
more-itertools                 10.2.0
mpmath                         1.3.0
msgpack                        1.0.7
multipledispatch               0.6.0
mutagen                        1.47.0
networkx                       3.2.1
numba                          0.58.1
numpy                          1.26.3
nvidia-cublas-cu11             11.10.3.66
nvidia-cuda-cupti-cu11         11.7.101
nvidia-cuda-nvrtc-cu11         11.7.99
nvidia-cuda-runtime-cu11       11.7.99
nvidia-cudnn-cu11              8.5.0.96
nvidia-cufft-cu11              10.9.0.58
nvidia-curand-cu11             10.2.10.91
nvidia-cusolver-cu11           11.4.0.1
nvidia-cusparse-cu11           11.7.4.91
nvidia-nccl-cu11               2.14.3
nvidia-nvtx-cu11               11.7.91
openai-whisper                 20230314
packaging                      23.2
pexpect                        4.9.0
Pillow                         9.5.0
pip                            23.3.2
pkginfo                        1.9.6
platformdirs                   3.11.0
poetry                         1.7.1
poetry-core                    1.8.1
poetry-plugin-export           1.6.0
ptyprocess                     0.7.0
PyAudio                        0.2.14
pycairo                        1.25.1
pycparser                      2.21
pydub                          0.25.1
pyglet                         1.5.27
Pygments                       2.17.2
pynput                         1.7.6
pyproject_hooks                1.0.0
pyrr                           0.10.3
PySocks                        1.7.1
python-dotenv                  0.21.1
python-slugify                 8.0.2
python-xlib                    0.33
pyttsx3                        2.90
PyYAML                         6.0.1
rapidfuzz                      3.6.1
regex                          2023.12.25
requests                       2.31.0
requests-toolbelt              1.0.0
rich                           13.7.0
safetensors                    0.4.2
scipy                          1.12.0
screeninfo                     0.8.1
SecretStorage                  3.3.3
setuptools                     69.0.3
shellingham                    1.5.4
six                            1.16.0
skia-pathops                   0.8.0.post1
sox                            1.4.1
srt                            3.5.3
stable-ts                      2.11.1
svgelements                    1.9.6
sympy                          1.12
text-unidecode                 1.3
tiktoken                       0.3.1
tokenizers                     0.15.1
tomli                          2.0.1
tomlkit                        0.12.3
torch                          2.0.1
torchaudio                     2.0.2
tqdm                           4.66.1
transformers                   4.37.1
triton                         2.0.0
trove-classifiers              2024.1.8
typing_extensions              4.9.0
urllib3                        2.1.0
virtualenv                     20.25.0
watchdog                       2.3.1
wheel                          0.42.0
zipp                           3.17.0

LaTeX details

LaTeX distribution (e.g. TeX Live 2020):
Installed LaTeX packages:

FFMPEG

Output of ffmpeg -version:


Additional comments

RecordingService does not record if device_index is not default

Description of bug / unexpected behavior

I am using the RecorderService with a non-default device_index. In my case the microphone has index 8. I finished recording and entered l to listen to the recorded file. Unfortunately, the animation crashed. I have added the terminal output in the Logs section below.

I believe modifying RecordingService._record_task as follows may fix the issue: add the line input_device_index=self.device_index to the self.audio.open(...) call.

    def _record_task(self, path):
        if self.listener.key_pressed and not self.started:
            # Start the recording
            try:
                self.stream = self.audio.open(
                    format=self.format,
                    channels=self.channels,
                    rate=self.rate,
                    input=True,
                    frames_per_buffer=self.chunk,
                    stream_callback=self.callback,
                    # Proposed addition: record from the selected device.
                    input_device_index=self.device_index,
                )
                print("Stream active:", self.stream.is_active())
                self.started = True
                print("start Stream")
            except:
                raise

            self.task.enter(self.callback_delay, 1, self._record_task, ([path]))
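The proposed change can also be sketched as a small helper that builds the keyword arguments for the stream. This is an illustration only: `stream_open_kwargs` is a hypothetical name, not part of manim-voiceover, and the sketch assumes the PyAudio `open()` parameters already used in the snippet above. It passes `input_device_index` only when a device was explicitly chosen.

```python
def stream_open_kwargs(fmt, channels, rate, chunk, callback, device_index=None):
    """Build keyword arguments for a PyAudio input stream.

    `device_index=None` means "use the default input device", so the
    key is only added when a specific device was selected.
    """
    kwargs = dict(
        format=fmt,
        channels=channels,
        rate=rate,
        input=True,
        frames_per_buffer=chunk,
        stream_callback=callback,
    )
    if device_index is not None:
        kwargs["input_device_index"] = device_index
    return kwargs


# Hypothetical use inside the recorder:
#   self.stream = self.audio.open(**stream_open_kwargs(
#       self.format, self.channels, self.rate, self.chunk,
#       self.callback, device_index=self.device_index))
```

PyAudio documents an unspecified or None input_device_index as the default device, so passing the attribute unconditionally, as in the proposal above, should behave the same; the helper only makes the intent explicit.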

Expected behavior

How to reproduce the issue

Note the keyword argument in self.set_speech_service(RecorderService(device_index=8)). I got the same behavior when I did not set device_index in the code but entered it at runtime instead.

Code for reproducing the problem

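from manim import *
from manim_voiceover import VoiceoverScene
from manim_voiceover.services.recorder import RecorderService

class GTTSExample(VoiceoverScene):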
    def construct(self):
        # self.set_speech_service(PyTTSX3Service())
        self.set_speech_service(RecorderService(device_index=8))

        circle = Circle()
        square = Square().shift(2 * RIGHT)

        with self.voiceover(text="This circle is drawn as I speak.") as tracker:
            self.play(Create(circle), run_time=tracker.duration)

        with self.voiceover(text="Let's shift it to the left 2 units.") as tracker:
            self.play(circle.animate.shift(2 * LEFT), run_time=tracker.duration)

        with self.voiceover(text="Now, let's transform it into a square.") as tracker:
            self.play(Transform(circle, square), run_time=tracker.duration)

        with self.voiceover(text="Thank you for watching."):
            self.play(Uncreate(circle))

        self.wait()

Logs

Terminal output

```

CouldntDecodeError: Decoding failed. ffmpeg returned error code: 1

Output from ffmpeg/avlib:

ffmpeg version n6.0 Copyright (c) 2000-2023 the FFmpeg developers
built with gcc 13.1.1 (GCC) 20230429
configuration: --prefix=/usr --disable-debug --disable-static --disable-stripping --enable-amf --enable-avisynth --enable-cuda-llvm --enable-lto --enable-fontconfig
--enable-gmp --enable-gnutls --enable-gpl --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libdav1d --enable-libdrm
--enable-libfreetype --enable-libfribidi --enable-libgsm --enable-libiec61883 --enable-libjack --enable-libjxl --enable-libmfx --enable-libmodplug --enable-libmp3lame
--enable-libopencore_amrnb --enable-libopencore_amrwb --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librav1e --enable-librsvg
--enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libsvtav1 --enable-libtheora --enable-libv4l2 --enable-libvidstab --enable-libvmaf --enable-libvorbis
--enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxcb --enable-libxml2 --enable-libxvid --enable-libzimg --enable-nvdec --enable-nvenc
--enable-opencl --enable-opengl --enable-shared --enable-version3 --enable-vulkan
libavutil 58. 2.100 / 58. 2.100
libavcodec 60. 3.100 / 60. 3.100
libavformat 60. 3.100 / 60. 3.100
libavdevice 60. 1.100 / 60. 1.100
libavfilter 9. 3.100 / 9. 3.100
libswscale 7. 1.100 / 7. 1.100
libswresample 4. 10.100 / 4. 10.100
libpostproc 57. 1.100 / 57. 1.100
[mp3 @ 0x5638d47f7f00] Format mp3 detected only with low score of 1, misdetection possible!
[mp3 @ 0x5638d47f7f00] Failed to read frame size: Could not seek to 1026.
media/voiceovers/this-circle-is-drawn-as-i-speak-556a988c.mp3: Invalid argument

```

Additional comments

After fixing `_record_task`, the expected mp3 file was written correctly to the filesystem. However, after accepting the recording with 'a', the following exception appeared:

/home/mk/.manim-venv/lib/python3.11/site-packages/_distutils_hack/__init__.py:77 in do_override  │
│                                                                                                  │
│    74 │   """                                                                                    │
│    75 │   if enabled():                                                                          │
│    76 │   │   warn_distutils_present()                                                           │
│ ❱  77 │   │   ensure_local_distutils()                                                           │
│    78                                                                                            │
│    79                                                                                            │
│    80 class _TrivialRe:                                                                          │
│                                                                                                  │
│ /home/mk/.manim-venv/lib/python3.11/site-packages/_distutils_hack/__init__.py:64 in              │
│ ensure_local_distutils                                                                           │
│                                                                                                  │
│    61 │                                                                                          │
│    62 │   # check that submodules load as expected                                               │
│    63 │   core = importlib.import_module('distutils.core')                                       │
│ ❱  64 │   assert '_distutils' in core.__file__, core.__file__                                    │
│    65 │   assert 'setuptools._distutils.log' not in sys.modules                                  │
│    66                                                                                            │
│    67                                                                                            │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
AssertionError: /usr/lib/python3.11/distutils/core.py
(.manim-venv) [mk@archlinux eec]$ 

I know this happens when importing `triton`, but so far I have not found a workaround.
I am using Python 3.11.3.

_StitcherService saves files with wrong file name

Description of bug / unexpected behavior

When using _StitcherService, an error occurs because the hashed audio file is looked up in media/voiceovers/media/voiceovers instead of media/voiceovers.

How to reproduce the issue

Because the error occurs when the audio file is looked up, even the most basic example reproduces it.

Code for reproducing the problem

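from manim import *
from manim_voiceover import VoiceoverScene
# _StitcherService must also be imported; the report does not show its import path.

class Example(VoiceoverScene):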
    def construct(self):
        self.set_speech_service(_StitcherService("my_voice_recording.mp3"))
        with self.voiceover("Test"):
            pass

Additional media files

Images/GIFs

System specifications

irrelevant

Additional comments

The error can be fixed as follows: replace line 150 in stitcher.py with

            output_dict["segments"].append({"index": i, "path": data_hash + ".mp3"})
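
The doubled directory can be reproduced without manim-voiceover. The sketch below assumes (the report does not confirm this) that the lookup code prepends the cache directory to whatever path is stored for each segment. The names `cache_dir`, `data_hash`, and `resolve` are illustrative only.

```python
from pathlib import Path

cache_dir = Path("media/voiceovers")
data_hash = "test-556a988c"  # made-up hash for illustration


def resolve(stored_path):
    """Mimic a lookup that always prepends the cache directory (assumption)."""
    return (cache_dir / stored_path).as_posix()


# Storing a path that already contains the cache directory doubles it.
print(resolve((cache_dir / (data_hash + ".mp3")).as_posix()))

# Storing only the file name, as in the proposed fix, resolves correctly.
print(resolve(data_hash + ".mp3"))
```

Under that assumption, storing only the file name keeps the stored value relative to the cache directory, so the prefix is applied exactly once.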

Show error message that Wayland is not supported

pynput does not support Wayland. Unfortunately, it took me a while to figure out why the RecorderService did not work.

I got the message: The extra packages required by manim_voiceover[recorder] are not installed .... This error is misleading: the packages were installed, but pynput failed to load because it does not work on Wayland.

I would suggest adding something like this to helper.py, at around line 162 in the error handling block:

        if "failed to acquire X connection" in str(e):
            # Chain the original exception so the underlying cause stays visible.
            raise ImportError(
                "Wayland is not supported!"
            ) from e
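
A slightly more defensive variant could also look at the session environment before blaming Wayland. This is only a sketch: `is_wayland_session` and `recorder_import_hint` are hypothetical names, not part of manim-voiceover or pynput. It assumes the session type is exposed through the `XDG_SESSION_TYPE` or `WAYLAND_DISPLAY` environment variables, which is common on Linux desktops but not guaranteed.

```python
import os


def is_wayland_session(env=None):
    """Guess whether the current desktop session uses Wayland (heuristic)."""
    env = os.environ if env is None else env
    if env.get("XDG_SESSION_TYPE", "").lower() == "wayland":
        return True
    return bool(env.get("WAYLAND_DISPLAY"))


def recorder_import_hint(error, env=None):
    """Return a clearer message for a failed pynput import, or None."""
    if "failed to acquire X connection" not in str(error):
        return None  # Unrelated failure; keep the generic message.
    if is_wayland_session(env):
        return (
            "pynput could not connect to an X server. This session appears "
            "to use Wayland, which pynput does not support."
        )
    return "pynput could not connect to an X server."


# Example with a fake error and a fake environment.
fake_error = ImportError("this platform is not supported: failed to acquire X connection")
print(recorder_import_hint(fake_error, {"XDG_SESSION_TYPE": "wayland"}))
```

Returning `None` for unrelated errors leaves the existing "extra packages are not installed" message in place for the case where the packages really are missing.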