cbh123 / narrator

David Attenborough narrates your life

Python 100.00%

narrator's Issues

Simple guide

This is a very cool project. Is there a complete step-by-step guide for it? The additional setup steps (Replicate, the ElevenLabs voice, etc.) have lost me a bit.

🤞🤘

Can it start processing the next frame when the previous frame's audio starts playing?

It would be nice if the pause between narrations was shorter and more natural. (Also, do you know of a method to keep the context/memory of previous frames?)
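
One way to shorten the pause is to overlap the two stages, so the next frame is analyzed in a background thread while the current narration is still playing. The following is a minimal sketch using only the standard library; `analyze` and `play` are hypothetical stand-ins for the project's image-analysis and audio-playback steps, not functions taken from the repository.

```python
import queue
import threading


def run_pipeline(frames, analyze, play, max_pending=1):
    """Analyze frames in a background thread while earlier narrations play.

    `analyze` and `play` are hypothetical callables supplied by the caller.
    Returns the narrations in the order they were played.
    """
    pending = queue.Queue(maxsize=max_pending)
    done = object()  # sentinel marking the end of the stream

    def producer():
        try:
            for frame in frames:
                # put() blocks when the queue is full, so analysis stays at
                # most `max_pending` narrations ahead of playback.
                pending.put(analyze(frame))
        finally:
            # Always signal completion, even if analyze() raised.
            pending.put(done)

    worker = threading.Thread(target=producer, daemon=True)
    worker.start()

    played = []
    while True:
        narration = pending.get()
        if narration is done:
            break
        play(narration)  # the producer keeps working while this runs
        played.append(narration)
    worker.join()
    return played
```

Keeping `max_pending` small limits how stale a narration can be by the time it is spoken, at the cost of less overlap.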

Can't give descriptions of individuals in the image provided

🎙️ David says:
As I am not allowed to provide descriptions or any other details regarding the individuals in the image you’ve provided, I can't assist with your request. If you have a different kind of inquiry not involving personal details, feel free to ask!

Any way around this? I assume this is a new limitation from OpenAI?

no such file or directory ./frames

Is Replicate necessary?

The README directs users to create a Replicate account and set an API key, but I skipped those steps and (other issues aside), the script worked fine. If that's true, the README can be simplified to remove the Replicate steps.

API KEY error

Hello,
Is anyone else having a problem similar to mine? I set the OpenAI API key as an environment variable with the cmd command setx OPENAI_API_KEY "sk-nU..." and also tried adding a system variable, but I always get the same error when launching narrator.py:

openai.OpenAIError: The api_key client option must be set either by passing api_key to the client or
by setting the OPENAI_API_KEY environment variable

Any tips on what the problem might be?

Add pillow and simpleaudio in requirements?

On Mac with Python 3.10.13,
I had to pip install pillow and simpleaudio.
Now I have
Pillow 10.1.0
simpleaudio 1.0.4

David Attenborough voice not available

elevenlabs.api.error.APIError: A voice for the voice_id ENfvYmv6CRqDodDZTieQ was not found.

It doesn't seem to be listed here either:

https://api.elevenlabs.io/v1/voices

gpt-4-vision-preview does not exist

After following the install method and adding the API keys, I receive this error.

👀 David is watching...

The model gpt-4-vision-preview does not exist or you do not have access to it. Learn more: https://help.openai.com/en/articles/7102672-how-can-i-access-gpt-4.', 'type': 'invalid_request_error', 'param': None, 'code': 'model_not_found'}}

Current Implementation on cbh123:main Not Functional

The current implementation of the Narrator project on the cbh123:main branch is not functional due to outdated API integrations and missing environment variable handling.

This issue has been addressed and fixed in the following pull request: Update ElevenLabs API Integration, Enhance Security, and Improve Narrator Functionality #53.

To resolve the issues and ensure a working application, please pull in the latest changes from the following forked repository: mgennings/narrator.

These updates include:

Compatibility with the latest ElevenLabs API.
Improved handling of environment variables using a .env file.
Enhanced performance and readability of the code.

Thank you for your attention to this matter.

Context only tracks assistant

narrator/narrator.py

Line 94 in 4bab104

script = script + [{"role": "assistant", "content": analysis}]

I'm not sure of the best way to include user prompts in the message history here, since you don't want to include the actual image every time. A placeholder of some sort, showing that the LLM was prompted into the given response, might help consistency and avoid an 'intro' every time, though I'm not certain.

Something like:
{
"role": "user",
"content": [
{"type": "text", "text": "Describe this image (user uploaded image)"},
],
},
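
The placeholder idea above could be sketched as a small helper that appends a text-only user turn before each assistant reply. This is an illustration, not the repository's code: `record_turn`, the placeholder wording, and the `max_turns` cap are all assumptions, and it assumes `script` holds only earlier user/assistant turns.

```python
def record_turn(script, analysis, max_turns=5,
                placeholder="Describe this image (image omitted)"):
    """Return a new history with a text-only user turn and the reply.

    The image itself is not stored; the placeholder only records that a
    prompt preceded the assistant's answer.
    """
    user_turn = {
        "role": "user",
        "content": [{"type": "text", "text": placeholder}],
    }
    assistant_turn = {"role": "assistant", "content": analysis}
    history = script + [user_turn, assistant_turn]
    if max_turns:
        # Keep only the most recent exchanges so requests stay small.
        history = history[-2 * max_turns:]
    return history
```

Usage would replace the single-line append with something like `script = record_turn(script, analysis)`.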

No such file or directory: './frames/frame.jpg'

capture.py seems to be capturing the images (the webcam is on and the console keeps repeating "Say Cheese").

However when running narrator.py in a different terminal, I am getting an error message saying that frames/frame.jpg doesn't exist

I do not see this directory or files
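
A relative path such as ./frames is resolved against the current working directory, so both scripts need to be started from the same folder, and the folder has to exist before anything is written to it. A minimal sketch of a guard for both conditions follows; `ensure_frames_dir` and `frame_ready` are hypothetical helper names, not functions from the repository.

```python
import os


def ensure_frames_dir(path="frames"):
    """Create the frames folder if it is missing and return its absolute path."""
    os.makedirs(path, exist_ok=True)
    return os.path.abspath(path)


def frame_ready(path=os.path.join("frames", "frame.jpg")):
    """Return True once a non-empty frame file exists at `path`."""
    return os.path.isfile(path) and os.path.getsize(path) > 0
```

Printing the value returned by `ensure_frames_dir()` in each terminal is a quick way to confirm that both scripts are looking at the same folder.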

Narration is always starting with "this image"

Is there a way to change the prompt for example to make it so that each result isn't starting with "this image"?
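
Besides rewording the prompt, the opener can also be trimmed after the fact. The sketch below is one possible post-processing step; the list of verbs in the pattern is an assumption and would need tuning against real output.

```python
import re

# Assumed set of openers; extend the alternatives as needed.
_OPENER = re.compile(
    r"^\s*(?:this|the) (?:image|picture|photo)\s+"
    r"(?:shows|depicts|features|captures)\s+",
    re.IGNORECASE,
)


def strip_opener(text):
    """Drop a leading 'This image shows ...' phrase and re-capitalize."""
    stripped = _OPENER.sub("", text, count=1)
    if stripped == text:
        return text  # no opener found, leave the narration untouched
    return stripped[:1].upper() + stripped[1:]
```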

What operating system does this run on: Linux? Ubuntu? Windows? CentOS?

No module named 'PIL'

python capture.py
Traceback (most recent call last):
  File "/Users/jr/Documents/hobby/narrator/capture.py", line 3, in <module>
    from PIL import Image
ModuleNotFoundError: No module named 'PIL'

Not entirely sure what the issue is, wrong Python version? I'm on Python 3.11.4. Tried installing PIL without success:

pip3 install PIL
ERROR: Could not find a version that satisfies the requirement PIL (from versions: none)
ERROR: No matching distribution found for PIL
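
The PIL module is provided by the Pillow distribution, so the package to install is Pillow (pip install Pillow) rather than PIL. A small sketch for checking which imports are missing and which pip name provides them; the mapping only covers the two packages mentioned in these issues, and `missing_packages` is a hypothetical helper.

```python
import importlib.util

# import name -> name to pass to pip
PIP_NAMES = {"PIL": "Pillow", "simpleaudio": "simpleaudio"}


def missing_packages(pip_names=PIP_NAMES):
    """Return the pip names of modules that cannot be imported."""
    return sorted(
        pip_name
        for module, pip_name in pip_names.items()
        if importlib.util.find_spec(module) is None
    )


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Try: pip install " + " ".join(missing))
```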

TypeError: str expected, not NoneType

When I run narrator.py I get the following:

Traceback (most recent call last):
  File "/Users/ahmed/Desktop/Dev/narrate/narrator/narrator.py", line 12, in <module>
    set_api_key(os.environ.get("ELEVENLABS_API_KEY"))
  File "/Users/ahmed/Desktop/Dev/narrate/venv/lib/python3.10/site-packages/elevenlabs/simple.py", line 17, in set_api_key
    os.environ["ELEVEN_API_KEY"] = api_key
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/os.py", line 684, in __setitem__
    value = self.encodevalue(value)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/os.py", line 756, in encode
    raise TypeError("str expected, not %s" % type(value).__name__)
TypeError: str expected, not NoneType

Feature: Mobile

You know what would be cool? Having this run on a mobile phone. Do you think that would be possible?
For example, going on a trip, starting it up, and using the phone's camera. That would be so handy!

Thank you!

Issue with Elevenlabs keys & ids

I'm able to get everything to work, but I run into problems with the Eleven Labs voice reading the text.
I do have a paid account and the voice ID, but I can't get it recognized.

I get the following error after it displays text that correctly describes the image.
I followed the @mgennings instructions for creating a .env file to help streamline that process of setting the keys, but still no luck.

  File "C:\narrator-main\narrator-main\narrator.py", line 105, in <module>
    main()
  File "C:\narrator-main\narrator-main\narrator.py", line 96, in main
    play_audio(analysis)
  File "C:\narrator-main\narrator-main\narrator.py", line 31, in play_audio
    audio = generate(text, voice=os.environ.get("ELEVENLABS_VOICE_ID"))
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\exhibits\AppData\Roaming\Python\Python312\site-packages\elevenlabs\simple.py", line 61, in generate
    assert isinstance(voice, Voice)
           ^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError

AssertionError

My apologies if this is a silly problem with an easy fix, but I promise I have googled it and could not fix it.
The capture.py runs smoothly for me, but the narrator.py throws the following error; and it does not depend on the Eleven Labs audio id that I use.

I'd really appreciate any help you'd be able to offer :)

  File "\narrator-main\narrator.py", line 114, in <module>
    main()
  File "\narrator-main\narrator.py", line 105, in main
    play_audio(analysis)
  File "\narrator-main\narrator.py", line 40, in play_audio
    audio = generate(text, voice=os.environ.get("7Wqa3tuynJ4uUcRnTwAI"))
  File "\anaconda3\lib\site-packages\elevenlabs\simple.py", line 61, in generate
    assert isinstance(voice, Voice)
AssertionError
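
In this traceback the voice ID itself is passed to os.environ.get, which looks a variable up by name. Unless a variable literally named 7Wqa3tuynJ4uUcRnTwAI exists, the call returns None, which is likely why the isinstance assertion in generate fails. A hedged sketch of a lookup that fails with a clearer message; the variable name ELEVENLABS_VOICE_ID is taken from the earlier traceback, and `voice_id_from_env` is a hypothetical helper.

```python
import os


def voice_id_from_env(var_name="ELEVENLABS_VOICE_ID", env=None):
    """Look up the voice ID by variable *name* and fail loudly if unset."""
    env = os.environ if env is None else env
    voice_id = env.get(var_name)
    if not voice_id:
        raise RuntimeError(
            f"Environment variable {var_name} is not set; "
            "export it or add it to your .env file."
        )
    return voice_id
```

The ID goes into the variable's value (for example ELEVENLABS_VOICE_ID=7Wqa3tuynJ4uUcRnTwAI), and the code asks for it by the variable's name.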

Build error on macOS- clang: error: invalid arch name '-arch root:xnu-10002.60.71.505.1~3/RELEASE_ARM64_T6000'

And people wonder why Nix is necessary... /eyeroll

Building wheels for collected packages: simpleaudio
  Building wheel for simpleaudio (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> [22 lines of output]
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build/lib.macosx-10.9-universal2-cpython-39
      creating build/lib.macosx-10.9-universal2-cpython-39/simpleaudio
      copying simpleaudio/__init__.py -> build/lib.macosx-10.9-universal2-cpython-39/simpleaudio
      copying simpleaudio/shiny.py -> build/lib.macosx-10.9-universal2-cpython-39/simpleaudio
      copying simpleaudio/functionchecks.py -> build/lib.macosx-10.9-universal2-cpython-39/simpleaudio
      creating build/lib.macosx-10.9-universal2-cpython-39/simpleaudio/test_audio
      copying simpleaudio/test_audio/c.wav -> build/lib.macosx-10.9-universal2-cpython-39/simpleaudio/test_audio
      copying simpleaudio/test_audio/e.wav -> build/lib.macosx-10.9-universal2-cpython-39/simpleaudio/test_audio
      copying simpleaudio/test_audio/g.wav -> build/lib.macosx-10.9-universal2-cpython-39/simpleaudio/test_audio
      copying simpleaudio/test_audio/left_right.wav -> build/lib.macosx-10.9-universal2-cpython-39/simpleaudio/test_audio
      copying simpleaudio/test_audio/notes_2_16_44.wav -> build/lib.macosx-10.9-universal2-cpython-39/simpleaudio/test_audio
      running build_ext
      building 'simpleaudio._simpleaudio' extension
      creating build/temp.macosx-10.9-universal2-cpython-39
      creating build/temp.macosx-10.9-universal2-cpython-39/c_src
      clang -Wno-unused-result -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -iwithsysroot/System/Library/Frameworks/System.framework/PrivateHeaders -iwithsysroot/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.9/Headers -Werror=implicit-function-declaration -Wno-error=unreachable-code -arch root:xnu-10002.60.71.505.1~3/RELEASE_ARM64_T6000 -DDEBUG=0 -I/Users/pmarreck/Documents/narrator/venv/include -I/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/include/python3.9 -c c_src/posix_mutex.c -o build/temp.macosx-10.9-universal2-cpython-39/c_src/posix_mutex.o -mmacosx-version-min=10.6
      clang: error: invalid arch name '-arch root:xnu-10002.60.71.505.1~3/RELEASE_ARM64_T6000'
      error: command '/usr/bin/clang' failed with exit code 1
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for simpleaudio
  Running setup.py clean for simpleaudio
Failed to build simpleaudio
ERROR: Could not build wheels for simpleaudio, which is required to install pyproject.toml-based projects

ELEVEN_API_KEY, not ELEVENLABS_API_KEY

In README.md,
change:

export OPENAI_API_KEY=<token>
export ELEVENLABS_API_KEY=<eleven-token>

to:

export OPENAI_API_KEY=<token>
export ELEVEN_API_KEY=<eleven-token>
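
Until the README and the script agree on one name, a script could accept either spelling. This is a sketch under that assumption, with a hypothetical helper name, not the project's actual behaviour.

```python
import os


def eleven_api_key(env=None):
    """Return the ElevenLabs key from whichever variable name is set."""
    env = os.environ if env is None else env
    for name in ("ELEVEN_API_KEY", "ELEVENLABS_API_KEY"):
        value = env.get(name)
        if value:
            return value
    raise RuntimeError(
        "Set ELEVEN_API_KEY (or ELEVENLABS_API_KEY) before running narrator.py"
    )
```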

UnicodeEncodeError: 'ascii' codec can't encode character '\u201d' in position 58: ordinal not in range(128)

Got this error but no clue...anyone?

👀 David is watching...
Traceback (most recent call last):
File "/narrator/narrator.py", line 102, in <module>
main()
File "/narrator/narrator.py", line 88, in main
analysis = analyze_image(base64_image, script=script)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/narrator/narrator.py", line 57, in analyze_image
response = client.chat.completions.create(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/miniconda3/envs/narrator/lib/python3.11/site-packages/openai/_utils/_utils.py", line 299, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/miniconda3/envs/narrator/lib/python3.11/site-packages/openai/resources/chat/completions.py", line 556, in create
return self._post(
^^^^^^^^^^^
File "/miniconda3/envs/narrator/lib/python3.11/site-packages/openai/_base_client.py", line 1055, in post
return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/miniconda3/envs/narrator/lib/python3.11/site-packages/openai/_base_client.py", line 834, in request
return self._request(
^^^^^^^^^^^^^^
File "/miniconda3/envs/narrator/lib/python3.11/site-packages/openai/_base_client.py", line 854, in _request
request = self._build_request(options)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/miniconda3/envs/narrator/lib/python3.11/site-packages/openai/_base_client.py", line 435, in _build_request
headers = self._build_headers(options)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/miniconda3/envs/narrator/lib/python3.11/site-packages/openai/_base_client.py", line 393, in _build_headers
headers = httpx.Headers(headers_dict)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/miniconda3/envs/narrator/lib/python3.11/site-packages/httpx/_models.py", line 70, in __init__
self._list = [
^
File "/miniconda3/envs/narrator/lib/python3.11/site-packages/httpx/_models.py", line 74, in <listcomp>
normalize_header_value(v, encoding),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/miniconda3/envs/narrator/lib/python3.11/site-packages/httpx/_utils.py", line 53, in normalize_header_value
return value.encode(encoding or "ascii")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'ascii' codec can't encode character '\u201d' in position 58: ordinal not in range(128)
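
The character '\u201d' is a right curly double quotation mark, and the traceback shows the failure happening while an HTTP header value is encoded as ASCII. A plausible cause (an assumption, not confirmed in the report) is an API key that was pasted or exported with typographic quotes around it. A minimal sketch of a check that can be run on the key before creating the client; `clean_api_key` is a hypothetical helper.

```python
def clean_api_key(raw):
    """Strip surrounding whitespace and quote marks, then require ASCII."""
    # Covers straight quotes plus the curly variants editors often insert.
    key = raw.strip().strip("\u201c\u201d\u2018\u2019\"'")
    if not key.isascii():
        bad = sorted({ch for ch in key if ord(ch) > 127})
        raise ValueError(f"API key contains non-ASCII characters: {bad!r}")
    return key
```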

OpenAI no longer provides descriptions of images with 'real people'

OpenAI now returns the following error: I'm sorry, but I'm not able to provide visual descriptions of images with real people. If you have any other questions or need information on a different topic, feel free to ask!

Feature: subprocess calls

Hi,

The calls to the processes can be simplified for the end user. May I get access to open a branch and a PR to simplify this?

Thanks!

Reduce Rate Requests - errors out with pro plan

I have a paid OpenAI account but receive this error message, suggesting I am exceeding my rate limit.

The limits for gpt-4 (https://platform.openai.com/account/limits) are:

gpt-4: 10,000 TPM, 3 RPM, 200 RPD

My situation is:

I have the pro account and want to use the repo, but I cannot use it right now.

, line 877, in _request
    raise self._make_status_error_from_response(err.response) from None
openai.RateLimitError: Error code: 429 - {'error': {'message': 'You exceeded your current quota, please check your plan and billing details.', 'type': 'insufficient_quota', 'param': None, 'code': 'insufficient_quota'}}
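
The error body above carries the code insufficient_quota rather than a per-minute limit, which points toward billing or credits and suggests that waiting and retrying would not help. The sketch below separates the two cases using only the dictionary shape shown in the error message; the helper names and the backoff parameters are illustrative assumptions.

```python
def should_retry(error_body):
    """Return True when a 429 looks like a transient limit worth retrying."""
    error = error_body.get("error") or {}
    # 'insufficient_quota' means the account has no usable credit left,
    # so a retry would most likely fail the same way.
    return error.get("code") != "insufficient_quota"


def backoff_delays(attempts=4, base=1.0, cap=30.0):
    """Yield exponentially growing wait times in seconds."""
    for attempt in range(attempts):
        yield min(cap, base * (2 ** attempt))
```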

Beginner's issue

Hello,

This is my first ever "programming" experience and I am having some issues, here is what I get when I try to run narrator.py

What can I do to solve this?

👀 David is watching...
Traceback (most recent call last):
File "/Users/ME/projectai/narrator/narrator.py", line 102, in <module>
main()
File "/Users/ME/projectai/narrator/narrator.py", line 88, in main
analysis = analyze_image(base64_image, script=script)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/ME/projectai/narrator/narrator.py", line 57, in analyze_image
response = client.chat.completions.create(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/anaconda3/envs/narrator/lib/python3.12/site-packages/openai/_utils/_utils.py", line 299, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/opt/anaconda3/envs/narrator/lib/python3.12/site-packages/openai/resources/chat/completions.py", line 556, in create
return self._post(
^^^^^^^^^^^
File "/opt/anaconda3/envs/narrator/lib/python3.12/site-packages/openai/_base_client.py", line 1055, in post
return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/anaconda3/envs/narrator/lib/python3.12/site-packages/openai/_base_client.py", line 834, in request
return self._request(
^^^^^^^^^^^^^^
File "/opt/anaconda3/envs/narrator/lib/python3.12/site-packages/openai/_base_client.py", line 877, in _request
raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'error': {'message': 'The model gpt-4-vision-preview does not exist or you do not have access to it. Learn more: https://help.openai.com/en/articles/7102672-how-can-i-access-gpt-4.', 'type': 'invalid_request_error', 'param': None, 'code': 'model_not_found'}}
(narrator) ME@MacBook-Pro-de-ME narrator %

Scripts running fine - but audio not playing

Everything seems to be working:
I'm getting "📸 Say cheese! Saving frame." and "🎙️ David says:" in the terminal in VSC.

But no audio is playing, and I'm not sure how to troubleshoot. Any ideas? 😊

"expected an object, but got a string instead"

When running narrator.py, the error message below comes up.

openai.NotFoundError: Error code: 404 - {'error': {'message': 'The model gpt-4-vision-preview has been deprecated, learn more here: https://platform.openai.com/docs/deprecations', 'type': 'invalid_request_error', 'param': None, 'code': 'model_not_found'}}

Looking at https://platform.openai.com/docs/deprecations, the recommended replacement for 'gpt-4-vision-preview' is 'gpt-4o'.

When I replace the above in narrator.py line 58 with 'gpt-4o', the new error message I get is

openai.BadRequestError: Error code: 400 - {'error': {'message': "Invalid type for 'messages[1].content[1].image_url': expected an object, but got a string instead.", 'type': 'invalid_request_error', 'param': 'messages[1].content[1].image_url', 'code': 'invalid_type'}}

Anyone know a fix for this?
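
The 400 error says the image_url field must be an object rather than a string. In the Chat Completions message format that object carries the image under a url key, so a base64 frame would be wrapped as a data URL. A minimal sketch, assuming the frame is a base64-encoded JPEG; `image_message` and the prompt text are illustrative.

```python
def image_message(base64_image, prompt="Describe this image"):
    """Build a user message whose image part is an object with a `url` key."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                # An object, not a bare string.
                "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
            },
        ],
    }
```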

SimpleAudio incompatible

Trying to run narrator and getting this issue:

    import simpleaudio as sa
  File "/Users/jakeboyles/Documents/repos/narrator/venv/lib/python3.11/site-packages/simpleaudio/__init__.py", line 1, in <module>
    from simpleaudio.shiny import *
  File "/Users/jakeboyles/Documents/repos/narrator/venv/lib/python3.11/site-packages/simpleaudio/shiny.py", line 5, in <module>
    import simpleaudio._simpleaudio as _sa
ImportError: dlopen(/Users/jakeboyles/Documents/repos/narrator/venv/lib/python3.11/site-packages/simpleaudio/_simpleaudio.cpython-311-darwin.so, 0x0002): tried: '/Users/jakeboyles/Documents/repos/narrator/venv/lib/python3.11/site-packages/simpleaudio/_simpleaudio.cpython-311-darwin.so' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64')), '/System/Volumes/Preboot/Cryptexes/OS/Users/jakeboyles/Documents/repos/narrator/venv/lib/python3.11/site-packages/simpleaudio/_simpleaudio.cpython-311-darwin.so' (no such file), '/Users/jakeboyles/Documents/repos/narrator/venv/lib/python3.11/site-packages/simpleaudio/_simpleaudio.cpython-311-darwin.so' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64'))

I tried arch -arm64 pip install simpleaudio, but it didn't work.
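
The dlopen message says the compiled simpleaudio extension was built for x86_64 while the running process needs arm64. Before reinstalling, it can help to confirm which architecture and interpreter the virtual environment is actually using. A small diagnostic sketch; `interpreter_report` is a hypothetical helper.

```python
import platform
import sys


def interpreter_report():
    """Collect the facts needed to compare against a wheel's architecture."""
    return {
        "machine": platform.machine(),      # e.g. 'arm64' or 'x86_64'
        "python": sys.version.split()[0],
        "executable": sys.executable,
    }


if __name__ == "__main__":
    for key, value in interpreter_report().items():
        print(f"{key}: {value}")
```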

Pydantic Version Error?

I get the following error when I run it, do I need to change the version of pydantic or elevenlabs for it to work?

Traceback (most recent call last):
  File "/Users/Documents/GitHub/narrator/narrator.py", line 8, in <module>
    from elevenlabs import generate, play, set_api_key, voices
  File "/Users/anaconda3/lib/python3.11/site-packages/elevenlabs/__init__.py", line 1, in <module>
    from .api import * # noqa F403
    ^^^^^^^^^^^^^^^^^^
  File "/Users/anaconda3/lib/python3.11/site-packages/elevenlabs/api/__init__.py", line 2, in <module>
    from .history import * # noqa F403
    ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/anaconda3/lib/python3.11/site-packages/elevenlabs/api/history.py", line 6, in <module>
    from pydantic import model_validator
ImportError: cannot import name 'model_validator' from 'pydantic' (/Users/anaconda3/lib/python3.11/site-packages/pydantic/__init__.cpython-311-darwin.so)
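
model_validator was introduced in Pydantic 2, so this import fails when an older 1.x release is installed in the environment. A short sketch for checking the installed version; the helper names are hypothetical.

```python
from importlib import metadata


def supports_model_validator(version_string):
    """Return True for Pydantic 2.x and later, where model_validator exists."""
    return int(version_string.split(".")[0]) >= 2


def installed_pydantic_version():
    """Return the installed Pydantic version, or None if it is not installed."""
    try:
        return metadata.version("pydantic")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_pydantic_version()
    if version is None or not supports_model_validator(version):
        print(f"pydantic {version} found; try: pip install --upgrade pydantic")
```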

uhm, just a noob who needs help

What Python version do I need for this?

simpleaudio needs 3.8, but I found that another module requires 3.9, so I was not able to get it running. Any help is greatly appreciated.

A voice for the voice_id was not found

Hi! Thanks for hacking this one, it's super cool :)

Narrator doesn't work for me; the ElevenLabs API always returns

elevenlabs.api.error.APIError: A voice for the voice_id XXX was not found.

where XXX is the voiceId of my freshly created voice.

Any ideas? I can't find the error in their docs.

FFMPEG not found

Getting this when running narrator:

ValueError: ffplay from ffmpeg not found, necessary to play audio. On mac you can install it with 'brew install ffmpeg'. On linux and windows you can install it from https://ffmpeg.org/

Shouldn't ffmpeg be in the requirements.txt? Where should the executables be placed?

Thanks!
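
ffplay is an executable that ships with ffmpeg rather than a Python package, so it is installed through the system (as the error message suggests) and only needs to be discoverable on PATH. A minimal sketch of a startup check; `require_tool` is a hypothetical helper.

```python
import shutil


def require_tool(name="ffplay"):
    """Return the full path of an executable on PATH, or raise if missing."""
    path = shutil.which(name)
    if path is None:
        raise RuntimeError(
            f"{name} was not found on PATH; install ffmpeg and reopen the terminal."
        )
    return path
```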

API Keys

Where can I add my API keys so that I don't need to export them every time I launch the project?
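
One common approach is a .env file in the project folder that is read at startup, as mentioned in the fork discussed earlier. The python-dotenv package provides this through load_dotenv; the sketch below does the same with only the standard library, and `load_env_file` is a hypothetical helper that handles simple NAME=value lines only.

```python
import os


def load_env_file(path=".env", env=None):
    """Read NAME=value lines into the environment without overwriting."""
    env = os.environ if env is None else env
    loaded = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            # Skip blanks, comments, and anything that is not an assignment.
            if not line or line.startswith("#") or "=" not in line:
                continue
            name, _, value = line.partition("=")
            name = name.strip()
            if name.startswith("export "):
                name = name[len("export "):].strip()
            value = value.strip().strip("\"'")
            env.setdefault(name, value)  # variables already set win
            loaded[name] = value
    return loaded
```

The .env file should stay out of version control, since it holds secrets.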

Blank image from capture

I am getting a weird blank image from capture. It is not what my webcam actually sees, and the webcam does not activate while the script runs. Any ideas?