
narrator's Introduction

David Attenborough narrates your life.

https://twitter.com/charliebholtz/status/1724815159590293764

Want to make your own AI app?

Check out Replicate. We make it easy to run machine learning models with an API.

Setup

Clone this repo, then set up and activate a virtualenv:

python3 -m pip install virtualenv
python3 -m virtualenv venv
source venv/bin/activate

Then, install the dependencies: pip install -r requirements.txt

Make a Replicate, OpenAI, and ElevenLabs account and set your tokens:

export OPENAI_API_KEY=<token>
export ELEVENLABS_API_KEY=<eleven-token>

Make a new voice in Eleven and get the voice id of that voice using their get voices API, or by clicking the flask icon next to the voice in the VoiceLab tab.

export ELEVENLABS_VOICE_ID=<voice-id>

Run it!

In one terminal, run the webcam capture:

python capture.py

In another terminal, run the narrator:

python narrator.py


narrator's Issues

Can't give descriptions of individuals in the image provided

πŸŽ™οΈ David says:
As I am not allowed to provide descriptions or any other details regarding the individuals in the image you’ve provided, I can't assist with your request. If you have a different kind of inquiry not involving personal details, feel free to ask!

Any way around this? I assume this is a new limitation of OpenAI?

SimpleAudio incompatible

Trying to run narrator and getting this issue:

import simpleaudio as sa
File "/Users/jakeboyles/Documents/repos/narrator/venv/lib/python3.11/site-packages/simpleaudio/__init__.py", line 1, in <module>
from simpleaudio.shiny import *
File "/Users/jakeboyles/Documents/repos/narrator/venv/lib/python3.11/site-packages/simpleaudio/shiny.py", line 5, in <module>
import simpleaudio._simpleaudio as _sa
ImportError: dlopen(/Users/jakeboyles/Documents/repos/narrator/venv/lib/python3.11/site-packages/simpleaudio/_simpleaudio.cpython-311-darwin.so, 0x0002): tried: '/Users/jakeboyles/Documents/repos/narrator/venv/lib/python3.11/site-packages/simpleaudio/_simpleaudio.cpython-311-darwin.so' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64')), '/System/Volumes/Preboot/Cryptexes/OS/Users/jakeboyles/Documents/repos/narrator/venv/lib/python3.11/site-packages/simpleaudio/_simpleaudio.cpython-311-darwin.so' (no such file), '/Users/jakeboyles/Documents/repos/narrator/venv/lib/python3.11/site-packages/simpleaudio/_simpleaudio.cpython-311-darwin.so' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64'))

I tried arch -arm64 pip install simpleaudio, but it didn't work.
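The dlopen message above says the compiled extension is x86_64 while the running process needs arm64. A minimal sketch for checking which architecture the active interpreter reports, using only the standard library; `describe_interpreter` is a hypothetical helper name, not part of this repo.

```python
import platform
import sys


def describe_interpreter() -> dict:
    """Return basic facts about the Python interpreter that is running this code."""
    return {
        "executable": sys.executable,         # path of the python binary in use
        "machine": platform.machine(),        # for example "arm64" or "x86_64"
        "python": platform.python_version(),  # for example "3.11.4"
    }


for key, value in describe_interpreter().items():
    print(f"{key}: {value}")
```

If the interpreter inside the venv reports arm64, the x86_64 build of simpleaudio was probably installed by a pip running under a different architecture. Reinstalling from that same interpreter with python -m pip install --force-reinstall --no-cache-dir simpleaudio may help, though this is untested here.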

TypeError: str expected, not NoneType

When I run narrator.py I get the following:

Traceback (most recent call last):
File "/Users/ahmed/Desktop/Dev/narrate/narrator/narrator.py", line 12, in <module>
set_api_key(os.environ.get("ELEVENLABS_API_KEY"))
File "/Users/ahmed/Desktop/Dev/narrate/venv/lib/python3.10/site-packages/elevenlabs/simple.py", line 17, in set_api_key
os.environ["ELEVEN_API_KEY"] = api_key
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/os.py", line 684, in __setitem__
value = self.encodevalue(value)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/os.py", line 756, in encode
raise TypeError("str expected, not %s" % type(value).__name__)
TypeError: str expected, not NoneType
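In the traceback above, os.environ.get("ELEVENLABS_API_KEY") returned None, which means the variable was not set in the shell that launched the script. A small sketch that reports missing variables before anything else runs; the variable names come from the README, while `missing_env_vars` is a hypothetical helper.

```python
import os

# Names taken from the README's setup section.
REQUIRED_VARS = ("OPENAI_API_KEY", "ELEVENLABS_API_KEY", "ELEVENLABS_VOICE_ID")


def missing_env_vars(names=REQUIRED_VARS, environ=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


missing = missing_env_vars()
print("Missing:", ", ".join(missing) if missing else "none")
```

Running this in the same terminal as narrator.py shows whether the exports actually reached that session.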

Issue with Elevenlabs keys & ids

I'm able to get everything to work, but I run into problems with the Eleven Labs voice reading the text.
I do have a paid account and the voice ID, but I can't get it recognized.

I get the following error after it displays text that correctly describes the image.
I followed the @mgennings instructions for creating a .env file to help streamline that process of setting the keys, but still no luck.

  File "C:\narrator-main\narrator-main\narrator.py", line 105, in <module>
    main()
  File "C:\narrator-main\narrator-main\narrator.py", line 96, in main
    play_audio(analysis)
  File "C:\narrator-main\narrator-main\narrator.py", line 31, in play_audio
    audio = generate(text, voice=os.environ.get("ELEVENLABS_VOICE_ID"))
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\exhibits\AppData\Roaming\Python\Python312\site-packages\elevenlabs\simple.py", line 61, in generate
    assert isinstance(voice, Voice)
           ^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError

UnicodeEncodeError: 'ascii' codec can't encode character '\u201d' in position 58: ordinal not in range(128)

Got this error but no clue...anyone?

👀 David is watching...
Traceback (most recent call last):
File "/narrator/narrator.py", line 102, in <module>
main()
File "/narrator/narrator.py", line 88, in main
analysis = analyze_image(base64_image, script=script)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/narrator/narrator.py", line 57, in analyze_image
response = client.chat.completions.create(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/miniconda3/envs/narrator/lib/python3.11/site-packages/openai/_utils/_utils.py", line 299, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/miniconda3/envs/narrator/lib/python3.11/site-packages/openai/resources/chat/completions.py", line 556, in create
return self._post(
^^^^^^^^^^^
File "/miniconda3/envs/narrator/lib/python3.11/site-packages/openai/_base_client.py", line 1055, in post
return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/miniconda3/envs/narrator/lib/python3.11/site-packages/openai/_base_client.py", line 834, in request
return self._request(
^^^^^^^^^^^^^^
File "/miniconda3/envs/narrator/lib/python3.11/site-packages/openai/_base_client.py", line 854, in _request
request = self._build_request(options)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/miniconda3/envs/narrator/lib/python3.11/site-packages/openai/_base_client.py", line 435, in _build_request
headers = self._build_headers(options)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/miniconda3/envs/narrator/lib/python3.11/site-packages/openai/_base_client.py", line 393, in _build_headers
headers = httpx.Headers(headers_dict)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/miniconda3/envs/narrator/lib/python3.11/site-packages/httpx/_models.py", line 70, in init
self._list = [
^
File "/miniconda3/envs/narrator/lib/python3.11/site-packages/httpx/_models.py", line 74, in
normalize_header_value(v, encoding),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/miniconda3/envs/narrator/lib/python3.11/site-packages/httpx/_utils.py", line 53, in normalize_header_value
return value.encode(encoding or "ascii")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'ascii' codec can't encode character '\u201d' in position 58: ordinal not in range(128)
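The failure happens while httpx encodes a request header as ASCII, and '\u201d' is a typographic closing quote. One plausible cause, stated here as an assumption, is that an API key was pasted together with a curly quote. A sketch of how a key could be cleaned and checked before use; `clean_key` and `check_ascii` are hypothetical helpers.

```python
# Typographic quotes that editors and chat apps often substitute for " and '.
SMART_QUOTES = "\u201c\u201d\u2018\u2019"


def clean_key(raw: str) -> str:
    """Strip whitespace and surrounding quotes (straight or curly) from a key."""
    return raw.strip().strip(SMART_QUOTES + "\"'").strip()


def check_ascii(name: str, value: str) -> None:
    """Raise ValueError that names the first non-ASCII character in value."""
    for index, char in enumerate(value):
        if ord(char) > 127:
            raise ValueError(
                f"{name} has non-ASCII character {char!r} at index {index}"
            )


print(clean_key("sk-example\u201d"))  # sk-example
```

Re-exporting the key typed with plain straight quotes, or with no quotes at all, avoids the problem at the source.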

API Keys

Where can I add my API keys so that I don't need to export them every time I launch the project?
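One common answer is a .env file next to the scripts that is read at startup. The sketch below is a minimal standard-library loader written for illustration; `load_env_file` is a hypothetical helper and not something this repo ships.

```python
import os
from pathlib import Path


def load_env_file(path=".env", environ=None):
    """Copy KEY=VALUE lines from a file into the environment.

    Blank lines and lines starting with '#' are skipped. Variables that are
    already set are left untouched. Returns the names that were added.
    """
    env = os.environ if environ is None else environ
    added = []
    file = Path(path)
    if not file.is_file():
        return added
    for line in file.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        value = value.strip().strip("\"'")  # drop optional surrounding quotes
        if key and key not in env:
            env[key] = value
            added.append(key)
    return added
```

Calling load_env_file() at the top of narrator.py, before the keys are read, would make the exports unnecessary. The python-dotenv package provides a more complete version of the same idea. Keep the .env file out of version control.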

Reduce Rate Requests - errors out with pro plan

I have a paid account with OpenAI but receive this error message, suggesting I am exceeding my rate limit.

The limits for gpt-4 (https://platform.openai.com/account/limits) are:

gpt-4: 10,000 TPM, 3 RPM, 200 RPD

My opportunity is:

  • I have the pro account and want to use the repo
  • I cannot use the repo right now.
, line 877, in _request
    raise self._make_status_error_from_response(err.response) from None
openai.RateLimitError: Error code: 429 - {'error': {'message': 'You exceeded your current quota, please check your plan and billing details.', 'type': 'insufficient_quota', 'param': None, 'code': 'insufficient_quota'}}
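The response above has code 'insufficient_quota', which points at billing or credits rather than at requests per minute, even though both arrive as HTTP 429. A sketch that separates the two cases, assuming the error body has the shape shown in the log; `classify_429` is a hypothetical helper.

```python
def classify_429(error_body: dict) -> str:
    """Classify the JSON body of an HTTP 429 response as 'quota' or 'rate_limit'."""
    code = (error_body.get("error") or {}).get("code")
    if code == "insufficient_quota":
        return "quota"       # retrying will not help; check plan and billing
    return "rate_limit"      # waiting and retrying may help


body = {
    "error": {
        "message": "You exceeded your current quota, please check your plan and billing details.",
        "type": "insufficient_quota",
        "param": None,
        "code": "insufficient_quota",
    }
}
print(classify_429(body))  # quota
```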

Beginner's issue

Hello,

This is my first ever "programming" experience and I am having some issues, here is what I get when I try to run narrator.py

What can I do to solve this?

👀 David is watching...
Traceback (most recent call last):
File "/Users/ME/projectai/narrator/narrator.py", line 102, in <module>
main()
File "/Users/ME/projectai/narrator/narrator.py", line 88, in main
analysis = analyze_image(base64_image, script=script)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/ME/projectai/narrator/narrator.py", line 57, in analyze_image
response = client.chat.completions.create(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/anaconda3/envs/narrator/lib/python3.12/site-packages/openai/_utils/_utils.py", line 299, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/opt/anaconda3/envs/narrator/lib/python3.12/site-packages/openai/resources/chat/completions.py", line 556, in create
return self._post(
^^^^^^^^^^^
File "/opt/anaconda3/envs/narrator/lib/python3.12/site-packages/openai/_base_client.py", line 1055, in post
return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/anaconda3/envs/narrator/lib/python3.12/site-packages/openai/_base_client.py", line 834, in request
return self._request(
^^^^^^^^^^^^^^
File "/opt/anaconda3/envs/narrator/lib/python3.12/site-packages/openai/_base_client.py", line 877, in _request
raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'error': {'message': 'The model gpt-4-vision-preview does not exist or you do not have access to it. Learn more: https://help.openai.com/en/articles/7102672-how-can-i-access-gpt-4.', 'type': 'invalid_request_error', 'param': None, 'code': 'model_not_found'}}
(narrator) ME@MacBook-Pro-de-ME narrator %

Scripts running fine - but audio not playing

Everything seems to be working:
Getting "📸 Say cheese! Saving frame." and "🎙️ David says:" in the terminal in VSC.

But no audio is playing, not sure how to trouble-shoot. Any ideas? 😊

Blank image from capture

I am getting a weird blank image from capture. It is not what my webcam actually sees, and the webcam is not activated while capture runs. Any ideas?

uhm, just a noob who needs help

What Python version do I need for this?

simpleaudio needs 3.8, but I found that another module requires 3.9, so I was not able to get it running. Any help is greatly appreciated.

API KEY error

Hello,
Is anyone else having a similar problem? I entered the API key for OpenAI as an environment variable with the cmd command setx OPENAI_API_KEY "sk-nU..." and also tried adding a system variable, but I always get the same error when launching narrator.py:

openai.OpenAIError: The api_key client option must be set either by passing api_key to the client or
by setting the OPENAI_API_KEY environment variable

Any tips on what the problem might be?

No such file or directory: './frames/frame.jpg'

capture.py seems to be capturing the images (webcam is on, console repeating "Say Cheese").

However, when running narrator.py in a different terminal, I get an error message saying that frames/frame.jpg doesn't exist.

I do not see this directory or these files.
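The path './frames/frame.jpg' is relative, so each script resolves it against the directory it was started from. A sketch of two small guards, assuming the folder and file names from the error message; `ensure_dir` and `wait_for_file` are hypothetical helpers.

```python
import os
import time


def ensure_dir(folder="frames"):
    """Create the folder if needed and return its absolute path."""
    os.makedirs(folder, exist_ok=True)
    return os.path.abspath(folder)


def wait_for_file(path, timeout=10.0, poll=0.2):
    """Return True once path exists and is non-empty, False after timeout seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.isfile(path) and os.path.getsize(path) > 0:
            return True
        time.sleep(poll)
    return False
```

Printing os.path.abspath("frames") from both terminals is a quick way to confirm that capture.py and narrator.py are looking at the same location.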

Pydantic Version Error?

I get the following error when I run it, do I need to change the version of pydantic or elevenlabs for it to work?

Traceback (most recent call last):
File "/Users/Documents/GitHub/narrator/narrator.py", line 8, in <module>
from elevenlabs import generate, play, set_api_key, voices
File "/Users/anaconda3/lib/python3.11/site-packages/elevenlabs/__init__.py", line 1, in <module>
from .api import * # noqa F403
^^^^^^^^^^^^^^^^^^
File "/Users/anaconda3/lib/python3.11/site-packages/elevenlabs/api/__init__.py", line 2, in <module>
from .history import * # noqa F403
^^^^^^^^^^^^^^^^^^^^^^
File "/Users/anaconda3/lib/python3.11/site-packages/elevenlabs/api/history.py", line 6, in <module>
from pydantic import model_validator
ImportError: cannot import name 'model_validator' from 'pydantic' (/Users/anaconda3/lib/python3.11/site-packages/pydantic/__init__.cpython-311-darwin.so)

A voice for the voice_id was not found

Hi! Thanks for hacking this one, it's super cool :)

Narrator doesn't work for me; the ElevenLabs API always returns

elevenlabs.api.error.APIError: A voice for the voice_id XXX was not found.

where XXX is the voice ID of my freshly created voice.

Any ideas? I can't find the error in their docs.

AssertionError

My apologies if this is a silly problem with an easy fix, but I promise I have googled it and could not fix it.
The capture.py runs smoothly for me, but the narrator.py throws the following error; and it does not depend on the Eleven Labs audio id that I use.

I'd really appreciate any help you'd be able to offer :)

File "\narrator-main\narrator.py", line 114, in <module>
main()

File "\narrator-main\narrator.py", line 105, in main
play_audio(analysis)

File "\narrator-main\narrator.py", line 40, in play_audio
audio = generate(text, voice=os.environ.get("7Wqa3tuynJ4uUcRnTwAI"))

File "\anaconda3\lib\site-packages\elevenlabs\simple.py", line 61, in generate
assert isinstance(voice, Voice)

AssertionError
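In the traceback above, the voice ID itself is passed to os.environ.get as if it were a variable name. That lookup returns None, and generate then fails its isinstance check. A sketch of the difference, using a made-up ID; `resolve_voice_id` is a hypothetical helper.

```python
import os


def resolve_voice_id(environ=None, name="ELEVENLABS_VOICE_ID"):
    """Look the voice ID up by variable name and fail early if it is missing."""
    env = os.environ if environ is None else environ
    voice_id = env.get(name)
    if not voice_id:
        raise RuntimeError(f"{name} is not set")
    return voice_id


# Hypothetical environment used only for illustration.
demo_env = {"ELEVENLABS_VOICE_ID": "example-voice-id"}
print(resolve_voice_id(demo_env))        # example-voice-id
print(demo_env.get("example-voice-id"))  # None: the ID is not a variable name
```

The line in narrator.py should keep the variable name, os.environ.get("ELEVENLABS_VOICE_ID"), with the ID supplied through the export step in the README.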

FFMPEG not found

Getting this when running narrator:

ValueError: ffplay from ffmpeg not found, necessary to play audio. On mac you can install it with 'brew install ffmpeg'. On linux and windows you can install it from https://ffmpeg.org/

Shouldn't ffmpeg be in the requirements.txt? Where should the executables be placed?
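ffmpeg is a system program rather than a Python package, so pip cannot install it from requirements.txt. Its executables need to sit in a directory listed on PATH. A sketch for checking that from Python with the standard library; `find_executable` is a hypothetical helper.

```python
import shutil


def find_executable(name="ffplay"):
    """Return the full path of an executable found on PATH, or None if absent."""
    return shutil.which(name)


path = find_executable()
print(path if path else "ffplay not found on PATH; install ffmpeg first")
```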

Thanks!

Is Replicate necessary?

The README directs users to create a Replicate account and set an API key, but I skipped those steps and (other issues aside), the script worked fine. If that's true, the README can be simplified to remove the Replicate steps.

Context only tracks assistant

script = script + [{"role": "assistant", "content": analysis}]

I'm not sure the best way to include user prompts in the messages history here, since you don't want to include the actual image every time, but a placeholder of some sort that shows the LLM was prompted into the given response may help consistency and avoid an 'intro' every time, I'm not certain.

Something like:
{
"role": "user",
"content": [
{"type": "text", "text": "Describe this image (user uploaded image)"},
],
},
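The suggestion above can be sketched as a helper that appends a text-only user turn before each assistant reply, so the history alternates roles without resending images. The placeholder wording and the name `extend_script` are assumptions made for illustration.

```python
# Stand-in text for the user turn; the real image is not stored in the history.
PLACEHOLDER = "Describe this image (image omitted from history)"


def extend_script(script, analysis, placeholder=PLACEHOLDER):
    """Return a new history with a text-only user turn followed by the reply."""
    return script + [
        {"role": "user", "content": [{"type": "text", "text": placeholder}]},
        {"role": "assistant", "content": analysis},
    ]


history = extend_script([], "Here we observe a human in its natural habitat.")
print([message["role"] for message in history])  # ['user', 'assistant']
```

Whether this actually reduces the repeated intro is not verified here; it only keeps the message list in the user/assistant order the model was prompted with.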

Build error on macOS- clang: error: invalid arch name '-arch root:xnu-10002.60.71.505.1~3/RELEASE_ARM64_T6000'

And people wonder why Nix is necessary... /eyeroll

Building wheels for collected packages: simpleaudio
  Building wheel for simpleaudio (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> [22 lines of output]
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build/lib.macosx-10.9-universal2-cpython-39
      creating build/lib.macosx-10.9-universal2-cpython-39/simpleaudio
      copying simpleaudio/__init__.py -> build/lib.macosx-10.9-universal2-cpython-39/simpleaudio
      copying simpleaudio/shiny.py -> build/lib.macosx-10.9-universal2-cpython-39/simpleaudio
      copying simpleaudio/functionchecks.py -> build/lib.macosx-10.9-universal2-cpython-39/simpleaudio
      creating build/lib.macosx-10.9-universal2-cpython-39/simpleaudio/test_audio
      copying simpleaudio/test_audio/c.wav -> build/lib.macosx-10.9-universal2-cpython-39/simpleaudio/test_audio
      copying simpleaudio/test_audio/e.wav -> build/lib.macosx-10.9-universal2-cpython-39/simpleaudio/test_audio
      copying simpleaudio/test_audio/g.wav -> build/lib.macosx-10.9-universal2-cpython-39/simpleaudio/test_audio
      copying simpleaudio/test_audio/left_right.wav -> build/lib.macosx-10.9-universal2-cpython-39/simpleaudio/test_audio
      copying simpleaudio/test_audio/notes_2_16_44.wav -> build/lib.macosx-10.9-universal2-cpython-39/simpleaudio/test_audio
      running build_ext
      building 'simpleaudio._simpleaudio' extension
      creating build/temp.macosx-10.9-universal2-cpython-39
      creating build/temp.macosx-10.9-universal2-cpython-39/c_src
      clang -Wno-unused-result -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -iwithsysroot/System/Library/Frameworks/System.framework/PrivateHeaders -iwithsysroot/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.9/Headers -Werror=implicit-function-declaration -Wno-error=unreachable-code -arch root:xnu-10002.60.71.505.1~3/RELEASE_ARM64_T6000 -DDEBUG=0 -I/Users/pmarreck/Documents/narrator/venv/include -I/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/include/python3.9 -c c_src/posix_mutex.c -o build/temp.macosx-10.9-universal2-cpython-39/c_src/posix_mutex.o -mmacosx-version-min=10.6
      clang: error: invalid arch name '-arch root:xnu-10002.60.71.505.1~3/RELEASE_ARM64_T6000'
      error: command '/usr/bin/clang' failed with exit code 1
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for simpleaudio
  Running setup.py clean for simpleaudio
Failed to build simpleaudio
ERROR: Could not build wheels for simpleaudio, which is required to install pyproject.toml-based projects

No module named 'PIL'

python capture.py
Traceback (most recent call last):
  File "/Users/jr/Documents/hobby/narrator/capture.py", line 3, in <module>
    from PIL import Image
ModuleNotFoundError: No module named 'PIL'

Not entirely sure what the issue is, wrong Python version? I'm on Python 3.11.4. Tried installing PIL without success:

pip3 install PIL
ERROR: Could not find a version that satisfies the requirement PIL (from versions: none)
ERROR: No matching distribution found for PIL
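The PIL module is distributed on PyPI under the name Pillow, which is why pip finds nothing called PIL. A sketch that checks whether a module is importable and suggests the pip name when it is not; `install_hint` and the mapping are illustrative, not part of the repo.

```python
import importlib.util

# Import name -> name to pass to pip, for packages where the two differ.
PIP_NAMES = {"PIL": "Pillow"}


def install_hint(module_name):
    """Return None if the module is importable, else a pip command to try."""
    if importlib.util.find_spec(module_name) is not None:
        return None
    package = PIP_NAMES.get(module_name, module_name)
    return f"python -m pip install {package}"


print(install_hint("PIL") or "PIL is already importable")
```

Running pip through python -m pip inside the activated venv ensures the package lands in the same interpreter that runs capture.py.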

ELEVEN_API_KEY, not ELEVENLABS_API_KEY

In README.md,
change:

export OPENAI_API_KEY=<token>
export ELEVENLABS_API_KEY=<eleven-token>

to

export OPENAI_API_KEY=<token>
export ELEVEN_API_KEY=<eleven-token>
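Since both spellings appear in this thread, one option is to accept either name when reading the key. A sketch under that assumption; `eleven_api_key` is a hypothetical helper, and which name a given elevenlabs release reads by default is not confirmed here.

```python
import os


def eleven_api_key(environ=None):
    """Return the ElevenLabs key from either variable name, or None if neither is set."""
    env = os.environ if environ is None else environ
    return env.get("ELEVEN_API_KEY") or env.get("ELEVENLABS_API_KEY")


print("key found" if eleven_api_key() else "no ElevenLabs key set")
```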

Simple guide

This is a very cool project. Is there a complete step-by-step guide for this? All the additional bits (setting up Replicate, the voice in ElevenLabs, etc.) have lost me a bit...

🤞🤘
