VisionScriptBot

A Telegram bot that uses Google's Gemini Pro Vision API. Try a demo here. The new version supports prompts along with images: add your prompt in the image caption before uploading the image.

Gemini Pro Vision

Gemini Pro Vision is a large multimodal model in the Gemini family. It takes text and visual input (image and video) and generates relevant text responses.

Gemini Pro Vision is a foundation model that performs well at a variety of multimodal tasks such as visual understanding, classification, summarization, and creating content from image and video. It's adept at processing visual and text inputs such as photographs, documents, infographics, and screenshots.

Gemini API

VisionScriptBot uses Google's new Gemini Pro model.

Gemini is Google's latest family of large language models.

API KEY

You need a Google API key for Gemini to run this model. Get your API key from https://makersuite.google.com/app/apikey

Google's Python SDK for the Gemini API is contained in the google-generativeai package. Install the dependency using pip:

pip install -q -U google-generativeai
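
Once the package is installed, sending an image with a prompt might look like the minimal sketch below. It is not the bot's actual code. The `GOOGLE_API_KEY` environment variable, the `DEFAULT_PROMPT` fallback, and the helper names `build_prompt` and `describe_image` are assumptions made for illustration. The example also needs Pillow to load the image, and the `gemini-pro-vision` model name may no longer be available in current versions of the API.

```python
import os

DEFAULT_PROMPT = "Describe this image."  # hypothetical fallback, not from the project


def build_prompt(caption):
    """Return the image caption as the prompt, or a fallback when it is empty."""
    text = (caption or "").strip()
    return text or DEFAULT_PROMPT


def describe_image(image_path, caption=None):
    """Send one image plus a prompt to Gemini Pro Vision and return the reply text."""
    # Imported lazily so build_prompt works even without the SDK installed.
    import google.generativeai as genai
    from PIL import Image  # Pillow, installed separately: pip install pillow

    # The environment variable name is an assumption; use whatever holds your key.
    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    model = genai.GenerativeModel("gemini-pro-vision")
    with Image.open(image_path) as image:
        response = model.generate_content([build_prompt(caption), image])
    return response.text
```

Calling `describe_image("photo.jpg", "What breed is this dog?")` mirrors the caption-as-prompt behaviour described above: the caption becomes the prompt, and an image sent without a caption falls back to the default.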

for complete guide refer

Deploy

Deployed on Railway.app. Do check out their free hosting plans here.

Use cases

  1. Visual information seeking: Use external knowledge combined with information extracted from the input image or video to answer questions.

  2. Object recognition: Answer questions related to fine-grained identification of the objects in images and videos.

  3. Digital content understanding: Answer questions and extract information from visual content like infographics, charts, figures, tables, and web pages.

  4. Structured content generation: Generate responses based on multimodal inputs in formats like HTML and JSON.

  5. Captioning and description: Generate descriptions of images and videos with varying levels of detail.

  6. Reasoning: Compositionally infer new information without memorization or retrieval.
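
As an illustration of the structured content generation use case, the sketch below pairs a prompt that asks for JSON with a small parser for the reply. The prompt wording, the `{"objects": [...]}` shape, and the `parse_json_reply` helper are hypothetical. The fence handling assumes the model may wrap its answer in a Markdown code fence, which is not guaranteed either way.

```python
import json

FENCE = "`" * 3  # Markdown code fence marker

# Hypothetical prompt wording; send it as the image caption.
JSON_PROMPT = (
    "List the objects visible in this image. "
    'Reply with JSON only, shaped like {"objects": ["..."]}.'
)


def parse_json_reply(reply_text):
    """Parse a model reply as JSON, tolerating an optional Markdown code fence."""
    text = reply_text.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line, then everything from the closing fence on.
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit(FENCE, 1)[0]
    return json.loads(text)
```

`json.loads` raises `json.JSONDecodeError` when the reply is not valid JSON, so a caller would likely want to catch that and fall back to showing the raw text.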

Support

If you find this project useful, do support me here.

Contributors

nuhmanpk
