This project is forked from mbzuai-oryx/video-chatgpt.

"Video-ChatGPT" is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.

License: Creative Commons Attribution 4.0 International

Video-ChatGPT 🎥 💬

Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models

Installation 🔧

We recommend setting up a conda environment for the project:

conda create --name=video_chatgpt python=3.10
conda activate video_chatgpt

git clone https://github.com/mbzuai-oryx/Video-ChatGPT.git
cd Video-ChatGPT
pip install -r requirements.txt

export PYTHONPATH="./:$PYTHONPATH"
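
Before moving on, it can help to confirm that the environment matches the steps above. The following is a minimal sketch and not part of the repository: the helper name `check_environment` and its return format are hypothetical, and it only checks the two things these instructions set up, namely the Python 3.10 interpreter and a `PYTHONPATH` entry that resolves to the current directory.

```python
import os
import sys


def check_environment(expected=(3, 10), environ=None, cwd=None):
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper for illustration, not part of Video-ChatGPT.
    """
    environ = os.environ if environ is None else environ
    cwd = os.getcwd() if cwd is None else cwd
    problems = []

    # The conda environment above pins Python 3.10.
    if sys.version_info[:2] != expected:
        problems.append(
            f"expected Python {expected[0]}.{expected[1]}, "
            f"found {sys.version_info[0]}.{sys.version_info[1]}"
        )

    # `export PYTHONPATH="./:$PYTHONPATH"` puts the repository root first,
    # so at least one entry should resolve to the current directory.
    entries = [p for p in environ.get("PYTHONPATH", "").split(os.pathsep) if p]
    resolved = {os.path.abspath(os.path.join(cwd, p)) for p in entries}
    if os.path.abspath(cwd) not in resolved:
        problems.append("PYTHONPATH does not include the current directory")

    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("warning:", problem)
```

Run it from the repository root in the same shell where `PYTHONPATH` was exported; no output means both checks passed.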

Additionally, install FlashAttention for training:

pip install ninja

git clone https://github.com/HazyResearch/flash-attention.git
cd flash-attention
git checkout v1.0.7
python setup.py install
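
Once the build finishes, a quick check can confirm that the package is visible to the interpreter. This is a sketch under assumptions: the import name `flash_attn` is assumed here and should be verified against the checked-out tag, and `is_importable` is a hypothetical helper rather than anything shipped with the project.

```python
import importlib.util


def is_importable(module_name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "flash_attn" is the assumed import name of the FlashAttention package.
    name = "flash_attn"
    status = "found" if is_importable(name) else "not found"
    print(f"{name}: {status}")
```

The check only locates the module; it does not verify that the compiled CUDA extensions load, so a real import inside the training environment is still the definitive test.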

Running Demo Offline 💿

To run the demo offline, please refer to the instructions in offline_demo.md.


Training 🚋

For training instructions, check out train_video_chatgpt.md.


Video Instruction Dataset for ADL:

If you want the dataset and features, let me know.

Qualitative Analysis 🔍

A Comprehensive Evaluation of Video-ChatGPT's Performance across Multiple Tasks.

Video Reasoning Tasks 🎥

sample1


Creative and Generative Tasks 🖌️

sample5


Spatial Understanding

sample8


Video Understanding and Conversational Tasks 💬

sample10


Action Recognition 🏃

sample22


Question Answering Tasks ❓

sample14


Temporal Understanding ⏳

sample18


License 📜

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Contributors

mmaaz60, hanoonar, chakrabortyrajatsubhra, ashmalvayani, eltociear
