
zjr2000 / awesome-multimodal-chatbot



general-ai instruction-tuning multimodal vision-language multimodal-dialogue multimodal-assistant chat-application instruction-following chatbot awesome


Awesome-Multimodal-Chatbot

Awesome Multimodal Assistant is a curated list of multimodal chatbots and conversational assistants that combine several modes of interaction, such as text, speech, images, and video, to provide a seamless and versatile user experience. These assistants are designed to help users with tasks ranging from simple information retrieval to complex multimedia reasoning.

Multimodal Instruction Tuning

  • MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning

    arXiv 2022/12 [paper]

  • GPT-4

    arXiv 2023/03 [paper] [blog]

  • Visual Instruction Tuning

    arXiv 2023/04 [paper] [code] [project page] [demo]

  • MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models

    arXiv 2023/04 [paper] [code] [project page] [demo]

  • mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality

    arXiv 2023/04 [paper] [code] [demo]

  • LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model

    arXiv 2023/04 [paper] [code] [demo]

  • Video-LLaMA: An Instruction-Finetuned Visual Language Model for Video Understanding

    [code]

  • LMEye: An Interactive Perception Network for Large Language Models

    arXiv 2023/05 [paper] [code]

  • MultiModal-GPT: A Vision and Language Model for Dialogue with Humans

    arXiv 2023/05 [paper] [code] [demo]

  • X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages

    arXiv 2023/05 [paper] [code] [project page]

  • Otter: A Multi-Modal Model with In-Context Instruction Tuning

    arXiv 2023/05 [paper] [code] [demo]

  • InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning

    arXiv 2023/05 [paper] [code]

  • InternGPT: Solving Vision-Centric Tasks by Interacting with ChatGPT Beyond Language

    arXiv 2023/05 [paper] [code] [demo]

  • VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks

    arXiv 2023/05 [paper] [code]

  • Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models

    arXiv 2023/05 [paper] [code] [project page]

  • EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought

    arXiv 2023/05 [paper] [code] [project page]

  • DetGPT: Detect What You Need via Reasoning

    arXiv 2023/05 [paper] [code] [project page]

  • PathAsst: Redefining Pathology through Generative Foundation AI Assistant for Pathology

    arXiv 2023/05 [paper] [code]

  • ChatBridge: Bridging Modalities with Large Language Model as a Language Catalyst

    arXiv 2023/05 [paper] [code] [project page]

  • Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models

    arXiv 2023/06 [paper] [code]

  • LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark

    arXiv 2023/06 [paper]

  • Accountable Textual-Visual Chat Learns to Reject Human Instructions in Image Re-creation

    arXiv 2023/06 [paper] [project page]

  • Valley: Video Assistant with Large Language Model Enhanced Ability

    arXiv 2023/06 [paper] [code]

LLM-Based Modularized Frameworks

  • Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models

    arXiv 2023/03 [paper] [code] [demo]

  • ViperGPT: Visual Inference via Python Execution for Reasoning

    arXiv 2023/03 [paper] [code] [project page]

  • TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs

    arXiv 2023/03 [paper] [code]

  • ChatGPT Asks, BLIP-2 Answers: Automatic Questioning Towards Enriched Visual Descriptions

    arXiv 2023/03 [paper] [code]

  • MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action

    arXiv 2023/03 [paper] [code] [project page] [demo]

  • HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face

    arXiv 2023/03 [paper] [code] [demo]

  • VLog: Video as a Long Document

    [code] [demo]

  • Video ChatCaptioner: Towards Enriched Spatiotemporal Descriptions

    arXiv 2023/04 [paper] [code]

  • ChatVideo: A Tracklet-centric Multimodal and Versatile Video Understanding System

    arXiv 2023/04 [paper] [project page]

  • VideoChat: Chat-Centric Video Understanding

    arXiv 2023/05 [paper] [code] [demo]

Contributors

feielysia, ttengwang, zjr2000
