internchat's Introduction

The project is still under construction. We will continue to update it, and we welcome contributions and pull requests from the community.

InternChat [paper]

InternChat (iChat for short) is a pointing-language-driven visual interactive system. The name InternChat stands for interaction, nonverbal, and chatbots. Unlike existing interactive systems that rely on language alone, iChat incorporates pointing instructions, which significantly improves the efficiency of communication between users and chatbots as well as the accuracy of chatbots on vision-centric tasks, especially in complicated visual scenarios. In addition, iChat uses an auxiliary control mechanism to improve the control capability of the LLM, and a large vision-language model termed Husky is fine-tuned for high-quality multi-modal dialogue (impressing ChatGPT-3.5-turbo with 93.89% GPT-4 Quality).

Online Demo

InternChat is online. Let's try it!

online_demo.mp4

Schedule

  • Support Chinese
  • Support MOSS
  • More powerful foundation models based on InternImage and InternVideo
  • More accurate interactive experience
  • Web Page & Code Generation
  • Support voice assistant
  • Support click interaction
  • Interactive image editing
  • Interactive image generation
  • Interactive visual question answering
  • Segment Anything
  • Image inpainting
  • Image caption
  • Image matting
  • Optical character recognition
  • Action recognition
  • Video caption
  • Video dense caption
  • Video highlight interpretation

System Overview

Logo

🎁 Major Features

(a) Remove the masked object

(b) Interactive image editing

(c) Image generation

(d) Interactive visual question answering

(e) Interactive image generation

(f) Video highlight interpretation

🛠️ Installation

Basic requirements

  • Linux
  • Python 3.8+
  • PyTorch 1.12+
  • CUDA 11.6+
  • GCC & G++ 5.4+
  • GPU memory >= 17 GB for loading the basic tools (HuskyVQA, SegmentAnything, ImageOCRRecognition)
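The requirements above can be checked before installing anything else. The following is a minimal sketch, not part of the InternChat code base: the function name `check_basic_requirements` and its parameters are hypothetical, and it assumes that "GPU memory" refers to the total memory of CUDA device 0. The CUDA and GCC versions are not checked here.

```python
import platform
import sys


def check_basic_requirements(min_python=(3, 8), min_gpu_mem_gb=17):
    """Return a list of problems; an empty list means every check passed.

    Hypothetical helper for illustration only, not shipped with InternChat.
    """
    problems = []
    if platform.system() != "Linux":
        problems.append(f"expected Linux, found {platform.system()}")
    if sys.version_info[:2] < min_python:
        problems.append(f"Python {min_python[0]}.{min_python[1]}+ is required")
    try:
        import torch  # only inspected when it is already installed
    except ImportError:
        problems.append("PyTorch is not installed")
        return problems
    if not torch.cuda.is_available():
        problems.append("no CUDA device is visible to PyTorch")
        return problems
    # Assumption: the 17 GB figure is compared against device 0's total memory.
    total_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
    if total_gb < min_gpu_mem_gb:
        problems.append(f"GPU 0 has {total_gb:.1f} GB, need >= {min_gpu_mem_gb} GB")
    return problems


if __name__ == "__main__":
    for problem in check_basic_requirements():
        print("WARNING:", problem)
```

Running the script prints one warning per unmet requirement and nothing when all checks pass.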

Install Python dependencies

pip install -r requirements.txt

Model zoo

Coming soon...

👨‍🏫 Get Started

Run the following command to start a Gradio service:

python -u iChatApp.py --load "HuskyVQA_cuda:0,SegmentAnything_cuda:0,ImageOCRRecognition_cuda:0" --port 3456
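The value passed to `--load` in the command above is a comma-separated list of `ModelName_device` pairs. As a rough illustration of that format, the sketch below splits such a string into a model-to-device mapping. The function `parse_load_spec` is hypothetical and is not taken from `iChatApp.py`; how the application actually parses the flag may differ.

```python
def parse_load_spec(spec):
    """Split 'ModelA_cuda:0,ModelB_cpu' into {'ModelA': 'cuda:0', 'ModelB': 'cpu'}.

    Hypothetical helper for illustration; not the parser used by iChatApp.py.
    """
    mapping = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate trailing commas
        # Split on the last underscore so the device string stays intact.
        name, sep, device = item.rpartition("_")
        if not sep or not name or not device:
            raise ValueError(f"expected 'ModelName_device', got {item!r}")
        mapping[name] = device
    return mapping


if __name__ == "__main__":
    spec = "HuskyVQA_cuda:0,SegmentAnything_cuda:0,ImageOCRRecognition_cuda:0"
    for name, device in parse_load_spec(spec).items():
        print(f"{name} -> {device}")
```

Under this reading of the format, each tool can be assigned to a different device by changing the suffix after its name.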

If you want to enable the voice assistant, use openssl to generate a certificate:

openssl req -x509 -newkey rsa:4096 -keyout ./key.pem -out ./cert.pem -sha256 -days 365 -nodes

Then run:

python -u iChatApp.py --load "HuskyVQA_cuda:0,SegmentAnything_cuda:0,ImageOCRRecognition_cuda:0" --port 3456 --https
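Before starting the service with `--https`, it can help to confirm that the generated certificate and key form a valid pair. The sketch below uses only the Python standard library; the helper name `load_tls_context` is hypothetical, and the default paths simply mirror the file names in the openssl command above. It says nothing about how `iChatApp.py` itself loads the files.

```python
import ssl
from pathlib import Path


def load_tls_context(cert_path="./cert.pem", key_path="./key.pem"):
    """Load a certificate/key pair into a server-side TLS context.

    Hypothetical helper for illustration. Raises FileNotFoundError when a
    file is missing and ssl.SSLError when the key does not match the cert.
    """
    for path in (cert_path, key_path):
        if not Path(path).is_file():
            raise FileNotFoundError(f"missing TLS file: {path}")
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # The key was generated with -nodes, so no password is needed here.
    context.load_cert_chain(certfile=cert_path, keyfile=key_path)
    return context


if __name__ == "__main__":
    load_tls_context()
    print("cert.pem and key.pem loaded successfully")
```

Because the certificate is self-signed, browsers will still show a warning on first visit even when this check succeeds.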

🎫 License

This project is released under the Apache 2.0 license.

🖊️ Citation

If you find this project useful in your research, please consider citing:

@misc{2023internchat,
    title={InternChat: Solving Vision-Centric Tasks by Interacting with Chatbots Beyond Language},
    author={Zhaoyang Liu and Yinan He and Wenhai Wang and Weiyun Wang and Yi Wang and Shoufa Chen and Qinglong Zhang and Yang Yang and Qingyun Li and Jiashuo Yu and Kunchang Li and Zhe Chen and Xue Yang and Xizhou Zhu and Yali Wang and Limin Wang and Ping Luo and Jifeng Dai and Yu Qiao},
    howpublished = {\url{https://arxiv.org/abs/2305.05662}},
    year={2023}
}

🤝 Acknowledgement

Thanks to the following open-source projects:

Hugging Face, LangChain, TaskMatrix, SAM, Stable Diffusion, ControlNet, InstructPix2Pix, BLIP, Latent Diffusion Models, EasyOCR

