Topic: multimodal (Goto Github)
A curated list of interesting projects related to multimodal AI.
multimodal,Conversational AI SDK for Android to enable text and voice conversations with actions (Java, Kotlin)
Organization: alan-ai
Home Page: https://alan.app/
multimodal,Conversational AI SDK for Apache Cordova to enable text and voice conversations with actions (iOS and Android)
Organization: alan-ai
multimodal,Conversational AI SDK for Flutter to enable text and voice conversations with actions (iOS and Android)
Organization: alan-ai
Home Page: https://alan.app
multimodal,Conversational AI SDK for Ionic to enable text and voice conversations with actions (React, Angular, Vue)
Organization: alan-ai
Home Page: https://alan.app
multimodal,A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
Organization: alibabaresearch
multimodal,Reasoning in Large Language Models: Papers and Resources, including Chain-of-Thought, Instruction-Tuning and Multimodality.
User: atfortes
multimodal,Images to inference with no labeling (use foundation models to train supervised models).
Organization: autodistill
Home Page: https://docs.autodistill.com
multimodal,The easiest way to serve AI apps and models - Build reliable Inference APIs, LLM apps, Multi-model chains, RAG service, and much more!
Organization: bentoml
Home Page: https://bentoml.com
multimodal,Represent, send, store and search multimodal data
Organization: docarray
Home Page: https://docs.docarray.org/
multimodal,Generative AI suite powered by state-of-the-art models and providing advanced AI/AGI functions. It features AI personas, AGI functions, multi-model chats, text-to-image, voice, response streaming, code highlighting and execution, PDF import, presets for developers, much more. Deploy on-prem or in the cloud.
User: enricoros
Home Page: https://big-agi.com
multimodal,A curated list of Multimodal Related Research.
User: eurus-holmes
multimodal,A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
Organization: facebookresearch
Home Page: https://mmf.sh/
multimodal,WIT (Wikipedia-based Image Text) Dataset is a large multimodal multilingual dataset comprising 37M+ image-text sets with 11M+ unique images across 100+ languages.
Organization: google-research-datasets
Home Page: https://github.com/google-research-datasets/wit
multimodal,[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
User: haotian-liu
Home Page: https://llava.hliu.cc
multimodal,Fengshenbang-LM (封神榜大模型) is an open-source large model ecosystem led by the Cognitive Computing and Natural Language Research Center of the IDEA Research Institute, serving as infrastructure for Chinese AIGC and cognitive intelligence.
Organization: idea-ccnl
multimodal,HuixiangDou: Overcoming Group Chat Scenarios with LLM-based Technical Assistance
Organization: internlm
Home Page: https://openxlab.org.cn/apps/detail/tpoisonooo/huixiangdou-web
multimodal,InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
Organization: internlm
multimodal,Meta-Transformer for Unified Multimodal Learning
User: invictus717
Home Page: https://arxiv.org/abs/2307.10802
multimodal,🪩 Create Disco Diffusion artworks in one line
Organization: jina-ai
multimodal,☁️ Build multimodal AI applications with cloud-native stack
Organization: jina-ai
Home Page: https://docs.jina.ai
multimodal,Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in PyTorch
User: kyegomez
Home Page: https://discord.gg/qUtxnK2NMf
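The BitNet paper the entry above refers to replaces full-precision weight matrices with sign-binarized ones plus a single scaling factor. A minimal numpy sketch of that quantization idea (centering, sign binarization, and an absmean scale) is below; this is an illustration of the technique, not the repository's actual implementation.

```python
import numpy as np

def binarize_weights(w: np.ndarray):
    """Quantize a weight matrix to {-1, +1} with one per-matrix scale,
    in the spirit of BitNet's 1-bit weight scheme (illustrative sketch)."""
    alpha = w.mean()                  # centering term
    beta = np.abs(w - alpha).mean()   # absmean scaling factor
    w_bin = np.sign(w - alpha)
    w_bin[w_bin == 0] = 1.0           # map exact zeros to +1
    return w_bin, beta

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4))
w_bin, beta = binarize_weights(w)
# the matmul then uses the dequantized approximation beta * w_bin
```

The payoff is that `w_bin` needs only one bit per weight, with `beta` restoring the overall magnitude at inference time.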
multimodal,The Enterprise-Grade Production-Ready Multi-Agent Orchestration Framework. Join our Community: https://discord.com/servers/agora-999382051935506503
User: kyegomez
Home Page: https://docs.swarms.world
multimodal,Plug-and-play implementation of Tree of Thoughts: Deliberate Problem Solving with Large Language Models that elevates model reasoning by at least 70%
User: kyegomez
Home Page: https://discord.gg/qUtxnK2NMf
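At its core, Tree of Thoughts is a search over intermediate "thoughts": expand each partial solution into candidates, score them, and keep only the best few at each depth. The sketch below shows that loop with plain callables standing in for the LLM-based `expand` and `score` steps; it is a toy illustration of the search pattern, not the repository's API.

```python
from typing import Callable, List

def tree_of_thoughts(
    root: str,
    expand: Callable[[str], List[str]],   # propose candidate next thoughts
    score: Callable[[str], float],        # evaluate a partial solution
    beam_width: int = 2,
    depth: int = 3,
) -> str:
    """Breadth-first search over thoughts: expand every frontier state,
    then keep only the top-scoring candidates (the beam) at each level."""
    frontier = [root]
    for _ in range(depth):
        candidates = [c for state in frontier for c in expand(state)]
        if not candidates:
            break
        candidates.sort(key=score, reverse=True)
        frontier = candidates[:beam_width]
    return max(frontier, key=score)

# toy usage: grow the largest 3-digit string, scored by integer value
best = tree_of_thoughts(
    "",
    expand=lambda s: [s + d for d in "129"],
    score=lambda s: int(s) if s else 0,
)
# best is "999"
```

In the real setting, `expand` would prompt an LLM for candidate continuations and `score` would prompt it to rate each partial solution.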
multimodal,Library to build personalized AI powered by what you've seen, said, or heard. Works with Ollama. Alternative to Rewind.ai. Open. Secure. You own your data. Rust.
User: louis030195
Home Page: https://screenpi.pe
multimodal,Curated tutorials and resources for Large Language Models, AI Painting, and more.
Organization: luban-agi
multimodal,Implementation of CoCa, Contrastive Captioners are Image-Text Foundation Models, in PyTorch
User: lucidrains
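The "contrastive" half of CoCa builds on the symmetric InfoNCE objective used by CLIP-style image-text models: matching image/text pairs sit on the diagonal of a similarity matrix and are pushed apart from all other pairs in the batch. The numpy sketch below shows that loss; it is a generic illustration of the objective, not code from this repository.

```python
import numpy as np

def contrastive_loss(img: np.ndarray, txt: np.ndarray, temperature: float = 0.07) -> float:
    """Symmetric InfoNCE over a batch of paired image/text embeddings.
    Row i of `img` is assumed to match row i of `txt`."""
    img = img / np.linalg.norm(img, axis=1, keepdims=True)
    txt = txt / np.linalg.norm(txt, axis=1, keepdims=True)
    logits = img @ txt.T / temperature          # (batch, batch) similarities
    labels = np.arange(len(img))                # matching pairs on the diagonal

    def ce(l: np.ndarray) -> float:
        # cross-entropy with diagonal targets (numerically stable softmax)
        p = np.exp(l - l.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        return -np.log(p[labels, labels]).mean()

    return (ce(logits) + ce(logits.T)) / 2      # image->text and text->image

# perfectly aligned pairs give a near-zero loss; shuffled pairs do not
loss_aligned = contrastive_loss(np.eye(3), np.eye(3))
loss_shuffled = contrastive_loss(np.eye(3), np.roll(np.eye(3), 1, axis=0))
```

CoCa combines this with a captioning (next-token prediction) loss on the text decoder.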
multimodal,Foundation Architecture for (M)LLMs
Organization: microsoft
Home Page: https://aka.ms/GeneralAI
multimodal,Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Organization: microsoft
Home Page: https://aka.ms/GeneralAI
multimodal,The all-in-one Desktop & Docker AI application with full RAG and AI Agent capabilities.
Organization: mintplex-labs
Home Page: https://anythingllm.com
multimodal,Use PEFT or Full-parameter to finetune 300+ LLMs or 60+ MLLMs. (Qwen2, GLM4v, Internlm2.5, Yi, Llama3.1, Llava-Video, Internvl2, MiniCPM-V-2.6, Deepseek, Baichuan2, Gemma2, Phi3-Vision, ...)
Organization: modelscope
Home Page: https://swift.readthedocs.io/zh-cn/latest/LLM/index.html
multimodal,Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model
User: next-gpt
Home Page: https://next-gpt.github.io/
multimodal,A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Organization: nvidia
Home Page: https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html
multimodal,Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
Organization: ofa-sys
multimodal,OpenMMLab Pre-training Toolbox and Benchmark
Organization: open-mmlab
Home Page: https://mmpretrain.readthedocs.io/en/latest/
multimodal,Multimodal-GPT
Organization: open-mmlab
multimodal,[ICLR'24 spotlight] Chinese and English Multimodal Large Model Series (Chat and Paint) | A bilingual Chinese-English multimodal large model series built on the CPM foundation models
Organization: openbmb
multimodal,InternGPT (iGPT) is an open-source demo platform where you can easily showcase your AI models. It now supports DragGAN, ChatGPT, ImageBind, multimodal chat like GPT-4, SAM, interactive image editing, etc. Try it at igpt.opengvlab.com (an online demo system supporting DragGAN, ChatGPT, ImageBind, and SAM)
Organization: opengvlab
Home Page: https://igpt.opengvlab.com
multimodal,[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
Organization: opengvlab
multimodal,Visualize streams of multimodal data. Fast, easy to use, and simple to integrate. Built in Rust using egui.
Organization: rerun-io
Home Page: https://rerun.io/
multimodal,Easily compute clip embeddings and build a clip retrieval system with them
User: rom1504
Home Page: https://rom1504.github.io/clip-retrieval/
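The core operation behind a CLIP retrieval system like the one above is nearest-neighbor search over normalized embeddings: encode the query, then rank the index by cosine similarity. The numpy sketch below shows that ranking step with toy 2-D vectors standing in for real CLIP embeddings; it illustrates the idea rather than the clip-retrieval package's API.

```python
import numpy as np

def cosine_retrieve(query: np.ndarray, index: np.ndarray, k: int = 3) -> np.ndarray:
    """Return indices of the k index vectors most similar to the query,
    ranked by cosine similarity."""
    q = query / np.linalg.norm(query)
    x = index / np.linalg.norm(index, axis=1, keepdims=True)
    sims = x @ q                      # cosine similarity to each index vector
    return np.argsort(-sims)[:k]      # best matches first

# toy "embedding" index of 4 items
index = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]])
top = cosine_retrieve(np.array([1.0, 0.1]), index, k=2)
# top ranks item 0 first, then item 2
```

At scale, the brute-force `x @ q` is replaced by an approximate nearest-neighbor index (e.g. FAISS) built over precomputed CLIP embeddings.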
multimodal,Easily turn large sets of image URLs into an image dataset. Can download, resize, and package 100M URLs in 20h on one machine.
User: rom1504
multimodal,This repository is a curated collection of links to various courses and resources about Artificial Intelligence (AI)
User: skalskip
multimodal,SDK for interacting with stability.ai APIs (e.g. stable diffusion inference)
Organization: stability-ai
Home Page: https://platform.stability.ai/
multimodal,Notes for software engineers getting up to speed on new AI developments. Serves as a datastore for https://latent.space writing and product brainstorming, with cleaned-up canonical references under the /Resources folder.
User: swyxio
Home Page: https://latent.space/
multimodal,Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️
Organization: unum-cloud
Home Page: https://unum-cloud.github.io/uform/
multimodal,Mobile-Agent: The Powerful Mobile Device Operation Assistant Family
Organization: x-plug
Home Page: https://arxiv.org/abs/2406.01014
multimodal,mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
Organization: x-plug
multimodal,mPLUG-Owl: The Powerful Multi-modal Large Language Model Family
Organization: x-plug
Home Page: https://www.modelscope.cn/studios/damo/mPLUG-Owl
multimodal,OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
Organization: xlang-ai
Home Page: https://os-world.github.io
multimodal,(ෆ`꒳´ෆ) A Survey on Text-to-Image Generation/Synthesis.
User: yutong-zhou-cv