Topic: multimodal (Goto Github)
A curated list of interesting projects related to multimodal AI.
multimodal,Conversational AI SDK for Android to enable text and voice conversations with actions (Java, Kotlin)
Organization: alan-ai
Home Page: https://alan.app/
multimodal,Conversational AI SDK for Apache Cordova to enable text and voice conversations with actions (iOS and Android)
Organization: alan-ai
multimodal,Conversational AI SDK for Flutter to enable text and voice conversations with actions (iOS and Android)
Organization: alan-ai
Home Page: https://alan.app
multimodal,Conversational AI SDK for Ionic to enable text and voice conversations with actions (React, Angular, Vue)
Organization: alan-ai
Home Page: https://alan.app
multimodal,A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
Organization: alibabaresearch
multimodal,Reasoning in Large Language Models: Papers and Resources, including Chain-of-Thought, Instruction-Tuning and Multimodality.
User: atfortes
multimodal,Images to inference with no labeling (use foundation models to train supervised models).
Organization: autodistill
Home Page: https://docs.autodistill.com
multimodal,The easiest way to serve AI apps and models - Build reliable Inference APIs, LLM apps, Multi-model chains, RAG service, and much more!
Organization: bentoml
Home Page: https://bentoml.com
multimodal,Represent, send, store and search multimodal data
Organization: docarray
Home Page: https://docs.docarray.org/
multimodal,Generative AI suite powered by state-of-the-art models and providing advanced AI/AGI functions. It features AI personas, AGI functions, multi-model chats, text-to-image, voice, response streaming, code highlighting and execution, PDF import, presets for developers, much more. Deploy on-prem or in the cloud.
User: enricoros
Home Page: https://big-agi.com
multimodal,A curated list of Multimodal Related Research.
User: eurus-holmes
multimodal,A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
Organization: facebookresearch
Home Page: https://mmf.sh/
multimodal,WIT (Wikipedia-based Image Text) Dataset is a large multimodal multilingual dataset comprising 37M+ image-text sets with 11M+ unique images across 100+ languages.
Organization: google-research-datasets
Home Page: https://github.com/google-research-datasets/wit
multimodal,[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
User: haotian-liu
Home Page: https://llava.hliu.cc
multimodal,Fengshenbang-LM (封神榜大模型) is an open-source large model ecosystem led by the Cognitive Computing and Natural Language Research Center of the IDEA Research Institute, serving as infrastructure for Chinese AIGC and cognitive intelligence.
Organization: idea-ccnl
multimodal,HuixiangDou: Overcoming Group Chat Scenarios with LLM-based Technical Assistance
Organization: internlm
Home Page: https://openxlab.org.cn/apps/detail/tpoisonooo/huixiangdou-web
multimodal,InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
Organization: internlm
multimodal,Meta-Transformer for Unified Multimodal Learning
User: invictus717
Home Page: https://arxiv.org/abs/2307.10802
multimodal,🪩 Create Disco Diffusion artworks in one line
Organization: jina-ai
multimodal,☁️ Build multimodal AI applications with cloud-native stack
Organization: jina-ai
Home Page: https://docs.jina.ai
multimodal,Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in PyTorch
User: kyegomez
Home Page: https://discord.gg/qUtxnK2NMf
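The BitNet paper the entry above refers to replaces full-precision weight matrices with sign-binarized ones plus a single scaling factor. A minimal numpy sketch of that quantization idea (centering, sign binarization, and an absmean scale) is below; this is an illustration of the technique, not the repository's actual implementation.

```python
import numpy as np

def binarize_weights(w: np.ndarray):
    """Quantize a weight matrix to {-1, +1} with one per-matrix scale,
    in the spirit of BitNet's 1-bit weight scheme (illustrative sketch)."""
    alpha = w.mean()                  # centering term
    beta = np.abs(w - alpha).mean()   # absmean scaling factor
    w_bin = np.sign(w - alpha)
    w_bin[w_bin == 0] = 1.0           # map exact zeros to +1
    return w_bin, beta

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4))
w_bin, beta = binarize_weights(w)
# the matmul then uses the dequantized approximation beta * w_bin
```

The payoff is that `w_bin` needs only one bit per weight, with `beta` restoring the overall magnitude at inference time.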
multimodal,The Enterprise-Grade Production-Ready Multi-Agent Orchestration Framework. Join our Community: https://discord.com/servers/agora-999382051935506503
User: kyegomez
Home Page: https://docs.swarms.world
multimodal,Plug-and-play implementation of Tree of Thoughts: Deliberate Problem Solving with Large Language Models that elevates model reasoning by at least 70%
User: kyegomez
Home Page: https://discord.gg/qUtxnK2NMf
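At its core, Tree of Thoughts is a search over intermediate "thoughts": expand each partial solution into candidates, score them, and keep only the best few at each depth. The sketch below shows that loop with plain callables standing in for the LLM-based `expand` and `score` steps; it is a toy illustration of the search pattern, not the repository's API.

```python
from typing import Callable, List

def tree_of_thoughts(
    root: str,
    expand: Callable[[str], List[str]],   # propose candidate next thoughts
    score: Callable[[str], float],        # evaluate a partial solution
    beam_width: int = 2,
    depth: int = 3,
) -> str:
    """Breadth-first search over thoughts: expand every frontier state,
    then keep only the top-scoring candidates (the beam) at each level."""
    frontier = [root]
    for _ in range(depth):
        candidates = [c for state in frontier for c in expand(state)]
        if not candidates:
            break
        candidates.sort(key=score, reverse=True)
        frontier = candidates[:beam_width]
    return max(frontier, key=score)

# toy usage: grow the largest 3-digit string, scored by integer value
best = tree_of_thoughts(
    "",
    expand=lambda s: [s + d for d in "129"],
    score=lambda s: int(s) if s else 0,
)
# best is "999"
```

In the real setting, `expand` would prompt an LLM for candidate continuations and `score` would prompt it to rate each partial solution.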
multimodal,Library to build personalized AI powered by what you've seen, said, or heard. Works with Ollama. Alternative to Rewind.ai. Open. Secure. You own your data. Rust.
User: louis030195
Home Page: https://screenpi.pe
multimodal,Curated tutorials and resources for Large Language Models, AI Painting, and more.
Organization: luban-agi
multimodal,Implementation of CoCa, Contrastive Captioners are Image-Text Foundation Models, in PyTorch
User: lucidrains
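The "contrastive" half of CoCa builds on the symmetric InfoNCE objective used by CLIP-style image-text models: matching image/text pairs sit on the diagonal of a similarity matrix and are pushed apart from all other pairs in the batch. The numpy sketch below shows that loss; it is a generic illustration of the objective, not code from this repository.

```python
import numpy as np

def contrastive_loss(img: np.ndarray, txt: np.ndarray, temperature: float = 0.07) -> float:
    """Symmetric InfoNCE over a batch of paired image/text embeddings.
    Row i of `img` is assumed to match row i of `txt`."""
    img = img / np.linalg.norm(img, axis=1, keepdims=True)
    txt = txt / np.linalg.norm(txt, axis=1, keepdims=True)
    logits = img @ txt.T / temperature          # (batch, batch) similarities
    labels = np.arange(len(img))                # matching pairs on the diagonal

    def ce(l: np.ndarray) -> float:
        # cross-entropy with diagonal targets (numerically stable softmax)
        p = np.exp(l - l.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        return -np.log(p[labels, labels]).mean()

    return (ce(logits) + ce(logits.T)) / 2      # image->text and text->image

# perfectly aligned pairs give a near-zero loss; shuffled pairs do not
loss_aligned = contrastive_loss(np.eye(3), np.eye(3))
loss_shuffled = contrastive_loss(np.eye(3), np.roll(np.eye(3), 1, axis=0))
```

CoCa combines this with a captioning (next-token prediction) loss on the text decoder.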
multimodal,Foundation Architecture for (M)LLMs
Organization: microsoft
Home Page: https://aka.ms/GeneralAI
multimodal,Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Organization: microsoft
Home Page: https://aka.ms/GeneralAI
multimodal,The all-in-one Desktop & Docker AI application with full RAG and AI Agent capabilities.
Organization: mintplex-labs
Home Page: https://anythingllm.com
multimodal,Use PEFT or Full-parameter to finetune 300+ LLMs or 60+ MLLMs. (Qwen2, GLM4v, Internlm2.5, Yi, Llama3.1, Llava-Video, Internvl2, MiniCPM-V-2.6, Deepseek, Baichuan2, Gemma2, Phi3-Vision, ...)
Organization: modelscope
Home Page: https://swift.readthedocs.io/zh-cn/latest/LLM/index.html
multimodal,Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model
User: next-gpt
Home Page: https://next-gpt.github.io/
multimodal,A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Organization: nvidia
Home Page: https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html
multimodal,Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
Organization: ofa-sys
multimodal,OpenMMLab Pre-training Toolbox and Benchmark
Organization: open-mmlab
Home Page: https://mmpretrain.readthedocs.io/en/latest/
multimodal,Multimodal-GPT
Organization: open-mmlab
multimodal,[ICLR'24 spotlight] Chinese and English Multimodal Large Model Series (Chat and Paint) | A bilingual Chinese-English multimodal large model series built on the CPM foundation models
Organization: openbmb
multimodal,InternGPT (iGPT) is an open-source demo platform where you can easily showcase your AI models. It now supports DragGAN, ChatGPT, ImageBind, multimodal chat like GPT-4, SAM, interactive image editing, etc. Try it at igpt.opengvlab.com (an online demo system supporting DragGAN, ChatGPT, ImageBind, and SAM)
Organization: opengvlab
Home Page: https://igpt.opengvlab.com
multimodal,[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
Organization: opengvlab
multimodal,Visualize streams of multimodal data. Fast, easy to use, and simple to integrate. Built in Rust using egui.
Organization: rerun-io
Home Page: https://rerun.io/
multimodal,Easily compute clip embeddings and build a clip retrieval system with them
User: rom1504
Home Page: https://rom1504.github.io/clip-retrieval/
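The core operation behind a CLIP retrieval system like the one above is nearest-neighbor search over normalized embeddings: encode the query, then rank the index by cosine similarity. The numpy sketch below shows that ranking step with toy 2-D vectors standing in for real CLIP embeddings; it illustrates the idea rather than the clip-retrieval package's API.

```python
import numpy as np

def cosine_retrieve(query: np.ndarray, index: np.ndarray, k: int = 3) -> np.ndarray:
    """Return indices of the k index vectors most similar to the query,
    ranked by cosine similarity."""
    q = query / np.linalg.norm(query)
    x = index / np.linalg.norm(index, axis=1, keepdims=True)
    sims = x @ q                      # cosine similarity to each index vector
    return np.argsort(-sims)[:k]      # best matches first

# toy "embedding" index of 4 items
index = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]])
top = cosine_retrieve(np.array([1.0, 0.1]), index, k=2)
# top ranks item 0 first, then item 2
```

At scale, the brute-force `x @ q` is replaced by an approximate nearest-neighbor index (e.g. FAISS) built over precomputed CLIP embeddings.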
multimodal,Easily turn large sets of image URLs into an image dataset. Can download, resize, and package 100M URLs in 20h on one machine.
User: rom1504
multimodal,This repository is a curated collection of links to various courses and resources about Artificial Intelligence (AI)
User: skalskip
multimodal,SDK for interacting with stability.ai APIs (e.g. stable diffusion inference)
Organization: stability-ai
Home Page: https://platform.stability.ai/
multimodal,Notes for software engineers getting up to speed on new AI developments. Serves as a datastore for https://latent.space writing and product brainstorming, with cleaned-up canonical references under the /Resources folder.
User: swyxio
Home Page: https://latent.space/
multimodal,Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️
Organization: unum-cloud
Home Page: https://unum-cloud.github.io/uform/
multimodal,Mobile-Agent: The Powerful Mobile Device Operation Assistant Family
Organization: x-plug
Home Page: https://arxiv.org/abs/2406.01014
multimodal,mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
Organization: x-plug
multimodal,mPLUG-Owl: The Powerful Multi-modal Large Language Model Family
Organization: x-plug
Home Page: https://www.modelscope.cn/studios/damo/mPLUG-Owl
multimodal,OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
Organization: xlang-ai
Home Page: https://os-world.github.io
multimodal,(ෆ`꒳´ෆ) A Survey on Text-to-Image Generation/Synthesis.
User: yutong-zhou-cv