pliang279 / awesome-multimodal-ml
Reading list for research topics in multimodal machine learning
License: MIT License
Hi @pliang279,
thanks for this list. I find this recent survey paper on vision and language to be a good addition to this list: Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods
The Audio and Visual area under Applications and Datasets seems to be out of date.
Would you mind updating the list, e.g., with audio-visual segmentation (ECCV 2022)?
Thanks for the curated list of multimodal LLMs. We have a related work that we hope can be added to this awesome repository.
Accountable Textual-Visual Chat Learns to Reject Human Instructions in Image Re-creation (https://arxiv.org/pdf/2303.05983.pdf)
Project Page: https://matrix-alpha.github.io/
After ChatGPT, and especially GPT-4, was released, more multimodal pre-trained models such as MiniGPT-4 and InstructBLIP have appeared. Is there any plan to add these papers to the list?
Hi everyone, I am opening this issue to discuss consistency and complementary information in multiview or multimodal learning. After reading some papers, I find that many authors talk about consistency between modalities (e.g., the similarity between modalities) or complementary information across modalities. Yes, consistency can reinforce signals that are weak in one modality, and complementary information can supply what one view or modality lacks. But I do not clearly understand why we need them, and I have not found any mathematical explanation of either. Can anybody provide a comprehensive understanding of them?
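One common intuition can be made concrete with a toy example (my own illustrative sketch, not taken from any paper in the list): if two modalities consistently observe the same latent signal with independent noise, averaging them reduces the noise variance; and if a factor appears in only one modality, it is complementary, so no unimodal model on the other modality can recover it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Consistency: both modalities observe the same latent signal s,
# each corrupted by independent unit-variance noise.
s = rng.normal(size=n)
view_a = s + rng.normal(size=n)          # modality A = s + noise
view_b = s + rng.normal(size=n)          # modality B = s + noise

# Averaging the two consistent views halves the noise variance, so the
# fused estimate of s is strictly better than either modality alone.
fused = (view_a + view_b) / 2
mse_single = np.mean((view_a - s) ** 2)  # close to 1.0 (unit noise variance)
mse_fused = np.mean((fused - s) ** 2)    # close to 0.5

assert mse_fused < mse_single

# Complementarity: a second latent factor u is visible only in
# modality B, so a model trained on A alone carries no information
# about u; only fusion with B can recover it.
u = rng.normal(size=n)
view_b_plus = view_b + u
corr_a_u = np.corrcoef(view_a, u)[0, 1]       # near 0: A is uninformative about u
corr_b_u = np.corrcoef(view_b_plus, u)[0, 1]  # clearly positive
assert abs(corr_a_u) < 0.05 < corr_b_u
```

This only illustrates one intuition (noise reduction from consistent views, information gain from complementary ones); formal treatments exist in the multiview-learning literature.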
There is a lot of work on image-to-text retrieval. Should it have its own section?
The URL for "Unified Visual-Semantic Embeddings: Bridging Vision and Language With Structured Meaning Representations" is wrong.
Hi,
How do you access the courses mentioned ... are there any online courses on multimodal ML?
Hi,
Can you please add DeepCU: Integrating Both Common and Unique Latent Information for Multimodal Sentiment Analysis (https://www.ijcai.org/Proceedings/2019/0503.pdf) to the multimodal fusion papers list?
Source code is available at https://github.com/sverma88/DeepCU-IJCAI19.
Cheers.
Thanks for your contribution!
I wonder if it is possible to keep the repo name and the title consistent, so that the title reads "Awesome Multimodal ML" rather than "Reading List for Topics in Multimodal Machine Learning"; I think this would be better!
Hi @pliang279, many thanks for your great list, from which I have learned a lot!
Recently, we published a new work on compressing multimodal models, i.e., making them more lightweight and friendly for consumer-level devices. However, there doesn't seem to be a proper subsection to cover this work, and it would be nice to have a subsection on efficient multimodal models or something similar! Looking forward to your opinion on this, or on which of the existing subsections could include the work.
Paper: https://proceedings.mlr.press/v202/shi23e.html
Code: https://github.com/sdc17/UPop
Project: https://dachuanshi.com/UPop-Project/
Sorry, I didn't see a topic like "Social Impact - Fairness and Misinformation" here, but I saw this topic in your course. Thank you.
Hi Paul,
I have read about co-learning, where we train a model on three modalities but use only one modality at test time. I am struggling to understand how this is implemented in code: once we train a model with three modalities, it will expect three modalities at test time. Do we need to handle this by passing zeros or some random values for the dropped modalities? Please help; a sample implementation would also be appreciated.
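For concreteness, here is a minimal sketch of one common workaround (all names are illustrative; this is a masked-mean fusion with modality dropout, not the method of any specific paper in the list): if fusion averages over whatever modalities are present instead of concatenating a fixed set, missing modalities can simply be omitted at test time rather than zero-filled.

```python
import numpy as np

rng = np.random.default_rng(0)
MODALITIES = ("text", "audio", "video")

# Toy per-modality "encoders": fixed random linear maps into a shared
# 8-d embedding space (stand-ins for real networks).
W = {m: rng.normal(size=(16, 8)) for m in MODALITIES}

def fuse(inputs):
    """Masked-mean fusion over whichever modalities are present.

    Because fusion averages instead of concatenating, the model accepts
    any non-empty subset of modalities at test time; there is no need
    to feed zeros or random values for the missing ones.
    """
    embs = [x @ W[m] for m, x in inputs.items()]
    return np.mean(embs, axis=0)

def modality_dropout(inputs, p=0.3, rng=rng):
    """Training-time trick: randomly drop modalities (keeping at least
    one) so the model learns to behave well under missing inputs."""
    keep = {m: x for m, x in inputs.items() if rng.random() > p}
    return keep or {m: inputs[m] for m in list(inputs)[:1]}

sample = {m: rng.normal(size=16) for m in MODALITIES}

z_train = fuse(modality_dropout(sample))   # training: random subset
z_test = fuse({"text": sample["text"]})    # test: a single modality

assert z_train.shape == z_test.shape == (8,)
```

Zero-filling missing modalities is also used in practice, but then the model should see zeroed modalities during training too, which is exactly what modality dropout with zero-masking provides.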
Thanks a lot for your awesome repo.