pliang279 / awesome-multimodal-ml
Reading list for research topics in multimodal machine learning
License: MIT License
Hi @pliang279,
thanks for this list. I find this recent survey paper on vision and language to be a good addition to this list: Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods
The Audio and Visual area under Applications and Datasets seems to be out of date.
Would you mind updating the list, e.g., with audio-visual segmentation (ECCV 2022)?
Thanks for the curated list of multimodal LLMs. We have a related work that we hope can be added to this awesome repository.
Accountable Textual-Visual Chat Learns to Reject Human Instructions in Image Re-creation (https://arxiv.org/pdf/2303.05983.pdf)
Project Page: https://matrix-alpha.github.io/
After ChatGPT, and especially GPT-4, was released, more multimodal pre-trained models such as MiniGPT-4 and InstructBLIP have appeared. Is there any plan to add these papers to the list?
Hi everyone, I am opening this issue to discuss consistency and complementary information in multiview or multimodal learning. After reading some papers, I find that many authors talk about consistency between modalities (e.g., the similarity between modalities) or complementary information across modalities. Yes, consistency can reinforce signals that are weak in one modality, and complementary information can supply what one view or modality lacks. But I do not clearly understand why we need them, and I have not found any mathematical explanation of either. Can anybody provide a comprehensive understanding of them?
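One common intuition can be made concrete with a toy example (my own illustrative sketch, not taken from any paper in the list): if two modalities consistently observe the same latent signal with independent noise, averaging them reduces the noise variance; and if a factor appears in only one modality, it is complementary, so no unimodal model on the other modality can recover it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Consistency: both modalities observe the same latent signal s,
# each corrupted by independent unit-variance noise.
s = rng.normal(size=n)
view_a = s + rng.normal(size=n)          # modality A = s + noise
view_b = s + rng.normal(size=n)          # modality B = s + noise

# Averaging the two consistent views halves the noise variance, so the
# fused estimate of s is strictly better than either modality alone.
fused = (view_a + view_b) / 2
mse_single = np.mean((view_a - s) ** 2)  # close to 1.0 (unit noise variance)
mse_fused = np.mean((fused - s) ** 2)    # close to 0.5

assert mse_fused < mse_single

# Complementarity: a second latent factor u is visible only in
# modality B, so a model trained on A alone carries no information
# about u; only fusion with B can recover it.
u = rng.normal(size=n)
view_b_plus = view_b + u
corr_a_u = np.corrcoef(view_a, u)[0, 1]       # near 0: A is uninformative about u
corr_b_u = np.corrcoef(view_b_plus, u)[0, 1]  # clearly positive
assert abs(corr_a_u) < 0.05 < corr_b_u
```

This only illustrates one intuition (noise reduction from consistent views, information gain from complementary ones); formal treatments exist in the multiview-learning literature.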
There is a lot of work on image-to-text retrieval. Should it have its own section?
The URL for "Unified Visual-Semantic Embeddings: Bridging Vision and Language With Structured Meaning Representations" is wrong.
Hi,
How do you access the courses mentioned ... are there any online courses on multimodal ML?
Hi,
Can you please add DeepCU: Integrating Both Common and Unique Latent Information for Multimodal Sentiment Analysis (https://www.ijcai.org/Proceedings/2019/0503.pdf) to the multimodal fusion papers list?
Source code is available at https://github.com/sverma88/DeepCU-IJCAI19.
Cheers.
Thanks for your contribution!
I wonder if it is possible to keep the repo name and the title consistent, so that the title reads "Awesome Multimodal ML" rather than "Reading List for Topics in Multimodal Machine Learning"; I think this would be better!
Hi @pliang279, many thanks for your great list, from which I have learned a lot!
Recently, we published a new work on compressing multimodal models, i.e., making them more lightweight and friendly for consumer-level devices. However, there doesn't seem to be a proper subsection to cover this work, and it would be nice to have a subsection on efficient multimodal models or something similar! Looking forward to your opinion on this, or on which of the existing subsections could include the work.
Paper: https://proceedings.mlr.press/v202/shi23e.html
Code: https://github.com/sdc17/UPop
Project: https://dachuanshi.com/UPop-Project/
Sorry, I didn't see a topic like "Social Impact - Fairness and Misinformation" here, but I saw this topic in your course. Thank you.
Hi Paul,
I have read about co-learning, where we train a model on three modalities but use only one modality at test time. I am struggling to understand how this is implemented in code: once we train a model with three modalities, it will expect three modalities at test time. Do we need to handle this by passing zeros or some random values for the dropped modalities? Please help; a sample implementation would also be appreciated.
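For concreteness, here is a minimal sketch of one common workaround (all names are illustrative; this is a masked-mean fusion with modality dropout, not the method of any specific paper in the list): if fusion averages over whatever modalities are present instead of concatenating a fixed set, missing modalities can simply be omitted at test time rather than zero-filled.

```python
import numpy as np

rng = np.random.default_rng(0)
MODALITIES = ("text", "audio", "video")

# Toy per-modality "encoders": fixed random linear maps into a shared
# 8-d embedding space (stand-ins for real networks).
W = {m: rng.normal(size=(16, 8)) for m in MODALITIES}

def fuse(inputs):
    """Masked-mean fusion over whichever modalities are present.

    Because fusion averages instead of concatenating, the model accepts
    any non-empty subset of modalities at test time; there is no need
    to feed zeros or random values for the missing ones.
    """
    embs = [x @ W[m] for m, x in inputs.items()]
    return np.mean(embs, axis=0)

def modality_dropout(inputs, p=0.3, rng=rng):
    """Training-time trick: randomly drop modalities (keeping at least
    one) so the model learns to behave well under missing inputs."""
    keep = {m: x for m, x in inputs.items() if rng.random() > p}
    return keep or {m: inputs[m] for m in list(inputs)[:1]}

sample = {m: rng.normal(size=16) for m in MODALITIES}

z_train = fuse(modality_dropout(sample))   # training: random subset
z_test = fuse({"text": sample["text"]})    # test: a single modality

assert z_train.shape == z_test.shape == (8,)
```

Zero-filling missing modalities is also used in practice, but then the model should see zeroed modalities during training too, which is exactly what modality dropout with zero-masking provides.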
Thanks a lot for your awesome repo.