awesome-cross-lingual-cross-modal-retrieval's Introduction

Awesome Cross-/Multi-Lingual Cross-Modal Retrieval

Datasets

Image-Text

  1. [ACL-16] Multi30K (multi-lingual version of Flickr30K)-[English|German|French|Czech]: Multi30K: Multilingual English-German Image Descriptions. [paper] [dataset]
  2. MSCOCO-[English|Chinese|Japanese]:
  • (English) [ARXIV-15] Microsoft COCO Captions: Data Collection and Evaluation Server. [paper] [dataset]
  • (Chinese) [TMM-19] COCO-CN for Cross-Lingual Image Tagging, Captioning and Retrieval. [paper] [dataset]
  • (Japanese) [ACL-17] STAIR Captions: Constructing a Large-Scale Japanese Image Caption Dataset. [paper] [dataset]
  3. CC3M (multi-lingual version) [dataset]
  4. Wukong (Chinese) [dataset]

Video-Text

  1. [ICCV-19] VATEX-[English|Chinese]: VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research. [paper] [dataset]
  2. [ACM MM-22] MSRVTT-CN (multi-lingual version of MSRVTT)-[English|Chinese]: Cross-Lingual Cross-Modal Retrieval with Noise-Robust Learning. [paper] [dataset]

Note: This repository provides the English captions of Multi30K, MSCOCO, VATEX, and MSRVTT-CN, together with machine-translated versions of those captions in the other languages.



Papers and Code

2024

  • [Wang et al. TIP] Dual-view Curricular Optimal Transport for Cross-lingual Cross-modal Retrieval. [paper]
  • [Wang et al. AAAI] CL2CM: Improving Cross-Lingual Cross-Modal Retrieval via Cross-Lingual Knowledge Transfer. [paper]
  • [Cai et al. TKDE] Cross-Lingual Cross-Modal Retrieval with Noise-Robust Fine-Tuning. [paper]

2023

  • [Zeng et al. ACL] Cross-View Language Modeling: Towards Unified Cross-Lingual Cross-Modal Pre-training. [paper] [code]
  • [Li et al. ACL] Unifying Cross-Lingual and Cross-Modal Modeling Towards Weakly Supervised Multilingual Vision-Language Pre-training. [paper]
  • [Rouditchenko et al. ICASSP] C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval. [paper] [code]

2022

  • [Wang et al. ACM MM] Cross-Lingual Cross-Modal Retrieval with Noise-Robust Learning. [paper] [code]

2021

  • [Zhou et al. CVPR21] UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training. [paper] [code]
  • [Ni et al. CVPR21] M3P: Learning Universal Representations via Multitask Multilingual Multimodal Pre-training. [paper] [code]
  • [Huang et al. NAACL21] Multilingual Multimodal Pre-training for Zero-Shot Cross-Lingual Transfer of Vision-Language Models. [paper]
  • [Fei et al. NAACL21] Cross-lingual Cross-modal Pretraining for Multimodal Retrieval. [paper]

2020

  • [Aggarwal et al. ARXIV] Towards Zero-shot Cross-lingual Image Retrieval. [paper]

2019

  • [Portaz et al. ARXIV] Image search using multilingual texts: a cross-modal learning approach between image and text. [paper]

Chinese Cross-modal Pre-training

  • [Gu et al. NIPS22] Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark. [paper] [code]
  • [Xie et al. ARXIV22] ZERO and R2D2: A Large-scale Chinese Cross-modal Benchmark and a Vision-Language Framework. [paper] [code]

awesome-cross-lingual-cross-modal-retrieval's People

Contributors

lijiabei-7
