openmatch / univl-dr

[ICLR 2023] This is the code repo for our ICLR'23 paper "Universal Vision-Language Dense Retrieval: Learning A Unified Representation Space for Multi-Modal Retrieval".

License: MIT License

Python 95.86% Shell 4.14%

univl-dr's Introduction

OpenMatch v2

An all-in-one toolkit for information retrieval. Under active development.

Install

git clone https://github.com/OpenMatch/OpenMatch.git
cd OpenMatch
pip install -e .

The -e flag installs the package in editable mode, so changes you make to the code in your local clone take effect without reinstalling.

Not all requirements are included in the package, so you may need to install torch and tensorboard manually.
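Because torch and tensorboard are not pulled in automatically, a quick check like the following can tell you which of them still needs to be installed. It is a minimal sketch that uses only the standard library; the helper name `check_optional_deps` is our own and is not part of OpenMatch.

```python
import importlib.util

# Packages the install note says may have to be installed by hand.
OPTIONAL_DEPS = ("torch", "tensorboard")


def check_optional_deps(names=OPTIONAL_DEPS):
    """Return {module_name: bool}, True if the module can be found without importing it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, ok in check_optional_deps().items():
        print(f"{name:12s} {'found' if ok else 'MISSING (install manually)'}")
```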

You may also need faiss for dense retrieval. You can install either faiss-cpu or faiss-gpu, depending on your environment. Note that if you want to run search on GPUs, you need a faiss-gpu build that is compatible with your CUDA version. In some cases (usually CUDA >= 11.0), pip installs the wrong version; if you encounter errors during GPU search, try installing faiss from conda instead.
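The sketch below shows what dense retrieval with faiss reduces to: exact inner-product search over passage embeddings with `faiss.IndexFlatIP`, moved to all visible GPUs with `faiss.index_cpu_to_all_gpus` only when the installed build reports GPUs. It is not OpenMatch's own retrieval code. The `dense_search` function, the fallback to brute-force NumPy search when faiss is missing, and the toy vectors in the test are illustrative assumptions.

```python
import numpy as np

try:
    import faiss  # provided by either faiss-cpu or faiss-gpu
except ImportError:  # assumption: fall back to exact NumPy search when faiss is absent
    faiss = None


def dense_search(passage_emb, query_emb, k=2, use_gpu=True):
    """Exact maximum-inner-product search; returns (scores, ids), each of shape (n_queries, k)."""
    passage_emb = np.ascontiguousarray(passage_emb, dtype="float32")
    query_emb = np.ascontiguousarray(query_emb, dtype="float32")

    if faiss is None:
        # Brute force: score every passage, keep the k highest per query.
        scores = query_emb @ passage_emb.T
        ids = np.argsort(-scores, axis=1)[:, :k]
        return np.take_along_axis(scores, ids, axis=1), ids

    index = faiss.IndexFlatIP(passage_emb.shape[1])  # flat (exact) inner-product index
    num_gpus = getattr(faiss, "get_num_gpus", lambda: 0)()
    if use_gpu and num_gpus > 0:
        # Needs a faiss-gpu build that matches the local CUDA version.
        index = faiss.index_cpu_to_all_gpus(index)
    index.add(passage_emb)
    return index.search(query_emb, k)
```

If the GPU branch fails while `use_gpu=False` works, a CUDA/faiss-gpu version mismatch like the one described above is a likely suspect.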

Features

Human-friendly interface for dense retriever and re-ranker training and testing
Various PLMs supported (BERT, RoBERTa, T5, ...)
Native support for common IR & QA datasets (MS MARCO, NQ, KILT, BEIR, ...)
Deep integration with Hugging Face Transformers and Datasets
Efficient training and inference via stream-style data loading
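Stream-style loading means records are read lazily instead of being loaded into memory all at once. As a rough illustration, and not OpenMatch's actual loader, a generator over JSON Lines input can yield fixed-size batches. The `batched_records` name and the one-JSON-object-per-line format are assumptions.

```python
import json
from itertools import islice
from typing import Iterable, Iterator, List


def batched_records(lines: Iterable[str], batch_size: int) -> Iterator[List[dict]]:
    """Parse JSON Lines lazily and yield lists of at most `batch_size` records."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # Generator expression: each line is parsed only when the next batch is requested.
    records = (json.loads(line) for line in lines if line.strip())
    while True:
        batch = list(islice(records, batch_size))
        if not batch:
            return
        yield batch
```

Passing an open file object, e.g. `with open("train.jsonl", encoding="utf-8") as f: for batch in batched_records(f, 32): ...` (file name hypothetical), means only the current batch of parsed records is held in memory at a time.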

Docs

We are actively working on the docs.

Project Organizers

Zhiyuan Liu
- Tsinghua University
Zhenghao Liu
- Northeastern University
Chenyan Xiong
- Microsoft Research AI
Maosong Sun
- Tsinghua University

Acknowledgments

Our implementation uses Tevatron as the starting point. We thank its authors for their contributions.

Contact

Please send an email to [email protected].

univl-dr's People

Forkers

anshiquanshu66 moqingxinai awi121 liuhaotian2004 manhbao-nguyen

univl-dr's Issues

How to handle text documents with different snippet_ids but the same fact and URL

Hi, I'm a student trying to reproduce your research.

I looked at the dataset and noticed that some text documents have different snippet_ids but exactly the same fact and Wikipedia URL.

For example, in WebQA_train_val.json:

{
    "title": "2008 Summer Olympics",
    "fact": "The theme song of the 2008 Summer Olympics was \"You and Me,\" which was composed by Chen Qigang, the musical director of the opening ceremony.",
    "url": "https://en.wikipedia.org/wiki/2008_Summer_Olympics",
    "snippet_id": "d5bbd0e20dba11ecb1e81171463288e9_7"
}
{
    "title": "2008 Summer Olympics",
    "fact": "The theme song of the 2008 Summer Olympics was \"You and Me,\" which was composed by Chen Qigang, the musical director of the opening ceremony.",
    "url": "https://en.wikipedia.org/wiki/2008_Summer_Olympics",
    "snippet_id": "d5bbd13c0dba11ecb1e81171463288e9_8"
}
{
    "title": "2008 Summer Olympics",
    "fact": "The theme song of the 2008 Summer Olympics was \"You and Me,\" which was composed by Chen Qigang, the musical director of the opening ceremony.",
    "url": "https://en.wikipedia.org/wiki/2008_Summer_Olympics",
    "snippet_id": "d5bcc8440dba11ecb1e81171463288e9_14"
}

All three records have different snippet_ids but the same fact ("The theme song of the 2008 Summer Olympics was 'You and Me,' which was composed by Chen Qigang, the musical director of the opening ceremony.") and the same URL (https://en.wikipedia.org/wiki/2008_Summer_Olympics).

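It seems to me that each distinct fact should have a single snippet_id; otherwise retrieval performance cannot be evaluated accurately, because retrieving a copy whose snippet_id differs from the gold label would be counted as a miss even though the text is identical.
It looks like you collected all the text and image documents from the train_val.json and test.json files, extracted their embeddings, and then trained the model. I'm curious how you handled this in your study.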
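One way to check how widespread this is, and to keep duplicates from penalizing evaluation, is to group records by their (fact, url) pair and map every snippet_id in a group to one canonical id before scoring. This is a sketch under assumptions: it expects an iterable of records with the `fact`, `url`, and `snippet_id` fields shown above (how you flatten WebQA_train_val.json into such records is left out), the helper names are hypothetical, and it is not how the UniVL-DR authors handled the issue.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Sequence, Tuple

Key = Tuple[str, str]  # (fact text, url)


def group_duplicate_snippets(snippets: Iterable[dict]) -> Dict[Key, List[str]]:
    """Map each (fact, url) pair to the distinct snippet_ids that carry it, in first-seen order."""
    groups: Dict[Key, List[str]] = defaultdict(list)
    for rec in snippets:
        key = (rec["fact"].strip(), rec["url"].strip())
        if rec["snippet_id"] not in groups[key]:
            groups[key].append(rec["snippet_id"])
    return dict(groups)


def report_duplicates(groups: Dict[Key, List[str]]) -> List[List[str]]:
    """Only the groups where one fact appears under more than one snippet_id."""
    return [ids for ids in groups.values() if len(ids) > 1]


def canonical_id_map(groups: Dict[Key, List[str]]) -> Dict[str, str]:
    """Point every snippet_id at the first id seen for its (fact, url) pair."""
    mapping: Dict[str, str] = {}
    for ids in groups.values():
        for sid in ids:
            mapping[sid] = ids[0]
    return mapping


def hit_at_k(ranked: Sequence[str], gold: Iterable[str], k: int,
             mapping: Optional[Dict[str, str]] = None) -> float:
    """1.0 if any top-k id matches a gold id, comparing canonical ids when a mapping is given."""
    canon = mapping or {}
    gold_set = {canon.get(g, g) for g in gold}
    return float(any(canon.get(r, r) in gold_set for r in ranked[:k]))
```

Comparing `hit_at_k` with and without the mapping shows how much a metric changes once duplicated facts are treated as one document.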

If I am missing or misunderstanding something, please let me know.
I haven't checked yet whether there are more cases like this, but I'd like to correct any misconceptions first.

Thank you for your help.

Question about sum(caption embeddings, image embeddings)

First of all, thank you for the code and the paper!
I wonder why caption embeddings and image embeddings can be added directly, element by element. I know that they are aligned by minimizing a contrastive loss in CLIP.
Thanks again!
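A partial answer, as a sketch: if a query is scored against the summed vector with an inner product, linearity gives $q \cdot (c + v) = q \cdot c + q \cdot v$, so the summed document is scored as the query's similarity to the caption plus its similarity to the image. That is only meaningful when both embeddings live in one aligned space, which is what CLIP's contrastive training is meant to provide. Normalizing each embedding before summing, as done below, is our assumption rather than a confirmed detail of the UniVL-DR code, and the helper names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def fuse(caption_emb: Vector, image_emb: Vector) -> Vector:
    """Element-wise sum of the L2-normalized caption and image embeddings (normalization is an assumption)."""
    c, v = l2_normalize(caption_emb), l2_normalize(image_emb)
    return [x + y for x, y in zip(c, v)]
```

Scoring `dot(q, fuse(c, v))` therefore gives the same number as `dot(q, l2_normalize(c)) + dot(q, l2_normalize(v))`, up to floating-point rounding.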

Image Verbalization for Expansion

Hello, I'm a student reproducing the paper.
I have a question that came up while reading the paper and the code.
First of all, thank you for sharing the code.

Q1. In Section 4.3 of the paper (IMAGE VERBALIZATION FOR EXPANSION), the image verbalization result $V(I_j)$ is used to expand the raw caption $C_j$ before it passes through the text encoder:

$C^{*}_{j} = C_{j}\,;\,[\mathrm{SEP}]\,;\,V(I_{j})$    (8)

However, I could not find the part of the code that generates the potentially matching captions or related queries that make up the image verbalization result.
I would appreciate it if you could tell me which part of the code corresponds to Image Verbalization.
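For reference, Eq. (8) itself amounts to concatenating the raw caption and the verbalization text around a separator before tokenization. The sketch below assumes $V(I_j)$ is already available as a single string and that the separator is the literal text `[SEP]`; whether the text encoder's tokenizer treats it as a special token is not verified here, and `expand_caption` is a hypothetical helper, not code from the repository. It does not answer where the verbalizations themselves are generated.

```python
SEP = "[SEP]"  # separator written in Eq. (8); tokenizer handling is an assumption


def expand_caption(caption: str, verbalization: str, sep: str = SEP) -> str:
    """Build C*_j = C_j ; [SEP] ; V(I_j) as one string for the text encoder."""
    caption, verbalization = caption.strip(), verbalization.strip()
    if not verbalization:
        # No verbalization available: keep the raw caption unchanged.
        return caption
    return f"{caption} {sep} {verbalization}"
```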

Once again, thank you for releasing the code. I look forward to your reply.