gc-dpr's Issues

Multiply by distributed_factor/8.

Thanks for posting a really nice repo!
While studying the code, I noticed the following at lines 669 and 691 of train_dense_encoder.py:

    surrogate = surrogate * (trainer.distributed_factor / 8.)

I don't fully understand the reason for this multiplication. Can you explain it? Thank you 👍
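Not an answer from the author, but one plausible reading, sketched with stand-in values below: PyTorch's DistributedDataParallel averages gradients over the number of workers, and the original DPR recipe assumed 8 GPUs, so scaling by distributed_factor / 8 would keep the effective gradient magnitude equal to the 8-GPU reference setup regardless of the actual world size.

    import torch

    world_size = 2                  # stand-in for trainer.distributed_factor
    surrogate = torch.tensor(1.25)  # stand-in for the per-worker surrogate loss

    # DDP divides gradients by world_size when it averages across workers;
    # rescaling the loss by world_size / 8. restores the gradient scale of
    # an 8-GPU baseline, so runs with different GPU counts behave alike.
    scaled = surrogate * (world_size / 8.0)
    print(scaled.item())  # 1.25 * 2 / 8 = 0.3125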

multilingual-bert issue?

Hi,

I found something weird: with multilingual BERT (e.g. bert-base-multilingual-uncased), grad_cache doesn't seem to work. I know it sounds strange, since swapping BERT variants shouldn't matter, but of the models I tried (vanilla BERT, German BERT, and mBERT), only mBERT needs a very small batch size (around 4) to run successfully; German BERT, for example, runs fine with batch_size=128. Do you know what might cause this? By the way, great paper and code, extremely helpful! Thanks in advance!

coCondenser hyperparameter

Hello.
Thank you for your great work!

I have a few questions about coCondenser fine-tuning.
First, for the first-stage and second-stage fine-tuning on the MS MARCO dataset, could you share the number of negative samples and hard negative samples used?

Second, what are the criteria for selecting hard negative samples? For example, when the positive document is ranked 5th, are the passages ranked 1st to 4th taken as hard negatives?
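A tiny sketch of the rule this question proposes, with made-up passage ids; this is only an illustration of the question itself, not the paper's actual selection procedure:

    # Retriever output, best first; the positive happens to be ranked 5th.
    ranked = ["d3", "d9", "d1", "d7", "d5"]
    positive = "d5"

    # Under the proposed rule, every passage ranked above the positive
    # becomes a hard negative.
    hard_negatives = ranked[:ranked.index(positive)]
    print(hard_negatives)  # ['d3', 'd9', 'd1', 'd7']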

This is awesome

I'm leaving this comment for no particular reason other than that I think this work is fantastic. It feels like it completely solves the problems with memory banks and MoCo. Is there similar work in CV? How did this only get submitted to a workshop? It's so good I was excited about it all night.

Error after running pip install .

I am also working on a contrastive learning problem (with medical data) where each input sample is very large, so the number of samples per batch is very small.
When I saw your paper, I realized your method is exactly what I need!!

One question: on Windows, after running pip install ., I see the following:

Processing c:\users\user\dropbox\book\bnl\research\granger\fmri\code\gc-dpr
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [14 lines of output]
      error: Multiple top-level packages discovered in a flat-layout: ['dpr', 'data'].

      To avoid accidental inclusion of unwanted files or directories,
      setuptools will not proceed with this build.

      If you are trying to create a single distribution with multiple packages
      on purpose, you should not rely on automatic discovery.
      Instead, consider the following options:

      1. set up custom discovery (`find` directive with `include` or `exclude`)
      2. use a `src-layout`
      3. explicitly set `py_modules` or `packages` with a list of names

      To find more information, look for "package discovery" on setuptools docs.
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

Did you often run into this error when setting up your environment? Thanks!
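For what it's worth, the error output itself lists the available workarounds. A minimal sketch of option 3, explicitly listing the packages in setup.py, might look like the following; the metadata values are placeholders and this is untested against this repo:

    from setuptools import find_packages, setup

    setup(
        name="gc-dpr",      # placeholder metadata
        version="0.0.1",    # placeholder metadata
        # List only the 'dpr' package explicitly so the stray top-level
        # 'data' directory no longer trips automatic flat-layout discovery.
        packages=find_packages(include=["dpr", "dpr.*"]),
    )

Alternatively, pinning setuptools below version 61, where automatic flat-layout discovery was introduced, is a common workaround for this class of error.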

TPU support?

Will this work on a TPU?
If so, can I train it in a Colab notebook?

fine tuning existing dpr model

Hi, thanks for posting a nice repo!
I see that we can train a DPR model with GC-DPR, but it seems we have to train it from scratch by loading base models (bert-base-uncased | roberta-base).
How can we use this repo to fine-tune an already pretrained DPR model? For example, Facebook already provides DPR encoder models:
question_model = "facebook/dpr-question_encoder-single-nq-base"
context_model = "facebook/dpr-ctx_encoder-single-nq-base"

My idea is to fine-tune these models on domain data to make them domain-specific.
It would be helpful if you could let me know how to load the question and context models with the train_dense_encoder function.
Any other suggestions would be appreciated.
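For context, loading those two checkpoints with the Hugging Face transformers classes looks like the snippet below. Whether gc-dpr's training script can consume weights loaded this way is exactly what this issue asks, so take this only as the generic loading step, not as the repo's supported path:

    from transformers import DPRContextEncoder, DPRQuestionEncoder

    question_encoder = DPRQuestionEncoder.from_pretrained(
        "facebook/dpr-question_encoder-single-nq-base")
    context_encoder = DPRContextEncoder.from_pretrained(
        "facebook/dpr-ctx_encoder-single-nq-base")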

Multi GPU support

Hi,
Does the current version of encoder training with GC support multiple GPUs?
I ran training on the NQ dataset following the instructions in README.md, but on a machine with 2 GPUs it seems to run slower than on a single GPU:
on a single GPU one step takes about 4 seconds, but with two GPUs one step takes about 24 seconds.

Gradient caching vs Model dropout

GC-DPR has two steps:

  1. The first step runs a full-batch forward pass without gradients, to obtain the full-batch contrastive loss and the corresponding embedding gradients.
  2. The second step runs mini-batch forward passes, assigns the cached embedding gradients, and then runs backward. The mini-batches loop through the full batch so that all gradients are computed and accumulated.

However, there may be one issue with this computation:

  1. The backbone model uses randomized dropout, which makes steps 1 and 2 inconsistent: step 1's dropout masks differ from step 2's, so step 1's embedding gradients cannot be applied directly in step 2; they would have to be recomputed for every mini-batch. This can be fixed with a slightly more sophisticated mechanism that keeps the dropout in steps 1 and 2 consistent (see the sketch after this list).
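One standard fix for exactly this: capture the RNG state before each gradient-free chunk forward in step 1 and restore it before the corresponding gradient-enabled forward in step 2, so both passes draw identical dropout masks. Below is a minimal self-contained sketch of that idea, not the repo's actual code; the encoder, loss, and chunking here are stand-ins:

    import torch
    import torch.nn.functional as F

    def grad_cache_step(encoder, batch, chunk_size):
        chunks = batch.split(chunk_size)

        # Step 1: gradient-free forward over all chunks. Save the CPU RNG
        # state before each chunk so its dropout masks can be replayed in
        # step 2 (on GPU you would also save torch.cuda.get_rng_state()).
        reps, rng_states = [], []
        for chunk in chunks:
            rng_states.append(torch.random.get_rng_state())
            with torch.no_grad():
                reps.append(encoder(chunk))
        all_reps = torch.cat(reps).requires_grad_()

        # Full-batch loss on the cached representations, then per-embedding
        # gradients (the "gradient cache"). The loss here is a stand-in.
        loss = F.cross_entropy(all_reps @ all_reps.T,
                               torch.arange(all_reps.size(0)))
        loss.backward()
        rep_grads = all_reps.grad.split(chunk_size)

        # Step 2: re-run each chunk with gradients enabled, restoring the
        # saved RNG state so dropout matches step 1 exactly, then backprop
        # the cached embedding gradients through the encoder.
        for chunk, state, grad in zip(chunks, rng_states, rep_grads):
            torch.random.set_rng_state(state)
            rep = encoder(chunk)
            rep.backward(gradient=grad)
        return loss.detach()

    # Toy usage: a dropout-bearing encoder, a 64-example batch, chunks of 16.
    encoder = torch.nn.Sequential(torch.nn.Linear(32, 16), torch.nn.Dropout(0.1))
    loss = grad_cache_step(encoder, torch.randn(64, 32), chunk_size=16)

With the replayed RNG state, the embeddings produced in step 2 match those from step 1, so the cached gradients remain valid and the loss never needs to be recomputed per mini-batch.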
