gc-dpr's Issues

Multiply by distributed_factor/8.

Thanks for posting a really nice repo!
While studying the code, I noticed the following at lines 669 and 691 of train_dense_encoder.py:

    surrogate = surrogate * (trainer.distributed_factor / 8.)

I don't fully understand the reason for this multiplication. Can you explain it? Thank you 👍
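Not an answer from the author, but one plausible reading, sketched with stand-in values below: PyTorch's DistributedDataParallel averages gradients over the number of workers, and the original DPR recipe assumed 8 GPUs, so scaling by distributed_factor / 8 would keep the effective gradient magnitude equal to the 8-GPU reference setup regardless of the actual world size.

    import torch

    world_size = 2                  # stand-in for trainer.distributed_factor
    surrogate = torch.tensor(1.25)  # stand-in for the per-worker surrogate loss

    # DDP divides gradients by world_size when it averages across workers;
    # rescaling the loss by world_size / 8. restores the gradient scale of
    # an 8-GPU baseline, so runs with different GPU counts behave alike.
    scaled = surrogate * (world_size / 8.0)
    print(scaled.item())  # 1.25 * 2 / 8 = 0.3125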

multilingual-bert issue?

Hi,

I found something weird: with multilingual BERT (e.g. bert-base-multilingual-uncased), grad_cache doesn't seem to work. I know it sounds strange, since swapping BERT variants shouldn't matter, but of the models I tried (vanilla BERT, German BERT, and mBERT), only mBERT needs a very small batch size (around 4) to run successfully; German BERT, for example, runs fine with batch_size=128. Do you know what might cause this? By the way, great paper and code, extremely helpful! Thanks in advance!

coCondenser hyperparameter

Hello.
Thank you for your great work!

I have a few questions about coCondenser fine-tuning.
First, for the first-stage and second-stage fine-tuning on the MS MARCO dataset, could you share the number of negative samples and hard negative samples used?

Second, what are the criteria for selecting hard negative samples? For example, when the positive document is ranked 5th, are the passages ranked 1st to 4th taken as hard negatives?
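A tiny sketch of the rule this question proposes, with made-up passage ids; this is only an illustration of the question itself, not the paper's actual selection procedure:

    # Retriever output, best first; the positive happens to be ranked 5th.
    ranked = ["d3", "d9", "d1", "d7", "d5"]
    positive = "d5"

    # Under the proposed rule, every passage ranked above the positive
    # becomes a hard negative.
    hard_negatives = ranked[:ranked.index(positive)]
    print(hard_negatives)  # ['d3', 'd9', 'd1', 'd7']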

This is awesome

I'm leaving this comment for no particular reason other than that I think this work is fantastic. It feels like it completely solves the problems with memory banks and MoCo. Is there similar work in CV? How did this only get submitted to a workshop? It's so good I was excited about it all night.

Error after running pip install .

I am also working on a contrastive learning problem (with medical data) where each input sample is very large, so the number of samples per batch is very small.
When I saw your paper, I realized your method is exactly what I need!!

One question: on Windows, after running pip install ., I see the following:

Processing c:\users\user\dropbox\book\bnl\research\granger\fmri\code\gc-dpr
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [14 lines of output]
      error: Multiple top-level packages discovered in a flat-layout: ['dpr', 'data'].

      To avoid accidental inclusion of unwanted files or directories,
      setuptools will not proceed with this build.

      If you are trying to create a single distribution with multiple packages
      on purpose, you should not rely on automatic discovery.
      Instead, consider the following options:

      1. set up custom discovery (`find` directive with `include` or `exclude`)
      2. use a `src-layout`
      3. explicitly set `py_modules` or `packages` with a list of names

      To find more information, look for "package discovery" on setuptools docs.
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

Did you often run into this error when setting up your environment? Thanks!
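For what it's worth, the error output itself lists the available workarounds. A minimal sketch of option 3, explicitly listing the packages in setup.py, might look like the following; the metadata values are placeholders and this is untested against this repo:

    from setuptools import find_packages, setup

    setup(
        name="gc-dpr",      # placeholder metadata
        version="0.0.1",    # placeholder metadata
        # List only the 'dpr' package explicitly so the stray top-level
        # 'data' directory no longer trips automatic flat-layout discovery.
        packages=find_packages(include=["dpr", "dpr.*"]),
    )

Alternatively, pinning setuptools below version 61, where automatic flat-layout discovery was introduced, is a common workaround for this class of error.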

TPU support?

Will this work on a TPU?
If so, can I train it in a Colab notebook?

fine tuning existing dpr model

Hi, thanks for posting a nice repo!
I see that we can train a DPR model with GC-DPR, but it seems we have to train it from scratch by loading base models (bert-base-uncased | roberta-base).
How can we use this repo to fine-tune an already pretrained DPR model? For example, Facebook already provides DPR encoder models:
question_model = "facebook/dpr-question_encoder-single-nq-base"
context_model = "facebook/dpr-ctx_encoder-single-nq-base"

My idea is to fine-tune these models on domain data to make them domain-specific.
It would be helpful if you could let me know how to load the question and context models with the train_dense_encoder function.
Any other suggestions would be appreciated.
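For context, loading those two checkpoints with the Hugging Face transformers classes looks like the snippet below. Whether gc-dpr's training script can consume weights loaded this way is exactly what this issue asks, so take this only as the generic loading step, not as the repo's supported path:

    from transformers import DPRContextEncoder, DPRQuestionEncoder

    question_encoder = DPRQuestionEncoder.from_pretrained(
        "facebook/dpr-question_encoder-single-nq-base")
    context_encoder = DPRContextEncoder.from_pretrained(
        "facebook/dpr-ctx_encoder-single-nq-base")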

Multi GPU support

Hi,
Does the current version of encoder training with GC support multiple GPUs?
I ran training on the NQ dataset following the instructions in README.md, but on a machine with 2 GPUs it seems to run slower than on a single GPU:
on a single GPU one step takes about 4 seconds, but with two GPUs one step takes about 24 seconds.

Gradient caching vs Model dropout

GC-DPR has two steps:

  1. The first step runs a full-batch forward pass without gradients, to obtain the full-batch contrastive loss and the corresponding embedding gradients.
  2. The second step runs mini-batch forward passes, assigns the cached embedding gradients, and then runs backward. The mini-batches loop through the full batch so that all gradients are computed and accumulated.

However, there may be one issue with this computation:

  1. The backbone model uses randomized dropout, which makes steps 1 and 2 inconsistent: step 1's dropout masks differ from step 2's, so step 1's embedding gradients cannot be applied directly in step 2; they would have to be recomputed for every mini-batch. This can be fixed with a slightly more sophisticated mechanism that keeps the dropout in steps 1 and 2 consistent (see the sketch after this list).
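One standard fix for exactly this: capture the RNG state before each gradient-free chunk forward in step 1 and restore it before the corresponding gradient-enabled forward in step 2, so both passes draw identical dropout masks. Below is a minimal self-contained sketch of that idea, not the repo's actual code; the encoder, loss, and chunking here are stand-ins:

    import torch
    import torch.nn.functional as F

    def grad_cache_step(encoder, batch, chunk_size):
        chunks = batch.split(chunk_size)

        # Step 1: gradient-free forward over all chunks. Save the CPU RNG
        # state before each chunk so its dropout masks can be replayed in
        # step 2 (on GPU you would also save torch.cuda.get_rng_state()).
        reps, rng_states = [], []
        for chunk in chunks:
            rng_states.append(torch.random.get_rng_state())
            with torch.no_grad():
                reps.append(encoder(chunk))
        all_reps = torch.cat(reps).requires_grad_()

        # Full-batch loss on the cached representations, then per-embedding
        # gradients (the "gradient cache"). The loss here is a stand-in.
        loss = F.cross_entropy(all_reps @ all_reps.T,
                               torch.arange(all_reps.size(0)))
        loss.backward()
        rep_grads = all_reps.grad.split(chunk_size)

        # Step 2: re-run each chunk with gradients enabled, restoring the
        # saved RNG state so dropout matches step 1 exactly, then backprop
        # the cached embedding gradients through the encoder.
        for chunk, state, grad in zip(chunks, rng_states, rep_grads):
            torch.random.set_rng_state(state)
            rep = encoder(chunk)
            rep.backward(gradient=grad)
        return loss.detach()

    # Toy usage: a dropout-bearing encoder, a 64-example batch, chunks of 16.
    encoder = torch.nn.Sequential(torch.nn.Linear(32, 16), torch.nn.Dropout(0.1))
    loss = grad_cache_step(encoder, torch.randn(64, 32), chunk_size=16)

With the replayed RNG state, the embeddings produced in step 2 match those from step 1, so the cached gradients remain valid and the loss never needs to be recomputed per mini-batch.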
