prop's Issues

Questions about your fine-tuning process.

Hi, I recently read your new SIGIR '21 paper B-PROP. I'm trying to use your methods as baselines, and I have a few questions about your fine-tuning process:

  1. Did you also apply a linear warm-up and linear decay learning-rate schedule?
  2. What kind of loss did you use: a pairwise hinge loss, a cross-entropy loss, or something else? And for each relevant document of a query, how many negative documents did you sample?
  3. For the two large datasets, the paper sets the batch size to 144. Does that mean 144 (query, pos_doc, neg_doc) triples, or 144 (query, doc) pairs (i.e., 72 triples)?
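
For concreteness, this is roughly the pairwise setup I am assuming; the model, hyperparameters, and helper below are my own placeholders, not taken from your released code:

# Sketch of one pairwise fine-tuning step: each example is a
# (query, pos_doc, neg_doc) triple scored by a cross-encoder and trained
# with a margin (hinge) loss under linear warm-up / linear decay.
import torch
from transformers import (BertTokenizerFast, BertForSequenceClassification,
                          get_linear_schedule_with_warmup)

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=1)
model.train()
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-6)             # placeholder lr
scheduler = get_linear_schedule_with_warmup(optimizer,
                                            num_warmup_steps=1000,      # placeholder
                                            num_training_steps=100000)  # placeholder
hinge = torch.nn.MarginRankingLoss(margin=1.0)

def train_step(queries, pos_docs, neg_docs):
    # Under the "144 triples" reading, len(queries) == 144 here;
    # under the "144 (query, doc) pairs" reading it would be 72.
    pos = tokenizer(queries, pos_docs, truncation=True, max_length=512,
                    padding=True, return_tensors="pt")
    neg = tokenizer(queries, neg_docs, truncation=True, max_length=512,
                    padding=True, return_tensors="pt")
    s_pos = model(**pos).logits.squeeze(-1)   # score of the relevant document
    s_neg = model(**neg).logits.squeeze(-1)   # score of the sampled negative
    loss = hinge(s_pos, s_neg, torch.ones_like(s_pos))  # want s_pos > s_neg by the margin
    loss.backward()
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
    return loss.item()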

Thanks for any possible help.

Question about your baseline Transformer_ICT

Hi, I'm very interested in your work and I am now following your experimental setup.
Did you implement Transformer_ICT yourselves? Could you please provide its pre-trained model?
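
For reference, my understanding of the ICT data construction is roughly the following (my own sketch, not your implementation):

import random

def ict_example(passage_sentences, keep_prob=0.1):
    # Inverse Cloze Task: pick one sentence as the pseudo-query; the rest of the
    # passage (or, with probability keep_prob, the full passage) is the positive context.
    i = random.randrange(len(passage_sentences))
    query = passage_sentences[i]
    if random.random() < keep_prob:
        context = passage_sentences                                  # keep the query sentence
    else:
        context = passage_sentences[:i] + passage_sentences[i + 1:]  # hold it out
    return query, " ".join(context)
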
Thank you.

Failed to download the pretrained model

Hi, this is great work! However, I was unable to download the pretrained model: the given link points to an empty folder. Is the pretrained model available yet?

Questions about text preprocessing.

Hello,
How should I clean the texts? Should I remove all numbers, equations, and punctuation marks?
Could you provide the code or a function for text preprocessing?
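
For example, right now I only do something like the cleanup below, and I am not sure whether dropping numbers and punctuation is appropriate (this is my own placeholder, not your pipeline):

import re

def clean_text(text, drop_numbers=False, drop_punct=False):
    # Illustrative cleanup only; whether to strip numbers/punctuation is the open question.
    text = re.sub(r"\s+", " ", text).strip()       # collapse whitespace
    if drop_numbers:
        text = re.sub(r"\d+", " ", text)           # remove digit runs
    if drop_punct:
        text = re.sub(r"[^\w\s]", " ", text)       # remove punctuation marks
    return re.sub(r"\s+", " ", text).strip()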

Do you truncate the token length to 512 for PROP?

Hello,
I see in your paper that, for vanilla BERT, you truncate the token length to 512.
What about PROP: does it use the same architecture as vanilla BERT and also truncate to 512 tokens,
or does it split long documents into multiple parts?
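
To make the question concrete, these are the two options I have in mind (my own sketch with a Hugging Face tokenizer, not your code):

from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
query_text = "example query"               # placeholder
doc_text = "a long document " * 400        # placeholder

# Option 1: truncate the document side of each (query, document) pair so the input
# fits in 512 tokens, as reported for vanilla BERT.
enc = tokenizer(query_text, doc_text, truncation="only_second",
                max_length=512, return_tensors="pt")

# Option 2: split a long document into overlapping 512-token windows and
# score each window separately.
windows = tokenizer(doc_text, truncation=True, max_length=512, stride=128,
                    return_overflowing_tokens=True)
print(enc["input_ids"].shape, len(windows["input_ids"]))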

Questions about the scripts

Hello,
Is this the code you used in your paper?
It looks like an older version, and errors occur even when running multiprocessing_generate_word_sets.py with proper data.
Did you check whether the code works with the data?

Issue with the data preprocessing script

When I run the ./scripts/process.sh command:

INPUT_FILE=./data/wiki_info
Bert_MODEL_DIR=../bert-base-uncased-py (there is no bert-base-uncased directory under the PROP directory; how is this supposed to be set up?)

python -m prop.preprocessing_data
--corpus_name wikipedia
--data_file ${INPUT_FILE}/wiki_info/wiki_toy.data \ (Q1: is wiki_info duplicated here by mistake? Q2: isn't wiki_toy.data supposed to be our input data, from which the corresponding JSON files are then generated?)
--bert_model ${Bert_MODEL_DIR}
--do_lower_case
--output_dir ${INPUT_FILE}/wiki_info/
