albert-ma / prop Goto Github PK
WSDM'2021, PROP and SIGIR'2021, B-PROP
License: Apache License 2.0
Hi, I recently read your new paper B-PROP in SIGIR'21. I'm trying to take your methods as baselines. I have some questions about your fine-tuning process as follows:
Thanks for any possible help.
Hello, when will the fine-tuning code be released?
Hi, I'm very interested in your work and I am now following your experimental setups.
Did you implement Transformer_ICT yourselves? Could you please provide its pre-trained model?
Thank you.
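For anyone else following this setup: ICT (the Inverse Cloze Task) builds pseudo query-context pairs by holding out one sentence of a passage as the query. This is a generic sketch of that sampling step, not the authors' code, and the function name is my own:

```python
import random

def ict_example(passage_sentences, rng=random.Random(0)):
    """Sample one (pseudo-query, context) pair in the ICT style:
    a randomly chosen sentence is the query, the rest is the context."""
    idx = rng.randrange(len(passage_sentences))
    query = passage_sentences[idx]
    context = passage_sentences[:idx] + passage_sentences[idx + 1:]
    return query, " ".join(context)

sents = ["Paris is the capital of France.",
         "It is known for the Eiffel Tower.",
         "The Louvre is a famous museum there."]
q, ctx = ict_example(sents)
```

In the actual ICT objective the query and context are then encoded by two Transformer towers and trained with in-batch negatives; the snippet only covers pair construction.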
Hi, this is a great work! However, I failed to download the pretrained model. The given link points to an empty folder. Is the pretrained model available yet?
Hi, I'm not clear on how to produce the stem2word file. Can you explain how this file should be generated? Thanks a lot!
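While waiting for an answer, my guess is that stem2word maps each stem to the surface forms that reduce to it. A minimal sketch, assuming a Porter/Krovetz-style stemmer (a toy suffix stripper stands in here; this is not the repo's actual recipe):

```python
from collections import defaultdict

def toy_stem(word):
    # Stand-in for a real stemmer (e.g. Porter or Krovetz): strip a few suffixes.
    for suf in ("ing", "ed", "s"):
        if word.endswith(suf) and len(word) > len(suf) + 2:
            return word[: -len(suf)]
    return word

def build_stem2word(vocabulary):
    """Map each stem to the set of surface forms that reduce to it."""
    stem2word = defaultdict(set)
    for w in vocabulary:
        stem2word[toy_stem(w)].add(w)
    return dict(stem2word)

vocab = ["retrieve", "retrieves", "ranking", "ranked", "rank"]
print(build_stem2word(vocab))
```

The real file would presumably be built from the corpus vocabulary and serialized (e.g. as JSON), with the production stemmer matching whatever the indexing pipeline uses.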
Hello,
What should I do when cleaning the texts? Should I remove all the numbers, equations and punctuation marks?
Could you provide the code or a function for text preprocessing?
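In case it helps others with the same question, here is a minimal cleaning pass one could start from. Whether numbers and equations should actually be removed is exactly what is being asked, so treat every choice below as an assumption, not the authors' recipe:

```python
import re

def clean_text(text):
    """Toy preprocessing: lowercase, replace punctuation/symbols with
    spaces, drop standalone numbers, and collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)   # strip punctuation and symbols
    text = re.sub(r"\b\d+\b", " ", text)       # drop standalone numbers
    return re.sub(r"\s+", " ", text).strip()

print(clean_text("Hello, World! E = mc^2 in 1905."))
```

Note that this also destroys equations (e.g. `mc^2` becomes `mc`), which may or may not match the paper's setup.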
Thanks!
Hello,
I see in your paper that for vanilla BERT you truncate the input to 512 tokens.
What about PROP: does PROP use the same architecture as vanilla BERT and also truncate to 512 tokens,
or does PROP split long documents into multiple parts?
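To make the two options in the question concrete, here is a sketch of both strategies on a token-id list. The stride value and function name are my own assumptions, not something from the paper:

```python
def truncate_or_split(token_ids, max_len=512, stride=256, split=False):
    """If split is False, keep only the first max_len tokens (the
    vanilla-BERT setup described in the paper); otherwise break the
    document into overlapping windows of at most max_len tokens."""
    if not split:
        return [token_ids[:max_len]]
    return [token_ids[i:i + max_len]
            for i in range(0, max(1, len(token_ids) - stride), stride)]
```

With truncation a 1000-token document keeps one 512-token piece; with splitting it yields several overlapping windows whose scores must later be aggregated (e.g. max over passages).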
Hello,
Is this the code you used in your paper?
It looks like an older version, and errors occur even when running multiprocessing_generate_word_sets.py with proper data.
Did you check whether the code works with the data?
I notice that in the pre-training process the MLM words are not actually masked. Is this a bug in this version of the code?
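For reference, the standard BERT MLM corruption the code would be expected to perform looks like the sketch below (the 80/10/10 split is the original BERT convention; the token ids are assumptions, not taken from this repo):

```python
import random

MASK_ID, VOCAB_SIZE = 103, 30522  # BERT-base [MASK] id and vocab size (assumed)

def mask_tokens(ids, mlm_prob=0.15, rng=random.Random(0)):
    """Standard BERT masking: each selected position is replaced by
    [MASK] 80% of the time, a random token 10% of the time, and kept
    unchanged 10% of the time. Labels are -100 at unselected positions."""
    inputs, labels = list(ids), [-100] * len(ids)
    for i, tok in enumerate(ids):
        if rng.random() < mlm_prob:
            labels[i] = tok
            r = rng.random()
            if r < 0.8:
                inputs[i] = MASK_ID
            elif r < 0.9:
                inputs[i] = rng.randrange(VOCAB_SIZE)
    return inputs, labels
```

If the released code records labels but never rewrites the input ids, the model would see the original tokens and the MLM loss would be trivial, which is presumably the bug being reported.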
When running the ./scripts/process.sh command:
INPUT_FILE=./data/wiki_info
Bert_MODEL_DIR=../bert-base-uncased-py (there is no bert-base-uncased directory under the PROP repo; where is this supposed to come from?)
python -m prop.preprocessing_data
--corpus_name wikipedia
--data_file ${INPUT_FILE}/wiki_info/wiki_toy.data \ (q1: is wiki_info repeated here by mistake? q2: isn't wiki_toy.data our input data, from which the corresponding JSON files are then generated?)
--bert_model ${Bert_MODEL_DIR}
--do_lower_case
--output_dir ${INPUT_FILE}/wiki_info/