zjunlp / dart
[ICLR 2022] Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners
License: MIT License
Hi, I notice that you report the few-shot results on ACE-2005 and I'm very interested in your implementation. Do you have any plan to share the code for event extraction? It would be very helpful.
Hello!
I forked your repo on an AWS g4dn machine and tried to train roberta-large on the SST-2 classification task. It gave me a CUDA error. I also tried reducing the batch size from 8 to 4, and it still didn't work. Any suggestions on how to run it?
The current implementation requires all verbalizers to be single tokens, i.e., each verbalizer word must be part of the vocabulary of the model being used. This is a significant obstacle for practical applications. I noticed there is a flag called force_single_token in get_verbalization_ids, which seems to suggest that verbalizers with more than one token should be an option. I have tried to modify this, but then I get other errors further downstream, and I must admit my PyTorch skills are not quite up to the task of making the necessary modifications to the code. Any hints about how to get around this would be much appreciated.
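To make the question concrete, here is a toy sketch of the workaround I had in mind (the function name and scoring rule are my own illustration, not the repo's API): score a multi-token verbalizer by averaging the mask-position log-probabilities of its sub-tokens.

```python
import math

def multi_token_score(token_log_probs, verbalizer_ids):
    """Average the mask-position log-probabilities of a verbalizer's
    sub-tokens (one common heuristic for multi-token verbalizers)."""
    return sum(token_log_probs[t] for t in verbalizer_ids) / len(verbalizer_ids)

# toy log-probabilities at the [MASK] position, keyed by token id
log_probs = {10: math.log(0.5), 11: math.log(0.25), 12: math.log(0.2)}

single = multi_token_score(log_probs, [12])     # single-token verbalizer
multi = multi_token_score(log_probs, [10, 11])  # two-sub-token verbalizer
```

Whether averaging, summing, or taking only the first sub-token is best is an open choice; any of them would work for me if force_single_token could be relaxed.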
Hi,
It's very kind of you to release your code. After reading it, I have a question about the unused tokens.
According to your paper, DART maps the template and label to {h1, ..., hm, ..., hm+n}, where the hi are trainable parameters, and they are replaced with unused tokens (e.g., [unused1]) or special tokens in the vocabulary. In my opinion, they would most likely be replaced with special tokens, because it's difficult to tell which tokens were not used during pre-training.
However, in your code they are replaced with the last few tokens of the vocabulary. So I'd like to know: how do you ensure that the last few tokens of the vocabulary were not used? Does this mean I can pick tokens at random from the vocabulary to replace the hi?
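For concreteness, this is how I understand the convention in the code (a minimal sketch of my own, assuming the prompt slots simply take the last ids of the vocabulary):

```python
def pick_prompt_token_ids(vocab_size, num_prompt_tokens):
    """Take the last `num_prompt_tokens` ids of the vocabulary as
    trainable prompt slots, which is what the code appears to do."""
    return list(range(vocab_size - num_prompt_tokens, vocab_size))

# e.g., with RoBERTa's vocabulary size of 50265 and three prompt slots:
slots = pick_prompt_token_ids(50265, 3)
print(slots)  # [50262, 50263, 50264]
```

My question is whether these last ids are guaranteed to be rare or unused in pre-training, or whether any ids would do since their embeddings are re-trained anyway.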
I'm trying to read the code, and 'BLOCK_FLAG' really confused me. I think it is used for computing the length of the prompt, but I'm not sure the values in 'BLOCK_FLAG' are right. Take BoolQPVP as an example: apart from 'passage', 'question', and 'self.mask', which are not part of the template, the remaining words in PATTERN should be counted. So I think BLOCK_FLAG should be
BLOCK_FLAG = [0, 1, 1, 1, 0, 1, 0, 1]
instead of
BLOCK_FLAG = [0, 0, 1, 0, 0, 0, 0, 0]
Is there something I misunderstood?
class BoolQPVP(PVP):
    VERBALIZER = {
        "False": ["No"],
        "True": ["Yes"]
    }
    """
    VERBALIZER_B = {
        "False": ["false"],
        "True": ["true"]
    }
    """
    PATTERN = ['passage', '.', 'the', ' Question: ',
               'question', '? Answer: ', 'self.mask', '.']
    BLOCK_FLAG = [0, 0, 1, 0, 0, 0, 0, 0]

    def get_parts(self, example: InputExample) -> FilledPattern:
        passage = self.shortenable(example.text_a)
        question = self.shortenable(example.text_b)

        # searched patterns in fully-supervised learning
        # string_list_a = [passage, '.', 'the', 'Question:', question, '?', 'the', 'Answer:', self.mask]
        # string_list_a = [passage, '.', 'the', question, '?', 'the', self.mask]
        # string_list_a = [passage, 'the', question, '?', 'the', self.mask]

        # few-shot
        if self.pattern_id == 1:
            string_list_a = [passage, '.', 'the', ' Question: ',
                             question, '? Answer: ', self.mask, '.']
            string_list_b = []
            block_flag_a = self.BLOCK_FLAG
            block_flag_b = []
            assert len(string_list_a) == len(block_flag_a)
            assert len(string_list_b) == len(block_flag_b)
            return string_list_a, string_list_b, block_flag_a, block_flag_b
        else:
            raise ValueError("unknown pattern_id.")
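To illustrate the discrepancy, here is a small check I ran myself (not part of the repo): pairing each PATTERN slot with its flag shows which slots are currently treated as trainable.

```python
PATTERN = ['passage', '.', 'the', ' Question: ',
           'question', '? Answer: ', 'self.mask', '.']
BLOCK_FLAG = [0, 0, 1, 0, 0, 0, 0, 0]

# collect the slots whose flag marks them as trainable prompt tokens
trainable = [tok for tok, flag in zip(PATTERN, BLOCK_FLAG) if flag == 1]
print(trainable)  # ['the'] -- only one slot is marked trainable
```

With the flags I proposed, [0, 1, 1, 1, 0, 1, 0, 1], every slot except 'passage', 'question', and 'self.mask' would be trainable instead, which is what I expected from the paper.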
Hi, I have two questions about the symbols in this paper.
Eq. (4) shows that the h_i (0 <= i <= j) are trainable parameters, so j is the number of trainable embeddings in the template.
However, Eq. (6) shows that m is the number of trainable embeddings in the template.
From Eq. (4) and the template, it looks like j is the length of the template and m is the length of the sentence after the template is filled in.
So what is the number of trainable embeddings? I think it's j.
"To avoid optimizing any external parameters, {h_1, ..., h_m, ..., h_{m+n}} is replaced with unused tokens ..." What does n mean here? Does it denote the number of classes?
Thanks.
Hi Dear Author,
When trying to run inference.py directly with python inference.py, it gives me the following error:
Traceback (most recent call last):
  File "/data/co_project/DART/inference.py", line 53, in <module>
    train_data = load_examples(task_name, data_dir, TRAIN_SET, num_examples=-1)
  File "/data/co_project/DART/data_utils/processors.py", line 882, in load_examples
    examples = processor.get_train_examples(data_dir)
  File "/data/co_project/DART/data_utils/processors.py", line 700, in get_train_examples
    return self._create_examples(os.path.join(data_dir, "train.csv"), "train")
  File "/data/co_project/DART/data_utils/processors.py", line 722, in _create_examples
    with open(path, encoding='utf8') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'data/k-shot/mr/16-13/train.csv'
It looks like the data is not pre-downloaded. Can I ask where to download the data and how to put it in the correct place? Thanks!
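From the traceback, the loader expects a layout like the one below. This is my reconstruction from the error path only: the 16-13 part presumably encodes K=16 shots with random seed 13, and the dev/test file names are my guess.

```shell
# recreate the directory the loader looks for
mkdir -p data/k-shot/mr/16-13

# the loader opens train.csv here, so the downloaded split must land in
# this directory (presumably alongside dev/test splits)
ls data/k-shot/mr/16-13
```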
Hello, I am very fond of your paper and work, but I have some problems trying to understand your code.
In model.py:
def get_loss(self, batch, full_vocab=True, logits_key='pet_logits'):
    # Compute cross-entropy loss for prompt verbalizers
    assert logits_key in batch, 'logits should be pre-computed and stored in batch dict'
    masked_logits = batch[logits_key][batch['pet_flags'] == -1]
    labels = batch['pet_labels']
    if not full_vocab:
        masked_logits = masked_logits[:, self.label_ids]
        labels = batch['label_ids']
    return self.loss_fn(masked_logits, labels)
The size of masked_logits is [batch_size, hidden_size] (I used batch_size=4 and a BERT model, so the size is [4, 768]), and the size of labels is [batch_size]. However, when computing the cross-entropy loss, it always raises an "index out of range" error. Is there a missing layer that should transform masked_logits from [4, 768] into [4, num_classes]?
Meanwhile, I also can't understand the meaning of full_vocab: when should it be set to False and when to True?
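To make the shape question concrete, here is a toy sketch of what I would expect (my own illustration, assuming masked_logits should be logits over the vocabulary, i.e. after the LM head, rather than hidden states): full_vocab=False restricts the logits to the verbalizer token ids before the cross-entropy.

```python
import math

def cross_entropy(logits_row, target_idx):
    """Cross-entropy of one example: log-sum-exp of the logits minus
    the target logit (numerically stabilized)."""
    m = max(logits_row)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits_row))
    return log_z - logits_row[target_idx]

vocab_logits = [0.1, 2.0, -1.0, 3.0]  # toy "vocabulary" of 4 tokens
label_ids = [1, 3]                    # vocab ids of the verbalizer tokens

# full_vocab=True: label is a vocab id, softmax over the whole vocabulary
loss_full = cross_entropy(vocab_logits, 3)

# full_vocab=False: logits restricted to label_ids, label re-indexed into them
restricted = [vocab_logits[i] for i in label_ids]
loss_restricted = cross_entropy(restricted, 1)

# restricting removes competition from non-verbalizer tokens, so the loss shrinks
```

If the last dimension I see is 768 (the hidden size) instead of the vocabulary size, it seems to me the LM head was never applied to the hidden states; is that the missing piece?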
Looking forward to your reply!
Hi,
Great work!
I'm currently looking at the paper and running experiments on it.
In the code: is there a way to store or view predictions, in order to analyze example results?
I want to see where regular techniques fall short and how DART overcomes them,
and lastly: are there still problems left for DART to solve, or opportunities to improve it?
Thank you