zjunlp / dart
[ICLR 2022] Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners
License: MIT License
Hi, I notice that you report the few-shot results on ACE-2005 and I'm very interested in your implementation. Do you have any plan to share the code for event extraction? It would be very helpful.
Hello!
I forked your repo on an AWS g4dn machine and tried to train roberta-large on the SST-2 classification task. It gave me a CUDA error. I also tried reducing the batch size from 8 to 4, and it still didn't work. Any suggestions on how to run it?
The current implementation requires all verbalizers to be single tokens, i.e., each verbalizer word must be part of the vocabulary of the model being used. This is a significant obstacle for practical applications. I noticed there is a flag called force_single_token in get_verbalization_ids, which seems to suggest that verbalizers with more than one token should be an option. I have tried to modify this, but then I get other errors further downstream, and I must admit my PyTorch skills are not quite up to the task of making the necessary modifications to the code. Any hints about how to get around this would be much appreciated.
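To make the question concrete, here is a toy sketch of the workaround I had in mind (the function name and scoring rule are my own illustration, not the repo's API): score a multi-token verbalizer by averaging the mask-position log-probabilities of its sub-tokens.

```python
import math

def multi_token_score(token_log_probs, verbalizer_ids):
    """Average the mask-position log-probabilities of a verbalizer's
    sub-tokens (one common heuristic for multi-token verbalizers)."""
    return sum(token_log_probs[t] for t in verbalizer_ids) / len(verbalizer_ids)

# toy log-probabilities at the [MASK] position, keyed by token id
log_probs = {10: math.log(0.5), 11: math.log(0.25), 12: math.log(0.2)}

single = multi_token_score(log_probs, [12])     # single-token verbalizer
multi = multi_token_score(log_probs, [10, 11])  # two-sub-token verbalizer
```

Whether averaging, summing, or taking only the first sub-token is best is an open choice; any of them would work for me if force_single_token could be relaxed.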
Hi,
It's very kind of you to release your code. After reading it, I have a question about the unused tokens.
According to your paper, DART maps the template and label to {h1, ..., hm, ..., hm+n}, where the hi are trainable parameters, and they are replaced with unused tokens (e.g., [unused1]) or special tokens in the vocabulary. In my opinion, they would most likely be replaced with special tokens, because it's difficult to tell which tokens were not used during pre-training.
However, in your code they are replaced with the last few tokens of the vocabulary. So I'd like to know: how do you ensure that the last few tokens of the vocabulary were not used? Does this mean I can pick tokens at random from the vocabulary to replace the hi?
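For concreteness, this is how I understand the convention in the code (a minimal sketch of my own, assuming the prompt slots simply take the last ids of the vocabulary):

```python
def pick_prompt_token_ids(vocab_size, num_prompt_tokens):
    """Take the last `num_prompt_tokens` ids of the vocabulary as
    trainable prompt slots, which is what the code appears to do."""
    return list(range(vocab_size - num_prompt_tokens, vocab_size))

# e.g., with RoBERTa's vocabulary size of 50265 and three prompt slots:
slots = pick_prompt_token_ids(50265, 3)
print(slots)  # [50262, 50263, 50264]
```

My question is whether these last ids are guaranteed to be rare or unused in pre-training, or whether any ids would do since their embeddings are re-trained anyway.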
I'm trying to read the code, and 'BLOCK_FLAG' really confused me. I think it is used for computing the length of the prompt, but I'm not sure the values in 'BLOCK_FLAG' are right. Take BoolQPVP as an example: apart from 'passage', 'question', and 'self.mask', which are not part of the template, the remaining words in PATTERN should be counted. So I think BLOCK_FLAG should be
BLOCK_FLAG = [0, 1, 1, 1, 0, 1, 0, 1]
instead of
BLOCK_FLAG = [0, 0, 1, 0, 0, 0, 0, 0]
Is there something I misunderstood?
class BoolQPVP(PVP):
    VERBALIZER = {
        "False": ["No"],
        "True": ["Yes"]
    }
    """
    VERBALIZER_B = {
        "False": ["false"],
        "True": ["true"]
    }
    """
    PATTERN = ['passage', '.', 'the', ' Question: ',
               'question', '? Answer: ', 'self.mask', '.']
    BLOCK_FLAG = [0, 0, 1, 0, 0, 0, 0, 0]

    def get_parts(self, example: InputExample) -> FilledPattern:
        passage = self.shortenable(example.text_a)
        question = self.shortenable(example.text_b)

        # searched patterns in fully-supervised learning
        # string_list_a = [passage, '.', 'the', 'Question:', question, '?', 'the', 'Answer:', self.mask]
        # string_list_a = [passage, '.', 'the', question, '?', 'the', self.mask]
        # string_list_a = [passage, 'the', question, '?', 'the', self.mask]

        # few-shot
        if self.pattern_id == 1:
            string_list_a = [passage, '.', 'the', ' Question: ',
                             question, '? Answer: ', self.mask, '.']
            string_list_b = []
            block_flag_a = self.BLOCK_FLAG
            block_flag_b = []
            assert len(string_list_a) == len(block_flag_a)
            assert len(string_list_b) == len(block_flag_b)
            return string_list_a, string_list_b, block_flag_a, block_flag_b
        else:
            raise ValueError("unknown pattern_id.")
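To illustrate the discrepancy, here is a small check I ran myself (not part of the repo): pairing each PATTERN slot with its flag shows which slots are currently treated as trainable.

```python
PATTERN = ['passage', '.', 'the', ' Question: ',
           'question', '? Answer: ', 'self.mask', '.']
BLOCK_FLAG = [0, 0, 1, 0, 0, 0, 0, 0]

# collect the slots whose flag marks them as trainable prompt tokens
trainable = [tok for tok, flag in zip(PATTERN, BLOCK_FLAG) if flag == 1]
print(trainable)  # ['the'] -- only one slot is marked trainable
```

With the flags I proposed, [0, 1, 1, 1, 0, 1, 0, 1], every slot except 'passage', 'question', and 'self.mask' would be trainable instead, which is what I expected from the paper.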
Hi, I have two questions about the symbols in this paper.
Eq. (4) shows that the h_i (0 <= i <= j) are trainable parameters, so j is the number of trainable embeddings in the template.
However, Eq. (6) shows that m is the number of trainable embeddings in the template.
From Eq. (4) and the template, it looks like j is the length of the template and m is the length of the sentence after the template is filled in.
So what is the number of trainable embeddings? I think it's j.
"To avoid optimizing any external parameters, {h_1, ..., h_m, ..., h_{m+n}} is replaced with unused tokens ..." What does n mean here? Does it denote the number of classes?
Thanks.
Hi Dear Author,
When trying to run inference.py directly with python inference.py, it gives me the following error:
Traceback (most recent call last):
  File "/data/co_project/DART/inference.py", line 53, in <module>
    train_data = load_examples(task_name, data_dir, TRAIN_SET, num_examples=-1)
  File "/data/co_project/DART/data_utils/processors.py", line 882, in load_examples
    examples = processor.get_train_examples(data_dir)
  File "/data/co_project/DART/data_utils/processors.py", line 700, in get_train_examples
    return self._create_examples(os.path.join(data_dir, "train.csv"), "train")
  File "/data/co_project/DART/data_utils/processors.py", line 722, in _create_examples
    with open(path, encoding='utf8') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'data/k-shot/mr/16-13/train.csv'
It looks like the data is not pre-downloaded. Can I ask where to download the data and how to put it in the correct place? Thanks!
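From the traceback, the loader expects a layout like the one below. This is my reconstruction from the error path only: the 16-13 part presumably encodes K=16 shots with random seed 13, and the dev/test file names are my guess.

```shell
# recreate the directory the loader looks for
mkdir -p data/k-shot/mr/16-13

# the loader opens train.csv here, so the downloaded split must land in
# this directory (presumably alongside dev/test splits)
ls data/k-shot/mr/16-13
```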
Hello, I am very fond of your paper and work, but I have some problems trying to understand your code.
In model.py:
def get_loss(self, batch, full_vocab=True, logits_key='pet_logits'):
    # Compute cross-entropy loss for prompt verbalizers
    assert logits_key in batch, 'logits should be pre-computed and stored in batch dict'
    masked_logits = batch[logits_key][batch['pet_flags'] == -1]
    labels = batch['pet_labels']
    if not full_vocab:
        masked_logits = masked_logits[:, self.label_ids]
        labels = batch['label_ids']
    return self.loss_fn(masked_logits, labels)
The size of masked_logits is [batch_size, hidden_size] (I used batch_size=4 and a BERT model, so the size is [4, 768]), and the size of labels is [batch_size]. However, when computing the cross-entropy loss, it always raises an "index out of range" error. Is there a missing layer that should transform masked_logits from [4, 768] into [4, num_classes]?
Meanwhile, I also can't understand the meaning of full_vocab: when should it be set to False and when to True?
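To make the shape question concrete, here is a toy sketch of what I would expect (my own illustration, assuming masked_logits should be logits over the vocabulary, i.e. after the LM head, rather than hidden states): full_vocab=False restricts the logits to the verbalizer token ids before the cross-entropy.

```python
import math

def cross_entropy(logits_row, target_idx):
    """Cross-entropy of one example: log-sum-exp of the logits minus
    the target logit (numerically stabilized)."""
    m = max(logits_row)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits_row))
    return log_z - logits_row[target_idx]

vocab_logits = [0.1, 2.0, -1.0, 3.0]  # toy "vocabulary" of 4 tokens
label_ids = [1, 3]                    # vocab ids of the verbalizer tokens

# full_vocab=True: label is a vocab id, softmax over the whole vocabulary
loss_full = cross_entropy(vocab_logits, 3)

# full_vocab=False: logits restricted to label_ids, label re-indexed into them
restricted = [vocab_logits[i] for i in label_ids]
loss_restricted = cross_entropy(restricted, 1)

# restricting removes competition from non-verbalizer tokens, so the loss shrinks
```

If the last dimension I see is 768 (the hidden size) instead of the vocabulary size, it seems to me the LM head was never applied to the hidden states; is that the missing piece?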
Looking forward to your reply!
Hi,
Great work!
I'm currently looking at the paper and running experiments on it.
In the code: is there a way to store or view predictions, in order to analyze example results?
I want to see where regular techniques fall short and how DART overcomes them,
and lastly: are there still problems left for DART to solve, or opportunities to improve it?
Thank you