meaning of `DEV_FILE_NAME` (pet, closed)

timoschick commented on August 17, 2024

meaning of `DEV_FILE_NAME`

from pet.

Comments (4)

timoschick commented on August 17, 2024

Hi @chris-aeviator ,

a) The labeled (training) data is not split at all. In the case of FewGLUE, this means that `TRAIN_FILE_NAME` should point to a file containing all 32 examples, whereas `DEV_FILE_NAME` and `TEST_FILE_NAME` should point to files containing the original dev/test examples. Note that the dev examples are not used at all during training or for hyperparameter optimization; just like the test examples, they are only used for evaluation. If you have no dev examples, you can simply make `get_dev_examples(self, data_dir: str)` return an empty list.
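A minimal sketch of what "return an empty list" could look like in a custom processor. It is deliberately standalone: `Example` and `MyTaskProcessor` are hypothetical stand-ins rather than PET's own classes, and the two-column `label,text` CSV layout is an assumption made only for illustration.

```python
import csv
import os
from typing import List, NamedTuple


class Example(NamedTuple):
    """Hypothetical stand-in for the example type used by the library."""
    guid: str
    text_a: str
    label: str


class MyTaskProcessor:
    """Illustrative processor; the file-name constants mirror this thread."""

    TRAIN_FILE_NAME = "train.csv"  # all labeled examples, not split further
    TEST_FILE_NAME = "test.csv"    # used for evaluation only

    def get_train_examples(self, data_dir: str) -> List[Example]:
        return self._read(os.path.join(data_dir, self.TRAIN_FILE_NAME), "train")

    def get_dev_examples(self, data_dir: str) -> List[Example]:
        # No dev set available: return an empty list instead of reading a file.
        return []

    def get_test_examples(self, data_dir: str) -> List[Example]:
        return self._read(os.path.join(data_dir, self.TEST_FILE_NAME), "test")

    @staticmethod
    def _read(path: str, set_type: str) -> List[Example]:
        # Assumed layout: column 0 is the label, column 1 is the text.
        examples = []
        with open(path, newline="", encoding="utf-8") as f:
            for idx, row in enumerate(csv.reader(f)):
                examples.append(Example(f"{set_type}-{idx}", row[1], row[0]))
        return examples
```

Because `get_dev_examples` never touches the file system here, no dev file needs to exist in the data directory.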

b) Yes, but only for the individual models and not for the final distilled classifier. If you need predictions for the unlabeled data, you can simply set `TEST_FILE_NAME = UNLABELED_FILE_NAME`. The result is then stored in a file `predictions.jsonl` where each line is of the form `{"idx": <IDX>, "label": "<LABEL>"}`, where `<IDX>` is the index of the example in the test file and `<LABEL>` is the predicted label.
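For reference, a file in that format can be read back with a few lines of Python. This is only a sketch based on the line format described above; the helper names `parse_predictions` and `load_predictions` are made up for this example and are not part of PET.

```python
import json
from typing import Any, Dict, Iterable


def parse_predictions(lines: Iterable[str]) -> Dict[Any, Any]:
    """Map each example index ("idx") to its predicted label ("label")."""
    predictions = {}
    for line in lines:
        line = line.strip()
        if not line:  # tolerate blank lines, e.g. a trailing newline
            continue
        record = json.loads(line)
        predictions[record["idx"]] = record["label"]
    return predictions


def load_predictions(path: str) -> Dict[Any, Any]:
    """Read a JSON Lines predictions file from disk."""
    with open(path, encoding="utf-8") as f:
        return parse_predictions(f)
```

Calling `load_predictions("predictions.jsonl")` then gives a dictionary that can be joined with the rows of the test file by index.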

timoschick commented on August 17, 2024

I'm closing this issue for now. Feel free to reopen it if you have further questions.

aidahalitaj commented on August 17, 2024

Hi @timoschick ,

I am running PET for a custom task with --model_type bert. The --data_dir contains four files: train.csv, test.csv, dev.csv, and unlabeled.csv.

In the shell script, I have:
--do_train
--do_eval

In the output, I always get the predictions.jsonl file. UNLABELED_FILE_NAME = "unlabeled.csv", so it does not point to any of the other datasets. However, I thought the predictions file contained the model predictions for dev.csv. I tested this with a different number of samples in each of the train/test/dev/unlabeled files, and the number of rows in predictions.jsonl matched that of the dev set. Does the predictions file (located in the final folder) show the predictions for dev.csv by default?

aidahalitaj commented on August 17, 2024

More info on what I said earlier...

@timoschick I ran two similar experiments on the same dataset (varying the unlabeled sample size).

Experiment A settings:

Balanced Dataset
train (50 samples per class)
test (150 samples per class)
dev (150 samples per class)
unlabeled (10 samples per class)

predictions.jsonl file has 300 predicted labels in total
Experiment A predictions.jsonl file has predicted labels (300 samples) from only one class

Experiment B settings:

Balanced Dataset
train (50 samples per class)
test (150 samples per class)
dev (150 samples per class)
unlabeled (100 samples per class)

predictions.jsonl file has 300 predicted labels in total
Experiment B predictions.jsonl file has predicted labels from both classes

My task is a classification problem with two labels, but I don't understand what role the unlabeled data plays in this case or why it affects the result.
