joeljang / knowledge-unlearning

[ACL 2023] Knowledge Unlearning for Mitigating Privacy Risks in Language Models

Python 100.00%

knowledge-unlearning's People

mattyoon chen-yingfa ryan0v0 archit31uniyal zhangyizhao alymostafa csm9493

knowledge-unlearning's Issues

accuracy missing "task" argument

In the validation_forget() function, accuracy() is called without the task argument, which raises an AssertionError:

acc = accuracy(pred, label, ignore_index=-100)

I replaced it with:

acc = accuracy(pred, label, task="multiclass", num_classes=5063, ignore_index=-100)

I found 5063 to be the number of unique labels. Is this the right fix?
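
For reference, the quantity this call is meant to report is token-level accuracy over the positions whose label is not ignore_index. The following is a minimal pure-Python sketch of that quantity, not the repository's or torchmetrics' implementation; the function name token_accuracy and the sample values are hypothetical. It may help to cross-check the torchmetrics result on a small batch after changing num_classes.

```python
def token_accuracy(preds, labels, ignore_index=-100):
    """Fraction of positions where pred == label, skipping ignored labels."""
    # Keep only positions whose label is not the ignore marker.
    kept = [(p, l) for p, l in zip(preds, labels) if l != ignore_index]
    if not kept:
        return 0.0  # nothing to score
    correct = sum(1 for p, l in kept if p == l)
    return correct / len(kept)


# Hypothetical token ids: the last position is padding and is skipped.
preds = [11, 42, 7, 99]
labels = [11, 42, 8, -100]
acc = token_accuracy(preds, labels)  # 2 correct out of 3 scored positions
```

If the value from torchmetrics differs from this on the same inputs, the num_classes setting (or the label range) is worth a second look.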

Request for the entire extraction dataset

Hi, thanks for the interesting work. Could you provide the entire extraction dataset covering all 16 domains? I see that five samples are available for eight of the domains. Could you provide the rest as well? Many thanks in advance.

How can I evaluate only the nine benchmark datasets with opt-1.3b, without training?

I have set up config.json as follows. Is it correct?
{
  "mode": "general_lm_eval",
  "wandb_project": "Knowledge Unlearning",
  "wandb_run_name": "example",
  "num_train_epochs": 20,
  "check_val_every_n_epoch": 1,
  "check_validation_only": true,
  "do_init_eval": true,
  "train_set": "data/main/lm_extraction_32_0.csv",
  "valid_sets": [
    "validation_data/lambada.csv",
    "piqa",
    "hellaswag",
    "ai2_arc",
    "ai2_arc",
    "super_glue",
    "winogrande",
    "math_qa",
    "validation_data/pubmed_qa.csv"
  ],
  "valid_subset_path": [
    "",
    "",
    "",
    "ARC-Easy",
    "ARC-Challenge",
    "copa",
    "winogrande_s",
    "",
    ""
  ],
  "valid_type_path": [
    "test",
    "validation",
    "validation",
    "validation",
    "validation",
    "validation",
    "validation",
    "validation",
    ""
  ],
  "cache_dir": "/home/data0/cgt/knowledge-unlearning/val",
  "train_batch_size": 8,
  "eval_batch_size": 8,
  "gradient_accumulation_steps": 4,
  "ngpu": 1,
  "learning_rate": 5e-5,
  "model_name_or_path": "/home/chen/.cache/huggingface/hub/models--facebook--opt-1.3b/snapshots/3f5c25d0bc631cb57ac65913f76e22c2dfb61d62",
  "el_threshold": 0.0499,
  "ma_threshold": 0.2994,
  "input_length": 512,
  "output_length": 512,
  "target_length": 200,
  "num_workers": 64,
  "strategy": "deepspeed_stage_2_offload",
  "fp16": true,
  "wandb_log": false
}
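
One thing that can be checked without running the model is that the three validation lists line up entry by entry. The sketch below assumes, without having confirmed it in the repository code, that valid_sets, valid_subset_path and valid_type_path are read in parallel; the helper name pair_validation_entries is hypothetical, and the sample dictionary is only an excerpt of the config above.

```python
def pair_validation_entries(config):
    """Return (dataset, subset, split) triples, or raise if the lists differ in length."""
    keys = ("valid_sets", "valid_subset_path", "valid_type_path")
    lengths = {key: len(config[key]) for key in keys}
    # All three lists must describe the same number of validation sets.
    if len(set(lengths.values())) != 1:
        raise ValueError(f"mismatched list lengths: {lengths}")
    return list(zip(*(config[key] for key in keys)))


# Excerpt of the config posted above (first three validation sets only).
config_excerpt = {
    "valid_sets": ["validation_data/lambada.csv", "piqa", "hellaswag"],
    "valid_subset_path": ["", "", ""],
    "valid_type_path": ["test", "validation", "validation"],
}
triples = pair_validation_entries(config_excerpt)
```

To check the full file, load it with json.load and pass the resulting dictionary to the same function.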

Warnings when running the code

When I run the code, a large number of warnings like the ones below appear. I do not know where they come from. Could you suggest a way to suppress them?

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
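
These messages come from the Hugging Face transformers generate() method when it is called without an attention mask and without a pad token id. A common way to avoid them is to pass both explicitly. The sketch below is an assumption about how that could be wired in, not the repository's code: the wrapper name generate_with_mask is hypothetical, and the two stand-in objects exist only so the snippet runs without downloading a model.

```python
from types import SimpleNamespace


def generate_with_mask(model, tokenizer, input_ids, attention_mask, **gen_kwargs):
    """Call model.generate() with an explicit attention mask and pad token id."""
    # Fall back to the EOS id when the tokenizer defines no pad token.
    pad_id = tokenizer.pad_token_id
    if pad_id is None:
        pad_id = tokenizer.eos_token_id
    return model.generate(
        input_ids=input_ids,
        attention_mask=attention_mask,
        pad_token_id=pad_id,
        **gen_kwargs,
    )


# Stand-ins for a real model and tokenizer (hypothetical, for illustration only).
class EchoModel:
    def generate(self, **kwargs):
        return kwargs  # a real model would return generated token ids


fake_tokenizer = SimpleNamespace(pad_token_id=None, eos_token_id=50256)
call_args = generate_with_mask(
    EchoModel(), fake_tokenizer, [[1, 2, 3]], [[1, 1, 1]], max_new_tokens=5
)
```

With a real tokenizer, the attention mask is the attention_mask field returned alongside input_ids when the prompt is tokenized.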
