
t-few's People

Contributors

craffel, dptam, haokunliu

t-few's Issues

Can't run 11 billion model on A100 with 80GB

Hi @craffel @muqeeth @HaokunLiu,

We're trying to reproduce T-Few results for a paper, but we're getting 'CUDA out of memory' using an A100 with 80GB (your recommended setup).

This is what we're running:

python -m src.pl_train -c t011b.json+ia3.json+rte.json -k load_weight="pretrained_checkpoints/t011b_ia3_finish.pt" exp_name=t011b_rte_seed42_ia3_pretrained few_shot_random_seed=42 seed=42

We installed according to the README instructions and are using the default settings in the config files.
We are able to run the 3 billion model using the command above, just not the 11 billion.
Is there anything we are doing wrong?

This is the exception:

CUDA out of memory

Thank you

Clarification about IA^3

Hi :)

I was reading your interesting paper https://arxiv.org/pdf/2205.05638.pdf.

In Section 3.3, you specify that IA^3 adds a total of d_k + d_v + d_ff parameters.

However, if I look at this line, you seem to be allocating two d-dimensional vectors for each linear layer (multi_lora_a and multi_lora_b), multiplying multi_lora_a with the input and multi_lora_b with the transformed input.

hidden = hidden * self.multi_lora_b.flatten()

Am I missing something?
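
To make my reading concrete, here is a minimal sketch (my own reconstruction, not the repository's code) of the two-vector scaling this seems to implement:

import torch
import torch.nn as nn

class ScaledLinear(nn.Module):
    """A linear layer with two learned (IA)^3-style scaling vectors: one
    on the input (d_in entries) and one on the output (d_out entries),
    i.e. d_in + d_out extra parameters per layer rather than a single
    d-dimensional vector."""

    def __init__(self, linear: nn.Linear):
        super().__init__()
        self.linear = linear
        self.multi_lora_a = nn.Parameter(torch.ones(linear.in_features))
        self.multi_lora_b = nn.Parameter(torch.ones(linear.out_features))

    def forward(self, x):
        hidden = self.linear(x * self.multi_lora_a)  # scale the input
        return hidden * self.multi_lora_b            # scale the transformed input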

Thank you for your clarification :-)

ImportError: cannot import name 'fast_walsh_hadamard_transform' from 'src.models.fwh_cuda' (unknown location)

I tried running the example from the README and got this error. Can you help?

$ CUDA_VISIBLE_DEVICES=3 python -m src.pl_train -c t0.json+rte.json -k save_model=False exp_name=first_exp
Traceback (most recent call last):
  File "/home/james/.conda/envs/tfew/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/james/.conda/envs/tfew/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/james/github/t-few/src/pl_train.py", line 10, in <module>
    from src.models.EncoderDecoder import EncoderDecoder
  File "/home/james/github/t-few/src/models/EncoderDecoder.py", line 11, in <module>
    from .intrinsic import intrinsic_plugin_on_step
  File "/home/james/github/t-few/src/models/intrinsic.py", line 10, in <module>
    from .fwh_cuda import fast_walsh_hadamard_transform as fast_walsh_hadamard_transform_cuda
ImportError: cannot import name 'fast_walsh_hadamard_transform' from 'src.models.fwh_cuda' (unknown location)
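
One thing I tried (a hedged sketch; the source file names under src/models/fwh_cuda are my assumption, so adjust them to whatever is actually in the repo) was JIT-compiling the extension with PyTorch's cpp_extension loader to check whether the symbol appears:

from torch.utils.cpp_extension import load

# Assumed file names for the fast Walsh-Hadamard CUDA extension.
fwh_cuda = load(
    name="fwh_cuda",
    sources=[
        "src/models/fwh_cuda/fwh_cpp.cpp",
        "src/models/fwh_cuda/fwh_cu.cu",
    ],
    verbose=True,
)
print(hasattr(fwh_cuda, "fast_walsh_hadamard_transform"))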

Validation score on WSC decreases with training

Thank you for the amazing work on t-few! I've noticed strange behavior when running SuperGLUE's WSC. I've been logging the validation score every 40 epochs using self.eval_epoch_interval = 40, and when running the command:

python -m src.pl_train -c ia3.json+wsc.json -k save_model=False exp_name=first_exp

the output is as follows:

{"accuracy": 0.6730769230769231, "score_gt": 0.5068197436630726, "score_cand": 0.7191649047801127}
{"accuracy": 0.49038461538461536, "score_gt": 1.4563168384707892, "score_cand": 1.505529030584372}
{"accuracy": 0.47115384615384615, "score_gt": 3.4743554890155792, "score_cand": 2.727144861450562}
{"accuracy": 0.46153846153846156, "score_gt": 4.202766236777489, "score_cand": 3.5702959763316007}
{"accuracy": 0.40384615384615385, "score_gt": 5.157541000499175, "score_cand": 3.5657502871293287}
{"accuracy": 0.3942307692307692, "score_gt": 5.397989429533482, "score_cand": 3.975659689651086}
{"accuracy": 0.40384615384615385, "score_gt": 5.073869264469697, "score_cand": 3.995581218542961}

The last accuracy score is reported at 240 epochs, out of a total of 250 epochs.

Any ideas on what is going on here? Thanks!

results for LoRA

Thank you for your valuable contributions. I am trying to replicate the results presented in your paper, but I am having difficulty obtaining them when re-running the LoRA adapters.

copa: 76.00 (2.00), h-swag: 26.64 (0.36), storycloze: 84.87 (0.21), winogrande: 51.14 (2.13), wsc: 65.38 (2.88), wic: 51.57 (0.63), rte: 59.57 (0.36), cb: 51.79 (1.79), anli-r1: 34.80 (0.80), anli-r2: 34.00 (2.40), anli-r3: 32.92 (1.08)

Have you encountered situations where training on "h-swag" and "rte" did not yield the expected results?

Does pl_train.py support TPU training?

Hello,
I am interested in using the T-Few recipe for some experiments with Google Cloud TPUs. Does the pl_train.py script already support TPUs? The Acknowledgments section of the paper mentions that TPU cloud resources were used, but the script appears to support GPUs only. Any pointers would be appreciated; in particular, I would like to use T0-11B.

Sum of logprobs in the probability space adds up to values above 1

Hi!
Congratulations on this great work and thank you for putting together such an easy-to-use framework! It definitely facilitates research quite a bit :)

I was trying to interpret the scores logged during evaluation on the development set, and I noticed that for two-class datasets (like RTE), converting the GT and CAND scores to probability space with np.exp(-1 * logprob) and summing them sometimes gives a value greater than 1. I expected the two probabilities to sum to at most 1.

Could you let me know whether my reasoning is flawed, and if so, why the sum of the probabilities can exceed 1?
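
To illustrate with numbers (my own example values, not from a real run):

import numpy as np

# Example values in the style of dev_scores.json (made up for illustration).
score_gt, score_cand = 0.40, 0.70

# If these are per-token-averaged negative log-likelihoods, exponentiating
# gives each option's geometric-mean token probability, not a class
# probability, so nothing constrains the two values to sum to <= 1.
p_gt, p_cand = np.exp(-score_gt), np.exp(-score_cand)
print(p_gt + p_cand)  # ~1.167, above 1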

Thank you in advance!

Missing config.split_option_flag?

Hi, thanks for the code!

When I run:

CUDA_VISIBLE_DEVICES=0 python -m src.pl_train -c t03b.json+rte.json -k save_model=False exp_name=first_exp3

I get:

Reusing dataset super_glue (/localdata/hjl/hf/super_glue/rte/1.0.2/d040c658e2ddef6934fdd97deb45c777b6ff50c524781ea434e7219b56a428a7)
Train size 32
Eval size 277
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Missing logger folder: /home/hjl/t-few/exp_out/first_exp3/log

  | Name  | Type                       | Params
-----------------------------------------------------
0 | model | T5ForConditionalGeneration | 2.8 B
-----------------------------------------------------
2.8 B     Trainable params
0         Non-trainable params
2.8 B     Total params
11,399.029 Total estimated model params size (MB)
Validation sanity check:   0%|          | 0/18 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/opt/conda/hjl/envs/tfew/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/conda/hjl/envs/tfew/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/hjl/t-few/src/pl_train.py", line 86, in <module>
    main(config)
  File "/home/hjl/t-few/src/pl_train.py", line 57, in main
    trainer.fit(model, datamodule)
  File "/opt/conda/hjl/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 741, in fit
    self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path
  File "/opt/conda/hjl/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 685, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/opt/conda/hjl/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 777, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/opt/conda/hjl/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1199, in _run
    self._dispatch()
  File "/opt/conda/hjl/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1279, in _dispatch
    self.training_type_plugin.start_training(self)
  File "/opt/conda/hjl/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 202, in start_training
    self._results = trainer.run_stage()
  File "/opt/conda/hjl/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1289, in run_stage
    return self._run_train()
  File "/opt/conda/hjl/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1311, in _run_train
    self._run_sanity_check(self.lightning_module)
  File "/opt/conda/hjl/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1375, in _run_sanity_check
    self._evaluation_loop.run()
  File "/opt/conda/hjl/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/loops/base.py", line 145, in run
    self.advance(*args, **kwargs)
  File "/opt/conda/hjl/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 110, in advance
    dl_outputs = self.epoch_loop.run(dataloader, dataloader_idx, dl_max_batches, self.num_dataloaders)
  File "/opt/conda/hjl/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/loops/base.py", line 145, in run
    self.advance(*args, **kwargs)
  File "/opt/conda/hjl/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 122, in advance
    output = self._evaluation_step(batch, batch_idx, dataloader_idx)
  File "/opt/conda/hjl/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 217, in _evaluation_step
    output = self.trainer.accelerator.validation_step(step_kwargs)
  File "/opt/conda/hjl/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py", line 236, in validation_step
    return self.training_type_plugin.validation_step(*step_kwargs.values())
  File "/opt/conda/hjl/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 219, in validation_step
    return self.model.validation_step(*args, **kwargs)
  File "/home/hjl/t-few/src/models/EncoderDecoder.py", line 229, in validation_step
    batch_output = self.predict(batch)
  File "/home/hjl/t-few/src/models/EncoderDecoder.py", line 139, in predict
    if not self.config.split_option_flag:
AttributeError: 'Config' object has no attribute 'split_option_flag'

I can't find a reference to split_option_flag in any of the config files.
Should I manually set it?
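
If manual setting is the intended fix, perhaps the -k override syntax used above would work; my guess (not a documented value) would be something like:

python -m src.pl_train -c t03b.json+rte.json -k save_model=False split_option_flag=False exp_name=first_exp3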

Thanks!

AttributeError: 'DistributedDataParallel' object has no attribute 'save_checkpoint'

@HaokunLiu @dptam Thank you for your great work and congrats on the neurips acceptance!

I have got an issue when using ddp as follows:
AttributeError: 'DistributedDataParallel' object has no attribute 'save_checkpoint'

It's raised by the following line:

self.trainer.model.save_checkpoint(distributed_save_path)
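
As a stopgap, I'm considering a fallback along these lines (my own sketch, not the repository's code):

import torch

def save_trainable_state(model, path):
    # DeepSpeed engines expose save_checkpoint; a plain
    # DistributedDataParallel wrapper does not, so fall back to saving
    # the wrapped module's state dict directly.
    if hasattr(model, "save_checkpoint"):
        model.save_checkpoint(path)
    else:
        torch.save(model.module.state_dict(), path)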

Any suggestion would be appreciated!

Another related question: why does the DDP checkpoint also need to be processed by zero_to_fp32.get_fp32_state_dict_from_zero_checkpoint(distributed_save_path)? I thought that applied only to DeepSpeed ZeRO checkpoints. This is done in:

trainable_states = zero_to_fp32.get_fp32_state_dict_from_zero_checkpoint(distributed_save_path)

Multi-GPU Support

Hello,

Have you tried training on Multi-GPU setup? I tried running your fine-tuning example like so:

export CUDA_VISIBLE_DEVICES=0,1
python -m src.pl_train -c t03b.json+ia3.json+rte.json -k load_weight="pretrained_checkpoints/t03b_ia3_finish.pt" exp_name=t03b_rte_seed42_ia3_pretrained100k few_shot_random_seed=42 seed=42

But I get errors in the Lightning data loaders.

Any ideas?
Thank you

Could you please give a detailed explanation of "rank classification"?

Hi, thanks for your excellent work. I've read the paper and reviewed the code, and I've encountered some issues, outlined below:

  1. I'd appreciate a detailed explanation of how "rank classification" is implemented. Could you please provide clarification on the code found at this link? (I've put a sketch of my current understanding after this list.)

  2. I'm curious about how the "rank classification" process influences the final results. Is it feasible to employ a direct generation approach, such as generating the label words and matching them against the true answer, as an alternative method?
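
For reference, here is my current understanding as a sketch (assuming a Hugging Face seq2seq model; rank_classify is my own name, not from the repo):

import torch
import torch.nn.functional as F

def rank_classify(model, tokenizer, prompt, choices):
    # Score each candidate answer by the length-normalized log-probability
    # the model assigns to it given the prompt, then predict the argmax.
    scores = []
    for choice in choices:
        enc = tokenizer(prompt, return_tensors="pt")
        labels = tokenizer(choice, return_tensors="pt").input_ids
        with torch.no_grad():
            logits = model(**enc, labels=labels).logits
        # Mean token cross-entropy = negative length-normalized log-prob.
        nll = F.cross_entropy(logits.squeeze(0), labels.squeeze(0))
        scores.append(-nll.item())
    return max(range(len(choices)), key=lambda i: scores[i])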

Multi-task batching

In the paper, you mention that IA^3 is compatible with multi-task batching, a requirement for being comparable to ICL. Unfortunately, the current implementation of Hugging Face PEFT does not support this, and it would apparently require a big refactoring to do so (huggingface/peft#759).

Do you know of an implementation or example that shows how to do this?
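
For illustration, here is a minimal sketch of what I have in mind (the class and names are mine, not from any library): since (IA)^3 only rescales activations elementwise, a mixed-task batch can gather each example's task vector from a per-task table and broadcast it, which is far cheaper than swapping dense per-task weight matrices.

import torch
import torch.nn as nn

class MultiTaskIA3Scale(nn.Module):
    """Per-task (IA)^3-style scaling vectors, gathered per example."""

    def __init__(self, num_tasks: int, dim: int):
        super().__init__()
        self.scales = nn.Parameter(torch.ones(num_tasks, dim))

    def forward(self, hidden: torch.Tensor, task_ids: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq, dim); task_ids: (batch,) of task indices.
        return hidden * self.scales[task_ids].unsqueeze(1)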

AttributeError: Can't pickle local object 'create_collate_fn.<locals>.collate_fn'

When I tried to run the demo, I found this error! @dptam @jmohta @muqeeth

Using bfloat16 Automatic Mixed Precision (AMP)
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
WARNING:datasets.builder:Reusing dataset super_glue (/Users/caffrey/Documents/research/t-few-master/cache/super_glue/rte/1.0.2/d040c658e2ddef6934fdd97deb45c777b6ff50c524781ea434e7219b56a428a7)
Missing logger folder: exp_out/first_exp/log
WARNING:datasets.builder:Reusing dataset super_glue (/Users/caffrey/Documents/research/t-few-master/cache/super_glue/rte/1.0.2/d040c658e2ddef6934fdd97deb45c777b6ff50c524781ea434e7219b56a428a7)
Train size 32
Eval size 277

  | Name  | Type                       | Params
-----------------------------------------------------
0 | model | T5ForConditionalGeneration | 2.8 B 
-----------------------------------------------------
2.8 B     Trainable params
0         Non-trainable params
2.8 B     Total params
11,399.029 Total estimated model params size (MB)
Sanity Checking: 0it [00:00, ?it/s]
Traceback (most recent call last):
  File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Users/caffrey/Documents/paper/t-few-master/src/pl_train.py", line 86, in <module>
    main(config)
  File "/Users/caffrey/Documents/paper/t-few-master/src/pl_train.py", line 57, in main
    trainer.fit(model, datamodule)
  File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 770, in fit
    self._call_and_handle_interrupt(
  File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 723, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 811, in _fit_impl
    results = self._run(model, ckpt_path=self.ckpt_path)
  File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1236, in _run
    results = self._run_stage()
  File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1323, in _run_stage
    return self._run_train()
  File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1345, in _run_train
    self._run_sanity_check()
  File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1413, in _run_sanity_check
    val_loop.run()
  File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/site-packages/pytorch_lightning/loops/base.py", line 204, in run
    self.advance(*args, **kwargs)
  File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 155, in advance
    dl_outputs = self.epoch_loop.run(self._data_fetcher, dl_max_batches, kwargs)
  File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/site-packages/pytorch_lightning/loops/base.py", line 199, in run
    self.on_run_start(*args, **kwargs)
  File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 88, in on_run_start
    self._data_fetcher = iter(data_fetcher)
  File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/site-packages/pytorch_lightning/utilities/fetching.py", line 178, in __iter__
    self.dataloader_iter = iter(self.dataloader)
  File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 443, in __iter__
    return self._get_iterator()
  File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 389, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1062, in __init__
    w.start()
  File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/Users/caffrey/miniforge3/envs/tongji/lib/python3.9/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'create_collate_fn.<locals>.collate_fn'
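
For anyone hitting this: as I understand it, the spawn start method (the default on macOS) must pickle the collate_fn, and locally defined closures are not picklable. A hedged workaround sketch is to replace the closure with a module-level callable class (or set num_workers=0 to avoid worker processes entirely):

class CollateFn:
    """Picklable replacement for a closure-based collate_fn. The body
    here is illustrative; the real one would mirror the inner function
    of create_collate_fn."""

    def __init__(self, pad_value=0):
        self.pad_value = pad_value

    def __call__(self, batch):
        # Pad variable-length token-id lists to the batch maximum.
        width = max(len(x) for x in batch)
        return [x + [self.pad_value] * (width - len(x)) for x in batch]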

Where are the loss function changes in the codebase?

From the paper: "As an objective, we use the sum of a standard language modeling loss, an unlikelihood loss for incorrect choices, and a length-normalized loss."

The code builds on Hugging Face Transformers, so I was wondering if you could point me to where the loss function is modified and then used during training in the codebase.
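
For what it's worth, here is my reading of that objective as a sketch (my own rendering from the paper's description, not the repository's code):

import torch
import torch.nn.functional as F

def tfew_style_loss(logp_correct, token_logp_incorrect, mean_logp_choices, gt_index):
    # logp_correct: summed token log-probs of the correct choice (scalar).
    lm_loss = -logp_correct
    # Unlikelihood: -log(1 - p_token) over tokens of the incorrect choices.
    ul_loss = -torch.log1p(-token_logp_incorrect.exp()).mean()
    # Length-normalized: softmax cross-entropy over the per-choice mean
    # token log-probabilities.
    ln_loss = F.cross_entropy(
        mean_logp_choices.unsqueeze(0), torch.tensor([gt_index])
    )
    return lm_loss + ul_loss + ln_loss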

question about intrinsic.py

Some context: in line 179 of the code, we have param.requires_grad_(False). I'm a bit confused about why this needs to be set to False. When I try to reproduce this code in a different setting, my loss does not decrease; however, with param.requires_grad_(True), it does. Either way, I'm unclear on why it should matter, since the optimizer only updates intrinsic_parameter and intrinsic_said.
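
For context, here is my mental model of the reparameterization (a standalone sketch, not the repository's code): the original parameters are frozen and the effective weights are recomputed from the intrinsic vector on each forward pass, so gradients reach the intrinsic parameters through the projection rather than through the frozen leaves.

import torch

torch.manual_seed(0)
theta0 = torch.randn(1000)               # frozen initial weights (no grad)
P = torch.randn(1000, 10)                # fixed random projection
d = torch.zeros(10, requires_grad=True)  # trainable intrinsic vector

theta = theta0 + P @ d                   # effective weights used in forward
loss = (theta ** 2).sum()                # stand-in for a real training loss
loss.backward()
print(d.grad.shape)                      # torch.Size([10]): gradient reaches d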

To which epoch/training step does the finish.pt checkpoint belong?

Hi everyone!

When I run the experiments, the model is validated every eval_epoch_interval epochs and a checkpoint is written out as global_stepXXXXX.pt. At the end, a final checkpoint named finish.pt is also written. I assumed this one corresponded either to the best intermediate validation performance or to the last epoch. However, comparing it with the other checkpoints, finish.pt seems to differ from all global_stepXXXXX.pt checkpoints, so I am wondering which point in training finish.pt belongs to.

Sorry if I miss something obvious here.

Best,
Stefan

Where are the performance results of experiments stored?

Hi,

thank you very much for sharing your code!

I ran the example from the README and parts of the few-shot-pretrained-3b-100k.sh script. However, the dev_scores.json for the README example contains only the line:

{"accuracy": 0.6101083032490975, "score_gt": 0.3983679488032303, "score_cand": 0.6958685107394676}

And for t03b_copa_seed42_ia3_pretrained100k (the first experiment of few-shot-pretrained-3b-100k.sh):

{"accuracy": 0.85, "score_gt": 0.06061243396921782, "score_cand": 0.4640417302213609}

Those appear to be just the results of the "Validation sanity check" at the very beginning, so I wondered where the validation results after each epoch are stored, or am I missing something here?

Thanks!

save dev_pred.txt and test_pred.txt for RTE and ANLI

Congrats on your great work! I am interested in analyzing the predictions of T0-3B + IA3 on NLI tasks. I ran the command python -m src.pl_train -c t03b.json+anli-r3.json+ia3.json -k exp_name=anli-r3 load_weight="pretrained_checkpoints/t03b_ia3_finish.pt" eval_epoch_interval=20, but I only see the dev_scores.json file in the output. How can I also obtain the model's prediction files? Thanks!

KeyError: 'HF_HOME'

Hi!
I was trying to run the example in the README, but it fails with KeyError: 'HF_HOME'.
This is the command I used: python -m src.pl_train -c t03b.json+rte.json -k save_model=False exp_name=first_exp
I can't find anywhere in the code that sets this environment variable.

Mark experiment first_exp as claimed
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Using bfloat16 Automatic Mixed Precision (AMP)
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
Traceback (most recent call last):
  File "/Users/weiqiuyou/opt/miniconda3/envs/tfew/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/Users/weiqiuyou/opt/miniconda3/envs/tfew/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/weiqiuyou/Documents/codebases/t-few/src/pl_train.py", line 86, in <module>
    main(config)
  File "/Users/weiqiuyou/Documents/codebases/t-few/src/pl_train.py", line 57, in main
    trainer.fit(model, datamodule)
  File "/Users/weiqiuyou/opt/miniconda3/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 741, in fit
    self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path
  File "/Users/weiqiuyou/opt/miniconda3/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 685, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/Users/weiqiuyou/opt/miniconda3/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 777, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/Users/weiqiuyou/opt/miniconda3/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1131, in _run
    self._data_connector.prepare_data()
  File "/Users/weiqiuyou/opt/miniconda3/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/trainer/connectors/data_connector.py", line 154, in prepare_data
    self.trainer.datamodule.prepare_data()
  File "/Users/weiqiuyou/opt/miniconda3/envs/tfew/lib/python3.7/site-packages/pytorch_lightning/core/datamodule.py", line 474, in wrapped_fn
    fn(*args, **kwargs)
  File "/Users/weiqiuyou/Documents/codebases/t-few/src/data/data_module.py", line 17, in prepare_data
    _ = self.dataset_reader.read_few_shot_dataset()
  File "/Users/weiqiuyou/Documents/codebases/t-few/src/data/dataset_readers.py", line 164, in read_few_shot_dataset
    orig_data = self.read_orig_dataset("train")
  File "/Users/weiqiuyou/Documents/codebases/t-few/src/data/dataset_readers.py", line 146, in read_orig_dataset
    orig_data = load_dataset(*self.dataset_stash, split=split, cache_dir=os.environ["HF_HOME"])
  File "/Users/weiqiuyou/opt/miniconda3/envs/tfew/lib/python3.7/os.py", line 678, in __getitem__
    raise KeyError(key) from None
KeyError: 'HF_HOME'
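
My current workaround (a sketch; the cache path is just whatever directory you prefer) is to set the variable before the dataset reader touches it, either in the shell or at the top of the entry script:

import os

# Any writable cache directory works; this mirrors Hugging Face's default.
os.environ.setdefault("HF_HOME", os.path.expanduser("~/.cache/huggingface"))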

How is l_ff created?

Firstly, thank you for the amazing work! I have a question about the implementation of $l_{ff}$ in the (IA)3 method:

The config file for (IA)3 lists lora_layers as "k|v|wi_1.*"

"lora_layers": "k|v|wi_1.*",

However, when using this string to find model layers to modify (code snippet below), it seems that while the Keys and Values in the self-attention modules are modified, all the FF layers (i.e. in the format encoder.block.x.layer.x.DenseReluDense.wi) are skipped, and thus the vector $l_{ff}$ is not created in the model ($l_k$ and $l_v$ are created as expected).

t-few/src/models/lora.py

Lines 64 to 72 in 4e581fa

if re.fullmatch(config.lora_layers, c_name):
    assert isinstance(
        layer, nn.Linear
    ), f"LoRA can only be applied to torch.nn.Linear, but {layer} is {type(layer)}."
    setattr(
        module,
        c_name,
        LoRALinear(layer, config.lora_rank, config.lora_scaling_rank, config.lora_init_scale),
    )

I was thus wondering if the param lora_layers should instead be "k|v|wi.*"? Or am I missing something, and the existing config file somehow also triggers the creation of $l_{ff}$, in addition to $l_k$ and $l_v$?
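
To check my reading, I ran the pattern through re.fullmatch against typical T5 layer names (a small standalone check; my understanding is that T0 / T5 v1.1 checkpoints use a gated feedforward with wi_0 and wi_1, while vanilla T5 has a single wi):

import re

pattern = "k|v|wi_1.*"
for name in ["k", "v", "wi", "wi_0", "wi_1", "wo"]:
    # fullmatch mirrors the check in src/models/lora.py shown above
    print(name, bool(re.fullmatch(pattern, name)))
# k, v, and wi_1 match; wi (vanilla T5), wi_0, and wo do not.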

Thank you!

IA3 implementation doesn't add parameters for feedforward layers

Hi,

I'm trying to implement your method (IA)3 for use with Hugging Face's PEFT library and had a question. In the paper, it is mentioned that the learned vectors in (IA)3 are added for all the position-wise feedforward layers in the transformer, along with the various attention layers. I ran src/models/lora.py with the config parameters in configs/ia3.json to check what the new model layers would look like. The typical feedforward module in T5 is a T5DenseActDense module that looks as follows:

(DenseReluDense): T5DenseActDense(
  (wi): Linear(in_features=768, out_features=3072, bias=False)
  (wo): Linear(in_features=3072, out_features=768, bias=False)
  (dropout): Dropout(p=0.1, inplace=False)
  (act): ReLU()
)

Since (IA)3 is implemented as an extension of LoRA, the Linear layers are supposed to get converted into LoRALinear layers. However, the config in ia3.json sets the parameter lora_layers to "k|v|wi_1.*", which does not match the layers in DenseReluDense (these are attributes named wi and wo). I've tried T5-small, T5-base, and T5-3b, and for all of these models, learned vectors are not added for the feedforward layers. I was wondering if I'm doing something wrong or if I'm supposed to use a different config file. Or are (IA)3 parameters added only for certain feedforward layers?
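
To see which names actually exist in a given checkpoint, one can list the feedforward submodules directly (a hedged check; note that the attribute is called DenseReluDense even in v1.1-style models):

from transformers import T5ForConditionalGeneration

# t5-small (vanilla T5) exposes wi/wo; google/t5-v1_1-small exposes
# wi_0/wi_1/wo, which is what "wi_1.*" was presumably written against.
model = T5ForConditionalGeneration.from_pretrained("t5-small")
for name, _ in model.named_modules():
    if "DenseReluDense" in name and name.rsplit(".", 1)[-1].startswith("w"):
        print(name)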

questions from your paper

Thanks for your great work. I have one question about your paper: Table 4 shows the results for all PEFT methods without pre-training, right?

Releasing evaluation log probabilites

Hi, thanks for open-sourcing the model code! Could you release the log probabilities for the evaluation tasks (i.e., the model's probabilities for the valid answers for each prompt on each question, across all evaluated datasets)? This data would allow for fine-grained evaluation of models and comparison against other LLMs.

cf. facebookresearch/metaseq#25
