I don't know of a good guide to fine-tuning unfortunately! One of my colleagues, @shailja-thakur, has fine-tuned CodeGen on Verilog code, but it takes a lot of VRAM to fine-tune the 16B model (we had to use 80GB A100s).
The --dataset_name parameter is just the location of the code you want to train on, in a format that Hugging Face Datasets recognizes. The simplest is probably JSONL format: a JSON file with one dictionary per line, using the format:
{"text": "content_of_source_file_1", "url": "path_to_source_file_1"}
{"text": "content_of_source_file_2", "url": "path_to_source_file_2"}
...
(You can add other keys if you want; the only field used by the training script is text, but I find it helpful to include some extra metadata so I can keep track of where the code came from.)
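As a rough illustration of producing that JSONL layout from a local source tree, here is a minimal sketch. The function name build_jsonl, the file-suffix filter and the use of a path relative to the source directory as "url" are assumptions for illustration, not part of any FauxPilot or Hugging Face tooling.

```python
import json
from pathlib import Path


def build_jsonl(src_dir, out_path, suffixes=(".c", ".h", ".cc", ".cpp", ".hpp")):
    """Hypothetical helper: write one {"text": ..., "url": ...} record per
    matching source file under src_dir. Returns the number of records written."""
    root = Path(src_dir)
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for path in sorted(root.rglob("*")):
            # Skip directories and files whose extension is not in the filter.
            if not (path.is_file() and path.suffix in suffixes):
                continue
            # errors="replace" keeps files with stray non-UTF-8 bytes instead of failing.
            text = path.read_text(encoding="utf-8", errors="replace")
            # "url" is only metadata; the training script reads "text".
            record = {"text": text, "url": str(path.relative_to(root))}
            out.write(json.dumps(record) + "\n")
            count += 1
    return count
```

A file written this way can be loaded with the datasets library's JSON builder, e.g. load_dataset("json", data_files="my_code.jsonl"), where my_code.jsonl is a placeholder name.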
You can see an example of a dataset I put together of C/C++ code found in Debian here: https://huggingface.co/datasets/moyix/debian_csrc
I would not expect the bigger models to get much better from being fine-tuned on a relatively small amount of code, but the smallest models (like 350M) might benefit from seeing your code.
Also note that it is still a bit tricky to get a custom model working: you'll have to run the conversion from HF to FasterTransformer after training it, and create a configuration file for the new model (there is a script for this in the converter directory: https://github.com/moyix/fauxpilot/blob/main/converter/triton_config_gen.py).
from fauxpilot.
AttributeError: 'CodeGenAttention' object has no attribute 'causal_mask'
FIXED. I figured out what was causing this problem: the Transformers versions I used for training and for sampling were different. I resolved it by using the latest Transformers version (4.25.0.dev0) and by fixing incorrect entries in the config.json file. I hope this report is useful to anyone who runs into a similar problem. 😄
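Since the fix came down to a Transformers version mismatch between training and sampling, one quick check is to compare the "transformers_version" field that save_pretrained() records in the checkpoint's config.json with the version installed in the environment used for sampling. The helper below is a hypothetical sketch using only the standard library; the name compare_transformers_versions is an assumption.

```python
import json
from importlib import metadata
from pathlib import Path
from typing import Optional, Tuple


def compare_transformers_versions(
    ckpt_dir: str, installed: Optional[str] = None
) -> Tuple[Optional[str], Optional[str], bool]:
    """Hypothetical helper: return (saved_version, installed_version, match).

    saved_version is read from the "transformers_version" field in config.json.
    If installed is None, the version of the locally installed transformers
    package is looked up (None if the package is not installed).
    """
    config = json.loads(Path(ckpt_dir, "config.json").read_text(encoding="utf-8"))
    saved = config.get("transformers_version")
    if installed is None:
        try:
            installed = metadata.version("transformers")
        except metadata.PackageNotFoundError:
            installed = None  # transformers is not installed in this environment
    # Exact string comparison: any difference, even "4.25.0" vs "4.25.0.dev0", is flagged.
    return saved, installed, saved is not None and saved == installed
```

Running it inside the sampling environment, e.g. compare_transformers_versions("./codegen-350M-multi-finetuned"), shows at a glance whether the two environments disagree.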
- The model card information: fine-tuned CodeGen-350M-multi model
- /mylab/fine-tuning-codegen/codegen-350M-finetuned$ cat ./README.md
license: bsd-3-clause
tags:
- generated_from_trainer
datasets:
- moyix/debian_csrc
model-index:
- name: codegen-350M-finetuned
  results: []
This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment.
codegen-350M-finetuned
This model is a fine-tuned version of Salesforce/codegen-350M-multi on the moyix/debian_csrc dataset.
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 1.0
Training results
Framework versions
- Transformers 4.25.0.dev0
- Pytorch 1.13.0
- Datasets 2.6.1
- Tokenizers 0.11.0
I would not expect the bigger models to get much better from being fine-tuned on a relatively small amount of code, but the smallest models (like 350M) might benefit from seeing your code.
Yep, I think so. :)
I don't know of a good guide to fine-tuning unfortunately! One of my colleagues, @shailja-thakur, has fine-tuned CodeGen on Verilog code, but it takes a lot of VRAM to fine-tune the 16B model (we had to use 80GB A100s).
@moyix, @shailja-thakur, I got an unexpected OOM error while running the fine-tuning task with the smallest model (350M) and your Debian dataset on my Ubuntu 22.04 machine (32 GB DRAM, NVIDIA TITAN Xp GPU with 12 GB VRAM):
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 198.00 MiB (GPU 0; 11.90 GiB total capacity; 10.55 GiB already allocated; 200.50 MiB free; 10.70 GiB reserved in total by PyTorch)
Have you had a similar experience? Did you have to use an NVIDIA A100 with 80 GB (or 40 GB) of VRAM at the time, even when fine-tuning the smallest model, such as 350M? Could we change the ds_config.json file to reduce GPU memory consumption so that the fine-tuning run completes successfully? Any feedback will be appreciated.
- Screenshot:
$ my-codegen-350m-deepspeed-finetune.sh
......... OMISSION ..........
[INFO|trainer.py:1608] 2022-11-04 11:17:11,278 >> ***** Running training *****
[INFO|trainer.py:1609] 2022-11-04 11:17:11,278 >> Num examples = 3786289
[INFO|trainer.py:1610] 2022-11-04 11:17:11,278 >> Num Epochs = 1
[INFO|trainer.py:1611] 2022-11-04 11:17:11,278 >> Instantaneous batch size per device = 1
[INFO|trainer.py:1612] 2022-11-04 11:17:11,278 >> Total train batch size (w. parallel, distributed & accumulation) = 32
[INFO|trainer.py:1613] 2022-11-04 11:17:11,278 >> Gradient Accumulation steps = 32
[INFO|trainer.py:1614] 2022-11-04 11:17:11,278 >> Total optimization steps = 118321
[INFO|trainer.py:1615] 2022-11-04 11:17:11,278 >> Number of trainable parameters = 354858103
0%| /work/qtlab/transformers/src/transformers/models/codegen/modeling_codegen.py:167: UserWarning: where received a uint8 condition tensor. This behavior is deprecated and will be removed in a future version
attn_weights = torch.where(causal_mask, attn_weights, mask_value)
Traceback (most recent call last):
File "/work/qtlab/./transformers/examples/pytorch/language-modeling/run_clm.py", line 580, in <module>
main()
File "/work/qtlab/./transformers/examples/pytorch/language-modeling/run_clm.py", line 528, in main
train_result = trainer.train(resume_from_checkpoint=checkpoint)
File "/work/qtlab/transformers/src/transformers/trainer.py", line 1501, in train
return inner_training_loop(
File "/work/qtlab/transformers/src/transformers/trainer.py", line 1749, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/work/qtlab/transformers/src/transformers/trainer.py", line 2508, in training_step
loss = self.compute_loss(model, inputs)
File "/work/qtlab/transformers/src/transformers/trainer.py", line 2540, in compute_loss
outputs = model(**inputs)
File "/home/invain/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/home/invain/anaconda3/envs/deepspeed/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 11, in wrapped_fn
return func(*args, **kwargs)
File "/home/invain/anaconda3/envs/deepspeed/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1680, in forward
loss = self.module(*inputs, **kwargs)
File "/home/invain/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/work/qtlab/transformers/src/transformers/models/codegen/modeling_codegen.py", line 711, in forward
lm_logits = self.lm_head(hidden_states).to(torch.float32)
File "/home/invain/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/home/invain/.local/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 198.00 MiB (GPU 0; 11.90 GiB total capacity; 10.55 GiB already allocated; 200.50 MiB free; 10.70 GiB reserved in total by PyTorch) If re
0%|
[2022-11-04 11:17:13,621] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 3296
[2022-11-04 11:17:13,621] [ERROR] [launch.py:324:sigkill_handler] ['/home/invain/anaconda3/envs/deepspeed/bin/python', '-u', './run_clm.py', '--local_rank= 'moyix/debian_csrc', '--tokenizer_name', 'Salesforce/codegen-350M-multi', '--block_size', '2048', '--gradient_accumulation_steps', '32', '--do_train', '--fp16', '--overwrite_output_dir', '--deepspeed',
real 94m15.273s
user 461m18.611s
sys 3m52.003s
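The "Total train batch size" and "Total optimization steps" lines in the log follow directly from a per-device batch size of 1, 32 gradient accumulation steps, one GPU and 3,786,289 examples. A small sketch of that arithmetic, assuming a single GPU and that leftover micro-batches which do not fill a whole accumulation step are not counted (this mirrors the numbers in the log, not the Trainer's actual source code):

```python
import math


def effective_batch_size(per_device: int, grad_accum: int, num_gpus: int) -> int:
    """Samples consumed per optimizer step."""
    return per_device * grad_accum * num_gpus


def optimization_steps(num_examples: int, per_device: int, grad_accum: int,
                       num_gpus: int = 1, epochs: int = 1) -> int:
    """Optimizer steps for a run, assuming examples are split evenly over GPUs
    and a trailing partial accumulation step is dropped from the count."""
    batches_per_epoch = math.ceil(num_examples / (per_device * num_gpus))
    updates_per_epoch = max(batches_per_epoch // grad_accum, 1)
    return updates_per_epoch * epochs


print(effective_batch_size(1, 32, 1))        # 32, as in the log
print(optimization_steps(3_786_289, 1, 32))  # 118321, as in the log
```

Note that gradient accumulation only changes how many micro-batches are summed before each optimizer step; each forward/backward pass here still processes one 2048-token sequence, so lowering gradient_accumulation_steps would roughly leave peak activation memory unchanged.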
Can you share your my-codegen-350m-deepspeed-finetune.sh, ds_config.json, and the size of the training data, so I get an idea of what could be happening in your case?
@shailja-thakur, here they are. I don't know why this training setup still runs out of CUDA memory on an older NVIDIA GPU (12 GB VRAM).
- Fine-tuning command with the DeepSpeed framework (my-codegen-350m-deepspeed-finetune.sh)
- 12th Gen Intel Core i7 + 31 GB DRAM + NVIDIA TITAN Xp (12 GB VRAM): failed due to CUDA OOM 😭
- 12th Gen Intel Core i7 + 31 GB DRAM + NVIDIA A100 (80 GB VRAM): succeeded thanks to the 80 GB of VRAM 😄
--num_gpus 1 --num_nodes 1 $RUN_CLM --model_name_or_path=Salesforce/codegen-${PARAM_SIZE}-multi \
--per_device_train_batch_size=1 --learning_rate 2e-5 --num_train_epochs 1 \
--output_dir=./codegen-${PARAM_SIZE}-finetuned --dataset_name $MY_DATASET \
--tokenizer_name Salesforce/codegen-${PARAM_SIZE}-multi \
--block_size 2048 --gradient_accumulation_steps 32 --do_train --fp16 --overwrite_output_dir \
--deepspeed $DS_CONFIG
- ds_config.json
{
  "zero_optimization": {
    "stage": 2,
    "offload_optimizer": {
      "device": "cpu",
      "pin_memory": true
    },
    "allgather_partitions": true,
    "allgather_bucket_size": 2e8,
    "overlap_comm": true,
    "reduce_scatter": true,
    "reduce_bucket_size": 2e8,
    "contiguous_gradients": true
  },
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "steps_per_print": 2000,
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "wall_clock_breakdown": false
}
- the size of the training data
- 153G ~/.cache/huggingface/datasets/moyix___parquet/
At that time, I focused on reducing the memory used by parameters, gradients, and optimizer states to avoid the CUDA OOM error on the 12 GB NVIDIA GPU. However, I still could not find a recipe that avoids CUDA OOM with 12 GB of VRAM.
- Source: Microsoft Research blog, https://www.microsoft.com/en-us/research/blog/zero-deepspeed-new-system-optimizations-enable-training-models-with-over-100-billion-parameters/
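To put rough numbers on parameters, gradients and optimizer states for this run, here is a back-of-the-envelope sketch. It assumes the mixed-precision Adam accounting from the ZeRO paper (2 bytes of fp16 weights + 2 bytes of fp16 gradients + 12 bytes of fp32 master weights, momentum and variance per parameter), treats CPU optimizer offload as removing the 12-byte term from the GPU, and assumes a vocab_size of 51200 for codegen-350M-multi; none of these figures come from the thread itself.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def model_state_bytes(num_params: int, offload_optimizer: bool) -> int:
    """Approximate GPU bytes for weights, gradients and Adam states."""
    fp16_weights = 2 * num_params
    fp16_grads = 2 * num_params
    # fp32 master weights + Adam momentum + Adam variance = 4 + 4 + 4 bytes
    fp32_optimizer_states = 12 * num_params
    if offload_optimizer:
        # Simplification: optimizer states live in CPU memory with offload enabled.
        return fp16_weights + fp16_grads
    return fp16_weights + fp16_grads + fp32_optimizer_states


def logits_bytes(batch_size: int, seq_len: int, vocab_size: int,
                 bytes_per_elem: int) -> int:
    """Size of one [batch, seq_len, vocab] logits tensor."""
    return batch_size * seq_len * vocab_size * bytes_per_elem


if __name__ == "__main__":
    n = 354_858_103  # "Number of trainable parameters" from the training log
    print(f"model states, no offload : {model_state_bytes(n, False) / GIB:.2f} GiB")
    print(f"model states, CPU offload: {model_state_bytes(n, True) / GIB:.2f} GiB")
    # block_size 2048, batch 1; the traceback shows lm_head output cast to float32
    print(f"fp16 logits: {logits_bytes(1, 2048, 51200, 2) / MIB:.0f} MiB")
    print(f"fp32 logits: {logits_bytes(1, 2048, 51200, 4) / MIB:.0f} MiB")
```

This prints about 5.29 GiB without offload and 1.32 GiB with offload, so under these assumptions the model states alone fit in 12 GB either way. That suggests the remaining pressure comes from activations and temporary buffers for 2048-token sequences (the logits tensor alone is 200 to 400 MiB), which this estimate deliberately leaves out.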
12th Gen Intel Core i7 + 31 GB DRAM + NVIDIA TITAN Xp (12 GB VRAM): failed due to CUDA OOM 😭
12th Gen Intel Core i7 + 31 GB DRAM + NVIDIA A100 (80 GB VRAM): succeeded thanks to the 80 GB of VRAM 😄
@shailja-thakur, are there any hints or tips for fine-tuning on an NVIDIA TITAN Xp? I tried various things, but they all failed. So for now I use a high-performance GPU (an NVIDIA A100 with 80 GB VRAM) to avoid the CUDA OOM error reported above.
Also note that it is still a bit tricky to get a custom model working: you'll have to run the conversion from HF to FasterTransformer after training it,
@moyix, first of all, thank you for sharing your experience.
Thanks to it, I was able to create a fine-tuned model (codegen-350M-multi-finetuned) as follows.
$ tree ./codegen-350M-multi-finetuned/
./codegen-350M-multi-finetuned/
├── added_tokens.json
├── all_results.json
├── config.json
├── merges.txt
├── pytorch_model.bin
├── README.md
├── special_tokens_map.json
├── tokenizer_config.json
├── tokenizer.json
├── trainer_state.json
├── training_args.bin
├── train_results.json
└── vocab.json
$ ls -al ./codegen-350M-multi-finetuned/
total 778380
drwxr-xr-x 2 leemgs leemgs 4096 Nov 10 16:40 .
drwxr-xr-x 6 leemgs leemgs 4096 Nov 10 16:43 ..
-rw-r--r-- 1 leemgs leemgs 1080 Nov 10 16:31 added_tokens.json
-rw-r--r-- 1 leemgs leemgs 582 Nov 10 16:31 all_results.json
-rw-r--r-- 1 leemgs leemgs 1011 Nov 10 16:31 config.json
-rw-r--r-- 1 leemgs leemgs 456356 Nov 10 16:31 merges.txt
-rw-r--r-- 1 leemgs leemgs 793630000 Nov 10 16:31 pytorch_model.bin
-rw-r--r-- 1 leemgs leemgs 1149 Nov 10 16:31 README.md
-rw-r--r-- 1 leemgs leemgs 99 Nov 10 16:31 special_tokens_map.json
-rw-r--r-- 1 leemgs leemgs 283 Nov 10 16:31 tokenizer_config.json
-rw-r--r-- 1 leemgs leemgs 2114827 Nov 10 16:31 tokenizer.json
-rw-r--r-- 1 leemgs leemgs 998 Nov 10 16:31 trainer_state.json
-rw-r--r-- 1 leemgs leemgs 4539 Nov 10 16:31 training_args.bin
-rw-r--r-- 1 leemgs leemgs 582 Nov 10 16:31 train_results.json
-rw-r--r-- 1 leemgs leemgs 798156 Nov 10 16:31 vocab.json
(deepspeed) leemgs@ai02:~/qtlab/CodeGen/checkpoints$
Using the generated fine-tuned model, I ran the "def hello_world()" test.
Currently, I have read the official CodeGen documentation as follows:
However, I get an unexpected error message:
- error message: 'CodeGenAttention' object has no attribute 'causal_mask'
I am perplexed as to why the "pytorch_model.bin" file I produced during the fine-tuning process is incompatible.
Any feedback or experience with this error message would be helpful.
(.venv) $ python3 -m jaxformer.hf.sample --model codegen-350M-multi --context "def hello_world():"
loading parameters
loading parameters took 9.95s
Traceback (most recent call last):
File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/data/home/leemgs/qtlab/CodeGen/jaxformer/hf/sample.py", line 253, in <module>
main()
File "/data/home/leemgs/qtlab/CodeGen/jaxformer/hf/sample.py", line 225, in main
model = create_model(ckpt=ckpt, fp16=use_fp16).to(device)
File "/data/home/leemgs/qtlab/CodeGen/jaxformer/hf/sample.py", line 63, in create_model
return CodeGenForCausalLM.from_pretrained(ckpt, revision='float16', torch_dtype=torch.float16, low_cpu_mem_usage=True)
File "/data/home/leemgs/qtlab/CodeGen/.venv/lib/python3.8/site-packages/transformers/modeling_utils.py", line 1526, in from_pretrained
cls._load_state_dict_into_model_low_mem(model, loaded_state_dict_keys, resolved_archive_file)
File "/data/home/leemgs/qtlab/CodeGen/.venv/lib/python3.8/site-packages/transformers/modeling_utils.py", line 1786, in _load_state_dict_into_model_low_mem
new_val = getattr(submodule, param_name)
File "/data/home/leemgs/qtlab/CodeGen/.venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in __getattr__
raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'CodeGenAttention' object has no attribute 'causal_mask'
I would not expect the bigger models to get much better from being fine-tuned on a relatively small amount of code, but the smallest models (like 350M) might benefit from seeing your code.
@moyix, I have one question about the fine-tuned CodeGen model. With the 350M CodeGen model, how can I compare the quality/accuracy of the original model and the fine-tuned one? I'm curious whether there are any well-known benchmarking tools or general methods for comparing the quality/accuracy of these two models.
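One widely used approach (not one discussed in this thread) is functional correctness on a benchmark with unit tests, such as HumanEval: generate n completions per problem from each model, count the c that pass the tests, and compare pass@k. The sketch below implements the unbiased estimator from the Codex/HumanEval paper, pass@k = 1 - C(n-c, k) / C(n, k); the function names and the (n, c) pairs in the usage lines are made up for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: completions generated, c: completions that passed the tests, k: budget.
    """
    if n - c < k:
        # Every size-k subset of the n samples contains a passing completion.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results, k):
    """Average pass@k over problems; results is a list of (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Illustrative numbers only: (samples per problem, passing samples) per model.
base = [(10, 1), (10, 0), (10, 4)]
finetuned = [(10, 3), (10, 1), (10, 4)]
print(mean_pass_at_k(base, 1), mean_pass_at_k(finetuned, 1))
```

Both models should be sampled with the same prompts, temperature and n so that the scores are comparable.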