Comments (2)
Thanks for the help. I was still getting CUDA OOM with your settings for 65B, but it turned out to be because I was using the LLaMA pull-request branch of transformers. Once I switched to the main branch of transformers, the OOM went away.
from koalpaca.
I didn't do anything special; I just ran the command below:
torchrun --nproc_per_node=8 --master_port=41234 finetune.py
and the modified finetune.py is:
# ...
import os

from transformers import LlamaForCausalLM, LlamaTokenizer

# optimized for an RTX 4090; for larger GPUs, increase some of these?
MICRO_BATCH_SIZE = 4  # this could actually be 5, but I like powers of 2
BATCH_SIZE = 256
GRADIENT_ACCUMULATION_STEPS = BATCH_SIZE // MICRO_BATCH_SIZE
EPOCHS = 3  # we don't always need 3
LEARNING_RATE = 5e-5
CUTOFF_LEN = 512  # 256 accounts for about 96% of the data
LORA_R = 8
LORA_ALPHA = 16
LORA_DROPOUT = 0.05
VAL_SET_SIZE = 0
TARGET_MODULES = [
    "q_proj",
    "v_proj",
]
DATA_PATH = "koen_alpaca_data_cleaned.json"
OUTPUT_DIR = "lora-alpaca"

device_map = "auto"
world_size = int(os.environ.get("WORLD_SIZE", 1))
ddp = world_size != 1
if ddp:
    # under torchrun, pin each process to its own GPU
    device_map = {"": int(os.environ.get("LOCAL_RANK") or 0)}
    GRADIENT_ACCUMULATION_STEPS = GRADIENT_ACCUMULATION_STEPS // world_size

model = LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-65b-hf",
    load_in_8bit=True,
    device_map=device_map,
)
tokenizer = LlamaTokenizer.from_pretrained(
    "decapoda-research/llama-65b-hf", add_eos_token=True
)
# ...
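For reference, here is how the gradient-accumulation constants above combine under DDP. Dividing GRADIENT_ACCUMULATION_STEPS by world_size is what keeps the effective batch at BATCH_SIZE when the work is spread across 8 GPUs. This is a standalone sketch of the arithmetic, not part of the script:

```python
# Bookkeeping only: names mirror the excerpt above.
MICRO_BATCH_SIZE = 4
BATCH_SIZE = 256
GRADIENT_ACCUMULATION_STEPS = BATCH_SIZE // MICRO_BATCH_SIZE  # 64

world_size = 8  # torchrun --nproc_per_node=8
per_rank_accum = GRADIENT_ACCUMULATION_STEPS // world_size    # 8 accumulation steps per GPU

# Each optimizer step still sees BATCH_SIZE examples in total:
effective_batch = MICRO_BATCH_SIZE * per_rank_accum * world_size
print(effective_batch)  # 256
```

So each of the 8 processes accumulates 8 micro-batches of 4, and one optimizer step covers 256 examples, the same as single-GPU training.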
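For context, in the upstream tloen/alpaca-lora script that this finetune.py is based on, the LoRA constants feed a peft LoraConfig roughly as below. This is a configuration sketch, not the full training script; `model` is the 8-bit LlamaForCausalLM loaded above, and `prepare_model_for_int8_training` was peft's helper name at the time (newer releases call it `prepare_model_for_kbit_training`):

```python
from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training

# Cast norms/head for stable 8-bit training, then wrap the attention
# projections named in TARGET_MODULES with rank-LORA_R adapters.
model = prepare_model_for_int8_training(model)
config = LoraConfig(
    r=LORA_R,
    lora_alpha=LORA_ALPHA,
    target_modules=TARGET_MODULES,
    lora_dropout=LORA_DROPOUT,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
```

With only q_proj and v_proj targeted at r=8, the trainable parameter count stays a tiny fraction of the 65B base model, which is why this fits alongside the 8-bit weights.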
Related Issues (20)
- Can the 7B tokenizer be used as-is for LLaMA 30B and 65B?
- Error when creating a KoAlpaca-Polyglot-12.8B Docker container from Hugging Face's TGI image
- Fix the chat-ui description
- Error while loading LoRA with PEFT
- The decapoda-research/llama-13b-hf model has disappeared
- The fine-tuned LLM keeps generating without finishing its answer
- Error when fine-tuning KoAlpaca-Polyglot-12.8B
- KoAlpaca example inference code gets killed for exceeding memory
- Question about the ko-alpaca 1.0 dataset
- Question about few-shot evaluation
- Question about the index.json file
- Is 48GB of VRAM required for inference with beomi/KoAlpaca-Polyglot-12.8B?
- Question about the ko_alpaca_data.json prompt format
- Questions about inference after training
- Trouble saving the model and uploading it to Hugging Face
- Is there a way to pin responses to a desired format?
- Reproducing the NSMC results
- Regarding commercial use
- Question about the demo's performance
- Question about citation