Comments (8)
Can a single node with 8 x A100 80GB cards run SFT with a 4k context?
In our experiments, SFT with a 4k context works on a single node using ZeRO-2 + offload, and SFT with a 16k context works using ZeRO-3 + offload, provided the DeepSpeed configs in utils/ds_utils.py are set properly.
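The contents of utils/ds_utils.py are not shown in this thread. As a minimal sketch, assuming the standard DeepSpeed JSON config keys (`zero_optimization`, `offload_optimizer`, `offload_param`), the two setups might look like this. The helper `make_ds_config` and the batch-size values are hypothetical placeholders, not the repository's actual code.

```python
import json


def make_ds_config(stage: int, offload: bool = True) -> dict:
    """Build a DeepSpeed config dict for ZeRO stage 2 or 3 (hypothetical helper)."""
    if stage not in (2, 3):
        raise ValueError("this sketch only covers ZeRO stages 2 and 3")
    zero = {"stage": stage}
    if offload:
        # Move optimizer states to CPU memory (valid for stages 1-3).
        zero["offload_optimizer"] = {"device": "cpu", "pin_memory": True}
        if stage == 3:
            # Parameter offload is only valid with ZeRO stage 3.
            zero["offload_param"] = {"device": "cpu", "pin_memory": True}
    return {
        "train_micro_batch_size_per_gpu": 1,  # placeholder, tune for your run
        "gradient_accumulation_steps": 1,     # placeholder, tune for your run
        "bf16": {"enabled": True},
        "zero_optimization": zero,
    }


if __name__ == "__main__":
    # ZeRO-2 + offload (4k context case) and ZeRO-3 + offload (16k context case).
    print(json.dumps(make_ds_config(2), indent=2))
    print(json.dumps(make_ds_config(3), indent=2))
```

Stage 3 additionally partitions and offloads the parameters themselves, which is why it leaves more GPU memory for the larger activations of a 16k context.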
from yi.
6B or 34B?
34B.
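A rough estimate shows why offload matters for the 34B model. This sketch uses the model-state accounting from the ZeRO paper for mixed-precision Adam: 2 bytes of fp16 parameters, 2 bytes of fp16 gradients, and 12 bytes of fp32 optimizer state per parameter. It treats 34B as exactly 34e9 parameters (an approximation) and ignores activations, buffers, and fragmentation, so real usage is higher.

```python
def zero_model_state_gb(num_params: float, num_gpus: int, stage: int) -> float:
    """Per-GPU model-state memory in GB (1e9 bytes) under mixed-precision Adam.

    Activations, temporary buffers and fragmentation are NOT included.
    """
    p = 2 * num_params   # fp16 parameters
    g = 2 * num_params   # fp16 gradients
    k = 12 * num_params  # fp32 master params + Adam momentum + variance
    if stage == 0:
        total = p + g + k                   # no partitioning
    elif stage == 1:
        total = p + g + k / num_gpus        # partition optimizer states
    elif stage == 2:
        total = p + (g + k) / num_gpus      # also partition gradients
    elif stage == 3:
        total = (p + g + k) / num_gpus      # also partition parameters
    else:
        raise ValueError("stage must be 0, 1, 2, or 3")
    return total / 1e9


if __name__ == "__main__":
    for stage in (0, 1, 2, 3):
        print(stage, zero_model_state_gb(34e9, 8, stage))
```

For 34B on 8 GPUs this gives about 127.5 GB per GPU under ZeRO-2 and 68 GB under ZeRO-3 for model states alone, before any activations. Both exceed or nearly fill an 80GB card, which is consistent with needing CPU offload.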
In the Yi model, 200k = 200000.
So I just need to modify it here?
Change max_seq_len in the script arguments, as well as max_position_embeddings in the model's config.json, to 200k. Note that this will need considerably more GPU memory, so a single node with 8 cards may not be enough.
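The config.json change can be scripted. This is a minimal sketch assuming a Hugging Face-style config.json with a top-level `max_position_embeddings` key; the helper name and the example path are hypothetical. The training script's max_seq_len argument still has to be set to the same value separately, and its exact flag syntax depends on the script.

```python
import json
from pathlib import Path


def set_max_position_embeddings(config_path: str, value: int = 200_000) -> dict:
    """Rewrite max_position_embeddings in a config.json file (hypothetical helper)."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    old = config.get("max_position_embeddings")
    config["max_position_embeddings"] = value
    # Write the updated config back in place, keeping it human-readable.
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    print(f"max_position_embeddings: {old} -> {value}")
    return config


if __name__ == "__main__":
    # Hypothetical path; point this at your local model directory.
    set_max_position_embeddings("/path/to/Yi-34B/config.json")
```

Editing the file in place is simplest; keeping a backup copy of the original config.json is advisable before training.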
Consider multi-node SFT training, launched with deepspeed -H /path-to-hostfile/hostfile main.py ........
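The hostfile passed to `deepspeed -H` lists one node per line in the form `<hostname> slots=<num_gpus>`. This sketch writes such a file; the hostnames `node1` and `node2` and the helper name are hypothetical. DeepSpeed's default multi-node launcher reaches each host over passwordless SSH, so that has to be set up as well.

```python
from __future__ import annotations

from pathlib import Path


def write_hostfile(hosts: list[str], gpus_per_node: int, path: str) -> str:
    """Write a DeepSpeed hostfile with one '<hostname> slots=<n>' line per node."""
    lines = [f"{host} slots={gpus_per_node}" for host in hosts]
    text = "\n".join(lines) + "\n"
    Path(path).write_text(text, encoding="utf-8")
    return text


if __name__ == "__main__":
    # Two hypothetical nodes with 8 GPUs each (16 GPUs total).
    print(write_hostfile(["node1", "node2"], 8, "hostfile"), end="")
```

The resulting file path is what goes after `-H` in the launch command above.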
Thanks.
Related Issues (20)
- Can SFT be done on the 34B model with 8 x A100-40G cards? HOT 1
- Is fine-tuning and inference on NPUs supported? HOT 3
- Errors occur intermittently
- On V100 GPUs, inference is very slow when loading the quantized model Yi-34B-Chat-4bits HOT 7
- Features : openai_api.py support multi turn dialogs. HOT 1
- Result of Yi-6B-Chat on the BBH dataset cannot be reproduced HOT 1
- Does Yi-VL-34b support int4 quantization? How do I do it? HOT 2
- Custom data has 80,000+ entries in train.jsonl and 105 in eval.jsonl; why does SFT only show length of train dataset:2852, length of eval dataset: 9? HOT 1
- When the API is called multiple times, the GPU memory continuously increases until it overflows. HOT 1
- LLama3 has been released; when will Yi release a new version? HOT 2
- RuntimeError: "triu_tril_cuda_template" not implemented for 'BFloat16' HOT 4
- Test issue bot
- Test issue bot
- where can I find the training code or script for YI-VL HOT 1
- After LoRA fine-tuning yi-6b-chat, the generated output contains many newlines and spaces HOT 4
- YI:9b gives abnormal answers with long context HOT 5
- Fine-tuning on my own dataset raises the error below, but the official yi_example dataset does not. Why is this? HOT 1
- Can Yi-VL do inference or fine-tuning on few-shot (in-context) data? HOT 1
- Let's Build Yi Cookbook Together - Your Ideas Matter! HOT 4
- I set up a technical discussion group for multimodal large models; everyone is welcome to join