Comments (5)
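This depends on the block_size parameter used for fine-tuning; 2048 is already long enough. If you need a longer answer, you can pass in the earlier rounds of the conversation as history.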
from llama2-lora-fine-tuning.
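But once decoding goes past 2048 tokens the output turns into garbage, and if no end-of-sequence token is detected, generation just keeps going until memory blows up.
Is there a setting that stops generation automatically once a maximum length is reached?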
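The usual safeguard is a hard cap on the number of newly generated tokens, so decoding ends at the end-of-sequence token or at the cap, whichever comes first. In Hugging Face transformers this is the `max_new_tokens` argument of `model.generate`; whether this repository's own generate script exposes it is not stated in the thread. The minimal pure-Python sketch below only illustrates the loop logic: `generate_with_cap` and `next_token` are hypothetical names, with `next_token` standing in for one model forward pass.

```python
from typing import Callable, List


def generate_with_cap(
    next_token: Callable[[List[int]], int],
    prompt_ids: List[int],
    eos_token_id: int,
    max_new_tokens: int,
) -> List[int]:
    """Decode until the EOS token appears or max_new_tokens have been produced.

    next_token is a hypothetical stand-in for one model forward pass: it takes
    the full token sequence so far and returns the next token id.
    """
    ids = list(prompt_ids)           # copy so the caller's prompt is untouched
    new_tokens: List[int] = []
    for _ in range(max_new_tokens):  # hard cap: the loop cannot run forever
        tok = next_token(ids)
        if tok == eos_token_id:      # model signalled the end of its answer
            break
        ids.append(tok)
        new_tokens.append(tok)
    return new_tokens


# A fake "model" that never emits EOS still stops after 5 tokens.
print(generate_with_cap(lambda ids: 7, [1, 2, 3], eos_token_id=2, max_new_tokens=5))
# prints [7, 7, 7, 7, 7]
```

With a real model, choosing the cap so that prompt length plus `max_new_tokens` stays within the 2048-token block_size would presumably also keep generation inside the range the model was fine-tuned on.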
By "passing in history", do you mean feeding the unfinished output from the previous step back in as input?
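There may simply be no end token, and there is no good way around that. Put the previous input and output into the history argument and use "继续" ("continue") as the prompt for this turn. This example was fine-tuned from Llama-2-7b-chat and the results are only so-so. I later fine-tuned the original model once, and the results were somewhat better: https://github.com/git-cloner/Llama2-chinese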
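The continuation trick in the reply above can be sketched as building a new request whose history ends with the cut-off turn and whose prompt is "继续". The thread does not show the repository's history format, so this sketch assumes a list of (user_input, model_output) pairs; `continuation_request`, `History` and `CONTINUE_PROMPT` are hypothetical names used only for illustration.

```python
from typing import List, Tuple

# Assumed history format: a list of (user_input, model_output) pairs, oldest first.
History = List[Tuple[str, str]]

CONTINUE_PROMPT = "继续"  # "continue", the prompt suggested in the reply above


def continuation_request(
    history: History, last_input: str, last_output: str
) -> Tuple[str, History]:
    """Build (prompt, history) asking the model to continue a cut-off answer.

    The unfinished turn is appended to a copy of the history, so the
    caller's list is left unchanged.
    """
    new_history = list(history) + [(last_input, last_output)]
    return CONTINUE_PROMPT, new_history


prompt, hist = continuation_request(
    [], "Write an essay about the Great Wall", "The Great Wall is"
)
print(prompt, hist)
```

The model's reply to this request would then be appended to the earlier partial output to form the full answer.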
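As I understand it, the history also counts toward the 2048 limit, because it is simply concatenated in front of the current input. If the previous step already exceeded the limit, the next step wouldn't be able to generate anything either, would it?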
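The commenter's point can be made concrete as a token budget: prepended history, the current prompt, and the tokens still to be generated all share the same 2048-token window. This sketch assumes turn lengths are already measured in tokens and that the oldest turns are dropped first; both the helper `fit_history` and that truncation policy are hypothetical, not taken from the repository.

```python
from typing import List, Tuple

BLOCK_SIZE = 2048  # the fine-tuning block_size discussed in this thread


def fit_history(
    turn_lengths: List[int],
    prompt_len: int,
    max_new_tokens: int,
    block_size: int = BLOCK_SIZE,
) -> Tuple[int, int]:
    """Drop the oldest history turns until history + prompt + output budget fit.

    turn_lengths: token count of each history turn, oldest first.
    Returns (index of the first turn kept, tokens left for generation).
    """
    start = 0
    used = sum(turn_lengths) + prompt_len  # history is prepended to the prompt
    while start < len(turn_lengths) and used + max_new_tokens > block_size:
        used -= turn_lengths[start]  # discard the oldest turn first
        start += 1
    return start, max(0, block_size - used)


# Three earlier turns plus a 100-token prompt, reserving 512 tokens for output.
print(fit_history([500, 700, 600], prompt_len=100, max_new_tokens=512))
# prints (1, 648): the 500-token turn is dropped
```

With a single previous turn of 2000 tokens and a 10-token "继续" prompt, keeping that turn would leave only 2048 - 2010 = 38 tokens for output, so the sketch drops it. That matches the concern above: an answer that already filled the window cannot be continued by passing it back in full.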
Related Issues (15)
- What is the purpose of extending the Chinese vocabulary?
- Is stream_output actually unused?
- ValueError: Attention mask should be of size (4, 1, 240, 480), but is torch.Size([4, 1, 240, 240])
- Why does generate use the base model's tokenizer?
- ImportError: cannot import name 'import_path' from '_pytest.doctest'
- Which DeepSpeed ZeRO stage (1/2/3) is used for sharding here?
- Question about fine-tuning
- GPU requirements for llama2-13B and llama2-70b
- validation_files
- Is multi-node, multi-GPU training supported?
- Is multi-node, multi-GPU training supported? How do I configure DeepSpeed stage 2 and stage 3?
- The step "pip install git+https://github.com/huggingface/transformers -i https://pypi.mirrors.ustc.edu.cn/simple" fails with an error
- ===================================BUG REPORT===================================
- MultiGPU+Deepspeed+4bitQlora