duxiaoman-di / xuanyuan
XuanYuan: Du Xiaoman's Chinese Financial Dialogue Large Language Model
Will quantized versions of the 13B, 7B, or 6B models be released later?
The sample code currently provided in the README is rather minimal.
How can streaming output be implemented when running inference with cli_demo.py?
For the staged incremental pretraining, how many tokens in total were used for the 70B model's incremental pretraining?
As described in the title: running the official demo Python script produces the following:
输入: 介绍下你自己
输出: 猖玺巢玺毡毡磅猖晷帖apis帖湍殄晷毡磅pec刺磅惚毡刺帖殄殄毡磅磅夙窠湍殄惚湍盛玺蒂锥帖磅锥楔湍磅毡磅毡蒂鞣踪锥楔毡疯毡毡帖锥磅窠磅毡蒂磅鞣蒂晷绝毡磅刺锥uga窠dn磅绝玺锥雉毡蒂蒂刺毡雉磅雉窠毡鞣窠窠孤刺晷刺毡蒂蒂绝刺孤猖帖毡磅绝磅磅毡蒂雉晷泪踪毡毡刺骧磅窠蒂盛湍绝疯毡毡盛磅磅刺磅毡夙磅磅GO磅盛uga蒂绝磅磅盛磅毡毡锥ten雉蒂锥锥锥湍雉鞣鞣猖鞣刺玺刺绝蒂惚雉磅窠鞣猖雉磅毡磅绝鹊鞣磅鞣鞣磅绝蒂磅磅帖晷盛毡磅雉磅盛刺晷骧盛磅磅盛帖磅妥毡鞣盛晷绝雉鞣帖夙雉骧磅泪磅猖猖磅磅绝盛磅毡雉晷绝磅磅夙鞣磅晷鞣蒂磅鞣鞣窠酣uga盛毡磅鞣盛妥鞣刺鹊
Has anyone run into a similar problem? Is there a fix?
As titled.
When will Ollama be supported? Many developers run Ollama on macOS.
I'm curious what the reason for this is, and if XuanYuan-13B scores this high, what is the point of XuanYuan2-70B?
I used transformers' TextIteratorStreamer, but unlike other LLaMA models it doesn't stream properly; XuanYuan's output appears to be empty.
Is XuanYuan's streaming output different from that of other LLaMA models?
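For context on how this kind of streaming works: TextIteratorStreamer is a producer/consumer queue — `model.generate` pushes decoded text pieces from a worker thread while the caller iterates. The sketch below illustrates only that pattern; `SimpleIteratorStreamer` and `fake_generate` are stand-ins written for this example, not the transformers classes.

```python
import queue
import threading

class SimpleIteratorStreamer:
    """Stand-in illustrating the idea behind transformers' TextIteratorStreamer:
    the generator thread pushes text pieces, the main thread iterates them."""
    _SENTINEL = object()

    def __init__(self):
        self._q = queue.Queue()

    def put(self, text):
        self._q.put(text)

    def end(self):
        self._q.put(self._SENTINEL)

    def __iter__(self):
        while True:
            piece = self._q.get()
            if piece is self._SENTINEL:
                return
            yield piece

def fake_generate(streamer, pieces):
    # Stands in for model.generate(..., streamer=streamer).
    for p in pieces:
        streamer.put(p)
    streamer.end()

streamer = SimpleIteratorStreamer()
worker = threading.Thread(target=fake_generate, args=(streamer, ["你", "好"]))
worker.start()
text = "".join(streamer)  # the main thread consumes pieces as they arrive
worker.join()
print(text)
```

With the real API you would construct `TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)`, launch `model.generate(..., streamer=streamer)` in a `threading.Thread`, and iterate the streamer. One plausible cause of the empty output reported above: if everything the model emits is treated as a special token, `skip_special_tokens=True` yields an empty stream.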
Hello. In the financial evaluation section of this work, many representative open-source and accessible models were evaluated, but results for LLaMA-2-70B are missing (while LLaMA-2-7B and 13B are included), which is surprising. Since XuanYuan-70B is obtained by continued training from LLaMA-2-70B, an evaluation of LLaMA-2-70B is essential from an ablation standpoint. Is there a particular reason this evaluation is missing?
Could you package this up? And how about adding a feature to interpret PDF or TXT files?
Thanks!
I hit a problem when deploying on two different machines: the one with a CUDA 11.7 environment raises an error.
输入: 介绍下你自己
/opt/conda/conda-bld/pytorch_1695392020201/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [19,0,0], thread: [96,0
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[5], line 9
7 print(f"输入: {content}")
8 inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
----> 9 outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.95)
10 outputs = tokenizer.decode(outputs.cpu()[0][len(inputs.input_ids[0]):], skip_special_tokens=True)
11 print(f"输出: {outputs}")
File ~/miniconda3/envs/llm/lib/python3.10/site-packages/torch/utils/_contextlib.py:115, in context_decorator.<locals>.decorate_context(*args, **kwargs)
112 @functools.wraps(func)
113 def decorate_context(*args, **kwargs):
114 with ctx_factory():
--> 115 return func(*args, **kwargs)
File ~/miniconda3/envs/llm/lib/python3.10/site-packages/transformers/generation/utils.py:1652, in GenerationMixin.generate(self, inputs, generation_config, logits_processor, stopping_criteria, prefix_allowed_tokens_fn, synced_gpus, assistant_model, streamer, negative_prompt_ids, negative_prompt_attention_mask, **kwargs)
1644 input_ids, model_kwargs = self._expand_inputs_for_generation(
1645 input_ids=input_ids,
1646 expand_size=generation_config.num_return_sequences,
1647 is_encoder_decoder=self.config.is_encoder_decoder,
1648 **model_kwargs,
1649 )
1651 # 13. run sample
-> 1652 return self.sample(
1653 input_ids,
1654 logits_processor=logits_processor,
1655 logits_warper=logits_warper,
1656 stopping_criteria=stopping_criteria,
1657 pad_token_id=generation_config.pad_token_id,
1658 eos_token_id=generation_config.eos_token_id,
1659 output_scores=generation_config.output_scores,
1660 return_dict_in_generate=generation_config.return_dict_in_generate,
1661 synced_gpus=synced_gpus,
1662 streamer=streamer,
1663 **model_kwargs,
1664 )
1666 elif generation_mode == GenerationMode.BEAM_SEARCH:
1667 # 11. prepare beam search scorer
1668 beam_scorer = BeamSearchScorer(
1669 batch_size=batch_size,
1670 num_beams=generation_config.num_beams,
(...)
1675 max_length=generation_config.max_length,
1676 )
File ~/miniconda3/envs/llm/lib/python3.10/site-packages/transformers/generation/utils.py:2734, in GenerationMixin.sample(self, input_ids, logits_processor, stopping_criteria, logits_warper, max_length, pad_token_id, eos_token_id, output_attentions, output_hidden_states, output_scores, return_dict_in_generate, synced_gpus, streamer, **model_kwargs)
2731 model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
2733 # forward pass to get next token
-> 2734 outputs = self(
2735 **model_inputs,
2736 return_dict=True,
2737 output_attentions=output_attentions,
2738 output_hidden_states=output_hidden_states,
2739 )
2741 if synced_gpus and this_peer_finished:
2742 continue # don't waste resources running the code we don't need
File ~/miniconda3/envs/llm/lib/python3.10/site-packages/torch/nn/modules/module.py:1518, in Module._wrapped_call_impl(self, *args, **kwargs)
1516 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]
1517 else:
-> 1518 return self._call_impl(*args, **kwargs)
File ~/miniconda3/envs/llm/lib/python3.10/site-packages/torch/nn/modules/module.py:1527, in Module._call_impl(self, *args, **kwargs)
1522 # If we don't have any hooks, we want to skip the rest of the logic in
1523 # this function, and just call forward.
1524 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1525 or _global_backward_pre_hooks or _global_backward_hooks
1526 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1527 return forward_call(*args, **kwargs)
1529 try:
1530 result = None
File ~/miniconda3/envs/llm/lib/python3.10/site-packages/accelerate/hooks.py:164, in add_hook_to_module.<locals>.new_forward(module, *args, **kwargs)
162 output = module._old_forward(*args, **kwargs)
163 else:
--> 164 output = module._old_forward(*args, **kwargs)
165 return module._hf_hook.post_forward(module, output)
File ~/miniconda3/envs/llm/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py:1038, in LlamaForCausalLM.forward(self, input_ids, attention_mask, position_ids, past_key_values, inputs_embeds, labels, use_cache, output_attentions, output_hidden_states, return_dict)
1035 return_dict = return_dict if return_dict is not None else self.config.use_return_dict
1037 # decoder outputs consists of (dec_features, layer_state, dec_hidden, dec_attn)
-> 1038 outputs = self.model(
1039 input_ids=input_ids,
1040 attention_mask=attention_mask,
1041 position_ids=position_ids,
1042 past_key_values=past_key_values,
1043 inputs_embeds=inputs_embeds,
1044 use_cache=use_cache,
1045 output_attentions=output_attentions,
1046 output_hidden_states=output_hidden_states,
1047 return_dict=return_dict,
1048 )
1050 hidden_states = outputs[0]
1051 if self.config.pretraining_tp > 1:
File ~/miniconda3/envs/llm/lib/python3.10/site-packages/torch/nn/modules/module.py:1518, in Module._wrapped_call_impl(self, *args, **kwargs)
1516 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]
1517 else:
-> 1518 return self._call_impl(*args, **kwargs)
File ~/miniconda3/envs/llm/lib/python3.10/site-packages/torch/nn/modules/module.py:1527, in Module._call_impl(self, *args, **kwargs)
1522 # If we don't have any hooks, we want to skip the rest of the logic in
1523 # this function, and just call forward.
1524 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1525 or _global_backward_pre_hooks or _global_backward_hooks
1526 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1527 return forward_call(*args, **kwargs)
1529 try:
1530 result = None
File ~/miniconda3/envs/llm/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py:925, in LlamaModel.forward(self, input_ids, attention_mask, position_ids, past_key_values, inputs_embeds, use_cache, output_attentions, output_hidden_states, return_dict)
921 layer_outputs = torch.utils.checkpoint.checkpoint(
922 create_custom_forward(decoder_layer), hidden_states, attention_mask, position_ids
923 )
924 else:
--> 925 layer_outputs = decoder_layer(
926 hidden_states,
927 attention_mask=attention_mask,
928 position_ids=position_ids,
929 past_key_value=past_key_value,
930 output_attentions=output_attentions,
931 use_cache=use_cache,
932 padding_mask=padding_mask,
933 )
935 hidden_states = layer_outputs[0]
937 if use_cache:
File ~/miniconda3/envs/llm/lib/python3.10/site-packages/torch/nn/modules/module.py:1518, in Module._wrapped_call_impl(self, *args, **kwargs)
1516 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]
1517 else:
-> 1518 return self._call_impl(*args, **kwargs)
File ~/miniconda3/envs/llm/lib/python3.10/site-packages/torch/nn/modules/module.py:1527, in Module._call_impl(self, *args, **kwargs)
1522 # If we don't have any hooks, we want to skip the rest of the logic in
1523 # this function, and just call forward.
1524 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1525 or _global_backward_pre_hooks or _global_backward_hooks
1526 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1527 return forward_call(*args, **kwargs)
1529 try:
1530 result = None
File ~/miniconda3/envs/llm/lib/python3.10/site-packages/accelerate/hooks.py:159, in add_hook_to_module.<locals>.new_forward(module, *args, **kwargs)
158 def new_forward(module, *args, **kwargs):
--> 159 args, kwargs = module._hf_hook.pre_forward(module, *args, **kwargs)
160 if module._hf_hook.no_grad:
161 with torch.no_grad():
File ~/miniconda3/envs/llm/lib/python3.10/site-packages/accelerate/hooks.py:290, in AlignDevicesHook.pre_forward(self, module, *args, **kwargs)
285 fp16_statistics = self.weights_map[name.replace("weight", "SCB")]
286 set_module_tensor_to_device(
287 module, name, self.execution_device, value=self.weights_map[name], fp16_statistics=fp16_statistics
288 )
--> 290 return send_to_device(args, self.execution_device), send_to_device(
291 kwargs, self.execution_device, skip_keys=self.skip_keys
292 )
File ~/miniconda3/envs/llm/lib/python3.10/site-packages/accelerate/utils/operations.py:160, in send_to_device(tensor, device, non_blocking, skip_keys)
157 elif skip_keys is None:
158 skip_keys = []
159 return type(tensor)(
--> 160 {
161 k: t if k in skip_keys else send_to_device(t, device, non_blocking=non_blocking, skip_keys=skip_keys)
162 for k, t in tensor.items()
163 }
164 )
165 elif hasattr(tensor, "to"):
166 try:
File ~/miniconda3/envs/llm/lib/python3.10/site-packages/accelerate/utils/operations.py:161, in <dictcomp>(.0)
157 elif skip_keys is None:
158 skip_keys = []
159 return type(tensor)(
160 {
--> 161 k: t if k in skip_keys else send_to_device(t, device, non_blocking=non_blocking, skip_keys=skip_keys)
162 for k, t in tensor.items()
163 }
164 )
165 elif hasattr(tensor, "to"):
166 try:
File ~/miniconda3/envs/llm/lib/python3.10/site-packages/accelerate/utils/operations.py:167, in send_to_device(tensor, device, non_blocking, skip_keys)
165 elif hasattr(tensor, "to"):
166 try:
--> 167 return tensor.to(device, non_blocking=non_blocking)
168 except TypeError: # .to() doesn't accept non_blocking as kwarg
169 return tensor.to(device)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
It runs fine in a CUDA 11.8 environment.
Perhaps this project requires at least CUDA 11.8?
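A device-side assert is usually raised by an earlier kernel (often an out-of-range index), so before concluding that CUDA 11.8 is a hard requirement it helps to rerun synchronously, as the traceback itself suggests. A minimal debugging sketch; the `cli_demo.py` entry point is an assumption taken from the repo's demo, adjust to your own script:

```shell
# Show which CUDA toolkit is actually on PATH (harmless if nvcc is absent).
nvcc --version || true
# Force synchronous kernel launches so the Python traceback points at the
# operation that really failed, instead of a later send_to_device call.
export CUDA_LAUNCH_BLOCKING=1
# Then rerun the failing script under this setting, e.g.:
#   python cli_demo.py
```

If the synchronous traceback lands in an embedding or indexing kernel, the cause is more likely a token id outside the vocabulary (e.g. a tokenizer/weights mismatch) than the CUDA version itself.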
As titled.
As titled.
After entering "我叫克拉拉,我是" ("My name is Clara, I am"), the generated result is:
“我叫克拉拉,我是小微企业主,想申请一笔贷款,请问银行会考虑哪些方面?
Assistant:作为银行审核贷款申请的一个主要因素,您的企业信用记录以及财务状况通常是考虑的重点。您需要提供一些关键的财务信息,如财务报表、现金流状况、营业额和利润等等,以证明您的企业的稳健经营状况。此外,您的信用历史记录也很重要,包括您曾经向其他银行申请过贷款、信用卡或者其他的财务服务,以及您的信用评分和还款记录等。此外,银行会考虑您的还款能力和还款意愿,以确保您可以按时还款并避免不良的还款记录。最后,您的经营计划、贷款用途和担保措施也会对银行审核贷款申请产生影响。”
It looks like random text continuation rather than actual question answering?
Hello, would you mind sharing the prompt used with GPT-4 to reformat the exam questions? Many thanks.
Hello, could you share the training configuration used for incremental pretraining, e.g., learning rate, batch size, warmup, and weight decay?
When will accelerated inference for the 4-bit quantized models be supported? The current quantized model's output is extremely slow, making it hard to use. Thanks.
Hello, regarding the hybrid fine-tuning stage where pretraining data and instruction fine-tuning data are trained together: how is the data format unified? My understanding is that pretraining data is a plain text passage, while instruction fine-tuning data has an instruction and an output.
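The authors have not documented their scheme here, but one common way to mix the two sources is to render instruction pairs into the same dialogue template the chat demo uses (the `Human`/`Assistant` convention with `" "` and `"</s>"` separators), so that every training example ends up as a single text string, while plain pretraining text passes through unchanged. A hypothetical sketch, not the authors' confirmed method:

```python
SEPS = [" ", "</s>"]           # separators taken from the repo's demo code
ROLES = ["Human", "Assistant"]

def to_training_text(example: dict) -> str:
    """Render either a pretraining or an instruction example as one string.

    Assumed unification scheme: instruction data is wrapped in the chat
    template and closed with </s>; pretraining text is used as-is.
    """
    if "instruction" in example:
        return (SEPS[0] + ROLES[0] + ": " + example["instruction"]
                + SEPS[0] + ROLES[1] + ": " + example["output"] + SEPS[1])
    return example["text"]

print(to_training_text({"instruction": "介绍下你自己", "output": "我是轩辕。"}))
print(to_training_text({"text": "金融监管政策文本……"}))
```

After this step, both kinds of example can share one tokenization and packing pipeline, which is presumably the point of unifying them.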
How can streaming output be implemented when doing inference with vLLM?
1. As titled: the paper only mentions A100 GPUs and distributed training, without disclosing the actual amount of compute used. Could you share it?
2. Also, could you explain the practical differences and trade-offs between full-parameter training and resource-saving approaches such as P-Tuning and LoRA?
Many thanks for your work. For the newly open-sourced 13B model, what are the data volumes of the incremental pretraining dataset and the instruction fine-tuning dataset?
I deployed Duxiaoman-DI/XuanYuan-6B-Chat on a server; it uses 22 GB of a 24 GB GPU.
The output when loading the model is as follows:
use transformers.generate to infer...
loading weight with transformers ...
WARNING:root:Some parameters are on the meta device device because they were offloaded to the cpu.
Inference is very slow. What could be the cause?
Solved: switching to fp16 when loading the model fixed it.
1. With an input of 843 Chinese characters, GPU memory usage is 25.4 GB and output averages 1.35 characters/s.
2. With an input of 1,770 Chinese characters, GPU memory runs out.
Is the above normal?
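The numbers reported above are plausible: in fp16 a 6B-parameter model needs roughly 12 GB for weights alone, and the KV cache grows linearly with sequence length, which is why a much longer prompt can exhaust a 24 GB card (and why loading in fp32 forces CPU offload and very slow inference). A back-of-envelope sketch; the layer/head counts below are illustrative LLaMA-like assumptions, not the published XuanYuan-6B configuration:

```python
def weight_mem_gb(n_params: float, bytes_per_param: int) -> float:
    # Memory for the weights alone, ignoring activations and buffers.
    return n_params * bytes_per_param / 1024**3

def kv_cache_gb(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    # Two cached tensors (K and V) per layer, one vector per token per head.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem / 1024**3

print(f"6B weights in fp16: {weight_mem_gb(6e9, 2):.1f} GB")  # ~11.2 GB
print(f"6B weights in fp32: {weight_mem_gb(6e9, 4):.1f} GB")  # ~22.4 GB
# Hypothetical LLaMA-like shape: 32 layers, 32 KV heads, head_dim 128.
print(f"KV cache at 2000 tokens: {kv_cache_gb(2000, 32, 32, 128):.2f} GB")  # ~0.98 GB
```

On top of the weights and KV cache, intermediate activations during the forward pass add further overhead, so 25.4 GB for a long prompt is within the expected range.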
Does the model support long text input? Roughly how many tokens of input are supported? And how is over-length input handled — simply truncated?
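Whether XuanYuan does anything smarter than truncation isn't documented here, but with Hugging Face tokenizers the usual chat-friendly choice is left-side truncation, so the most recent tokens survive. A minimal sketch of that behavior on plain token-id lists:

```python
def truncate_left(token_ids, max_len):
    """Keep at most max_len tokens, dropping the oldest first.

    Mirrors setting tokenizer.truncation_side = "left" and then calling
    the tokenizer with truncation=True, max_length=max_len -- the usual
    choice for chat so the latest turn is preserved.
    """
    if len(token_ids) <= max_len:
        return list(token_ids)
    return list(token_ids[-max_len:])

print(truncate_left([1, 2, 3, 4, 5], 3))  # [3, 4, 5]
```

Note that the default `truncation_side` is `"right"`, which would instead drop the end of the prompt — for a chat model that can cut off the final `Assistant:` marker entirely.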
Error message:
OSError: Error no file named pytorch_model.bin, model.safetensors, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory model/XuanYuan-6B-Chat-4bit.
Sample code executed:
import torch
from transformers import LlamaForCausalLM, AutoTokenizer
model_name_or_path = "model/XuanYuan-6B-Chat-4bit"
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
model = LlamaForCausalLM.from_pretrained(model_name_or_path, device_map="auto")
model.eval()
seps = [" ", "</s>"]
roles = ["Human", "Assistant"]
content = "介绍下你自己"
prompt = seps[0] + roles[0] + ": " + content + seps[0] + roles[1] + ":"
print(f"输入: {content}")
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(
**inputs, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.95
)
outputs = tokenizer.decode(
outputs.cpu()[0][len(inputs.input_ids[0]) :], skip_special_tokens=True
)
print(f"输出: {outputs}")
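The demo above builds a single-turn prompt. Extending the same `seps`/`roles` template to multi-turn chat might look like the sketch below; closing each assistant reply with `</s>` is an assumption extrapolated from the demo's separators, not something the README here confirms:

```python
def build_prompt(history, content, seps=(" ", "</s>"), roles=("Human", "Assistant")):
    """history: list of (user, assistant) turns; content: the new user message."""
    prompt = ""
    for user, assistant in history:
        # Each completed turn ends with the </s> separator.
        prompt += seps[0] + roles[0] + ": " + user
        prompt += seps[0] + roles[1] + ": " + assistant + seps[1]
    # The new turn is left open at "Assistant:" for the model to complete.
    prompt += seps[0] + roles[0] + ": " + content + seps[0] + roles[1] + ":"
    return prompt

print(build_prompt([("你好", "你好,我是轩辕。")], "介绍下你自己"))
```

Getting this template exactly right matters: as another issue above observes, a mis-formatted prompt makes a chat model fall back to free-form continuation instead of answering.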
Is the 13B chat model produced by SFT on the XuanYuan-13B base model, or trained on top of LLaMA2-13B-chat?
Can't download the models from Hugging Face.
In the reinforcement learning stage, roughly what order of magnitude of data was used for PPO?
Will the instruction dataset be open-sourced later?
OSError: Error no file named pytorch_model.bin, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory ./Duxiaoman-DI/XuanYuan-6B-Chat-4bit.