
xuanyuan's People

Contributors

duxiaomantech, homieyoung, xyznlp


xuanyuan's Issues

What to do about garbled output from XuanYuan-6B-Chat-4bit?

As in the title: running the official demo Python script gives the following result:

输入: 介绍下你自己
输出: 猖玺巢玺毡毡磅猖晷帖apis帖湍殄晷毡磅pec刺磅惚毡刺帖殄殄毡磅磅夙窠湍殄惚湍盛玺蒂锥帖磅锥楔湍磅毡磅毡蒂鞣踪锥楔毡疯毡毡帖锥磅窠磅毡蒂磅鞣蒂晷绝毡磅刺锥uga窠dn磅绝玺锥雉毡蒂蒂刺毡雉磅雉窠毡鞣窠窠孤刺晷刺毡蒂蒂绝刺孤猖帖毡磅绝磅磅毡蒂雉晷泪踪毡毡刺骧磅窠蒂盛湍绝疯毡毡盛磅磅刺磅毡夙磅磅GO磅盛uga蒂绝磅磅盛磅毡毡锥ten雉蒂锥锥锥湍雉鞣鞣猖鞣刺玺刺绝蒂惚雉磅窠鞣猖雉磅毡磅绝鹊鞣磅鞣鞣磅绝蒂磅磅帖晷盛毡磅雉磅盛刺晷骧盛磅磅盛帖磅妥毡鞣盛晷绝雉鞣帖夙雉骧磅泪磅猖猖磅磅绝盛磅毡雉晷绝磅磅夙鞣磅晷鞣蒂磅鞣鞣窠酣uga盛毡磅鞣盛妥鞣刺鹊

Has anyone run into a similar problem? Is there a fix?

How to get streaming output?

I used transformers' TextIteratorStreamer, but unlike other LLaMA models it does not stream normally: XuanYuan's streamed output appears to be empty.

Is XuanYuan's streaming output handled differently from other LLaMA models?
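
For reference, here is a minimal sketch of how streaming is normally wired with Hugging Face transformers' TextIteratorStreamer, assuming the model loads through the same LlamaForCausalLM/AutoTokenizer classes and Human/Assistant prompt template used elsewhere in this repo's examples; whether XuanYuan actually emits tokens through it is exactly what this issue questions.

import threading

import torch
from transformers import AutoTokenizer, LlamaForCausalLM, TextIteratorStreamer

model_name_or_path = "Duxiaoman-DI/XuanYuan-6B-Chat"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
model = LlamaForCausalLM.from_pretrained(
    model_name_or_path, torch_dtype=torch.float16, device_map="auto"
)
model.eval()

# Same Human/Assistant template as the official demo
prompt = " Human: 介绍下你自己 Assistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# skip_prompt drops the echoed prompt; skip_special_tokens hides </s> etc.
streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
generation_kwargs = dict(**inputs, streamer=streamer, max_new_tokens=256,
                         do_sample=True, temperature=0.7, top_p=0.95)

# generate() blocks, so run it in a background thread and consume the streamer here
thread = threading.Thread(target=model.generate, kwargs=generation_kwargs)
thread.start()
for new_text in streamer:
    print(new_text, end="", flush=True)
thread.join()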

The financial benchmark lacks a LLaMA-2-70B comparison

Hello. Although the financial evaluation in this work covers many representative open-source and accessible models, there are no results for LLaMA-2-70B (while LLaMA-2-7B and 13B are both included), which is surprising: since XuanYuan-70B is continued pre-trained from LLaMA-2-70B, an LLaMA-2-70B evaluation is essential from an ablation perspective. Is there a particular reason this result is missing?

Minimum CUDA version requirement

I hit a problem when deploying on two different machines: the one running CUDA 11.7 throws an error.

Error when running the example under CUDA 11.7:
输入: 介绍下你自己
/opt/conda/conda-bld/pytorch_1695392020201/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [19,0,0], thread: [96,0
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[5], line 9
      7 print(f"输入: {content}")
      8 inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
----> 9 outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.95)
     10 outputs = tokenizer.decode(outputs.cpu()[0][len(inputs.input_ids[0]):], skip_special_tokens=True)
     11 print(f"输出: {outputs}")

File ~/miniconda3/envs/llm/lib/python3.10/site-packages/torch/utils/_contextlib.py:115, in context_decorator.<locals>.decorate_context(*args, **kwargs)
    112 @functools.wraps(func)
    113 def decorate_context(*args, **kwargs):
    114     with ctx_factory():
--> 115         return func(*args, **kwargs)

File ~/miniconda3/envs/llm/lib/python3.10/site-packages/transformers/generation/utils.py:1652, in GenerationMixin.generate(self, inputs, generation_config, logits_processor, stopping_criteria, prefix_allowed_tokens_fn, synced_gpus, assistant_model, streamer, negative_prompt_ids, negative_prompt_attention_mask, **kwargs)
   1644     input_ids, model_kwargs = self._expand_inputs_for_generation(
   1645         input_ids=input_ids,
   1646         expand_size=generation_config.num_return_sequences,
   1647         is_encoder_decoder=self.config.is_encoder_decoder,
   1648         **model_kwargs,
   1649     )
   1651     # 13. run sample
-> 1652     return self.sample(
   1653         input_ids,
   1654         logits_processor=logits_processor,
   1655         logits_warper=logits_warper,
   1656         stopping_criteria=stopping_criteria,
   1657         pad_token_id=generation_config.pad_token_id,
   1658         eos_token_id=generation_config.eos_token_id,
   1659         output_scores=generation_config.output_scores,
   1660         return_dict_in_generate=generation_config.return_dict_in_generate,
   1661         synced_gpus=synced_gpus,
   1662         streamer=streamer,
   1663         **model_kwargs,
   1664     )
   1666 elif generation_mode == GenerationMode.BEAM_SEARCH:
   1667     # 11. prepare beam search scorer
   1668     beam_scorer = BeamSearchScorer(
   1669         batch_size=batch_size,
   1670         num_beams=generation_config.num_beams,
   (...)
   1675         max_length=generation_config.max_length,
   1676     )

File ~/miniconda3/envs/llm/lib/python3.10/site-packages/transformers/generation/utils.py:2734, in GenerationMixin.sample(self, input_ids, logits_processor, stopping_criteria, logits_warper, max_length, pad_token_id, eos_token_id, output_attentions, output_hidden_states, output_scores, return_dict_in_generate, synced_gpus, streamer, **model_kwargs)
   2731 model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
   2733 # forward pass to get next token
-> 2734 outputs = self(
   2735     **model_inputs,
   2736     return_dict=True,
   2737     output_attentions=output_attentions,
   2738     output_hidden_states=output_hidden_states,
   2739 )
   2741 if synced_gpus and this_peer_finished:
   2742     continue  # don't waste resources running the code we don't need

File ~/miniconda3/envs/llm/lib/python3.10/site-packages/torch/nn/modules/module.py:1518, in Module._wrapped_call_impl(self, *args, **kwargs)
   1516     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1517 else:
-> 1518     return self._call_impl(*args, **kwargs)

File ~/miniconda3/envs/llm/lib/python3.10/site-packages/torch/nn/modules/module.py:1527, in Module._call_impl(self, *args, **kwargs)
   1522 # If we don't have any hooks, we want to skip the rest of the logic in
   1523 # this function, and just call forward.
   1524 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1525         or _global_backward_pre_hooks or _global_backward_hooks
   1526         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1527     return forward_call(*args, **kwargs)
   1529 try:
   1530     result = None

File ~/miniconda3/envs/llm/lib/python3.10/site-packages/accelerate/hooks.py:164, in add_hook_to_module.<locals>.new_forward(module, *args, **kwargs)
    162         output = module._old_forward(*args, **kwargs)
    163 else:
--> 164     output = module._old_forward(*args, **kwargs)
    165 return module._hf_hook.post_forward(module, output)

File ~/miniconda3/envs/llm/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py:1038, in LlamaForCausalLM.forward(self, input_ids, attention_mask, position_ids, past_key_values, inputs_embeds, labels, use_cache, output_attentions, output_hidden_states, return_dict)
   1035 return_dict = return_dict if return_dict is not None else self.config.use_return_dict
   1037 # decoder outputs consists of (dec_features, layer_state, dec_hidden, dec_attn)
-> 1038 outputs = self.model(
   1039     input_ids=input_ids,
   1040     attention_mask=attention_mask,
   1041     position_ids=position_ids,
   1042     past_key_values=past_key_values,
   1043     inputs_embeds=inputs_embeds,
   1044     use_cache=use_cache,
   1045     output_attentions=output_attentions,
   1046     output_hidden_states=output_hidden_states,
   1047     return_dict=return_dict,
   1048 )
   1050 hidden_states = outputs[0]
   1051 if self.config.pretraining_tp > 1:

File ~/miniconda3/envs/llm/lib/python3.10/site-packages/torch/nn/modules/module.py:1518, in Module._wrapped_call_impl(self, *args, **kwargs)
   1516     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1517 else:
-> 1518     return self._call_impl(*args, **kwargs)

File ~/miniconda3/envs/llm/lib/python3.10/site-packages/torch/nn/modules/module.py:1527, in Module._call_impl(self, *args, **kwargs)
   1522 # If we don't have any hooks, we want to skip the rest of the logic in
   1523 # this function, and just call forward.
   1524 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1525         or _global_backward_pre_hooks or _global_backward_hooks
   1526         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1527     return forward_call(*args, **kwargs)
   1529 try:
   1530     result = None

File ~/miniconda3/envs/llm/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py:925, in LlamaModel.forward(self, input_ids, attention_mask, position_ids, past_key_values, inputs_embeds, use_cache, output_attentions, output_hidden_states, return_dict)
    921     layer_outputs = torch.utils.checkpoint.checkpoint(
    922         create_custom_forward(decoder_layer), hidden_states, attention_mask, position_ids
    923     )
    924 else:
--> 925     layer_outputs = decoder_layer(
    926         hidden_states,
    927         attention_mask=attention_mask,
    928         position_ids=position_ids,
    929         past_key_value=past_key_value,
    930         output_attentions=output_attentions,
    931         use_cache=use_cache,
    932         padding_mask=padding_mask,
    933     )
    935 hidden_states = layer_outputs[0]
    937 if use_cache:

File ~/miniconda3/envs/llm/lib/python3.10/site-packages/torch/nn/modules/module.py:1518, in Module._wrapped_call_impl(self, *args, **kwargs)
   1516     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1517 else:
-> 1518     return self._call_impl(*args, **kwargs)

File ~/miniconda3/envs/llm/lib/python3.10/site-packages/torch/nn/modules/module.py:1527, in Module._call_impl(self, *args, **kwargs)
   1522 # If we don't have any hooks, we want to skip the rest of the logic in
   1523 # this function, and just call forward.
   1524 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1525         or _global_backward_pre_hooks or _global_backward_hooks
   1526         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1527     return forward_call(*args, **kwargs)
   1529 try:
   1530     result = None

File ~/miniconda3/envs/llm/lib/python3.10/site-packages/accelerate/hooks.py:159, in add_hook_to_module.<locals>.new_forward(module, *args, **kwargs)
    158 def new_forward(module, *args, **kwargs):
--> 159     args, kwargs = module._hf_hook.pre_forward(module, *args, **kwargs)
    160     if module._hf_hook.no_grad:
    161         with torch.no_grad():

File ~/miniconda3/envs/llm/lib/python3.10/site-packages/accelerate/hooks.py:290, in AlignDevicesHook.pre_forward(self, module, *args, **kwargs)
    285                 fp16_statistics = self.weights_map[name.replace("weight", "SCB")]
    286         set_module_tensor_to_device(
    287             module, name, self.execution_device, value=self.weights_map[name], fp16_statistics=fp16_statistics
    288         )
--> 290 return send_to_device(args, self.execution_device), send_to_device(
    291     kwargs, self.execution_device, skip_keys=self.skip_keys
    292 )

File ~/miniconda3/envs/llm/lib/python3.10/site-packages/accelerate/utils/operations.py:160, in send_to_device(tensor, device, non_blocking, skip_keys)
    157     elif skip_keys is None:
    158         skip_keys = []
    159     return type(tensor)(
--> 160         {
    161             k: t if k in skip_keys else send_to_device(t, device, non_blocking=non_blocking, skip_keys=skip_keys)
    162             for k, t in tensor.items()
    163         }
    164     )
    165 elif hasattr(tensor, "to"):
    166     try:

File ~/miniconda3/envs/llm/lib/python3.10/site-packages/accelerate/utils/operations.py:161, in <dictcomp>(.0)
    157     elif skip_keys is None:
    158         skip_keys = []
    159     return type(tensor)(
    160         {
--> 161             k: t if k in skip_keys else send_to_device(t, device, non_blocking=non_blocking, skip_keys=skip_keys)
    162             for k, t in tensor.items()
    163         }
    164     )
    165 elif hasattr(tensor, "to"):
    166     try:

File ~/miniconda3/envs/llm/lib/python3.10/site-packages/accelerate/utils/operations.py:167, in send_to_device(tensor, device, non_blocking, skip_keys)
    165 elif hasattr(tensor, "to"):
    166     try:
--> 167         return tensor.to(device, non_blocking=non_blocking)
    168     except TypeError:  # .to() doesn't accept non_blocking as kwarg
    169         return tensor.to(device)

RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

It runs fine in a CUDA 11.8 environment.
Could it be that this project requires at least CUDA 11.8?
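
The failing kernel in the trace is IndexKernel.cu, so the device-side assert usually points at an out-of-range index (for example a token id beyond the embedding table) rather than the CUDA version itself. A small diagnostic sketch, offered as an assumption rather than a confirmed fix, is to force synchronous kernel launches so the Python stack trace lands on the real failing call, and to sanity-check the input ids first:

import os

# Must be set before CUDA is initialised, i.e. before importing torch / loading the model
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch
from transformers import AutoTokenizer, LlamaForCausalLM

model_name_or_path = "Duxiaoman-DI/XuanYuan-6B-Chat"  # assumed checkpoint, adjust as needed
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
model = LlamaForCausalLM.from_pretrained(
    model_name_or_path, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer(" Human: 介绍下你自己 Assistant:", return_tensors="pt")
# Every input id must be below the embedding table size, otherwise the embedding
# lookup triggers exactly this kind of device-side assert.
assert int(inputs.input_ids.max()) < model.config.vocab_size
outputs = model.generate(**inputs.to(model.device), max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))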

An example of the prompt template?

After inputting "我叫克拉拉,我是" ("My name is Clara, I am"), the generated result is:

"My name is Clara, I am a small-business owner and would like to apply for a loan; what aspects will the bank consider?

Assistant: The main factors in a bank's review of a loan application are usually your business's credit record and financial condition. You will need to provide key financial information such as financial statements, cash flow, revenue and profit to show that the business is run soundly. Your credit history also matters, including loans, credit cards or other financial services you have applied for at other banks, as well as your credit score and repayment record. In addition, the bank will assess your repayment ability and willingness, to make sure you can repay on time and avoid a poor repayment record. Finally, your business plan, the purpose of the loan and any collateral arrangements will also affect the decision."

It looks like the model is just doing free-form continuation rather than answering a question?
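
For what it's worth, the chat checkpoints in this repo's own demo are prompted with an explicit Human/Assistant template rather than a bare continuation prompt (the same template appears verbatim in the 4-bit example later on this page), so a plain string like "我叫克拉拉,我是" is treated as text to continue. A sketch of that documented format, with a hypothetical question for illustration:

# Prompt construction used by the official demo: turns separated by " " and "</s>"
seps = [" ", "</s>"]
roles = ["Human", "Assistant"]

content = "我叫克拉拉,请问申请贷款时银行会考虑哪些方面?"  # hypothetical user question
prompt = seps[0] + roles[0] + ": " + content + seps[0] + roles[1] + ":"
# -> " Human: <question> Assistant:" which the chat model then completes as the Assistant turn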

Configuration for the continued pre-training stage

Hello, could you share some of the training configuration used during continued pre-training? For example the learning rate, batch size, warmup, weight_decay and other hyperparameters.

Data format for hybrid fine-tuning

Hello, regarding the hybrid fine-tuning stage where pre-training data and instruction fine-tuning data are trained together: how is the data format unified? My understanding is that a pre-training sample is just a passage of text, whereas an instruction sample has instruction and output fields.
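
One common way to mix the two sources, offered here as a generic sketch and not as the confirmed XuanYuan pipeline, is to render instruction samples into the same Human/Assistant chat template used at inference time, so both kinds of samples become plain token sequences for the causal-LM loss:

# Sketch: flatten both data sources into plain text for a causal-LM objective.
seps = [" ", "</s>"]
roles = ["Human", "Assistant"]

def render_pretrain(sample: dict) -> str:
    # Pre-training samples are already raw text.
    return sample["text"]

def render_instruction(sample: dict) -> str:
    # Instruction samples get the same Human/Assistant template as the chat demo,
    # with the end-of-turn separator closing the assistant answer.
    return (seps[0] + roles[0] + ": " + sample["instruction"]
            + seps[0] + roles[1] + ": " + sample["output"] + seps[1])

mixed = [
    render_pretrain({"text": "2023年三季度宏观经济数据显示……"}),          # hypothetical sample
    render_instruction({"instruction": "什么是LPR?", "output": "LPR是贷款市场报价利率……"}),
]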

vLLM inference

How can streaming output be implemented when running inference with vLLM?
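
One possible approach, sketched under the assumption of a vLLM version that exposes AsyncLLMEngine (this is not something the repo documents), is to drive generation through the async engine and consume the incremental RequestOutput objects:

import asyncio

from vllm import SamplingParams
from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine

# Assumed checkpoint id; point this at the model actually being served.
engine = AsyncLLMEngine.from_engine_args(
    AsyncEngineArgs(model="Duxiaoman-DI/XuanYuan-6B-Chat")
)

async def stream(prompt: str) -> None:
    params = SamplingParams(temperature=0.7, top_p=0.95, max_tokens=256)
    printed = 0
    # engine.generate yields a RequestOutput each time new tokens arrive;
    # output.text is cumulative, so only print the newly appended part.
    async for output in engine.generate(prompt, params, request_id="req-0"):
        text = output.outputs[0].text
        print(text[printed:], end="", flush=True)
        printed = len(text)

asyncio.run(stream(" Human: 介绍下你自己 Assistant:"))

Alternatively, the OpenAI-compatible server bundled with vLLM (vllm.entrypoints.openai.api_server) can be queried with stream=True from any OpenAI-style client.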

How much GPU resource is needed to train XuanYuan's 100-billion-parameter model?

1. As in the title: the paper only mentions that A100s and distributed training were used, without disclosing the actual resources. Could you elaborate?
2. Also, could you explain the practical differences and considerations between full-parameter training and resource-saving approaches such as P-tuning and LoRA?

Scale of the 13B model's training data

A thousand thanks for your work. Could you share the scale of the pre-training dataset and the instruction fine-tuning dataset used for this newly open-sourced 13B continued pre-training?

XuanYuan-6B-Chat inference is very slow on a 3090

I deployed Duxiaoman-DI/XuanYuan-6B-Chat on a server; the model takes up 22 GB of the 24 GB of VRAM.
The output when loading the model is as follows:
use transformers.generate to infer...
loading weight with transformers ...
WARNING:root:Some parameters are on the meta device device because they were offloaded to the cpu.

Inference is very slow. What could be the cause?
Resolved: loading the model in fp16 fixed it.
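
The warning about parameters sitting on the meta device is the usual symptom of loading the weights in full fp32 precision so they no longer fit in 24 GB and get offloaded to CPU. A minimal sketch of the fp16 load the reporter describes, following the parameter names used in the repo's other examples:

import torch
from transformers import AutoTokenizer, LlamaForCausalLM

model_name_or_path = "Duxiaoman-DI/XuanYuan-6B-Chat"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
# torch_dtype=torch.float16 halves the memory footprint versus the default fp32 load,
# which keeps the whole 6B model resident on a 24 GB card and avoids CPU offload.
model = LlamaForCausalLM.from_pretrained(
    model_name_or_path, torch_dtype=torch.float16, device_map="auto"
)
model.eval()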

6B-Chat inference on an A100

1. With an input of 843 Chinese characters, VRAM usage is 25.4 GB and output averages 1.35 characters/s;
2. With an input of 1,770 Chinese characters, it runs out of VRAM.
Is this normal?

Is long text input supported?

Does the model support long text input? Roughly how many input tokens are supported? And how is input beyond that limit handled, is it simply truncated?
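
The tokenizer itself will not truncate unless asked to. A hedged sketch of explicitly clipping a long prompt to the model's context window, where the window size is read from the model config rather than assumed:

from transformers import AutoConfig, AutoTokenizer

model_name_or_path = "Duxiaoman-DI/XuanYuan-6B-Chat"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
max_ctx = AutoConfig.from_pretrained(model_name_or_path).max_position_embeddings

long_prompt = "……" * 10000  # stand-in for a very long document

# Explicitly clip to the context window, leaving headroom for the tokens to generate;
# without truncation=True the tokenizer happily returns more tokens than the model was trained on.
inputs = tokenizer(long_prompt, return_tensors="pt", truncation=True, max_length=max_ctx - 256)
print(inputs.input_ids.shape)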

Downloading the XuanYuan-6B-Chat-4bit model from HuggingFace and running the example code throws: OSError: Error no file named pytorch_model.bin, model.safetensors, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory model/XuanYuan-6B-Chat-4bit.

Error message:

OSError: Error no file named pytorch_model.bin, model.safetensors, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory model/XuanYuan-6B-Chat-4bit.

The example code being run:

import torch
from transformers import LlamaForCausalLM, AutoTokenizer

model_name_or_path = "model/XuanYuan-6B-Chat-4bit"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
model = LlamaForCausalLM.from_pretrained(model_name_or_path, device_map="auto")
model.eval()

# Conversation template used by the chat models: turns separated by " " and "</s>"
seps = [" ", "</s>"]
roles = ["Human", "Assistant"]

content = "介绍下你自己"
# Wrap the user message as " Human: <content> Assistant:" so the model replies as Assistant
prompt = seps[0] + roles[0] + ": " + content + seps[0] + roles[1] + ":"
print(f"输入: {content}")
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(
    **inputs, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.95
)
outputs = tokenizer.decode(
    outputs.cpu()[0][len(inputs.input_ids[0]) :], skip_special_tokens=True
)
print(f"输出: {outputs}")
