deepseek-ai / deepseek-llm
DeepSeek LLM: Let there be answers
Home Page: https://chat.deepseek.com/
License: MIT License
The pytorch_model-00013-of-00014.bin and pytorch_model-00014-of-00014.bin files are missing from the intermediate checkpoint "DeepSeek-LLM-67B-Base-Intermediate-1400B" at the AWS link.
Could you please update the model resource?
Why does DeepSeek-Math-7B-RL already reach 88.2% while DeepSeek-LLM-67B Chat only reaches 84%? The general-purpose 67B model is worse at math than the math-specific 7B model.
Problem description:
I am trying to use the AWS CLI to copy files from the deepseek-ai S3 bucket to my local machine. The command I am using is aws s3 cp s3://deepseek-ai/DeepSeek-LLM/DeepSeek-LLM-7B-Base <local_path> --recursive --request-payer.
However, I ran into two errors:
InvalidAccessKeyId: The AWS Access Key Id you provided does not exist in our records.
Service Unavailable: An error occurred (503) when calling the ListObjectsV2 operation (reached max retries: 4).
Actual result:
The InvalidAccessKeyId and Service Unavailable errors occurred.
Additional information:
I would like to know how to access the deepseek-ai S3 bucket correctly, and whether a specific Access Key ID or endpoint parameter is required.
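For reference, a requester-pays download has to be signed with valid AWS credentials, and InvalidAccessKeyId generally means the key the CLI is configured with is not recognized by AWS. The sketch below only builds the command line; it assumes the bucket is requester-pays (suggested by the flag in the command above), and the local path and profile name are hypothetical placeholders.

```python
import shlex

def build_s3_copy_command(s3_prefix, local_path, profile=None):
    """Return the argv list for a recursive requester-pays copy with the AWS CLI."""
    argv = [
        "aws", "s3", "cp", s3_prefix, local_path,
        "--recursive",
        "--request-payer", "requester",  # documented value for this flag
    ]
    if profile is not None:
        # Use a named profile whose access key is known to be valid.
        argv += ["--profile", profile]
    return argv

cmd = build_s3_copy_command(
    "s3://deepseek-ai/DeepSeek-LLM/DeepSeek-LLM-7B-Base",
    "./DeepSeek-LLM-7B-Base",  # hypothetical local path
    profile="my-profile",      # hypothetical profile name
)
print(shlex.join(cmd))
```

Running `aws sts get-caller-identity` with the same profile is one way to check that the credentials are valid before starting the copy.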
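First of all, thank you for your excellent open-source work!
The System Prompt mentioned in your technical report is close to the first sentence of the prompt in the DeepSeek Coder template.
However, for the 67B chat model you released, I checked the published tokenizer_config.json and found that the template has no slot for this prompt.
Where should the System Prompt be added?
Also, do you plan to develop a function-calling or agent version of the model?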
There appears to be a significant performance jump at the point where the learning rate decays.
Hi, the paper is very detailed in most aspects, but the training data is not mentioned in as much detail.
Specifically, I am interested in the following:
Will technical reports be released in the future?
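Thank you for sharing such a good model.
I fine-tuned the 67B base model with SFT on 50,000 multi-turn samples for one epoch. At test time, however, the model outputs garbled text.
The SFT framework is LLaMA Factory.
The experiment parameters are as follows: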
deepspeed --num_gpus 2 --master_port=9901 src/train_bash.py \
    --deepspeed ds_config.json \
    --stage sft \
    --model_name_or_path /data/origin_models/deepseek-llm-67b-base \
    --do_train \
    --dataset 50k_multiple,self_cognition \
    --template alpaca \
    --finetuning_type lora \
    --quantization_bit 4 \
    --lora_target q_proj,v_proj \
    --output_dir output-deepseek-67b-sft \
    --overwrite_cache \
    --overwrite_output_dir true \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 10 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 100 \
    --learning_rate 2e-4 \
    --num_train_epochs 2.0 \
    --plot_loss \
    --lora_rank 64 \
    --lora_alpha 128 \
    --cutoff_len 4096 \
    --ddp_find_unused_parameters False \
    --preprocessing_num_workers 20 \
    --save_total_limit 1 \
    --flash_attn
The parameters used to load the model at test time are as follows:
python src/web_demo.py \
    --model_name_or_path /data/origin_models/deepseek-llm-67b-base \
    --template alpaca \
    --finetuning_type lora \
    --quantization_bit 4 \
    --checkpoint_dir /home/output-deepseek-deepctrl-67b-sft
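Training hardware:
I tried various combinations of repetition_penalty, temperature, and top_p, but the problem persists.
My question is: could the LoRA rank or the learning rate be too small, leaving the SFT training badly under-fitted?
Thanks again!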
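As a rough aid for reasoning about whether the run was under-trained, the quantities implied by the flags above can be worked out directly. This is a sketch only: the 50,000 sample count comes from the description (the extra self_cognition dataset is ignored), one epoch is taken from the text rather than the --num_train_epochs flag, and the LoRA scaling uses the standard alpha / rank definition.

```python
def effective_batch_size(per_device_batch, grad_accum_steps, num_gpus):
    """Sequences consumed per optimizer step across all devices."""
    return per_device_batch * grad_accum_steps * num_gpus

def lora_scaling(lora_alpha, lora_rank):
    """Standard LoRA multiplier applied to the low-rank update (alpha / r)."""
    return lora_alpha / lora_rank

def approx_optimizer_steps(num_samples, batch_size, epochs):
    """Approximate number of optimizer steps, ignoring partial batches."""
    return (num_samples // batch_size) * epochs

batch = effective_batch_size(per_device_batch=1, grad_accum_steps=10, num_gpus=2)
scale = lora_scaling(lora_alpha=128, lora_rank=64)
steps = approx_optimizer_steps(num_samples=50_000, batch_size=batch, epochs=1)

print(f"effective batch size: {batch}")
print(f"LoRA scaling (alpha / r): {scale}")
print(f"approx. optimizer steps: {steps}")
```

These numbers describe the training budget only; they do not by themselves explain garbled output, which can also come from a mismatch between the training and inference templates or checkpoints.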
Why can the initial learning rate be so much higher than LLaMA-2 70B's?
And would such a learning-rate decay schedule be remarkably better than a routine cosine decay schedule?
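To make the comparison concrete, here is a minimal sketch of a multi-step schedule next to a cosine schedule. The warmup length and the step-down points (to 31.6% of the peak after 80% of training and to 10% after 90%) are assumptions recalled from the tech report, and the peak learning rate and total step count are illustrative values, not the authors' settings.

```python
import math

def multi_step_lr(step, total_steps, max_lr, warmup_steps=2000,
                  milestones=((0.8, 0.316), (0.9, 0.1))):
    """Linear warmup, constant plateau, then discrete step-downs (assumed milestones)."""
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    progress = step / total_steps
    factor = 1.0
    for boundary, scale in milestones:
        if progress >= boundary:
            factor = scale
    return max_lr * factor

def cosine_lr(step, total_steps, max_lr, warmup_steps=2000, min_ratio=0.1):
    """Linear warmup followed by cosine decay down to min_ratio * max_lr."""
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return max_lr * (min_ratio + (1.0 - min_ratio) * cosine)

TOTAL_STEPS = 100_000  # illustrative
MAX_LR = 3e-4          # illustrative peak learning rate

for step in (1_000, 50_000, 85_000, 95_000):
    print(step,
          f"{multi_step_lr(step, TOTAL_STEPS, MAX_LR):.2e}",
          f"{cosine_lr(step, TOTAL_STEPS, MAX_LR):.2e}")
```

The multi-step schedule keeps the learning rate at its peak for most of training and drops it abruptly, which would line up with a visible jump in metrics at the decay points, whereas cosine decay lowers it gradually.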
I was excited to use this model for coding, but it looks like I'm better off sticking with the 33B until an Instruct model is uploaded. Can we expect that anytime soon? If not, any sort of timeframe would be highly appreciated! (I am presuming it's not "if" but "when", for optimism's sake.)
Thanks for this. The 7B model could fit in Google Colab if the model were split into small shards.
Example: https://huggingface.co/bn22/Mistral-7B-Instruct-v0.1-sharded
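Dear DeepSeek team:
I am writing to express my gratitude for your team's highly creative work. I noticed that the repository has no LoRA fine-tuning scripts or tutorials, and that LLaMA-Factory has not added LoRA fine-tuning support for the DeepSeek 7B chat model. After testing LoRA fine-tuning myself, however, I was very impressed with your team's work.
I am very grateful for the effort your team put into developing the DeepSeek 7B chat model. The model responds very well to LoRA fine-tuning, which was a pleasant surprise. I have shared my LoRA fine-tuning experience in my tutorial and published it on GitHub. If needed, I can turn it into a script and submit a PR.
Thank you again for your work. I look forward to your future innovations and contributions.
DeepSeek 7B chat LoRA tutorial: https://github.com/datawhalechina/self-llm/blob/master/DeepSeek/04-DeepSeek-7B-chat%20Lora%20%E5%BE%AE%E8%B0%83.md
Repository: https://github.com/datawhalechina/self-llm.git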
Hello, I am trying to reproduce the base models' (7B and 67B) results on TriviaQA. Even with the prompt format from the tech report, my results are still about 7 points off. Could you provide the reproduction code? Thank you for your help.
I am researching scaling laws across models and architectures, among other things, and was wondering if you could share the logs, training losses, and validation evals of the models you ran for the scaling-law experiments in DeepSeek LLM. Other similar losses or results would also be interesting. They need not be well curated; anything can be helpful.
Thanks
In which programming language are the solutions from DeepSeek and the other open-source models submitted for the LeetCode Weekly Contest evaluation?
C++ or Python3?
Here are the responses from a few models; deepseek-llm cannot output "ö" and "ü":
%ollama run orca2:13b "Please repeat: wäre, Tür, höchstens"
wäre, Tür, höchstens
Translation: would be, door, at most
%ollama run codellama:34b "Please repeat: wäre, Tür, höchstens"
Wäre, Tür, höchstens.
%ollama run deepseek-llm:67b-chat "Please repeat: wäre, Tür, höchstens"
To complete this task, I will first listen to the audio file provided and write down the German words that are spoken. Then,
I will repeat those words in a clear manner for you.
Step 1: Listen to the audio file and identify the German words being spoken. In this case, the words are "wäre", "Tr"
(door), and "hchstens" (at most).
Step 2: Repeat each word in a clear manner.
- wäre -> I would say this as "vare".
- Tr -> Pronounced like "tuer", which means door.
- hchstens -> This is pronounced like "hkhs-tens" and it translates to "at most."
Is this a problem with the model or with ollama?
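A small diagnostic sketch for narrowing this down: it lists which characters of the expected words are missing from the observed output, together with their UTF-8 bytes. It uses only the words from the transcript above and does not by itself tell whether the model, the converted tokenizer, or the runtime drops them.

```python
import difflib

def dropped_characters(expected, observed):
    """Return the characters of `expected` that are deleted in `observed`."""
    matcher = difflib.SequenceMatcher(a=expected, b=observed, autojunk=False)
    dropped = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "delete":
            dropped.extend(expected[i1:i2])
    return dropped

# (expected word, word as it appeared in the deepseek-llm:67b-chat output)
pairs = [("wäre", "wäre"), ("Tür", "Tr"), ("höchstens", "hchstens")]

for expected, observed in pairs:
    missing = dropped_characters(expected, observed)
    # Show each missing character with its UTF-8 encoding in hex.
    print(expected, "->", observed,
          [(ch, ch.encode("utf-8").hex()) for ch in missing])
```

Since "ä" survives while "ö" and "ü" do not, comparing the same prompt through another runtime with the original weights would help separate a model issue from a conversion or decoding issue.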
Thanks so much for sharing the findings and insights about "Multi-Choice Question Benchmarks". I have a quick question about the 20 million Chinese multiple-choice samples that led to overfitting without generalizing to other tasks: do the data consist of questions with bare options only, or do the answers include some explanation?
Thank you again for your great work!
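Could you provide quantized versions of the model? Thanks!
How can I reproduce the reported accuracy on the LeetCode and HumanEval benchmarks? Could you share the evaluation scripts?
I noticed your model's SOTA performance on AlignBench, so I tried to reproduce it.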
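Model name,Professional knowledge,Chinese understanding,Basic tasks,Math,Writing,Open-ended QA,Role play,Logical reasoning,Chinese reasoning,Chinese language,Overall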
deepseek67b,6.870967741935484,6.086206896551724,6.661764705882353,4.901785714285714,6.613333333333333,7.394736842105263,6.431034482758621,4.478260869565218,4.690023291925466,6.676340667094462,5.683181979509964
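As a sanity check on how the last three columns relate to the first eight, the row above is consistent with "Chinese reasoning" being the mean of the math and logical-reasoning scores, "Chinese language" the mean of the other six categories, and the overall score the mean of those two. The sketch below recomputes them; the dictionary keys are shorthand introduced here, not names from the benchmark.

```python
import math
from statistics import mean

# Per-category scores copied from the deepseek67b row above.
scores = {
    "professional": 6.870967741935484,
    "chinese_understanding": 6.086206896551724,
    "basic_tasks": 6.661764705882353,
    "math": 4.901785714285714,
    "writing": 6.613333333333333,
    "open_qa": 7.394736842105263,
    "role_play": 6.431034482758621,
    "logic": 4.478260869565218,
}
# Aggregate columns as reported in the same row.
reported = {
    "reasoning": 4.690023291925466,
    "language": 6.676340667094462,
    "overall": 5.683181979509964,
}

reasoning = mean([scores["math"], scores["logic"]])
language = mean(v for k, v in scores.items() if k not in ("math", "logic"))
overall = mean([reasoning, language])

for name, value in (("reasoning", reasoning), ("language", language), ("overall", overall)):
    matches = math.isclose(value, reported[name], abs_tol=1e-9)
    print(f"{name}: {value:.4f} (matches reported: {matches})")
```

Because the overall score weights reasoning and language equally, the two low reasoning categories pull the total down more than a plain average over all eight categories would.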
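As I understand it, this is lower than expected (although I did not control for variables). I suspect the generation process is the cause. My generate code, loosely based on the example provided on Hugging Face, is below; I only changed the temperature parameter according to the official settings, and everything else is default.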
question = sample['question']
temperature = sample['temperature']
messages = [
    {
        "role": "user",
        "content": question
    }
]
input_tensor = self.tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = self.model.generate(input_tensor.to(self.model.device), temperature=temperature, max_new_tokens=2048)
answer = self.tokenizer.decode(outputs[0][input_tensor.shape[1]:], skip_special_tokens=True)
return answer
If I want to reproduce accuracy close to the tech report, is there a more correct template? Thanks!
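One detail worth checking in the snippet above: in Hugging Face transformers, generate() applies temperature only when sampling is enabled (do_sample=True); under greedy decoding the value is ignored. The helper below is a hypothetical sketch that makes the choice explicit; it is not taken from the official evaluation code, and the model's own generation config may already enable sampling.

```python
def build_generation_kwargs(temperature, max_new_tokens=2048):
    """Build keyword arguments for generate() with an explicit decoding mode.

    A positive temperature enables sampling so that the value is actually used;
    a zero or missing temperature falls back to greedy decoding.
    """
    kwargs = {"max_new_tokens": max_new_tokens}
    if temperature is not None and temperature > 0:
        kwargs["do_sample"] = True
        kwargs["temperature"] = temperature
    else:
        kwargs["do_sample"] = False
    return kwargs

print(build_generation_kwargs(0.7))
print(build_generation_kwargs(0))
```

In the code above this would be used as `self.model.generate(input_tensor.to(self.model.device), **build_generation_kwargs(temperature))`. Whether this accounts for any of the score gap is untested.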
Hello! I have a question about the official vLLM code:
prompts = [tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False) for messages in messages_list]
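After this step, the result is actually a sequence of strings. But tokens such as <|begin▁of▁sentence|> should be concatenated as special tokens. Is this usage correct?
The current context length is 4096. Will a long-context version be released?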
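One thing that can be verified locally is whether the beginning-of-sentence token ends up in the token IDs exactly once after the rendered string is tokenized again. The sketch below assumes, without having confirmed it for this setup, that the tokenizer may prepend its own BOS when encoding the string; both helper names and the example IDs are hypothetical.

```python
BOS_TOKEN = "<|begin▁of▁sentence|>"

def count_leading(token_ids, bos_id):
    """Count how many times bos_id repeats at the start of token_ids."""
    count = 0
    for token_id in token_ids:
        if token_id != bos_id:
            break
        count += 1
    return count

def strip_leading_bos(prompt, bos_token=BOS_TOKEN):
    """Remove one textual BOS from the prompt.

    Only appropriate if the tokenizer is known to add BOS itself;
    otherwise the prompt would end up with no BOS at all.
    """
    if prompt.startswith(bos_token):
        return prompt[len(bos_token):]
    return prompt

# Hypothetical IDs: 100000 stands in for the BOS id of the tokenizer in use.
print(count_leading([100000, 100000, 5726, 25], bos_id=100000))
print(repr(strip_leading_bos(BOS_TOKEN + "User: hi\n\nAssistant:")))
```

If `count_leading` on the encoded prompt returns 2, either strip the textual BOS as above or pass token IDs to the engine directly instead of strings.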
Hello, I'm seeking guidance on prompt engineering and managing toxicity/hallucination in scenarios where the system prompt is not compatible with the model. Could you provide advice or best practices for prompt engineering in such cases? Additionally, how can we effectively address issues related to toxicity and hallucination without a compatible system prompt? Any insights or examples would be greatly appreciated. Thank you for your assistance.
Hello, if my SFT data contains a system prompt, what should the model input look like?
I am using LLaMA-Factory for SFT on DeepSeek, and the model input looks like this:
<|begin▁of▁sentence|>You are a helpful assistant.User: Query
Assistant: RESPONSE<|end▁of▁sentence|>User: Query
Assistant: RESPONSE<|end▁of▁sentence|>
I am not sure whether this is an appropriate way to handle the system prompt.
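For comparing layouts, here is a minimal sketch that renders a conversation in the format shown above. The function and its separator parameters are hypothetical; the defaults reproduce the string in this post, not necessarily the official chat template, which should be checked against the model's tokenizer_config.json.

```python
BOS = "<|begin▁of▁sentence|>"
EOS = "<|end▁of▁sentence|>"

def render_conversation(messages, system=None, system_sep="", turn_sep="\n",
                        add_generation_prompt=False):
    """Render user/assistant turns into a single training or inference string."""
    parts = [BOS]
    if system:
        # The post places the system text directly after BOS with no separator.
        parts.append(system + system_sep)
    for message in messages:
        if message["role"] == "user":
            parts.append("User: " + message["content"] + turn_sep)
        elif message["role"] == "assistant":
            parts.append("Assistant: " + message["content"] + EOS)
        else:
            raise ValueError(f"unsupported role: {message['role']}")
    if add_generation_prompt:
        parts.append("Assistant:")
    return "".join(parts)

example = render_conversation(
    [
        {"role": "user", "content": "Query"},
        {"role": "assistant", "content": "RESPONSE"},
        {"role": "user", "content": "Query"},
        {"role": "assistant", "content": "RESPONSE"},
    ],
    system="You are a helpful assistant.",
)
print(example)
```

Passing a non-empty `system_sep` (for example a blank line) separates the system text from the first user turn, which is one variation worth comparing against the layout above.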
On the model page for deepseek-vl-7b-chat you link to https://github.com/deepseek-ai/DeepSeek-VL but the repository does not exist?
It makes it impossible to actually try the model. Please fix!
Which one would you recommend to use to get the best possible performance?
Great work
Could you share the test data for the LeetCode Weekly Contest? It would be very helpful for the community.
From the paper, Eq. 2 lists the Chinchilla compute calculation as
The first term comes from the
So,