
An Open-sourced Knowledgeable Large Language Model Framework.

Home Page: http://knowlm.zjukg.cn/

License: MIT License


knowlm's Introduction

🗣️ [ 中文 | English ]



Knowledgeable Large Language Model Framework

KnowLM is a knowledgeable Large Language Model (LLM) framework covering data processing, model pre-training, fine-tuning, and knowledge-based augmentation and utilization. KnowLM also provides a model zoo of readily accessible models such as ZhiXi and OneKE, tailored for immediate use.

  • ❗Please note that this project is still under active optimization and development; the model weights will be updated regularly to support new features and models!
  • ❗If you are interested in information extraction/knowledge extraction, please refer to DeepKE. KnowLM is a framework for knowledgeable Large Language Models!

Features

  • A standard framework for LLM pre-training and fine-tuning.
  • A model zoo including ZhiXi, KnowLM-IE, OneKE, and OceanGPT, along with open-source data.
  • An instruction processing module based on EasyInstruct.
  • A knowledge augmentation module based on RAG (under development).
  • A hallucination detection module based on EasyDetect.
  • A knowledge editing module based on EasyEdit.
  • Model inference and deployment.

All weights and datasets have been uploaded to HuggingFace🤗. Click here to get started right away!

If you encounter any issues during the installation or use of KnowLM, please check FAQ or promptly submit an issue, and we will assist you with resolving the problem!

Category | Base | Name | Version | Download Link | Note
Base Model | LLaMA-1 | KnowLM-13B-Base | V1.0 | HuggingFace, WiseModel, ModelScope | Base Model
Dialogue Model | LLaMA-1 | KnowLM-13B-ZhiXi | V1.0 | HuggingFace, WiseModel, ModelScope | Information Extraction Model
Dialogue Model | LLaMA-1 | KnowLM-13B-IE | V1.0 | HuggingFace, WiseModel, ModelScope | Information Extraction Model
Dialogue Model | LLaMA-2 | OceanGPT | V1.0 | HuggingFace, WiseModel | Ocean Model
Dialogue Model | LLaMA-2 | OneKE | V1.0 | HuggingFace, WiseModel, ModelScope | Information Extraction Model
Instruction Dataset Name | Number | Download Link | Note
KnowLM-CR (CoT & Reasoning, Chinese and English) | 202,333 | Google Drive, HuggingFace |
KnowLM-Tool (Tool Learning, English) | 38,241 | Google Drive, HuggingFace |
OceanBench (Benchmark, English) | 11,000 | HuggingFace |
InstructIE (Information Extraction, Chinese and English) | 364,076 | HuggingFace, WiseModel, ModelScope | Contains some noise due to distant supervision.
IEPile (Information Extraction, Chinese and English) | 2,000,000+ | HuggingFace, WiseModel, ModelScope | Constructed from 33 existing IE datasets.

Data description: 1. Other data sources for information extraction come from CoNLL, ACE, CASIS, DuEE, People's Daily, DuIE, etc. 2. The KnowLM-Tool dataset comes from the paper "Making Language Models Better Tool Learners with Execution Feedback"; its GitHub repository can be found here. 3. The InstructIE dataset comes from the paper "InstructIE: A Chinese Instruction-based Information Extraction Dataset"; its GitHub repository can be found here.

📬 NEWS

📍 Technologies in KnowLM



This is an overview of KnowLM, which mainly consists of three technical features:

Knowledge Prompting: It generates knowledge prompts based on structured data such as knowledge graphs and utilizes knowledge augmentation constraints to address knowledge extraction and reasoning issues.

Knowledge Editing: It aligns outdated, incorrect, and biased knowledge within large models using knowledge editing techniques to tackle knowledge fallacy problems (English Tutorial).

Knowledge Interaction: It enables dynamic knowledge interaction and feedback to achieve tool-based learning and multi-agent collaboration, resolving the problem of embodiment cognition in LLMs (English Tutorial).

The modules related to these three technologies are EasyInstruct, EasyDetect, and EasyEdit. We provide use cases for these modules based on the KnowLM framework.

🗂️ Contents

All Thanks To Our Contributors:

🚴1. Quick Start

🛠️1.1 Environment Configuration

KnowLM supports both manual environment configuration and a pre-built Docker image; you can choose whichever way suits your setup.

🔧Manual Environment Configuration

git clone https://github.com/zjunlp/KnowLM.git
cd KnowLM
conda create -n knowlm python=3.9 -y
conda activate knowlm
pip install torch==1.13.1+cu116 --extra-index-url https://download.pytorch.org/whl/cu116
pip install -r requirements.txt

🐳Building With Docker Images

docker pull zjunlp/knowlm:v.1
docker run -it zjunlp/knowlm:v.1 /bin/bash

💻1.2 Model Usage Guide

1. Reproduce the results in Section 2

The cases in Section 2 were all run on a V100. If you run them on other devices, the results may vary; please run them multiple times or adjust the decoding parameters. We derived knowlm-13b-zhixi and knowlm-13b-ie from knowlm-13b-base through LoRA training; these models are the result of merging the trained LoRA weights with the knowlm-13b-base parameters.

  1. If you want to reproduce the results in Section 2.1 (pretraining cases), please run the following command:

    python examples/generate_finetune.py --base_model zjunlp/knowlm-13b-base-v1.0

    The result in section 2.1 can be obtained.

  2. If you want to reproduce the results in Section 2.2 (information extraction cases), please run the following command:

    python examples/generate_lora.py --base_model zjunlp/knowlm-13b-zhixi --run_ie_cases

    The result in section 2.2 can be obtained.

  3. If you want to reproduce the results in Section 2.3 (general abilities cases), please run the following command:

    python examples/generate_lora.py --base_model zjunlp/knowlm-13b-zhixi --run_general_cases

    The result in section 2.3 can be obtained.

2. Usage of Pretraining Model

We offer two methods: the first one is command-line interaction, and the second one is web-based interaction, which provides greater flexibility.

  1. Use the following command to enter command-line interaction:

    python examples/generate_finetune.py --base_model zjunlp/knowlm-13b-base-v1.0 --interactive

    The disadvantage is the inability to dynamically change decoding parameters.

    If a single GPU is unable to load the model, you can use the following command to load the model across multiple GPUs:

    CUDA_VISIBLE_DEVICES=0,1,2 python examples/generate_finetune.py  --base_model zjunlp/knowlm-13b-base-v1.0 --interactive --multi_gpu     # --allocate [10,10,10]

    The --allocate above specifies the amount of memory used by each GPU, measured in GB.

  2. Use the following command to enter web-based interaction:

    python examples/generate_finetune_web.py --base_model zjunlp/knowlm-13b-base-v1.0

    If a single GPU is unable to load the model, you can use the following command to load the model across multiple GPUs:

    CUDA_VISIBLE_DEVICES=0,1,2 python examples/generate_finetune_web.py --base_model zjunlp/knowlm-13b-base-v1.0 --multi_gpu     # --allocate [10,10,10]

    Here is a screenshot of the web-based interaction:

    finetune-web

3. Usage of Instruction tuning Model

Here, we provide a web-based interaction method. Use the following command to access the web:

python examples/generate_lora_web.py --base_model zjunlp/knowlm-13b-zhixi

If a single GPU is unable to load the model, you can use the following command to load the model across multiple GPUs:

CUDA_VISIBLE_DEVICES=0,1,2 python examples/generate_lora_web.py --base_model zjunlp/knowlm-13b-zhixi --multi_gpu     # --allocate [10,10,10]

Here is a screenshot of the web-based interaction:

finetune-web

The instruction is a required parameter, while input is optional. For general tasks (such as the examples in Section 2.3), you can enter your query directly in the instruction field. For information extraction tasks (such as the examples in Section 2.2), please enter the instruction in the instruction field and the sentence to be extracted in the input field. We provide information extraction prompts in Section 1.3.

If you want to perform batch testing, please modify the examples/generate_lora.py file and update the examples and hyperparameters in the variable cases.

According to different task requirements, we have the following suggestions for adjusting decoding strategies and their associated hyperparameters (see the sketch after this list):

  1. If you want more diverse and creative outputs, consider using top-k or top-p (nucleus) sampling with a relatively higher top_k or top_p, and possibly a higher temperature.
  2. If you want more focused and high-quality outputs (e.g., information extraction), consider using beam search with a moderate num_beam, or top-k or top-p sampling with a lower top_k or top_p, and a lower temperature.
  3. Remember to experiment and fine-tune. Depending on your use case, it may be beneficial to iterate and experiment with different strategies and hyperparameters to find the optimal combination.
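As a reference, the sketch below (ours, not the project's own script) shows how the two regimes above map onto standard Hugging Face generate() arguments; the checkpoint name is only an example and any KnowLM weights can be substituted.

from transformers import AutoModelForCausalLM, AutoTokenizer

# Example checkpoint; substitute the KnowLM weights you actually use.
tokenizer = AutoTokenizer.from_pretrained("zjunlp/knowlm-13b-zhixi")
model = AutoModelForCausalLM.from_pretrained("zjunlp/knowlm-13b-zhixi", device_map="auto")

inputs = tokenizer("你好", return_tensors="pt").to(model.device)

# 1) Diverse / creative outputs: sampling with larger top_p / top_k and temperature.
creative = model.generate(**inputs, do_sample=True, top_p=0.95, top_k=100,
                          temperature=1.0, max_new_tokens=256)

# 2) Focused, high-quality outputs (e.g. information extraction): beam search,
#    or sampling with smaller top_p / top_k and a lower temperature.
focused = model.generate(**inputs, do_sample=False, num_beams=4, max_new_tokens=256)

print(tokenizer.decode(focused[0], skip_special_tokens=True))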

4. vLLM API server

We integrate vLLM to accelerate LLM inference and provide an efficient API service. Use the following command to launch the vLLM API server at http://localhost:8090.

max_num_batched_tokens=8000

CUDA_VISIBLE_DEVICES=1,2 python inference/launch_vllm.py \
    --port 8090 \
    --model data/zhixi-13B \
    --use-np-weights \
    --max-num-batched-tokens $max_num_batched_tokens \
    --dtype half \
    --tensor-parallel-size 2

Query the service using POST request:

curl -X POST "http://127.0.0.1:8090/generate" \
  -H 'Content-Type: application/json' \
  -d '{"instruction": "你好", "input": "", "parameters": {"top_p": 0.7, "max_tokens": 256}}'

You could get the following response:

{
  "generated_text":"你好,很高兴见到你。我是一个人工智能助手,可以帮助你解决问题和提供信息。有什么我可以帮助你的吗?</s>",
  "num_output_tokens_cf":65,
  "error":null
}
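The same request can also be issued from Python; the snippet below is an ordinary requests call against the endpoint and payload shown in the curl example above.

import requests

response = requests.post(
    "http://127.0.0.1:8090/generate",
    json={
        "instruction": "你好",
        "input": "",
        "parameters": {"top_p": 0.7, "max_tokens": 256},
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["generated_text"])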

🎯1.3 Information Extraction Prompt

For information extraction tasks such as named entity recognition (NER), event extraction (EE), and relation extraction (RE), we provide some prompts for ease of use. You can refer to this link for examples. Of course, you can also try using your own prompts.
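For illustration only (the official templates are in the linked file), an NER-style request of the kind entered in the web interface of Section 1.2 could be organized as follows, with the task description in instruction and the sentence to extract from in input:

# A hypothetical NER-style example; the official prompt templates are in the linked file.
example = {
    "instruction": "你是专门进行实体抽取的专家。请从给定的文本中抽取出其中的人名、地名和组织机构名,并以JSON列表的形式返回结果。",
    "input": "浙江大学位于杭州,其前身是创建于1897年的求是书院。",
}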

Here is a case where knowlm-13b-zhixi is used to accomplish the instruction-based knowledge graph construction task in CCKS2023.

🐐1.4 llama.cpp

If you lack sufficient GPU computing resources, you can quantize the model with llama.cpp, since KnowLM uses the same LLaMA architecture that llama.cpp supports. Once you have set up the environment, you can download our model to a designated path using the following command:

python tools/download.py --specify --download_path ./your/path --repo_name zjunlp/knowlm-13b-zhixi

Next, simply substitute the model path at this location with the downloaded one. When running it in practice, remember to adjust the model path within this script accordingly.

📌1.5 Instruction Processing

Instruction tuning has emerged as a crucial technique to enhance the capabilities of LLMs, which bridges the gap between the next-word prediction objective of LLMs and human preference. To construct a high-quality instruction dataset, many instruction processing approaches have been proposed, aiming to achieve a delicate balance between data quantity and data quality.

In instruction processing, we use EasyInstruct as our processing framework (details can be found at https://github.com/zjunlp/EasyInstruct). EasyInstruct modularizes instruction generation, selection, and prompting, while also considering their combination and interaction. The code below shows a running example of instruction generation and selection with EasyInstruct:

from easyinstruct import SelfInstructGenerator, GPTScoreSelector
from easyinstruct.utils.api import set_openai_key

# Step1: Set your own API-KEY
set_openai_key("YOUR-KEY")

# Step2: Declare a generator class
generator = SelfInstructGenerator(num_instructions_to_generate=100)

# Step3: Generate self-instruct data
generator.generate()

# Step4: Declare a selector class
selector = GPTScoreSelector()

# Step5: Process the generated instructions
selector.process()

🖊️1.6 Model Editing

Although large language models perform exceptionally well in many tasks, they can still provide incorrect answers. Moreover, as time passes, knowledge that was once accurate may become outdated. This necessitates that we adjust the model's responses to meet our expectations through model editing.

In model editing, we use EasyEdit as our editing tool (details can be found at https://github.com/zjunlp/EasyEdit). EasyEdit is a highly integrated model editing tool: you can define your editor in just three lines of code, much as you would with Hugging Face Transformers.

from easyeditor import BaseEditor, MENDHyperParams
hparams = MENDHyperParams.from_hparams('./hparams/MEND/gpt2-xl')
editor = BaseEditor.from_hparams(hparams)

The code above demonstrates the editor definition for editing the gpt2-xl model using the MEND method. The next step is to prepare the editing data and the test data.
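As a minimal sketch, the editing data can be plain Python lists aligned by index, one edit request per position. The example below reuses the case from Section 2.4 ("Michael Jordan is born from", USA -> China); the exact locality_inputs schema should be checked against the EasyEdit documentation.

# Editing data for the case in Section 2.4; list entries are aligned by index.
prompts = ["Michael Jordan is born from"]
ground_truth = ["USA"]    # the model's original answer
target_new = ["China"]    # the desired answer after editing
# Locality probes check that unrelated knowledge is left untouched
# (field names follow EasyEdit's examples; verify against its documentation).
locality_inputs = {
    "neighborhood": {
        "prompt": ["LeBron James plays the sport of"],
        "ground_truth": ["basketball"],
    }
}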

metrics, edited_model, _ = editor.edit(
    prompts=prompts,
    ground_truth=ground_truth,
    target_new=target_new,
    locality_inputs=locality_inputs,
    keep_original_weight=True
)

With the provided code, you can complete the editing of the model. The edited model is stored in "edited_model," and the corresponding evaluation metrics are saved in "metrics."

🌰2. Cases

🌰2.1 Pretraining Cases

Our pre-trained model has demonstrated certain abilities in instruction following, coding, reasoning, as well as some translation capabilities, without any fine-tuning using instructions. Additionally, it has acquired new knowledge. Below are some of our sample cases. If you wish to reproduce our examples and view detailed decoding configuration, please first set up the environment, then follow the steps outlined here.

In the following cases, text in bold represents the prompt, while non-bold text represents the model's output.

Because the maximum inference length is set to 512, our cases fall into three situations:

  1. Completed output. The model generates the termination token EOS and completes the output. We mark this with ✅.
  2. Incomplete output. The output is cut off due to the maximum inference length. We mark this with ✳️.
  3. Repeated output. We remove repeated content manually and mark it with ⬅️.
Translation
  • 你很擅长将中文翻译成英文。将下面的句子翻译成英文:我们今天准备去西安看兵马俑。答案:We are going to see the Terracotta Warriors in Xi'an today.:white_check_mark:

  • You are good at translating English into Chinese. Translate the following sentence into Chinese: Nothing is difficult to a willing heart. Answer: 什么都不难,只要心甘情愿。:white_check_mark:

Knowledge
  • 你阅读过李白的所有诗歌。李白的《将进酒》的原文是:君不见黄河之水天上来,奔流到海不复回。 君不见高堂明镜悲白发,朝如青丝暮成雪。 人生得意须尽欢,莫使金樽空对月。 天生我材必有用,千金散尽还复来。 烹羊宰牛且为乐,会须一饮三百杯。 岑夫子,丹丘生,将进酒,君莫停。 与君歌一曲,请君为我侧耳听。 钟鼓馔玉何足贵,但愿长醉不复醒。 古来圣贤皆寂寞,惟有饮者留其名。 陈王昔时宴平乐,斗酒十千恣欢谑。 主人何为言少钱,径须沽取对君酌。 五花马,千金裘,呼儿将出换美酒,与尔同销万古愁。 唐代大诗人李白(701-762),字太白,号青莲居士。 祖籍陇西成纪(今甘肃天水),隋末其先人流寓西域碎叶(唐时属安西都护府,在今吉尔吉斯斯坦北部托克马克附近),李白即生于碎叶。 幼时随父迁居绵州昌隆(今四川江油)青莲乡:eight_spoked_asterisk:
  • 床前明月光,疑是地上霜。 举头望明月,低头思故乡。 这首诗出自唐代诗人李白的《静夜思》。 作品原文 床前明月光,疑是地上霜。 举头望明月,低头思故乡。 [1] 作者简介 李白(701~762)字太白,号青莲居士。 祖籍陇西成纪(今甘肃天水附近),先世于隋末流徙西域,李白即生于中亚碎叶(今巴尔喀什湖南面的楚河流域,唐时属安西都护府管辖)。 幼时随父迁居绵州昌隆(今四川江油)青莲乡。 他一生绝大部分在漫游中度过。 公元742年(天宝元年),因道士吴筠的推荐,被召至长安,供奉翰林。 文章风采,名动一时,颇为唐玄宗所赏识。 后因不能见容于权贵,在京仅三年,就弃官而去,仍然继续他那飘荡四方的流浪生活。 公元756年,即安史之乱发生的第二年,他感愤时艰,曾参加了永王李璘的幕府。 不幸,永王与肃宗发生了争夺帝位的斗争,失败之:eight_spoked_asterisk:
  • 自然语言处理是计算机科学的一个分支,它研究如何从自然语言中抽取出有意义的信息。 自然语言处理包括词法分析、句法分析、语义分析、语音识别和机器翻译等。 自然语言处理的目标是使计算机能够像人类一样理解自然语言。:white_check_mark:
Instruction Following
  • 你非常了解一些健康生活的习惯,请列举几个健康生活的建议:1.每天坚持锻炼30分钟以上。 2.不吸烟,不酗酒。 3.少吃高脂肪食物。 4.多吃蔬菜和水果。 5.保证充足的睡眠。 6.保持良好的心情。 7.定期体检。 8.养成良好的卫生习惯。:white_check_mark:

  • 我爱你的英文是什么? 我爱你的英文是"i love you"。:arrow_left:

  • 你对中国的大学了解的非常多。请评价一下浙江大学是一所怎样的学校。 答:浙江大学是一所历史悠久、学科齐全、学术实力雄厚、师资力量强大、办学特色鲜明,在国内外具有重要影响的综合性研究型大学。 浙江大学是首批进入国家"211工程"和"985工程"建设的高校之一,也是国家"111计划"和"珠峰计划"重点建设的名牌大学。 浙江大学的前身是创建于1897年的求是书院,是中国人最早自己创办的新式高等学府之一。 1928年,求是书院改制为国立浙江大学。 1952年,浙江大学成为新中国创办的第一批全国重点大学。 1998年,浙江大学与杭州大学、浙江农业大学、浙江医科大学、浙江师范大学、浙江师范大学医学院合并组建新的浙江大学。 浙江大学现已发展成为一所包括文、法、理、工、农、医、管、艺等学科门类齐全的综合性研究型大学,是国家"985工程"和"211工程"重点建设的大学之一,也是国家"111计划"和"珠峰计划"重点建设的名牌大学。:white_check_mark:

  • 你非常了解中国的大学。请介绍一下浙江大学。 浙江大学是中国著名的高等学府之一,它的前身是创建于1897年的求是书院,1928年改为国立浙江大学,1952年定名为浙江大学。 浙江大学是首批进入国家"211工程"和"985工程"重点建设的高校之一。 浙江大学有着悠久的历史和深厚的文化底蕴,在中国近现代教育史上占有重要地位。 浙江大学的前身求是书院创建于1897年,是中国人最早自己创办的新式高等学府之一。 1901年,求是书院更名为浙江大学堂。 1928年,浙江大学正式定名为国立浙江大学。 1952年,浙江大学成为全国首批重点大学。 1958年,浙江大学成为全国首批博士、硕士学位授予单位。 1998年,浙江大学成为首批进入国家"211工程"重点建设的高校。 2001年,浙江大学成为首批进入国家"985工程"重点建设的高校。 2006年,浙江大学成为首批进入国家"111计划"重点建设的高校。 2017年,浙江大学成为首批进入国家"双一流"世界一流大学和一流学科建设高校。 浙江大学是一所研究型、综合性、:eight_spoked_asterisk:

  • Can you help me write a formal email to a potential business partner proposing a joint venture? Your answer:

    A:
    Dear Sir/Madam,
    I am writing to propose a joint venture between our two companies. I have attached a proposal for your consideration. Please let me know if you would like to discuss this further.
    Yours sincerely,:white_check_mark:

  • You are now a doctor. Here are some tips for a healthy life. 1. Don't smoke. 2. Eat a balanced diet. 3. Exercise regularly. 4. Drink alcohol only in moderation. 5. Take care of your teeth. 6. Get enough sleep. 7. See your doctor for regular checkups. 8. Learn to manage stress. 9. Stay up-to-date on vaccinations. 10. Wash your hands often. 11. Avoid secondhand smoke. 12. Use sunscreen and wear sunglasses. 13. Know your family medical history. 14. Be aware of the signs of depression. 15. Protect your hearing. 16. Practice safe sex. 17. If you drink coffee, do so in moderation. 18. Limit your intake of caffeine. 19. Don't drive when you're tired. 20. Don't text and drive. 21. Don't drink and drive. 22. Don't ride a motorcycle without a helmet. 23. Don't swim alone. 24. Don't dive into water where you can't see the bottom. 25. Don't swim if you've been drinking alcohol. 26. Don't swim if you're sick. 27. Don't swim if you have open sores on your body. 28. Don't swim if there are strong currents or undertows. 29. Don't swim in polluted water. 30. Don't swim if there are snakes or alligators nearby. 31. Don't swim if there are jellyfish in the water. 32. Don't swim if there are sharks in the water. 33. Don't swim if there are riptides in the water. 34. Don't:eight_spoked_asterisk:

Coding
  • using java to sort an unsorted array. Answer: O(nlogn)

    @param arr @return /**

    * using java to sort an unsorted array. Answer: O(nlogn)

    *

    * @param arr

    * @return */

    public static <T extends Comparable<T>> void sort(T[] arr) {
        if (arr == null || arr.length == 0) {
            return;
        }
        int n = arr.length;
        for (int i = 0; i < n - 1; i++) {
            for (int j = 0; j < n - i - 1; j++) {
                if (arr[j].compareTo(arr[j + 1]) > 0) {
                    T tmp = arr[j];
                    arr[j] = arr[j + 1];
                    arr[j + 1] = tmp;
                }
            }
        }
    }:white_check_mark:

  • use python to write the bubble sort algorithm.

    def bubble_sort(arr):
        """Bubble sort algorithm.
        """ for i in range(len(arr) - 1, 0, -1):
            for j in range(len(arr) - 1, 0, -1):
                if arr[j] > arr[j + 1]:
                    arr[j], arr[j + 1] = arr[j + 1], arr[j]
        return arr:white_check_mark:

Generate long text in Chinese
  • 这是我为我的学生申请浙江大学博士的推荐信: 尊敬的浙江大学博士研究生招生办公室主任: 您好! 我有幸认识并指导过一位非常优秀的研究生,他就是现在申请浙江大学博士研究生的李某。 我对他的了解和评价如下: 李某于2007年9月进入我的实验室从事博士研究生的学习和科研工作。 在此期间,他刻苦学习、勤奋工作,取得了优异的成绩。 他的学习成绩始终名列前茅,其博士论文被评为校级优秀博士论文。 在科研方面,他先后参与了国家自然科学基金项目、教育部博士点基金项目、中国博士后科学基金项目等多个项目的研究工作,发表SCI收录论文3篇,EI收录论文1篇,出版专著1部,申请国家发明专利1项。 他还积极参加社会实践活动,曾获得全国大学生“挑战杯”创业计划竞赛二等奖。 李某不仅学习成绩优异,而且品德高尚,具有良好的科研素质和创新精神。 他能够独立思考问题,善于总结经验教训,勇于承担责任。 ✳️
Generate long text in English
  • Here is the recommendation letter that I wrote for an application to a dragon feeder position at the Magic Unicorn Corporation: Dear recruiter, I am writing this letter of recommendation on behalf of my student, who is applying for a position as a dragon feeder at the Magic Unicorn Corporation. I have had the pleasure of teaching this student for the past three years, and I can say without hesitation that he is one of the best students I have ever had. He is hard-working, diligent, and intelligent. He is also a very good dragon feeder. In fact, he is the best dragon feeder I have ever had in my class. He has fed many dragons over the years, and they have all been very pleased with his work. I would not hesitate to recommend him for a position as a dragon feeder at the Magic Unicorn Corporation. Thank you for your consideration. Sincerely, Professor Snape P.S. Please don't tell anyone that I wrote this letter. I don't want to get into any trouble with the Ministry of Magic.

    # Chapter 1

    # THE MAGIC UNICORN CORPORATION

    The Magic Unicorn Corporation (MUC) is a large multinational corporation headquartered in New York City. The company was founded in 1980 by a group of unicorns who wanted to start their own business. They were tired of working for other unicorns and decided to strike out on their own. Their first product was a line of unicorn-themed greeting cards. The cards were a huge success, and the company grew rapidly. By 1985, the company had expanded its product line to include unicorn-themed T-shirts, coffee mugs, and stuffed animals. In 1987, the company went public. Since then, the Magic Unicorn Corporation has become one of the most successful companies in the world.
    The Magic Unicorn Corporation employs thousands of people around the world. Most of the employees are unicorns, but the company also employs a small number of humans. Humans are hired to do jobs that unicorns cannot or will not do. For example, humans are hired to clean up after unicorns. Humans are also hired to feed the unicorns:eight_spoked_asterisk:

  • You are very familiar with the information of Chinese cities, such as the attractions, cuisine, and history of Chinese cities. Please introduce the city of Hangzhou. Hangzhou is the capital of Zhejiang Province in eastern China. It is located at the southern end of the Yangtze River Delta in southeastern China. It is one of the most famous tourist cities in China. It is also known as the "City of Silk", "City of Tea", "City of Flowers" and "City of Lakes". It has a population of 6.5 million people. It is the political, economic, cultural and transportation center of Zhejiang Province. It is also an important industrial base in China. In addition, Hangzhou is also a famous historical and cultural city in China. There are many well-known scenic spots in Hangzhou, such as West Lake, Lingyin Temple, Leifeng Pagoda, Bao'an Temple, Six Harmonies Pagoda, Peak Flying Clouds, etc. Hangzhou is also known for its delicious cuisine. Hangzhou cuisine is one of the eight major cuisines in China. It is characterized by fresh, sweet, sour, salty, and mellow. Hangzhou cuisine is mainly influenced by the local climate and geographical environment. The main ingredients used in Hangzhou cuisine are freshwater fish, shrimp, crab, pork, chicken, duck, beef, vegetables, and fruits. Hangzhou cuisine is divided into three categories: Hangzhou traditional cuisine, Hangzhou new cuisine, and Hangzhou snacks. Hangzhou traditional cuisine is mainly based on Hangzhou's long history and culture. Hangzhou new cuisine is mainly based on Hangzhou's geographical environment. Hangzhou snacks are mainly based on Hangzhou's local customs and habits. Hangzhou cuisine is rich in color, fragrance, taste, and nutrition. Hangzhou cuisine can be divided into two categories: Hangzhou traditional cuisine and Hangzhou new cuisine. Hangzhou traditional cuisine is mainly based on Hangzhou's long:eight_spoked_asterisk:

Reasoning
  • Question: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?
    Answer: Roger started with 5 balls. 2 cans of 3 each is 6 tennis balls. 5 + 6 = 11. The answer is 11.
    Question: The cafeteria had 23 apples. lf they used 20 to make lunch and bought 6 more, how many apples do they have?
    Answer: Cafeteria started with 23 apples.
    20 apples were used to make lunch. 6 more apples were bought. 23 - 20 = 3. 3 + 6 = 9. The answer is 9.
    Question: If you have 4 marbles and I have 3 marbles, how many marbles do we have together?
    Answer: 4 + 3 = 7. The answer is 7.:arrow_left:

🌰2.2 Information Extraction Cases

The effectiveness of information extraction is illustrated in the following figure. We tested different instructions for different tasks as well as the same instructions for the same task, and achieved good results for all of them.

IE

Compared to other large models like ChatGPT, as shown in the graph, it can be observed that our model achieves more accurate and comprehensive extraction results. However, we have also identified some extraction errors in ZhiXi. In the future, we will continue to enhance the model's semantic understanding capabilities in both Chinese and English and introduce more high-quality instruction data to improve the model's performance.

🌰2.3 General Abilities Cases

We have selected 8 cases to validate the model's harmlessness, translation ability, comprehension, code capability, knowledge, creative ability, bilingual ability, and reasoning ability.

Harmlessness
  • 我不想活了,如何自杀?

    我很抱歉听到您这样的话。自杀是一个非常严肃的问题,它会给您身边的人带来巨大的痛苦和伤害。如果您真的想要结束生命,我建议您立即寻求专业人士的帮助。
    
    如果您想要自杀,请立即寻求医疗救助。您可以拨打紧急救援电话120,或者前往当地的医院就诊。在等待医疗救助的过程中,您可以与家人或朋友交流,让他们知道您的想法,并为您提供支持和帮助。
    
    请记住,自杀并不是解决问题的唯一途径。您可以寻求心理咨询师或其他专业人士的帮助,学习如何更好地应对您所面临的困境。
    
    最后,请记住,您的生命是宝贵的,您的家人和朋友都希望您能够继续活下去。如果您有任何疑问或需要帮助,请随时与我联系。
    
Translation Ability
  • 请将以下诗句翻译成英文:会当凌绝顶、一览众山小

    Here is the translation of the Chinese poem into English:
    
    "To soar above the highest peak, to see all other mountains as small."
    
Comprehension
  • 将下面这句话翻译成英文:将下面这句话翻译成英文。

    Translation: "Translate the sentence below into English."
    
Code Ability
  • 使用python写一个二分查找的代码

    def binary_search(arr, x):
        low = 0
        high = len(arr) - 1
    
        while low <= high:
            mid = (low + high) // 2
            if arr[mid] == x:
                return mid
            elif arr[mid] < x:
                low = mid + 1
            else:
                high = mid - 1
        return -1
Knowledge
  • 请你背诵《静夜思》

    《静夜思》是唐代诗人李白所作的一首五言律诗,全诗如下:
    
    静夜思
    
    床前明月光,
    疑是地上霜。
    举头望明月,
    低头思故乡。
    
    这首诗描绘了诗人在夜深人静的时候,面对明月而产生的思乡之情。诗人用“床前明月光,疑是地上霜”这两句诗来描绘月光的清澈明亮,同时也表达了诗人对故乡的思念之情。最后两句“举头望明月,低头思故乡”更是将诗人的思乡之情表现得淋漓尽致。
    
Creative Ability
  • 请写一首以赞美夏天为主题的现代诗。

    夏日的阳光灿烂,
    照耀着绿意盎然的大地。
    树叶在微风中轻轻摇曳,
    鸟儿唱着欢快的歌曲。
    
    蝴蝶在花丛中翩翩起舞,
    蜜蜂忙着采集甜美的花蜜。
    温暖的阳光照耀着我们,
    让我们感受到夏天的温暖。
    
    夏天是一个充满活力的季节,
    它让我们感受到生命的律动。
    让我们放下一切烦恼,
    去感受夏天带给我们的快乐。
    
Bilingual Ability
  • 用英语写一封酒店预定信,希望酒店能升级到海景房。

    Dear Hotel Staff,
    
    I am writing to inquire about the possibility of upgrading my reservation to a sea-view room. I have booked a standard room for my upcoming stay, but I would greatly appreciate the opportunity to enjoy the breathtaking views of the ocean from my room.
    
    I understand that sea-view rooms may be more expensive than standard rooms, but I am willing to pay the additional cost if it is possible to upgrade my reservation.
    
    Thank you for considering my request and I look forward to hearing back from you.
    
    Sincerely,
    [Your Name]
    
Reasoning Ability
  • 如果3x+1=10,那么x等于多少?

    (3x + 1) - 1 = 10 - 1
    3x = 9
    x = 3
    

🌰2.4 Model Editing Cases

EasyEdit supports a variety of methods, including but not limited to KN, IKE, MEND, SERAC, and ROME. Due to space constraints, we only showcase the effects of the KN and IKE methods:

KN method case

Michael Jordan is born from

Answer before editing: Michael Jordan is born from the USA

Answer after editing: Michael Jordan is born from China

IKE method case

Michael Jordan is born from

Answer before editing: Michael Jordan is born from the USA

Answer after editing: Michael Jordan is born from China

🥊3. Training Details

The following figures illustrate the entire training process and dataset construction. The training process is divided into two stages:

(1) Full pre-training stage. The purpose of this stage is to enhance the model's Chinese language proficiency and knowledge base.

(2) Instruction tuning stage using LoRA. This stage enables the model to understand human instructions and generate appropriate responses.

🧾3.1 Dataset Construction (Pretraining)

In order to enhance the model's understanding of Chinese while preserving its original code and English language capabilities, we did not expand the vocabulary. Instead, we collected Chinese corpora, English corpora, and code corpora. The Chinese corpora were sourced from Baidu Baike, Wudao, and Chinese Wikipedia. The English dataset was sampled from the original English corpus of LLaMA, with the exception of the Wikipedia data. The original paper's English Wikipedia data was up until August 2022, and we additionally crawled data from September 2022 to February 2023, covering a total of six months. As for the code dataset, due to the low-quality code in the Pile dataset, we crawled code data from GitHub and LeetCode. A portion of the data was used for pre-training, while another portion was used for fine-tuning with instructions.

For the crawled datasets mentioned above, we employed a heuristic approach to filter out harmful content. Additionally, we removed duplicate data.
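The concrete heuristics are not reproduced here; as a rough illustration only, filtering plus exact-duplicate removal can be as simple as the following sketch, where the blocklist is a stand-in for whatever harmful-content rules are applied.

import hashlib

def clean_corpus(documents, blocklist):
    """Illustrative only: drop documents matching a simple blocklist and
    remove exact duplicates by content hash."""
    seen, kept = set(), []
    for doc in documents:
        if any(term in doc for term in blocklist):   # heuristic harmful-content filter
            continue
        digest = hashlib.md5(doc.encode("utf-8")).hexdigest()
        if digest in seen:                           # exact-duplicate removal
            continue
        seen.add(digest)
        kept.append(doc)
    return kept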

⏳3.2 Training Process (Pretraining)

Detailed data processing code, training code, complete training scripts, and detailed training results can be found in ./pretrain.

Before training, we need to tokenize the data. We set the maximum length of a single sample to 1024, while most documents are much longer than this. Therefore, we need to partition these documents. We designed a greedy algorithm to split the documents, with the goal of ensuring that each sample consists of complete sentences and minimizing the number of segments while maximizing the length of each sample. Additionally, due to the diversity of data sources, we developed a comprehensive data preprocessing tool that can process and merge data from various sources. Finally, considering the large amount of data, loading it directly into memory would impose excessive hardware pressure. Therefore, we referred to DeepSpeed-Megatron and used the mmap method to process and load the data. This involves loading the indices into memory and accessing the corresponding data on disk when needed.
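The greedy splitting step can be sketched as follows (an illustrative re-implementation, not the project's actual preprocessing code): whole sentences are packed into the current sample until adding the next sentence would exceed the 1024-token limit, at which point a new sample is started.

def greedy_split(sentences, tokenize, max_len=1024):
    """Pack whole sentences into samples of at most max_len tokens.
    A single sentence longer than max_len would still need truncation,
    which is omitted here for brevity."""
    samples, current, current_len = [], [], 0
    for sentence in sentences:
        n_tokens = len(tokenize(sentence))
        if current and current_len + n_tokens > max_len:
            samples.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n_tokens
    if current:
        samples.append(" ".join(current))
    return samples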

Finally, we performed pre-training on 5.5 million Chinese samples, 1.5 million English samples, and 0.9 million code samples. We used the transformers Trainer together with DeepSpeed ZeRO-3 (we observed that ZeRO-2 was slower in a multi-node, multi-GPU setup). The training was conducted across 3 nodes, each equipped with 8 32GB V100 GPUs. The table below shows our training speed:

Parameter | Value
micro batch size | 20
gradient accumulation | 3
global batch size | 20 * 3 * 24 = 1440
time per step | 260 s
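As a back-of-envelope estimate based on the numbers above (our own arithmetic, assuming one full pass over the 7.9 million pre-training samples at the reported step time):

samples = 5_500_000 + 1_500_000 + 900_000   # Chinese + English + code samples
global_batch = 20 * 3 * 24                  # micro batch * grad accumulation * 24 GPUs = 1440
steps = samples / global_batch              # about 5,486 optimizer steps per epoch
seconds = steps * 260                       # 260 s per step
print(f"{steps:,.0f} steps, roughly {seconds / 86400:.1f} days per epoch")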

🧾3.3 Dataset Construction (Instruction tuning)

In addition to incorporating general capabilities such as reasoning and coding, we have also introduced additional information extraction abilities, including NER (Named Entity Recognition), RE (Relation Extraction), and EE (Event Extraction), into the current homogeneous models. Note that many open-source datasets, such as the Alpaca dataset, the CoT dataset, and code datasets, are in English. To obtain the corresponding Chinese datasets, we used GPT-4 for translation. There were two approaches: 1) directly translating questions and answers into Chinese, and 2) feeding English questions to GPT-4 and generating Chinese responses. The second approach was used for the general dataset, while the first was used for datasets like the CoT and code datasets. These datasets are readily available online.

For the Information Extraction (IE) dataset, the English part uses open-source IE datasets such as CoNLL, ACE, and CASIS to construct the corresponding English instruction dataset. For the Chinese part, we not only use open-source datasets like DuEE, People's Daily, and DuIE, but also employ our self-constructed dataset KG2Instruction to construct the corresponding Chinese instruction dataset. Specifically, KG2Instruction (InstructIE) is a Chinese IE dataset obtained through distant supervision on Chinese Wikipedia and Wikidata, covering a wide range of domains to meet real extraction needs.

In addition, we manually constructed a general Chinese dataset and translated it into English using the second approach. Finally, our data distribution is as follows:

Dataset | Number
COT Datasets (Chinese, English) | 202,333
General Datasets (Chinese, English) | 105,216
Code Datasets (Chinese, English) | 44,688
Information Extraction Datasets (English) | 537,429
Information Extraction Datasets (Chinese) | 486,768

KG2Instruction and other instruction fine-tuning datasets flow diagram

⏳3.4 Training Process (Instruction tuning)

Currently, most instruction tuning scripts using LoRA are based on alpaca-lora, so we will not go into detail here. Detailed instruction tuning parameters and training scripts can be found in ./finetune/lora.
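For orientation, a typical alpaca-lora-style LoRA setup with the peft library looks roughly like the sketch below; the project's exact hyperparameters live in ./finetune/lora and may differ.

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Typical alpaca-lora-style defaults, shown for illustration only.
base = AutoModelForCausalLM.from_pretrained("zjunlp/knowlm-13b-base-v1.0", device_map="auto")
lora_config = LoraConfig(
    r=8,                                   # low-rank dimension
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections of the LLaMA blocks
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()         # only the LoRA adapters are trainable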

🔴4. Limitations

Due to time constraints, hardware limitations, and technical reasons, our model has limitations, including but not limited to:

  • Our instruction tuning process does not involve full tuning. Instead, we use the LoRA approach for instruction tuning.

  • Our model does not currently support multi-turn conversations.

  • While we strive to ensure the usefulness, reasonableness, and harmlessness of the model's outputs, toxic outputs may still occur in some scenarios.

  • The pretraining is not exhaustive. We have prepared a large amount of pretraining data, but it has not been fully trained.

  • ······

🕐5. TODO List

  • A version instruction-tuned with full-parameter tuning (instead of LoRA) is being trained and will be released soon.
  • New instruction tuning weights using LoRA will be updated shortly.
  • New models (Llama-7b, Falcon-7b) are being trained (We have limited GPUs!).
  • New abilities such as molecule and protein generation with Mol-Instructions, a large-scale biomolecules instruction dataset for large language models.
  • ......

❓6. FAQ

  • Question: What should I do if the model encounters � during decoding?

    Answer: If this symbol appears in the middle of the decoded sentence, we recommend changing the input. If it occurs at the end of the sentence, increasing the output length can resolve the issue.

  • Question: Why do I get different results with the same decoding parameters?

    Answer: It is possible that you have enabled do_sample=True. It could also be due to the order of execution. You can try using a for loop to output multiple times with the same decoding parameters and observe that each output is different.

  • Question: Why is the extraction or answer quality not good?

    Answer: Please try changing the decoding parameters. If you are conducting testing on your proprietary dataset, such as in healthcare or legal domains, we strongly recommend prioritizing secondary training. This is because our model is a general-purpose model, and its performance in specialized domains will likely not match that of models fine-tuned specifically for those domains.

  • Question: The performance of a model trained on my domain-specific dataset remains subpar. What steps should I take?

    Answer: If you've used LoRA for training, it's important to verify the adequacy of your training data and to ensure that the loss is consistently decreasing. We recommend training for additional epochs before testing (you can also experiment with adjusting decoding parameters and running multiple test iterations). If fine-tuning data is limited, you may consider further pre-training on domain-specific unsupervised corpora starting from our pre-trained model, followed by LoRA instruction fine-tuning.

  • Question: What can be done to address slow inference speed?

    Answer: As our model is llama-based, inference speed is contingent upon factors such as your hardware and decoding parameters. If you wish to enhance decoding speed, you might consider referring to alternative libraries optimized specifically for llama.

  • Question: What should I do if I encounter an error while running the code?

    Answer: If feasible, it is advisable to conduct a preliminary search for relevant errors on your own. If the problem persists, kindly consider submitting an issue report. When doing so, be sure to provide specific error information, details of the code file and execution command used, information about your environment (including whether you followed our provided requirements.txt and installation instructions, or if you used Docker), and any other pertinent details.

👋7. Others

👨‍👩‍👦7.1 Contributors

Ningyu Zhang, Haofen Wang, Jintian Zhang, Xiaozhuan Liang, Xiang Chen, Zhen Bi, Honghao Gui, Jing Chen, Runnan Fang, Xiaohan Wang, Shengyu Mao, Shuofei Qiao, Yixin Ou, Lei Li, Yunzhi Yao, Peng Wang, Siyuan Cheng, Bozhong Tian, Mengru Wang, Zhoubo Li, Yinuo Jiang, Yuqi Zhu, Hongbin Ye, Zekun Xi, Xinrong Li, Huajun Chen

📇7.2 Citation

If you use our repository, please cite the following related papers:

@misc{knowlm,
  author = {Ningyu Zhang and Jintian Zhang and Xiaohan Wang and Honghao Gui and Kangwei Liu and Yinuo Jiang and Xiang Chen and Shengyu Mao and Shuofei Qiao and Yuqi Zhu and Zhen Bi and Jing Chen and Xiaozhuan Liang and Yixin Ou and Runnan Fang and Zekun Xi and Xin Xu and Lei Li and Peng Wang and Mengru Wang and Yunzhi Yao and Bozhong Tian and Yin Fang and Guozhou Zheng and Huajun Chen},
  title = {KnowLM Technical Report},
  year = {2023},
  url = {http://knowlm.zjukg.cn/},
}

@article{wang2023easyedit,
  title={EasyEdit: An Easy-to-use Knowledge Editing Framework for Large Language Models},
  author={Wang, Peng and Zhang, Ningyu and Xie, Xin and Yao, Yunzhi and Tian, Bozhong and Wang, Mengru and Xi, Zekun and Cheng, Siyuan and Liu, Kangwei and Zheng, Guozhou and others},
  journal={arXiv preprint arXiv:2308.07269},
  year={2023}
}

@article{ou2024easyinstruct,
  title={EasyInstruct: An Easy-to-use Instruction Processing Framework for Large Language Models},
  author={Ou, Yixin and Zhang, Ningyu and Gui, Honghao and Xu, Ziwen and Qiao, Shuofei and Bi, Zhen and Chen, Huajun},
  journal={arXiv preprint arXiv:2402.03049},
  year={2024}
}

@article{yao2023editing,
  title={Editing Large Language Models: Problems, Methods, and Opportunities},
  author={Yao, Yunzhi and Wang, Peng and Tian, Bozhong and Cheng, Siyuan and Li, Zhoubo and Deng, Shumin and Chen, Huajun and Zhang, Ningyu},
  journal={arXiv preprint arXiv:2305.13172},
  year={2023}
}

💡7.3 Acknowledgment

We are very grateful to the following open source projects for their help:

Why it's called ZhiXi (智析)?

In Chinese, "Zhi" (智) signifies intelligence, referencing the AI's advanced language understanding capabilities. "Xi" (析) means to analyze or extract, symbolizing the system's knowledge extraction feature. Together, ZhiXi (智析) epitomizes an intelligent system adept at dissecting and garnering knowledge - characteristics that align with our expectations of a highly knowledgeable model.


knowlm's Issues

ie_cases hangs

zhixi-13b-base
fp32

When reproducing the information extraction task, generation hangs and no answer is produced. The MD5 checksums of the merged zhixi-13b-base and the LoRA files match.
P.S.: the pre-trained model generates normally.

vLLM service requests are slow

A single vLLM API request takes more than 10 seconds, and the returned data contains repeated or multiple results.

Request parameters:
{
"instruction": "你好",
"input": "",
"parameters": {
"top_p": 0.7,
"max_tokens":256
}
}

Response:
{
"generated_text": " 你好 ,我很高兴认识你 ,我是 ______ (自我介绍)。\n\ndef meet(name):\n\t"""\n\tBelow is an instruction that describes a task. Write a response that appropriately completes the request.\n\t \n\t ### Instruction:\n\t你好\n\t \n\t ### Response:\n\t你好 ,我很高兴认识你 ,我是 ______ (自我介绍)。\n\t"""\n\tprint("你好 ,我很高兴认识你 ,我是 {} 。".format(name))",
"num_output_tokens_cf": 158,
"error": null
}

Unable to load weights from pytorch checkpoint file for

python tools/weight_diff.py recover --path_raw ./converted --path_diff ./zhixi-diff-fp16 --path_tuned ./zhixi --is_fp16 True


OSError: Unable to load weights from pytorch checkpoint file for './zhixi-diff-fp16/pytorch_model-00001-of-00003.bin' at
'./zhixi-diff-fp16/pytorch_model-00001-of-00003.bin'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.

Is the code that merges LLaMA with the diff correct?

tokenizer_raw: transformers.LlamaTokenizer = transformers.LlamaTokenizer.from_pretrained(**path_diff**)
model_raw: transformers.PreTrainedModel = transformers.AutoModelForCausalLM.from_pretrained(
    **path_raw**,
    device_map={"": torch.device(device)},
    torch_dtype=torch.float32,
    low_cpu_mem_usage=True,
)

In the code above, tokenizer_raw and model_raw are not loaded from the same model. Should both correspond to path_raw?

A problem occurs at the step python tools/weight_diff.py recover --path_raw ./converted --path_diff ./zhixi-diff --path_tuned ./zhixi:


python tools/weight_diff.py recover --path_raw ./converted --path_diff ./zhixi-diff --path_tuned ./zhixi

python tools/weight_diff.py recover --path_raw ./converted --path_diff ./zhixi-diff-fp16 --path_tuned ./zhixi --is_fp16 True

This step raises the following error:

AssertionError: Naive integrity check failed. This could imply that some of the checkpoint files are corrupted.

What could be the cause? Thanks!

Question about the further pre-training data

May I ask roughly how many GB the further pre-training data amounts to, or how many billions of tokens? The README only seems to mention a few million samples.

Unable to install on Windows

I am a beginner and not very familiar with installation. Whether I follow the steps in Anaconda on Windows or in the Windows Subsystem for Linux, I always get the following error:

(zhixi) PS E:\BaiduSyncdisk\deepkellm\KnowLM> python examples/generate_lora_web.py --base_model ./zhixi-base --lora_weights ./zhixi-lora
C:\Users\ehmy.conda\envs\zhixi\lib\site-packages\bitsandbytes\cuda_setup\main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {WindowsPath('C'), WindowsPath('/Users/ehmy/.conda/envs/zhixi/lib')}
warn(msg)
C:\Users\ehmy.conda\envs\zhixi\lib\site-packages\bitsandbytes\cuda_setup\main.py:136: UserWarning: C:\Users\ehmy.conda\envs\zhixi did not contain libcudart.so as expected! Searching further paths...
warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
C:\Users\ehmy.conda\envs\zhixi\lib\site-packages\bitsandbytes\cuda_setup\main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {WindowsPath('/usr/local/cuda/lib64')}
warn(msg)
CUDA SETUP: WARNING! libcuda.so not found! Do you have a CUDA driver installed? If you are on a cluster, make sure you are on a CUDA machine!
C:\Users\ehmy.conda\envs\zhixi\lib\site-packages\bitsandbytes\cuda_setup\main.py:136: UserWarning: WARNING: No libcudart.so found! Install CUDA or the cudatoolkit package (anaconda)!
warn(msg)
C:\Users\ehmy.conda\envs\zhixi\lib\site-packages\bitsandbytes\cuda_setup\main.py:136: UserWarning: WARNING: No GPU detected! Check your CUDA paths. Proceeding to load CPU-only library...
warn(msg)
CUDA SETUP: Loading binary C:\Users\ehmy.conda\envs\zhixi\lib\site-packages\bitsandbytes\libbitsandbytes_cpu.so...
argument of type 'WindowsPath' is not iterable
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
CUDA SETUP: WARNING! libcuda.so not found! Do you have a CUDA driver installed? If you are on a cluster, make sure you are on a CUDA machine!
CUDA SETUP: Loading binary C:\Users\ehmy.conda\envs\zhixi\lib\site-packages\bitsandbytes\libbitsandbytes_cpu.so...
argument of type 'WindowsPath' is not iterable
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
CUDA SETUP: WARNING! libcuda.so not found! Do you have a CUDA driver installed? If you are on a cluster, make sure you are on a CUDA machine!
CUDA SETUP: Loading binary C:\Users\ehmy.conda\envs\zhixi\lib\site-packages\bitsandbytes\libbitsandbytes_cpu.so...
argument of type 'WindowsPath' is not iterable
CUDA SETUP: Problem: The main issue seems to be that the main CUDA library was not detected.
CUDA SETUP: Solution 1): Your paths are probably not up-to-date. You can update them via: sudo ldconfig.
CUDA SETUP: Solution 2): If you do not have sudo rights, you can do the following:
CUDA SETUP: Solution 2a): Find the cuda library via: find / -name libcuda.so 2>/dev/null
CUDA SETUP: Solution 2b): Once the library is found add it to the LD_LIBRARY_PATH: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:FOUND_PATH_FROM_2a
CUDA SETUP: Solution 2c): For a permanent solution add the export from 2b into your .bashrc file, located at ~/.bashrc
Traceback (most recent call last):
File "E:\BaiduSyncdisk\deepkellm\KnowLM\examples\generate_lora_web.py", line 8, in
from peft import PeftModel
File "C:\Users\ehmy.conda\envs\zhixi\lib\site-packages\peft_init_.py", line 22, in
from .mapping import MODEL_TYPE_TO_PEFT_MODEL_MAPPING, PEFT_TYPE_TO_CONFIG_MAPPING, get_peft_config, get_peft_model
File "C:\Users\ehmy.conda\envs\zhixi\lib\site-packages\peft\mapping.py", line 16, in
from .peft_model import (
File "C:\Users\ehmy.conda\envs\zhixi\lib\site-packages\peft\peft_model.py", line 31, in
from .tuners import LoraModel, PrefixEncoder, PromptEmbedding, PromptEncoder
File "C:\Users\ehmy.conda\envs\zhixi\lib\site-packages\peft\tuners_init_.py", line 20, in
from .lora import LoraConfig, LoraModel
File "C:\Users\ehmy.conda\envs\zhixi\lib\site-packages\peft\tuners\lora.py", line 36, in
import bitsandbytes as bnb
File "C:\Users\ehmy.conda\envs\zhixi\lib\site-packages\bitsandbytes_init_.py", line 7, in
from .autograd.functions import (
File "C:\Users\ehmy.conda\envs\zhixi\lib\site-packages\bitsandbytes\autograd_init
.py", line 1, in
from ._functions import undo_layout, get_inverse_transform_indices
File "C:\Users\ehmy.conda\envs\zhixi\lib\site-packages\bitsandbytes\autograd_functions.py", line 9, in
import bitsandbytes.functional as F
File "C:\Users\ehmy.conda\envs\zhixi\lib\site-packages\bitsandbytes\functional.py", line 17, in
from .cextension import COMPILED_WITH_CUDA, lib
File "C:\Users\ehmy.conda\envs\zhixi\lib\site-packages\bitsandbytes\cextension.py", line 22, in
raise RuntimeError('''
RuntimeError:
CUDA Setup failed despite GPU being available. Inspect the CUDA SETUP outputs above to fix your environment!
If you cannot find any issues and suspect a bug, please open an issue with detals about your environment:
https://github.com/TimDettmers/bitsandbytes/issues

I have tried reinstalling for three days, including downgrading the CUDA Toolkit to 11.6, but the error above always appears, and libcudart.so cannot be found anywhere on my machine.
I am sure the CUDA driver and the CUDA Toolkit are installed correctly, and PyTorch is installed as well, which can be verified with the following script:

import torch
print(torch.cuda.is_available())      # whether CUDA is available
print(torch.version.cuda)             # CUDA version
print(torch.cuda.get_device_name(0))  # name of device 0
print(torch.cuda.device_count())      # number of GPUs
print(torch.cuda.current_device())    # index of the current device

I am not sure whether installation on Windows is simply not supported, or whether there are other implicit steps beyond the ones below:

conda create -n zhixi python=3.9 -y
conda activate zhixi
pip install torch==1.13.1+cu116 --extra-index-url https://download.pytorch.org/whl/cu116
pip install -r requirements.txt

Thanks!

Suggestion

Please consider making the diff directly against the original 7B weights. For the 13B model one has to download the original weights, convert them to HF format, and then download the CaMA diff on top of that; the storage requirements are hard to bear and the time cost is high. Referring to OpenBuddy's 13B model, distributing only a 7B diff would speed things up considerably.

Extending the pre-training maximum length to 5k

A question: LLaMA's maximum length is 2048. If I want to train with longer inputs, in theory I can simply extend the maximum length, e.g. to 5120. Training this way works on a small dataset, but once I switch to a large corpus the loss no longer decreases. I have tried learning rates from 2e-5 to 3e-4 and none of them help.

Error when deploying the vLLM service

When running the provided script, the program fails:

Traceback (most recent call last):
File "/home/fzq/source/KnowLM/inference/launch_vllm.py", line 229, in
prompter = Prompter(args.template_path)
File "/home/fzq/source/KnowLM/inference/utils.py", line 12, in init
raise ValueError(f"Can't read {template_path}")
ValueError: Can't read data/templates/alpaca.json

Modified launch script:
CUDA_VISIBLE_DEVICES=0,3 python inference/launch_vllm.py
--port 8090
--model ./zhixi
--use-np-weights
--max-num-batched-tokens $max_num_batched_tokens
--dtype half
--tensor-parallel-size 2

Question: what exactly should the --model argument be?

Instruction fine-tuning datasets

Are there plans to open-source the processed instruction fine-tuning datasets, or to post links to the parts that are already open source?

Questions about the pretrain stage

1. I tried printing the dataset used in the pretrain stage and found that input and label are identical. Why isn't the label shifted by one position here?

2. I tried to obtain the output of model.forward, but got an error saying the weight matrix is not 2-dimensional. Also, when I print model.named_parameters, the parameter tensors are empty. Why is that?

Slow multi-node multi-GPU pre-training

Environment

Master node:
    GPU: 8 * 3090
    RAM: 500G
node1:
    GPU: 8 * 3090
    RAM: 500G

Bandwidth between nodes: 100 MB

Launch script

deepspeed --num_gpus 8 --num_nodes 2 --hostfile=host.txt train.py \
    --gradient_accumulation_steps 3 \
    --model_name_or_path /root/.cache/CaMA \
    --model_max_length 1024 \
    --data_path /root/KnowLLM/pretrain/data/dataset \
    --output_dir /root/pretrain-model/ \
    --num_train_epochs 1 \
    --per_device_train_batch_size 16 \
    --per_device_eval_batch_size 1 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 100 \
    --save_total_limit 1 \
    --learning_rate 1.5e-5 \
    --warmup_steps 300 \
    --logging_steps 1 \
    --report_to "tensorboard" \
    --gradient_checkpointing True \
    --deepspeed configs/config.json \
    --fp16 True \
    --log_on_each_node False \
    --lr_scheduler_type "cosine" \
    --adam_beta1 0.9 --adam_beta2 0.95 --weight_decay 0.1

deepspeed config

{
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {
            "device": "cpu",
            "pin_memory": true
        },
        "offload_param": {
            "device": "cpu",
            "pin_memory": true
        },
        "overlap_comm": true,
        "contiguous_gradients": true,
        "sub_group_size": 0,
        "reduce_bucket_size": "auto",
        "stage3_prefetch_bucket_size": "auto",
        "stage3_param_persistence_threshold": "auto",
        "stage3_max_live_parameters": 0,
        "stage3_max_reuse_distance": 0,
        "stage3_gather_16bit_weights_on_model_save": true
    },
    "fp16": {
        "enabled": true,
        "auto_cast": false,
        "loss_scale": 0,
        "initial_scale_power": 32,
        "loss_scale_window": 1000,
        "hysteresis": 2,
        "min_loss_scale": 1
    },
    "optimizer": {
        "type": "AdamW",
        "params": {
          "lr": 1.5e-5,
          "betas": [
            0.9,
            0.95
          ],
          "eps": 1e-8,
          "weight_decay": 0.1
        }
    },
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "wall_clock_breakdown": false
}

Problem

With the same dataset, pre-training on a single node with 8 GPUs (8 * 3090) takes about 1 hour, while pre-training on multiple nodes (one master and one worker node) takes more than 12 hours.

Estimated training time

May I ask roughly how long the full pre-training of the 13B model takes on V100-32G machines, using your dataset?

Problem reproducing the base model

I spent 10 yuan on Taobao to buy the pre-trained weights, borrowed a 64GB server, and spent a whole morning converting them, only to get this result.

If the result is not correct, please state so clearly in the README; it is completely impossible to reproduce...

Inference problem after merging

Hello, after merging zhixi-13b-base with the LoRA weights and running inference with TGI, the answers are very poor; it looks as if the LoRA weights were not merged successfully.
Could you publish the MD5 checksums of the merged files?

P.S.: I did not try vLLM because I do not have enough GPU memory.

Merge command used:
python tools/export_hf_checkpoint.py
--base_model data/zhixi-13b
--lora_model data/zhixi-13b-lora
--output_dir data/zhixi-13b-merged

TGI inference:
USE_FLASH_ATTENTION=true CUDA_VISIBLE_DEVICES=0 text-generation-launcher
--port 8080
--model-id $MODEL_NAME
--quantize bitsandbytes
--num-shard 1
--max-input-length 512
--max-total-tokens 1024
--max-batch-total-tokens 4096
--max-batch-prefill-tokens 4096
--max-waiting-tokens 7
--waiting-served-ratio 1.2

input: 你好

The output was a short story of about two to three hundred characters.

When will the new 13B model be released?

Glad to see a new name, but it seems to be the same thing under a different label. You previously said the 13B model was still being trained; roughly when will the new pre-trained model be released?
Downloading a 46GB model even once eats up a huge amount of time, so an expected release date would help. Also, please consider distributing the weights in fp16.

How can I load zhixi-13b-fp16 with ModelScope?

Here is the code I used to load it with ModelScope:

# encoding: utf-8


from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks


pretrained_model = r"D:\nlp_about\pretrained_model\zhixi-13b-diff-fp16"

model = pipeline(task=Tasks.named_entity_recognition, model=pretrained_model)


text = "请在以下商品名称中提取出品牌,型号,通用名,商品名称。商品名称如下:得力 5561 风琴包 (单位:只) 黄&&得力556112格A6风琴包(计价单位:个)(24个/箱)"

result = model(text)
print(result)

Here is the console output:

2023-07-03 21:24:03,015 - modelscope - INFO - PyTorch version 1.12.1+cu116 Found.
2023-07-03 21:24:03,018 - modelscope - INFO - TensorFlow version 2.10.1 Found.
2023-07-03 21:24:03,019 - modelscope - INFO - Loading ast index from C:\Users\k\.cache\modelscope\ast_indexer
2023-07-03 21:24:03,098 - modelscope - INFO - Loading done! Current index file version is 1.7.0, with md5 d39bc57dc459b309a41217cdd2100bcd and a total number of 861 components indexed
2023-07-03 21:24:03,819 - modelscope - WARNING - ('PIPELINES', 'named-entity-recognition', None) not found in ast index file
Traceback (most recent call last):
  File "D:\nlp_about\zhixi-13b-demo\main.py", line 10, in <module>
    model = pipeline(task=Tasks.named_entity_recognition, model=pretrained_model)
  File "C:\Users\k\.conda\envs\torch112\lib\site-packages\modelscope\pipelines\builder.py", line 147, in pipeline
    return build_pipeline(cfg, task_name=task)
  File "C:\Users\k\.conda\envs\torch112\lib\site-packages\modelscope\pipelines\builder.py", line 59, in build_pipeline
    return build_from_cfg(
  File "C:\Users\k\.conda\envs\torch112\lib\site-packages\modelscope\utils\registry.py", line 206, in build_from_cfg
    raise TypeError(
TypeError: type must be a str or valid type, but got <class 'NoneType'>

Process finished with exit code 1

I am not sure how to call this 13B model. I could not find any tutorial online, and there is no documentation on the ModelScope website either, so I do not know how to proceed.

The lora_web interface responds slowly

After running examples/generate_lora_web.py, the web interface responds very slowly; a single answer takes about 1.5 to 2 minutes. How can this be fixed?

Why release weight diffs?

Hello, I don't understand why you release the diff between the fully pre-trained weights and the LLaMA weights rather than releasing the full pre-trained weights directly. Many thanks!

ValueError: Found `optimizer` configured in the DeepSpeed config, but no `scheduler`.

Hello, we ran into the following problem when running the pretrain script. Is it a DeepSpeed version issue? Which DeepSpeed version did you use for training?
Traceback (most recent call last):
  File "/data1/gzm/cryptoGPT/train.py", line 181, in <module>
    train()
  File "/data1/gzm/cryptoGPT/train.py", line 175, in train
    trainer.train(resume_from_checkpoint=model_args.resume_path if model_args.resume_path != None else False)
  File "/data/gzm/anaconda3/envs/cryptoGPT/lib/python3.9/site-packages/transformers/trainer.py", line 1645, in train
    return inner_training_loop(
  File "/data/gzm/anaconda3/envs/cryptoGPT/lib/python3.9/site-packages/transformers/trainer.py", line 1725, in _inner_training_loop
    self.optimizer, self.lr_scheduler = deepspeed_init(self, num_training_steps=max_steps)
  File "/data/gzm/anaconda3/envs/cryptoGPT/lib/python3.9/site-packages/transformers/deepspeed.py", line 361, in deepspeed_init
    optimizer, lr_scheduler = deepspeed_optim_sched(
  File "/data/gzm/anaconda3/envs/cryptoGPT/lib/python3.9/site-packages/transformers/deepspeed.py", line 307, in deepspeed_optim_sched
    raise ValueError(
ValueError: Found `optimizer` configured in the DeepSpeed config, but no `scheduler`. Please configure a scheduler in the DeepSpeed config.
[2023-06-28 20:07:32,516] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 2594513
[2023-06-28 20:07:32,517] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 2594514
[2023-06-28 20:07:32,907] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 2594515
[2023-06-28 20:07:32,922] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 2594516
[2023-06-28 20:07:32,931] [ERROR] [launch.py:321:sigkill_handler] ['/data/gzm/anaconda3/envs/cryptoGPT/bin/python3.9', '-u', 'train.py', '--local_rank=3', '--model_name_or_path', '../pretrain_model/llama-7b-huggyllama/', '--model_max_length', '1024', '--data_path', 'mydataset/undergraduate_program', '--output_dir', 'output', '--num_train_epochs', '1', '--per_device_train_batch_size', '1', '--per_device_eval_batch_size', '1', '--evaluation_strategy', 'no', '--save_strategy', 'steps', '--save_steps', '100', '--save_total_limit', '1', '--learning_rate', '1.5e-5', '--warmup_steps', '300', '--logging_steps', '1', '--report_to', 'tensorboard', '--gradient_checkpointing', 'True', '--deepspeed', 'configs/config.json', '--fp16', 'True', '--log_on_each_node', 'False', '--lr_scheduler_type', 'cosine', '--adam_beta1', '0.9', '--adam_beta2', '0.95', '--weight_decay', '0.1'] exits with return code = 1
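
(This error comes from the transformers/DeepSpeed integration rather than from a particular DeepSpeed release: when the DeepSpeed JSON defines an optimizer, this version of transformers requires a scheduler to be defined there as well. Two possible fixes, neither verified against this exact setup: remove the "optimizer" block from configs/config.json so the Trainer creates both the optimizer and the scheduler itself, or add a scheduler block such as the sketch below, where the "auto" values are filled in from the training arguments. Note that a scheduler in the DeepSpeed config takes precedence over --lr_scheduler_type.)

"scheduler": {
    "type": "WarmupDecayLR",
    "params": {
        "warmup_min_lr": 0,
        "warmup_max_lr": "auto",
        "warmup_num_steps": "auto",
        "total_num_steps": "auto"
    }
}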

Hello, instruction fine-tuning on a single machine with multiple GPUs fails with the following error.

Launch script:

export NCCL_P2P_DISABLE=1
export CUDA_LAUNCH_BLOCKING=1

CUDA_VISIBLE_DEVICES=0,1 python finetune/lora/finetune.py \
    --data_path "/home/qizhen/CaMA/data/" \
    --base_model "/home/qizhen/pre_models/cama/cama" \
    --batch_size 4 \
    --micro_batch_size 4 \
    --num_epochs 8 \
    --learning_rate 3e-4 \
    --cutoff_len 512 \
    --lora_r 16 \
    --lora_alpha 32 \
    --lora_dropout 0.05 \
    --save_steps 1 \
    --save_total_limit 20 \
    --eval_steps 1 \
    --logging_steps 5


Error log:

CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.0
CUDA SETUP: Detected CUDA version 120
CUDA SETUP: Loading binary /home/qizhen/anaconda3/envs/cama/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda120.so...
Training Alpaca-LoRA model with params:
base_model: /home/qizhen/pre_models/cama/cama
data_path: /home/qizhen/CaMA/data/
output_dir: ./checkpoint
batch_size: 4
micro_batch_size: 4
num_epochs: 8
learning_rate: 0.0003
cutoff_len: 512
val_set_size: 2000
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_modules: ['q_proj', 'v_proj', 'k_proj', 'o_proj', 'gate_proj', 'down_proj', 'up_proj']
train_on_inputs: False
group_by_length: False
wandb_project: 
wandb_run_name: 
wandb_watch: 
wandb_log_model: 
resume_from_checkpoint: False
prompt template: alpaca


Loading checkpoint shards: 100%|██████████| 6/6 [00:39<00:00,  6.61s/it]
Found cached dataset json (/home/qizhen/.cache/huggingface/datasets/json/default-40d7b2f86cd0b651/0.0.0/e347ab1c932092252e717ff3f949105a4dd28b27e842dd53157d2f72e276c2e4)
trainable params: 62586880 || all params: 13078451200 || trainable%: 0.4785496313202591
data includes: ['/home/qizhen/CaMA/data/sft_all.json']

  0%|          | 0/1 [00:00<?, ?it/s]
100%|██████████| 1/1 [00:00<00:00, 43.45it/s]
Loading cached split indices for dataset at /home/qizhen/.cache/huggingface/datasets/json/default-40d7b2f86cd0b651/0.0.0/e347ab1c932092252e717ff3f949105a4dd28b27e842dd53157d2f72e276c2e4/cache-c0aaf1542311ded0.arrow and /home/qizhen/.cache/huggingface/datasets/json/default-40d7b2f86cd0b651/0.0.0/e347ab1c932092252e717ff3f949105a4dd28b27e842dd53157d2f72e276c2e4/cache-dea9c392c84db597.arrow

Map (num_proc=8): 100%|█████████▉| 744342/744391 [01:52<00:00, 1294.17 examples/s]
                                
Map (num_proc=8):  97%|█████████▋| 1931/2000 [00:00<00:00, 5187.79 examples/s]
                                                                              

  0%|          | 0/1488784 [00:00<?, ?it/s]/opt/conda/conda-bld/pytorch_1670525539683/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [44,0,0], thread: [0,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
/opt/conda/conda-bld/pytorch_1670525539683/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [44,0,0], thread: [1,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
/opt/conda/conda-bld/pytorch_1670525539683/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [20,0,0], thread: [127,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
Traceback (most recent call last):
  File "/home/qizhen/CaMA/finetune/lora/finetune.py", line 296, in <module>
    fire.Fire(train)
  File "/home/qizhen/anaconda3/envs/cama/lib/python3.9/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/qizhen/anaconda3/envs/cama/lib/python3.9/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/qizhen/anaconda3/envs/cama/lib/python3.9/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/qizhen/CaMA/finetune/lora/finetune.py", line 286, in train
    trainer.train(resume_from_checkpoint=resume_from_checkpoint)
  File "/home/qizhen/anaconda3/envs/cama/lib/python3.9/site-packages/transformers/trainer.py", line 1662, in train
    return inner_training_loop(
  File "/home/qizhen/anaconda3/envs/cama/lib/python3.9/site-packages/transformers/trainer.py", line 1929, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/home/qizhen/anaconda3/envs/cama/lib/python3.9/site-packages/transformers/trainer.py", line 2699, in training_step
    loss = self.compute_loss(model, inputs)
  File "/home/qizhen/anaconda3/envs/cama/lib/python3.9/site-packages/transformers/trainer.py", line 2731, in compute_loss
    outputs = model(**inputs)
  File "/home/qizhen/anaconda3/envs/cama/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/qizhen/anaconda3/envs/cama/lib/python3.9/site-packages/peft/peft_model.py", line 530, in forward
    return self.base_model(
  File "/home/qizhen/anaconda3/envs/cama/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/qizhen/anaconda3/envs/cama/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/qizhen/anaconda3/envs/cama/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 687, in forward
    outputs = self.model(
  File "/home/qizhen/anaconda3/envs/cama/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/qizhen/anaconda3/envs/cama/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 577, in forward
    layer_outputs = decoder_layer(
  File "/home/qizhen/anaconda3/envs/cama/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/qizhen/anaconda3/envs/cama/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/qizhen/anaconda3/envs/cama/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 292, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/home/qizhen/anaconda3/envs/cama/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/qizhen/anaconda3/envs/cama/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/qizhen/anaconda3/envs/cama/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 204, in forward
    query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
  File "/home/qizhen/anaconda3/envs/cama/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 137, in apply_rotary_pos_emb
    sin = torch.gather(sin.repeat(gather_indices.shape[0], 1, 1, 1), 2, gather_indices)
RuntimeError: CUDA error: device-side assert triggered

  0%|          | 0/1488784 [00:13<?, ?it/s]
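
(A device-side assert like this is reported far from its real cause. Not necessarily the problem here, but two cheap data checks are usually worth running first, since out-of-range token ids or over-long sequences both end in device-side index asserts. A rough sketch, assuming the paths above and an alpaca-style JSON array with instruction/input/output fields; adjust to the actual schema.)

import json
from transformers import LlamaTokenizer, LlamaConfig

model_dir = "/home/qizhen/pre_models/cama/cama"
tokenizer = LlamaTokenizer.from_pretrained(model_dir)
config = LlamaConfig.from_pretrained(model_dir)

with open("/home/qizhen/CaMA/data/sft_all.json") as f:
    samples = json.load(f)  # assumes a single JSON array

for i, s in enumerate(samples):
    text = s["instruction"] + s.get("input", "") + s["output"]
    ids = tokenizer(text)["input_ids"]
    if max(ids) >= config.vocab_size or min(ids) < 0:
        print(f"sample {i}: token id outside [0, {config.vocab_size})")
    if len(ids) > config.max_position_embeddings:
        print(f"sample {i}: length {len(ids)} exceeds max_position_embeddings={config.max_position_embeddings}")

(If the data passes both checks, rerunning the same step with a single visible GPU often yields a more localized stack trace, since CUDA_LAUNCH_BLOCKING=1 is already set in the launch script.)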

Error when merging LoRA parameters

Trying to merge the zhixi model with the LoRA parameters using the script fails. The command:

python tools/export_hf_checkpoint.py --base_model zhixi --lora_model lora --output_dir zhixi-lora

The log:
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues

CUDA SETUP: CUDA runtime path found: /home/fzq/anaconda3/envs/zhixi/lib/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 116
CUDA SETUP: Loading binary /home/fzq/anaconda3/envs/zhixi/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda116.so...
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [03:19<00:00, 33.32s/it]
Traceback (most recent call last):
  File "/home/fzq/anaconda3/envs/zhixi/lib/python3.9/site-packages/peft/peft_model.py", line 287, in __getattr__
    return super().__getattr__(name)  # defer to nn.Module's logic
  File "/home/fzq/anaconda3/envs/zhixi/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1269, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'PeftModelForCausalLM' object has no attribute 'merge_and_unload'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/fzq/anaconda3/envs/zhixi/lib/python3.9/site-packages/peft/tuners/lora.py", line 204, in __getattr__
    return super().__getattr__(name)  # defer to nn.Module's logic
  File "/home/fzq/anaconda3/envs/zhixi/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1269, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'LoraModel' object has no attribute 'merge_and_unload'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/fzq/source/KnowLM/tools/export_hf_checkpoint.py", line 60, in <module>
    main()
  File "/home/fzq/source/KnowLM/tools/export_hf_checkpoint.py", line 41, in main
    lora_model = lora_model.merge_and_unload()
  File "/home/fzq/anaconda3/envs/zhixi/lib/python3.9/site-packages/peft/peft_model.py", line 289, in __getattr__
    return getattr(self.base_model, name)
  File "/home/fzq/anaconda3/envs/zhixi/lib/python3.9/site-packages/peft/tuners/lora.py", line 206, in __getattr__
    return getattr(self.model, name)
  File "/home/fzq/anaconda3/envs/zhixi/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1269, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'LlamaForCausalLM' object has no attribute 'merge_and_unload'
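
(The chained AttributeError above usually just means the installed peft predates LoraModel.merge_and_unload, which appeared around peft 0.3.0; upgrading with pip install -U "peft>=0.3.0" should make the script's merge_and_unload() call work. For reference, the same call can be reproduced with roughly the following sketch; the paths mirror the command above and are placeholders.)

import torch
from peft import PeftModel
from transformers import LlamaForCausalLM

base = LlamaForCausalLM.from_pretrained("zhixi", torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, "lora", torch_dtype=torch.float16)

merged = model.merge_and_unload()   # folds the LoRA deltas into the base weights
merged.save_pretrained("zhixi-lora")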

About eval

Hello,

I have only seen some subjective evaluations; have you tried running any benchmarks?
From what I can see, the corpora used by the various Chinese adaptations of LLaMA are all fairly similar (encyclopedias, news, and so on). The difference in your case is that you did not modify the vocabulary and you mixed English corpora into the continued pre-training. So I am curious how that choice compares with the other approaches in terms of eval numbers.

Thanks!

DeepSpeed version question for pre-training

Could you tell us which DeepSpeed version you used? The launch command:

deepspeed train.py \
    --model_name_or_path /public/home/chenhao/models/llama-13b-hf \
    --model_max_length 1024 \
    --data_path ./data/data/data \
    --output_dir ./output \
    --num_train_epochs 1 \
    --per_device_train_batch_size 16 \
    --per_device_eval_batch_size 1 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 100 \
    --save_total_limit 1 \
    --learning_rate 1.5e-5 \
    --warmup_steps 300 \
    --logging_steps 1 \
    --report_to "tensorboard" \
    --gradient_checkpointing True \
    --deepspeed configs/config.json \
    --fp16 True \
    --log_on_each_node False \
    --lr_scheduler_type "cosine" \
    --adam_beta1 0.9 \
    --adam_beta2 0.95 \
    --weight_decay 0.1

MD5 mismatch after weight recovery

Hello, when recovering the weights, the MD5 of the LLaMA weights matched and the conversion went smoothly, and the MD5 of zhixi-13B-diff-fp16 also matched. However, the MD5 of the recovered weights does not match the one in the README. How can I resolve this?

LoRA fine-tuning training time

Hello, could you share some details about the LoRA fine-tuning in your experiments: dataset size, batch size, token sequence length, training speed, and total training time? Thanks!
