
chatdoctor's People

Contributors

kent0n-li, saharmor

chatdoctor's Issues

request for the pretrained weights

Hello, I have filled out the form several times, but I have not received the weight files. Is something missing here? (I have checked my spam folder.) My email is [email protected]; could you please send me the pre-trained weights? Thanks a lot.

Training from scratch

Hi!

Can we train the model from scratch? Do you have plans to release training code that does not load pre-trained weights?
Thanks

Can the model refuse to answer when it cannot guarantee the result?

Thanks for your work. Could you add a check when producing the output, so that the model refuses to answer if its confidence is low or it has never seen a similar question in the training set?

Could it be used for Chinese?

As the title says: LLMs are usually very good at English but not as good at Chinese.
How is the performance for Chinese?

What's the code for "how to utilize conversation demonstrations synthesized via ChatGPT"?

It seems that there is no code for "utilizing conversation demonstrations synthesized via ChatGPT to fine-tune the LLaMA model".
In the code, I see that you use HealthCareMagic-200k.json, not the "5k generated conversations between patients and physicians from ChatGPT [GenMedGPT-5k]".
How do you utilize conversation demonstrations synthesized via ChatGPT? Can you show us the code for this?

model not found in pretrained section

Error:

(base) hemang@hemang-HP-Pavilion-g6-Notebook-PC:~/Documents/GitHub/ChatDoctor$ python3.11 chat.py
2023-03-30 16:16:25.135057: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-03-30 16:16:26.061195: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Loading ./pretrained/...
/home/hemang/.local/lib/python3.11/site-packages/torch/cuda/__init__.py:546: UserWarning: Can't initialize NVML
  warnings.warn("Can't initialize NVML")
gpu_count 0
Loading checkpoint shards:   0%|                                                    | 0/3 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/hemang/.local/lib/python3.11/site-packages/transformers/modeling_utils.py", line 415, in load_state_dict
    return torch.load(checkpoint_file, map_location="cpu")
  File "/home/hemang/.local/lib/python3.11/site-packages/torch/serialization.py", line 791, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "/home/hemang/.local/lib/python3.11/site-packages/torch/serialization.py", line 271, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "/home/hemang/.local/lib/python3.11/site-packages/torch/serialization.py", line 252, in __init__
    super().__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: './pretrained/pytorch_model-00001-of-00003.bin'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/hemang/Documents/GitHub/ChatDoctor/chat.py", line 43, in <module>
    load_model("./pretrained/")
  File "/home/hemang/Documents/GitHub/ChatDoctor/chat.py", line 28, in load_model
    model = transformers.LLaMAForCausalLM.from_pretrained(
  File "/home/hemang/.local/lib/python3.11/site-packages/transformers/modeling_utils.py", line 2630, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/home/hemang/.local/lib/python3.11/site-packages/transformers/modeling_utils.py", line 2939, in _load_pretrained_model
    state_dict = load_state_dict(shard_file)
  File "/home/hemang/.local/lib/python3.11/site-packages/transformers/modeling_utils.py", line 418, in load_state_dict
    with open(checkpoint_file) as f:
FileNotFoundError: [Errno 2] No such file or directory: './pretrained/pytorch_model-00001-of-00003.bin'

pretrained/pytorch_model-00001-of-00003.bin FileNotFoundError

Hi, thank you for this model!
I am trying to build this app and getting this error message:

File "/home/ChatDoctor-main/env_doct/lib/python3.11/site-packages/transformers/modeling_utils.py", line 2939, in _load_pretrained_model
state_dict = load_state_dict(shard_file)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ChatDoctor-main/env_doct/lib/python3.11/site-packages/transformers/modeling_utils.py", line 418, in load_state_dict
with open(checkpoint_file) as f:
^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: './pretrained/pytorch_model-00001-of-00003.bin'

I have already filled out this form:
https://forms.office.com/Pages/ResponsePage.aspx?id=lYZBnaxxMUy1ssGWyOw8ij06Cb8qnDJKvu2bVpV1-ANUMDIzWlU0QTUxN0YySFROQk9HMVU0N0xJNC4u

Can you please share the file 'pytorch_model-00001-of-00003.bin'?

Thanks in advance!
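The traceback in the previous issue hints at the usual cause: cloning a weights repository without git-lfs leaves each .bin shard as a small text pointer beginning with "version https://git-lfs.github.com/spec/v1" rather than the real file. A quick diagnostic sketch (the directory path is illustrative; sharded transformers checkpoints ship a pytorch_model.bin.index.json listing every shard):

import json, os

CKPT_DIR = "./pretrained/"  # illustrative path

with open(os.path.join(CKPT_DIR, "pytorch_model.bin.index.json")) as f:
    index = json.load(f)

for shard in sorted(set(index["weight_map"].values())):
    path = os.path.join(CKPT_DIR, shard)
    if not os.path.exists(path):
        print(f"missing shard: {shard}")
    elif os.path.getsize(path) < 1024:
        # A file this small is usually an un-downloaded git-lfs pointer,
        # not real weights; fetch them with `git lfs install && git lfs pull`.
        print(f"git-lfs pointer instead of weights: {shard}")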

Why do the instructions start with "If you are a doctor"?

Thank you for your interesting work.
In the chatdoctor5k.json and chatdoctor200k.json I see that the instructions start with "If you are a doctor".
I am curious why the instructions do not start with "You are a doctor".
Is this a common way to perform Alpaca instruction fine-tuning?
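For context, Alpaca-style fine-tuning wraps each example in a fixed template, and the dataset's "If you are a doctor, ..." text simply fills the instruction slot. A sketch using the standard Stanford Alpaca template (the exact template this repo uses may differ; the patient description below is invented):

PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:"
)

prompt = PROMPT_TEMPLATE.format(
    instruction="If you are a doctor, please answer the medical questions "
                "based on the patient's description.",
    input="I have had a persistent dry cough for two weeks.",  # invented example
)

Whether "If you are a doctor" trains better than "You are a doctor" is an empirical question; the conditional phrasing is not a requirement of Alpaca-style tuning.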

pretrained/pytorch_model-00001-of-00003.bin


Greetings, and thank you for this model!

Please, I can't find the file 'pytorch_model-00001-of-00003.bin' and don't know what to do to obtain and install it.

If you can guide me, please do. Thanks in advance!

DDP expects same model across all ranks, but Rank 0 has 128 params, while rank 1 has inconsistent 0 params.

Hi, I ran into an error saying the ranks have different models. Details follow.


./train_lora.sh
WARNING:torch.distributed.run: Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.


===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues

bin /root/anaconda3/envs/chat-doctor/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cuda118.so
CUDA SETUP: CUDA runtime path found: /root/anaconda3/envs/chat-doctor/lib/libcudart.so.11.0
CUDA SETUP: Highest compute capability among GPUs detected: 8.0
CUDA SETUP: Detected CUDA version 118
CUDA SETUP: Loading binary /root/anaconda3/envs/chat-doctor/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cuda118.so...
bin /root/anaconda3/envs/chat-doctor/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cuda118.so




Finetuning model with params:
base_model: /disk2/data/xk/retr-llm/files/model/llama-7b/
data_path: /disk2/data/xk/retr-llm/files/datasets/mental_health_chatbot_dataset.json
output_dir: ./lora-chatDoctor_bs192_Mbs24_ep3_len512_lr3e-5_fromAlpacaLora
batch_size: 192
micro_batch_size: 24
num_epochs: 3
learning_rate: 3e-05
cutoff_len: 256
val_set_size: 120
use_gradient_checkpointing: False
lora_r: 8
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules: None
bottleneck_size: 256
non_linearity: tanh
adapter_dropout: 0.0
use_parallel_adapter: False
use_adapterp: False
train_on_inputs: True
scaling: 1.0
adapter_name: lora
target_modules: None
group_by_length: False
wandb_project:
wandb_run_name:
wandb_watch:
wandb_log_model:
resume_from_checkpoint: None
Loading checkpoint shards: 100%|##########| 33/33 [00:12<00:00, 2.58it/s]
trainable params: 4194304 || all params: 6742609920 || trainable%: 0.06220594176090199
Map: 100%|##########| 52/52 [00:00<00:00, 687.22 examples/s]
Map: 100%|##########| 120/120 [00:00<00:00, 765.56 examples/s]
[E ProcessGroupNCCL.cpp:828] [Rank 0] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=1, OpType=ALLGATHER, Timeout(ms)=1800000) ran for 1807082 milliseconds before timing out.
Traceback (most recent call last):
File "train_lora.py", line 353, in
fire.Fire(train)
File "/root/anaconda3/envs/chat-doctor/lib/python3.8/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/root/anaconda3/envs/chat-doctor/lib/python3.8/site-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/root/anaconda3/envs/chat-doctor/lib/python3.8/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "train_lora.py", line 299, in train
trainer.train(resume_from_checkpoint=resume_from_checkpoint)
File "/root/anaconda3/envs/chat-doctor/lib/python3.8/site-packages/transformers/trainer.py", line 1662, in train
return inner_training_loop(
File "/root/anaconda3/envs/chat-doctor/lib/python3.8/site-packages/transformers/trainer.py", line 1749, in _inner_training_loop
model = self._wrap_model(self.model_wrapped)
File "/root/anaconda3/envs/chat-doctor/lib/python3.8/site-packages/transformers/trainer.py", line 1569, in _wrap_model
model = nn.parallel.DistributedDataParallel(
File "/root/anaconda3/envs/chat-doctor/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 674, in init
_verify_param_shape_across_processes(self.process_group, parameters)
File "/root/anaconda3/envs/chat-doctor/lib/python3.8/site-packages/torch/distributed/utils.py", line 118, in _verify_param_shape_across_processes
return dist._verify_params_across_processes(process_group, tensors, logger)
RuntimeError: DDP expects same model across all ranks, but Rank 0 has 128 params, while rank 1 has inconsistent 0 params.
[E ProcessGroupNCCL.cpp:455] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data.
[E ProcessGroupNCCL.cpp:460] To avoid data inconsistency, we are taking the entire process down.
terminate called after throwing an instance of 'std::runtime_error'
what(): [Rank 0] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=1, OpType=ALLGATHER, Timeout(ms)=1800000) ran for 1807082 milliseconds before timing out.
[E ProcessGroupNCCL.cpp:828] [Rank 3] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=1, OpType=ALLGATHER, Timeout(ms)=1800000) ran for 1807414 milliseconds before timing out.
Traceback (most recent call last):
File "train_lora.py", line 353, in
fire.Fire(train)
File "/root/anaconda3/envs/chat-doctor/lib/python3.8/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/root/anaconda3/envs/chat-doctor/lib/python3.8/site-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/root/anaconda3/envs/chat-doctor/lib/python3.8/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "train_lora.py", line 299, in train
trainer.train(resume_from_checkpoint=resume_from_checkpoint)
File "/root/anaconda3/envs/chat-doctor/lib/python3.8/site-packages/transformers/trainer.py", line 1662, in train
return inner_training_loop(
File "/root/anaconda3/envs/chat-doctor/lib/python3.8/site-packages/transformers/trainer.py", line 1749, in _inner_training_loop
model = self._wrap_model(self.model_wrapped)
File "/root/anaconda3/envs/chat-doctor/lib/python3.8/site-packages/transformers/trainer.py", line 1569, in _wrap_model
model = nn.parallel.DistributedDataParallel(
File "/root/anaconda3/envs/chat-doctor/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 674, in init
_verify_param_shape_across_processes(self.process_group, parameters)
File "/root/anaconda3/envs/chat-doctor/lib/python3.8/site-packages/torch/distributed/utils.py", line 118, in _verify_param_shape_across_processes
return dist._verify_params_across_processes(process_group, tensors, logger)
RuntimeError: DDP expects same model across all ranks, but Rank 3 has 128 params, while rank 0 has inconsistent 0 params.
[E ProcessGroupNCCL.cpp:455] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data.
[E ProcessGroupNCCL.cpp:460] To avoid data inconsistency, we are taking the entire process down.
terminate called after throwing an instance of 'std::runtime_error'
what(): [Rank 3] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=1, OpType=ALLGATHER, Timeout(ms)=1800000) ran for 1807414 milliseconds before timing out.
[E ProcessGroupNCCL.cpp:828] [Rank 6] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=1, OpType=ALLGATHER, Timeout(ms)=1800000) ran for 1807716 milliseconds before timing out.


My environment:
GPU: 8 x A100 80GB
PyTorch version: 2.0.1

How can I solve this bug? Thanks!
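A mismatch like this usually means the model object was built differently on different ranks (for example, adapters attached on one rank only, or 8-bit/device_map loading interacting badly with DDP). A small diagnostic sketch, independent of this repo, that prints what each rank sees just before the Trainer wraps the model:

import torch
import torch.distributed as dist

def report_trainable_params(model: torch.nn.Module) -> None:
    # Every rank must report the same counts, or DDP init fails as above.
    rank = dist.get_rank() if dist.is_initialized() else 0
    n_tensors = sum(1 for p in model.parameters() if p.requires_grad)
    n_elements = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(f"rank {rank}: {n_tensors} trainable tensors, {n_elements} elements")

Call it right after the PEFT wrapping on every rank; if the counts differ across ranks, the bug is in model construction, not in DDP itself.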

ImportError: cannot import name 'openai_object' from 'openai'

Traceback (most recent call last):
File "/content/ChatDoctor/train.py", line 25, in <module>
import utils
File "/content/ChatDoctor/utils.py", line 15, in <module>
from openai import openai_object
ImportError: cannot import name 'openai_object' from 'openai' (/usr/local/lib/python3.10/dist-packages/openai/__init__.py)
(the same traceback is repeated by the three other ranks)
[2024-02-09 11:04:21,257] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 14644) of binary: /usr/bin/python3
Traceback (most recent call last):
File "/usr/local/bin/torchrun", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
return f(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 806, in main
run(args)
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 797, in run
elastic_launch(
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 134, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

/content/ChatDoctor/train.py FAILED

Failures:
[1]:
time : 2024-02-09_11:04:21
host : 10ca0fca3068
rank : 1 (local_rank: 1)
exitcode : 1 (pid: 14645)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[2]:
time : 2024-02-09_11:04:21
host : 10ca0fca3068
rank : 2 (local_rank: 2)
exitcode : 1 (pid: 14646)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[3]:
time : 2024-02-09_11:04:21
host : 10ca0fca3068
rank : 3 (local_rank: 3)
exitcode : 1 (pid: 14647)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

Root Cause (first observed failure):
[0]:
time : 2024-02-09_11:04:21
host : 10ca0fca3068
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 14644)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
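The openai_object module exists only in the pre-1.0 openai SDK and was removed in 1.0, so this error indicates a newer SDK than utils.py expects. Pinning the old SDK (pip install "openai==0.28") is the simplest fix; alternatively, a hedged import guard:

# utils.py imports `openai_object`, which only exists in openai < 1.0.
try:
    from openai import openai_object  # openai < 1.0
except ImportError:
    openai_object = None  # openai >= 1.0 removed this module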

Dataset License

Hello,
Is there a specific license for the associated datasets?

How do you create format_dataset.csv?

Hi @Kent0n-Li, in the paper you mentioned that you used MedlinePlus as the database to create format_dataset.csv. However, if I type a name from format_dataset.csv (e.g., "Panic disorder") directly into the MedlinePlus system, there are multiple results, and the symptoms may be incomplete. I am wondering how you created this file. Are there any scripts, or did you select the entries manually? Can you give an example of how you used MedlinePlus to derive Symptom, reason, TestsAndProcedures, and commonMedications in format_dataset.csv (e.g., for Panic disorder)?

About Instruction data generation

The first step in building a physician-patient conversation dataset is to collect the disease database that serves as the gold standard. Therefore, we collected and organized a database of diseases, which contains about 700 diseases with their relative symptoms, medical tests, and recommended medications. To train high-quality conversation models on an academic budget, we input each message from the disease database separately as a prompt into the ChatGPT API to automatically generate instruction data. It is worth noting that our prompts to the ChatGPT API contain the gold standard of diseases and symptoms, and drugs, so our fine-tuned ChatDoctor is not only able to achieve ChatGPT's conversational fluency but also higher diagnostic accuracy compared to ChatGPT. We finally collected 5K doctor-patient conversation instructions and named it InstructorDoctor-5K.

I'm confused by this process. Can anyone explain it more precisely?
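A hedged reconstruction of the described pipeline (not the authors' actual script): for each record in the disease database, build a prompt containing the gold-standard symptoms, tests, and medications, and ask the ChatGPT API to write a patient-physician conversation consistent with that record. The field names, prompt wording, and the pre-1.0 openai SDK style below are all assumptions.

import openai  # pre-1.0 SDK style

openai.api_key = "..."  # set your own key

def synthesize_conversation(disease: dict) -> str:
    # Field names are hypothetical; the real disease-database schema may differ.
    prompt = (
        "Write a realistic conversation between a patient and a doctor about "
        "the disease below. The doctor's diagnosis, recommended tests, and "
        "medications must match this record exactly.\n"
        f"Disease: {disease['name']}\n"
        f"Symptoms: {disease['symptoms']}\n"
        f"Medical tests: {disease['tests']}\n"
        f"Recommended medications: {disease['medications']}\n"
    )
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return response["choices"][0]["message"]["content"]

Because the gold-standard facts are embedded in every prompt, the synthesized conversations stay grounded in the disease database rather than in ChatGPT's own guesses.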

Why did you choose to train in two steps?

Thanks for your sharing, your attempt is very interesting and valuable.

However, I have some questions about the training process.

I notice that ChatDoctor is first trained on the 52K instruction-following data provided by Stanford Alpaca, and then fine-tuned on your specific data.

Why not fine-tune the model on a mixture of the two datasets?

What is the insight behind this two-step fine-tuning process?

Have you ever tried training with the two datasets mixed together?

Colab runs out of RAM?

I cannot load the model because Colab Free does not have enough RAM. Does anyone have a solution for this?
Thanks
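One workaround (a sketch, with an illustrative path): load in half precision with low_cpu_mem_usage so transformers never materializes a full fp32 copy in host RAM; 8-bit loading shrinks the footprint further but needs bitsandbytes and a CUDA GPU.

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "./pretrained/",          # illustrative path
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,   # avoids building a full fp32 copy in host RAM
    # load_in_8bit=True,      # optional: needs bitsandbytes + a CUDA GPU
    device_map="auto",
)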

generate duplicate text

I trained a ChatDoctor using LoRA following your instructions and evaluated it on iCliniq-10k without providing external documents (e.g., Wikipedia or a medical database). The problem is that the model generates duplicate content, as in the example below:

Patient: 'Hi doctor,My wife missed her periods and we doubt she is pregnant. Our first kid is just 1 year old now and it was a Cesarean. We are not in a position to have a second child now. Her LMP was 22nd of last month. I read that Mifepristone can induce abortion. Is it a safe drug? Do we have to consult doctor for taking this drug? What are all the possible side effects? Is that drug 100 percent reliable? Is there any other options? Whether this drug is available without doctor's prescription? Thank you.'
Predictions: 1. Mifepristone is a safe drug. 2. Mifepristone is a safe drug. 3. Mifepristone is a safe drug. 4. Mifepristone is a safe drug. 5. Mifepristone is a safe drug. 6. Mifepristone is a safe drug. 7. Mifepristone is a safe drug. 8. Mifepristone is a safe drug. 9. Mifepristone is a safe drug. 10. Mifepristone is a safe drug. 11. Mifepristone is a safe drug. 12. Mifepristone is a safe drug. 13. Mifepristone is a safe drug. 14. Mifepristone is a safe drug. 15. Mifepristone is a safe drug. 16. Mifepristone is a safe drug. 17. Mifepristone is a safe drug. 18. Mifepristone is a safe drug. 19. Mifepristone is a safe drug.
ChatDoctor: 1. No, mifepristone is not available without doctors prescription. 2. It is a safe medication, but as with any medications there may be some side effects which include nausea, vomiting, headache, diarrhea etc. These symptoms usually subside within 48 hours after ingestion.3. Yes, it is 100% reliable if taken under supervision by a trained gynecologist.4. There are many other options for termination of early pregnancy like medical methods (like MTP kit) or surgical methods (like suction evacuation). But these need to be done under proper guidance of a qualified Gynecologist.'

In my generated texts, this phenomenon is pretty common. Did you encounter this problem before? How did you solve it? Thanks!
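Repetition like this can often be damped at decoding time before touching the training data. A sketch assuming model and inputs as in chat.py; the penalty values are illustrative starting points, not tuned:

outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    repetition_penalty=1.2,   # down-weights tokens that were already generated
    no_repeat_ngram_size=3,   # forbids repeating any 3-gram verbatim
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)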

Question about output in HealthCareMagic-100k.json

Hi, I noticed that a lot of "Chat Doctor" appears in the outputs of HealthCareMagic-100k.json. For example:
""Hi thanks for contacting Chat Doctor ... Your brother have both hepatitis b and c positive...."
"Hi and welcome to Chat Doctor."
"Hi and welcome to Chat Doctor. Thank you for your query. I am Chat Doctor.."

I wonder if that was caused by some post-processing. Is there any data without these words?
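The phrase looks like the source site's brand name substituted during anonymization of HealthCareMagic, so it is straightforward to strip in post-processing. A sketch assuming the repo's Alpaca-style JSON with an "output" field:

import json
import re

with open("HealthCareMagic-100k.json") as f:
    data = json.load(f)

# Remove the substituted brand name; the regex tolerates optional spacing.
pattern = re.compile(r"\s*Chat\s?Doctor\.?", flags=re.IGNORECASE)
for example in data:
    example["output"] = pattern.sub("", example["output"]).strip()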

What prompt was used to generate the dataset?

Hello,

I am curious to know which prompt you used to generate the dataset. I couldn't find it in utils.py.

Also, you might want to remove your OpenAI API key from utils.py.

difference between the paper and code

It seems that in your paper the training dataset is 'InstructorDoctor-205k', but in this repo, judging from the training command, the dataset is 'HealthCareMagic-100k.json'.
In the paper, training was fine-tuning on InstructorDoctor-205k (seemingly a single step?), but this repo says: "Our model was firstly be fine-tuned by Stanford Alpaca's data to have some basic conversational capabilities." Does that mean the repo contains an updated method?
The training time also differs: paper, 18 hours; repo, 30 minutes.
Can you help provide some clarification?
Thanks!

Aborted (core dumped)

Hello, I am a college student reading your paper. My server GPU has only 48 GB of memory; does that mean I don't have enough GPU memory to run inference?

wandb error

I got an error when running train.py:
wandb: ERROR api_key not configured (no-tty). call wandb.login(key=[your_api_key])
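Older transformers versions report to every installed logging integration by default, so having wandb installed without an API key aborts the run. If you do not want logging, disable it before the Trainer is constructed (or pass report_to="none" in TrainingArguments); otherwise run wandb login once.

import os

# Must be set before transformers constructs the Trainer.
os.environ["WANDB_MODE"] = "disabled"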

ImportError: cannot import name 'BottleneckConfig' from 'peft'

File "/content/ChatDoctor/train_lora.py", line 17, in
from peft import ( # noqa: E402
ImportError: cannot import name 'BottleneckConfig' from 'peft' (/usr/local/lib/python3.10/dist-packages/peft/init.py)
Traceback (most recent call last):
File "/content/ChatDoctor/train_lora.py", line 17, in
from peft import ( # noqa: E402
ImportError: cannot import name 'BottleneckConfig' from 'peft' (/usr/local/lib/python3.10/dist-packages/peft/init.py)
Traceback (most recent call last):
File "/content/ChatDoctor/train_lora.py", line 17, in
from peft import ( # noqa: E402
ImportError: cannot import name 'BottleneckConfig' from 'peft' (/usr/local/lib/python3.10/dist-packages/peft/init.py)
Traceback (most recent call last):
File "/content/ChatDoctor/train_lora.py", line 17, in
from peft import ( # noqa: E402
ImportError: cannot import name 'BottleneckConfig' from 'peft' (/usr/local/lib/python3.10/dist-packages/peft/init.py)
Traceback (most recent call last):
File "/content/ChatDoctor/train_lora.py", line 17, in
from peft import ( # noqa: E402Traceback (most recent call last):

ImportError: cannot import name 'BottleneckConfig' from 'peft' (/usr/local/lib/python3.10/dist-packages/peft/init.py)
File "/content/ChatDoctor/train_lora.py", line 17, in
from peft import ( # noqa: E402
ImportError: cannot import name 'BottleneckConfig' from 'peft' (/usr/local/lib/python3.10/dist-packages/peft/init.py)
[2024-02-09 11:16:17,756] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 17760) of binary: /usr/bin/python3
Traceback (most recent call last):
File "/usr/local/bin/torchrun", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
return f(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 806, in main
run(args)
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 797, in run
elastic_launch(
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 134, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

/content/ChatDoctor/train_lora.py FAILED

Failures:
[1]:
time : 2024-02-09_11:16:17
host : 10ca0fca3068
rank : 1 (local_rank: 1)
exitcode : 1 (pid: 17761)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[2]:
time : 2024-02-09_11:16:17
host : 10ca0fca3068
rank : 2 (local_rank: 2)
exitcode : 1 (pid: 17762)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[3]:
time : 2024-02-09_11:16:17
host : 10ca0fca3068
rank : 3 (local_rank: 3)
exitcode : 1 (pid: 17763)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[4]:
time : 2024-02-09_11:16:17
host : 10ca0fca3068
rank : 4 (local_rank: 4)
exitcode : 1 (pid: 17764)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[5]:
time : 2024-02-09_11:16:17
host : 10ca0fca3068
rank : 5 (local_rank: 5)
exitcode : 1 (pid: 17765)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

Root Cause (first observed failure):
[0]:
time : 2024-02-09_11:16:17
host : 10ca0fca3068
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 17760)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
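BottleneckConfig is not part of mainline peft on PyPI, which is why every rank fails at import; train_lora.py appears to expect a modified peft distribution that adds bottleneck adapters. If you only need LoRA, the mainline package suffices; a sketch mirroring the hyperparameters printed in the earlier DDP issue (r=8, alpha=16, dropout=0.05), with illustrative target modules:

from peft import LoraConfig, get_peft_model  # mainline peft

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # typical choices for LLaMA
    task_type="CAUSAL_LM",
)
# model = get_peft_model(base_model, lora_config)  # base_model: a loaded LLaMA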

gpu_count 0 | Cuda issue

[screenshot: cuda 0]

As you can see, I'm in the Conda PowerShell as admin. I have installed PyTorch 2.0 with the matching torchvision for acceleration, along with the required packages for transformers and the tokenizer. The models also load from the pretrained folder. Additionally, I have installed the CUDA Toolkit 11.7 with drivers, and my GTX 1060 GPU (6 GB VRAM) is listed as available for compute. However, when attempting to activate CUDA, it shows as 0 or False. I am in the correct Conda environment, and CUDA is installed and activated, but the issue persists. I noticed in chat.py that 8-bit floats are disabled for the model tokenizer, which makes me wonder if that is related. Also, the tokenizer's LLaMA name may be written incorrectly; I found a GitHub thread on this: treadon/llama-7b-example#1 (comment). There may be a typo in chat.py as well. I have been working on this issue for 3 days and would greatly appreciate any help. Thank you.
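A quick sanity check, independent of this repo: if torch.cuda.is_available() is False even though the driver sees the GPU, the installed torch wheel is usually a CPU-only build or one compiled against a CUDA version the driver cannot serve.

import torch

print(torch.__version__)          # pip wheels carry a +cu117 / +cpu suffix
print(torch.version.cuda)         # CUDA version torch was built against (None on CPU builds)
print(torch.cuda.is_available())  # must be True for gpu_count > 0
print(torch.cuda.device_count())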

HFValidationError when running chat_wiki.py and chat_csv.py files

I got the following error when I run chat_wiki.py:

raise HFValidationError(
huggingface_hub.utils.validators.HFValidationError: Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: 'chatDoctor100k/'.
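from_pretrained treats its string argument as a Hugging Face Hub repo id unless it resolves to an existing local directory, and 'chatDoctor100k/' fails the repo-id validation. Checking the path first makes the failure obvious (the directory name below just follows the error message and may differ in your setup):

import os

model_path = "./chatDoctor100k/"
# If this assertion fails, from_pretrained will try (and fail) to interpret
# the string as a Hub repo id, producing the HFValidationError above.
assert os.path.isdir(model_path), f"{model_path} is not a local directory"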

demo on the huggingface

I am a medico and I don't have a server with large GPUs. Can you put the demo on Hugging Face back online?
Thank you so much!
