
chatdoctor's People

Contributors

kent0n-li, saharmor

chatdoctor's Issues

request for the pretrained weights

Hello, I have filled out the form several times, but I have not received the weight files. Is something missing here? (I have checked my spam folder.) My email is [email protected]; could you please send me the pre-trained weights? Thanks a lot.

Training from scratch

Hi!

Can we train the model from scratch? Do you have plans to release training code that does not load pre-trained weights?
Thanks

Can the model refuse to answer when it cannot guarantee the result?

Thanks for your work. Could you add a check when producing the output, so that the model refuses to answer if its confidence is low or it has never seen a similar question in the training set?

Could it be used for Chinese?

As the title says: LLMs are usually very good at English but not as good at Chinese.
How is the performance for Chinese?

What's the code for "how to utilize conversation demonstrations synthesized via ChatGPT"?

It seems that there is no code for "utilizing conversation demonstrations synthesized via ChatGPT to fine-tune the LLaMA model".
In the code, I see that you use HealthCareMagic-200k.json, not the "5k generated conversations between patients and physicians from ChatGPT [GenMedGPT-5k]".
How do you utilize conversation demonstrations synthesized via ChatGPT? Can you show us the code for this?

model not found in pretrained section

Error:

(base) hemang@hemang-HP-Pavilion-g6-Notebook-PC:~/Documents/GitHub/ChatDoctor$ python3.11 chat.py
2023-03-30 16:16:25.135057: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-03-30 16:16:26.061195: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Loading ./pretrained/...
/home/hemang/.local/lib/python3.11/site-packages/torch/cuda/__init__.py:546: UserWarning: Can't initialize NVML
  warnings.warn("Can't initialize NVML")
gpu_count 0
Loading checkpoint shards:   0%|                                                    | 0/3 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/hemang/.local/lib/python3.11/site-packages/transformers/modeling_utils.py", line 415, in load_state_dict
    return torch.load(checkpoint_file, map_location="cpu")
  File "/home/hemang/.local/lib/python3.11/site-packages/torch/serialization.py", line 791, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "/home/hemang/.local/lib/python3.11/site-packages/torch/serialization.py", line 271, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "/home/hemang/.local/lib/python3.11/site-packages/torch/serialization.py", line 252, in __init__
    super().__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: './pretrained/pytorch_model-00001-of-00003.bin'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/hemang/Documents/GitHub/ChatDoctor/chat.py", line 43, in <module>
    load_model("./pretrained/")
  File "/home/hemang/Documents/GitHub/ChatDoctor/chat.py", line 28, in load_model
    model = transformers.LLaMAForCausalLM.from_pretrained(
  File "/home/hemang/.local/lib/python3.11/site-packages/transformers/modeling_utils.py", line 2630, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/home/hemang/.local/lib/python3.11/site-packages/transformers/modeling_utils.py", line 2939, in _load_pretrained_model
    state_dict = load_state_dict(shard_file)
  File "/home/hemang/.local/lib/python3.11/site-packages/transformers/modeling_utils.py", line 418, in load_state_dict
    with open(checkpoint_file) as f:
FileNotFoundError: [Errno 2] No such file or directory: './pretrained/pytorch_model-00001-of-00003.bin'

pretrained/pytorch_model-00001-of-00003.bin FileNotFoundError

Hi, thank you for this model!
I am trying to build this app and getting this error message:

File "/home/ChatDoctor-main/env_doct/lib/python3.11/site-packages/transformers/modeling_utils.py", line 2939, in _load_pretrained_model
state_dict = load_state_dict(shard_file)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ChatDoctor-main/env_doct/lib/python3.11/site-packages/transformers/modeling_utils.py", line 418, in load_state_dict
with open(checkpoint_file) as f:
^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: './pretrained/pytorch_model-00001-of-00003.bin'

I have already filled out this form:
https://forms.office.com/Pages/ResponsePage.aspx?id=lYZBnaxxMUy1ssGWyOw8ij06Cb8qnDJKvu2bVpV1-ANUMDIzWlU0QTUxN0YySFROQk9HMVU0N0xJNC4u

Can you please share the file 'pytorch_model-00001-of-00003.bin'?

Thanks in advance!
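The traceback in the previous issue hints at the usual cause: cloning a weights repository without git-lfs leaves each .bin shard as a small text pointer beginning with "version https://git-lfs.github.com/spec/v1" rather than the real file. A quick diagnostic sketch (the directory path is illustrative; sharded transformers checkpoints ship a pytorch_model.bin.index.json listing every shard):

import json, os

CKPT_DIR = "./pretrained/"  # illustrative path

with open(os.path.join(CKPT_DIR, "pytorch_model.bin.index.json")) as f:
    index = json.load(f)

for shard in sorted(set(index["weight_map"].values())):
    path = os.path.join(CKPT_DIR, shard)
    if not os.path.exists(path):
        print(f"missing shard: {shard}")
    elif os.path.getsize(path) < 1024:
        # A file this small is usually an un-downloaded git-lfs pointer,
        # not real weights; fetch them with `git lfs install && git lfs pull`.
        print(f"git-lfs pointer instead of weights: {shard}")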

Why do the instructions start with "If you are a doctor"?

Thank you for your interesting work.
In the chatdoctor5k.json and chatdoctor200k.json I see that the instructions start with "If you are a doctor".
I am curious why the instructions do not start with "You are a doctor".
Is this a common way to perform Alpaca instruction fine-tuning?
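For context, Alpaca-style fine-tuning wraps each example in a fixed template, and the dataset's "If you are a doctor, ..." text simply fills the instruction slot. A sketch using the standard Stanford Alpaca template (the exact template this repo uses may differ; the patient description below is invented):

PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:"
)

prompt = PROMPT_TEMPLATE.format(
    instruction="If you are a doctor, please answer the medical questions "
                "based on the patient's description.",
    input="I have had a persistent dry cough for two weeks.",  # invented example
)

Whether "If you are a doctor" trains better than "You are a doctor" is an empirical question; the conditional phrasing is not a requirement of Alpaca-style tuning.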

pretrained/pytorch_model-00001-of-00003.bin


Greetings, and thank you for this model!

Please, I can't find the file 'pytorch_model-00001-of-00003.bin' and don't know what to do to obtain and install it.

If you can guide me, please do. Thanks in advance!

DDP expects same model across all ranks, but Rank 0 has 128 params, while rank 1 has inconsistent 0 params.

Hi, I ran into an error saying the ranks have different models. Details follow.


./train_lora.sh
WARNING:torch.distributed.run: Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.


===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues

bin /root/anaconda3/envs/chat-doctor/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cuda118.so
CUDA SETUP: CUDA runtime path found: /root/anaconda3/envs/chat-doctor/lib/libcudart.so.11.0
CUDA SETUP: Highest compute capability among GPUs detected: 8.0
CUDA SETUP: Detected CUDA version 118
CUDA SETUP: Loading binary /root/anaconda3/envs/chat-doctor/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cuda118.so...
bin /root/anaconda3/envs/chat-doctor/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cuda118.so




Finetuning model with params:
base_model: /disk2/data/xk/retr-llm/files/model/llama-7b/
data_path: /disk2/data/xk/retr-llm/files/datasets/mental_health_chatbot_dataset.json
output_dir: ./lora-chatDoctor_bs192_Mbs24_ep3_len512_lr3e-5_fromAlpacaLora
batch_size: 192
micro_batch_size: 24
num_epochs: 3
learning_rate: 3e-05
cutoff_len: 256
val_set_size: 120
use_gradient_checkpointing: False
lora_r: 8
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules: None
bottleneck_size: 256
non_linearity: tanh
adapter_dropout: 0.0
use_parallel_adapter: False
use_adapterp: False
train_on_inputs: True
scaling: 1.0
adapter_name: lora
target_modules: None
group_by_length: False
wandb_project:
wandb_run_name:
wandb_watch:
wandb_log_model:
resume_from_checkpoint: None
Loading checkpoint shards: 100%|##########| 33/33 [00:12<00:00, 2.58it/s]
trainable params: 4194304 || all params: 6742609920 || trainable%: 0.06220594176090199
Map: 100%|##########| 52/52 [00:00<00:00, 687.22 examples/s]
Map: 100%|##########| 120/120 [00:00<00:00, 765.56 examples/s]
[E ProcessGroupNCCL.cpp:828] [Rank 0] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=1, OpType=ALLGATHER, Timeout(ms)=1800000) ran for 1807082 milliseconds before timing out.
Traceback (most recent call last):
File "train_lora.py", line 353, in
fire.Fire(train)
File "/root/anaconda3/envs/chat-doctor/lib/python3.8/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/root/anaconda3/envs/chat-doctor/lib/python3.8/site-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/root/anaconda3/envs/chat-doctor/lib/python3.8/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "train_lora.py", line 299, in train
trainer.train(resume_from_checkpoint=resume_from_checkpoint)
File "/root/anaconda3/envs/chat-doctor/lib/python3.8/site-packages/transformers/trainer.py", line 1662, in train
return inner_training_loop(
File "/root/anaconda3/envs/chat-doctor/lib/python3.8/site-packages/transformers/trainer.py", line 1749, in _inner_training_loop
model = self._wrap_model(self.model_wrapped)
File "/root/anaconda3/envs/chat-doctor/lib/python3.8/site-packages/transformers/trainer.py", line 1569, in _wrap_model
model = nn.parallel.DistributedDataParallel(
File "/root/anaconda3/envs/chat-doctor/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 674, in init
_verify_param_shape_across_processes(self.process_group, parameters)
File "/root/anaconda3/envs/chat-doctor/lib/python3.8/site-packages/torch/distributed/utils.py", line 118, in _verify_param_shape_across_processes
return dist._verify_params_across_processes(process_group, tensors, logger)
RuntimeError: DDP expects same model across all ranks, but Rank 0 has 128 params, while rank 1 has inconsistent 0 params.
[E ProcessGroupNCCL.cpp:455] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data.
[E ProcessGroupNCCL.cpp:460] To avoid data inconsistency, we are taking the entire process down.
terminate called after throwing an instance of 'std::runtime_error'
what(): [Rank 0] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=1, OpType=ALLGATHER, Timeout(ms)=1800000) ran for 1807082 milliseconds before timing out.
[E ProcessGroupNCCL.cpp:828] [Rank 3] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=1, OpType=ALLGATHER, Timeout(ms)=1800000) ran for 1807414 milliseconds before timing out.
Traceback (most recent call last):
File "train_lora.py", line 353, in
fire.Fire(train)
File "/root/anaconda3/envs/chat-doctor/lib/python3.8/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/root/anaconda3/envs/chat-doctor/lib/python3.8/site-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/root/anaconda3/envs/chat-doctor/lib/python3.8/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "train_lora.py", line 299, in train
trainer.train(resume_from_checkpoint=resume_from_checkpoint)
File "/root/anaconda3/envs/chat-doctor/lib/python3.8/site-packages/transformers/trainer.py", line 1662, in train
return inner_training_loop(
File "/root/anaconda3/envs/chat-doctor/lib/python3.8/site-packages/transformers/trainer.py", line 1749, in _inner_training_loop
model = self._wrap_model(self.model_wrapped)
File "/root/anaconda3/envs/chat-doctor/lib/python3.8/site-packages/transformers/trainer.py", line 1569, in _wrap_model
model = nn.parallel.DistributedDataParallel(
File "/root/anaconda3/envs/chat-doctor/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 674, in init
_verify_param_shape_across_processes(self.process_group, parameters)
File "/root/anaconda3/envs/chat-doctor/lib/python3.8/site-packages/torch/distributed/utils.py", line 118, in _verify_param_shape_across_processes
return dist._verify_params_across_processes(process_group, tensors, logger)
RuntimeError: DDP expects same model across all ranks, but Rank 3 has 128 params, while rank 0 has inconsistent 0 params.
[E ProcessGroupNCCL.cpp:455] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data.
[E ProcessGroupNCCL.cpp:460] To avoid data inconsistency, we are taking the entire process down.
terminate called after throwing an instance of 'std::runtime_error'
what(): [Rank 3] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=1, OpType=ALLGATHER, Timeout(ms)=1800000) ran for 1807414 milliseconds before timing out.
[E ProcessGroupNCCL.cpp:828] [Rank 6] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=1, OpType=ALLGATHER, Timeout(ms)=1800000) ran for 1807716 milliseconds before timing out.


My environment:
GPU: 8 x A100 80GB
PyTorch version: 2.0.1

How can I solve this bug? Thanks!
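A mismatch like this usually means the model object was built differently on different ranks (for example, adapters attached on one rank only, or 8-bit/device_map loading interacting badly with DDP). A small diagnostic sketch, independent of this repo, that prints what each rank sees just before the Trainer wraps the model:

import torch
import torch.distributed as dist

def report_trainable_params(model: torch.nn.Module) -> None:
    # Every rank must report the same counts, or DDP init fails as above.
    rank = dist.get_rank() if dist.is_initialized() else 0
    n_tensors = sum(1 for p in model.parameters() if p.requires_grad)
    n_elements = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(f"rank {rank}: {n_tensors} trainable tensors, {n_elements} elements")

Call it right after the PEFT wrapping on every rank; if the counts differ across ranks, the bug is in model construction, not in DDP itself.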

ImportError: cannot import name 'openai_object' from 'openai'

Traceback (most recent call last):
File "/content/ChatDoctor/train.py", line 25, in <module>
import utils
File "/content/ChatDoctor/utils.py", line 15, in <module>
from openai import openai_object
ImportError: cannot import name 'openai_object' from 'openai' (/usr/local/lib/python3.10/dist-packages/openai/__init__.py)
(the same traceback is repeated by the three other ranks)
[2024-02-09 11:04:21,257] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 14644) of binary: /usr/bin/python3
Traceback (most recent call last):
File "/usr/local/bin/torchrun", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
return f(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 806, in main
run(args)
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 797, in run
elastic_launch(
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 134, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

/content/ChatDoctor/train.py FAILED

Failures:
[1]:
time : 2024-02-09_11:04:21
host : 10ca0fca3068
rank : 1 (local_rank: 1)
exitcode : 1 (pid: 14645)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[2]:
time : 2024-02-09_11:04:21
host : 10ca0fca3068
rank : 2 (local_rank: 2)
exitcode : 1 (pid: 14646)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[3]:
time : 2024-02-09_11:04:21
host : 10ca0fca3068
rank : 3 (local_rank: 3)
exitcode : 1 (pid: 14647)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

Root Cause (first observed failure):
[0]:
time : 2024-02-09_11:04:21
host : 10ca0fca3068
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 14644)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
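The openai_object module exists only in the pre-1.0 openai SDK and was removed in 1.0, so this error indicates a newer SDK than utils.py expects. Pinning the old SDK (pip install "openai==0.28") is the simplest fix; alternatively, a hedged import guard:

# utils.py imports `openai_object`, which only exists in openai < 1.0.
try:
    from openai import openai_object  # openai < 1.0
except ImportError:
    openai_object = None  # openai >= 1.0 removed this module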

Dataset License

Hello,
Is there a specific license for the associated datasets?

How do you create format_dataset.csv?

Hi @Kent0n-Li, in the paper you mentioned that you used MedlinePlus as the database to create format_dataset.csv. However, if I type a name from format_dataset.csv (e.g., "Panic disorder") directly into the MedlinePlus system, there are multiple results, and the symptoms may be incomplete. I am wondering how you created this file. Are there any scripts, or did you select the entries manually? Can you give an example of how you used MedlinePlus to derive Symptom, reason, TestsAndProcedures, and commonMedications in format_dataset.csv (e.g., for Panic disorder)?

About Instruction data generation

The first step in building a physician-patient conversation dataset is to collect the disease database that serves as the gold standard. Therefore, we collected and organized a database of diseases, which contains about 700 diseases with their relative symptoms, medical tests, and recommended medications. To train high-quality conversation models on an academic budget, we input each message from the disease database separately as a prompt into the ChatGPT API to automatically generate instruction data. It is worth noting that our prompts to the ChatGPT API contain the gold standard of diseases and symptoms, and drugs, so our fine-tuned ChatDoctor is not only able to achieve ChatGPT's conversational fluency but also higher diagnostic accuracy compared to ChatGPT. We finally collected 5K doctor-patient conversation instructions and named it InstructorDoctor-5K.

I'm confused by this process. Can anyone explain it more precisely?
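A hedged reconstruction of the described pipeline (not the authors' actual script): for each record in the disease database, build a prompt containing the gold-standard symptoms, tests, and medications, and ask the ChatGPT API to write a patient-physician conversation consistent with that record. The field names, prompt wording, and the pre-1.0 openai SDK style below are all assumptions.

import openai  # pre-1.0 SDK style

openai.api_key = "..."  # set your own key

def synthesize_conversation(disease: dict) -> str:
    # Field names are hypothetical; the real disease-database schema may differ.
    prompt = (
        "Write a realistic conversation between a patient and a doctor about "
        "the disease below. The doctor's diagnosis, recommended tests, and "
        "medications must match this record exactly.\n"
        f"Disease: {disease['name']}\n"
        f"Symptoms: {disease['symptoms']}\n"
        f"Medical tests: {disease['tests']}\n"
        f"Recommended medications: {disease['medications']}\n"
    )
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return response["choices"][0]["message"]["content"]

Because the gold-standard facts are embedded in every prompt, the synthesized conversations stay grounded in the disease database rather than in ChatGPT's own guesses.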

Why did you choose to train in two steps?

Thanks for your sharing, your attempt is very interesting and valuable.

However, I have some questions about the training process.

I notice that ChatDoctor is first trained on the 52K instruction-following data provided by Stanford Alpaca, and then fine-tuned on your specific data.

Why not fine-tune the model on a mixture of the two datasets?

What is the insight behind this two-step fine-tuning process?

Have you ever tried training with the two datasets mixed together?

Colab runs out of RAM?

I cannot load the model because Colab Free does not have enough RAM. Does anyone have a solution for this?
Thanks
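One workaround (a sketch, with an illustrative path): load in half precision with low_cpu_mem_usage so transformers never materializes a full fp32 copy in host RAM; 8-bit loading shrinks the footprint further but needs bitsandbytes and a CUDA GPU.

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "./pretrained/",          # illustrative path
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,   # avoids building a full fp32 copy in host RAM
    # load_in_8bit=True,      # optional: needs bitsandbytes + a CUDA GPU
    device_map="auto",
)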

generate duplicate text

I trained a ChatDoctor using LoRA following your instructions and evaluated it on iCliniq-10k without providing external documents (e.g., Wikipedia or a medical database). The problem is that the model generates duplicate content, as in the example below:

Patient: 'Hi doctor,My wife missed her periods and we doubt she is pregnant. Our first kid is just 1 year old now and it was a Cesarean. We are not in a position to have a second child now. Her LMP was 22nd of last month. I read that Mifepristone can induce abortion. Is it a safe drug? Do we have to consult doctor for taking this drug? What are all the possible side effects? Is that drug 100 percent reliable? Is there any other options? Whether this drug is available without doctor's prescription? Thank you.'
Predictions: 1. Mifepristone is a safe drug. 2. Mifepristone is a safe drug. 3. Mifepristone is a safe drug. 4. Mifepristone is a safe drug. 5. Mifepristone is a safe drug. 6. Mifepristone is a safe drug. 7. Mifepristone is a safe drug. 8. Mifepristone is a safe drug. 9. Mifepristone is a safe drug. 10. Mifepristone is a safe drug. 11. Mifepristone is a safe drug. 12. Mifepristone is a safe drug. 13. Mifepristone is a safe drug. 14. Mifepristone is a safe drug. 15. Mifepristone is a safe drug. 16. Mifepristone is a safe drug. 17. Mifepristone is a safe drug. 18. Mifepristone is a safe drug. 19. Mifepristone is a safe drug.
ChatDoctor: 1. No, mifepristone is not available without doctors prescription. 2. It is a safe medication, but as with any medications there may be some side effects which include nausea, vomiting, headache, diarrhea etc. These symptoms usually subside within 48 hours after ingestion.3. Yes, it is 100% reliable if taken under supervision by a trained gynecologist.4. There are many other options for termination of early pregnancy like medical methods (like MTP kit) or surgical methods (like suction evacuation). But these need to be done under proper guidance of a qualified Gynecologist.'

In my generated texts, this phenomenon is pretty common. Did you encounter this problem before? How did you solve it? Thanks!
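Repetition like this can often be damped at decoding time before touching the training data. A sketch assuming model and inputs as in chat.py; the penalty values are illustrative starting points, not tuned:

outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    repetition_penalty=1.2,   # down-weights tokens that were already generated
    no_repeat_ngram_size=3,   # forbids repeating any 3-gram verbatim
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)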

Question about output in HealthCareMagic-100k.json

Hi, I noticed that a lot of "Chat Doctor" appears in the outputs of HealthCareMagic-100k.json. For example:
""Hi thanks for contacting Chat Doctor ... Your brother have both hepatitis b and c positive...."
"Hi and welcome to Chat Doctor."
"Hi and welcome to Chat Doctor. Thank you for your query. I am Chat Doctor.."

I wonder if that was caused by some post-processing. Is there any data without these words?
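The phrase looks like the source site's brand name substituted during anonymization of HealthCareMagic, so it is straightforward to strip in post-processing. A sketch assuming the repo's Alpaca-style JSON with an "output" field:

import json
import re

with open("HealthCareMagic-100k.json") as f:
    data = json.load(f)

# Remove the substituted brand name; the regex tolerates optional spacing.
pattern = re.compile(r"\s*Chat\s?Doctor\.?", flags=re.IGNORECASE)
for example in data:
    example["output"] = pattern.sub("", example["output"]).strip()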

What prompt was used to generate the dataset?

Hello,

I am curious to know which prompt you used to generate the dataset. I couldn't find it in utils.py.

Also, you might want to remove your OpenAI API key from utils.py.

difference between the paper and code

It seems that in your paper the training dataset is 'InstructorDoctor-205k', but in this repo, judging from the training command, the dataset is 'HealthCareMagic-100k.json'.
In the paper, training was fine-tuning on InstructorDoctor-205k (seemingly a single step?), but this repo says: "Our model was firstly be fine-tuned by Stanford Alpaca's data to have some basic conversational capabilities." Does that mean the repo contains an updated method?
The training time also differs: paper, 18 hours; repo, 30 minutes.
Can you help provide some clarification?
Thanks!

Aborted (core dumped)

Hello, I am a college student reading your paper. My server GPU has only 48 GB of memory; does that mean I don't have enough GPU memory to run inference?

wandb error

I got an error when running train.py:
wandb: ERROR api_key not configured (no-tty). call wandb.login(key=[your_api_key])
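Older transformers versions report to every installed logging integration by default, so having wandb installed without an API key aborts the run. If you do not want logging, disable it before the Trainer is constructed (or pass report_to="none" in TrainingArguments); otherwise run wandb login once.

import os

# Must be set before transformers constructs the Trainer.
os.environ["WANDB_MODE"] = "disabled"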

ImportError: cannot import name 'BottleneckConfig' from 'peft'

File "/content/ChatDoctor/train_lora.py", line 17, in
from peft import ( # noqa: E402
ImportError: cannot import name 'BottleneckConfig' from 'peft' (/usr/local/lib/python3.10/dist-packages/peft/init.py)
Traceback (most recent call last):
File "/content/ChatDoctor/train_lora.py", line 17, in
from peft import ( # noqa: E402
ImportError: cannot import name 'BottleneckConfig' from 'peft' (/usr/local/lib/python3.10/dist-packages/peft/init.py)
Traceback (most recent call last):
File "/content/ChatDoctor/train_lora.py", line 17, in
from peft import ( # noqa: E402
ImportError: cannot import name 'BottleneckConfig' from 'peft' (/usr/local/lib/python3.10/dist-packages/peft/init.py)
Traceback (most recent call last):
File "/content/ChatDoctor/train_lora.py", line 17, in
from peft import ( # noqa: E402
ImportError: cannot import name 'BottleneckConfig' from 'peft' (/usr/local/lib/python3.10/dist-packages/peft/init.py)
Traceback (most recent call last):
File "/content/ChatDoctor/train_lora.py", line 17, in
from peft import ( # noqa: E402Traceback (most recent call last):

ImportError: cannot import name 'BottleneckConfig' from 'peft' (/usr/local/lib/python3.10/dist-packages/peft/init.py)
File "/content/ChatDoctor/train_lora.py", line 17, in
from peft import ( # noqa: E402
ImportError: cannot import name 'BottleneckConfig' from 'peft' (/usr/local/lib/python3.10/dist-packages/peft/init.py)
[2024-02-09 11:16:17,756] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 17760) of binary: /usr/bin/python3
Traceback (most recent call last):
File "/usr/local/bin/torchrun", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
return f(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 806, in main
run(args)
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 797, in run
elastic_launch(
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 134, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

/content/ChatDoctor/train_lora.py FAILED

Failures:
[1]:
time : 2024-02-09_11:16:17
host : 10ca0fca3068
rank : 1 (local_rank: 1)
exitcode : 1 (pid: 17761)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[2]:
time : 2024-02-09_11:16:17
host : 10ca0fca3068
rank : 2 (local_rank: 2)
exitcode : 1 (pid: 17762)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[3]:
time : 2024-02-09_11:16:17
host : 10ca0fca3068
rank : 3 (local_rank: 3)
exitcode : 1 (pid: 17763)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[4]:
time : 2024-02-09_11:16:17
host : 10ca0fca3068
rank : 4 (local_rank: 4)
exitcode : 1 (pid: 17764)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[5]:
time : 2024-02-09_11:16:17
host : 10ca0fca3068
rank : 5 (local_rank: 5)
exitcode : 1 (pid: 17765)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

Root Cause (first observed failure):
[0]:
time : 2024-02-09_11:16:17
host : 10ca0fca3068
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 17760)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
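BottleneckConfig is not part of mainline peft on PyPI, which is why every rank fails at import; train_lora.py appears to expect a modified peft distribution that adds bottleneck adapters. If you only need LoRA, the mainline package suffices; a sketch mirroring the hyperparameters printed in the earlier DDP issue (r=8, alpha=16, dropout=0.05), with illustrative target modules:

from peft import LoraConfig, get_peft_model  # mainline peft

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # typical choices for LLaMA
    task_type="CAUSAL_LM",
)
# model = get_peft_model(base_model, lora_config)  # base_model: a loaded LLaMA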

gpu_count 0 | Cuda issue

[screenshot: cuda 0]

As you can see, I'm in the Conda PowerShell as admin. I have installed PyTorch 2.0 with the matching torchvision for acceleration, along with the required packages for transformers and the tokenizer. The models also load from the pretrained folder. Additionally, I have installed the CUDA Toolkit 11.7 with drivers, and my GTX 1060 GPU (6 GB VRAM) is listed as available for compute. However, when attempting to activate CUDA, it shows as 0 or False. I am in the correct Conda environment, and CUDA is installed and activated, but the issue persists. I noticed in chat.py that 8-bit floats are disabled for the model tokenizer, which makes me wonder if that is related. Also, the tokenizer's LLaMA name may be written incorrectly; I found a GitHub thread on this: treadon/llama-7b-example#1 (comment). There may be a typo in chat.py as well. I have been working on this issue for 3 days and would greatly appreciate any help. Thank you.
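A quick sanity check, independent of this repo: if torch.cuda.is_available() is False even though the driver sees the GPU, the installed torch wheel is usually a CPU-only build or one compiled against a CUDA version the driver cannot serve.

import torch

print(torch.__version__)          # pip wheels carry a +cu117 / +cpu suffix
print(torch.version.cuda)         # CUDA version torch was built against (None on CPU builds)
print(torch.cuda.is_available())  # must be True for gpu_count > 0
print(torch.cuda.device_count())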

HFValidationError when running chat_wiki.py and chat_csv.py files

I got the following error when I run chat_wiki.py:

raise HFValidationError(
huggingface_hub.utils.validators.HFValidationError: Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: 'chatDoctor100k/'.
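from_pretrained treats its string argument as a Hugging Face Hub repo id unless it resolves to an existing local directory, and 'chatDoctor100k/' fails the repo-id validation. Checking the path first makes the failure obvious (the directory name below just follows the error message and may differ in your setup):

import os

model_path = "./chatDoctor100k/"
# If this assertion fails, from_pretrained will try (and fail) to interpret
# the string as a Hub repo id, producing the HFValidationError above.
assert os.path.isdir(model_path), f"{model_path} is not a local directory"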

demo on the huggingface

I am a medico and I don't have a server with large GPUs. Can you put the demo on Hugging Face back online?
Thank you so much!
