BiMediX: Bilingual Medical Mixture of Experts LLM

* Equally contributing first authors

Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI), UAE


📢 Latest Updates

  • Feb-22-24: Trained models and demo are live. 🔥🔥
  • Feb-21-24: BiMediX paper is released on arXiv. 🔥🔥
  • 📦 Code and datasets coming soon! 🚀

๐Ÿ‘ฉโ€โš•๏ธ Overview

We introduce BiMediX, the first bilingual medical mixture of experts LLM designed for seamless interaction in both English and Arabic.
Our model facilitates a wide range of medical interactions in English and Arabic, including multi-turn chats to inquire about additional details such as patient symptoms and medical history, multiple-choice question answering, and open-ended question answering.

Our models are available for download at the Project's HuggingFace Page.


๐Ÿ† Contributions

Our contributions are as follows:

  • We introduce BiMediX, the first bilingual medical LLM with expertise in both English and Arabic, enabling seamless medical interactions such as multi-turn chats, multiple-choice question answering, and closed question answering.
  • We developed a semi-automated translation pipeline with human verification for high-quality translation of English medical texts into Arabic, aiding in the creation of a dataset and benchmark for Arabic healthcare LLM evaluation.
  • We curated the BiMed1.3M dataset, a comprehensive Arabic-English bilingual instruction set with over 1.3 million instructions and 632 million healthcare-specialized tokens, supporting diverse medical interactions and enabling a chatbot for patient follow-ups, with a focus on a 1:2 Arabic-to-English ratio across medical content.
  • BiMediX outperforms existing models on medical benchmarks while being 8 times faster than comparable existing approaches.

⚡ Model

  • The BiMediX model, built on the state-of-the-art Mixture of Experts (MoE) architecture, leverages the Mixtral-8x7B base model. This approach enables the model to scale significantly by utilizing a sparse operation method, where less than 13 billion parameters are active during inference, enhancing efficiency.
  • The training utilized the BiMed1.3M dataset, focusing on bilingual medical interactions in both English and Arabic, with a substantial corpus of over 632 million healthcare-specialized tokens.
  • Fine-tuning applied QLoRA to both the experts and the router, adapting the model efficiently to specific tasks while keeping computational demands manageable.

| Model Name | Download Link |
|---|---|
| BiMediX-Bilingual | HuggingFace |
| BiMediX-Arabic | HuggingFace |
| BiMediX-English | HuggingFace |
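
The sparse operation mentioned in the model description can be illustrated with a minimal sketch of top-k expert routing. Mixtral-style layers send each token to 2 of 8 experts and weight their outputs with a softmax over the selected router logits. Everything else here is an assumption for illustration only: the toy scalar "experts", the function names `top_k_route` and `moe_forward`, and the example logits are hypothetical and are not taken from the BiMediX or Mixtral code.

```python
import math

def top_k_route(router_logits, k=2):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are a softmax over the selected logits only, so they sum to 1.
    """
    # Stable sort: experts with equal logits keep their original order.
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    peak = max(router_logits[i] for i in top)  # subtract for numerical stability
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

def moe_forward(x, experts, router_logits, k=2):
    """Combine the outputs of the selected experts; the others are never evaluated."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))

# Toy example: 8 hypothetical "experts", each scaling its input by a different factor.
experts = [lambda x, s=s: s * x for s in range(1, 9)]
logits = [0.1, 2.0, -1.0, 0.3, 2.0, 0.0, -0.5, 1.0]

routes = top_k_route(logits)               # experts 1 and 4, weight 0.5 each
output = moe_forward(3.0, experts, logits)  # 0.5 * (2 * 3.0) + 0.5 * (5 * 3.0) = 10.5
```

Because only the selected experts run for a given token, the number of parameters active per token is much smaller than the total parameter count, which is the efficiency property the model description refers to.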

๐Ÿ” Data

The BiMed1.3M dataset, central to BiMediX's training, was meticulously compiled to include a wide range of medical interactions. The creation process involved generating multi-turn chat conversations using ChatGPT, based on publicly available medical MCQAs to simulate realistic doctor-patient dialogues. This dataset includes over 200,000 high-quality multi-turn medical dialogues, enriching the model's training material.

A semi-automated, iterative translation process was employed to create high-quality Arabic versions of the data, utilizing ChatGPT for initial translations and human professionals for refinement. This ensured the dataset's fidelity and relevance across both English and Arabic. Furthermore, we translated the English evaluation set to Arabic to evaluate the models. Through these meticulous data creation and processing efforts, BiMediX is able to excel in understanding and generating medical content across two languages.
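
One way to picture a semi-automated, iterative translation loop is sketched below. It is not the authors' pipeline: the source states only that ChatGPT produced the initial translations and human professionals refined them. The quality score, the `threshold` of 0.8, the retry limit, and the names `translate_with_review`, `fake_translate`, and `fake_score` are all assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class TranslationResult:
    source: str
    translation: str
    score: float
    needs_human_review: bool

def translate_with_review(
    texts: list[str],
    translate: Callable[[str], str],     # assumed: wraps a machine translator
    score: Callable[[str, str], float],  # assumed: quality estimate in [0, 1]
    threshold: float = 0.8,              # assumed cut-off, not from the source
    max_attempts: int = 3,               # assumed retry budget
) -> list[TranslationResult]:
    """Retry automatic translation, then flag low-scoring items for human refinement."""
    results = []
    for text in texts:
        best_translation, best_score = "", float("-inf")
        for _ in range(max_attempts):
            candidate = translate(text)
            candidate_score = score(text, candidate)
            if candidate_score > best_score:
                best_translation, best_score = candidate, candidate_score
            if best_score >= threshold:
                break  # good enough, stop retrying
        results.append(
            TranslationResult(text, best_translation, best_score, best_score < threshold)
        )
    return results

# Deterministic stand-ins so the sketch runs without any external service.
def fake_translate(text: str) -> str:
    return text.upper()

def fake_score(source: str, translation: str) -> float:
    return 0.9 if len(source) < 20 else 0.5

results = translate_with_review(
    ["short text", "a much longer sentence that scores low"],
    fake_translate,
    fake_score,
)
review_queue = [r.source for r in results if r.needs_human_review]
```

In this sketch, only the items left in `review_queue` would be passed to human translators, which keeps manual effort focused on the hardest cases.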


💫 Qualitative Results

[Figures: Bilingual Conversation; Multiple Choice Question Answering]

📊 Quantitative Results

The BiMediX model was evaluated across several benchmarks, demonstrating its effectiveness in medical language understanding and question answering in both English and Arabic.

Medical Benchmarks Used for Evaluation:

  • PubMedQA: A dataset for question answering from biomedical research papers, requiring reasoning over biomedical contexts.
  • MedMCQA: Multiple-choice questions from Indian medical entrance exams, covering a wide range of medical subjects.
  • MedQA: Questions from US and other medical board exams, testing specific knowledge and patient case understanding.
  • Medical MMLU: A compilation of questions from various medical subjects, requiring broad medical knowledge.
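
The scores reported for these benchmarks are accuracies, that is, the percentage of questions answered correctly. A minimal sketch of that computation follows. The function name `accuracy`, the case-insensitive label matching, and the example answers are assumptions for illustration, not the evaluation harness used for BiMediX.

```python
def accuracy(predictions, answers):
    """Percentage of predictions that match the reference answers.

    Labels are compared case-insensitively after stripping whitespace
    (an assumed normalization, e.g. "b " matches "B").
    """
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("at least one question is required")
    correct = sum(
        p.strip().upper() == a.strip().upper()
        for p, a in zip(predictions, answers)
    )
    return 100.0 * correct / len(answers)

# Hypothetical multiple-choice run: 3 of 4 answers match.
mcq_score = accuracy(["A", "b ", "C", "D"], ["A", "B", "D", "D"])  # 75.0

# Hypothetical PubMedQA-style run with yes/no/maybe labels: 2 of 3 match.
pubmedqa_score = accuracy(["yes", "no", "maybe"], ["yes", "no", "no"])
```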

Bilingual Benchmark

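| Model | CKG | CBio | CMed | MedGen | ProMed | Ana | MedMCQA | MedQA | PubmedQA | AVG |
|---|---|---|---|---|---|---|---|---|---|---|
| Jais-30B | 57.4 | 55.2 | 46.2 | 55.0 | 46.0 | 48.9 | 40.2 | 31.0 | 75.5 | 50.6 |
| Mixtral-8x7B | 59.1 | 57.6 | 52.6 | 59.5 | 53.3 | 54.4 | 43.2 | 40.6 | 74.7 | 55.0 |
| BiMediX (Bilingual) | 70.6 | 72.2 | 59.3 | 74.0 | 64.2 | 59.6 | 55.8 | 54.0 | 78.6 | 65.4 |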

BiMediX shows superior performance in bilingual (Arabic-English) evaluations, outperforming both the Mixtral-8x7B base model and Jais-30B. Its average accuracy is about 10 points higher than that of Mixtral-8x7B (65.4 vs. 55.0) and about 15 points higher than that of Jais-30B (65.4 vs. 50.6).
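
As a quick check on the bilingual table, the AVG column appears to be the unweighted mean of the nine per-benchmark scores. The short sketch below recomputes it from the values in the table and derives the gaps between BiMediX and the two baselines. The variable names are illustrative only.

```python
from statistics import mean

# Per-benchmark scores copied from the bilingual table, in column order:
# CKG, CBio, CMed, MedGen, ProMed, Ana, MedMCQA, MedQA, PubmedQA.
bilingual_scores = {
    "Jais-30B": [57.4, 55.2, 46.2, 55.0, 46.0, 48.9, 40.2, 31.0, 75.5],
    "Mixtral-8x7B": [59.1, 57.6, 52.6, 59.5, 53.3, 54.4, 43.2, 40.6, 74.7],
    "BiMediX (Bilingual)": [70.6, 72.2, 59.3, 74.0, 64.2, 59.6, 55.8, 54.0, 78.6],
}

# Unweighted mean per model, rounded to one decimal like the table.
averages = {model: round(mean(scores), 1) for model, scores in bilingual_scores.items()}
# {'Jais-30B': 50.6, 'Mixtral-8x7B': 55.0, 'BiMediX (Bilingual)': 65.4}

# Gap between BiMediX and each baseline, in accuracy points.
bimedix_avg = averages["BiMediX (Bilingual)"]
gaps = {
    model: round(bimedix_avg - avg, 1)
    for model, avg in averages.items()
    if model != "BiMediX (Bilingual)"
}
# {'Jais-30B': 14.8, 'Mixtral-8x7B': 10.4}
```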

Arabic Benchmark

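| Model | CKG | CBio | CMed | MedGen | ProMed | Ana | MedMCQA | MedQA | PubmedQA | AVG |
|---|---|---|---|---|---|---|---|---|---|---|
| Jais-30B | 52.1 | 50.7 | 40.5 | 49.0 | 39.3 | 43.0 | 37.0 | 28.8 | 74.6 | 46.1 |
| BiMediX (Arabic) | 60.0 | 54.9 | 55.5 | 58.0 | 58.1 | 49.6 | 46.0 | 40.2 | 76.6 | 55.4 |
| BiMediX (Bilingual) | 63.8 | 57.6 | 52.6 | 64.0 | 52.9 | 50.4 | 49.1 | 47.3 | 78.4 | 56.5 |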

In Arabic-specific evaluations, BiMediX outperforms Jais-30B in all categories, highlighting the effectiveness of the BiMed1.3M dataset and bilingual training.

English Benchmark

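| Model | CKG | CBio | CMed | MedGen | ProMed | Ana | MedMCQA | MedQA | PubmedQA | AVG |
|---|---|---|---|---|---|---|---|---|---|---|
| PMC-LLaMA-13B | 63.0 | 59.7 | 52.6 | 70.0 | 64.3 | 61.5 | 50.5 | 47.2 | 75.6 | 60.5 |
| Med42-70B | 75.9 | 84.0 | 69.9 | 83.0 | 78.7 | 64.4 | 61.9 | 61.3 | 77.2 | 72.9 |
| Clinical Camel-70B | 69.8 | 79.2 | 67.0 | 69.0 | 71.3 | 62.2 | 47.0 | 53.4 | 74.3 | 65.9 |
| Meditron-70B | 72.3 | 82.5 | 62.8 | 77.8 | 77.9 | 62.7 | 65.1 | 60.7 | 80.0 | 71.3 |
| BiMediX | 78.9 | 86.1 | 68.2 | 85.0 | 80.5 | 74.1 | 62.7 | 62.8 | 80.2 | 75.4 |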

BiMediX also excels in English medical benchmarks, surpassing other state-of-the-art models such as Med42-70B and Meditron-70B in terms of average performance and efficiency.

These results underscore BiMediX's advanced capability in handling medical queries and its significant improvement over existing models in both languages, leveraging its unique bilingual dataset and training approach.


📜 License & Citation

BiMediX is released under the CC-BY-NC-SA 4.0 License. For more details, please refer to the LICENSE file included in this repository.

โš ๏ธ Warning! This release, intended for research, is not ready for clinical or commercial use.

Users are urged to employ BiMediX responsibly, especially when applying its outputs in real-world medical scenarios. It is imperative to verify the model's advice with qualified healthcare professionals and not rely on it for medical diagnoses or treatment decisions. Despite its overall advancements, BiMediX shares common challenges with other language models, including hallucinations, toxicity, and stereotypes.
BiMediX's medical diagnoses and recommendations are not infallible.

If you use BiMediX in your research, please cite our work as follows:

@misc{pieri2024bimedix,
      title={BiMediX: Bilingual Medical Mixture of Experts LLM}, 
      author={Sara Pieri and Sahal Shaji Mullappilly and Fahad Shahbaz Khan and Rao Muhammad Anwer and Salman Khan and Timothy Baldwin and Hisham Cholakkal},
      year={2024},
      eprint={2402.13253},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}  


๐Ÿ™ Acknowledgements

We are thankful to Mistral AI for releasing their models and FastChat and Axolotl for their open-source code contributions.

