Awesome-Healthcare-Foundation-Models

Curated list of awesome large AI models (LAMs), or foundation models, in healthcare. We organize the current LAMs into four categories: large language models (LLMs), large vision models (LVMs), large audio models (LAudiMs), and large multi-modal models (LMMs). The areas that these LAMs are applied to include but not limited to bioinformatics, medical diagnosis and decision making, medical imaging and vision, medical informatics, medical education, public health, and medical robotics.

We welcome contributions to this repository to add more resources. Please submit a pull request if you want to contribute!

Survey
Large Language Models
Large Vision Models
Large Audio Models
Large Multi-modal models
Applications of Large AI Models in Healthcare

Survey

This repository is largely based on the following paper:

Large AI Models in Health Informatics: Applications, Challenges, and the Future
Jianing Qiu, Lin Li, Jiankai Sun, Jiachuan Peng, Peilun Shi, Ruiyang Zhang, Yinzhao Dong, Kyle Lam, Frank P.-W. Lo, Bo Xiao, Wu Yuan, Dong Xu, and Benny Lo

If you find this repository helpful, please consider citing:

@article{qiu2023large,
  title={Large AI Models in Health Informatics: Applications, Challenges, and the Future},
  author={Qiu, Jianing and Li, Lin and Sun, Jiankai and Peng, Jiachuan and Shi, Peilun and Zhang, Ruiyang and Dong, Yinzhao and Lam, Kyle and Lo, Frank P-W and Xiao, Bo and others},
  journal={arXiv preprint arXiv:2303.11568},
  year={2023}
}

Large Language Models

Healthcare Domain

KeBioLM: Improving Biomedical Pretrained Language Models with Knowledge [Paper]
BioELMo: Probing Biomedical Embeddings from Language Models [Paper]
BioBART: Pretraining and Evaluation of A Biomedical Generative Language Model [Paper]
ClinicalT5: A Generative Language Model for Clinical Text [Paper]
GatorTron: A Large Clinical Language Model to Unlock Patient Information from Unstructured Electronic Health Records [Paper]
ChatCAD: Interactive Computer-Aided Diagnosis on Medical Image using Large Language Models [Paper]
DeID-GPT: Zero-shot Medical Text De-Identification by GPT-4 [Paper]
Capabilities of GPT-4 on Medical Challenge Problems [Paper]
BioBERT: a pre-trained biomedical language representation model for biomedical text mining [Paper]
Publicly Available Clinical BERT Embeddings [Paper]
BioMegatron: Larger Biomedical Domain Language Model [Paper]
Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks [Paper]
Med-BERT: pretrained contextualized embeddings on large-scale structured electronic health records for disease prediction [Paper]
BioELECTRA:Pretrained Biomedical text Encoder using Discriminators [Paper]
LinkBERT: Pretraining Language Models with Document Links [Paper]
BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining [Paper]
Large Language Models Encode Clinical Knowledge [Paper]
A large language model for electronic health records [Paper]
Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing [Paper]
BEHRT: Transformer for Electronic Health Records [Paper]
RadBERT: Adapting Transformer-based Language Models to Radiology [paper] [HuggingFace]
Highly accurate protein structure prediction with AlphaFold [Paper] [Code]
Accurate prediction of protein structures and interactions using a three-track neural network [Paper]
Protein complex prediction with AlphaFold-Multimer [Paper]
FastFold: Reducing AlphaFold Training Time from 11 Days to 67 Hours [Paper] [Code]
HelixFold: An Efficient Implementation of AlphaFold2 using PaddlePaddle [Paper] [Code]
Uni-Fold: An Open-Source Platform for Developing Protein Folding Models beyond AlphaFold [Paper] [Code]
OpenFold: Retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization [Paper] [Code]
ManyFold: an efficient and flexible library for training and validating protein folding models [Paper] [Code]
ColabFold: making protein folding accessible to all [Paper] [Code]
Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences [Paper] [Code]
ProGen: Language Modeling for Protein Generation [Paper] [Code]
ProtTrans: Towards Cracking the Language of Life's Code Through Self-Supervised Deep Learning and High Performance Computing [Paper] [Code]
Evolutionary-scale prediction of atomic level protein structure with a language model [Paper]
High-resolution de novo structure prediction from primary sequence [Paper] [Code]
Single-sequence protein structure prediction using a language model and deep learning [Paper]
Improved the Protein Complex Prediction with Protein Language Models [Paper]
MSA Transformer [Paper] [Code]
Deciphering antibody affinity maturation with language models and weakly supervised learning [Paper]
xTrimoABFold: De novo Antibody Structure Prediction without MSA [Paper]
scBERT as a large-scale pretrained deep language model for cell type annotation of single-cell RNA-seq data [Paper] [Code]
Interpretable RNA Foundation Model from Unannotated Data for Highly Accurate RNA Structure and Function Predictions [Paper] [Code]
E2Efold-3D: End-to-End Deep Learning Method for accurate de novo RNA 3D Structure Prediction [Paper] [Code]

General Domain

Chatgpt: Optimizing language models for dialogue [Blog]
LLaMA: Open and Efficient Foundation Language Models [Paper]
Scaling Instruction-Finetuned Language Models [Paper]
PaLM: Scaling Language Modeling with Pathways [Paper]
Training Compute-Optimal Large Language Models [Paper]
Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model [Paper]
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model [Paper]
LaMDA: Language Models for Dialog Applications [Paper]
OPT: Open Pre-trained Transformer Language Models [Paper]
Training language models to follow instructions with human feedback [Paper]
Scaling Language Models: Methods, Analysis & Insights from Training Gopher [Paper]
Multitask prompted training enables zero-shot task generalization [Paper]
Language Models are Few-Shot Learners [Paper]
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer [Paper]
RoBERTa: A Robustly Optimized BERT Pretraining Approach [Paper]
Language Models are Unsupervised Multitask Learners [Paper]
Improving language models by retrieving from trillions of tokens [Paper]
WebGPT: Browser-assisted question-answering with human feedback [Paper]
Improving alignment of dialogue agents via targeted human judgements [Paper]
Improving Language Understanding by Generative Pre-Training [Paper]
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding [Paper]

Large Vision Models

Healthcare Domain

Med3d: Transfer learning for 3d medical image analysis [Paper] [Code]
Models genesis: Generic autodidactic models for 3d medical image analysis [Paper] [Code]
MICLe: Big self-supervised models advance medical image classifications [Paper] [Code]
C2l: Comparing to Learn: Surpassing ImageNet Pretraining on Radiographs By Comparing Image Representations [Paper] [Code]
MoCo-CXR: MoCo Pretraining Improves Representation and Transferability of Chest X-ray Models [Paper] [Code]
Transunet: Transformers make strong encoders for medical image segmentation [Paper] [Code]
Transfuse: Fusing transformers and cnns for medical image segmentation [Paper] [Code]
Medical transformer: Gated axial-attention for medical image segmentation [Paper] [Code]
UNETR: Transformers for 3D Medical Image Segmentation [Paper] [Code]
Cotr: Efficiently bridging cnn and transformer for 3d medical image segmentation [Paper] [Code]
Swin-unet: Unet-like pure transformer for medical image segmentation [Paper] [Code]

General Domain

CNNs:

GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism [paper]
Big Transfer (BiT): General Visual Representation Learning [paper]
Designing Network Design Spaces [paper]
Self-supervised Pretraining of Visual Features in the Wild [paper]
EfficientNetV2: Smaller Models and Faster Training [paper]
A ConvNet for the 2020s [paper]
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions [paper]

Vision Transformers:

Generative Pretraining From Pixels [paper]
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale [paper]
Transformer in Transformer [paper]
Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows [paper]
Training data-efficient image transformers & distillation through attention [paper]
Self-supervised Models are Good Teaching Assistants for Vision Transformers [paper]
Scaling Vision with Sparse Mixture of Experts [paper]
Going Deeper With Image Transformers [paper]
Masked Autoencoders Are Scalable Vision Learners [paper]
Swin Transformer V2: Scaling Up Capacity and Resolution [paper]
Scaling Vision Transformers [paper]
Efficient Self-supervised Vision Transformers for Representation Learning [paper]
Scaling Vision Transformers to 22 Billion Parameters [paper]

CNNs + ViTs:

CoAtNet: Marrying Convolution and Attention for All Data Sizes [paper]
LeViT: A Vision Transformer in ConvNet's Clothing for Faster Inference [paper]
ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases [paper]

Large Audio Models

Healthcare Domain

General Domain

wav2vec: Unsupervised Pre-training for Speech Recognition [Paper] [Blog]
W2v-BERT: Combining Contrastive Learning and Masked Language Modeling for Self-Supervised Speech Pre-Training [Paper]
AudioLM: a Language Modeling Approach to Audio Generation [Paper] [Project] [Blog]
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units [Paper] [HuggingFace]
XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale [Paper] [Blog] [HuggingFace]
MusicLM: Generating Music From Text [Paper] [Project] [Code]
Diffsound: Discrete Diffusion Model for Text-to-sound Generation [Paper] [Project] [Code]
AudioGen: Textually Guided Audio Generation [Paper] [Project]
Whisper: Robust Speech Recognition via Large-Scale Weak Supervision [Paper] [Code] [HuggingFace]
Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages [Paper] [Blog]

Large Multi-modal Models

Healthcare Domain

GPT-4 Technical Report [Paper]
Expert-level detection of pathologies from unannotated chest X-ray images via self-supervised learning [Paper]
Contrastive Learning of Medical Visual Representations from Paired Images and Text [Paper] [Code]
Gloria: A multimodal global-local representation learning framework for labelefficient medical image recognition [Paper] [Code]
RAMM: Retrieval-augmented Biomedical Visual Question Answering with Multi-modal Pre-training [Paper]

General Domain

Representation learning:

Learning Transferable Visual Models From Natural Language Supervision [paper]
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision [paper]
Florence: A New Foundation Model for Computer Vision [paper]
Grounded Language-Image Pre-Training [paper]
WenLan: Bridging Vision and Language by Large-Scale Multi-Modal Pre-Training [paper]
FLAVA: A Foundational Language and Vision Alignment Model [paper]
SimVLM: Simple Visual Language Model Pretraining with Weak Supervision [paper]
FILIP: Fine-grained Interactive Language-Image Pre-Training [paper]
Combined Scaling for Open-Vocabulary Image Classification [paper]
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation [paper]
PaLI: A Jointly-Scaled Multilingual Language-Image Model [paper]
Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information [paper]
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models [paper]
Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm [paper]
Language Is Not All You Need: Aligning Perception with Language Models [paper]
PaLM-E: An Embodied Multimodal Language Model [paper]
Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models [paper]

Text-to-image generation:

Zero-Shot Text-to-Image Generation [paper]
High-Resolution Image Synthesis With Latent Diffusion Models [paper]
Hierarchical Text-Conditional Image Generation with CLIP Latents [paper]
GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models [paper]
Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding [paper]
Scaling Autoregressive Models for Content-Rich Text-to-Image Generation [paper]

Applications of Large AI Models in Healthcare

Note that some of the following models were not targeted at healthcare applications initially but may have the potential to be transferred to the healthcare domain or inspire future development.

Bioinformatics

Highly accurate protein structure prediction with AlphaFold [Paper] [Code]
Accurate prediction of protein structures and interactions using a three-track neural network [Paper]
Protein complex prediction with AlphaFold-Multimer [Paper]
FastFold: Reducing AlphaFold Training Time from 11 Days to 67 Hours [Paper] [Code]
HelixFold: An Efficient Implementation of AlphaFold2 using PaddlePaddle [Paper] [Code]
Uni-Fold: An Open-Source Platform for Developing Protein Folding Models beyond AlphaFold [Paper] [Code]
OpenFold: Retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization [Paper] [Code]
ManyFold: an efficient and flexible library for training and validating protein folding models [Paper] [Code]
ColabFold: making protein folding accessible to all [Paper] [Code]
Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences [Paper] [Code]
ProGen: Language Modeling for Protein Generation [Paper] [Code]
ProtTrans: Towards Cracking the Language of Life's Code Through Self-Supervised Deep Learning and High Performance Computing [Paper] [Code]
Evolutionary-scale prediction of atomic level protein structure with a language model [Paper]
High-resolution de novo structure prediction from primary sequence [Paper] [Code]
Single-sequence protein structure prediction using a language model and deep learning [Paper]
Improved the Protein Complex Prediction with Protein Language Models [Paper]
MSA Transformer [Paper] [Code]
Deciphering antibody affinity maturation with language models and weakly supervised learning [Paper]
xTrimoABFold: De novo Antibody Structure Prediction without MSA [Paper]
scBERT as a large-scale pretrained deep language model for cell type annotation of single-cell RNA-seq data [Paper] [Code]
Interpretable RNA Foundation Model from Unannotated Data for Highly Accurate RNA Structure and Function Predictions [Paper] [Code]
E2Efold-3D: End-to-End Deep Learning Method for accurate de novo RNA 3D Structure Prediction [Paper] [Code]
SMILES-BERT: large scale unsupervised pre-training for molecular property prediction [Paper] [Code]
SMILES Transformer: Pre-trained molecular fingerprint for low data drug discovery [Paper] [Code]
MolBert: Molecular representation learning with language models and domain-relevant auxiliary tasks [Paper] [Code]
AGBT: Algebraic graph-assisted bidirectional transformers for molecular property prediction [Paper] [Code]
GROVER: Self-supervised graph transformer on large-scale molecular data [Paper] [Code]
Molgpt: molecular generation using a transformer-decoder model [Paper] [Code]
A Model to Search for Synthesizable Molecules [Paper] [Code]
Transformer neural network for protein-specific de novo drug generation as a machine translation problem [Paper]
Deepconv-dti: Prediction of drug-target interactions via deep learning with convolution on protein sequences [Paper] [Code]
Graphdta: predicting drug–target binding affinity with graph neural networks [Paper] [Code]
Moltrans: molecular interaction transformer for drug–target interaction prediction [Paper] [Code]
Extracting Predictive Representations from Hundreds of Millions of Molecules [Paper] [Code]
ADMETlab 2.0: an integrated online platform for accurate and comprehensive predictions of ADMET properties [Project] [Paper]
MPG: Learn molecular representations from large-scale unlabeled molecules for drug discovery [Paper]
MG-BERT: leveraging unsupervised atomic representation learning for molecular property prediction [Paper] [Code]
PanGu Drug Model: Learn a Molecule Like a Human [Project] [Paper]
DrugBAN: Interpretable bilinear attention network with domain adaptation improves drug–target prediction [Paper] [Code]
DrugOOD: Out-of-Distribution (OOD) Dataset Curator and Benchmark for AI-aided Drug Discovery [Paper] [Code]

Medical Diagnosis and Decision-making

Expert-level detection of pathologies from unannotated chest X-ray images via self-supervised learning [Paper]
ChatCAD: Interactive Computer-Aided Diagnosis on Medical Image using Large Language Models [Paper]
BEHRT: Transformer for Electronic Health Records [Paper]
Med-BERT: pretrained contextualized embeddings on large-scale structured electronic health records for disease prediction [Paper]
RadBERT: Adapting Transformer-based Language Models to Radiology [paper] [HuggingFace]

Medical Imaging and Vision

Expert-level detection of pathologies from unannotated chest X-ray images via self-supervised learning [Paper]
Med3d: Transfer learning for 3d medical image analysis [Paper] [Code]
Models genesis: Generic autodidactic models for 3d medical image analysis [Paper] [Code]
MICLe: Big self-supervised models advance medical image classifications [Paper] [Code]
C2l: Comparing to Learn: Surpassing ImageNet Pretraining on Radiographs By Comparing Image Representations [Paper] [Code]
ConVIRT: Contrastive learning of medical visual representations from paired images and text [Paper] [Code]
Gloria: A multimodal global-local representation learning framework for labelefficient medical image recognition [Paper] [Code]
MoCo-CXR: MoCo Pretraining Improves Representation and Transferability of Chest X-ray Models [Paper] [Code]
Transunet: Transformers make strong encoders for medical image segmentation [Paper] [Code]
Transfuse: Fusing transformers and cnns for medical image segmentation [Paper] [Code]
Medical transformer: Gated axial-attention for medical image segmentation [Paper] [Code]
UNETR: Transformers for 3D Medical Image Segmentation [Paper] [Code]
Cotr: Efficiently bridging cnn and transformer for 3d medical image segmentation [Paper] [Code]
Swin-unet: Unet-like pure transformer for medical image segmentation [Paper] [Code]

Medical Informatics

DeID-GPT: Zero-shot Medical Text De-Identification by GPT-4 [Paper]
Capabilities of GPT-4 on Medical Challenge Problems [Paper]
BioBERT: a pre-trained biomedical language representation model for biomedical text mining [Paper]
Publicly Available Clinical BERT Embeddings [Paper]
BioMegatron: Larger Biomedical Domain Language Model [Paper]
Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks [Paper]
Med-BERT: pretrained contextualized embeddings on large-scale structured electronic health records for disease prediction [Paper]
BioELECTRA:Pretrained Biomedical text Encoder using Discriminators [Paper]
LinkBERT: Pretraining Language Models with Document Links [Paper]
BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining [Paper]
Large Language Models Encode Clinical Knowledge [Paper]
A large language model for electronic health records [Paper]
Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing [Paper]
BEHRT: Transformer for Electronic Health Records [Paper]

Medical Education

GPT-4 Technical Report [Paper]
Empowering Beginners in Bioinformatics with ChatGPT [Paper]

Public Health

Expert-level detection of pathologies from unannotated chest X-ray images via self-supervised learning [Paper]
Clustering Egocentric Images in Passive Dietary Monitoring with Self-Supervised Learning [Paper]
ClimaX: A foundation model for weather and climate [Paper]

Medical Robotics

Decision Transformer: Reinforcement Learning via Sequence Modeling [Paper] [Code]
R3M: A Universal Visual Representation for Robot Manipulation [Paper] [Project] [Code]
MimicPlay: Long-Horizon Imitation Learning by Watching Human Play [Paper] [Project]
PaLM-E: An Embodied Multimodal Language Model [Paper] [Project] [Blog]
A Generalist Agent [Paper] [Blog]
CLIPort: What and Where Pathways for Robotic Manipulation [Paper] [Project] [Code]
Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation [Paper] [Project] [Code]
Do As I Can, Not As I Say: Grounding Language in Robotic Affordances [Paper] [Project] [Code]
VIMA: General Robot Manipulation with Multimodal Prompts [Paper] [Project] [Code]
RT-1: Robotics Transformer for Real-World Control at Scale [Paper] [Project] [Code]
ChatGPT for Robotics: Design Principles and Model Abilities [Paper] [Blog] [Code]

yuanhy1997 / awesome-healthcare-foundation-models Goto Github PK