MISATO - Machine learning dataset of protein-ligand complexes for structure-based drug discovery

🌎 Where we are:

  • Quantum Mechanics: 19443 ligands, curated and refined
  • Molecular Dynamics: 16972 simulated protein-ligand structures, 10 ns each
  • AI: PyTorch dataloaders, 3 baseline models for MD, QM and binding affinity prediction

Vision:

We are a drug discovery community project 🤗

  • highest possible accuracy for ligand molecules
  • represent the systems' dynamics on reasonable timescales
  • innovative AI models for drug discovery predictions

Let's crack the 100+ ns MD, 30000+ protein-ligand structures and a whole new world of AI models for drug discovery together.

Check out the paper!

💜 Community

Want to get hands-on for drug discovery using AI?

Join our discord server!

Check out our Hugging Face spaces to run and visualize the adaptability model and to perform QM property predictions.

📌 Introduction

This repository shows how to download the MISATO database and use it with AI models. You can access the calculated properties of the protein-ligand structures and use them for training with PyTorch-based dataloaders. A small sample of the dataset is included in the repo.

You can freely download the FULL MISATO dataset from Zenodo using the links below:

  • MD (133 GiB)
  • QM (0.3 GiB)
  • electronic densities (6 GiB)
  • MD restart and topology files (55 GiB)

To fetch the MD and QM files from the command line:

wget -O data/MD/h5_files/MD.hdf5 https://zenodo.org/record/7711953/files/MD.hdf5
wget -O data/QM/h5_files/QM.hdf5 https://zenodo.org/record/7711953/files/QM.hdf5
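
Where wget is not available, the same two downloads can be sketched in Python with the standard library only. The URLs and target paths are copied from the wget commands above; the helper names `download_url` and `download` are hypothetical and not part of the MISATO code.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Base URL and target paths taken from the wget commands above.
ZENODO_BASE = "https://zenodo.org/record/7711953/files"
TARGETS = {
    "MD.hdf5": Path("data/MD/h5_files/MD.hdf5"),
    "QM.hdf5": Path("data/QM/h5_files/QM.hdf5"),
}


def download_url(filename: str) -> str:
    """Build the Zenodo URL for one dataset file."""
    return f"{ZENODO_BASE}/{filename}"


def download(filename: str, root: Path = Path(".")) -> Path:
    """Download one dataset file unless it already exists locally.

    Returns the local path of the file.
    """
    target = Path(root) / TARGETS[filename]
    if target.exists():
        # Skip the transfer if the file is already in place.
        return target
    target.parent.mkdir(parents=True, exist_ok=True)
    urlretrieve(download_url(filename), str(target))
    return target
```

Calling `download("QM.hdf5")` from the repository root fetches the small QM file first, which is a cheap way to test the setup before starting the much larger MD download.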

Start with the notebook src/getting_started.ipynb to:

  • Understand the structure of our dataset and how to access each molecule's properties.
  • Load the PyTorch Dataloaders of each dataset.
  • Load the PyTorch Lightning DataModules of each dataset.
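
Before opening the notebook, it can help to glance at what an HDF5 file contains. The following is a minimal sketch, not the notebook's own code: it makes no assumption about the group or dataset names inside the MISATO files and simply walks whatever hierarchy it finds. The function names `iter_datasets` and `print_datasets` are hypothetical; h5py is imported lazily so the walker itself also works on plain nested dictionaries.

```python
from itertools import islice


def iter_datasets(node, prefix=""):
    """Yield (path, shape) for every leaf below `node`.

    h5py.File and h5py.Group expose .items() like a dict, so anything
    with .items() is treated as a group and everything else as a dataset.
    """
    for name, child in node.items():
        path = f"{prefix}/{name}"
        if hasattr(child, "items"):
            yield from iter_datasets(child, path)
        else:
            yield path, getattr(child, "shape", None)


def print_datasets(h5_path, limit=20):
    """Print the first `limit` datasets found in an HDF5 file."""
    import h5py  # lazy import: only needed when reading a real file

    with h5py.File(h5_path, "r") as handle:
        for path, shape in islice(iter_datasets(handle), limit):
            print(path, shape)
```

For example, `print_datasets("data/QM/h5_files/QM.hdf5")` lists the first 20 datasets of the downloaded QM file with their shapes. The `limit` keeps the output short, since the full files hold thousands of structures.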

🚀 Quickstart

We recommend pulling our MISATO image from DockerHub or building your own image (see docker/). The images use CUDA 11.8. We recommend installing CUDA 11.8 or later on your own system to ensure that the drivers work correctly.
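
The CUDA recommendation above can be turned into a small check. This is a sketch under assumptions: the helper names are hypothetical, and the version string has to come from your own system, for example from the `nvidia-smi` output or from `torch.version.cuda` (which reports the CUDA version PyTorch was built against, not the driver).

```python
MIN_CUDA = (11, 8)  # CUDA version used by the MISATO images


def parse_version(text):
    """Turn a string such as '11.8' or '12.1.105' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def cuda_is_recent_enough(version, minimum=MIN_CUDA):
    """Return True if `version` is at least `minimum`.

    `version` is a string such as '12.1'; None or '' counts as
    "no CUDA detected" and returns False.
    """
    if not version:
        return False
    # Compare only as many components as the minimum specifies.
    return parse_version(version)[: len(minimum)] >= minimum
```

With PyTorch installed, `cuda_is_recent_enough(torch.version.cuda)` gives a quick yes/no answer for the build in the current environment.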

# clone project
git clone https://github.com/t7morgen/misato-dataset.git
cd misato-dataset

For singularity use:

# get the container image
singularity pull misato.sif docker://sab148/misato-dataset
singularity shell misato.sif

For docker use:

sudo docker pull sab148/misato-dataset:latest
bash docker/run_bash_in_container.sh

Project Structure

├── data                   <- Project data
│   ├── MD
│   │   ├── h5_files           <- storage of dataset
│   │   └── splits             <- train, val, test splits
│   └── QM
│       ├── h5_files           <- storage of dataset
│       └── splits             <- train, val, test splits
│
├── src                    <- Source code
│   ├── data
│   │   ├── components           <- Datasets and transforms
│   │   ├── md_datamodule.py     <- MD Lightning data module
│   │   ├── qm_datamodule.py     <- QM Lightning data module
│   │   │
│   │   └── processing           <- Scripts for preprocessing, inference and conversion
│   │      ├── ...
│   ├── getting_started.ipynb     <- notebook: how to load data and interact with it
│   └── inference.ipynb           <- notebook: how to run inference
│
├── docker                    <- Dockerfile and execution script
└── README.md
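
To confirm that a checkout matches the layout above before launching the notebooks, a short check along these lines may help. It is only a sketch: the path list is transcribed from the tree, and `missing_paths` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path

# Paths transcribed from the project tree above.
EXPECTED = [
    "data/MD/h5_files",
    "data/MD/splits",
    "data/QM/h5_files",
    "data/QM/splits",
    "src/data/components",
    "src/data/md_datamodule.py",
    "src/data/qm_datamodule.py",
    "src/data/processing",
    "src/getting_started.ipynb",
    "src/inference.ipynb",
    "docker",
]


def missing_paths(root="."):
    """Return the expected paths that do not exist below `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]
```

Running `missing_paths()` from the repository root returns an empty list when everything is in place; otherwise it names the directories or files that still need to be created or downloaded.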



Installation using your own conda environment

If you prefer your own conda installation, create a new misato environment.

To install PyTorch Geometric, we recommend using pip (within conda) and following the official installation instructions: pytorch-geometric/install

The instructions vary with your CUDA version. The example below is for CUDA 11.8.

conda create --name misato python=3
conda activate misato
conda install -c anaconda pandas pip h5py
pip3 install torch --index-url https://download.pytorch.org/whl/cu118 --no-cache
pip install joblib matplotlib
pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.0.0+cu118.html
pip install pytorch-lightning==1.8.3
pip install torch-geometric
pip install ipykernel==5.5.5 ipywidgets==7.6.3 nglview==2.7.7
conda install -c conda-forge nb_conda_kernels
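
After the installation, a quick way to see whether the environment is complete is to check that the main packages can be found. This sketch uses only the standard library; the module list is an assumption derived from the install commands above (import names, not pip names), and `missing_modules` is a hypothetical helper.

```python
from importlib.util import find_spec

# Import names corresponding to the packages installed above.
REQUIRED = [
    "torch",
    "torch_geometric",
    "pytorch_lightning",
    "h5py",
    "pandas",
    "joblib",
    "matplotlib",
    "nglview",
]


def missing_modules(names=REQUIRED):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]
```

Run inside the activated misato environment, `missing_modules()` should return an empty list. The check only confirms that the packages are discoverable, not that their versions match each other.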

To run inference for MD, you need AmberTools. We recommend installing it in a separate conda environment.

conda create --name ambertools python=3
conda activate ambertools
conda install -c conda-forge ambertools nb_conda_kernels
pip install h5py jupyter ipykernel==5.5.5 ipywidgets==7.6.3 nglview==2.7.7

Citation

If you found this work useful, please consider citing the article.

