- Quantum Mechanics: 19443 ligands, curated and refined
- Molecular Dynamics: 16972 simulated protein-ligand structures, 10 ns each
- AI: PyTorch dataloaders, 3 baseline models for MD, QM, and binding affinity prediction
We are a drug discovery community project.
- highest possible accuracy for ligand molecules
- represent the system's dynamics on reasonable timescales
- innovative AI models for drug discovery predictions
Let's crack 100+ ns MD, 30000+ protein-ligand structures, and a whole new world of AI models for drug discovery together.
Want to get hands-on for drug discovery using AI?
Check out our Hugging Face spaces to run and visualize the adaptability model and to perform QM property predictions.
In this repository, we show how to download the MISATO database and apply it to AI models. You can access the calculated properties of different protein-ligand structures and use them for training with PyTorch-based dataloaders. We provide a small sample of the dataset along with the repo.
You can freely download the FULL MISATO dataset from Zenodo using the links below:
- MD (133 GiB)
- QM (0.3 GiB)
- electronic densities (6 GiB)
- MD restart and topology files (55 GiB)
wget -O data/MD/h5_files/MD.hdf5 https://zenodo.org/record/7711953/files/MD.hdf5
wget -O data/QM/h5_files/QM.hdf5 https://zenodo.org/record/7711953/files/QM.hdf5
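The `wget -O` commands above write into `data/MD/h5_files` and `data/QM/h5_files`, and `wget` does not create missing parent directories for an `-O` target. The following is a minimal Python sketch, assuming the repository root is the working directory, that creates those directories and reports which files are present. The names `TARGETS`, `prepare_dirs`, and `report` are illustrative and not part of the MISATO code base.

```python
from pathlib import Path

# Target paths and URLs copied from the wget commands above.
TARGETS = {
    "data/MD/h5_files/MD.hdf5": "https://zenodo.org/record/7711953/files/MD.hdf5",
    "data/QM/h5_files/QM.hdf5": "https://zenodo.org/record/7711953/files/QM.hdf5",
}


def prepare_dirs(root="."):
    """Create the parent directory of every download target under root."""
    created = []
    for rel_path in TARGETS:
        parent = (Path(root) / rel_path).parent
        parent.mkdir(parents=True, exist_ok=True)
        created.append(parent)
    return created


def report(root="."):
    """Return {relative path: size in bytes, or None if the file is missing}."""
    sizes = {}
    for rel_path in TARGETS:
        path = Path(root) / rel_path
        sizes[rel_path] = path.stat().st_size if path.is_file() else None
    return sizes
```

Calling `prepare_dirs()` before the downloads and `report()` afterwards gives a quick check that both files landed where the dataloaders expect them. The MD file is large (133 GiB), so a size far below that after download likely indicates an interrupted transfer.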
Start with the notebook src/getting_started.ipynb to:
- Understand the structure of our dataset and how to access each molecule's properties.
- Load the PyTorch Dataloaders of each dataset.
- Load the PyTorch Lightning DataModules of each dataset.
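One way to get a first look at the structure of an HDF5 file is a small recursive walker. The sketch below is not taken from the notebook. It relies only on two facts: `h5py` groups behave like dictionaries (they support `.items()`), and datasets expose a `.shape` attribute. The function name `describe`, the stand-in class `FakeDataset`, and all keys and shapes in the toy example are made up for illustration; the real MISATO key names are not assumed here.

```python
def describe(node, name="/", depth=0, max_children=3):
    """Return text lines summarising a nested, dictionary-like container.

    Only the first max_children entries of each level are descended into,
    so files with thousands of top-level groups stay readable.
    """
    indent = "  " * depth
    if hasattr(node, "items"):  # group-like: list and descend into children
        children = list(node.items())
        lines = [f"{indent}{name} ({len(children)} entries)"]
        for key, child in children[:max_children]:
            lines.extend(describe(child, key, depth + 1, max_children))
        if len(children) > max_children:
            lines.append(f"{indent}  ... {len(children) - max_children} more")
        return lines
    # leaf: report the shape if there is one, otherwise the Python type
    shape = getattr(node, "shape", None)
    detail = f"shape={shape}" if shape is not None else type(node).__name__
    return [f"{indent}{name}: {detail}"]


class FakeDataset:
    """Stand-in for an HDF5 dataset; only carries a shape."""

    def __init__(self, shape):
        self.shape = shape


# Toy data with invented keys and shapes, used only to exercise describe().
toy = {
    "entry_a": {"coords": FakeDataset((100, 42, 3)), "energies": FakeDataset((100,))},
    "entry_b": {"coords": FakeDataset((100, 57, 3)), "energies": FakeDataset((100,))},
}
print("\n".join(describe(toy)))

# With h5py installed, the same call should work on a real file:
# import h5py
# with h5py.File("data/QM/h5_files/QM.hdf5", "r") as f:
#     print("\n".join(describe(f)))
```

Limiting the number of children per level matters here because the datasets contain thousands of structures, and printing every entry would bury the layout.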
We recommend pulling our MISATO image from DockerHub or creating your own image (see docker/). The images use CUDA version 11.8. We recommend installing CUDA 11.8 or later on your own system to ensure that the drivers work correctly.
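To check the CUDA requirement on a host, one option is to read the `CUDA Version` field that `nvidia-smi` prints in its header, which is the highest CUDA version the installed driver supports. The sketch below parses that field and compares it numerically against 11.8 (a plain string comparison would wrongly rank "11.10" below "11.8"). The helper names and the `SAMPLE` header line are illustrative assumptions, not part of the MISATO tooling.

```python
import re
import shutil
import subprocess

MINIMUM = (11, 8)  # CUDA version used by the MISATO images

# Illustrative header line; real output differs per machine.
SAMPLE = "| NVIDIA-SMI 535.104.05   Driver Version: 535.104.05   CUDA Version: 12.2 |"


def parse_cuda_version(smi_output):
    """Extract 'CUDA Version: X.Y' from nvidia-smi text as a tuple of ints."""
    match = re.search(r"CUDA Version:\s*(\d+(?:\.\d+)*)", smi_output)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def driver_supports_minimum(smi_output, minimum=MINIMUM):
    """True if the reported CUDA version is at least the minimum."""
    version = parse_cuda_version(smi_output)
    return version is not None and version >= minimum


def query_nvidia_smi():
    """Return nvidia-smi's text output, or None if the tool is not on PATH."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi"], capture_output=True, text=True, check=False
    )
    return result.stdout


output = query_nvidia_smi()
if output is None:
    print("nvidia-smi not found; cannot determine the driver's CUDA version")
else:
    print("Driver supports CUDA >= 11.8:", driver_supports_minimum(output))
```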
# clone project
git clone https://github.com/t7morgen/misato-dataset.git
cd misato-dataset
For Singularity, use:
# get the container image
singularity pull misato.sif docker://sab148/misato-dataset
singularity shell misato.sif
For Docker, use:
sudo docker pull sab148/misato-dataset:latest
bash docker/run_bash_in_container.sh
├── data                          <- Project data
│   ├── MD
│   │   ├── h5_files              <- storage of dataset
│   │   └── splits                <- train, val, test splits
│   └── QM
│       ├── h5_files              <- storage of dataset
│       └── splits                <- train, val, test splits
│
├── src                           <- Source code
│   ├── data
│   │   ├── components            <- Datasets and transforms
│   │   ├── md_datamodule.py      <- MD Lightning data module
│   │   ├── qm_datamodule.py      <- QM Lightning data module
│   │   │
│   │   └── processing            <- Scripts for preprocessing, inference and conversion
│   │       ├── ...
│   ├── getting_started.ipynb     <- notebook: how to load data and interact with it
│   └── inference.ipynb           <- notebook: how to run inference
│
├── docker                        <- Dockerfile and execution script
└── README.md
If you prefer to use conda for your own installation, create a new misato environment.
To install PyTorch Geometric, we recommend using pip (within conda) and following the official installation instructions: pytorch-geometric/install
The instructions vary depending on your CUDA version. The example below is for CUDA 11.8.
conda create --name misato python=3
conda activate misato
conda install -c anaconda pandas pip h5py
pip3 install torch --index-url https://download.pytorch.org/whl/cu118 --no-cache
pip install joblib matplotlib
pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.0.0+cu118.html
pip install pytorch-lightning==1.8.3
pip install torch-geometric
pip install ipykernel==5.5.5 ipywidgets==7.6.3 nglview==2.7.7
conda install -c conda-forge nb_conda_kernels
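After running the commands above, a quick way to confirm that the environment is complete is to ask Python whether it can locate each package. This is a minimal sketch using only the standard library; the list of import names is derived from the install commands (note that `pytorch-lightning` and `torch-geometric` are imported as `pytorch_lightning` and `torch_geometric`), and `check_modules` is a hypothetical helper name.

```python
from importlib import util

# Import names of the packages installed by the commands above.
MODULES = [
    "pandas",
    "h5py",
    "torch",
    "joblib",
    "matplotlib",
    "pytorch_lightning",
    "torch_geometric",
    "nglview",
]


def check_modules(names=MODULES):
    """Map each import name to True if Python can locate it.

    find_spec only searches for the module; it does not import it,
    so the check is fast and does not initialise CUDA.
    """
    return {name: util.find_spec(name) is not None for name in names}


for name, found in check_modules().items():
    print(f"{name:20s} {'ok' if found else 'MISSING'}")
```

Run it inside the activated misato environment. A `MISSING` entry points to the install command that needs to be repeated.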
To run inference for MD, you need to install AmberTools. We recommend installing it in a separate conda environment.
conda create --name ambertools python=3
conda activate ambertools
conda install -c conda-forge ambertools nb_conda_kernels
pip install h5py jupyter ipykernel==5.5.5 ipywidgets==7.6.3 nglview==2.7.7
If you found this work useful please consider citing the article.