edn's Introduction

EDN


Overview

This repository provides a PyTorch implementation of the EDN architecture presented in "Protein model quality assessment using rotation-equivariant transformations on point clouds" (Proteins, arXiv). EDN is an equivariant neural network designed to predict the accuracy of a protein model. We tested EDN in the blind prediction experiment on model quality assessment in CASP 14 (https://predictioncenter.org/casp14/index.cgi).

This document explains how to use the EDN architecture for general training and inference. It also gives specific instructions for training and evaluating a network on protein model datasets from CASP 5-14. Note that the EDN model we used for blind structure prediction in CASP 14 was built with TensorFlow, not with this PyTorch code, and is available as a webserver.

EDN builds on tensor field networks and the PAUL network for protein complexes.

Installation

Create a conda environment

conda create -n edn python=3.9 pip
conda activate edn

Install torch

Install torch and its companion libraries. Set CUDA to the version that matches your system; the commands below are for CUDA 11.7. For a CPU-only installation, use CUDA="cpu".

TORCH="1.13.0"
CUDA="cu117"
pip install torch==${TORCH}+${CUDA} -f https://download.pytorch.org/whl/torch_stable.html
pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-${TORCH}+${CUDA}.html
pip install torch-sparse -f https://pytorch-geometric.com/whl/torch-${TORCH}+${CUDA}.html
pip install torch-geometric

pip install torch-cluster -f https://pytorch-geometric.com/whl/torch-${TORCH}+${CUDA}.html
pip install torch-spline-conv -f https://pytorch-geometric.com/whl/torch-${TORCH}+${CUDA}.html
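
The TORCH and CUDA variables are substituted into both the torch version specifier and the wheel index URL. As a minimal sketch of that substitution (the helper names here are hypothetical and not part of this repository), the strings the commands above resolve to can be reproduced as follows:

```python
def torch_requirement(torch_version: str, cuda_tag: str) -> str:
    """Mirror the shell expression torch==${TORCH}+${CUDA}."""
    return f"torch=={torch_version}+{cuda_tag}"


def wheel_index_url(torch_version: str, cuda_tag: str) -> str:
    """Mirror the shell expression .../whl/torch-${TORCH}+${CUDA}.html."""
    return f"https://pytorch-geometric.com/whl/torch-{torch_version}+{cuda_tag}.html"


if __name__ == "__main__":
    # Values taken from the installation commands above.
    print(torch_requirement("1.13.0", "cu117"))
    print(wheel_index_url("1.13.0", "cu117"))
```

Printing the resolved strings before running pip can help catch a mismatched or empty tag early.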

Install pytorch-lightning and other generic dependencies

pip install pytorch-lightning python-dotenv wandb

Install e3nn

In addition, we need to install an EDN-compatible version of the e3nn library (https://github.com/e3nn/e3nn). Please note that this specific version is provided only for compatibility; further development should be done using the main e3nn branch.

pip install git+ssh://git@github.com/drorlab/e3nn_edn.git

Install atom3d

We use the LMDB data format from Atom3D (https://www.atom3d.ai/) for fast random access. To install the atom3d package:

pip install atom3d
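
Once all packages are installed, a quick check that they can be found may save time before training. The sketch below uses only the standard library. The import names in the list are an assumption about how the pip packages above are exposed (for example, python-dotenv as dotenv), so adjust them if your environment differs.

```python
from importlib.util import find_spec

# Assumed import names for the pip packages installed in the steps above.
EXPECTED_MODULES = [
    "torch",
    "torch_scatter",
    "torch_sparse",
    "torch_geometric",
    "torch_cluster",
    "torch_spline_conv",
    "pytorch_lightning",
    "dotenv",
    "wandb",
    "e3nn",
    "atom3d",
]


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(EXPECTED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All expected modules were found.")
```

This only confirms that the modules can be located. It does not verify versions or CUDA compatibility.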

Usage

Creating an LMDB input dataset

The code expects protein models in LMDB format as input. To convert a set of PDB files (here files in data/pdbs) to the LMDB format, run:

python -m atom3d.datasets data/pdbs example_lmdb -f pdb

Training

To train a network on a CPU using an example LMDB file located at data/example_data, run the following command:

python -m edn.train data/example_data data/example_data --batch_size=2 --accumulate_grad_batches=32 --learning_rate=0.001 --max_epochs=6 --output_dir out/model --num_workers=4
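
The --batch_size and --accumulate_grad_batches values interact. Assuming the accumulation flag is passed through to PyTorch Lightning's gradient accumulation (an assumption about edn.train, not confirmed here), gradients from several small batches are summed before each optimizer step, so the effective batch size is their product. A minimal sketch of that arithmetic, with a hypothetical helper name:

```python
def effective_batch_size(batch_size: int, accumulate_grad_batches: int,
                         num_devices: int = 1) -> int:
    """Number of samples contributing to one optimizer step.

    Assumes plain gradient accumulation: each device processes
    `batch_size` samples per batch and the optimizer steps once every
    `accumulate_grad_batches` batches.
    """
    return batch_size * accumulate_grad_batches * num_devices


if __name__ == "__main__":
    # Values from the training command above: 2 * 32 = 64 samples per step.
    print(effective_batch_size(batch_size=2, accumulate_grad_batches=32))
```

Under this assumption, lowering --batch_size to fit in memory while raising --accumulate_grad_batches by the same factor keeps the effective batch size unchanged.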

Note that training on a CPU is quite slow. To speed it up, consider using a GPU (see "Using a GPU" below).

Inference

To make predictions, the general format is as follows:

python -m edn.predict input_dir checkpoint.ckpt output.csv [--nolabels]

For example, to predict on the example LMDB file included in the repository, using dummy weights:

python -m edn.predict data/example_data data/sample_weights.ckpt output.csv --nolabels

The expected output in output.csv for the above command is shown below (values may fluctuate in the 7th decimal place):

id,target,pred
T0843-Alpha-Gelly-Server_TS3.pdb,0.0000000,0.4594232
T0843-BioSerf_TS1.pdb,0.0000000,0.5363037
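
The CSV output can be post-processed with the standard library. The following sketch reads the id and pred columns shown above and sorts the models by predicted score. It assumes that a higher pred value means a higher predicted accuracy, and the function names are hypothetical.

```python
import csv
import io

# The example output shown above, used here as in-memory sample input.
SAMPLE = """id,target,pred
T0843-Alpha-Gelly-Server_TS3.pdb,0.0000000,0.4594232
T0843-BioSerf_TS1.pdb,0.0000000,0.5363037
"""


def load_predictions(handle):
    """Read (id, pred) pairs from an open CSV file or file-like object."""
    reader = csv.DictReader(handle)
    return [(row["id"], float(row["pred"])) for row in reader]


def rank_by_prediction(predictions):
    """Sort models from highest to lowest predicted score."""
    return sorted(predictions, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for model_id, score in rank_by_prediction(load_predictions(io.StringIO(SAMPLE))):
        print(f"{model_id}\t{score:.7f}")
```

To read a real file instead of the sample string, pass open("output.csv", newline="") to load_predictions.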

Using a GPU

You can enable a GPU with the --gpus flag. We also recommend provisioning additional CPUs for data loading with the --num_workers flag (more is better). The GPU should have at least 12GB of memory. A training example:

python -m edn.train data/example_data data/example_data --batch_size=2 --accumulate_grad_batches=32 --learning_rate=0.001 --max_epochs=6 --output_dir out/model --gpus=1 --num_workers=4

To run inference using a GPU:

python -m edn.predict data/example_data data/sample_weights.ckpt output.csv --nolabels --gpus=1 --num_workers=4

Training and testing EDN on CASP datasets

In order to facilitate training and evaluation on the structural model sets from previous CASP experiments, we make pre-processed CASP 5-14 datasets available in LMDB format. These can be accessed at https://drive.google.com/drive/u/1/folders/1ssJpmdCKcPZo5iQfo2_f5iEwn1Z6zx32. The structural models were downloaded from the official CASP website at https://predictioncenter.org/.

Assuming that the CASP datasets are placed in the data folder, you can train an EDN network on the CASP 5-10 datasets using a GPU with the following command:

python -m edn.train data/casp5_to_10/data/train data/casp5_to_10/data/val --batch_size=2 --accumulate_grad_batches=32 --learning_rate=0.001 --max_epochs=1 --output_dir out/model --gpus=1 --num_workers=4

The output files are written to the folder out/model.

To test the trained network on a CASP dataset (here the stage 2 scoring set from CASP 11) using a GPU, run the following command:

python -m edn.predict data/casp11_stage2/data out/model/checkpoints/last.ckpt output.csv --nolabels --gpus=1 --num_workers=4
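
For model quality assessment, predictions are often compared within each CASP target. The sketch below picks the highest-scoring model per target from rows of output.csv. It assumes that the target ID is the part of the id column before the first hyphen (as in T0843-BioSerf_TS1.pdb) and that a higher pred means a better model. Both are assumptions, and the function names are hypothetical.

```python
import csv


def read_rows(path):
    """Load output.csv rows as dictionaries keyed by column name."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


def target_of(model_id: str) -> str:
    # Assumption: the target ID is everything before the first hyphen.
    return model_id.split("-", 1)[0]


def best_model_per_target(rows):
    """Map each target ID to its (model id, pred) pair with the highest pred."""
    best = {}
    for row in rows:
        target = target_of(row["id"])
        score = float(row["pred"])
        if target not in best or score > best[target][1]:
            best[target] = (row["id"], score)
    return best


if __name__ == "__main__":
    for target, (model_id, score) in sorted(best_model_per_target(read_rows("output.csv")).items()):
        print(f"{target}\t{model_id}\t{score:.7f}")
```

If the dataset uses a different naming scheme for model IDs, only target_of needs to change.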

edn's People

Contributors

psuriana, seismann

edn's Issues

Pretrained model weights

I'm interested in using the model that generated the results in your published work, but it seems that you have not made that available from what I can tell. Is it actually the case that they're not available?
