ostrokach / proteinsolver

Graph neural network for generating novel amino acid sequences that fold into proteins with predetermined topologies.

Home Page: http://proteinsolver.org

License: MIT License

Topics: protein-sequence, protein-structure, protein-design, graph-neural-networks, protein, bioinformatics, structural-biology

ProteinSolver

Description

ProteinSolver is a deep neural network that learns to solve (ill-defined) constraint satisfaction problems (CSPs) from training data. It has shown promising results both on a toy problem, learning how to solve Sudoku puzzles, and on a real-world problem, designing protein sequences that fold into a predetermined geometric shape.
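To make the CSP framing concrete, a Sudoku puzzle can be viewed as a graph in which each of the 81 cells is a node and every pair of cells that share a row, column, or 3×3 box is joined by an edge (the constraint that the two cells must hold different digits). The sketch below builds that edge list in plain Python. The function name `sudoku_edges` is hypothetical, and whether ProteinSolver uses exactly this edge set internally is an assumption here, not something stated above.

```python
from typing import List, Tuple


def sudoku_edges(n: int = 3) -> List[Tuple[int, int]]:
    """Return directed edges (i, j) between cells that constrain each other.

    Cells are numbered row-major from 0 to (n*n)**2 - 1. Two distinct cells
    are connected if they share a row, a column, or an n-by-n box.
    """
    size = n * n  # 9 for a standard Sudoku
    edges = []
    for r1 in range(size):
        for c1 in range(size):
            for r2 in range(size):
                for c2 in range(size):
                    if (r1, c1) == (r2, c2):
                        continue  # no self-loops
                    same_row = r1 == r2
                    same_col = c1 == c2
                    same_box = (r1 // n, c1 // n) == (r2 // n, c2 // n)
                    if same_row or same_col or same_box:
                        edges.append((r1 * size + c1, r2 * size + c2))
    return edges


edges = sudoku_edges()
print(len(edges))  # 81 cells x 20 peers each = 1620 directed edges
```

Each cell has 8 peers in its row, 8 in its column, and 4 more in its box that are in neither, so every node ends up with exactly 20 neighbours.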

Demo notebooks

The following notebooks can be used to explore the basic functionality of proteinsolver.

  • 20_sudoku_demo.ipynb: Use a pre-trained network to solve a single Sudoku puzzle.
  • 06_sudoku_analysis.ipynb: Evaluate a network trained to solve Sudoku puzzles using the validation and test datasets. This notebook is resource-intensive and is best run on a machine with a GPU.
  • 20_protein_demo.ipynb: Use a pre-trained network to design sequences for a single protein geometry.
  • 06_protein_analysis.ipynb: Evaluate a network trained to reconstruct protein sequences using the validation and test datasets. This notebook is resource-intensive and is best run on a machine with a GPU.

Each of these notebooks can also be launched on MyBinder.
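The protein notebooks start from a fixed protein geometry. One common way to hand such a geometry to a graph neural network is to treat residues as nodes and connect residues whose positions lie within a distance cutoff. The sketch below builds such an edge list from coordinates. The 12 Å cutoff, the use of one coordinate per residue (for example its Cα atom), and the name `contact_edges` are illustrative assumptions, not ProteinSolver's documented preprocessing.

```python
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def contact_edges(coords: Sequence[Coord], cutoff: float = 12.0) -> List[Tuple[int, int, float]]:
    """Return directed edges (i, j, distance) for residue pairs at most `cutoff` angstroms apart.

    `coords` holds one (x, y, z) position per residue (assumed: the C-alpha atom).
    """
    edges = []
    for i, a in enumerate(coords):
        for j, b in enumerate(coords):
            if i == j:
                continue  # no self-loops
            dist = sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5  # Euclidean distance
            if dist <= cutoff:
                edges.append((i, j, dist))
    return edges


# Three hypothetical residues: two neighbours 3.8 Å apart and one far away.
print(contact_edges([(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (20.0, 0.0, 0.0)]))
```

The pairwise loop is O(N²), which is fine for a single protein; larger inputs would typically use a spatial index instead.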

Other notebooks in the notebooks/ directory show how to perform more extensive validations of the networks and how to train new networks.

Docker images

Docker images with all required dependencies are provided at: https://gitlab.com/ostrokach/proteinsolver/container_registry.

To evaluate a proteinsolver network from a Jupyter notebook, run the following command, which starts a Jupyter Notebook server inside the container and exposes it on port 8000:

docker run -it --rm -p 8000:8000 registry.gitlab.com/ostrokach/proteinsolver:v0.1.25 jupyter notebook --ip 0.0.0.0 --port 8000

Installation

We recommend installing proteinsolver into a clean conda environment using the following command:

conda create -n proteinsolver -c pytorch -c conda-forge -c kimlab -c ostrokach-forge proteinsolver
conda activate proteinsolver
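After activating the environment, a quick way to confirm that the core dependencies resolved is to check whether Python can locate each module without importing it. The module list below is an assumption: `torch_geometric` and `torch_sparse` appear because proteinsolver pulls them in on import (see the traceback under Issues), and the helper name `find_missing` is hypothetical. Note that `find_spec` only checks that a module can be found, so it will not catch a package that is present but fails to load.

```python
import importlib.util
from typing import Iterable, List

# Assumed list of top-level modules to check.
REQUIRED = ("torch", "torch_geometric", "torch_sparse", "proteinsolver")


def find_missing(names: Iterable[str] = REQUIRED) -> List[str]:
    """Return the names of top-level modules that cannot be found on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing()
    print("missing:", ", ".join(missing) if missing else "none")
```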

Development

First, use conda to install proteinsolver into a new conda environment. This will also install all dependencies.

conda create -n proteinsolver -c pytorch -c conda-forge -c kimlab -c ostrokach-forge proteinsolver
conda activate proteinsolver

Second, run pip install --editable . inside the root directory of this package. This will force Python to use the development version of our code.

cd path/to/proteinsolver
pip install --editable .
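To confirm that the editable install is in effect, check which file Python would load for `proteinsolver`; it should point inside your checkout rather than into the environment's `site-packages`. A minimal sketch using only the standard library (the helper name `module_origin` is hypothetical):

```python
import importlib.util
from pathlib import Path
from typing import Optional


def module_origin(name: str) -> Optional[Path]:
    """Return the file a top-level module would be loaded from, or None.

    None is returned if the module is not found or has no file location
    (e.g. built-in modules such as `sys`).
    """
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.has_location or spec.origin is None:
        return None
    return Path(spec.origin).resolve()


if __name__ == "__main__":
    # After `pip install --editable .` this should be under path/to/proteinsolver.
    print(module_origin("proteinsolver"))
```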

Pre-trained models

Pre-trained models can be downloaded using wget by running the following command in the root folder of the proteinsolver repository:

wget -r -nH --cut-dirs 1 --reject "index.html*" "http://models.proteinsolver.org/v0.1/"
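In this command, `-nH` stops wget from creating a directory named after the host, and `--cut-dirs 1` drops the first remote path component (`v0.1`), so the files land directly under the current directory. The sketch below reproduces that path mapping in Python to show where a given file URL will be saved; the helper name `wget_local_path` and the file name in the usage line are hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def wget_local_path(url: str, cut_dirs: int = 0) -> str:
    """Mimic where `wget -r -nH --cut-dirs N` saves a file URL (relative to the cwd)."""
    parts = PurePosixPath(urlparse(url).path).parts[1:]  # drop the leading "/"
    *dirs, filename = parts
    return str(PurePosixPath(*dirs[cut_dirs:], filename))


# Hypothetical file name, used only to illustrate the mapping.
print(wget_local_path("http://models.proteinsolver.org/v0.1/example/model.state", cut_dirs=1))
# → example/model.state
```

Without `--cut-dirs`, as in the dataset download command further below, the full remote path is kept under the current directory.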

For an example of how to use pre-trained ProteinSolver models in downstream applications (such as mutation ΔΔG prediction), see the elaspic/elaspic2 repository, in particular the src/elaspic2/plugins/proteinsolver module.

Training and validation datasets

Data used to train and validate the "proteinsolver" network to solve Sudoku puzzles and reconstruct protein sequences can be downloaded from http://deep-protein-gen.data.proteinsolver.org/:

wget -r -nH --reject "index.html*" "http://deep-protein-gen.data.proteinsolver.org/"

The generation of the training and validation datasets was carried out in our predecessor project: ostrokach/protein-adjacency-net.

Environment variables

  • DATAPKG_DATA_DIR - Location of training and validation data.
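Code that loads the training data can resolve its location from `DATAPKG_DATA_DIR`. A minimal sketch, assuming a fallback to a `data` folder in the current directory when the variable is unset or empty; the fallback and the helper name `data_dir` are assumptions, not documented proteinsolver behavior.

```python
import os
from pathlib import Path


def data_dir(default: str = "data") -> Path:
    """Return the directory holding training and validation data.

    Reads DATAPKG_DATA_DIR; falls back to `default` (an assumed convention)
    if the variable is unset or empty.
    """
    value = os.environ.get("DATAPKG_DATA_DIR", "")
    return Path(value or default).expanduser()


if __name__ == "__main__":
    print(data_dir())
```

For example, run `export DATAPKG_DATA_DIR=/path/to/data` in the shell before launching Jupyter, where `/path/to/data` is a placeholder for wherever the datasets were downloaded.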

Acknowledgements

References

  • Strokach A, Becerra D, Corbi-Verge C, Perez-Riba A, Kim PM. Fast and flexible protein design using deep graph neural networks. Cell Systems (2020); 11: 1–10. doi: 10.1016/j.cels.2020.08.016

proteinsolver's Issues

proteinsolver tutorial

A tool without a tutorial is like a gun without a bullet. Where can I access your ProteinSolver tutorial?

Provide requirements list

Hi,
I am trying to run a protocol inspired by the example 20_protein_demo.ipynb for a few backbones. I followed the installation instructions, but not all requirements are readily available.
For example, if I try to import proteinsolver:

import proteinsolver
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/projects/p31346/cassie/proteinsolver/proteinsolver/__init__.py", line 4, in <module>
    from . import *
  File "/projects/p31346/cassie/proteinsolver/proteinsolver/utils/__init__.py", line 3, in <module>
    from .scatter import *
  File "/projects/p31346/cassie/proteinsolver/proteinsolver/utils/scatter.py", line 1, in <module>
    import torch_geometric
ModuleNotFoundError: No module named 'torch_geometric'

If I install torch_geometric with

pip install torch_geometric

and then try to import proteinsolver again:

import proteinsolver
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/projects/p31346/cassie/proteinsolver/proteinsolver/__init__.py", line 4, in <module>
    from . import *
  File "/projects/p31346/cassie/proteinsolver/proteinsolver/utils/__init__.py", line 3, in <module>
    from .scatter import *
  File "/projects/p31346/cassie/proteinsolver/proteinsolver/utils/scatter.py", line 1, in <module>
    import torch_geometric
  File "/home/ajf4103/anaconda3/envs/proteinsolver/lib/python3.7/site-packages/torch_geometric/__init__.py", line 4, in <module>
    import torch_geometric.data
  File "/home/ajf4103/anaconda3/envs/proteinsolver/lib/python3.7/site-packages/torch_geometric/data/__init__.py", line 1, in <module>
    from .data import Data
  File "/home/ajf4103/anaconda3/envs/proteinsolver/lib/python3.7/site-packages/torch_geometric/data/data.py", line 9, in <module>
    from torch_sparse import SparseTensor
ModuleNotFoundError: No module named 'torch_sparse'

When I try to install torch_sparse, I run into an error that I couldn't solve:

Collecting torch_sparse
  Using cached torch_sparse-0.6.14.tar.gz (51 kB)
  Preparing metadata (setup.py) ... done
... many lines ...

error: command 'gcc' failed with exit status 1
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: legacy-install-failure

× Encountered error while trying to install package.
╰─> torch_sparse


note: This is an issue with the package mentioned above, not pip.
hint: See above for output from the failure.

Would it be possible to provide a requirements file listing the packages and their versions?
Thanks,
Állan

How do I get the data?

Hi,
I read the paper related to proteinsolver, and I've tried to run the proteinsolver demos with jupyter-notebook. But I don't know how to get dat...
How do I get the data for running jupyter-notebook?
