Giter VIP home page Giter VIP logo

sprof's Introduction

SPROF: To Improve Protein Sequence Profile Prediction through Image Captioning on Pairwise Residue Distance Map

Getting Start

These instructions will get you a copy of the project up and running on your local machine.

Environment

SPROF has been implemented in Python3.

Requirements

Installing requirements:

pip3 install -r requirements.txt

or avoiding problems in multiple Python environments:

python3 -m pip install -r requirements.txt

if you want to generate features by 'raw_feature_generate.py', install dssp by

apt-get install dssp

Data Preprocess

This repository has include the CASP13 test set data on 'input/casp_test/', 'target/casp_test/' and raw features on 'raw_features/', raw pdbs on 'raw_pdbs/'.

if you want to generate other data for training/test:

Firstly, install dssp by 'apt-get install dssp' and deposite your pdbs(the examples of pdb in correct format are on 'raw_pdbs/') on 'raw_pdbs/'.

Secondly you should make a list file namd 'xxx.txt' (for example) which contains all your training/test pdbs' name.

Thirdly use the 'raw_feature_generate.py' to generate raw features:

usage: raw_feature_generate.py [-h] [--preprocess_list PREPROCESS_LIST]
--preprocess_list PREPROCESS_LIST
                        The path of your preprocess list.

The generated raw-features will be deposit on 'raw_features', use 'preprocess.py' to generate features for model training/test:

usage: preprocess.py [-h] [--preprocess_list  PREPROCESS_LIST] 
                [--pdb_path PDB_PATH] [--features_path FEATURES_PATH]
                [--input_path INPUT_PATH] [--target_path TARGET_PATH]

optional arguments:
  -h, --help            show this help message and exit
  --preprocess_list PREPROCESS_LIST
                        he path of a preprocess pdb list.
  --pdb_path PDB_PATH
                        the path of pdb
  --features_path FEATURES_PATH
                        the path of features
  --input_path INPUT_PATH
                        the path of input to save
  --target_path TARGET_PATH
                        the path of target to save

The generated input&target data for training/test will be deposit on the input_path and target_path.

Training

Command line:

usage: train.py [-h] [--batch_size BATCH_SIZE] [--learning_rate LEARNING_RATE]
                [--maximum_epoch MAXIMUM_EPOCH]
                [--sequential_features SEQUENTIAL_FEATURES]
                [--pairwise_features PAIRWISE_FEATURES]
                [--target_output TARGET_OUTPUT] 
                [--train_list TRAIN_LIST] [--models_name MODELS_NAME]

optional arguments:
  -h, --help            show this help message and exit
  --learning_rate LEARNING_RATE
                        The learning rate of ADAM optimization.
  --maximum_epoch MAXIMUM_EPOCH
                        The maximum epoch of training
  --sequential_features SEQUENTIAL_FEATURES
                        The full path of sequential features of training data.
  --pairwise_features PAIRWISE_FEATURES
                        The full path of pairwise features of training data.
  --target_output TARGET_OUTPUT
                        The full path of the target outputs.
  --train_list TRAIN_LIST
                        The full path of the train list.
  --models_name MODELS_NAME
                        The name of models.

Start Training:

The input data of training is too large to upload. To get these data, please refer to 'Data Preprocess' and use the 'train_list' as preprocess_list, if any problem, contact [email protected].

Example of training:

  • Using default paramters to train:
    python3 train.py 
  • Using different paramters to train:
    python3 train.py --learning_rate=0.0005 --maximum_epoch=40 --sequential_features='input/casp_test/sequential_features' --pairwise_features='input/casp_test/pairwise_features' --target_output='target/casp_test' --train_list='train_list' --models_name='models'

Then you can obtain several models and save them in the your models_name directory.

Testing

Command line:

usage: test.py [-h] [--sequential_features SEQUENTIAL_FEATURES]
               [--pairwise_features PAIRWISE_FEATURES]
               [--target_output TARGET_OUTPUT] [--feature_list FEATURE_LIST]
               [--models_path MODELS_PATH]

optional arguments:
  -h, --help            show this help message and exit
  --sequential_features SEQUENTIAL_FEATURES
                        The full path of sequential features of test data.
  --pairwise_features PAIRWISE_FEATURES
                        The full path of pairwise features of test data.
  --target_output TARGET_OUTPUT
                        The full path of the target outputs.
  --feature_list FEATURE_LIST
                        The full path of the test list.
  --models_path MODELS_path
                        parent path of your models.

Usage:

The CASP13 test set has been included in the repository, so you can test our model's perpormance easily.

  • Using default paramters to test:
    python3 test.py
  • Using different paramters test
    python3 test.py --sequential_features='input/casp_test/sequential_features' --pairwise_features='input/casp_test/pairwise_features' --target_output='target/casp_test' --train_list='train_list' --models_path='models'

Cite

If you find this work useful in your research, please consider citing the paper:
"To Improve Protein Sequence Profile Prediction through Image Captioning on Pairwise Residue Distance Map" https://pubs.acs.org/doi/10.1021/acs.jcim.9b00438

Contact

[email protected]or [email protected]

sprof's People

Contributors

yuedongyang avatar sysucs avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.