abdulsalam-bande / swifty


This project aims to speed up molecular docking. Docking a ligand to a target protein normally relies on complex scoring functions and is often slow. This work instead uses neural networks to model ligands against target proteins and predict whether they are active.


swifty's Introduction

Swift Dock 🚀

In this study, we explored various machine learning (ML) models for predicting the docking scores of ligands against specific target proteins, with the aim of reducing the need for extensive docking calculations. Our primary goal is to find a regression model that can predict the docking scores of the ligands in a chemical library for a given target protein. The model is trained on data from explicit docking of a small subset of the molecules.

Among the ML models:

  • 🧠 An LSTM-based neural network (an architecture common in natural language processing tasks such as speech recognition), combined with an attention mechanism to extract information from the ligand representation. This model is implemented in PyTorch.
  • 🌳 XGBoost, Decision Tree Regression, and Stochastic Gradient Descent regression, using the XGBoost and scikit-learn libraries.

Setting up the Environment 🛠️

  1. Ensure Python 3.7 is installed 🐍
  2. Create a virtual environment and execute pip install -r requirements.txt 📦
  3. Navigate to 'swifty' and run sudo chmod -R 777 logs 📑

Setting up the Environment - Apple Silicon 🍎

  1. Ensure Python 3.8 is installed 🐍
  2. Create a virtual environment and execute pip install -r apple-silcon-requirements.txt 📦
  3. Navigate to 'swifty' and run sudo chmod -R 777 logs 📑
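
The two setups above differ only in the Python version and the requirements file. As a minimal sketch, the choice could be automated as follows. The function names are hypothetical, and detecting Apple Silicon through platform.system() and platform.machine() is an assumption that holds only for a natively running interpreter. The requirements file name is spelled exactly as in the step above.

```python
import platform
import sys


def pick_setup(system: str, machine: str):
    """Return (required (major, minor) Python version, requirements file).

    Assumption: Apple Silicon is reported as system "Darwin" with
    machine "arm64"; every other platform uses the default setup.
    """
    if system == "Darwin" and machine == "arm64":
        return (3, 8), "apple-silcon-requirements.txt"
    return (3, 7), "requirements.txt"


def interpreter_matches(required, actual=None) -> bool:
    """Check whether the running interpreter has the required major.minor."""
    if actual is None:
        actual = tuple(sys.version_info[:2])
    return tuple(actual) == tuple(required)


required, requirements_file = pick_setup(platform.system(), platform.machine())
print(f"Expected Python {required[0]}.{required[1]}; "
      f"install with: pip install -r {requirements_file}")
if not interpreter_matches(required):
    print("Note: the running interpreter has a different version.")
```

This only reports which file to install and whether the interpreter matches; creating the virtual environment and running pip remain manual steps.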

Training Using LSTM 🧠

Build & Validate 🛠️

  1. Add your target to the 'dataset' folder. Follow the format in sample_input.csv.
  2. Example: Suppose you want to train the LSTM model on sample_input with the mac descriptor and a training set size of 50, without cross-validation. First, navigate to src/models, then run the command below. Note: the available descriptors are mac, onehot, and morgan_onehot_mac.

Command

python main_lstm.py --input sample_input --descriptors mac --training_sizes 50 --cross_validation False 

Command Format

python main_lstm.py --input <YOUR_INPUT_FILE> --descriptors <DESCRIPTOR> --training_sizes <TRAINING_SIZE> --cross_validation <CROSS_VALIDATION> 

This produces a result directory with five categories. Each file name follows the format lstm_target_descriptor_training_size.

  • project_info: Details such as training size and durations.
  • serialized_models: The trained model, saved after training.
  • test_predictions: Each docking score and the corresponding model prediction.
  • testing_metrics: Metrics from testing, such as R-squared and mean absolute error.
  • validation_metrics: Metrics from 5-fold cross-validation (only if --cross_validation True).
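
To make the testing metrics concrete, the sketch below computes mean absolute error and R-squared using their standard definitions. It is not taken from the project code, and the docking scores and predictions are made-up illustrative numbers, not results.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between docking scores and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Illustrative values only (hypothetical docking scores and predictions).
docking_scores = [-7.0, -8.0, -9.0, -10.0]
predictions = [-7.5, -8.0, -8.5, -10.0]

print("MAE:", mean_absolute_error(docking_scores, predictions))
print("R-squared:", r_squared(docking_scores, predictions))
```

A lower mean absolute error and an R-squared closer to 1 both indicate that the predictions track the docking scores more closely.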

More examples

  1. Training Using Multiple Descriptors
python main_lstm.py --input sample_input --descriptors mac morgan_onehot_mac --training_sizes 50 --cross_validation False
  2. Training Using Multiple Descriptors and Multiple Training set sizes
python main_lstm.py --input sample_input --descriptors mac morgan_onehot_mac --training_sizes 50 100 --cross_validation False
  3. Training Using Multiple Descriptors, Multiple Training set sizes and Multiple Targets
python main_lstm.py --input sample_input sample_input_2 --descriptors mac morgan_onehot_mac --training_sizes 50 100 --cross_validation False
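
The multi-value commands above can be read as one training run per combination of target, descriptor, and training size. The sketch below is a hypothetical mirror of the command-line flags, not the actual parser in main_lstm.py, and it assumes that runs are expanded as a full cross product and named with the lstm_target_descriptor_training_size format. It also parses --cross_validation explicitly, because Python's bool("False") evaluates to True.

```python
import argparse
from itertools import product


def str_to_bool(value: str) -> bool:
    """Parse 'True'/'False' text explicitly instead of relying on bool()."""
    lowered = value.strip().lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected True or False, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser that accepts the flags shown in the commands above."""
    parser = argparse.ArgumentParser(description="Sketch of the main_lstm.py flags")
    parser.add_argument("--input", nargs="+", required=True)
    parser.add_argument("--descriptors", nargs="+", required=True,
                        choices=["mac", "onehot", "morgan_onehot_mac"])
    parser.add_argument("--training_sizes", nargs="+", type=int, required=True)
    parser.add_argument("--cross_validation", type=str_to_bool, default=False)
    return parser


def expand_runs(args):
    """One identifier per (target, descriptor, training size) combination."""
    return [f"lstm_{target}_{descriptor}_{size}"
            for target, descriptor, size
            in product(args.input, args.descriptors, args.training_sizes)]


command_line = ("--input sample_input sample_input_2 "
                "--descriptors mac morgan_onehot_mac "
                "--training_sizes 50 100 --cross_validation False")
args = build_parser().parse_args(command_line.split())
runs = expand_runs(args)
for name in runs:
    print(name)
```

Under these assumptions, two targets, two descriptors, and two training sizes expand to eight runs. The order in which the real script iterates over them may differ.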

Making Predictions with LSTM 🎯

Run

python lstm_inference.py --input_file <YOUR_INPUT_FILE> --output_dir <YOUR_OUTPUT_DIRECTORY> --model_name <YOUR_MODEL_NAME>

Ensure that <YOUR_INPUT_FILE> follows the format of molecules_for_prediction.csv in the 'dataset' folder. Example:

python lstm_inference.py --input_file molecules_for_prediction.csv --output_dir prediction_results --model_name lstm_target_mac_50_model.pt

Training Using other models (from scikit-learn) 🌳

  1. Add your target to the 'dataset' folder. It should match the format of sample_input.csv.
  2. Run this command to prepare the dataset:

Example

python create_fingerprint_data.py --input sample_input --descriptors mac

Command Format

python create_fingerprint_data.py --input <YOUR_INPUT_FILE> --descriptors <DESCRIPTOR>

More examples for creating the datasets

Create a dataset for training using multiple descriptors

python create_fingerprint_data.py --input sample_input --descriptors mac morgan_onehot_mac

  3. Run this to train
python main_ml.py --input sample_input --descriptors mac --training_sizes 50 --regressor sgreg

Command Format

python main_ml.py --input <YOUR_INPUT_FILE> --descriptors <DESCRIPTOR> --training_sizes <TRAINING_SIZE> --regressor <REGRESSOR>

Note: All possible descriptors are mac, morgan_onehot_mac and onehot. All possible regressors are sgreg, xgboost and decision_tree.

More examples

  1. Training Using Multiple Descriptors
python main_ml.py --input sample_input --descriptors mac morgan_onehot_mac --training_sizes 50 --regressor sgreg
  2. Training Using Multiple Descriptors and Multiple Training set sizes
python main_ml.py --input sample_input --descriptors mac morgan_onehot_mac --training_sizes 50 100 --regressor sgreg
  3. Training Using Multiple Descriptors, Multiple Training set sizes and Multiple Models
python main_ml.py --input sample_input --descriptors mac morgan_onehot_mac --training_sizes 50 100 --regressor sgreg xgboost

This produces a result directory with categories and file formats similar to those described in the LSTM section.
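
For these models, dataset preparation has to run before training. As a minimal sketch, the two steps can be assembled into command lists and run in order. The helper names are hypothetical, and the src/models working directory is an assumption carried over from the LSTM section. Only flags that appear in the commands above are used.

```python
import subprocess
import sys


def build_pipeline(input_name, descriptors, training_sizes, regressors):
    """Return the prepare and train commands as argument lists, in run order."""
    prepare = [sys.executable, "create_fingerprint_data.py",
               "--input", input_name,
               "--descriptors", *descriptors]
    train = [sys.executable, "main_ml.py",
             "--input", input_name,
             "--descriptors", *descriptors,
             "--training_sizes", *map(str, training_sizes),
             "--regressor", *regressors]
    return [prepare, train]


def run_pipeline(commands, cwd="src/models"):
    """Run each command in order and stop at the first failure.

    Assumption: both scripts are run from src/models.
    """
    for command in commands:
        subprocess.run(command, cwd=cwd, check=True)


commands = build_pipeline("sample_input", ["mac", "morgan_onehot_mac"],
                          [50, 100], ["sgreg", "xgboost"])
for command in commands:
    # Print the commands instead of running them; call run_pipeline(commands)
    # from the repository root to execute them.
    print("python", " ".join(command[1:]))
```

Using sys.executable rather than a bare python keeps both steps inside the active virtual environment, and check=True prevents training from starting if dataset preparation fails.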

Making Predictions with other Models 🎯

  1. Your input CSV should match the format of molecules_for_prediction.csv in the 'dataset' folder.
  2. Run
python other_models_inference.py --input_file <YOUR_INPUT_FILE> --output_dir <YOUR_OUTPUT_DIRECTORY> --model_name <YOUR_MODEL_NAME>


swifty's People

Contributors

abdulsalam-bande, badays


swifty's Issues

log file missing

While running the command sudo chmod -R 777 logs, I got the error message "No such file or directory". Can you please specify the log file location?
