D-GCAN is a deep learning method, based on a graph convolutional attention network, that predicts drug-likeness directly from molecular structures. The model combines the advantages of graph convolution and attention mechanisms. D-GCAN is a promising tool for selecting potential candidates and accelerating drug discovery by excluding unpromising candidates and avoiding unnecessary biological and clinical testing.
Drug-likeness has been widely used as a criterion to distinguish drug-like molecules from non-drugs. Developing reliable computational methods to predict the drug-likeness of candidate compounds is crucial for triaging unpromising molecules and accelerating the drug discovery process.
conda install -c conda-forge rdkit
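Before running the scripts, it can help to confirm that RDKit is visible to the active Python environment. The snippet below is a minimal sketch that uses only the standard library; the helper name `has_module` is hypothetical and is not part of D-GCAN.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module with this name can be located."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if has_module("rdkit"):
        print("RDKit is available.")
    else:
        print("RDKit not found; run the conda command above in this environment.")
```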
The Discussion folder contains the scripts for evaluating the classification performance. We compared several common methods widely used in drug-likeness prediction, such as GNN, RF, CNN, SVC, and GPC.
If you want to retrain the model, put the SMILES files of the molecules into the data directory and run D-GCAN. The test set can be replaced by changing the path. Retraining the model before predicting is recommended; the process takes less than 15 minutes. It is as simple as:
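As a rough illustration of how binary classifiers such as these can be compared, the sketch below computes common metrics from true and predicted labels. Which metrics the scripts in the Discussion folder actually report is not stated here, so the selection (accuracy, sensitivity, specificity, precision, MCC) and the function names are assumptions.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = drug-like, 0 = non-drug)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_report(y_true, y_pred):
    """Return a dict of metrics; a ratio with a zero denominator is reported as 0.0."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)

    def ratio(num, den):
        return num / den if den else 0.0

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


if __name__ == "__main__":
    # Toy labels for illustration only, not results from the paper.
    print(classification_report([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1]))
```

The same function can be applied to the predictions of each method so that all models are scored on identical terms.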
import train

test = train.train(
    '../dataset/bRo5.txt',  # test set; change this path to use another one
    radius=1,
    dim=52,
    layer_hidden=4,
    layer_output=10,
    dropout=0.45,
    batch_train=8,
    batch_test=8,
    lr=3e-4,
    lr_decay=0.85,
    decay_interval=25,
    iteration=140,
    N=5000,
    dataset_train='../dataset/data_train.txt',  # training set
)
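To give a feel for how `lr`, `lr_decay`, and `decay_interval` might interact, the sketch below assumes a plain step-decay schedule in which the learning rate is multiplied by `lr_decay` once every `decay_interval` epochs. How D-GCAN applies these parameters internally is not documented here, so treat the schedule and the function name `step_decay_lr` as assumptions.

```python
def step_decay_lr(epoch, lr=3e-4, lr_decay=0.85, decay_interval=25):
    """Learning rate at a 0-based epoch under an assumed step-decay schedule."""
    # Integer division counts how many full decay intervals have elapsed.
    return lr * lr_decay ** (epoch // decay_interval)


if __name__ == "__main__":
    # Print the rate at a few epochs within a 140-epoch run.
    for epoch in (0, 24, 25, 50, 139):
        print(epoch, f"{step_decay_lr(epoch):.3e}")
```

Under this assumption, a smaller `decay_interval` or `lr_decay` shrinks the learning rate faster over the 140 iterations.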
To predict the drug-likeness of unknown molecules, proceed as follows:
import predict

test = predict.predict(
    '../dataset/bRo5.txt',
    radius=1,
    property=True,  # True if drug-likeness is known
    dim=52,
    layer_hidden=4,
    layer_output=10,
    dropout=0.45,
    batch_train=8,
    batch_test=8,
    lr=3e-4,
    lr_decay=0.85,
    decay_interval=25,
    iteration=140,
    N=5000,
)
Alternatively, run run.py and modify the hyperparameters of the neural network to optimize the model.
The D-GCAN-screened GDB-13 database (S-GDB13) is a more drug-like database and can be used to find new drug candidates.
As described in the paper, the prediction of drug-likeness is strongly influenced by the dataset, especially the negative set. If necessary, retrain the model on your own dataset.
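Because the composition of the dataset matters, a quick look at the balance between positive and negative examples can be useful before retraining. The sketch below assumes that each line of a dataset file holds a SMILES string followed by whitespace and a label; this format and the helper `label_counts` are assumptions, so adapt the parsing to your actual files.

```python
from collections import Counter


def label_counts(lines):
    """Count labels in lines of the assumed form '<SMILES> <label>'."""
    counts = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or unlabeled lines
        counts[parts[-1]] += 1
    return counts


if __name__ == "__main__":
    # Toy molecules and labels for illustration only.
    example = ["CCO 1", "c1ccccc1 0", "CC(=O)O 1", ""]
    print(label_counts(example))
```

To check a real file, pass an open file handle instead of the list, for example `label_counts(open(path))` with `path` pointing at your own dataset.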
Jinyu Sun E-mail: [email protected]
@article{10.1093/bioinformatics/btac676,
author = {Sun, Jinyu and Wen, Ming and Wang, Huabei and Ruan, Yuezhe and Yang, Qiong and Kang, Xiao and Zhang, Hailiang and Zhang, Zhimin and Lu, Hongmei},
title = "{Prediction of Drug-likeness using Graph Convolutional Attention Network}",
journal = {Bioinformatics},
year = {2022},
month = {10}
}