flag's Introduction

FLAG ICLR23

Molecule Generation For Target Protein Binding With Structural Motifs

Designing ligand molecules that bind to specific protein binding sites is a fundamental problem in structure-based drug design. Although deep generative models and geometric deep learning have made great progress in drug design, existing works either sample in the 2D graph space or fail to generate valid molecules with realistic substructures. To tackle these problems, we propose a Fragment-based LigAnd Generation framework (FLAG) that generates 3D molecules with valid and realistic substructures fragment by fragment. In FLAG, a motif vocabulary is constructed by extracting common molecular fragments (i.e., motifs) from the dataset. At each generation step, a 3D graph neural network first encodes the intermediate context information. Then, our model selects the focal motif, predicts the next motif type, and attaches the new motif. The bond lengths and angles can be quickly and accurately determined by cheminformatics tools. Finally, the molecular geometry is further adjusted according to the predicted rotation angle and the structure refinement. Our model not only achieves competitive performance on conventional metrics such as binding affinity, QED, and SA, but also outperforms baselines by a large margin in generating molecules with realistic substructures.
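The vocabulary step mentioned above can be illustrated with a toy sketch. It assumes each molecule has already been decomposed into motif SMILES strings (the decomposition itself is not shown), and `build_motif_vocab` and `min_count` are hypothetical names chosen for illustration, not taken from the repository.

```python
from collections import Counter

def build_motif_vocab(fragmented_molecules, min_count=1):
    """Map each sufficiently common motif string to an integer index.

    Hypothetical helper: the input is a list of molecules, each given as a
    list of motif SMILES strings produced by some earlier fragmentation step.
    """
    counts = Counter(m for mol in fragmented_molecules for m in mol)
    # Most frequent motifs first; ties are broken alphabetically for determinism.
    kept = sorted(
        (m for m, c in counts.items() if c >= min_count),
        key=lambda m: (-counts[m], m),
    )
    return {motif: index for index, motif in enumerate(kept)}

# Toy input: three molecules that share a benzene ring motif.
toy = [["c1ccccc1", "C=O"], ["c1ccccc1", "CN"], ["c1ccccc1", "C=O"]]
vocab = build_motif_vocab(toy, min_count=2)
```

With `min_count=2`, the motif that appears only once (`CN`) is left out of the vocabulary, which is the kind of situation where a later lookup for that motif would fail.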

Install conda environment via conda yaml file

conda env create -f flag_env.yaml
conda activate flag_env

Datasets

Please refer to README.md in the data folder.

Dataset Preprocessing and motif vocab construction

python build_vocab.py

Training

python train.py

Sampling

python motif_sample.py
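The three commands above (vocabulary construction, training, sampling) are run in that order. A minimal sketch of a wrapper that runs them in sequence and stops at the first failure is given below; `STEPS` and `run_steps` are hypothetical names, and the sketch assumes the scripts are started from the repository root with their default arguments.

```python
import subprocess
import sys

# Script names as listed in this README, in the order they are run.
STEPS = ["build_vocab.py", "train.py", "motif_sample.py"]

def run_steps(steps=STEPS, dry_run=False):
    """Run each script with the current interpreter; stop on the first failure."""
    commands = [[sys.executable, step] for step in steps]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError on a non-zero exit code.
            subprocess.run(cmd, check=True)
    return commands
```

Calling `run_steps(dry_run=True)` only returns the commands that would be executed, which is handy for checking the order before starting a long training run.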

FLAG demo with checkpoints

Demo: https://huggingface.co/spaces/Zaixi/ICLR_FLAG

Checkpoints: https://drive.google.com/drive/folders/1NI-Tl7YzyMsfljEZXaTxbpuiO7lvUBt9?usp=drive_link

Generated Molecules for CrossDocked dataset

The generated molecular structures for 100 protein targets are stored in flag_gen.pt

The index file is test_index.pkl
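The internal layout of these two files is not documented here, so a first step is simply to look at what the index file contains. The sketch below uses only the standard library; `describe_pickle` is a hypothetical helper, and it assumes `test_index.pkl` is an ordinary pickle file. Only unpickle files from a source you trust.

```python
import pickle
from pathlib import Path

def describe_pickle(path):
    """Load a pickle file and return (type name, length or None)."""
    with Path(path).open("rb") as fh:
        obj = pickle.load(fh)
    length = len(obj) if hasattr(obj, "__len__") else None
    return type(obj).__name__, length

# Example call, assuming the file sits in the current directory:
# print(describe_pickle("test_index.pkl"))
```

The `.pt` extension of `flag_gen.pt` suggests a file saved with PyTorch, so `torch.load` is the likely way to open it, but that is an assumption rather than something stated in this README.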

Reference

@inproceedings{
zhang2023molecule,
title={Molecule Generation For Target Protein Binding with Structural Motifs},
author={ZAIXI ZHANG and Shuxin Zheng and Yaosen Min and Qi Liu},
booktitle={International Conference on Learning Representations},
year={2023},
url={https://openreview.net/forum?id=Rq13idF0F73}
}

flag's People

Contributors

minju-hits, zaixizhang

flag's Issues

how to solve the processing data problem

Hello author, thank you very much for your great work. When I run train.py and it processes the data, entries are frequently skipped, to the point that all of the data is skipped. Could you suggest a way to solve this?

dataset missing

Hi author, I'm still having trouble getting the trainer running.
I saw in a closed issue that people reported missing dataset files, but they are still missing.
I have tried everything I could think of and read through the code, but I still cannot find or generate those files, including './data/cross docked_pocket10/index.pt', './data/pdbbind_pocket10/*', and '/n/holyscratch01/mzitnik_lab/zaixizhang/pdbbind_pocket10/index.pt'.
Could you shed some light on this?
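A quick way to see which of the expected inputs are absent before starting a run is sketched below. `missing_paths` is a hypothetical helper, and the entries in `expected` are illustrative: `data/pdbbind_pocket10` is one of the locations named in this issue, and the real list should be taken from your own config files.

```python
from pathlib import Path

def missing_paths(paths, root="."):
    """Return the entries of `paths` that do not exist under `root`."""
    return [p for p in paths if not (Path(root) / p).exists()]

# Illustrative list; replace with the paths your config actually points to.
expected = ["data/pdbbind_pocket10"]
print(missing_paths(expected))
```

An empty list means every listed path exists. It does not confirm that the files inside are complete or in the format the loader expects.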

how to build the dataset

Hi, thank you for the nice work.
May I know how you built dataset files such as pdbbind_pocket10_xxx?
Would you please share your code?

how to solve this training error?

Indexing: 1%|█ | 2200/166398 [00:20<25:42, 106.46it/s]
Traceback (most recent call last):
  File "train.py", line 63, in <module>
    dataset, subsets = get_dataset(config=config.dataset, transform=transform, )
  File "/data/zhouzihan/FLAG-main/utils/datasets/__init__.py", line 11, in get_dataset
    dataset = PocketLigandPairDataset(root, *args, **kwargs)
  File "/data/zhouzihan/FLAG-main/utils/datasets/pl.py", line 64, in __init__
    self._precompute_name2id()
  File "/data/zhouzihan/FLAG-main/utils/datasets/pl.py", line 132, in _precompute_name2id
    data = self.__getitem__(i)
  File "/data/zhouzihan/FLAG-main/utils/datasets/pl.py", line 153, in __getitem__
    data = self.transform(data)
  File "/home/ahmu/ENTER/envs/flag_env/lib/python3.8/site-packages/torch_geometric/transforms/compose.py", line 24, in __call__
    data = transform(data)
  File "/data/zhouzihan/FLAG-main/utils/transforms.py", line 471, in __call__
    bfs_perm, bfs_focal = self.get_bfs_perm_motif(data['moltree'], self.vocab)
  File "/data/zhouzihan/FLAG-main/utils/transforms.py", line 449, in get_bfs_perm_motif
    node.wid = vocab.get_index(node.smiles)
  File "/data/zhouzihan/FLAG-main/utils/mol_tree.py", line 24, in get_index
    return self.vmap[smiles]
KeyError: 'C1CC2CCC(O1)O2'
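The traceback ends in a dictionary lookup, `self.vmap[smiles]`, so the `KeyError` means the motif `C1CC2CCC(O1)O2` has no entry in the loaded vocabulary mapping. One possible cause, stated here as an assumption, is that the vocabulary file was built from different data than the data being indexed. The sketch below shows how such motifs could be listed up front; `find_unknown_motifs` is a hypothetical helper and the `vmap` contents are toy values.

```python
def find_unknown_motifs(motif_smiles, vmap):
    """Return motifs (in first-seen order) that have no entry in `vmap`."""
    seen, unknown = set(), []
    for smiles in motif_smiles:
        if smiles not in vmap and smiles not in seen:
            seen.add(smiles)
            unknown.append(smiles)
    return unknown

# Toy vocabulary mapping: motif SMILES -> index (values are illustrative).
vmap = {"c1ccccc1": 0, "C=O": 1}
unknown = find_unknown_motifs(["C=O", "C1CC2CCC(O1)O2", "C1CC2CCC(O1)O2"], vmap)
print(unknown)
```

Collecting every unknown motif in one pass is more informative than stopping at the first `KeyError`, because it shows whether a single motif or a large part of the vocabulary is missing.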

motif_sample.py

I downloaded the source code and Checkpoints files.
I want to use the motif_sample.py, but it stops with the following error.

$ python motif_sample.py

[2024-03-24 01:11:33,984::sample::INFO] Namespace(config='./configs/sample.yml', data_id=1, device='cuda:0', num_workers=64, outdir='./outputs', vocab_path='vocab.txt')
[2024-03-24 01:11:33,984::sample::INFO] {'dataset': {'name': 'pl', 'path': './data/pdbbind_pocket10', 'split': './data/split_by_name.pt'}, 'model': {'checkpoint': './checkpoints/pretrained.pt', 'hidden_channels': 256, 'random_alpha': False}, 'sample': {'seed': 2024, 'num_samples': 100, 'num_retry': 5, 'max_steps': 12, 'batch_size': 10, 'num_workers': 4, 'n_samples': 5}}
[2024-03-24 01:11:33,984::sample::INFO] Loading data...
Segmentation fault (core dumped)

I checked the code in motif_sample.py and there is an error at line 507 where
data = testset[args.data_id]

Do you have any ideas to solve this?
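A segmentation fault kills the interpreter without a Python traceback, which makes it hard to tell which call inside the dataset access crashed. Python's standard `faulthandler` module can print the Python-level stack when that happens. The sketch below is a generic debugging aid, not a fix, and `enable_crash_traces` is a hypothetical wrapper name.

```python
import faulthandler
import sys

def enable_crash_traces(file=None):
    """Dump Python tracebacks to `file` on fatal signals such as SIGSEGV."""
    target = file if file is not None else sys.stderr
    faulthandler.enable(file=target, all_threads=True)
    return faulthandler.is_enabled()

# Call once near the top of the script, before the dataset is loaded:
# enable_crash_traces()
```

The same handler can be switched on without editing the script by starting it with `python -X faulthandler motif_sample.py`.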

Can you provide your pre-trained model?

Hello author, thank you very much for your great work. May I ask if it is convenient for you to provide your pre-trained model? I saw in the sample.yml that it is the file "./pretrained/model.pt". If you could provide this trained model file, I would be very grateful. Thank you.

enum_assemble not found

The error "cands = enum_assemble(self, neighbors)" on line 91 of the mol_tree.py file in the utils folder is occurring because the method enum_assemble is not defined. I also did not find any import statement for this method. Could you please let me know where I can find this method? Thank you.

Sampling Code Not Working

We are trying to reproduce the results from the original FLAG paper. We have been able to tweak the training code to make it work, but we still bump into some knotty issues during the sampling/generation stage. Following the original instructions from README.md, we start the sampling process by running the following command:

python motif_sample.py

The Python interpreter gives the following error:

Traceback (most recent call last):
  File "motif_sample.py", line 18, in <module>
    from models.maskfill import MaskFillModel
ModuleNotFoundError: No module named 'models.maskfill'

Then I realized that there is no such file as ./models/maskfill.py in the GitHub repo. I searched for the file and found one with the same name in the 3DSBDD repo (https://github.com/luost26/3D-Generative-SBDD/blob/main/models/maskfill.py). However, the arguments of the class's __init__() function do not match. In motif_sample.py:412:

    model = MaskFillModel(
        ckpt['config'].model,
        protein_atom_feature_dim=protein_featurizer.feature_dim,
        ligand_atom_feature_dim=ligand_featurizer.feature_dim,
        vocab=vocab,
        weight=weight,
    ).to(args.device)

The vocab and weight arguments do not exist in the 3DSBDD version of maskfill.py. I assume the FLAG authors made substantial changes to maskfill.py but did not upload it to GitHub. I cannot proceed with reproducing the experiments until the FLAG version of maskfill.py is uploaded.
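Both checks described in this issue (is the module present, and does a candidate class accept the keyword arguments used in motif_sample.py) can be done with the standard library before launching a long run. The sketch below uses hypothetical helper names, `module_available` and `missing_init_params`, and the `Candidate` class is a stand-in for illustration, not the real `MaskFillModel`.

```python
import importlib.util
import inspect

def module_available(name):
    """True if `name` can be located; parent packages are imported to check."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # ImportError covers a missing parent package, e.g. no `models` at all.
        return False

def missing_init_params(cls, required):
    """Return the names in `required` that cls.__init__ does not accept."""
    params = inspect.signature(cls.__init__).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return []  # a **kwargs parameter accepts any keyword
    return [name for name in required if name not in params]

# Stand-in class with a constructor that lacks `vocab` and `weight`.
class Candidate:
    def __init__(self, config, protein_atom_feature_dim, ligand_atom_feature_dim):
        self.config = config

print(module_available("models.maskfill"))
print(missing_init_params(Candidate, ["vocab", "weight"]))
```

A non-empty result from `missing_init_params` indicates the class found is not a drop-in replacement for the one the sampling script expects.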
