sc8668 / rtmscore

License: MIT License

Python 99.44% Shell 0.56%

rtmscore's Issues

Additive terms in MDN outputs

RTMScore/RTMScore/model/model2.py

Line 531 in 79475c4

sigma = F.elu(self.z_sigma(C))+1.1

Hello, thanks for sharing such great code!

What is the meaning of the +1.1 and +1 in the outputs for sigma and mu in the mixture density network? Is it some kind of prior knowledge that you incorporate into the model, or simply a form of numerical regularization?

Thanks in advance for your answer.
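
For reference, the numerical effect of these offsets can be checked directly. ELU with alpha = 1 is bounded below by -1, so adding 1 keeps the output non-negative and adding 1.1 keeps it at or above roughly 0.1. The sketch below reimplements ELU in plain Python (a stand-in for torch.nn.functional.elu, so it runs without PyTorch) to show those bounds; the helper names are hypothetical, and it says nothing about whether the author intended the offsets as a prior.

```python
import math

def elu(x: float, alpha: float = 1.0) -> float:
    """Plain-Python ELU: x for x > 0, alpha * (exp(x) - 1) otherwise."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

def sigma_from_logit(z: float) -> float:
    # Mirrors the quoted line: sigma = F.elu(...) + 1.1, so sigma stays above ~0.1.
    return elu(z) + 1.1

def mu_from_logit(z: float) -> float:
    # Mirrors the "+1" the issue mentions for mu: mu > 0 mathematically
    # (it can round to exactly 0.0 in floating point for very negative z).
    return elu(z) + 1.0

for z in (-5.0, -1.0, 0.0, 2.0):
    print(z, sigma_from_logit(z), mu_from_logit(z))
```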

The AUC values presented in Table 6 and Figure 6A seem to be inconsistent.

The AUC value for RTMScore1 shown in Table 6 is 0.83, whereas in Figure 6A, the AUC appears to be below 0.8. These two results seem to be inconsistent.
Did I overlook any details?

Best regards.

data preprocess script for training on new dataset

Thank you for your nice work. I want to train the model on my own dataset. Could you please provide the script used to build the graphs from PDB files?

Some of the ligands in CASF core set cannot be read by RDKit successfully

Hi Chao Shen, thanks for your work! I tried to evaluate RTMScore on the CASF benchmark and found that RDKit cannot read some of the ligands in the CASF core set, so the subsequent steps fail. If I add sanitize=False to Chem.MolFromMol2Block and Chem.MolFromMolBlock in RTMScore/RTMScore/data/data.py (lines 149 and 153), RDKit can read those ligands, but I noticed that this slightly changes the final prediction. Have you encountered this problem, and if so, how did you handle it? Thanks.

sanity checking on new target

hi Chao Shen,

i am creating another issue as this is different from the other one.

in your paper you mentioned that the performance varies depending on the protein. in real-world use-cases, we want to screen ligands against a specific target. so, there's uncertainty about whether the method will do well.

do you have any recommendations for how we can sanity check whether RTMScore can be used reliably for a new protein?

thanks
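
One generic way to approach this kind of check, offered here only as a sketch and not as the author's recommendation, is a retrospective test: score a handful of known actives and presumed inactives (or decoys) for the new target and see whether the scores separate them. The function below computes ROC AUC from two score lists in plain Python. The example scores are made up, and the code assumes that a higher score means a better-ranked ligand.

```python
def roc_auc(active_scores, decoy_scores):
    """ROC AUC as the probability that a random active outscores a random decoy.

    Ties count as half a win. Assumes higher score = better (an assumption here).
    """
    if not active_scores or not decoy_scores:
        raise ValueError("need at least one active and one decoy")
    wins = 0.0
    for a in active_scores:
        for d in decoy_scores:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(active_scores) * len(decoy_scores))

# Made-up scores for illustration only.
actives = [52.1, 47.3, 60.8]
decoys = [30.2, 49.0, 28.7, 35.5]
print(f"AUC = {roc_auc(actives, decoys):.3f}")
```

An AUC near 0.5 on such a set would suggest the scores carry little signal for that target, while a value well above 0.5 is at least encouraging.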

Pretrained models - question

Hi, what is the difference between the three pretrained models provided in the folder?

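How should RTMScore scores be interpreted?

Hello, and thank you for developing and open-sourcing RTMScore! I am using RTMScore to rescore my poses and now need to select the best-scoring results for further analysis. All of the scores are positive. Does a larger value mean a better score, or does a smaller value? I work on screening and could not follow the methods section, so I would appreciate a reply.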

Meaning of the distance threshold

RTMScore/RTMScore/model/utils.py

Line 525 in 79475c4

mdn = mdn[th.where(dist <= model.dist_threhold)[0]]

Hi, what is the meaning of the distance threshold in the loss? Does it imply that distances are modeled only within the 7 Angstrom range?
Just out of curiosity, what happens if that threshold is increased?

RTMScore/scripts/train_model.py

Line 34 in 79475c4

args["dist_threhold"] = 7.

Thanks in advance for your time in answering.
Best,
P.
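
The quoted indexing line keeps only the entries whose distance is within the threshold. A plain-Python sketch of that masking step is below. The helper mask_by_distance and the sample values are hypothetical, and the real code operates on torch tensors rather than lists.

```python
def mask_by_distance(values, dists, threshold=7.0):
    """Keep values[i] only where dists[i] <= threshold (mirrors the th.where indexing)."""
    if len(values) != len(dists):
        raise ValueError("values and dists must have the same length")
    return [v for v, d in zip(values, dists) if d <= threshold]

# Hypothetical per-pair terms and ligand-protein distances in Angstrom.
pair_terms = [0.9, 0.4, 0.7, 0.1]
pair_dists = [3.2, 6.9, 7.0, 11.5]

print(mask_by_distance(pair_terms, pair_dists))        # default threshold of 7.0
print(mask_by_distance(pair_terms, pair_dists, 12.0))  # larger threshold
```

In this toy example, raising the threshold simply lets the longer-range pair through the filter. How that affects training or scoring in RTMScore is a separate question that the sketch does not address.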

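Environment issues

Sorry, this has been quite frustrating. requirements_pip.txt fails because I do not have those files locally, and requirements_conda.txt does not install because some packages cannot be found. So I installed the packages one at a time based on the errors from the sh script, but I ended up with version conflicts and it still does not run. Could you please provide a working requirements.txt? Thank you! I would really like to try RTM.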

Graph files on zenodo

Hi, thank you again for sharing the code.
I am trying to use the train_model.py script along with the dataset downloaded from https://zenodo.org/record/6623202#.Y0aQwtJBxH4. The files required are not there, e.g. v2020_train_p.bin. Can you help on this?

Thank you in advance,

Pietro

How to Batch Process RTMScore for Multiple Models in PDB and SDF Files

Hello, I've been exploring RTMScore because of its remarkable performance as a scoring function for protein-ligand docking, and I'm trying to apply it to a project of mine. However, I'm facing a challenge: I have a protein PDB file and a ligand SDF file, each containing 3000 models, with each protein model corresponding to one ligand model.
The issue I'm encountering is the inefficiency in processing each model. Currently, I'm extracting each model as a temporary, individual protein-ligand system and then running RTMScore for scoring. This method is proving to be extremely time-consuming.
Is there a way to directly score all the models in the entire protein PDB and ligand SDF files without having to extract and score each model pair one by one? Any solution or method that could streamline this process and improve efficiency would be incredibly helpful.
Thank you for your assistance and looking forward to any suggestions you may have.
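
Whether RTMScore can take multi-model files directly is not answered here. As a partial workaround, the pairing step can at least be done in memory rather than through one temporary file per model. The sketch below splits PDB text on MODEL/ENDMDL records and SDF text on the $$$$ delimiter, then pairs them by position. The function names are hypothetical, and the scoring call is left out because it depends on how RTMScore is invoked.

```python
def split_pdb_models(pdb_text):
    """Return one string per MODEL ... ENDMDL block of a multi-model PDB."""
    models, current = [], None
    for line in pdb_text.splitlines():
        if line.startswith("MODEL"):
            current = []
        elif line.startswith("ENDMDL"):
            if current is not None:
                models.append("\n".join(current) + "\n")
            current = None
        elif current is not None:
            current.append(line)
    return models

def split_sdf_records(sdf_text):
    """Return one string per molecule record, each ending with its '$$$$' line."""
    records, current = [], []
    for line in sdf_text.splitlines():
        current.append(line)
        if line.strip() == "$$$$":
            records.append("\n".join(current) + "\n")
            current = []
    return records

def pair_models(pdb_text, sdf_text):
    """Pair the i-th protein model with the i-th ligand record."""
    proteins = split_pdb_models(pdb_text)
    ligands = split_sdf_records(sdf_text)
    if len(proteins) != len(ligands):
        raise ValueError(f"{len(proteins)} protein models vs {len(ligands)} ligands")
    return list(zip(proteins, ligands))
```

Each (protein, ligand) pair can then be handed to whatever loading or scoring routine is in use, so the files are parsed once instead of 3000 times.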

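Score is 0 when testing with a new pocket and ligand

Hello, following the steps I can fully reproduce the scoring result for the test protein 1qkt and its ligand provided in the source code. However, when I test with a new protein pocket file and ligand file, the score is always 0. Could you point out possible causes? Thank you very much for this meaningful work!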

Question about csaf2016_docking.py

I'm confused about csaf2016_docking.py: does it compute scores only for the protein-ligand complexes in the CASF core set and the PDBbind-v2020 dataset?

How should we preprocess .pdbqt files before using RTMScore?

We are considering using RTMScore to rank the different results given by AutoDock Vina (the dataset is PDBbind). Unfortunately, Vina always outputs a .pdbqt file, which cannot be used directly as input to RTMScore.

Could you please advise us on what to do? We have already used Open Babel to convert .pdbqt to .sdf, but we ran into a lot of confusing bugs.

Thank you!
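
For reference, a basic Open Babel conversion from PDBQT to SDF looks like the sketch below. It only builds and optionally runs the obabel command line; the function and file names are placeholders. Since PDBQT does not store bond orders, the converted ligands may still need checking before they are passed to RTMScore.

```python
import shutil
import subprocess

def obabel_pdbqt_to_sdf_cmd(pdbqt_path, sdf_path):
    """Build the Open Babel command that converts a PDBQT file to SDF."""
    return ["obabel", "-ipdbqt", pdbqt_path, "-osdf", "-O", sdf_path]

def convert(pdbqt_path, sdf_path):
    """Run the conversion; raises if obabel is missing or exits with an error."""
    if shutil.which("obabel") is None:
        raise RuntimeError("obabel executable not found on PATH")
    subprocess.run(obabel_pdbqt_to_sdf_cmd(pdbqt_path, sdf_path), check=True)

# Placeholder file names for illustration.
print(" ".join(obabel_pdbqt_to_sdf_cmd("vina_out.pdbqt", "vina_out.sdf")))
```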

Could you provide raw data of pdbbind2020?

In the paper, the PDB files are prepared with Schrodinger 2020. Could you please provide the prepared raw PDB data, along with the PDB IDs of the training and validation splits?

By the way, have you considered the angle between ligand atoms and residue atoms, or specific non-covalent interactions such as hydrogen bonds, salt bridges, and hydrophobic interactions?

Thanks for your wonderful work.

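CASF docking and screening power

Hello, thank you for open-sourcing RTMScore; it is great. Your paper shows that it has high docking and screening power, and I wanted to reproduce your results on CASF. Unfortunately, I could not, possibly because I made a mistake somewhere. Could you tell me how you processed the decoys_docking and decoys_screening data in CASF-2016? If you could share your CASF test data, I would be very grateful. Thank you.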

rdkit cannot load some mol2 files

Dear Dr. Shen, thank you for sharing the source code of RTMScore. I'm trying to process the decoys_docking set of CASF-2016, but I found that some mol2 files cannot be read by RDKit (they fail sanitization, and executing "ligand_mol2 = Chem.MolFromMol2File(ligand_mol2_path, removeHs=True)" returns None). Is there a way to solve this problem, or did you just skip the mol2 files that could not be loaded?
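
The "skip" option raised in the question can be sketched as follows: record the files for which the loader returns None and carry on with the rest. This is an illustration only, not what the author did. load_ligands is a hypothetical helper; in practice the loader argument would be something like Chem.MolFromMol2File, replaced here by a stand-in so the example runs without RDKit.

```python
def load_ligands(paths, loader):
    """Apply loader to each path; return (loaded, failed_paths).

    A path counts as failed if the loader returns None or raises.
    """
    loaded, failed = {}, []
    for path in paths:
        try:
            mol = loader(path)
        except Exception:
            mol = None
        if mol is None:
            failed.append(path)
        else:
            loaded[path] = mol
    return loaded, failed

def fake_loader(path):
    # Stand-in for an RDKit reader: pretend files named "bad*" fail to parse.
    return None if path.startswith("bad") else f"mol<{path}>"

loaded, failed = load_ligands(["a.mol2", "bad1.mol2", "b.mol2"], fake_loader)
print(f"loaded {len(loaded)}, skipped {failed}")
```

Keeping the list of failed paths makes it possible to report exactly which decoys were excluded from the benchmark.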

integrating w/ docking

hello chao shen,

i randomly chanced upon your repo last night and i read your paper. it is well written and i'm interested to explore your method.

have you explored the use of RTMScore to directly generate the docking pose (rather than using the docking program's method, eg monte-carlo sampling for Autodock Vina)?
alternatively, use RTMScore as direct scoring function instead of the docking program's scoring function? or perhaps some combination of the two scoring functions?

or do you think it is better to dock with the established programs and then use RTMScore to re-score the docked poses? my concern is the computation time since there's some redundant/duplicate work being done

thanks

Error creating the conda env

Hi there.
I have an issue creating the conda env based on your docs.
Many of the dependent libraries are not found in the conda repo.
If I create a new conda env and install the libraries manually, it fails; for example, the script cannot import RTMScore from the directory.

Please drop me a line when you are available.

Gracias.