smi2sdf3d's Introduction

smi2sdf3d

Generation of diverse 3D conformers using RDKit.

Algorithm and parameters come from:

@article{DBLP:journals/jcisd/EbejerMD12,
  author    = {Jean{-}Paul Ebejer and Garrett M. Morris and
               Charlotte M. Deane},
  title     = {Freely Available Conformer Generation Methods:
               How Good Are They?},
  journal   = {Journal of Chemical Information and Modeling},
  volume    = {52},
  number    = {5},
  pages     = {1146--1158},
  year      = {2012},
  url       = {https://doi.org/10.1021/ci2004658},
  doi       = {10.1021/ci2004658},
}

smi2sdf3d's People

Contributors

gedeck, unixjunkie

smi2sdf3d's Issues

try rdkit's ETKDG

This would allow dropping the force-field (FF) optimization pass, if I understand correctly.
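
For reference, a minimal sketch of what ETKDG embedding could look like with RDKit's Python API. The helper name `embed_etkdg`, the seed, and the conformer count are illustrative assumptions and not code from this repository. Whether the FF pass can really be dropped without losing accuracy would still need to be checked.

```python
def embed_etkdg(smiles, n_confs=10, seed=42):
    """Embed up to n_confs 3D conformers with ETKDG.

    Hypothetical helper for illustration; returns (mol, list of conformer ids).
    """
    # Imported lazily so that loading this sketch does not itself require RDKit.
    from rdkit import Chem
    from rdkit.Chem import AllChem

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"cannot parse SMILES: {smiles!r}")
    mol = Chem.AddHs(mol)  # explicit hydrogens are needed for sensible 3D geometry

    params = AllChem.ETKDG()   # knowledge-based torsion preferences
    params.randomSeed = seed   # assumption: fixed seed for reproducibility
    conf_ids = AllChem.EmbedMultipleConfs(mol, numConfs=n_confs, params=params)
    return mol, list(conf_ids)
```

No force-field minimization is called here; the conformers are used as embedded.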

faster rmsd pruning

It might be possible to use a lower bound on the RMSD for pruning:
if the lower bound already exceeds the threshold, the exact RMSD does not need to be computed.
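
One possible lower bound (my assumption; the issue does not name one) is the difference in radius of gyration. After optimal superposition both centroids coincide, so by the reverse triangle inequality the unweighted RMSD over the same atoms is at least |Rg(A) - Rg(B)|. A rough sketch, where `exceeds_threshold` and the `exact_rmsd` callback are hypothetical names:

```python
import math

def radius_of_gyration(coords):
    """Root-mean-square distance of the points from their centroid."""
    n = len(coords)
    centroid = [sum(p[k] for p in coords) / n for k in range(3)]
    sq_sum = sum((p[k] - centroid[k]) ** 2 for p in coords for k in range(3))
    return math.sqrt(sq_sum / n)

def rmsd_lower_bound(coords_a, coords_b):
    """Cheap lower bound on the aligned RMSD: |Rg(A) - Rg(B)|."""
    return abs(radius_of_gyration(coords_a) - radius_of_gyration(coords_b))

def exceeds_threshold(coords_a, coords_b, threshold, exact_rmsd):
    """True if the aligned RMSD is above threshold.

    exact_rmsd is any caller-supplied function computing the real aligned RMSD;
    it is only called when the cheap bound cannot decide.
    """
    if rmsd_lower_bound(coords_a, coords_b) > threshold:
        return True  # the exact RMSD can only be larger than the bound
    return exact_rmsd(coords_a, coords_b) > threshold
```

The bound only helps for conformers that differ in overall compactness; pairs with a similar radius of gyration still need the exact computation.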

proper parsing of CLI args

Hi Francois,

Comments and contributions are welcome.

Even auditing the code for correctness is welcome since it is
doing some scientific computation.

I have some stylistic comments.

  1. Why do you have the "RobustSmilesMolSupplier"?

The two differences I see from the built-in SMILES supplier are that (a) it doesn't skip the first line and (b) it gives access to the name even if the SMILES cannot be parsed.

However, titleLine=False handles case (a) and the return values are ignored if case (b) occurs.

  2. A function like "how_many_conformers()" is by convention more often named "get_num_conformers()".

  3. As a minor point, you use "n_confs", "nb_rot_bonds", and "nprocs". You might want to be consistent.

  4. You might want to look into using argparse to parse your command line. It would look something like:

    import argparse

    parser = argparse.ArgumentParser(
        description="generate up to N low energy conformers from 2D input")
    parser.add_argument("n_confs", metavar="N", type=int, help="number of conformers")
    parser.add_argument("smiles_filename", metavar="input.smi")
    parser.add_argument("sdf_filename", metavar="output.sdf")

    def main(argv=None):
        args = parser.parse_args(argv)
        # metavar only changes the help text; the value is stored as args.n_confs
        if args.n_confs < 1:
            parser.error("N must be positive")
        n_confs = args.n_confs
        smiles_filename = args.smiles_filename
        sdf_filename = args.sdf_filename

    if __name__ == "__main__":
        main()

This would also give you a "-h"/"--help" option.

If you do this then it's easy to switch to having a default value for N, and to allow people to specify a different value using "-n" or some other command-line option. Similarly, you can let users set "rmsd_threshold" using "-r"/"--rmsd-threshold" and "nprocs" using "-j". It would look something like:

    parser = argparse.ArgumentParser(
        description="generate up to N low energy conformers from 2D input")
    parser.add_argument("-n", "--n-confs", metavar="N", type=int, default=20,
                        help="number of conformers (default: 20)")
    parser.add_argument("-r", "--rmsd-threshold", metavar="FLT", type=float, default=0.35,
                        help="minimum RMSD to distinguish two conformers")
    parser.add_argument("-j", "--n-jobs", metavar="N", type=int, default=1,
                        help="number of jobs (default: 1)")
    parser.add_argument("smiles_filename", metavar="input.smi")
    parser.add_argument("sdf_filename", metavar="output.sdf")

However, I see that "nprocs" doesn't actually do anything at this point.

The "-j" option is short for "number of jobs" and is the command-line argument often used for this value.
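
To make the behavior concrete, here is a small self-contained sketch of the optional-argument version. The `build_parser` and `positive_int` names are mine, not from the repository; `positive_int` is one way to move the "N must be positive" check into the parser itself. Note that argparse turns "--n-confs" into the attribute "n_confs".

```python
import argparse

def positive_int(text):
    """argparse 'type' callable that rejects values below 1 (hypothetical helper)."""
    value = int(text)
    if value < 1:
        raise argparse.ArgumentTypeError("must be a positive integer")
    return value

def build_parser():
    parser = argparse.ArgumentParser(
        description="generate up to N low energy conformers from 2D input")
    parser.add_argument("-n", "--n-confs", metavar="N", type=positive_int, default=20,
                        help="number of conformers (default: 20)")
    parser.add_argument("-r", "--rmsd-threshold", metavar="FLT", type=float, default=0.35,
                        help="minimum RMSD to distinguish two conformers")
    parser.add_argument("-j", "--n-jobs", metavar="N", type=positive_int, default=1,
                        help="number of jobs (default: 1)")
    parser.add_argument("smiles_filename", metavar="input.smi")
    parser.add_argument("sdf_filename", metavar="output.sdf")
    return parser

# Parse an explicit argument list instead of sys.argv, which is handy for testing.
args = build_parser().parse_args(["-n", "50", "-r", "0.5", "in.smi", "out.sdf"])
print(args.n_confs, args.rmsd_threshold, args.n_jobs)  # prints: 50 0.5 1
```

With this, "-n 0" is rejected by argparse with a usage message before any chemistry code runs.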

  5. You can also write "while len(kept) < n_confs and len(conf_energies) > 0:" as "while len(kept) < n_confs and conf_energies:". The difference is "must have at least 1 energy" vs. "must have energies".

  6. You might consider moving the processing code into its own function so that people can import the code as a library and call the function directly:

    from contextlib import closing
    from rdkit import Chem

    def generate_conformers(reader, writer, n_confs=20, rmsd_threshold=0.35, verbose=True):
        for name, mol in reader:
            if mol is None:
                continue
            ...
            if verbose:
                print("generating starting conformers ...")
            ...
            writer.write(res, confId=cid)

    def main():
        ...
        with closing(Chem.SDWriter(output_sdf)) as writer:
            reader = RobustSmilesMolSupplier(input_smi)
            generate_conformers(reader, writer, n_confs, rmsd_threshold)

(I usually have a "--quiet" option in my tools to disable verbose progress reporting, which would pass in verbose=False.)

  7. You might consider renaming "rmsd_filter()" to "remove_neighbors()" because "rmsd_filter" is very generic.

  8. Also, if you change "(e, conf) = conf_energies.pop(0)" to "(e, ref_conf) = conf_energies.pop(0)" then it's clearer that you want to use the first conformer as the reference conformer.
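
Putting the last few points together, the selection loop might read roughly like this. This is only a sketch of how I understand the loop, not the actual code: `select_diverse` is a made-up name, the `rmsd` callback stands in for whatever RMSD routine is used, and I am assuming a strict "greater than" comparison and energy-sorted input.

```python
def remove_neighbors(ref_conf, conf_energies, rmsd_threshold, rmsd):
    """Drop every (energy, conf) pair whose conformer is within rmsd_threshold of ref_conf."""
    return [(e, conf) for (e, conf) in conf_energies
            if rmsd(ref_conf, conf) > rmsd_threshold]

def select_diverse(conf_energies, n_confs, rmsd_threshold, rmsd):
    """Greedily keep up to n_confs low-energy conformers that are mutually distant."""
    # Work on a copy sorted by increasing energy so pop(0) is the lowest remaining.
    conf_energies = sorted(conf_energies, key=lambda pair: pair[0])
    kept = []
    while len(kept) < n_confs and conf_energies:  # a non-empty list is truthy
        (e, ref_conf) = conf_energies.pop(0)
        kept.append((e, ref_conf))
        conf_energies = remove_neighbors(ref_conf, conf_energies, rmsd_threshold, rmsd)
    return kept
```

Passing the RMSD function in as an argument keeps the loop testable without any 3D data, for example with plain numbers and an absolute difference.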

Cheers,
