Comments (6)

tovrstra commented on June 21, 2024

Thanks for noticing these issues. The M END seems to be missing indeed.

I'm a bit confused by your second point. When connectivity information is present in the IOData instance, it should be dumped. Can you give an example showing this problem? That may clarify the issue.

from iodata.

FanwangM commented on June 21, 2024

Yes, you have a point; I didn't make it clear. When connectivity information is not available in the input, as with the XYZ file format, no such information is dumped to the SDF file. So I think we need to figure out a good way of generating the connectivity. I know Open Babel handles this very well.

@tovrstra

PaulWAyers commented on June 21, 2024

It's a bit of mission creep... We can dump connectivity when we have it, but defining it would be a utility external to IOData, I think. It's implicit in GOpt, and we could use that to define connectivity to the extent we need it. Perhaps that would require splitting off a utility from GOpt for connectivity.

tovrstra commented on June 21, 2024

I agree with Paul. IOData does not (at least for now) attempt to guess where bonds are, because that goes beyond the original scope of reading and writing data.

If we decide to extend the scope, there should also be some discussion on how far we'd like to go. I'll try to make a few guesses. Just detecting connectivity (without trying to guess the types of bonds) can be done with relatively little code (~15 lines) and a table of covalent radii. For PDB files, that would be fine. For the SDF format it would not, because SDF also describes the type of each bond in order to represent a Lewis structure. Trying to guess a Lewis structure from the connectivity is quite complex, and existing algorithms tend to break on exotic molecules. (Even humans don't always agree.) Such an algorithm would go quite far beyond the scope of IOData. Open Babel, RDKit and OpenEye have advanced solutions for this. You can also try to use variations in bond length to detect the bond order, but that would require well-optimized geometries. Effects from level of theory, basis set or just internal strain may be enough to break the algorithm.
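For illustration, the covalent-radii approach to bare connectivity mentioned above might look roughly like the following. This is only a sketch and not part of IOData: `detect_bonds`, `COVALENT_RADII` and the `tolerance` factor of 1.2 are hypothetical names and choices, the radii are approximate values for a handful of elements, and coordinates are assumed to be in angstrom (IOData works in atomic units, so a conversion would be needed first).

```python
import numpy as np

# Approximate covalent radii in angstrom, keyed by atomic number.
# Illustrative subset only; a real table would cover all elements.
COVALENT_RADII = {1: 0.31, 6: 0.76, 7: 0.71, 8: 0.66, 9: 0.57, 16: 1.05, 17: 1.02}


def detect_bonds(atnums, atcoords, tolerance=1.2):
    """Return (i, j) pairs, i < j, closer than tolerance * (r_i + r_j).

    Only connectivity is detected; no bond orders are assigned.
    """
    atcoords = np.asarray(atcoords, dtype=float)
    radii = np.array([COVALENT_RADII[int(n)] for n in atnums])
    # Pairwise distance matrix with shape (natom, natom).
    dists = np.linalg.norm(atcoords[:, None, :] - atcoords[None, :, :], axis=-1)
    # Pair-specific cutoff: scaled sum of the two covalent radii.
    cutoffs = tolerance * (radii[:, None] + radii[None, :])
    # Keep the strict upper triangle so each pair appears once.
    i_idx, j_idx = np.nonzero(np.triu(dists < cutoffs, k=1))
    return list(zip(i_idx.tolist(), j_idx.tolist()))


# Water, coordinates in angstrom: O, H, H.
water_atnums = [8, 1, 1]
water_coords = [
    [0.0, 0.0, 0.1173],
    [0.0, 0.7572, -0.4692],
    [0.0, -0.7572, -0.4692],
]
water_bonds = detect_bonds(water_atnums, water_coords)
print(water_bonds)
```

A list like this would be enough for formats that only need connectivity; it says nothing about bond types, which is the hard part for SDF.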

In any case, I'd suggest fixing one thing at a time. If you can make a PR fixing the M END issue, that would already be very welcome, irrespective of the connectivity discussion.

PaulWAyers commented on June 21, 2024

So to be clear, I wouldn't be averse to having a stand-alone utility that had the functionality:

  • Input: iodata instance without connectivity.
  • Output: iodata instance with connectivity generated by RDKit, OpenBabel, etc. Or even just covalent radii and interatomic distances.

I wouldn't want to include this in iodata (except maybe the last one) because it adds major external dependencies and goes beyond the simple mandate of iodata, which is averse to "internal computation" and focused on actual input/output. Once one starts trying to duplicate RDKit and OpenBabel, one has gone down a whole different (very interesting, but very difficult) rabbit hole.
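As a rough sketch of that input/output contract, a stand-alone helper could take a pluggable backend, so the heavy dependencies stay outside iodata. Everything here is hypothetical: `Molecule` is a stand-in for an IOData instance (not the real class), `with_connectivity` and `distance_backend` are made-up names, and the toy backend uses one fixed distance cutoff in the same units as the coordinates. An RDKit- or OpenBabel-based backend would be a different callable with the same shape.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional

import numpy as np


@dataclass(frozen=True)
class Molecule:
    """Hypothetical stand-in for an IOData instance (not the real class)."""

    atnums: np.ndarray
    atcoords: np.ndarray
    bonds: Optional[np.ndarray] = None


def distance_backend(mol: Molecule, cutoff: float = 1.2) -> np.ndarray:
    """Toy backend: bond every pair of atoms closer than a fixed cutoff."""
    diffs = mol.atcoords[:, None, :] - mol.atcoords[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    # Strict upper triangle so each pair is listed once, with i < j.
    i_idx, j_idx = np.nonzero(np.triu(dists < cutoff, k=1))
    return np.column_stack([i_idx, j_idx])


def with_connectivity(
    mol: Molecule,
    backend: Callable[[Molecule], np.ndarray] = distance_backend,
) -> Molecule:
    """Return a copy of mol with bonds filled in by the chosen backend.

    Existing connectivity is never overwritten, and the input is not modified.
    """
    if mol.bonds is not None:
        return mol
    return replace(mol, bonds=backend(mol))


# Water in angstrom, initially without connectivity.
water = Molecule(
    atnums=np.array([8, 1, 1]),
    atcoords=np.array(
        [[0.0, 0.0, 0.1173], [0.0, 0.7572, -0.4692], [0.0, -0.7572, -0.4692]]
    ),
)
bonded = with_connectivity(water)
print(bonded.bonds)
```

Passing the backend as an argument keeps the guessing logic swappable and out of the I/O code, which seems closest to the separation described above.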

It is a fascinating problem, though. I thought a little bit about the problem of generating atom/bond types this morning (for fun) and, wow, what a mess. Especially as we are interested in structures that are not necessarily equilibrium structures, coming up with anything sensible would be very difficult, except maybe for relatively simple organic compounds and inorganic molecules involving only elements from Groups 1, 2, 16, 17 and 18. Even in such easy cases, what does one do with things like sulfur hexafluoride? One would almost need to run a semiempirical calculation (or minimal basis set HF) and then post-process the data to be reliable, and then one is really truly in the HORTON landscape, not merely IOData.

Also, (for now) iodata is mostly (not exclusively, obviously) focused on ab initio quantum calculations, and atom/bond types end up being mostly useful for molecular mechanics and some types of semiempirical calculations.

FanwangM commented on June 21, 2024

Thanks for the comments! @PaulWAyers @tovrstra
Given that this problem is beyond the scope of IOData, let's leave it to RDKit or OpenBabel.

That makes things very clear for now. I will fix the missing tag issue shortly and make a new PR.
