
Comments (24)

peastman avatar peastman commented on May 19, 2024

I would strongly object to changing how we write PDB files. PDB is a well-defined file format, and that includes atom and residue names. A file that uses different names is simply an invalid PDB file. The fact that AMBER can't read standard PDB files is a serious bug in their code, and they need to fix it.

I think all of us in this field need to be less tolerant of programs that produce invalid PDB files, because that's the only way we'll get them to change. For example, Gromacs used to use incorrect atom and residue names, but in recent versions they've changed to always write PDB files with the correct (standard) names. Let's try to get AMBER to do the same. That's the best thing for their users, and for the whole field.

from openmm.

jlmaccal avatar jlmaccal commented on May 19, 2024

I would tend to agree with Peter. These issues are annoying, and it would be great for us not to contribute to the problem.

I think a reasonable middle ground would be to write a simple conversion utility that can map between the two, rather than having OpenMM write non-standard files. I also have several cases where I would like to round trip things between Amber and OpenMM (in my case to get per-residue energies out of snapshots).

I think this:

Allowing PDBFile objects to be created from OpenMM Topology objects and position arrays (or State objects) would also be extremely useful.

would be great though.


jchodera avatar jchodera commented on May 19, 2024

There are a number of programs that (ab)use the PDB format or mol2 format to store nonstandard package-specific information, and have done so for decades. Adopting an attitude of "fuck you" toward these programs with enormous established user bases will not be helpful to OpenMM.

Does anybody have a useful suggestion on how to actually interoperate with AMBER besides trying to convince the entire AMBER developer community to break all backwards-compatibility by only working with PDB version 3.30 conformant files? You're welcome to go to their next developer meeting and lobby for that, but this will not actually accomplish anything in the short term.

Any useful input on the other proposal that PrmtopFile and InpcrdFile could be extended to write AMBER-format input files? Note that this is not an invitation to rant about how they should be using XML instead, but rather a request for some constructive discussion to come up with a way in which we can implement interoperability within the framework, goals, and ideals of OpenMM.


jchodera avatar jchodera commented on May 19, 2024

Also, I will note that PDBFile was built specifically with the flexibility to use XML-defined translation tables. This could be very simply modified to write AMBER-naming-convention-conformant files with a simple extension of the interface. Yes, it could spit out an annoying warning that the resulting PDB file is not compliant with the PDB version 3.30 standard now universally adopted. Yes, this could only be accessed through an optional argument. But it appears that it would be straightforward to allow this. I don't consider this "contributing to the problem", but merely "an interim solution until the larger problem can be rectified".


rmcgibbo avatar rmcgibbo commented on May 19, 2024

I don't want to take a position either way here, but from an implementation perspective, one option is to look at the csv library in the python stdlib. csv is a format which many programs read and write, all using slightly different formats (unlike PDB, there is no actual csv standard). The python csv library has this "dialect" mechanism for trying to deal with this problem. PEP305 also has some high level discussion on the dialect issue.
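For anyone unfamiliar with the csv "dialect" mechanism mentioned above, here is a minimal sketch using only the Python standard library. The `SemicolonDialect` class and the sample rows are made up for illustration; the point is only that one writer/reader API can serve several named format variants.

```python
import csv
import io


class SemicolonDialect(csv.Dialect):
    """Hypothetical dialect: semicolon-delimited, single-quoted fields."""
    delimiter = ";"
    quotechar = "'"
    doublequote = True
    skipinitialspace = False
    lineterminator = "\n"
    quoting = csv.QUOTE_MINIMAL


# Register the dialect under a name so callers can select it by string.
csv.register_dialect("semicolon", SemicolonDialect)

rows = [["ALA", "CA", "1.0"], ["HIS", "N;D1", "2.0"]]


def dump(rows, dialect):
    """Write rows with the named dialect and return the resulting text."""
    buf = io.StringIO()
    csv.writer(buf, dialect=dialect).writerows(rows)
    return buf.getvalue()


excel_text = dump(rows, "excel")          # built-in dialect
semicolon_text = dump(rows, "semicolon")  # the custom one above

# Reading back with the matching dialect recovers the original rows.
parsed_back = list(csv.reader(io.StringIO(semicolon_text), dialect="semicolon"))
```

The same data comes out differently depending on the dialect name, and the caller never touches the formatting rules directly. A PDB writer could, in principle, expose a similar named-variant switch.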

Also, it is possible to write a PDB using PDBFile from a Topology and positions array, already, using PDBFile.writeModel. The code in PDBReporter already does this. If there was interest, I think PDBFile could be re-factored to make this a little easier, by making PDBFile act like a "file-like" object, supporting the PEP343 context manager protocol, but that is a subject for a separate discussion. The biggest change would be that writeModel would not be a staticmethod.


peastman avatar peastman commented on May 19, 2024

No one suggested that AMBER should "break all backwards-compatibility". They certainly should continue to read the files generated by previous versions of their program. But that's no reason to not also support standard PDB files. It has been over 6 years since PDB 3.0 was released, and it's absurd that they still haven't added support for it. AMBER can't even handle files downloaded directly from RCSB. They can't expect everyone else in the field to implement workarounds for their bugs.

What would be the reason for creating a PDBFile object from a topology and set of coordinates? What would you do with that object?

Writing inpcrd files would be pretty easy, so that would be a reasonable feature. But wouldn't mdcrd be more useful? Writing prmtop files would be a lot more work, but it's also a possibility.

Also, I will note that PDBFile was built specifically with the flexibility to use XML-defined translation tables

It uses that when reading, not writing. We try as much as possible to deal with whatever files we come across and convert them into something more standard. This is a widely recognized principle in software engineering whenever you need to communicate using a standardized protocol: be as tolerant as possible in the input you accept, and be as strict as possible in the output you generate.
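As a rough illustration of the "tolerant on input" half of that principle, here is a sketch of name normalization driven by an XML table. The XML layout, the table entries, and the function names `load_table` and `normalize` are assumptions made up for this example; they are not OpenMM's actual table or API.

```python
import xml.etree.ElementTree as ET

# Hypothetical translation table: each standard name lists alternates.
TABLE_XML = """
<Residues>
  <Residue name="HIS" alt1="HID" alt2="HIE" alt3="HIP">
    <Atom name="H" alt1="HN"/>
  </Residue>
  <Residue name="CYS" alt1="CYX">
    <Atom name="H" alt1="HN"/>
  </Residue>
</Residues>
"""


def load_table(xml_text):
    """Build alternate->standard lookups for residue and atom names."""
    residue_map = {}
    atom_map = {}
    for res in ET.fromstring(xml_text).findall("Residue"):
        std = res.get("name")
        residue_map[std] = std
        for key, alt in res.attrib.items():
            if key.startswith("alt"):
                residue_map[alt] = std
        names = {}
        for atom in res.findall("Atom"):
            std_atom = atom.get("name")
            names[std_atom] = std_atom
            for key, alt in atom.attrib.items():
                if key.startswith("alt"):
                    names[alt] = std_atom
        atom_map[std] = names
    return residue_map, atom_map


def normalize(res_name, atom_name, residue_map, atom_map):
    """Map whatever names were read onto standard ones; unknown names pass through."""
    std_res = residue_map.get(res_name, res_name)
    std_atom = atom_map.get(std_res, {}).get(atom_name, atom_name)
    return std_res, std_atom


residue_map, atom_map = load_table(TABLE_XML)
```

Note that this direction is lossy: once HID, HIE and HIP have all been folded into HIS, the residue name no longer records the protonation state, so that information has to be recovered from the atoms present.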


VijayPande avatar VijayPande commented on May 19, 2024

How about we define an AMBER PDB file type and have a (maybe derived) class for writing that? It seems that this really is a file type to itself.

Thanks,

Vijay

Sent from my phone. Sorry for any brevity or unusual tone.

On Jun 17, 2013, at 6:22 PM, John Chodera [email protected] wrote:


jchodera avatar jchodera commented on May 19, 2024

AMBER can't even handle files downloaded directly from RCSB

Nor can OpenMM, mind you. The OpenMM app can't handle anything with chain breaks or missing heavy atoms, which is most of the content of the RCSB.


jchodera avatar jchodera commented on May 19, 2024

What would be the reason for creating a PDBFile object from a topology and set of coordinates? What would you do with that object?

Right now, PDBFile provides a means of populating its internal data structures by loading data from a PDB file. You can also write PDB files from a topology and positions combination. But if you start with topology and positions, the only way to get this information into a PDBFile is to write a PDB file to disk and re-read it into PDBFile.

Currently, the only thing you can do with the PDBFile object is retrieve topology, positions, and number of frames, so you're right, it wouldn't be that useful to add this functionality unless the API were expanded to allow manipulating the contents. Right now, I am abusing PDBFile by manipulating the internal PdbStructure contents to try to write out AMBER-compatible files, but this is not a good general solution---hence my query here about how best to allow OpenMM to interoperate with AMBER.


jchodera avatar jchodera commented on May 19, 2024

Also, I will note that PDBFile was built specifically with the flexibility to use XML-defined translation tables

It uses that when reading, not writing. We try as much as possible to deal with whatever files we come across and convert them into something more standard. This is a widely recognized principle in software engineering whenever you need to communicate using a standardized protocol: be as tolerant as possible in the input you accept, and be as strict as possible in the output you generate.

This is reasonable, but we still need a way to interoperate with AMBER.

The python csv library has this "dialect" mechanism for trying to deal with this problem. PEP305 also has some high level discussion on the dialect issue.

This is one possible solution: Allowing PDBFile to speak different "dialects" or write files with different "flavors". OpenEye's OEChem tools use this scheme to interoperate with different programs by writing PDB or mol2 files in different "flavors", for example. This violates the strictness-of-output principle stated above, however.
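To make the "flavor" idea concrete, here is a small sketch of how a writer might pick residue names per flavor and warn when the output is non-standard. The function names and the flavor strings are hypothetical. The only chemistry assumed is the usual AMBER histidine convention (HID has the hydrogen on ND1, HIE on NE2, HIP on both); everything else is left untouched.

```python
import warnings


def amber_histidine_name(atom_names):
    """Pick an AMBER-style histidine name from the hydrogens present."""
    names = set(atom_names)
    has_delta = "HD1" in names
    has_epsilon = "HE2" in names
    if has_delta and has_epsilon:
        return "HIP"
    if has_delta:
        return "HID"
    if has_epsilon:
        return "HIE"
    return "HIS"  # no ring hydrogens present: leave the name alone


def residue_name_for(flavor, res_name, atom_names):
    """Return the residue name to write for the requested flavor.

    'pdb3' writes standard names unchanged; 'amber' is a hypothetical
    opt-in flavor that emits a warning because the result is not a
    standard PDB file.
    """
    if flavor == "pdb3":
        return res_name
    if flavor == "amber":
        warnings.warn("writing non-standard (AMBER-flavored) residue names")
        if res_name == "HIS":
            return amber_histidine_name(atom_names)
        return res_name
    raise ValueError("unknown flavor: %r" % flavor)
```

Keeping the standard flavor as the default and making the other one explicit keeps strict output as the normal path while still allowing an escape hatch.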


jchodera avatar jchodera commented on May 19, 2024

Writing inpcrd files would be pretty easy, so that would be a reasonable feature. But wouldn't mdcrd be more useful?

I believe you need inpcrd files to initiate AMBER simulations using sander. mdcrd files are formatted text trajectories with degraded precision generated as output only, intended for analysis and not restarting simulations. I do hope to add that functionality to the MDTraj reader @rmcgibbo has been working on at some point.

Writing prmtop files would be a lot more work, but it's also a possibility.

This appears to be the only available route for bidirectional compatibility with AMBER, but has several drawbacks. Notably, inpcrd/prmtop files cannot be re-read into LEaP, so there will be no way to manipulate the system in AMBER. For example, if one wanted to simulate a protein in implicit solvent in OpenMM and then load this into AMBER for explicit solvent simulation, the system would have to be solvated in a cubic box in OpenMM and written as an inpcrd/prmtop pair, rather than the truncated octahedron preferred in LEaP for simulation in AMBER.

For extending AmberInpcrdFile, we could choose at least three possible options:
A. Following PDBFile, add a @staticmethod to write an inpcrd file:

writeFile(positions, file=sys.stdout, boxVectors=None, title=None)

B. Modify the constructor to allow construction from an input file or construction from positions/boxVectors:

def __init__(self, file=None, loadVelocities=False, loadBoxVectors=False, positions=None, boxVectors=None, title=None)

C. Create an AmberInpcrdFileFactory class that generates an AmberInpcrdFile class via static methods:

@staticmethod
generateAmberInpcrdFile(positions, boxVectors=None, title=None)
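For a sense of how small the inpcrd writer could be, here is a sketch along the lines of option A, written as a plain function. It assumes positions are plain floats in angstroms (not OpenMM Quantity objects) and follows the published inpcrd layout as I understand it: a title line, an atom count, coordinates six per line in 12.7 fixed-point fields, and an optional box line. The function name, the default title, and the `box` tuple layout are made up for the example. The atom count uses the classic five-character field, which some AMBER programs widen for large systems.

```python
import io
import sys


def write_inpcrd(positions, file=sys.stdout, box=None, title="default_name"):
    """Write coordinates in an inpcrd-like layout.

    positions: sequence of (x, y, z) floats, assumed to be in angstroms.
    box: optional (a, b, c, alpha, beta, gamma) in angstroms and degrees.
    """
    print(title[:80], file=file)
    print("%5d" % len(positions), file=file)
    # Flatten to x1, y1, z1, x2, ... and emit six values per line.
    flat = [component for xyz in positions for component in xyz]
    for start in range(0, len(flat), 6):
        chunk = flat[start:start + 6]
        print("".join("%12.7f" % value for value in chunk), file=file)
    if box is not None:
        print("".join("%12.7f" % value for value in box), file=file)


# Example: three atoms in a 10 angstrom cubic box, written to a string buffer.
example = io.StringIO()
write_inpcrd(
    [(1.0, 2.0, 3.0), (4.0, 5.0, 6.0), (7.0, 8.0, 9.0)],
    file=example,
    box=(10.0, 10.0, 10.0, 90.0, 90.0, 90.0),
    title="test",
)
example_lines = example.getvalue().splitlines()
```

Velocities are left out here; a real implementation would also have to convert from OpenMM's units before formatting.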

For extending AmberPrmtopFile, we could also choose at least three options:
A. Following PDBFile, add a @staticmethod to write a prmtop file:

   writeFile(system, topology, file=sys.stdout)

Note that not all System objects can be written as AMBER prmtop files. An exception should be thrown if the System object contains forces that cannot be represented in AMBER prmtop files.
B. Modify the constructor to allow construction from an input file or construction from system/topology objects:

   def __init__(self, file=None, system=None, topology=None)

C. Create an AmberPrmtopFileFactory class that generates an AmberPrmtopFile class via static methods:

   @staticmethod
   generateAmberPrmtopFile(system=None, topology=None)

I'm happy to do the coding here (if desired) since I need this interoperability ASAP, but I'd very much like some useful feedback on the most consistent API and implementation so that this would actually be useful to others.


jchodera avatar jchodera commented on May 19, 2024

How about we define an AMBER PDB file type and have a (maybe derived) class for writing that? It seems that this really is a file type to itself.

This could be a workable approach to writing AMBER-compatible PDB files, though it may lead to what might be considered unnecessary duplication of code paths.


mjw99 avatar mjw99 commented on May 19, 2024

Writing inpcrd files would be pretty easy, so that would be a reasonable feature. But wouldn't mdcrd be more useful?

Apologies for the threadjack, but I recently wrote a very basic Python tool for outputting mdcrds to AMBER's netcdf file format (given a prmtop). It can be found here: https://bitbucket.org/mjw99/amberopenmmutils
Feel free to take/incorporate/improve.

I think the ability to write prmtop files would be very useful, and I've even considered writing such code myself, but I think it is very difficult to get right. This is because the format (and LEaP) has quirks and quite a few esoteric aspects that I've been burnt by in the past. In addition, I think AMBER should have migrated to an XML format for this a while ago; however, there seems to be internal resistance to this.

That said, the documentation on the parm file format at http://ambermd.org/formats.html is useful, and Jason Swails has recently augmented it with a more detailed description: http://archive.ambermd.org/201305/att-0256/prmtop.pdf
Also, I know Swails has written a number of Python utilities (Parmed.py) for manipulating prmtop files. These are in AmberTools and might be a useful starting point for a parm generator, John.


jchodera avatar jchodera commented on May 19, 2024

Converting OpenMM System+Topology into a prmtop will indeed be challenging because atom, bond, and angle types must be extracted from a System object, and a number of checks must be made before it can be concluded the System can even be correctly represented in a prmtop file.

This is certainly orders of magnitude more difficult than translating atom names to what AMBER expects in a PDB file.


jchodera avatar jchodera commented on May 19, 2024

In fact, it may be simpler to enable ForceField to write an AMBER prmtop file, effectively replacing the AmberTools path for setting up systems. We could still load in correct PDB files generated by OpenMM (potentially including solvent), assign parameters from any forcefield OpenMM supports, and write prmtop files (for those forcefields that fit within the limitations of prmtop files).

Peter, what would you think about this? It avoids the issues you were concerned about.


peastman avatar peastman commented on May 19, 2024

I think it would help to back up for a moment and decide what use cases we're trying to support. "Interoperability" can mean a lot of things. Some examples include

  • Build a system with AmberTools, simulate it with both OpenMM and AMBER, compare the results.
  • Build a system with OpenMM, simulate it with AMBER.
  • Build a system with OpenMM, load it with tleap for further modeling, simulate in OpenMM and/or AMBER.
  • Run a simulation in OpenMM, analyze the trajectory with AmberTools (which particular programs?)

What are the particular things we want to enable?


rmcgibbo avatar rmcgibbo commented on May 19, 2024

On the API-design part of this thread, I want to cast my -1 against writing any more @staticmethods. And -2 on classes named SomethingFactory that are a collection of @staticmethods. This isn't java. I realize that PDBFile uses @staticmethods for writing (writeFile, writeHeader, writeModel), but I think that is a suboptimal design that should not be replicated.

In my opinion, the APIs for file-like objects should mirror the python file builtin to the extent possible, as well as other established file-mapped objects in the scientific python ecosystem like tables.File and netCDF4.Dataset. This means that the constructor should take a mode (usually either read, r, or write, w, but some APIs might support append too), e.g. def __init__(self, filename, mode='r', ...). Both reading and writing instance methods -- not static methods -- should be provided on the class. The class should also have an __enter__(self) and __exit__(self, *exc_info) to support PEP343-style access.
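A minimal sketch of what that could look like, to make the proposal concrete. `CoordinateFile` and its one-line-per-atom text layout are hypothetical, invented for this example; this is not an existing OpenMM class or a real AMBER format, only the shape of the API (mode argument, instance methods, context manager).

```python
import os
import tempfile


class CoordinateFile:
    """Hypothetical file-mapped object storing one 'x y z' line per atom."""

    def __init__(self, filename, mode="r"):
        if mode not in ("r", "w"):
            raise ValueError("mode must be 'r' or 'w'")
        self.mode = mode
        self._handle = open(filename, mode)

    def write(self, positions):
        """Append one frame of positions; only valid in write mode."""
        if self.mode != "w":
            raise IOError("file is not open for writing")
        for x, y, z in positions:
            self._handle.write("%.7f %.7f %.7f\n" % (x, y, z))

    def read(self):
        """Return all positions as a list of (x, y, z) tuples; read mode only."""
        if self.mode != "r":
            raise IOError("file is not open for reading")
        return [tuple(float(v) for v in line.split())
                for line in self._handle if line.strip()]

    def close(self):
        self._handle.close()

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        # Always close the underlying handle, even if the body raised.
        self.close()


# Round trip through a temporary file using the 'with' statement.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "coords.txt")
    with CoordinateFile(path, "w") as out:
        out.write([(1.0, 2.0, 3.0), (4.5, 5.5, 6.5)])
    with CoordinateFile(path, "r") as src:
        round_trip = src.read()
```

Because `write` is an instance method on an open handle, a reporter could call it once per frame without holding the whole trajectory in memory.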


jchodera avatar jchodera commented on May 19, 2024

All of the use cases @peastman suggested are useful. I think it should mainly be a matter of prioritizing.

My suggested priority order is:

  1. Build a system with AmberTools, simulate it with both OpenMM and AMBER, compare the results.
  2. Build a system with OpenMM, simulate it with AMBER.
  3. Build a system with OpenMM, load it with tleap for further modeling, simulate in OpenMM and/or AMBER.
  4. Run a simulation in OpenMM, analyze the trajectory with AmberTools (which particular programs?)

We have the functionality for (1) already via the current AmberPrmtopFile/AmberInpcrdFile functionality.

My most pressing need is to enable (2) in some manner: The ability to take a system from OpenMM to be simulated in AMBER. The reason for this is that we have built a modeling pipeline that does a lot of things in OpenMM that would be difficult to do in other codes (adjusting number of waters, modifying protonation states, refinement in implicit and explicit solvent) and now need to prepare input files for the AMBER GPU port to run straightforward MD simulations on leadership computing resources (e.g. large chunks of Blue Waters or Titan). Note that this is only because AMBER currently has a speed advantage for straightforward MD over OpenMM.

(2) is currently not possible because we cannot easily set up an equivalent system in AMBER due to the renaming of residues and atoms; you can't simply read in a PDB file generated by OpenMM into LEaP. "Fixing" LEaP to read PDB v3.30 files and detect protonation states would require a great deal of effort, effectively implementing the same system used in the OpenMM app into LEaP. I've proposed a few ideas already in this thread about how #2 could be achieved (allow writing of AMBER-flavored PDB files; allow writing of prmtop/inpcrd files from system/topology; allow Forcefield to write AMBER prmtop/inpcrd files directly).

(3) would allow more flexibility than #2, but at the cost of the extra hassle of having to go through LEaP before running AMBER's sander directly. Still, this would be desirable, especially if OpenMM was used for extensive implicit solvent modeling before moving systems into AMBER for explicit solvent simulation.

(4) would also be useful to some users due to the currently limited analysis tools in OpenMM. The main analysis tool would likely be AmberTools ptraj, and AMBER format NetCDF files could be generated by @rmcgibbo's MDTraj project. This would also require a prmtop file.


jchodera avatar jchodera commented on May 19, 2024

On the API-design part of this thread, I want to cast my -1 against writing any more @staticmethods. And -2 on classes named SomethingFactory that are a collection of @staticmethods. This isn't java. I realize that PDBFile uses @staticmethods for writing (writeFile, writeHeader, writeModel), but I think that is a suboptimal design that should not be replicated.

@rmcgibbo : Can you give an example of how AmberPrmtopFile and AmberInpcrdFile would be refactored to conform to this design? It's not that I don't think this is a good idea---I think I'm just in need of some code use example with the proposed refactoring.

In my opinion, the APIs for file-like objects should mirror the python file builtin to the extent possible, as well as other established file-mapped objects in the scientific python ecosystem like tables.File and netCDF4.Dataset. This means that the constructor should take a mode (usually either read, r, or write, w, but some APIs might support append too), e.g. def __init__(self, filename, mode='r', ...). Both reading and writing instance methods -- not static methods -- should be provided on the class. The class should also have an __enter__(self) and __exit__(self, *exc_info) to support PEP343-style access.

Same query here.


peastman avatar peastman commented on May 19, 2024

It sounds to me like writing inpcrd and prmtop files is the best way to go. Some aspects of that will be challenging - what do we do if you have a CustomGBForce? - but a large fraction of systems created by ForceField should be straightforward to translate. Mark, can you describe the particular issues you ran into with writing prmtop files?

We could definitely add a write() method to PDBFile. Note, though, that those static methods were not created arbitrarily! PDBReporter needs to write a file in pieces without ever having all the data in memory at once. It writes the header when you begin the simulation, writes a new model at each report interval, and writes the footer at the end of the simulation. I could have made those into instance methods, but then we would have mixed up the APIs for reading and writing PDB files in a confusing way.
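The piecewise pattern described here (header once, one model per report, footer at the end) can be sketched without static methods as follows. `StreamingPDBWriter` is a hypothetical name, not the PDBReporter implementation. The record layout is deliberately simplified: every atom is placed in chain A, residue 1, with occupancy 1.00 and B-factor 0.00, and one-letter element symbols are assumed when aligning atom names.

```python
import io


class StreamingPDBWriter:
    """Hypothetical incremental writer that never holds more than one frame."""

    def __init__(self, file):
        self._file = file
        self._model = 0

    def write_header(self, title):
        print("REMARK   1 %s" % title, file=self._file)

    def write_model(self, atoms):
        """atoms: iterable of (atom_name, residue_name, x, y, z) in angstroms."""
        self._model += 1
        print("MODEL     %4d" % self._model, file=self._file)
        for serial, (name, res, x, y, z) in enumerate(atoms, start=1):
            # Simplification: names shorter than four characters start in
            # column 14, which assumes a one-letter element symbol.
            padded = " " + name if len(name) < 4 else name
            print("ATOM  %5d %-4s %3s A%4d    %8.3f%8.3f%8.3f  1.00  0.00"
                  % (serial, padded, res, 1, x, y, z), file=self._file)
        print("ENDMDL", file=self._file)

    def write_footer(self):
        print("END", file=self._file)


# Usage: header at the start, a model per report interval, footer at the end.
stream = io.StringIO()
writer = StreamingPDBWriter(stream)
writer.write_header("streaming example")
writer.write_model([("CA", "ALA", 1.0, 2.0, 3.0)])
writer.write_model([("CA", "ALA", 1.1, 2.1, 3.1)])
writer.write_footer()
stream_lines = stream.getvalue().splitlines()
```

The model counter lives on the instance, which is the one piece of state the static-method version has to pass in explicitly.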


mjw99 avatar mjw99 commented on May 19, 2024

Mark, can you describe the particular issues you ran into with writing prmtop files?

Hi Peter, apologies for the verbose reply, but these were the main issues I found/was aware of, when I was considering writing a prmtop file generator:

  1. The lack of tools within Python for dealing with Fortran format. The formatting of values associated with a %FLAG MUST adhere to the Fortran format specified in its paired %FORMAT. Associated values are very easy to write and parse if you’re using Fortran, but, perversely, I have failed to find a decent Fortran format writer/parser library in any other language. The closest I’ve seen to this is FortranFormat in Java. What I’m getting at here is that you will probably need to write a robust Fortran format parser/writer in Python before you start.

  2. The required set of %FLAG / %FORMAT pairs in a prmtop is not defined. Additional pairs have been added over time, but this is not clearly documented or reflected in the %VERSION flag. Conversely, there are some FLAGS that have (in my experience) no use at all, e.g. TREE_CHAIN_CLASSIFICATION. A corollary to this is that there is not a fixed number of entries in the POINTERS list; future values are added onto the end.

  3. The ordering of the %FLAG / %FORMAT pairs in a prmtop is not defined and can be a function of the tool that generated it; however, the POINTERS flag must generally be near the start, since it contains information about the number of entries in other flags.

  4. The use of negative values in DIHEDRALS_{INC,WITHOUT}_HYDROGEN, on atom indices 3 and 4 to indicate an ignored 1-4 interaction and an improper dihedral, is esoteric and confusing. If your output formatting is off, then point 1) comes into play and the sign may not be parsed properly. Also, I don’t completely understand LEaP’s criteria for the sign on index 3.

  5. The ATOM_NAME entries are in theory case sensitive; lowercase is used to indicate GAFF atom types, i.e. CA != ca. The point at which their case sensitivity plays a role (i.e. during the parsing of parm*.dat) has probably passed by the time a prmtop file has been generated, however, it is something to be aware of. I recall using MMTK once to test a system with GAFF atoms and this had issues since it normalised all atomtypes internally to uppercase.

  6. The %VERSION field value seems to have no meaning or use.

  7. The values associated with the flags RADII and SCREEN are a function of the ASCII value in RADIUS_SET.

  8. The logic associated with EXCLUDED_ATOMS_LIST is difficult and tightly coupled to the NUMBER_EXCLUDED_ATOMS list.

  9. The %COMMENT flag is valid (and should be used more often), but some tools do not accept it.

  10. Support for CHARMM topologies requires extra flags (i.e. via CHAMBER) and changes in the formatting specification of existing flags; I would not attempt to support that aspect.
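To illustrate point 4 in the list above, here is a sketch of decoding and encoding one dihedral entry. It is based on my reading of the published prmtop format description: each entry is five integers, the four atom fields hold three times the zero-based atom index, a negative third field means the 1-4 interaction is skipped, and a negative fourth field marks an improper. The function names and the returned dictionary layout are made up for the example.

```python
def decode_dihedral(entry):
    """Decode one (i, j, k, l, type) record into zero-based indices and flags."""
    i, j, k, l, type_index = entry
    return {
        # Atom fields store 3 * atom_index; the sign is only a flag.
        "atoms": tuple(abs(value) // 3 for value in (i, j, k, l)),
        "skip_14": k < 0,
        "improper": l < 0,
        "type_index": type_index - 1,  # stored one-based
    }


def encode_dihedral(atoms, type_index, skip_14=False, improper=False):
    """Inverse of decode_dihedral; atoms are zero-based indices."""
    i, j, k, l = (3 * atom for atom in atoms)
    # Atom 0 encodes as 0, which cannot carry a sign, so a flagged
    # position must not hold atom 0 (the torsion has to be reordered).
    if (skip_14 and k == 0) or (improper and l == 0):
        raise ValueError("atom 0 cannot carry a sign flag; reorder the torsion")
    return (i, j, -k if skip_14 else k, -l if improper else l, type_index + 1)
```

The atom-0 restriction is one concrete reason the sign convention is fragile: a writer has to reorder torsions rather than just negate a field.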

P.S. (re my point 1; a quick google has turned up this, which I was not aware of)
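For what it's worth, the subset of Fortran formats that shows up in prmtop files is small enough that a basic writer fits in a few lines. The sketch below handles only simple `(countTypeWidth[.decimals])` specifications such as `(10I8)`, `(5E16.8)` and `(20a4)`; the function names are made up, and it is not a general Fortran format implementation. Exponential fields use C-style output (one digit before the decimal point), which is my understanding of what existing prmtop files contain, rather than the leading-zero form a Fortran compiler would normally print.

```python
import re

# Matches e.g. "(10I8)", "(5E16.8)", "(20a4)"; nothing more elaborate.
_FORMAT_RE = re.compile(r"\(?(\d+)([aAiIeEfF])(\d+)(?:\.(\d+))?\)?$")


def parse_format(fmt):
    """Return (per_line, kind, width, decimals) for a simple format string."""
    match = _FORMAT_RE.match(fmt.strip())
    if match is None:
        raise ValueError("unsupported format: %r" % fmt)
    count, kind, width, decimals = match.groups()
    kind = kind.upper()
    if kind in "EF" and decimals is None:
        raise ValueError("real formats need a decimal count: %r" % fmt)
    return int(count), kind, int(width), int(decimals) if decimals else None


def format_values(values, fmt):
    """Format values into fixed-width fields, wrapping at the per-line count."""
    per_line, kind, width, decimals = parse_format(fmt)
    if kind == "A":
        cells = [str(value)[:width].ljust(width) for value in values]
    elif kind == "I":
        cells = ["%*d" % (width, value) for value in values]
    elif kind == "E":
        cells = ["%*.*E" % (width, decimals, value) for value in values]
    else:
        cells = ["%*.*f" % (width, decimals, value) for value in values]
    for cell in cells:
        if len(cell) > width:
            # Fortran would print asterisks; failing loudly seems safer here.
            raise ValueError("value does not fit in field: %r" % cell)
    lines = ["".join(cells[start:start + per_line])
             for start in range(0, len(cells), per_line)]
    return "\n".join(lines)
```

Parsing is the mirror image: slice each line into fixed-width fields and convert, rather than splitting on whitespace, since adjacent fields are allowed to touch.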


jchodera avatar jchodera commented on May 19, 2024

Writing AMBER prmtop files will definitely be challenging, mainly because the atom types need to be automatically determined from the Force objects. Regarding @mjw99's point (5), I don't think the atom type names themselves are relevant to the internals of AMBER.

The best strategy to generate a reasonable prmtop file may be to use the information in the XML forcefield specification (either directly or via ForceField) to try to reconstruct what the atom types should be and which atoms are assigned which types, but this seems complex. It may be easier to have ForceField generate prmtop files directly from a topology, instead of trying to go from the information inside a System object.

The AmberTools distribution contains a Python tool for reading/writing prmtop files (from Jason Swails) that I had not previously been aware of:

amber12/bin/chemistry/amber/readparm.py

I suggest we make use of this tool to read/write prmtop files for this purpose, since this will presumably continue to be supported by the AMBER developers.

The tool provides a means of building the arcane formatted contents of the prmtop file from structures that reflect the underlying topology and parameter sets (or vice-versa). I think it can also support "chamber" CHARMM-in-AMBER files.


peastman avatar peastman commented on May 19, 2024

Thanks Mark, that's great information to have. And I'll take a look at the AmberTools code.


jchodera avatar jchodera commented on May 19, 2024

I think we're doing much better with Amber interoperability thanks to @swails and ParmEd.
