Comments (24)
I would strongly object to changing how we write PDB files. PDB is a well defined file format, and that includes atom and residue names. A file that uses different names is simply an invalid PDB file. The fact that AMBER can't read standard PDB files is a serious bug in their code, and they need to fix it.
I think all of us in this field need to be less tolerant of programs that produce invalid PDB files, because that's the only way we'll get them to change. For example, Gromacs used to use incorrect atom and residue names, but in recent versions they've changed to always write PDB files with the correct (standard) names. Let's try to get AMBER to do the same. That's the best thing for their users, and for the whole field.
from openmm.
I would tend to agree with Peter. These issues are annoying, and it would be great for us not to contribute to the problem.
I think a reasonable middle ground would be to write a simple conversion utility that can map between the two, rather than having OpenMM write non-standard files. I also have several cases where I would like to round trip things between Amber and OpenMM (in my case to get per-residue energies out of snapshots).
I think this:
Allowing PDBFile objects to be created from OpenMM Topology objects and position arrays (or State objects) would also be extremely useful.
would be great though.
from openmm.
There are a number of programs that (ab)use the PDB format or mol2 format to store nonstandard package-specific information, and have done so for decades. Adopting an attitude of "fuck you" toward these programs with enormous established user bases will not be helpful to OpenMM.
Does anybody have a useful suggestion on how to actually interoperate with AMBER besides trying to convince the entire AMBER developer community to break all backwards-compatibility by only working with PDB version 3.30 conformant files? You're welcome to go to their next developer meeting and lobby for that, but this will not actually accomplish anything in the short term.
Any useful input on the other proposal that PrmtopFile and InpcrdFile could be extended to write AMBER-format input files? Note that this is not an invitation to rant about how they should be using XML instead, but rather a request for some constructive discussion to come up with a way in which we can implement interoperability within the framework, goals, and ideals of OpenMM.
from openmm.
Also, I will note that PDBFile was built specifically with the flexibility to use XML-defined translation tables. This could be very simply modified to write AMBER-naming-convention-conformant files with a simple extension of the interface. Yes, it could spit out an annoying warning that the resulting PDB file is not compliant with the PDB version 3.30 standard now universally adopted. Yes, this could only be accessed through an optional argument. But it appears that it would be straightforward to allow this. I don't consider this "contributing to the problem", but merely "an interim solution until the larger problem can be rectified".
from openmm.
I don't want to take a position either way here, but from an implementation perspective, one option is to look at the csv library in the python stdlib. csv is a format which many programs read and write, all using slightly different formats (unlike PDB, there is no actual csv standard). The python csv library has this "dialect" mechanism for trying to deal with this problem. PEP305 also has some high level discussion on the dialect issue.
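For readers unfamiliar with the dialect mechanism mentioned above, here is a minimal sketch using only the stdlib csv module. The dialect name "semicolon", the class SemicolonDialect, and the helper dump are made up for illustration; the same rows are written under two dialects without the calling code changing.

```python
import csv
import io


class SemicolonDialect(csv.Dialect):
    """An illustrative dialect: semicolon-separated, Unix line endings."""
    delimiter = ";"
    quotechar = '"'
    doublequote = True
    skipinitialspace = False
    lineterminator = "\n"
    quoting = csv.QUOTE_MINIMAL


# Register the dialect under a name so writers and readers can ask for it.
csv.register_dialect("semicolon", SemicolonDialect)


def dump(rows, dialect):
    """Write rows to a string using the named dialect."""
    buf = io.StringIO()
    csv.writer(buf, dialect=dialect).writerows(rows)
    return buf.getvalue()


rows = [["HIS", "ND1"], ["CYS", "SG"]]
semicolon_text = dump(rows, "semicolon")
excel_text = dump(rows, "excel")  # "excel" is the stdlib default dialect
```

The point of the analogy is that the variation lives in a small named object rather than in the writer itself.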
Also, it is possible to write a PDB using PDBFile from a Topology and positions array, already, using PDBFile.writeModel. The code in PDBReporter already does this. If there was interest, I think PDBFile could be re-factored to make this a little easier, by making PDBFile act like a "file-like" object, supporting the PEP343 context manager protocol, but that is a subject for a separate discussion. The biggest change would be that writeModel would not be a staticmethod.
from openmm.
No one suggested that AMBER should "break all backwards-compatibility". They certainly should continue to read the files generated by previous versions of their program. But that's no reason to not also support standard PDB files. It has been over 6 years since PDB 3.0 was released, and it's absurd that they still haven't added support for it. AMBER can't even handle files downloaded directly from RCSB. They can't expect everyone else in the field to implement workarounds for their bugs.
What would be the reason for creating a PDBFile object from a topology and set of coordinates? What would you do with that object?
Writing inpcrd files would be pretty easy, so that would be a reasonable feature. But wouldn't mdcrd be more useful? Writing prmtop files would be a lot more work, but it's also a possibility.
Also, I will note that PDBFile was built specifically with the flexibility to use XML-defined translation tables
It uses that when reading, not writing. We try as much as possible to deal with whatever files we come across and convert them into something more standard. This is a widely recognized principle in software engineering whenever you need to communicate using a standardized protocol: be as tolerant as possible in the input you accept, and be as strict as possible in the output you generate.
from openmm.
How about we define an AMBER PDB file type and have a (maybe derived) class for writing that? It seems that this really is a file type to itself.
Thanks,
Vijay
Sent from my phone. Sorry for any brevity or unusual tone.
On Jun 17, 2013, at 6:22 PM, John Chodera [email protected] wrote:
There are a number of programs that (ab)use the PDB format or mol2 format to store nonstandard package-specific information, and have done so for decades. Adopting an attitude of "fuck you" toward these programs with enormous established user bases will not be helpful to OpenMM.
Does anybody have a useful suggestion on how to actually interoperate with AMBER besides trying to convince the entire AMBER developer community to break all backwards-compatibility by only working with PDB version 3.30 conformant files? You're welcome to go to their next developer meeting and lobby for that, but this will not actually accomplish anything in the short term.
Any useful input on the other proposal that PrmtopFile and InpcrdFile could be extended to write AMBER-format input files? Note that this is not an invitation to rant about how they should be using XML instead, but rather a request for some constructive discussion to come up with a way in which we can implement interoperability within the framework, goals, and ideals of OpenMM.
from openmm.
AMBER can't even handle files downloaded directly from RCSB
Nor can OpenMM, mind you. The OpenMM app can't handle anything with chain breaks or missing heavy atoms, which is most of the content of the RCSB.
from openmm.
What would be the reason for creating a PDBFile object from a topology and set of coordinates? What would you do with that object?
Right now, PDBFile provides a means of populating its internal data structures by loading data from a PDB file. You can also write PDB files from a topology and positions combination. But if you start with topology and positions, the only way to get this information into a PDBFile is to write a PDB file to disk and re-read it into PDBFile.
Currently, the only thing you can do with the PDBFile object is retrieve topology, positions, and number of frames, so you're right, it wouldn't be that useful to add this functionality unless the API was expanded to allow manipulating the contents. Right now, I am abusing PDBFile by manipulating the internal PdbStructure contents to try to write out AMBER-compatible files, but this is not a good general solution---hence my query here about how best to allow OpenMM to interoperate with AMBER.
from openmm.
Also, I will note that PDBFile was built specifically with the flexibility to use XML-defined translation tables
It uses that when reading, not writing. We try as much as possible to deal with whatever files we come across and convert them into something more standard. This is a widely recognized principle in software engineering whenever you need to communicate using a standardized protocol: be as tolerant as possible in the input you accept, and be as strict as possible in the output you generate.
This is reasonable, but we still need a way to interoperate with AMBER.
The python csv library has this "dialect" mechanism for trying to deal with this problem. PEP305 also has some high level discussion on the dialect issue.
This is one possible solution: Allowing PDBFile to speak different "dialects" or write files with different "flavors". OpenEye's OEChem tools use this scheme to interoperate with different programs by writing PDB or mol2 files in different "flavors", for example. This violates the strictness-of-output principle stated above, however.
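To make the "flavor" idea concrete, here is a rough sketch of how a writer might pick residue names per flavor. Nothing here is OpenMM API: the function names, the flavor argument, and the rule set are all hypothetical, and only histidine is handled. The HID/HIE/HIP choice is inferred from which imidazole hydrogens are present, and defaulting to HIE when neither is present is an assumption.

```python
import warnings


def amber_residue_name(resname, atom_names):
    """Return an AMBER-style residue name (illustrative subset: histidine only).

    atom_names is the set of atom names present in the residue.
    """
    if resname == "HIS":
        has_hd1 = "HD1" in atom_names  # hydrogen on ND1
        has_he2 = "HE2" in atom_names  # hydrogen on NE2
        if has_hd1 and has_he2:
            return "HIP"  # doubly protonated
        if has_hd1:
            return "HID"  # delta-protonated
        return "HIE"      # epsilon-protonated; also the assumed fallback
    return resname


def residue_name_for_flavor(resname, atom_names, flavor="standard"):
    """Hypothetical hook a PDB writer could call for each residue."""
    if flavor == "standard":
        return resname
    if flavor == "amber":
        warnings.warn("AMBER-flavored output is not PDB 3.30 compliant")
        return amber_residue_name(resname, atom_names)
    raise ValueError(f"unknown flavor: {flavor!r}")
```

Keeping "standard" as the default means strict output is what users get unless they explicitly opt in to a non-standard flavor.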
from openmm.
Writing inpcrd files would be pretty easy, so that would be a reasonable feature. But wouldn't mdcrd be more useful?
I believe you need inpcrd files to initiate AMBER simulations using sander. mdcrd files are formatted text trajectories with degraded precision generated as output only, intended for analysis and not restarting simulations. I do hope to add that functionality to the MDTraj reader @rmcgibbo has been working on at some point.
Writing prmtop files would be a lot more work, but it's also a possibility.
This appears to be the only available route for bidirectional compatibility with AMBER, but has several drawbacks. Notably, inpcrd/prmtop files cannot be re-read into LEaP, so there will be no way to manipulate the system in AMBER. For example, if one wanted to simulate a protein in implicit solvent in OpenMM and then load this into AMBER for explicit solvent simulation, the system would have to be solvated in a cubic box in OpenMM and written as an inpcrd/prmtop pair, rather than the truncated octahedron preferred in LEaP for simulation in AMBER.
For extending AmberInpcrdFile, we could choose at least three possible options:
A. Following PDBFile, add a @staticmethod to write an inpcrd file:
writeFile(positions, file=sys.stdout, boxVectors=None, title=None)
B. Modify the constructor to allow construction from an input file or construction from positions/boxVectors:
def __init__(self, file=None, loadVelocities=False, loadBoxVectors=False, positions=None, boxVectors=None, title=None)
C. Create an AmberInpcrdFileFactory class that generates an AmberInpcrdFile class via static methods:
@staticmethod
generateAmberInpcrdFile(positions, boxVectors=None, title=None)
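As a rough idea of how little code option A might need, here is a standalone sketch of an inpcrd writer. It is not the OpenMM implementation: write_inpcrd is a hypothetical name, positions are assumed to be plain (x, y, z) floats in angstroms rather than OpenMM Quantity objects, the box is assumed orthorhombic, and the field widths (an atom count in 5 columns, coordinates six per line in 12.7 fixed-point fields) reflect my reading of the AMBER format description and should be checked against it.

```python
import io


def write_inpcrd(positions, file, box_lengths=None, title="generated"):
    """Write coordinates in an inpcrd-like layout (sketch, see assumptions)."""
    # Line 1: title, truncated to 80 characters.
    file.write(f"{title[:80]}\n")
    # Line 2: number of atoms (assumed 5-column integer field).
    file.write(f"{len(positions):5d}\n")
    # Coordinates: x, y, z flattened, six values per line, 12.7 fields.
    flat = [c for xyz in positions for c in xyz]
    for i in range(0, len(flat), 6):
        file.write("".join(f"{v:12.7f}" for v in flat[i:i + 6]) + "\n")
    # Optional box line: three lengths followed by three 90-degree angles.
    if box_lengths is not None:
        box = (*box_lengths, 90.0, 90.0, 90.0)
        file.write("".join(f"{v:12.7f}" for v in box) + "\n")


buf = io.StringIO()
write_inpcrd([(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)], buf, box_lengths=(10.0, 10.0, 10.0))
inpcrd_text = buf.getvalue()
```

The same body could sit behind any of the three API shapes above; the choice between them is about the interface, not the formatting logic.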
For extending AmberPrmtopFile, we could also choose at least three options:
A. Following PDBFile, add a @staticmethod to write a prmtop file:
writeFile(system, topology, file=sys.stdout)
Note that not all System objects can be written as AMBER prmtop files. An exception should be thrown if the System object contains forces that cannot be represented in AMBER prmtop files.
B. Modify the constructor to allow construction from an input file or construction from system/topology objects:
def __init__(self, file=None, system=None, topology=None)
C. Create an AmberPrmtopFileFactory class that generates an AmberPrmtopFile class via static methods:
@staticmethod
generateAmberPrmtopFile(system=None, topology=None)
I'm happy to do the coding here (if desired) since I need this interoperability ASAP, but I'd very much like some useful feedback on the most consistent API and implementation so that this would actually be useful to others.
from openmm.
How about we define an AMBER PDB file type and have a (maybe derived) class for writing that? It seems that this really is a file type to itself.
This could be a workable approach to writing AMBER-compatible PDB files, though it may lead to what might be considered unnecessary duplication of code paths.
from openmm.
Writing inpcrd files would be pretty easy, so that would be a reasonable feature. But wouldn't mdcrd be more useful?
Apologies for the threadjack, but I recently wrote a very basic Python tool for outputting mdcrds to AMBER's netcdf file format (given a prmtop). It can be found here: https://bitbucket.org/mjw99/amberopenmmutils
Feel free to take/incorporate/improve.
I think the ability to write prmtop files would be very useful, and I've even considered writing such code myself, but I think it is very difficult to get right. This is because the format (and leap) has quirks and quite a few esoteric aspects that I've been burnt with in the past. In addition, I think AMBER should have migrated to an XML format for this a while ago, however, there seems to be an internal resistance to this.
That said, the documentation on the parm file format at http://ambermd.org/formats.html is useful, and Jason Swails has recently augmented it with a more useful description: http://archive.ambermd.org/201305/att-0256/prmtop.pdf
Also, I know Swails has written a number of Python utilities (Parmed.py) for manipulating prmtop files. These are in AmberTools and might be a useful starting point for a parm generator, John.
from openmm.
Converting OpenMM System+Topology into a prmtop will indeed be challenging because atom, bond, and angle types must be extracted from a System object, and a number of checks must be made before it can be concluded the System can even be correctly represented in a prmtop file.
This is certainly orders of magnitude more difficult than translating atom names to what AMBER expects in a PDB file.
from openmm.
In fact, it may be simpler to enable ForceField to write an Amber prmtop file, effectively replacing the AmberTools path for setting up systems. We could still load in correct PDB files generated by OpenMM (potentially including solvent), assign parameters from any forcefield OpenMM supports, and write prmtop files (for those forcefields that support the limitations of prmtop files).
Peter, what would you think about this? It avoids the issues you were concerned about.
from openmm.
I think it would help to back up for a moment and decide what use cases we're trying to support. "Interoperability" can mean a lot of things. Some examples include
- Build a system with AmberTools, simulate it with both OpenMM and AMBER, compare the results.
- Build a system with OpenMM, simulate it with AMBER.
- Build a system with OpenMM, load it with tleap for further modeling, simulate in OpenMM and/or AMBER.
- Run a simulation in OpenMM, analyze the trajectory with AmberTools (which particular programs?)
What are the particular things we want to enable?
from openmm.
On the API-design part of this thread, I want to cast my -1 against writing any more @staticmethods. And -2 on classes named SomethingFactory that are a collection of @staticmethods. This isn't java. I realize that PDBFile uses @staticmethods for writing (writeFile, writeHeader, writeModel), but I think that is a suboptimal design that should not be replicated.
In my opinion, the APIs for file like objects should mirror the python file builtin to the extent possible, as well as other established file-mapped objects in the scientific python ecosystem like tables.File and netCDF4.Dataset. This means that the constructor should take a mode (usually either read, r, or write, w, but some APIs might support append too), e.g. def __init__(self, filename, mode='r', ...). Both reading and writing instance methods -- not static methods -- should be provided on the class. The class should also have an __enter__(self) and __exit__(self, *exc_info) to support PEP343-style access.
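A minimal sketch of the shape described above, for illustration only. CoordinateTextFile is a hypothetical class, not part of OpenMM, and the on-disk layout (one whitespace-separated x y z triple per line) is invented to keep the example short; the parts that matter are the mode argument, the instance methods, and the context manager hooks.

```python
class CoordinateTextFile:
    """Hypothetical file-like object with a mode and PEP343 support."""

    def __init__(self, filename, mode="r"):
        if mode not in ("r", "w"):
            raise ValueError("mode must be 'r' or 'w'")
        self.mode = mode
        self._handle = open(filename, mode)

    def write(self, positions):
        """Append one line per (x, y, z) position; only valid in write mode."""
        if self.mode != "w":
            raise IOError("file not opened for writing")
        for x, y, z in positions:
            self._handle.write(f"{x:.7f} {y:.7f} {z:.7f}\n")

    def read(self):
        """Return the remaining positions as a list of (x, y, z) tuples."""
        if self.mode != "r":
            raise IOError("file not opened for reading")
        return [tuple(float(v) for v in line.split())
                for line in self._handle if line.strip()]

    def close(self):
        self._handle.close()

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()
        return False  # do not swallow exceptions
```

Usage would then look like `with CoordinateTextFile(path, mode="w") as f: f.write(positions)`, and because write can be called repeatedly, a reporter could still emit a file piece by piece.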
from openmm.
All of the use cases @peastman suggested are useful. I think it should mainly be a matter of prioritizing.
My suggested priority order is:
1. Build a system with AmberTools, simulate it with both OpenMM and AMBER, compare the results.
2. Build a system with OpenMM, simulate it with AMBER.
3. Build a system with OpenMM, load it with tleap for further modeling, simulate in OpenMM and/or AMBER.
4. Run a simulation in OpenMM, analyze the trajectory with AmberTools (which particular programs?)
We have the functionality for (1) already via the current AmberPrmtopFile/AmberInpcrdFile functionality.
My most pressing need is to enable (2) in some manner: The ability to take a system from OpenMM to be simulated in AMBER. The reason for this is that we have built a modeling pipeline that does a lot of things in OpenMM that would be difficult to do in other codes (adjusting number of waters, modifying protonation states, refinement in implicit and explicit solvent) and now need to prepare input files for the AMBER GPU port to run straightforward MD simulations on leadership computing resources (e.g. large chunks of Blue Waters or Titan). Note that this is only because AMBER currently has a speed advantage for straightforward MD over OpenMM.
(2) is currently not possible because we cannot easily set up an equivalent system in AMBER due to the renaming of residues and atoms; you can't simply read in a PDB file generated by OpenMM into LEaP. "Fixing" LEaP to read PDB v3.30 files and detect protonation states would require a great deal of effort, effectively implementing the same system used in the OpenMM app into LEaP. I've proposed a few ideas already in this thread about how #2 could be achieved (allow writing of AMBER-flavored PDB files; allow writing of prmtop/inpcrd files from system/topology; allow Forcefield to write AMBER prmtop/inpcrd files directly).
(3) would allow more flexibility than #2, but at the cost of the extra hassle of having to go through LEaP before running AMBER's sander directly. Still, this would be desirable, especially if OpenMM was used for extensive implicit solvent modeling before moving systems into AMBER for explicit solvent simulation.
(4) would also be useful to some users due to the currently limited analysis tools in OpenMM. The main analysis tool would likely be AmberTools ptraj, and AMBER format NetCDF files could be generated by @rmcgibbo's MDTraj project. This would also require a prmtop file.
from openmm.
On the API-design part of this thread, I want to cast my -1 against writing any more @staticmethods. And -2 on classes named SomethingFactory that are a collection of @staticmethods. This isn't java. I realize that PDBFile uses @staticmethods for writing (writeFile, writeHeader, writeModel), but I think that is a suboptimal design that should not be replicated.
@rmcgibbo : Can you give an example of how AmberPrmtopFile and AmberInpcrdFile would be refactored to conform to this design? It's not that I don't think this is a good idea---I think I'm just in need of some code use example with the proposed refactoring.
In my opinion, the APIs for file like objects should mirror the python file builtin to the extent possible, as well as other established file-mapped objects in the scientific python ecosystem like tables.File and netCDF4.Dataset. This means that the constructor should take a mode (usually either read, r, or write, w, but some APIs might support append too), e.g. def __init__(self, filename, mode='r', ...). Both reading and writing instance methods -- not static methods -- should be provided on the class. The class should also have an __enter__(self) and __exit__(self, *exc_info) to support PEP343-style access.
Same query here.
from openmm.
It sounds to me like writing inpcrd and prmtop files is the best way to go. Some aspects of that will be challenging - what do we do if you have a CustomGBForce? - but a large fraction of systems created by ForceField should be straightforward to translate. Mark, can you describe the particular issues you ran into with writing prmtop files?
We could definitely add a write() method to PDBFile. Note, though, that those static methods were not created arbitrarily! PDBReporter needs to write a file in pieces without ever having all the data in memory at once. It writes the header when you begin the simulation, writes a new model at each report interval, and writes the footer at the end of the simulation. I could have made those into instance methods, but then we would have mixed up the APIs for reading and writing PDB files in a confusing way.
from openmm.
Mark, can you describe the particular issues you ran into with writing prmtop files?
Hi Peter, apologies for the verbose reply, but these were the main issues I found/was aware of, when I was considering writing a prmtop file generator:
1. The lack of tools within Python for dealing with Fortran format. The formatting of values associated with a %FLAG MUST adhere to the Fortran format specified in its paired %FORMAT. Associated values are very easy to write and parse if you’re using Fortran, but, perversely, I have failed to find a decent Fortran format writer/parser library in any other language. The closest I’ve seen to this is FortranFormat in Java. What I’m getting at here is that you will probably need to write a robust Fortran format parser/writer in Python before you start.
2. The required set of %FLAG / %FORMAT pairs in a prmtop is not defined. Additional pairs have been added over time, but this is not clearly documented or reflected in the %VERSION flag. Conversely, there are some FLAGS that have (in my experience) no use at all, e.g. TREE_CHAIN_CLASSIFICATION. A corollary to this is that there is not a fixed number of entries in the POINTERS list; future values are added onto the end.
3. The ordering of the %FLAG / %FORMAT pairs in a prmtop is not defined and can be a function of the tool that generated it; however, the POINTERS flag must generally be near the start since it contains information about the number of entries in other flags.
4. The use of negative values in DIHEDRALS_{INC,WITHOUT}_HYDROGEN, on atom indices 3 and 4 to indicate an ignored 1-4 interaction and an improper dihedral, is esoteric and confusing. If your output formatting is off, then point 1) comes into play and the sign may not be parsed properly. Also, I don’t completely understand LEaP’s criteria for the sign on index 3.
5. The ATOM_NAME entries are in theory case sensitive; lowercase is used to indicate GAFF atom types, i.e. CA != ca. The point at which their case sensitivity plays a role (i.e. during the parsing of parm*.dat) has probably passed by the time a prmtop file has been generated; however, it is something to be aware of. I recall using MMTK once to test a system with GAFF atoms and this had issues since it normalised all atomtypes internally to uppercase.
6. The %VERSION field value seems to have no meaning or use.
7. The values associated with the flags RADII and SCREEN are a function of the ASCII value in RADIUS_SET.
8. The logic associated with EXCLUDED_ATOMS_LIST is difficult and tightly coupled to the NUMBER_EXCLUDED_ATOMS list.
9. The %COMMENT flag is valid (and should be used more often), but some tools do not accept it.
10. Support for CHARMM topologies requires extra flags (i.e. via CHAMBER) and changes in the formatting specification of existing flags; I would not attempt to support that aspect.
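To give a sense of the scale of the Fortran-format problem in the first point, here is a small sketch of a writer for the three simple edit descriptors that commonly appear in prmtop %FORMAT lines (for example 10I8, 5E16.8, and 20a4). The function name format_fortran is hypothetical, and this is far from a general Fortran format implementation: repeat groups, scale factors, and other descriptors are not handled. Python's E formatting puts one digit before the decimal point, which strict Fortran E editing without a scale factor would not do, so whether this matches what a given AMBER tool expects is an assumption to verify.

```python
import re

# Matches e.g. "10I8", "(5E16.8)", "20a4": repeat count, kind, width, decimals.
_FORMAT_RE = re.compile(r"\(?(\d+)([aAiIeE])(\d+)(?:\.(\d+))?\)?$")


def format_fortran(values, fmt):
    """Format values into fixed-width fields, wrapping after the repeat count."""
    match = _FORMAT_RE.match(fmt.strip())
    if match is None:
        raise ValueError(f"unsupported format: {fmt!r}")
    per_line = int(match[1])
    kind = match[2].upper()
    width = int(match[3])
    decimals = int(match[4]) if match[4] else 0

    if kind == "I":
        fields = [f"{int(v):{width}d}" for v in values]
    elif kind == "E":
        fields = [f"{float(v):{width}.{decimals}E}" for v in values]
    else:  # "A": left-justified text, truncated to the field width
        fields = [f"{str(v):<{width}.{width}}" for v in values]

    # A value that overflows its field would silently corrupt the columns.
    for field in fields:
        if len(field) > width:
            raise ValueError(f"value does not fit in width {width}: {field!r}")

    lines = ["".join(fields[i:i + per_line])
             for i in range(0, len(fields), per_line)]
    return "\n".join(lines)
```

The overflow check matters for the signed dihedral indices in the fourth point: a field that is one character too wide shifts every later column, so failing loudly is safer than writing a misaligned file.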
P.S. (re my point 1; a quick google has turned up this, which I was not aware of)
from openmm.
Writing AMBER prmtop files will definitely be challenging, mainly because the atom types need to be automatically determined from the Force objects. Regarding @mjw99's point (5), I don't think the atom type names themselves are relevant in the internals of AMBER.
The best strategy to generate a reasonable prmtop file may be to use the information in the XML forcefield specification (either directly or via ForceField) to try to reconstruct what the atom types should be and which atoms are assigned which types, but this seems complex. It may be easier to have ForceField generate prmtop files directly from a topology, instead of trying to go from the information inside a System object.
The AmberTools distribution contains a Python tool for reading/writing prmtop files (from Jason Swails) that I had not previously been aware of:
amber12/bin/chemistry/amber/readparm.py
I suggest we make use of this tool to read/write prmtop files for this purpose, since this will presumably continue to be supported by the AMBER developers.
The tool provides a means of building the arcane formatted contents of the prmtop file from structures that reflect the underlying topology and parameter sets (or vice-versa). I think it can also support "chamber" CHARMM-in-AMBER files.
from openmm.
Thanks Mark, that's great information to have. And I'll take a look at the AmberTools code.
from openmm.
I think we're doing much better with Amber interoperability thanks to @swails and ParmEd.
from openmm.