ericchansen / q2mm Goto Github PK

Quantum to Molecular Mechanics (Q2MM)

License: MIT License

Python 91.31% Shell 0.67% DIGITAL Command Language 8.02%

force-fields optimal-parameters force-field-optimization schrodinger quantum molecular-mechanics

q2mm's Introduction

Q2MM

Q2MM stands for quantum (mechanics) to molecular mechanics or quantum guided molecular mechanics, depending on what you prefer. Q2MM is open source software for force field optimization.

Python Dependencies

Q2MM uses Python 2.7.4. I'm reluctant to use more recent versions of Python because as far as I know, Schrödinger still uses 2.7.4.

The following modules are required, but are included in the standard library for Python 2.7.4.

argparse
collections
copy
glob
itertools
logging
mmap
os
random
re
string
sqlite3
subprocess
sys
textwrap
time

These are required, but aren't in the standard library.

numpy

Required for particular features.

schrodinger

If you'd like to use Schrödinger features, they recommend running external Python scripts using

$SCHRODINGER/run somepythoncode.py

Usage

You can get help for most python Q2MM scripts using the command line. Here's an example.

python calculate.py -h

Setting up the Schrödinger MM3* force field

These force field files are labeled mm3.fld. Our custom parameters are stored in substructures towards the end of theses files. For information on these substructures, see the MacroModel reference manual.

Substructures are marked for optimization by adding the word "OPT" to their title. For example, you could name your substructure "New Metal Parameters OPT". Running

python parameters.py -f pathtomm3.fld -a -pp

will print a list of the parameters that Q2MM identified. You can redirect the output parameter list to a file using standard Unix redirection.

python parameters.py -f pathtomm3fld -a -pp > params.txt

Here's an example of what params.txt might look like.

1854 1 0.0 inf
1854 2 0.0 inf
1854 3 -inf inf
1855 1 0.0 inf
1855 2 0.0 inf
1856 1 -inf inf
1856 2 -inf inf
1856 3 -inf inf

The first column refers to the line of the force field file where the parameter is located. The second column is an index refering to the location of the parameter in that line. For Schrödinger and mm3.fld, equilibrium bond lengths are found in column 1, force constants in column 2, dipoles in column 3, etc. See Schrödinger's documentation and the documentation inside parameters.

The 3rd and 4th column in params.txt are optional and specify the allowed parameter range. These values can be any floating points. Also, "inf" is used to signify the parameter can go to infinity. If the 3rd and 4th column aren't included, Q2MM will attempt to identify suitable parameter ranges based upon the parameter type.

Currently, these parameter ranges are hard walls. In other words, if a step is made to move outside that wall, the step is simply scaled down to not go too far. Ideally, we should implement soft walls in the future.

To select only certain types of parameters, use

python parameters.py -f mm3.fld -pt bf af

This command would print the bond and angle force constants in the format described above. See the help dialogue for parameters for more information.

Running an optimization loop

I would always recommend making a backup of your force field before beginning an optimization.

The loop module uses customized input files to manage the optimization of parameters. You can supply the input file using the simple command shown below.

python loop.py someinputfile

Here's an example of what the input file could look like.

DIR somedir
FFLD read mm3.fld # This is a comment.
PARM params.txt
RDAT -d somedir -je str_a.mae str_b.mae str_c.mae -je str_d.mae str_e.mae -jb str_a.mae str_b.mae str_c.mae str_d.mae str_e.mae
CDAT -d somedir -me str_a.mae str_b.mae str_c.mae -me str_d.mae str_e.mae -mb str_a.mae str_b.mae str_c.mae str_d.mae str_e.mae
COMP -o opt_start.txt
# Here's another comment.
LOOP 0.15
GRAD
SIMP
END
LOOP 0.05
GRAD
END
FFLD write smm3.fld
CDAT
COMP -o opt_end.txt

Let's breakdown each line.

DIR somedir

This sets the directory where all the data files, atom.typ, and mm3.fld files are located. Also, the MacroModel calculations will be run from this directory, and Q2MM intermediate or temporary files will be written here.

FFLD read mm3.fld

Read the initial force field.

PARM params.txt

Select certain parameters from the force field you just read. Without this, all parameters in the substructre are selected.

RDAT -d somedir -je str_a.mae str_b.mae str_c.mae -je str_d.mae str_e.mae -jb str_a.mae str_b.mae str_c.mae str_d.mae str_e.mae

This gathers the reference data used throughout the optimization. All of the arguments following RDAT are the same as the arguments used for the calculate module. See calculate help for more information. Note that the directory is still included in the command shown above.

CDAT -d somedir -me str_a.mae str_b.mae str_c.mae -me str_d.mae str_e.mae -mb str_a.mae str_b.mae str_c.mae str_d.mae str_e.mae

Same as above, except this is for the force field data.

COMP -o opt_start.txt

Compare the reference data to the force field data to determine the initial objective function score. Write the data and scores out to somedir/opt_start.txt.

LOOP 0.15

This marks the beginning of an optimization loop. All commands located between this line and the line containg END will be repeated until convergence is reached. In this case, it loops back and forth between gradient and simplex until the objective function changes by less than 15%.

GRAD

Use the gradient methods to optimize parameters. See the gradient module for more information.

SIMP

Use the simplex method to optimize parameters. See the simplex module for more information.

END

Marks the end of the commands being looped.

LOOP 0.05

Start another loop, but this time with a stricter convergence of 5% change. Really, you will hardly ever need to include two separate loops. I'm just showing you that you can if you want to for whatever reason.

GRAD

In this loop, we're only using the gradient method.

END

Marks the end of the second loop.

FFLD write mm3.fld

Write the optimized force field parammeters to somedir/mm3.fld.

CDAT

Calculate the force field data again. If CDAT is used without additional arguments, as shown here, then it remembers the previously entered arguments and repeats them. Calculating the force field data again here may seem excessive, but I wanted to make sure that the FF data stored in memory was calculated using the most recent optimized parameters. Better safe than sorry!

COMP -o opt_end.txt

Compare the reference and force field data for the optimized force field. Note that the reference data remains the same through this entire command file, so there's no need for me to use another RDAT command. Write the data and scores out to somedir/opt_end.txt. At this point, it would be wise to examine mm3.fld. Check that the parameters seem reasonable, and cross-check them with the backed up original parameters.

Changes to default settings

Maximum parameters for simplex optimization:

The maximum parameters that are used for simplex optimizations is defaulted to 10, but can be changed with the max_params command.

SIMP max_params=6

Control of gradient methods:

Five gradient methods are available: least-squared, lagrange, levenberg, newton-raphson, and SVD. Changing default settings for these require the use of the shortened name (lstsq, legrange, levenberg, newton, and svd) followed by "=" and the settings the user wants to use. True and False commands will turn on and off the optimizers, respectively. Users can also change the factors, cutoffs, and radii, that are used by including the setting they want to change followed by the values seperated by "/" nested in brackets.

GRAD lstsq=False newton=True,cutoffs[None],radii[0.01/0.1/2.0] svd=True,factor[0.01/0.1]

Changing default weights and step sizes:

A user may want to change the weights of certain data, or step sizes during differentiation of parameters. This can be accomplished with the keywords WGHT and STEP followed by the data/parameter type and value.

WGHT b 10.0
STEP be 1.0

For using hessian matrix from AmberMD:

In order to use -ah flag in CDAT section, the user must compile their own version of AmberMD.

            // q2mm
            FILE * hFile;
            hFile = fopen("./calc/hessian.mat","w");
            fprintf( hFile, "Hessian %d\n", natom);
            k = 0; 
            for (i = 1; i <= ncopy; i++){
                for (j = 1; j <= ncopy; j++){
                    fprintf( hFile, "%12.5f ", h[k]);
                    k++;
                }
                fprintf( hFile, "\n");
            }
            fclose(hFile);
            // q2mm

Add the following line to AmberTools/src/sff/nmode.c after following code

         /*
          * Mass weight the Hessian:
          */
         
         j = 1; 
         for (i = 1; i <= natom; i++) {
            g[j + 2] = g[j + 1] = g[j] = 1.0 / sqrt(m[i]);
            j += 3;
         }  
         
         k = 0; 
         for (i = 1; i <= ncopy; i++) {
            for (j = 1; j <= ncopy; j++) {
               h[k] = g[i] * h[k] * g[j];
               k++;
            }
         }

q2mm's People

Contributors

Stargazers

Watchers

Forkers

zophic gaslitbytech bodomartin q2mm nberkel05 xin-1991 peonor egulotty yaqinguppsala oneoverseven ssirimulla nsf-c-cas rickeyestes global-localhost global19 global19-atlassian-net devangkaruskar senppoa

q2mm's Issues

loop error

There is an error in line 115 of loop.py, no global variable opt, I just replaced with the corresponding line under the Q2MM fork and now it works...

SVD

I just looked for the first time in the SVD code. Short message: it is not doing SVD! It's working on the matrix A = JTJ (sorry, cannot superscript T).

Real SVD should work on the Jacobian directly, creating three matricies. You then modify the diagonal values (inverting all large values, setting all small values to zero), and create the inverted A-1 matrix directly by combining the three resulting matricies.

The current code instead works on A and puts it in the solver. SVD never needs the solver, you use the SVD instead.

With the current implementation, DON'T use SVD, it adds nothing compared to a regular least-squares.

Grabbing Eigenvalues

Currently it seems as if the code is not gathering the eigenvalues appropriately from Jaguar files. The eigenvalues of structures with negative frequencies are not reproduced when running python calculate.py -jeigz filename.in,filename.out

Input for compare.py and loop.py

I've been using the sort branch and I want to clarify how it works. I think I misunderstood the commit message (commit 1434d4d), but what needs to be done is that the input for compare and loop have to be in order of datatype. Can this be clarified?

So this will work:
python compare.py -c " -d somedirc/ -me -mea -meo -meao -mb -ma -mt -mq -mqh -mqa -mh -mgeig" -r " -d somedirc/ -je -jea -jeo -jeao -jb -ja -jt -jq -jqh -jqa -jh -jgeig"

where as if I were to switch the orders of the datatypes it will not be compared appropriately.

So this will not work, where I have moved angle and torsions to the end of the line:
python compare.py -c " -d somedirc/ -me -mea -meo -meao -mb -mq -mqh -mqa -mh -mgeig -ma -mt " -r " -d somedirc/ -je -jea -jeo -jeao -jb -jq -jqh -jqa -jh -jgeig -ja -jt"

Simplex

Something is badly wrong with the Simplex. When I'm testing it now, for a large set of parameters, when it starts the simplex, it initializes by doing numerical differentiation, for 150 parameters! This is never meaningful. First of all, a subset of no more than 10 parameters must be selected from the preceding Grad calculation (the initial Jacobian of that one). Second, we should never do forward AND backward when starting a Simplex. We usually use the initial point, and each of the <=10 parameters changed in the forward direction only, that is the starting Simplex.

Geometeric Parameters

After trying to run a loop of a parameter optimization the following error was found, using the following input:
python loop.py -c " -d DIRECTORY -mh FILENAME" -r -d DIRECTOR -jhi FILENMAE" --ptypes ae

'unrecognized MM3* parameter label: {}' .format(param.mm3_label)
AssertionError: unrecognized MM* parameter label: a2

Dangerous torsion-test

In filetypes.py, around line 1550, the angles in a torsion are tested for linearity (<5 or >175). The test fails badly if the two angles in the torsion are not included in the optimization set (which can happen). Haven't tried with the very latest version, but since the new "try" statement re-raises the exception, I believe it will fail again.

Informative ValueError

When running with -mgeig with multiple pairs, I get the error "ValueError: matrices are not aligned", it would be very helpful to also get the output about WHICH pair this is for. Now I have to run all 25 one at a time until I find the error...

Setup esp

When I'm running this it works fine except it's not selecting all the atoms I want it to. What would be nice is to only deselect atoms from OPT substructures. Instead, it's also deselecting atoms from a random P-H and the Rhodium substructure I have.

ferrocene_setupesp.zip

Strange error reading Gaussian frequencies

Error when reading Gaussian files with HPModes frequencies. filetypes.py checks for the presence of "Haromic" two places, obvious misspelling. Does this mean no one has used the HPModes keyword? We should. Also, seems to fail un "unusual" Frequency files, for example when the keyword "Projected" has been used. The lines before the actual eigenvectors seem variable, how to handle?

License checking

I've had cases now where licenses have been available, but the license checking in the Q2MM code itself fails for various reasons. I'd like a modification so that only proof of licenses unavailable will break here; on checking failure, let the code continue and keep your fingers crossed.

OPT in vdW

parameters.py crash when mm3.fld contains OPT in vdW parameters.

Command:
$SCHRODINGER/run ~/q2mm-3/parameters.py -pp -pt q

Three lines from within mm3.fld, with line numbers:
1036- CR 1.9600 0.0560 0.0000 0000 O 2
1037: CM 1.9600 0.0560 0.0000 0000 A 3 OPT (PeO Est.)
1038- O2 1.8200 0.0590 0.0000 0000 O 1

Traceback (most recent call last):
File "/home/kcjh508/q2mm-3/parameters.py", line 246, in
main(sys.argv[1:])
File "/home/kcjh508/q2mm-3/parameters.py", line 190, in main
ff = datatypes.import_ff(opts.ffpath)
File "/home/kcjh508/q2mm-3/datatypes.py", line 416, in import_ff
"[L{}] Can't read substructure name: {}".format(i + 1, line)
AssertionError: [L1037] Can't read substructure name: CM 1.9600 0.0560 0.0000 0000 A 3 OPT (PeO Est.)

File root.log is empty

No error if text "OPT" is removed from the vdW section (only in substructures)

error with get_most_converged.py

I have tried two different commands and had two different errors. I am trying to get the most converged coordinates from a Gaussian log.
Here is the first command:
python get_most_converged.py ../complex_substrates/allyl_alcohol_ts1b_re.log -c both -f gauss
Output:
Traceback (most recent call last):
File "get_most_converged.py", line 34, in
output = best_strucutre.format_coords (format=opts.format)
AttributeError: 'NoneType' object has no attribute 'formate_coords)

Second Command
python get_most_converged.py ../complex_substrates/allyl_alcohol_ts1b_re.log -a -c both -f gauss
Output:
Traceback (most recent call last):
File "get_most_converged.py", line 27, in
output = best_strucutre.format_coords (format=opts.format)
File "/afs/crc.nd.edu/user/a/arosale4/q2mm/filteypes.py", line 681, in format_coords
elements[atom.atomic_num, atom.x, atom.y, atom.z))
KeyError: 46

Support for Gaussian Hessian, eigenvectors, etc.

Add class GaussFormCheck to filetypes module, which should be responsible for retrieving the Hessian Hopefully this is in standard coordinates, as we need it to match the projected eigenvectors from inside the Gaussian log file.

Also must add eigenvector retrieval to GaussLog.

Add functions to the calculate module such that this data can be easily requested during an optimization.

New method to read Jaguar eigenvalues

This is an issue that was brought up here. However, the major issues in that thread were dealt with. It seemed more appropriate to have this be its own feature.

Basically, it would be great to have a method or option in calculate that reads Jaguar eigenvalues directly.

Currently, -jeigz reads the eigenvectors and Hessian, not the eigenvalues directly.

Below is what PO suggested.

... pick the frequency, convert it to a.u. (factor in my HessMan.py program), square it, and put back the sign, should now be the eigenvalue of the mass-weighted Hessian.

Mass weight Gaussian energy derivative datatypes

Is the Gaussian Hessian already mass weighted? For the option -gh, there currently isn't any additional mass weighting going on after reading the Hessian from the archive, which may be a big problem.

For -geigz, we simply read the eigenvalues and convert them to a diagonal matrix. Again, does any mass weighting have to occur here? Currently, I only have methods for mass weighting Hessians and eigenvectors, not eigenvalues. I'm not sure how I would do this.

For -mgeig, we use the Gaussian eigenvectors and the MM Hessian to form the diagonal eigenvalue matrix. Again, do we need to mass weight the eigenvectors after reading them?

Printing parameter files with "neg"

"parameters.py" still does not print a parameter list with "neg" for parameters that are allowed to be negative. I made a simple change at the end, shown here as "diff" output between original and my new file:

252c252,254

< print('{} {}'.format(param.mm3_row, param.mm3_col))

        if param.allow_negative:
            print('{} {} neg'.format(param.mm3_row, param.mm3_col))
        else: print('{} {}'.format(param.mm3_row, param.mm3_col))

RCA4 and TORC bond attributes not changing after merge

This comment thread provides the initial concerns with the atom index as the bond property for TORC and RCA4.
https://github.com/ericchansen/q2mm/commit/2e9c2e98b97a0603961973ec08de487ce687d922#commitcomment-21991092

I have found that this is not only a problem for what is discussed in the commit but also after a merging of structures. For example:
Template (S1) has the substructure C2=C2-C3 with atom indices of [1, 2, 3]
Structure to merge (S2) is a ring: C2=C2-C3-C3-C3-C3-1 with atom indices [1, 4, 5, 8, 11, 14] (plus hydrogens)

If S2 has a RCA4 between atoms [14, 1, 4, 5] and is then merged onto the template the resulting RCA4 bond property of the merged structure (S3) will be listed as [14, 1, 2, 5], but atoms 14 and 15 are not S2[14] an S2[15] anymore because of the merging sequence.

It might be better to read/write the atom index, but when manipulating structures this property becomes an schrodinger.structure._structureatom object.

Multiple Processing

Obviously we are limited by the number of tokens available and so we can't take advantage of our own form of additional processing power, but is it possible to use MacroModel's queuing system? I could be wrong, but it seems that the MacroModel computations are the rate determining step. Instead of having multiple *q2mm.mae files for each structure we are using as data, could we use one *q2mm.mae file that has all of the structures? From what I have gathered so far, it seems like we can use multiple processors for a single *.mae file while still having only checked out 2 tokens.

Request for logging of intermediate data

Older versions of the code generated tables of data in files par.one, par.ref, and par.tot. A description of the files can be found in Per-Ola Norrby and Tommy Liljefors, J. Comput. Chem. 1998, 19, 1146-1166. It would be nice to still have access to these files or perhaps something similar.

The loop is broken

Error in loop.py, using a case with only gradient, at the end of first iteration, get the following:

gradient -- Copied parameter derivatives from MM3LAGRANGE F10.0 to MM3READ.
Traceback (most recent call last):
File "/home/kcjh508/q2mm/loop.py", line 192, in
main(sys.argv[1:])
File "/home/kcjh508/q2mm/loop.py", line 186, in main
loop.run_loop_input(lines)
File "/home/kcjh508/q2mm/loop.py", line 108, in run_loop_input
self.ff = loop.opt_loop()
File "/home/kcjh508/q2mm/loop.py", line 43, in opt_loop
change = (last_score - self.ff.score) / last_score
TypeError: unsupported operand type(s) for -: 'NoneType' and 'float'

Seems "last_score" is not initialized. When did this happen? Reproducible, can supply files, but would like to make a smaller case, right now this is at the end of a 118 parameter loop

Just testing issues

Oh hey! Look at that! It's an issue!

Hessian inversion

Where did the old technique with setting a large positive value for the negative eigenvalue go? We've mostly replaced it with the new, "natural" TSFF where we just set the weight to zero, but there are still some cases where we'd like the old technique available, and I can't find it now. Just to be clear, we do not need to create the modified Hessian, just replace the reference value for the first eigenvalue with a large positive in the reference data and use it with a non-zero weight.

The application I need it for right now is a final step where the force field has been converged to the "natural" values, taking out the force constants only for the breaking and forming bonds (and maybe some angles including them if they also ended up with too low values), and rerun ONLY those parameters with the modified Hessian. See Elaines and my 2015 article in JCC, the "F-SN2" TS, for motivation.

Charge fitting

There is a problem when fitting dipoles at the border of the parameterization region, that is, when charges are affected both by dipoles that change, and unmodified dipoles from the underlying force field. Could potentially be alleviated by pruning the data:

Include all charges and charges_with_H where all dipoles to that atom are being modified.
For any atom with at least one unmodified bond to a non-hydrogen atom, exclude the charge.
For an atom that has only bonds that are being modified, and bonds to hydrogens, include the charges_with_H in the data set, but not the regular partial charge.

Force Constant Calculation

I tried doing a normal bond and angle force constants calculation. I got an error back pretty quickly. It seems to work for the ferrocene_ecl structure but not the ferrocene_stag. I've checked their files (01.in and .out) and can't tell the difference. Here's the error message:

Traceback (most recent call last):
  File "loop.py", line 183, in <module>
    main(sys.argv[1:])
  File "loop.py", line 177, in main
    loop.run_loop_input(lines)
  File "loop.py", line 114, in run_loop_input
    self.ref_data = calculate.main(self.args_ref)
  File "/afs/crc.nd.edu/user/n/nberkel/q2mm/calculate.py", line 104, in main
    data = collect_data(commands, inps, direc=opts.directory)
  File "/afs/crc.nd.edu/user/n/nberkel/q2mm/calculate.py", line 580, in collect_data
    hess.mass_weight_eigenvectors()
  File "/afs/crc.nd.edu/user/n/nberkel/q2mm/datatypes.py", line 723, in mass_weight_eigenvectors
    x, y = self.evecs.shape
AttributeError: 'NoneType' object has no attribute 'shape'

I've attached relevant files.
FC Error.zip

parameters.py substructure reading

parameters.py crashes if the substructure name contains a period (I used "Asymm. Suzuki OPT" as the name.

true/false properties in .mae file not working

Whether I use 0 or 1 for b_cs_first_match_only, or b_cs_both_enantiomers I can not seem to reduce the number of structures produced. I have even gone into merge.py and set first_match_only to True, and it would still match every instance of a pattern. I'm trying every version of schrodinger that we have available to see if that is a problem. I have also noticed that the pattern PD-C2=C2 will also match PD.C2=C2.

Problem with Hessian options

When using calculate.py, or programs that depend on it, together with options for using the Gaussian log file (-geigz or -mgeig), the program throws an error message about unrecognized options (all on the line), but the command still completes, and seems to do it correctly

Exit status 3 from MacroModel

I'm getting a new error, twice now in the past several days, that I'm not accustomed to.

simplex -- Calculating MM3[FORWARD ParamMM3[af]1890,2](None).
FATAL -139: Could not check out a license for mmlibs version 2.5.
Traceback (most recent call last):
File "loop.py", line 183, in
main(sys.argv[1:])
File "loop.py", line 177, in main
loop.run_loop_input(lines)
File "loop.py", line 106, in run_loop_input
self.ff = loop.opt_loop()
File "loop.py", line 40, in opt_loop
self.loop_lines, score=self.ff.score)
File "loop.py", line 142, in run_loop_input
self.ff = simp.run(r_data=self.ref_data)
File "/scratch365/ehansen3/r_rhod/opt.py", line 32, in wrapper
return func(_args, *_kwargs)
File "/scratch365/ehansen3/r_rhod/simplex.py", line 109, in run
data = calculate.main(self.args_ff)
File "/scratch365/ehansen3/r_rhod/calculate.py", line 102, in main
some_class.run(check_tokens=opts.check)
File "/scratch365/ehansen3/r_rhod/filetypes.py", line 1142, in run
os.path.splitext(self.name_com)[0]), shell=True)
File "/opt/crc/schrodinger/2014u1/mmshare-v25013/lib/Linux-x86_64/lib/python2.7/subprocess.py", lin
e 544, in check_output
raise CalledProcessError(retcode, cmd, output=output)
subprocess.CalledProcessError: Command 'bmin -WAIT X004.q2mm' returned non-zero exit status 3

I checked X004.q2mm.log, and it appears normal. However, this is probably a log file from a previous calculation, not the one that caused this error. Nonetheless, the log is attached.

I checked the number of licenses available manually ~ 1 hour after being notified of this error. 8/25 tokens were checked out. It would seem strange that ~ 1 hour ago, all 24 tokens were consumed such that this job would fail.

Anyone have ideas?

Also attached is the log file for the job.

Renamed all attached files as text files.

rhod_b094.o996333.txt

X004.q2mm.log.txt