paesanilab / mbx Goto Github PK

View Code? Open in Web Editor NEW

28.0 28.0 31.0 217.84 MB

MBX is an energy and force calculator for data-driven many-body simulations.

License: Other

C++ 99.64% Shell 0.01% C 0.24% Python 0.07% Makefile 0.02% M4 0.01%

mbx's People

Contributors

Stargazers

Watchers

mbx's Issues

Vectorization of polynomials for H2O 2b/3b [WIP]

Hey @dgasmith , I started thinking how to vectorize the polynomial files.
As you know, most of the part of those files is declaring constants that depend on other constants:

   ...
    const double t5627 = t1948*t73;
    const double t5628 = t1961*t77;
    const double t5629 = t1963*t126;
    const double t5630 = t1961*t132;
    const double t5631 = t1963*t136;
    ...

If we want to put several systems and make a loop, the "straight forward" idea is to make those variables an array, not constant, and leave in the form of:

   ...
    t5627[n] = t1948[n]*t73[n];
    t5628[n] = t1961[n]*t77[n];
    t5629[n] = t1963[n]*t126[n];
    t5630[n] = t1961[n]*t132[n];
    t5631[n] = t1963[n]*t136[n];
    ...

The problem I see with this is that there might be then thousands of arrays. What do you think?

Incorrect unit test for unittest-co2-monomer

With the implementation of the classic potential we can calculate the bonds and angle energies for co2. However, at the time of implementation the JSON file was not implemented to select a one body potential or a classical one. Instead, the sum of each is given as the energy. The file setup_co2_1.h has the following lines that will need to be changed:
double total_energy = 9.9150780942e+00 + classic_energy;
double total_energy_ttm = 9.9150780942e+00 + classic_energy;
Because there is no way to select which energy we want, sum sum of the classical and one body energy is returned. Hence, the classic energy was added to the total and ttm energy to make the test pass. In future, the classic_energy variable will need to be removed.

Add water monomer energy function [WIP]

In order to add the PS PEF, we need to:

Add the files of the patridge-Shwenke surface
Add functions that update the charges given the positions
Add functions that update the coordinates of the virtual sites given the positions
Test the energy for a water monomer

Implementation of binary tree to condense gradients

The blocking of the gradients in the system.cpp file can be expensive. I will proceed to find a binary tree and implement it. I might need to discuss this with you in through videoconference, @dgasmith .

CMake Behaving weird for the plugin

Hey @dgasmith ,
Once I am done with a couple of minor things to polish, I would appreciate if we can sit together and figure out a way to make CMake work properly. I open this issue to remind me to do it.
For now, the following things are not behaving well:

Need to find a way to decide which compiler to use (intel/clang/gnu)
Seems like CMake is not accepting to use the std=c++11

In principle, fixing this should make things work properly.
No reply needed, but we will have to work on this. I already tried everything google told me, but there is some key information that I am missing (and I am sure is too trivial).
Thanks in advance!

Add function to set charges [WIP]

Need to implement a function that sets charges once position of all atoms are set. Water has coordinate-dependent charges.

Problem with intel compilers and clang

Hey @dgasmith ,

I cannot figure out what is going wrong with the intel and clang compilers. There are 3 problems that I can see:

g++ compiles and runs, but there are some numerical differences (8th decimal place) that are slightly suspicious (I didn't have them before) because...
icpc is compiling, but there is an error with test number 5 (energy + gradients of a single water molecule). The compiled code gets right and runs well the energy without gradients, but with the gradients there is a segfault, and trying to debug with gdb is useless. It breaks at the when is trying to exit the for loop https://github.com/chemphys/clusters_ultimate/blob/master/src/bblock/system.cpp#L229
And I don't understand why. I didn't touch this part of the code and it was working before... The only thing that changed is that I had the g++4.9.4 installed in my machine, and the few problems we had (some linking problems between intel and g++ libraries) were fixed. Do you think is a compiler problem? I tried to compile it in the supercomputer but the versions of gcc and cmake are old.
clang is compiling in travis but gets seg faults in both tests 5 and 6 (energy with and wthout gradients).

Summarizing: g++ works, but with small numerical differences in the difference of the gradients, intel works for energies and clang fails for both energies with and without gradients with segfault.

I have been on google all afternoon and I run out of ideas. I tried different optimizations, gdb, valgrind... but I cannot locate why icpc is failing there.
When you have time, can you try to compile it in your machine with intel and run the script test 5 and 6 in HOME/tests/tests/?

Thanks!

Dispersion Interaction [WIP]

I will proceed to add the dispersion interactions to the code.

In classical forcefields, the dispersion interaction (and non bonded interactions in general) depends on the type of atom, and each pair of atom types has a defined set of parameters to be used in the non-bonded contribution. However, in our approach, each pair of monomers has defined C6 coefficients. This means that water with water will have a defined C6 for O-O, O-H and H-H, CO2 with water will have defined coefficients for C-OW, C-HW, O-OW, O-HW, and so on.

Also, some large monomers, with atom distances of 4 bonds (distance 1-5) such as N2O5 will also have intramolecular dispersion that needs to be taken into account.

The way that I think might be the best to implement this is the following:

Create a general function that given C6 and damping (d6), C8 and damping (d8), and coordinates of two atoms, returns the dispersion contribution of that pair.For now we only use the C6/d6, but it will be nice if at least you also have the option of add the C8 contribution just in case.
Create a function that returns a vector of C6 and d6 for a given dimer, and since all the atom pairs have to be evaluated, no need to return the atom pairs. However, this will imply that the order of the atoms of the molecule is in a preset order. For now, the user has to add it to the input directly, but later on we can try to make the code reorder it if it is necessary.
Create a function that returns the same information but for a single monomer. Here there is a need to find a way to store the excluded pairs and non-excluded pairs for a given monomer. This will also be useful for the electrostatics. I think it has to be done during the monomer set up.
Create a dunction that evaluates the dispersion for a chunk of monomers/dimers to allow vectorization.

@dgasmith , I have a couple questions:

Does this way seem reasonable?
To store the excluded pairs, is a vector of vectors a bad idea? Would it be better to create a structure rather than vector of vectors?

Thanks!

Implement Electrostatics [WIP]

The implementation of the electrostatics will be following the iterative method to obtain the dipoles. For now, no PBC. See PDF for information about the theory and implementation.
Electrostatics.pdf

Add convergence params update in electrostatics

diptol_, maxItDip_, dipole_method_ should be updated when updating the coordinates. Right now, once the system is initialized, this is not modified anymore.

Energy with 0 polarizability and 0 polarizability factor seems to be hanging

When 0pol and 0 polfac are used, the electrostatics energy seems to hang in the gamma function. That should be investigated and fixed.

Migration of tests to use python [WIP]

I will migrate the test script to be python scripts so we don't have numerical issues anymore. For now, I reduced the precision in the output of the tests, but It will be more robust to use Python scripts.

buckingham turned off

In the master-dev_switch_ttm_poly branch of the chemphys fork, in "/src/potential/buckingham/buckingham.cpp" at line 386, the Buckingham contribution is switched off if the TTM pairs are enforced (rep_energy_ = 0.0;).

Something odd with general box implementation

There are some weird issues with the general box implementation. We need to investigate and write proper tests.

Add Travis CI [WIP]

We need to add the continuous implementation to the repository.

Update electrostatic force calculation with new functions in helpme

We need to update the force calculation using the new functions that @andysim has implemented in helpme.h

2B and 3B parallelization

I started the parallelization of 2B and 3B. I will put here scaling tests of the different parts, and the global scaling.

VTune report and Performance optimization [WIP]

I will proceed to run VTune and see the bottlenecks of the computation.

Typo in alpha

There is a consistent typo across the json files and the system class. The json element alpha is currently being names aplha. Needs to be changed.

Migrate compilation to CMake [WIP]

Do this after issue #9 is closed.

Need to write unittest for charge gradients

There must be a unittest for the charge gradients. On the way, rewrite and understand well the charge gradients is crucial.
Look at SetCharges and chargederivativeforce from sys_tools.cpp, and look at ps.cpp` for reference.

Memory leak when compiling with GNU

There is a mem leak when compile the software using GNU g++ 4.9.2. It seems that it comes from the OMP, but I cannot track the origin. I will investigate this issue.

==16345== HEAP SUMMARY:
==16345==     in use at exit: 8 bytes in 1 blocks
==16345==   total heap usage: 13,428 allocs, 13,427 frees, 963,818 bytes allocated
==16345== 
==16345== 8 bytes in 1 blocks are still reachable in loss record 1 of 1
==16345==    at 0x4C29BE3: malloc (vg_replace_malloc.c:299)
==16345==    by 0x5448778: gomp_malloc (alloc.c:36)
==16345==    by 0x54518B7: gomp_init_num_threads (proc.c:90)
==16345==    by 0x5446E8C: initialize_env (env.c:1187)
==16345==    by 0x400F502: call_init (dl-init.c:82)
==16345==    by 0x400F502: _dl_init (dl-init.c:131)
==16345==    by 0x40011A9: ??? (in /usr/lib64/ld-2.17.so)
==16345==    by 0x1: ???
==16345==    by 0xFFEFFE64E: ???
==16345==    by 0xFFEFFE68F: ???
==16345== 
==16345== LEAK SUMMARY:
==16345==    definitely lost: 0 bytes in 0 blocks
==16345==    indirectly lost: 0 bytes in 0 blocks
==16345==      possibly lost: 0 bytes in 0 blocks
==16345==    still reachable: 8 bytes in 1 blocks
==16345==         suppressed: 0 bytes in 0 blocks
==16345== 
==16345== For counts of detected and suppressed errors, rerun with: -v
==16345== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Make constant variables passed by reference actually constant

Some functions use variables as constant, but it is not specified that they are constant (specially variables passed by reference). This needs to be fixed.

Library in the makefile of the i-pi plugin isn't linked correctly

Hello,

on my computer (Debian) the Makefile of the i-pi plugin isn't linked correctly.
Here is my proposed fix, which worked on my machine:

Replace:
LIBS = -lmbxlib -fopenmp -lfftw3
LIBDIR = -L$(MBX_HOME)/install/lib/static

With:
LIBS = -l:libmbx.a -fopenmp -lfftw3
LIBDIR = -L$(MBX_HOME)/install/lib

Greetings Louis

Structure and timeline [WIP]

Structure and Timeline for the development of the software

@dgasmith I will keep updating this in the upcoming days. I am also tagging @zonca, who collaborates with us at SDSC (San Diego Supercomputer Center), and is mainly helping with the obtainment of the potential energy functions, and he can keep track of what's going on.

Main goal to accomplish by 12/31/2017

This code should be able to efficiently calculate energy and gradients of a water - ion system, and be prepared to implement other molecules such as CO2, SO4(2-), ...
In Phase-I, the code will be parallelized using OMP parallelization, and work in the gas phase (non-periodic boundary conditions).

Code Structure

1. Read/write configurations
a) Decide an input format. Right now (20170812) is a format that allows to differentiate between molecules and monomers, but we should try to use a format that is already being used by the community. PDB would be an option, but I don't know if that allows to differentiate monomers from molecules. This part is not crucial since is the way we input the data. As long as the data is inserted in the program, we don't really care how we do it. Things needed from the user:

Monomer type and coordinates
Molecules and monomers that form that molecule

2. Identify monomers and set them up
a) Fill the system with the monomer properties. As it is today (20170812) the monomers are set up inside the system class initialization, and the information is coded inside that function. For now, does not matter, but we might want to have the data outside and just read the information we will use. The properties that we need from the monomers are:

Number of sites (real and virtual). In case the virtual site depends on the coordinates, we will need to calculate the positions. For now, water is the only molecule with virtual sites in the electrostatics (M-site).
Charges of each site. We must take into account that for water the charge is position dependent, and that will have to be recalculated every time the coordinates change.
Polarizabilities and polarizability factors

3. Domain decomposition and alignment. This must be done at each energy calculation.
a) Order Monomers by type, and modify properties of the system accordingly.
b) Domain decomposition

Neighborhood list. Use the monomer indexes.
Update and alignment functions
Data structures to perform this and not take into account double counting.

4. 1b energy calculation
a) Modification of the Partridge-Schwenke potential energy surface code to take advantadge of SIMD
b) Modification of the 1b potential polynomials to perform SIMD operations (for Phase-I we won't need this since we are going to work with ion-water potentials, and the ions have always a 1B energy of 0.
c) Implementation. Write the function that gets the 1B energy.

5. 2b energy calculation
a) Modification of MB-pol 2B polynomials to take advantadge of SIMD operations.
b) Modification of MB-nrg ion-water 2B polynomials to take advantadge of SIMD operations
c) Implementation. Write the function that gets the 2B energy.

6. 3b energy calculation
a) Modification of MB-pol 3B polynomials to take advantadge of SIMD operations.
b) Modification of MB-nrg ion-water 3B polynomials to take advantadge of SIMD operations
c) Implementation. Write the function that gets the 3B energy.

7. Electrostatic interaction
a) Rewrite electrostatic energy calculation subroutine optimized for SIMD and OMP parallelization.

8. Parallelization
a) 1-body energy. Divide and conquer. For each type of monomer, divide them in chunks of the same amount of data, and send them to calculate. As an example, 1024 water molecules in 16 cores: send 64 water molecule monomer calculation to each core.
b) 2-body energy. Each core will receive a chunk of data. Here is where the neighborhood list will play the role. Send each neighborhood to a different core.
c) 3-body energy. Use the neighborhood lists. I will need help to avoid double counting here..
d) Electrostatics

Charge charge interactions should be trivial to parallelize
Charge dipole interactions should also be trivial to parallelize
Induced dipoles is what might be a little bit more difficult, since we either have to do a cholesky decomposition (faster for small systems) or solve iteratively (faster for large systems)

9. Generation of logs and meaningful error messages
This will require to test all possible errors that the user can make, and make sure that the code understands why it is failing, and throws the corresponding error message. @zonca suggested using the BOOST log libraries. What do you think @dgasmith ?

Timeline

Here I will put the date and what should be done by then.

By 08/18/2017

Code Structure and Timeline defined.
Log library implemented (spdlog)
Data structure for domain decomposition decided ( @dgasmith , could you send me the references you mention?)

By 08/25/2017

Implement alignment and domain decomposition

By 09/08/2017

Modify Patridge-Shwenke PES to use SIMD.
Implement 1B energy for water

By 09/22/2017

Adapt 2B polynomials for water to use SIMD
Adapt 2B polynomials for ion-water to use SIMD
Make sure alignment and domain decomposition for 2B also works
Implement the 2B correction calculation in code

By 09/29/2017

Buffer week. Finish everything that is not complete from above.

By 10/06/2017

Implement dispersion energy calculation

Decide how to input data (database, inside the code...)
Implement the function to efficiently get dispersion
Add the dispersion energy to the 2B energy contribution

By 10/20/2017

Implement 3B polynomial and adapt them to use SIMD
Make sure domain decomposition works also for 3B

By 11/17/2017

Rewrite electrostatics function to use SIMD
Buffer time in case something else is missing

By 12/8/2017

Implement openMP paralellization for 1B, 2B, 3B, dispersion and electrostatics

By 12/15/2017

Benchmarks, test cases, timings and scalings

@dgasmith , sounds reasonable? Some things might take longer, some might take less time. I think this timeline is something realistic.

Write unittests for dipole functions

The dipole functions are currently not used or tested. It would be nice to write a test at least for water and make sure that it is working properly and as expected.
The comparison should be done with the values from DLPOLY

Domain decomposition implementation [WIP]

Proceeding to the implementation of an efficient neighbor list.

I have been looking at different algorithms used in the MD packages to keep track of the neighbors. I put here some of the most interesting ones:

link-cell algorithm (DLPOLY2) is useful for short cutoff distances in comparison with the cell size.
Brode-Ahlrichs scheme (DLPOLY2) is useful for long cutoff distances in comparison with the cell size.
Verlet lists

Since we are going to aim for big systems, and even if the box is small, I would go for the link cell algorithm, since it will also work for non-pbc. @dgasmith , do you have in mind any other algorithm that we can implement (something maybe more recent)? You mentioned some papers about this.

Update 20170821

We decide to go with KD-Tree to keep track of neighbors and perform the domain decomposition.
The KD-Tree will give us a breakdown of all the molecules in the system. The procedure to follow will be the following:

1-body energy

For the 1B energy, we will do it all at once, since we don't need to look for pairs or neighbors. We will send chunks of data to each core that will use SIMD instructions.

2-body energy

Here is where we will take advantage of the kd-tree.

Set a reasonable cutoff Rc that should work for all the dimers (since none of our polynomials right now has a cutoff larger than 9.5, setting it at 10 or 10.5 should be OK)
Get the dimers
We will get the dimers as we build the tree:

Add monomers[0] to tree (create tree)
For (i = 1, i < monomers.size(), i++) do 
  Find list of monomers in tree within a distance Rc
  Add those pairs to the appropiate dimer vector (h2o - h2o, h2o - cl ...) 
                      // Maybe we can have a vector of vectors, so it will be 
                      // easy to align
  Add monomer[i] to tree
Done

This should do the work for the 2B terms. But I am afraid that is not efficient, since kd-trees are not prepared to be modified. The only way that comes to my mind is the following:

Generate kd-tree with all data
For each monomer i (we know the index), perform a radial search in the tree within the cutoff Rc. The code will throw a vector with pairs of all the indexes j and distances <j,d>. Then, for each pair it throws, if j <= i, we discard that and we don't count it. But I have doubts about the efficiency of this... @dgasmith , what do you think?

3-body energy

I will update later this part, once we figure out how to do the 2b.

Update 20170824

The algorithm implemented is the following:

Generate kd-tree with all data
For each monomer i:

Perform radial search around i within the cutoff defined (default 10.0 Ang)
For all the matches j:
- If INDEX(j) > INDEX(i), add it to the dimer vector (this will take care of double counting)
- If the dimer is added, perform a radial search around j
- For all the matches k, add the corresponding trimer if INDEX(k) > INDEX(j)

This seems to do the work.

icpc fails to compile code when optimization is activated

Hey @zonca ,

This repo compiles well with clang, g++ and intel without optimizzations. However, I run into an error when I optimize the code with optimization (-O1, -O2 or -O3).

icpc: error #10106: Fatal error in /data/software/repo/intel/2017.0.098/compilers_and_libraries_2017.0.098/linux/bin/intel64/mcpcom, terminated by kill signal

Googling a little bit seems that is a memory error, but using the keyword -mcmodel=large also doesn't help.
I tried multi-file optimization (-ipo) and it still fails. Any idea?
@agoetz , @darcykimball , maybe you know how to fix this?

Thanks!

Bug in Domain decomposition

There is a bug in domain decomposition that only appears for a large system. I am investigating the issue.

Header clean up to reduce compilation time

Hey @dgasmith ,

As we have discussed, we have a problem with the compilation time of the package using g++. In our local machines and the supercomputers, this is not really a problem, since although it takes a little bit to compile, it ends compiling. However, in travis it takes too long. The two options you suggested are:

Try to reduce the number of header files. Sincerely, I didn't fully understand what do you mean with that, and I would really appreciate if you can have a look at this possible solution.
Convert the polynomial files to C. If option 1 doesn't help, I can rewrite the polynomial files to be C-like, and then compile them with gcc. However, I will need help to hook this in CMake, since I don't really know how to tell cmake to compile some files with gcc and some with g++ (unless it detects it automatically through the extension).

If you can have a look at Option 1 one of these days, I would really apreciate it. All the polynomial files are in the potential folder, under the folder name of 1b, 2b, and 3b. The polynomial files start always with poly-... All the polynomial files are used by the x1b-... x2b-... files, which contain the object that has the energy evaluation function.
Thanks!

PBC dispersion

Hey @dgasmith ,

I remember that when we talked about periodic boundary conditions, you mentioned that usually, to ensure integration, one sets the dispersion at the cutoff to 0, and shifts the whole potential.
We can talk about the best way to do this on Monday, if that is OK for you. Here is Spring Break, and I don't know if you guys have this too, but I will be at school for sure.

As an update for PBC, I have the 1B and 2B (poly + dispersion) working, but probably in not the most efficient way. I will implement the 3B, and then tackle the electrostatics.

MB dynamic polymorphism

I think I had asked Marc about this, but I'll explain here as a PSA:

Do the many-body energy functions need to depend on the dynamic types (runtime-type) of the monomer "units"? There's a polymorphic base class, Monomer, with a virtual member function Calc1BEnergy, so the built-in dispatch system works:

Butyl butylFragment{...};
Hydroxyl hydroxylFragment{...};

Monomer& m1 = butylFragment; // Dynamic type is Butyl 
Monomer& m2 = hydroxyFragment;  // Dynamic type is Hydroxyl
butylFragment.Calc1BEnergy(); // Calls Butyl::Calc1BEnergy
hydroxylFragment.Calc1BEnergy(); // Calls Hydroxyl::Calc1BEnergy

Fine. We can get the right 1B energies just by manipulating polymorphic references.

What about 2B energies? Maybe we'd like it to look like:

Calc2BEnergy(butylFragment, hydroxyFragment); // The arguments' static types are that of Monomer&

And expect it do the right thing. For this, Calc2BEnergy would have to inspect the arguments' dynamic types in order to know which 2B function to pick; specifically, the one that describes a butyl interacting with a hydroxy. C++ does not support this type of dispatch natively; they're usually called "multimethods", i.e. methods that may depend on the dynamic type of more than one object.

A classical way to implement multimethods is the Visitor pattern, but IMO it's kinda convoluted for this purpose, and not natural: there is a natural (functional) symmetry arising from physical fact that the code is implementing energy functions: no fragment (modeled by objects in C++) is special. Also, Visitor, AFAIK, doesn't scale well (WRT to adding more types). Example implementations attest to this (IMO).

There are many ways to address the issue of multimethods, but basically I wanted to know if they were necessary in the first place, for this project.

Add monomer identification from site index

I would like a function or vector to be added which would let the user identify the monomer name and index to which an atomic/site index belongs to. e.g.

for example, for a system containing 1h2o and 1 co2 (in that order) we could have a list with the monomer id and index to which each site belongs:

mon_id_list=[h2o, h2o, h2o, h2o, co2, co2, co2]
mon_index_list=[0, 0, 0, 0, 1, 1, 1]

So that if the user wanted to know which monomer site 4 belonged to they could just use mon_id_list[3] and mon_index_list[3] to get that value.

Vectorization of water monomer energy [WIP]

Now the reference water monomer energy calculation is implemented, we will proceed to vectorize this calculation.

@dgasmith , will we use autovectorization from intel compilers, or we set everything manually?

Set/Get coordinates and get gradients from system needs modification

The get/set coordinates from the system currently implemented return the coordinates AS THEY ARE. This means that will return the reordered coordinates, and add them in the order of the input, messing with the order of the monomers inside the code.

This needs to be addressed.

Set coordinates should take the coordinates in the original input order, and set them in the system in the actual system order. This can be done by using the original order variable.
Get coordinates should take the coordinates in the system order, and send them in the original input order.
Gradients should be returned in the original input order, not in the actual order.

I will work on this and fix it.

CRITICAL BUG in energy2b.cpp

Energy2b.cpp has a problem. When the pair is not found in that function, it simply returns 0. The problem appears when the monomers m1 and m2 are swapped, which causes the gradients to not be reswapped.

return 0.0;

should be changed to

energy = 0.0;

This problem only appears when we only have TTM potential and we dont have MB-nrg potential implemented. @chemphys will fix this in an upcoming release.

OpenMM plugin for clusters MB-pol/MB-nrg [WIP]

Opening this issue to keep track of the OpenMM plugin implementation.

@dgasmith , I don't know which is the best practice for this, so any input and advice you can give to me regarding on how to structure it would be appreciated.

My idea is to create a new folder in (src?) with the openMM plugin. Do you think that is OK, or is better to create a folder in the home directory of the code called "clusters_ultimate" and another called "OpenMM_plugin"?

The idea I have in mind of this repo is that we will have a library that will contain the black box (input XYZ, output gradients and energy) that will be linked to different codes (for now only openMM, but maybe later we want to put it somewhere else). Which is the best way to structure this?

Thanks!

correlation.dat 2nd column incorrect for fit-2b-ttm

for fit-2b-ttm, the predicted energy from the ttm model(2nd column of correlation.dat) is printed incorrectly. when printing to corrleation.dat, the dispersion is uses the wrong parameters, change final_l to final_nl.

Improve generalization in energy calls

As the code is today, the monomer information, polynomial coefficients, and so on, are hardcoded inside the classes and the code. We should find a way to generalize that, so adding a new monomer does not imply changing the source code and recompile.

Add two-body PEFs for water [WIP]

I will start to implement the 2B potentials for water (MB-pol) in the software.

Making Clusters_Ultimate public

@dgasmith , I will proceed to make this repo public. Before that, I want to ask you if there is anything I should do before (like protecting the master branch, or something like that). @zonca , @agoetz , feel free to add any comment!

Thanks!

Tang-Toennies function modification [WIP]

The Tang-Toennies damping function needs to be modified. Not a priority, but will be done.

Periodic Boundary Conditions Domain Decomposition

Hey @dgasmith , I think I will need a little bit of advice here. I have been trying to adapt the kd-tree to work on PBC, but it is way more complicated than I expected. I am looking for methods to cluster the molecules fast and in PBC, but I cannot find anything reliable.
Any idea of which kind of data structure would be good for this?

There are a few in Python, but I don't think if it would be worth to translate them to C++

Add Function to set positions of the virtual sites [WIP]

Will need a function, previous to start implementing electrostatics, that sets the positions of the virtual sites (for now, only water has M sites).

Travis CI compilation time

Hey @dgasmith ,

I updated the travis script and the tests. Now I use python to compare the files.
In any case, there is something I don't understand. To compile the code without optimizations with g++ in my machine takes less than a minute. However, travis seems to need way more time. Any idea why?

Thanks!

Timings Info

Hey @dgasmith ,

I made a few modifications to the code:

AddClusters has been modified to only take into account the first N monomers for i in <i,j> pairs, and <i,j,k> trimers. These doesn't speed down the code and allows to deal with as many molecules as we want. N, for now, is fixed in the initialization of the system to 4096, which is the maximum number I saw that was not affecting the speed. If N = 8192, the speed starts decreasing.
I added maxNDimEval_ and maxNDimEval_ variables in the system. This is the maximum number of dimers/trimers that will be evaluated at once. For now is set to 1024 in the initialization.
The monomer energy calculation had problems for systems larger than 8192 water molecules, so I also added maxNMonEval_ and set it to 1024. The reason is that water is the only monomer that will have an "special" energy function, since all the other monomers will be evaluated through polynomials. I keep 1024 for consistency with dimers and trimers.

These modifications seem to have a very positive effect on the timings. I have tested up to 64K molecules, and the timing seems to increase linearly with N for 1B, 2B and 3B. I will collect some data and make some figures to show the timings for each component as a function of size.
The largest system I plan to try for now is 133K water molecules, which I think is a nice goal for now.

All these changes are in a local branch in my machine, that I will merge with the master and push it to the pull request if you think that's OK.

Just a note, all these results are obtain in my local machine, compiled with intel with optimizations. Without optimizations everything is slowed down about an order of magnitude. I am excited to see if we can make the electrostatics as fast as the 3B (for now, the bottleneck by an order of magnitude).

I will keep you posted. See you!

Energy calculations with water and cesium

There is a discrepancy in energy calculations with H2O and Cs between clusters and clusters_ultimate.
However, output is correct when using a different alkali metal such as Na

Test case used: Cesium test case.zip
Clusters output:

E water: -0.0465113
E water-ion: 14.4941
E ion-ion: 0
E electrostatic: -42.9099
Binding_Energy= -28.4624
Frame 0: E_nograd= -28.4624
E water: -0.0465113
E_disp wat[0] - ion = -0.000568402
E_disp wat[1] - ion = -0.00318064
E_disp wat[2] - ion = -1.47787
E_disp wat[3] - ion = -1.45717
E_disp wat[4] - ion = -0.0464568
E 2body water ion =  13.7215
E 2body water ion dispersion =  -2.98524
E 3body water ion =  0
E water-ion: 10.7362
E ion-ion: 0
E electrostatic: -42.9099
Binding_Energy= -32.2202
Cs   Analit: 0.428654 Numerical: 0.428677 Diff: 2.28693e-05
Cs   Analit: 5.03951 Numerical: 5.03955 Diff: 3.9877e-05
Cs   Analit: -9.00079 Numerical: -9.00075 Diff: 3.53496e-05
O   Analit: 0.39095 Numerical: 0.392244 Diff: 0.00129382
O   Analit: -0.326983 Numerical: -0.325939 Diff: 0.00104354
O   Analit: 0.170085 Numerical: 0.170316 Diff: 0.000230181
H   Analit: -0.0500947 Numerical: -0.0499067 Diff: 0.000188028
H   Analit: 0.375746 Numerical: 0.3767 Diff: 0.000953376
H   Analit: -0.135325 Numerical: -0.135137 Diff: 0.000188332
H   Analit: -0.45655 Numerical: -0.455379 Diff: 0.00117076
H   Analit: -0.143246 Numerical: -0.143148 Diff: 9.75213e-05
H   Analit: -0.0226415 Numerical: -0.0225845 Diff: 5.70395e-05
O   Analit: 0.0257439 Numerical: 0.027029 Diff: 0.00128506
O   Analit: -0.12931 Numerical: -0.128042 Diff: 0.00126769
O   Analit: -0.0193054 Numerical: -0.0192914 Diff: 1.39854e-05
H   Analit: 0.148143 Numerical: 0.148289 Diff: 0.000146008
H   Analit: -0.138494 Numerical: -0.137316 Diff: 0.00117776
H   Analit: 0.0125376 Numerical: 0.0125408 Diff: 3.2205e-06
H   Analit: 0.0256808 Numerical: 0.0268692 Diff: 0.00118839
H   Analit: -0.010717 Numerical: -0.010587 Diff: 0.00013
H   Analit: 0.0162808 Numerical: 0.01629 Diff: 9.13447e-06
O   Analit: -3.29213 Numerical: -3.29072 Diff: 0.00140965
O   Analit: -5.85683 Numerical: -5.85568 Diff: 0.00114763
O   Analit: 1.12008 Numerical: 1.12013 Diff: 5.02506e-05
H   Analit: -0.541738 Numerical: -0.540562 Diff: 0.00117689
H   Analit: -0.486096 Numerical: -0.485989 Diff: 0.000106987
H   Analit: -0.0394601 Numerical: -0.0394418 Diff: 1.83287e-05
H   Analit: -0.225402 Numerical: -0.225069 Diff: 0.000332675
H   Analit: -0.773361 Numerical: -0.772376 Diff: 0.000984915
H   Analit: 0.135635 Numerical: 0.135697 Diff: 6.18113e-05
O   Analit: 3.10853 Numerical: 3.10973 Diff: 0.00119704
O   Analit: 1.85154 Numerical: 1.85211 Diff: 0.000563647
O   Analit: 5.40948 Numerical: 5.41033 Diff: 0.00085098
H   Analit: 0.116344 Numerical: 0.116562 Diff: 0.000218271
H   Analit: 0.334541 Numerical: 0.335006 Diff: 0.000465098
H   Analit: 0.464371 Numerical: 0.46501 Diff: 0.000639074
H   Analit: 0.537594 Numerical: 0.538633 Diff: 0.0010396
H   Analit: 0.237585 Numerical: 0.237696 Diff: 0.000111207
H   Analit: 0.444575 Numerical: 0.444736 Diff: 0.000160372
O   Analit: -0.0819296 Numerical: -0.0809258 Diff: 0.00100379
O   Analit: 0.00579098 Numerical: 0.0063284 Diff: 0.000537421
O   Analit: 0.429198 Numerical: 0.430221 Diff: 0.00102294
H   Analit: -0.0329703 Numerical: -0.0325514 Diff: 0.000418926
H   Analit: -0.0200361 Numerical: -0.0197199 Diff: 0.000316202
H   Analit: 0.472464 Numerical: 0.473054 Diff: 0.000589926
H   Analit: -0.11361 Numerical: -0.112937 Diff: 0.000673589
H   Analit: 0.031096 Numerical: 0.0313637 Diff: 0.00026772
H   Analit: 0.538543 Numerical: 0.538927 Diff: 0.000384472

Clusters_ultimate output:

1B = 0.0207602
2B = -3.05251
3B = 0
Elec = -42.9099
1B = 0.0207602
2B = -3.05251
3B = 0
Elec = -42.9099
TEST test_005_energy FAILED
OUTPUT...
Energies without gradients:
disp = -3.04143    2b = -0.0110763
system[....0]=         -4.59417e+01    kcal/mol
Energies with gradients:
disp = -3.04143e+00    2b = -1.10763e-02
system[....0]=         -4.59417e+01    kcal/mol

Atom             GradientX           GradientY           GradientZ
Cs              -2.705e-01          -9.459e+00           1.546e+01
O                3.922e-01          -3.259e-01           1.703e-01
H               -4.991e-02           3.767e-01          -1.351e-01
H               -4.554e-01          -1.431e-01          -2.258e-02
O                2.703e-02          -1.280e-01          -1.929e-02
H                1.483e-01          -1.373e-01           1.254e-02
H                2.687e-02          -1.059e-02           1.629e-02
O                7.206e+00           1.268e+01          -2.387e+00
H                4.473e-01           1.022e+00          -3.194e-01
H                5.078e-01           8.404e-01          -1.642e-01
O               -6.763e+00          -4.274e+00          -1.205e+01
H               -5.959e-01          -2.569e-01          -1.022e+00
H               -3.960e-01          -2.047e-01          -9.767e-01
O               -8.622e-02           1.472e-02           6.223e-01
H                4.232e-02          -7.455e-02           3.825e-01
H               -1.806e-01           7.981e-02           4.364e-01

Two pointers to same value

Hey @zonca ,

I have a question. The software I am writting have monomer, molecule and system classes. Right now, the coordinates are stored in each one of the monomers. To accelerate things in the electrostatics, I would like to have an array of coordinates also in the system class. The coordinates of the monomer are stored in the heap, in a shared_ptr. Let's say I have the following peace of code:

/* In monomer */
shared_ptr<double> xyz = std::shared_ptr<double> (new double[n_sites * 3],
[]( double *p ) { delete[] p; });
std::copy(coords, coords + n_real_sites * 3, xyz.get() );
/* end monomer */

After this, in system, I can have:
/* In system */
shared_ptr<double> xyz = std::shared_ptr<double> (new double[n_total_sites * 3], []( double *p ) { delete[] p; });

Can I do the following? I want to make the pointer xyz in the system point to all the same values as the pointers in xyz in the monomers, so if I update the values to which the pointers in monomer point to, I am also changing the values to which the pointers in system are pointing to. Simplified, would be:

double a[2] = {1.0, 2.0} ; \\ monomer 1 xyz
double b[2] = {3.0, 4.0}; \\ monomer 2 xyz
double c[4]; \\ system xyz
c = a;
c + 1 = a + 1;
c + 2 = b;
c + 3 = b + 1;

If now I do
b[1] = 5.0;

Will c[3] be 5.0?
And in this case, will it work with pointers in the heap? Is there any better way to do it?

Thanks!

Installation error on MacOS: "Could NOT find FFTW" when it is installed.

Dear Sir,

I failed to install MBX on my MacOS 12.2.1 even after FFTW was installed by brew install fftw.

The output of ls -l /usr/local/include/fftw3* :

lrwxr-xr-x  1 mac  staff  43 Jan 16 21:30 /usr/local/include/fftw3-mpi.f03 -> ../Cellar/fftw/3.3.10/include/fftw3-mpi.f03
lrwxr-xr-x  1 mac  staff  41 Jan 16 21:30 /usr/local/include/fftw3-mpi.h -> ../Cellar/fftw/3.3.10/include/fftw3-mpi.h
lrwxr-xr-x  1 mac  staff  37 Jan 16 21:30 /usr/local/include/fftw3.f -> ../Cellar/fftw/3.3.10/include/fftw3.f
lrwxr-xr-x  1 mac  staff  39 Jan 16 21:30 /usr/local/include/fftw3.f03 -> ../Cellar/fftw/3.3.10/include/fftw3.f03
lrwxr-xr-x  1 mac  staff  37 Jan 16 21:30 /usr/local/include/fftw3.h -> ../Cellar/fftw/3.3.10/include/fftw3.h
lrwxr-xr-x  1 mac  staff  44 Jan 16 21:30 /usr/local/include/fftw3l-mpi.f03 -> ../Cellar/fftw/3.3.10/include/fftw3l-mpi.f03
lrwxr-xr-x  1 mac  staff  40 Jan 16 21:30 /usr/local/include/fftw3l.f03 -> ../Cellar/fftw/3.3.10/include/fftw3l.f03
lrwxr-xr-x  1 mac  staff  40 Jan 16 21:30 /usr/local/include/fftw3q.f03 -> ../Cellar/fftw/3.3.10/include/fftw3q.f03

These are the steps I did:

Step 1:

cmake -DCMAKE_BUILD_TYPE=Debug -DCMAKE_CXX_FLAGS=" -fPIC -O2 -Wall" -DCMAKE_CXX_COMPILER=g++ -DCMAKE_C_COMPILER=gcc -DUSE_OPENMP:BOOL=TRUE -H. -Bbuild

Output 1:

-- Setting option ENABLE_XHOST: ON
-- Performing Test CMAKE_C_FLAGS [-xHost] - Failed
-- Performing Test CMAKE_C_FLAGS [-march=native] - Success, Appending
-- Performing Test CMAKE_CXX_FLAGS [-xHost] - Failed
-- Performing Test CMAKE_CXX_FLAGS [-march=native] - Success, Appending
-- Setting option CMAKE_BUILD_TYPE: Debug
-- Setting option CMAKE_INSTALL_LIBDIR: lib
-- Setting option CMAKE_INSTALL_OBJDIR: obj
-- Setting option CMAKE_COMPILE_TESTS: TRUE
-- Setting option CMAKE_INSTALL_UNITTESTS: bin/unittests
-- Setting option CMAKE_INSTALL_INCLUDEDIR: include
-- Setting option PYMOD_INSTALL_LIBDIR: /
-- Setting option ENABLE_GENERIC: OFF
-- Setting option MBX_CXX_STANDARD: 11
-- MBX install: /Users/mac/.jackprogram/MBX/install
-- Configuring done
-- Generating done
-- Build files have been written to: /Users/mac/.jackprogram/MBX/build

Then change to the directory by cd build/.
Step 2:

make

Output 2:

[ 12%] Creating directories for 'MBX-core'
[ 25%] No download step for 'MBX-core'
[ 37%] No update step for 'MBX-core'
[ 50%] No patch step for 'MBX-core'
[ 62%] Performing configure step for 'MBX-core'
loading initial cache file /Users/mac/.jackprogram/MBX/build/MBX-core-prefix/tmp/MBX-core-cache-Debug.cmake
CMake Warning (dev) in CMakeLists.txt:
  No project() command is present.  The top-level CMakeLists.txt file must
  contain a literal, direct call to the project() command.  Add a line of
  code such as

    project(ProjectName)

  near the top of the file, but after cmake_minimum_required().

  CMake is pretending there is a "project(Project)" command on the first
  line.
This warning is for project developers.  Use -Wno-dev to suppress it.

-- Single precision FFTW not found
-- Double precision FFTW not found
-- Long double precision FFTW not found
CMake Error at /opt/local/share/cmake-3.22/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
  Could NOT find FFTW (missing: FFTW_INCLUDES FFTW_LIBRARIES)
Call Stack (most recent call first):
  /opt/local/share/cmake-3.22/Modules/FindPackageHandleStandardArgs.cmake:594 (_FPHSA_FAILURE_MESSAGE)
  /Users/mac/.jackprogram/MBX/cmake/FindFFTW.cmake:158 (find_package_handle_standard_args)
  CMakeLists.txt:5 (find_package)


-- Configuring incomplete, errors occurred!
See also "/Users/mac/.jackprogram/MBX/build/MBX-core-prefix/src/MBX-core-build/CMakeFiles/CMakeOutput.log".
See also "/Users/mac/.jackprogram/MBX/build/MBX-core-prefix/src/MBX-core-build/CMakeFiles/CMakeError.log".
make[2]: *** [MBX-core-prefix/src/MBX-core-stamp/MBX-core-configure] Error 1
make[1]: *** [CMakeFiles/MBX-core.dir/all] Error 2
make: *** [all] Error 2

CMakeOutput.log and CMakeError.log are in following links:
https://cdn.jsdelivr.net/gh/HuangJiaLian/DataBase0@master/uPic/2022_03_04_23_CMakeOutput.log
https://cdn.jsdelivr.net/gh/HuangJiaLian/DataBase0@master/uPic/2022_03_04_23_CMakeError.log

Do I need to change the makefile? I am not sure what should I do to solve this problem. Thank you very much!

@chemphys

Add cutoff for electrostatics.

A cutoff for electrostatics should be added to the code. First approximation should use the minimum image convention, and later on changed to a more efficient algorithm.

paesanilab / mbx Goto Github PK

mbx's People

Contributors

Stargazers

Watchers

Forkers

mbx's Issues

Structure and Timeline for the development of the software

Main goal to accomplish by 12/31/2017

Code Structure

Timeline

By 08/18/2017

By 08/25/2017

By 09/08/2017

By 09/22/2017

By 09/29/2017

By 10/06/2017

By 10/20/2017

By 11/17/2017

By 12/8/2017

By 12/15/2017

Proceeding to the implementation of an efficient neighbor list.

Update 20170821

1-body energy

2-body energy

3-body energy

Update 20170824

Recommend Projects

Recommend Topics

Recommend Org