stefanch / sgdml

sGDML - Reference implementation of the Symmetric Gradient Domain Machine Learning model

Home Page: http://www.sgdml.org

License: MIT License

Python 100.00%

machine-learning molecular-force-fields molecular-dynamics gaussian-process quantum-chemistry

sgdml's Introduction

Symmetric Gradient Domain Machine Learning (sGDML)

For more details visit: sgdml.org
Documentation can be found here: docs.sgdml.org

Requirements:

Python 3.7+
PyTorch (>=1.8)
NumPy (>=1.19)
SciPy (>=1.1)

Optional:

ASE (>=3.16.2) (to run atomistic simulations)

Getting started

Stable release

Most systems come with pip, Python's default package manager, preinstalled. Install sgdml by simply running:

$ pip install sgdml

The sgdml command-line interface and the corresponding Python API can now be used from anywhere on the system.

Development version

(1) Clone the repository

$ git clone https://github.com/stefanch/sGDML.git
$ cd sGDML

...or update your existing local copy with

$ git pull origin master

(2) Install

$ pip install -e .

Using the --user flag, you can tell pip to install the package into the current user's home directory instead of system-wide. This option might require you to update your system's PATH variable accordingly.

Optional dependencies

Some functionality of this package relies on third-party libraries that are not installed by default. These optional dependencies (or "package extras") are specified during installation using the "square bracket syntax":

$ pip install sgdml[<optional1>]

Atomic Simulation Environment (ASE)

If you are interested in interfacing with ASE to perform atomistic simulations (see here for examples), use the ase keyword:

$ pip install sgdml[ase]

Reconstruct your first force field

Download one of the example datasets:

$ sgdml-get dataset ethanol_dft

Train a force field model:

$ sgdml all ethanol_dft.npz 200 1000 5000

Query a force field

import numpy as np
from sgdml.predict import GDMLPredict
from sgdml.utils import io

r,_ = io.read_xyz('geometries/ethanol.xyz') # 9 atoms
print(r.shape) # (1,27)

model = np.load('models/ethanol.npz')
gdml = GDMLPredict(model)
e,f = gdml.predict(r)
print(e.shape) # (1,)
print(f.shape) # (1,27)
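The query example passes coordinates as a 2-D array with one row per geometry and 3 × n_atoms columns, which is why the 9 ethanol atoms give shape (1, 27). A minimal NumPy sketch of that flattening, assuming coordinates stored as an (n_geometries, n_atoms, 3) array like the dataset R entry used in the uracil script further below (the zero-filled array is a placeholder, not real data):

```python
import numpy as np

# Placeholder coordinates: 5 geometries of a 9-atom molecule, shape (n_geometries, n_atoms, 3).
R = np.zeros((5, 9, 3))

# Flatten each geometry into one row of 3 * n_atoms Cartesian components.
r = R.reshape(len(R), -1)
print(r.shape)  # (5, 27)

# Keep an explicit batch axis when querying a single geometry.
r_single = R[0].reshape(1, -1)
print(r_single.shape)  # (1, 27)
```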

Authors

Stefan Chmiela
Jan Hermann

We appreciate and welcome contributions and would like to thank the following people for participating in this project:

Huziel Sauceda
Igor Poltavsky
Luis Gálvez
Danny Panknin
Grégory Fonseca
Anton Charkin-Gorbulin

References

[1] Chmiela, S., Tkatchenko, A., Sauceda, H. E., Poltavsky, I., Schütt, K. T., Müller, K.-R., Machine Learning of Accurate Energy-conserving Molecular Force Fields. Science Advances, 3(5), e1603015 (2017) 10.1126/sciadv.1603015
[2] Chmiela, S., Sauceda, H. E., Müller, K.-R., Tkatchenko, A., Towards Exact Molecular Dynamics Simulations with Machine-Learned Force Fields. Nature Communications, 9(1), 3887 (2018) 10.1038/s41467-018-06169-2
[3] Chmiela, S., Sauceda, H. E., Poltavsky, I., Müller, K.-R., Tkatchenko, A., sGDML: Constructing Accurate and Data Efficient Molecular Force Fields Using Machine Learning. Computer Physics Communications, 240, 38-45 (2019) 10.1016/j.cpc.2019.02.007
[4] Chmiela, S., Vassilev-Galindo, V., Unke, O. T., Kabylda, A., Sauceda, H. E., Tkatchenko, A., Müller, K.-R., Accurate Global Machine Learning Force Fields for Molecules With Hundreds of Atoms. Science Advances, 9(2), eadf0873 (2023) 10.1126/sciadv.adf0873

sgdml's Issues

Function gdml_train.train(task) fails in the newest sgdml version 0.4.10

Hi, it looks like gdml_train.train() raises a TypeError: the first argument must be callable

(It works well with the previous version, sgdml 0.4.4.)
The error only occurs with the newest version, 0.4.10, when we run the demo code below:

import sys
import numpy as np
from sgdml.train import GDMLTrain

dataset = np.load('d_ethanol.npz')
n_train = 200

gdml_train = GDMLTrain()
task = gdml_train.create_task(dataset, n_train,
valid_dataset=dataset, n_valid=1000,
sig=10, lam=1e-15)

model = gdml_train.train(task)

File "/Users/sgdml/train.py", line 812, in train
  callback, disp_str='Generating descriptors and their Jacobians'
TypeError: the first argument must be callable
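The message "the first argument must be callable" is what Python's functools.partial raises when its first argument is not callable, for example None. The traceback fragment suggests that the failing line wraps callback in partial. One plausible reading, which is an assumption the issue does not confirm, is that train() was called without a callback, so callback was None. A minimal sketch that reproduces the message and guards against it, using the hypothetical names report and make_progress:

```python
from functools import partial


def report(msg, disp_str=''):
    """Hypothetical progress callback."""
    return f'{disp_str}: {msg}'


def make_progress(callback=None, disp_str=''):
    """Hypothetical helper: fall back to a no-op so partial() always gets a callable."""
    if callback is None:
        callback = lambda *args, **kwargs: None
    return partial(callback, disp_str=disp_str)


# Reproduces the error message from the issue.
try:
    partial(None, disp_str='Generating descriptors and their Jacobians')
except TypeError as err:
    print(err)  # the first argument must be callable

progress = make_progress(report, 'Generating descriptors and their Jacobians')
print(progress('done'))  # Generating descriptors and their Jacobians: done
print(make_progress()('done'))  # None
```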

multiprocessing hangs in iterative solver

For some reason, the second time (?!) the code hits this line:

https://github.com/stefanch/sGDML/blob/master/sgdml/utils/desc.py#L345

During sgdml all, multiprocessing's imap blocks execution, and Ctrl+C gives this backtrace:

^C[CRIT] Traceback (most recent call last):
         File "/usr/lib/python3.9/multiprocessing/pool.py", line 853, in next
           item = self._items.popleft()
       IndexError: pop from an empty deque

       During handling of the above exception, another exception occurred:

       Traceback (most recent call last):
         File "/home/pie/Venvs/sgdmlenv/lib/python3.9/site-packages/sgdml/cli.py", line 1083, in train
           model = gdml_train.train(
         File "/home/pie/Venvs/sgdmlenv/lib/python3.9/site-packages/sgdml/train.py", line 927, in train
           R_desc, R_d_desc = desc.from_R(
         File "/home/pie/Venvs/sgdmlenv/lib/python3.9/site-packages/sgdml/utils/desc.py", line 345, in from_R
           map_func(partial(_from_r, lat_and_inv=lat_and_inv), R)
         File "/usr/lib/python3.9/multiprocessing/pool.py", line 858, in next
           self._cond.wait(timeout)
         File "/usr/lib/python3.9/threading.py", line 312, in wait
           waiter.acquire()
       KeyboardInterrupt

Disabling parallelism in this function is an acceptable workaround (little time is spent there).

Using numpy compiled with MKL support and 12 cores.

RuntimeError: expected scalar type Double but found Char

The error shows up with v1.0.0 and torch 1.13.1 with CUDA.

To fix it, I had to change line 549 of sGDML/sgdml/torchtools.py (commit 8af877a):

'agg_mat', torch.zeros((self.n_atoms, self.dim_d), dtype=torch.int8)

to dtype=torch.double.

Reproduce the interatomic distance distribution

Hi, very nice work. I'm a student who has recently started learning about machine learning force fields. I want to reproduce the interatomic distance distribution figures shown in the paper, but I don't know how to plot this quantity. Is there a script for this in the repo? Thank you very much.
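The issue does not say which script produced the figures in the paper. As a rough starting point, here is a minimal NumPy sketch of an interatomic distance histogram. It assumes coordinates in an array of shape (n_geometries, n_atoms, 3), like the dataset R entry used elsewhere on this page. The random coordinates are placeholders, and plotting (e.g. with Matplotlib) is left out.

```python
import numpy as np


def pairwise_distances(R):
    """All unique interatomic distances for each geometry.

    R: array of shape (n_geometries, n_atoms, 3).
    Returns an array of shape (n_geometries, n_atoms * (n_atoms - 1) // 2).
    """
    diffs = R[:, :, None, :] - R[:, None, :, :]  # (M, N, N, 3) displacement vectors
    dists = np.linalg.norm(diffs, axis=-1)  # (M, N, N) distance matrices
    i, j = np.triu_indices(R.shape[1], k=1)  # upper triangle, diagonal excluded
    return dists[:, i, j]


rng = np.random.default_rng(0)
R = rng.normal(size=(100, 9, 3))  # placeholder coordinates for a 9-atom molecule

d = pairwise_distances(R)
counts, bin_edges = np.histogram(d.ravel(), bins=50, density=True)
print(d.shape)  # (100, 36)
```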

Installation succeeds, but running fails

Hi everyone,
I have successfully installed sgdml on my Windows 10 machine, but when I run it I get the following error message:

C:\Windows\system32>sgdml all datasets/npz/ethanol.npz 200 1000 5000
Traceback (most recent call last):
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3312.0_x64__qbz5n2kfra8p0\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3312.0_x64__qbz5n2kfra8p0\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\David\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\Scripts\sgdml.exe\__main__.py", line 4, in <module>
  File "C:\Users\David\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\sgdml\cli.py", line 67, in <module>
    from .predict import GDMLPredict
  File "C:\Users\David\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\sgdml\predict.py", line 36, in <module>
    Pool = mp.get_context('fork').Pool
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3312.0_x64__qbz5n2kfra8p0\lib\multiprocessing\context.py", line 239, in get_context
    return super().get_context(method)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3312.0_x64__qbz5n2kfra8p0\lib\multiprocessing\context.py", line 193, in get_context
    raise ValueError('cannot find context for %r' % method) from None
ValueError: cannot find context for 'fork'

How can I fix it? Help is greatly appreciated ...
Thank you very much, David.
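The traceback shows predict.py requesting the 'fork' start method, which Windows does not provide because it only supports 'spawn'. Below is a minimal sketch of choosing a start method that exists on the current platform. The preference order is an assumption, and this is not necessarily the fix adopted upstream. Note that 'spawn' also requires picklable, module-level worker functions.

```python
import multiprocessing as mp


def pick_start_method(preferred=('fork', 'spawn')):
    """Return the first preferred start method available on this platform."""
    available = mp.get_all_start_methods()
    for method in preferred:
        if method in available:
            return method
    return available[0]  # the platform default comes first in the list


# On Windows this selects 'spawn'; on Linux it selects 'fork'.
Pool = mp.get_context(pick_start_method()).Pool
```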

Issue with torch

Hi there,

Totally not an expert, but to make the code work I had to change line 890 of sGDML/sgdml/torchtools.py (commit 124de3d) from

c = lat_inv.mm(diffs.t())

to

c = lat_inv.mm(diffs.t().double())

error with batch prediction in spk.model.NeuralNetworkPotential

When 'trn.AddOffsets(MD17.energy, add_mean=True, add_atomrefs=False)' is used in 'spk.model.NeuralNetworkPotential', the batch predictions only have one value in 'energy'. If this term is not used, it works correctly.

# example script from the MD17 tutorial
import os

import torch
import schnetpack as spk
import schnetpack.transform as trn
from ase import Atoms

# forcetut and ethanol_data are assumed to be defined earlier in the tutorial

# load model
model_path = os.path.join(forcetut, "best_inference_model")
best_model = torch.load(model_path).to('cpu')

# set up converter
converter = spk.interfaces.AtomsConverter(
    neighbor_list=trn.ASENeighborList(cutoff=5.0), dtype=torch.float32
)

res = {}
for i in range(256):
    # create atoms object from dataset
    structure = ethanol_data.test_dataset[i]
    atoms = Atoms(
        numbers=structure[spk.properties.Z], positions=structure[spk.properties.R]
    )
    inputs = converter(atoms)
    # concatenate every input tensor along the first axis
    for key in inputs:
        if key in res:
            res[key] = torch.cat([res[key], inputs[key]], 0)
        else:
            res[key] = inputs[key]
res['_pbc'] = res['_pbc'].reshape(-1)  # need to reshape too?

print(res['energy'])

res["energy"] = torch.rand(256)

print(res['energy'].shape)

# convert atoms to SchNetPack inputs and perform prediction
results = best_model(res)

print(results)

Could not find a version that satisfies the requirement sgdml

Hey Stefan,

When I run pip install sgdml, I get an error and cannot install it:

"Could not find a version that satisfies the requirement sgdml (from versions: )
No matching distribution found for sgdml"

I have seen similar questions on Stack Overflow, but they only suggest upgrading the version.
I tried that, but it still doesn't work. Could you maybe fix this problem?

My versions are:
Python 2.7.15
SciPy 1.1.0
NumPy 1.15.1
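The Requirements section of this README lists Python 3.7+. pip skips releases whose declared Python requirement does not match the running interpreter, so under Python 2.7.15 it may find no installable version at all. Whether that was the cause at the time of this issue is an assumption. A quick way to check an interpreter version against that minimum, with meets_requirement as a hypothetical helper:

```python
import sys

# Minimum version taken from the Requirements section of this README.
MIN_PYTHON = (3, 7)


def meets_requirement(version=None):
    """Hypothetical helper: compare (major, minor) against MIN_PYTHON."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= MIN_PYTHON


print('Python %d.%d' % sys.version_info[:2], 'OK' if meets_requirement() else 'too old')
```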

np.asscalar error when executing sgdml all ethanol_dft.npz 200 1000 5000

Hi, Stefanch. When I try to train and validate using sgdml all, I get the error below:

[DONE] Validation errors (MAE/RMSE): energy 0.159/0.221, forces 0.827/1.200 @ 38.5 geo/s
[CRIT] Traceback (most recent call last):
         File "/data/xxx/software/sgdml/sGDML/sgdml/cli.py", line 2216, in main
           getattr(sys.modules[__name__], args['command'])(**args)
         File "/data/xxx/software/sgdml/sGDML/sgdml/cli.py", line 617, in all
           model_dir_or_file_path = train(
         File "/data/xxx/software/sgdml/sGDML/sgdml/cli.py", line 1047, in train
           valid_errs = test(
         File "/data/xxx/software/sgdml/sGDML/sgdml/cli.py", line 1695, in test
           'mae': np.asscalar(e_mae),
         File "/home/xxx/anaconda3/envs/sgdml/lib/python3.8/site-packages/numpy/__init__.py", line 311, in __getattr__
           raise AttributeError("module {!r} has no attribute "
       AttributeError: module 'numpy' has no attribute 'asscalar'
My Linux conda environment:
numpy 1.23.3
python 3.8.0
scipy 1.9.1
torch 1.12.1
ase 3.22.1

I cannot figure out how to fix this error (maybe cli.py uses an attribute that NumPy no longer defines?). Can you offer some suggestions? Thanks for your reply!
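np.asscalar was deprecated in NumPy 1.16 and removed in NumPy 1.23, which matches the numpy 1.23.3 environment above. The usual replacement is the ndarray.item() method, or float() for a single value. The minimal sketch below shows what the 'mae' entry from the traceback could look like with that change. e_mae is a stand-in value, not sgdml's actual computation.

```python
import numpy as np

# Stand-in for the validation error computed in cli.py.
e_mae = np.mean(np.abs(np.array([0.1, -0.2, 0.3])))

# np.asscalar(e_mae) fails on NumPy >= 1.23; .item() returns a plain Python float.
stats = {'mae': e_mae.item()}
print(type(stats['mae']).__name__)  # float
```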

potentially inconsistent labels, ValueError: Sign not allowed in string format specifier

I'm unable to train models; I get the error below with the following version and command:

sgdml 1.0.2 [Python 3.8.12, NumPy 1.24.4, SciPy 1.8.0, PyTorch 1.13.1+cu117, ASE 3.22.1]

python bin/sgdml all md17_ethanol.npz 100 100 100


 STEP 2  Training and validation
----------------------------------------------------------------------------------------------------
Task 1 of 9                       100 + 100 points (training + validation), sigma (length scale): 10
[100%] Generating descriptors and their Jacobians                                                   
[INFO] Using analytic solver (expected memory use: ~167 MB)
[DONE] Assembling kernel matrix                                                           took 1.5 s
[DONE] Training on 100 points                                                                       
[WARN] Potentially inconsistent energy labels detected!
       The predicted energies for the training data are only weakly correlated with the reference
       labels (correlation coefficient -0.25). Note that correlation is independent of scale, which
       indicates that the issue is most likely not just a unit conversion error.
       
       Troubleshooting tips:
       (1) Verify the correct correspondence between geometries and labels in the provided dataset.
       (2) This issue might very well just be a sympthom of using too few trainnig data and your
           labels are correct.
       (3) Verify the consistency between energy and force labels.
           - Correspondence between force and energy labels correct?
           - Accuracy of forces (convergence of your ab-initio calculations)?
           - Was the same level of theory used to compute forces and energies?
       (4) Is the training data spread too broadly (i.e. weakly sampled transitions between example
           clusters)?
       (5) Are there duplicate geometries in the training data?
       (6) Are there any corrupted data points (e.g. parsing errors)?
[WARN] Potentially inconsistent scales in energy vs. force labels detected!
       The integrated force predictions differ from the reference energy labels by factor ~0.00 (for
       the training data), meaning that this model will likely fail to predict energies accurately
       in real-world use.
       
       Troubleshooting tips:
       (1) Verify consistency of units in energy and force labels.
       (2) This issue might very well just be a sympthom of using too few trainnig data and your
           labels are correct.
       (3) Is the training data spread too broadly (i.e. weakly sampled transitions between example
           clusters)?
[100%] Validation errors (MAE/RMSE): energy 18265411872728781722082082816.000/22655316095867156075506565120.000, forces 81967145953778761280568426496.000/167843964717341591313296916480.000
[CRIT] Traceback (most recent call last):
         File "/home/hellstrom/software/sGDML/sgdml/cli.py", line 2288, in main
           getattr(sys.modules[__name__], args['command'])(**args)
         File "/home/hellstrom/software/sGDML/sgdml/cli.py", line 689, in all
           model_dir_or_file_path = train(
         File "/home/hellstrom/software/sGDML/sgdml/cli.py", line 1119, in train
           valid_errs = test(
         File "/home/hellstrom/software/sGDML/sgdml/cli.py", line 1688, in test
           ui.callback(
         File "/home/hellstrom/software/sGDML/sgdml/utils/ui.py", line 139, in callback
           color_str(' {:>{width}}'.format(sec_disp_str, width=w), fore_color=GRAY)
       ValueError: Sign not allowed in string format specifier
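The last frame formats a string with a width computed at runtime. In Python, a negative width such as -5 turns the specifier into '>-5'. The '-' is then parsed as a sign, and strings reject signs, which produces exactly this ValueError. One plausible but unconfirmed trigger is the extremely long validation-error line above leaving no room on the terminal. A minimal sketch using the hypothetical helper right_align, which relies on str.rjust returning the text unchanged when the width is too small or negative:

```python
def right_align(text, width):
    """Hypothetical helper equivalent to ' {:>{width}}', but safe for negative widths."""
    return ' ' + text.rjust(width)  # rjust never raises for small or negative widths


# Reproduces the error from the issue.
try:
    ' {:>{width}}'.format('took 1.5 s', width=-5)
except ValueError as err:
    print(err)  # Sign not allowed in string format specifier

print(repr(right_align('took 1.5 s', -5)))  # ' took 1.5 s'
```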

Exception: 'adj_set'

When I run
sgdml all datasets/npz/ethanol.npz 200 1000 5000
the exception occurs.
How can I set the key 'adj_set'?
Best regards.

Missing Iterative Solver

Hello,

The iterative solver was added in commit 32a489a, at line 49 of sGDML/sgdml/train.py:

from .solvers.iterative import Iterative

but the solvers.iterative module seems to be missing.

README.md encoding not specified in setup.py

Hello, I was unable to pip-install sgdml (Python 3.6); it raises a UnicodeDecodeError.

Changing
with open(path.join(this_dir, 'README.md')) as f:
to
with open(path.join(this_dir, 'README.md'), encoding='utf8') as f:
in setup.py solves the problem.
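Without an explicit encoding, open() in text mode falls back to a platform-dependent default (locale.getpreferredencoding()). A non-ASCII character in README.md, such as the ü or á in the author and reference lists, can therefore raise UnicodeDecodeError on systems whose default is not UTF-8. Below is a minimal self-contained sketch of the reported fix. A temporary file stands in for the real README.md.

```python
import os
import tempfile

text = 'Luis Gálvez, Grégory Fonseca, Müller\n'

with tempfile.TemporaryDirectory() as this_dir:
    readme = os.path.join(this_dir, 'README.md')  # stand-in for the project README
    # Write and read with an explicit encoding so the result does not depend on the locale.
    with open(readme, 'w', encoding='utf8') as f:
        f.write(text)
    with open(readme, encoding='utf8') as f:
        long_description = f.read()

print(long_description == text)  # True
```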

parallel prediction not working on Windows

If I use the ASE interface on Windows, I get the error below. It works fine if I comment out the line with prepare_parallel() in SGDMLCalculator (so that it runs serially?). Would it be possible to check the OS and run sgdml via ASE serially on Windows?

multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "C:\AMS2019.405.r87185\bin\python3.6\lib\multiprocessing\pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "C:\Users\matti\.scm\python\AMS2019.4.venv\lib\site-packages\sgdml\predict.py", line 121, in _predict_wkr
    glob = globs[glob_id]
NameError: name 'globs' is not defined
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:/AMS2019.405.r87185/scripting/standalone/external_engines/ase_calculators.py", line 50, in <module>
    ase_calculator = engine_interface.get_ase_calculator(model_name=args.model_name, params_path=args.params_path)
  File "C:\AMS2019.405.r87185\scripting\scm\external_engines\core.py", line 223, in get_ase_calculator
    calc = SGDMLCalculator(params_path, E_to_eV=E_to_eV, F_to_eV_Ang=F_to_eV_Ang)
  File "C:\Users\matti\.scm\python\AMS2019.4.venv\lib\site-packages\sgdml\intf\ase_calc.py", line 70, in __init__
    self.gdml_predict.prepare_parallel()
  File "C:\Users\matti\.scm\python\AMS2019.4.venv\lib\site-packages\sgdml\predict.py", line 756, in prepare_parallel
    gps = n_bulk * n_reps / timeit.timeit(_dummy_predict, number=n_reps)
  File "C:\AMS2019.405.r87185\bin\python3.6\lib\timeit.py", line 233, in timeit
    return Timer(stmt, setup, timer, globals).timeit(number)
  File "C:\AMS2019.405.r87185\bin\python3.6\lib\timeit.py", line 178, in timeit
    timing = self.inner(it, self.timer)
  File "<timeit-src>", line 6, in inner
  File "C:\Users\matti\.scm\python\AMS2019.4.venv\lib\site-packages\sgdml\predict.py", line 699, in _dummy_predict
    self.predict(r_dummy)
  File "C:\Users\matti\.scm\python\AMS2019.4.venv\lib\site-packages\sgdml\predict.py", line 1088, in predict
    _predict_wo_wkr_starts_stops, self.wkr_starts_stops
  File "C:\AMS2019.405.r87185\bin\python3.6\lib\multiprocessing\pool.py", line 735, in next
    raise value
NameError: name 'globs' is not defined
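The reporter asks whether sgdml could detect the OS and stay serial on Windows. Below is a minimal sketch of such a check. use_parallel_prediction is a hypothetical helper, not part of SGDMLCalculator, and the commented call site only mirrors the line the reporter disabled.

```python
import sys


def use_parallel_prediction(platform=None):
    """Hypothetical policy: stay serial on Windows, allow parallel prediction elsewhere."""
    platform = sys.platform if platform is None else platform
    return not platform.startswith('win')  # sys.platform is 'win32' on Windows


# Hypothetical call site, e.g. in a calculator's __init__:
# if use_parallel_prediction():
#     self.gdml_predict.prepare_parallel()
print(use_parallel_prediction())
```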

OS: Mac OS 10.14.6

Steps: The problem can easily be reproduced on my machine with the following script (using uracil and toluene as an example):

import numpy as np
from sgdml.predict import GDMLPredict

uracil_data= np.load('datasets/uracil.npz')
X_uracil=np.reshape(uracil_data['R'],(len(uracil_data['R']),-1))
toluene_data=np.load('datasets/toluene.npz')
X_toluene=np.reshape(toluene_data['R'],(len(toluene_data['R']),-1))

uracil_npz=np.load('models/uracil.npz')
uracil_model=GDMLPredict(uracil_npz)

E,_=uracil_model.predict(X_uracil[0])
print(f"E={E}")

toluene_npz=np.load('models/toluene.npz')
toluene_model=GDMLPredict(toluene_npz)

#leads to error:
#ValueError: operands could not be broadcast together with remapped shapes [original->remapped]: (66,) and requested shape (12000,105)
E,_=uracil_model.predict(X_uracil[0])
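The shapes in the error point to a mix-up between the two models. Uracil (12 atoms) has 12·11/2 = 66 atom pairs, and toluene (15 atoms) has 15·14/2 = 105. This pattern would be expected if GDMLPredict kept model data in module-level state shared by all instances, so that creating the toluene predictor overwrote data that the uracil predictor still uses. That is an inference, not something the issue confirms. A minimal sketch of this failure mode with hypothetical classes:

```python
_GLOBS = {}  # hypothetical module-level store shared by every instance


class SharedStatePredictor:
    def __init__(self, n_atoms):
        # Every new instance overwrites the shared entry.
        _GLOBS['n_pairs'] = n_atoms * (n_atoms - 1) // 2

    def n_pairs(self):
        return _GLOBS['n_pairs']


class InstanceStatePredictor:
    def __init__(self, n_atoms):
        # Each instance keeps its own copy of the model data.
        self._n_pairs = n_atoms * (n_atoms - 1) // 2

    def n_pairs(self):
        return self._n_pairs


uracil, toluene = SharedStatePredictor(12), SharedStatePredictor(15)
print(uracil.n_pairs())  # 105, although uracil has 66 pairs

uracil, toluene = InstanceStatePredictor(12), InstanceStatePredictor(15)
print(uracil.n_pairs())  # 66
```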