Comments (8)
Hi! Thank you for reporting back. Could you please send us the input line for MACE?
When swa is switched on, the loss function changes, so its numerical value will be different; that is not a problem.
from mace.
Hi @Vinceuwe,
Thanks for your interest in MACE!
The model that was saved corresponds to the one with the overall lowest loss. This means it depends on how you weight the forces and energies in your loss. Since the default uses a larger weight on forces than on energies, the model with the best force RMSE (at epoch 176 in your case) was saved.
The confusion likely comes from the use of swa. In your case it works as follows:
- For the first 800 epochs, the loss was computed with a weight of 10 on the forces and 1 on the energies, which is why your forces are much better.
- After 800 epochs, the loss weights changed to 1000 on the energies and 1 on the forces, hence the better energies.
Because the model seems to struggle with learning energies, this led to a significant deterioration in force accuracy. The best model saved was thus the one from epoch 176.
Could you please send us the input file for the trained model? Also, could you tell us your system size and check that you are using the correct E0s?
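To make the weight switch concrete, here is a minimal sketch of a weighted energy/forces loss in the spirit of the one logged above (simplified, not MACE's actual implementation; the function name and toy data are illustrative only). It shows why the same model errors give a very different numerical loss once the swa weights take over:

```python
import numpy as np

def weighted_ef_loss(e_pred, e_ref, f_pred, f_ref, n_atoms,
                     energy_weight, forces_weight):
    """Simplified weighted energy/forces loss (sketch, not MACE's code)."""
    # mean-squared per-atom energy error over configurations
    e_term = np.mean(((e_pred - e_ref) / n_atoms) ** 2)
    # mean-squared force-component error over all atoms
    f_term = np.mean((f_pred - f_ref) ** 2)
    return energy_weight * e_term + forces_weight * f_term

# toy data: 2 configurations of 3 atoms each, fixed model errors
rng = np.random.default_rng(0)
e_pred, e_ref = np.array([10.1, 20.2]), np.array([10.0, 20.0])
f_pred = rng.normal(size=(6, 3))
f_ref = f_pred + 0.05  # constant 0.05 eV/A force error

before_swa = weighted_ef_loss(e_pred, e_ref, f_pred, f_ref, n_atoms=3,
                              energy_weight=1.0, forces_weight=10.0)
after_swa = weighted_ef_loss(e_pred, e_ref, f_pred, f_ref, n_atoms=3,
                             energy_weight=1000.0, forces_weight=1.0)
# Identical errors, different weights: the loss value jumps when swa
# switches on, even though the model has not changed.
```

The checkpoint comparison across the swa boundary is therefore about which quantity each phase optimizes, not about the raw loss numbers being comparable.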
from mace.
Hi, thanks for your reply. The information you provided definitely helps me understand MACE better. Here is the submission script:
python /raven/u/hwan/mace/scripts/run_train.py \
    --name="MACE_model" \
    --train_file="atoms_training_32.xyz" \
    --valid_fraction=0.05 \
    --test_file="atoms_test_32.xyz" \
    --config_type_weights='{"Default":1.0}' \
    --energy_key="DFT_energy" \
    --forces_key="DFT_forces" \
    --model="MACE" \
    --hidden_irreps='128x0e + 128x1o' \
    --r_max=5.0 \
    --batch_size=30 \
    --max_num_epochs=1000 \
    --swa \
    --start_swa=800 \
    --ema \
    --ema_decay=0.99 \
    --amsgrad \
    --restart_latest \
    --device=cuda
2022-09-20 10:27:01.622 INFO: CUDA version: 11.1, CUDA device: 0
2022-09-20 10:27:07.698 INFO: Using isolated atom energies from training file
2022-09-20 10:27:07.725 INFO: Loaded 931 training configurations from 'atoms_training_32.xyz'
2022-09-20 10:27:07.725 INFO: Using random 5.0% of training set for validation
2022-09-20 10:27:07.864 INFO: Loaded 207 test configurations from 'atoms_test_32.xyz'
2022-09-20 10:27:07.864 INFO: Total number of configurations: train=885, valid=46, tests=[Default: 131, slab_MD: 76]
2022-09-20 10:27:07.870 INFO: AtomicNumberTable: (8, 77)
2022-09-20 10:27:07.871 INFO: Atomic energies: [-0.08969644, -0.33524439]
2022-09-20 10:27:24.751 INFO: WeightedEnergyForcesLoss(energy_weight=1.000, forces_weight=10.000)
2022-09-20 10:27:24.908 INFO: Average number of neighbors: 39.096
My training set contains configurations ranging from 4 to 200 atoms. They are quite diverse, as they are the result of a GAP workflow run over 30 iterations.
Thanks for your reply. Your input script looks correct to me. However, I think you might have a problem with your atomic energies. Could you please try running again after adding --E0s="average" to your input script? This will do a linear fit on your training data to compute your E0s.
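Conceptually, the fit solves a least-squares problem: each configuration's total energy is approximated as a sum of per-element reference energies times the element counts. A minimal sketch under assumed data (the counts and energies below are made up; the two columns mirror the O and Ir species from the AtomicNumberTable (8, 77) in the log, but this is not MACE's actual code):

```python
import numpy as np

# Rows: configurations; columns: counts of each element (here O, Ir).
counts = np.array([
    [2, 1],
    [4, 2],
    [6, 3],
    [1, 0],  # needed so the system has full rank
], dtype=float)

# Made-up per-element reference energies, used to synthesize total energies.
true_e0 = np.array([-430.0, -880.0])
energies = counts @ true_e0  # total DFT energy of each configuration

# Linear least-squares fit recovers the per-element E0s.
e0s, *_ = np.linalg.lstsq(counts, energies, rcond=None)
```

With well-conditioned data this recovers sensible per-element baselines, which is why bad or missing isolated-atom energies can be replaced by the fitted ones.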
Hi, I tested --E0s="average" with isolated atoms in my training set:
2022-09-20 20:22:45.071 INFO: Epoch 994: loss=4.3130, RMSE_E_per_atom=68.1 meV, RMSE_F=450.6 meV / A
2022-09-20 20:23:09.699 INFO: Epoch 996: loss=4.1301, RMSE_E_per_atom=66.6 meV, RMSE_F=445.2 meV / A
2022-09-20 20:23:34.471 INFO: Epoch 998: loss=4.0964, RMSE_E_per_atom=66.3 meV, RMSE_F=447.6 meV / A
2022-09-20 20:23:46.774 INFO: Training complete
2022-09-20 20:23:46.775 INFO: Loading checkpoint: checkpoints/MACE_model_run-123_epoch-460.pt
2022-09-20 20:23:47.353 INFO: Loaded model from epoch 460
2022-09-20 20:23:47.353 INFO: Computing metrics for training, validation, and test sets
2022-09-20 20:24:03.228 INFO: Evaluating train ...
2022-09-20 20:24:19.352 INFO: Evaluating valid ...
2022-09-20 20:24:22.500 INFO: Evaluating Default ...
2022-09-20 20:24:24.432 INFO: Evaluating slab_MD ...
2022-09-20 20:24:24.914 INFO:
+-------------+---------------------+------------------+-------------------+
| config_type | RMSE E / meV / atom | RMSE F / meV / A | relative F RMSE % |
+-------------+---------------------+------------------+-------------------+
| train | 110.3 | 211.1 | 1.49 |
| valid | 122.6 | 186.5 | 4.17 |
| Default | 103.0 | 101.5 | 3378.72 |
| slab_MD | 52.5 | 177.4 | 13.87 |
+-------------+---------------------+------------------+-------------------+
2022-09-20 20:24:24.914 INFO: Saving model to checkpoints/MACE_model_run-123.model
Without isolated atoms in my training set:
2022-09-20 20:44:35.017 INFO: Epoch 996: loss=219.9501, RMSE_E_per_atom=503.1 meV, RMSE_F=517.0 meV / A
2022-09-20 20:45:00.217 INFO: Epoch 998: loss=236.2850, RMSE_E_per_atom=521.5 meV, RMSE_F=523.2 meV / A
2022-09-20 20:45:12.731 INFO: Training complete
2022-09-20 20:45:12.732 INFO: Loading checkpoint: checkpoints/MACE_model_run-123_epoch-800.pt
2022-09-20 20:45:13.144 INFO: Loaded model from epoch 800
2022-09-20 20:45:13.144 INFO: Computing metrics for training, validation, and test sets
2022-09-20 20:45:28.456 INFO: Evaluating train ...
2022-09-20 20:45:44.218 INFO: Evaluating valid ...
2022-09-20 20:45:47.300 INFO: Evaluating Default ...
2022-09-20 20:45:49.142 INFO: Evaluating slab_MD ...
2022-09-20 20:45:49.613 INFO:
+-------------+---------------------+------------------+-------------------+
| config_type | RMSE E / meV / atom | RMSE F / meV / A | relative F RMSE % |
+-------------+---------------------+------------------+-------------------+
| train | 68.4 | 120.9 | 0.85 |
| valid | 475.9 | 281.9 | 6.30 |
| Default | 307.2 | 174.9 | 5819.96 |
| slab_MD | 44.9 | 185.4 | 14.50 |
+-------------+---------------------+------------------+-------------------+
2022-09-20 20:45:49.613 INFO: Saving model to checkpoints/MACE_model_run-123.model
These results still do not look satisfying. Do you have any suggestions?
Could you please send me your full log file and your training file?
Hi, can you provide your email?
Yes, it is [email protected].