deepmodeling / dpgen

289.0 13.0 173.0 8.12 MB

The deep potential generator to generate a deep-learning based model of interatomic potential energy and force field

Home Page: https://docs.deepmodeling.com/projects/dpgen/

License: GNU Lesser General Public License v3.0

Python 99.46% Shell 0.03% AMPL 0.49% C 0.01% Perl 0.01% Modula-3 0.01%

python concurrent-learning active-learning

dpgen's People

Contributors

njzjz amcadmus njustcodingjs haidi-ustc cloudac7 hxtp riddlezyc qshao jameswind liugrouphnu tamaswells obaica gaosilagelangri marisayu angusezhang scut-ccmp baozcwj wanghaopku 0ut0fcontrol y1xiaoc hongzhentian jwz360 manyi-yang nancy877 zezhong-zhang pangchq felix5572 jianxinghuang kevinwenminion cndaqiang hnlab vibsteamer maxffff shenghuanggroup littlegun-issp trollchu jiaminghu121 mkphuthi zhangbei07 chemshift fengyuewuya xcxxcx1996 dmh1998dmh saidigroup kpleo-nanoacademic plin1112 picodase pulseternal haoxy97 hsulab mh-guo boliqq07 kick-h daniel1991zy zhonghengfu ericwang6 hezhengda tuoping shunsunsun mingzhong15 saltball yata727 leoil wangzyphysics goodoid zhanpengou edison105422 fqgong nkato1206 zhenming-xu yuhangyao mori0711 shazj99 lauthirteen franklalalala maki49 18434760862 lizhiqiang100 jinyuanh lixy211 kzhiwei zhu-liu sailfish009 chiahsinchu wct4715 wangfeiteng1 johnzzt liupf16 iprozd likefallwind zongwuyang qlyang94 yoki8424 pkufjh pee8379 areschenchen ashoremrfish joey-zhangcy ricky-zhao znslhx

dpgen's Issues

[bug] Comments are not ignored when parsing INCAR

If the INCAR contains the following lines:

# NSW = 0

...

NSW = 10

the code below wrongly sets md_nstep to 0, which makes subsequent calculations fail.

dpgen/dpgen/data/gen.py

Lines 654 to 666 in ca01bd8

        if "NSW" in line:
            nsw_flag = True
            nsw_steps = int(incar_line.split()[-1])
            break
    #dlog.info("nsw_steps is", nsw_steps)
    #dlog.info("md_nstep_jdata is", md_nstep_jdata)
    if nsw_flag:
        if (nsw_steps != md_nstep_jdata):
            dlog.info("WARNING: your set-up for MD steps in PARAM and md_incar are not consistent!")
            dlog.info("MD steps in PARAM is %d"%(md_nstep_jdata))
            dlog.info("MD steps in md_incar is %d"%(nsw_steps))
            dlog.info("DP-GEN will use settings in md_incar!")
            jdata['md_nstep'] = nsw_steps
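The problem is that the commented-out line `# NSW = 0` still contains the token `NSW`, so the check matches it first. A minimal sketch of comment-aware parsing follows; `parse_nsw` is a hypothetical helper, not part of dpgen, and it assumes that `#` and `!` start a comment and that `;` separates several `TAG = VALUE` statements on one line, as in VASP INCAR files. Like the quoted code, it stops at the first (uncommented) NSW assignment.

```python
from typing import Iterable, Optional


def parse_nsw(incar_lines: Iterable[str]) -> Optional[int]:
    """Return the first uncommented NSW value in INCAR lines, or None if absent."""
    for raw in incar_lines:
        # Assumption: '#' and '!' start a comment that runs to the end of the line
        for marker in ("#", "!"):
            raw = raw.split(marker, 1)[0]
        # Assumption: several TAG = VALUE statements may share a line, separated by ';'
        for stmt in raw.split(";"):
            if "=" not in stmt:
                continue
            key, value = stmt.split("=", 1)
            tokens = value.split()
            if key.strip().upper() == "NSW" and tokens:
                return int(tokens[0])
    return None


print(parse_nsw(["# NSW = 0", "...", "NSW = 10"]))  # 10
```

Returning `None` when no NSW is present lets the caller keep `md_nstep` from param.json in that case.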

Error when running task 08

When run_fp finishes and dpgen proceeds to post_fp, the following error occurs.
I checked the files and everything seems fine; the OUTCAR in every fp task shows that the calculation finished normally.
So I ran it again after changing group_size in the fp section from 10 to 20, and the problem disappeared, which is strange.
Could you tell me what causes this error and how I can avoid it next time? Thanks.
By the way, what does 'unsuccessfully terminated jobs' mean? It seems that every fp calculation completed in run_fp.

INFO:dpgen:-------------------------iter.000000 task 06--------------------------
INFO:dpgen:system 000 candidate :     14 in     62  22.58 %
INFO:dpgen:system 000 failed    :     10 in     62  16.13 %
INFO:dpgen:system 000 accurate  :     38 in     62  61.29 %
INFO:dpgen:system 000 accurate_ratio:   0.6129    thresholds: 1.0000 and 1.0000   eff. task min and max   -1   20   number of fp tasks:     14
INFO:dpgen:-------------------------iter.000000 task 07--------------------------
INFO:dpgen:new submission of d43b40d3-1b70-49e2-b9e9-b99fb9fbb89c for chunk 5776fd87d63bd9cbe3b31a38c5c7f85b8fcbf0b6
INFO:dpgen:new submission of 96a0626f-0f78-468a-a567-2c8faed8f62c for chunk f6b6fada4486171b74f408bb3db7d1c10cbf9f18
INFO:dpgen:job 96a0626f-0f78-468a-a567-2c8faed8f62c finished
INFO:dpgen:job d43b40d3-1b70-49e2-b9e9-b99fb9fbb89c finished
INFO:dpgen:-------------------------iter.000000 task 08--------------------------
INFO:dpgen:failed tasks:      0 in     14    0.00 % 
INFO:dpgen:failed frame:      3 in     14   21.43 % 
Traceback (most recent call last):
  File "/home/ben/.local/bin/dpgen", line 8, in <module>
    sys.exit(main())
  File "/home/ben/.local/lib/python3.8/site-packages/dpgen/main.py", line 175, in main
    args.func(args)
  File "/home/ben/.local/lib/python3.8/site-packages/dpgen/generator/run.py", line 2410, in gen_run
    run_iter (args.PARAM, args.MACHINE)
  File "/home/ben/.local/lib/python3.8/site-packages/dpgen/generator/run.py", line 2399, in run_iter
    post_fp (ii, jdata)
  File "/home/ben/.local/lib/python3.8/site-packages/dpgen/generator/run.py", line 2267, in post_fp
    post_fp_vasp(iter_index, jdata)
  File "/home/ben/.local/lib/python3.8/site-packages/dpgen/generator/run.py", line 2033, in post_fp_vasp
    raise RuntimeError("find too many unsuccessfully terminated jobs")
RuntimeError: find too many unsuccessfully terminated jobs
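The log prints `failed tasks: 0 in 14` but `failed frame: 3 in 14 21.43 %` right before the exception, so the error appears to be driven by the share of failed frames rather than by jobs that did not run. A minimal sketch of that kind of threshold check; the name `ratio_failed` and its 0.05 default are assumptions about dpgen's behavior, and which frames count as failed (for example, VASP runs whose electronic loop did not converge even though the OUTCAR ends normally) is also an assumption.

```python
def check_failed_ratio(n_failed: int, n_total: int, ratio_failed: float = 0.05) -> float:
    """Log and check the share of failed fp frames (threshold value is assumed)."""
    ratio = n_failed / n_total
    print("failed frame: %6d in %6d  %6.2f %% " % (n_failed, n_total, ratio * 100.0))
    if ratio > ratio_failed:
        raise RuntimeError("find too many unsuccessfully terminated jobs")
    return ratio


try:
    check_failed_ratio(3, 14)  # 3/14 is about 21.43 %, above the assumed 5 % limit
except RuntimeError as err:
    print(err)
```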

dpgen code is not compatible with the new version of pymatgen

https://pymatgen.org/index.html#major-announcement-v2022-0
'A backwards incompatible change has been introduced in v2022.0.*.'
'If your existing code uses from pymatgen import , you will need to make modifications.'

I ran into several problems when installing dpgen because of the new pymatgen version.
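Per the quoted announcement, code that imports from the `pymatgen` root package breaks in v2022.0.*. A minimal compatibility sketch with a hypothetical helper, `import_first`, that tries several module paths in order and returns the first one providing the requested name. Which names dpgen actually imports is not stated in the issue, so `Structure` from `pymatgen.core` below is only an example.

```python
import importlib
from typing import Any, Sequence


def import_first(name: str, module_paths: Sequence[str]) -> Any:
    """Return attribute `name` from the first module in `module_paths` that provides it."""
    errors = []
    for path in module_paths:
        try:
            module = importlib.import_module(path)
        except ImportError as exc:  # also catches ModuleNotFoundError
            errors.append("%s: %s" % (path, exc))
            continue
        if hasattr(module, name):
            return getattr(module, name)
        errors.append("%s: no attribute %r" % (path, name))
    raise ImportError("could not import %s (%s)" % (name, "; ".join(errors)))
```

For example, `Structure = import_first("Structure", ["pymatgen.core", "pymatgen"])` tries the namespaced location first and falls back to the old root-level import only if needed.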

Inappropriate format of OUTCAR file from ab initio MD calculations

Hi all, thanks for sharing the amazing module. Could you please have a look at my issue below?

I have a problem at the stage of preparing initial data, specifically at substeps 3 (run a short AIMD in folder 02.md) and 4 (collect data in folder 02.md). Substeps 1 and 2 went through fine with no issues. After the ab initio MD calculations finished, an error said that the OUTCAR files of the MD calculations could not be parsed correctly, as shown in the error message at the end.

I then checked the OUTCAR files in the 02.md directory and found that their format is not as expected. For example, there is a line containing "VOLUME and BASIS-vectors are now:" that is not followed by the volume and the basis-vector matrix. One example is attached.

It seems that the data in the OUTCAR file are not printed properly in the ab initio MD calculation (with VASP). I didn't encounter this issue when I ran VASP calculations on my own. It would be great if you have any clue that helps. Thanks!

000001.zip

"""
Traceback (most recent call last):
  File "/home/lliu147/miniconda2/envs/dpgen/bin/dpgen", line 10, in <module>
    sys.exit(main())
  File "/home/lliu147/miniconda2/envs/dpgen/lib/python3.7/site-packages/dpgen/main.py", line 125, in main
    args.func(args)
  File "/home/lliu147/miniconda2/envs/dpgen/lib/python3.7/site-packages/dpgen/data/gen.py", line 779, in gen_init_bulk
    coll_vasp_md(jdata)
  File "/home/lliu147/miniconda2/envs/dpgen/lib/python3.7/site-packages/dpgen/data/gen.py", line 536, in coll_vasp_md
    _sys = dpdata.LabeledSystem(oo)
  File "/home/lliu147/miniconda2/envs/dpgen/lib/python3.7/site-packages/dpdata/system.py", line 645, in __init__
    self.from_vasp_outcar(file_name, begin = begin, step = step)
  File "/home/lliu147/miniconda2/envs/dpgen/lib/python3.7/site-packages/dpdata/system.py", line 743, in from_vasp_outcar
    = dpdata.vasp.outcar.get_frames(file_name, begin = begin, step = step)
  File "/home/lliu147/miniconda2/envs/dpgen/lib/python3.7/site-packages/dpdata/vasp/outcar.py", line 65, in get_frames
    coord, cell, energy, force, virial, is_converge = analyze_block(blk, ntot, nelm)
  File "/home/lliu147/miniconda2/envs/dpgen/lib/python3.7/site-packages/dpdata/vasp/outcar.py", line 113, in analyze_block
    for ss in tmp_l.replace('-',' -').split()[0:3]])
  File "/home/lliu147/miniconda2/envs/dpgen/lib/python3.7/site-packages/dpdata/vasp/outcar.py", line 113, in <listcomp>
    for ss in tmp_l.replace('-',' -').split()[0:3]])
ValueError: could not convert string to float: 'FORHAR:'
"""
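The ValueError shows the parser reading `FORHAR:` where it expected three lattice-vector components. A minimal sketch, with a hypothetical helper, for locating such blocks in an OUTCAR before collecting the data again. The assumed layout (a `direct lattice vectors` caption within a few lines after each `VOLUME and BASIS-vectors are now` header, followed by three numeric rows) is based on typical VASP output, not on dpdata's exact line offsets, and the `window` size is a guess.

```python
from typing import List

HEADER = "VOLUME and BASIS-vectors are now"
CAPTION = "direct lattice vectors"


def _three_floats(line: str) -> bool:
    # Split fused negatives such as "1.0-2.0", like the replace() in the traceback
    tokens = line.replace("-", " -").split()[0:3]
    if len(tokens) < 3:
        return False
    try:
        [float(t) for t in tokens]
    except ValueError:
        return False
    return True


def find_broken_cell_blocks(lines: List[str], window: int = 6) -> List[int]:
    """Return 0-based indices of HEADER lines not followed by a readable 3x3 cell."""
    broken = []
    for i, line in enumerate(lines):
        if HEADER not in line:
            continue
        hits = [j for j in range(i + 1, min(i + 1 + window, len(lines)))
                if CAPTION in lines[j]]
        if not hits:
            broken.append(i)
            continue
        rows = lines[hits[0] + 1:hits[0] + 4]
        if len(rows) < 3 or not all(_three_floats(r) for r in rows):
            broken.append(i)
    return broken
```

Usage: `with open("OUTCAR") as fh: print(find_broken_cell_blocks(fh.read().splitlines()))` lists the header lines whose cell block is incomplete.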

Incompatible versions of examples.

I get the error "~/.conda/envs/deepmd-1.2.2/bin/dp_train: No such file or directory" in the train.log file when I run "dpgen run *.json", and I can't find the related setting in the machine.json file. Only the command "dp" exists in this path.
Thanks!

Some errors when calling "dpgen run pa... ma..."

Dear developers,
When I run "dpgen run pa... ma...", something goes wrong when it calls "lmp_mpi". The error is as follows (the run includes 10 initial configurations):
*** Error in `/public3/home/sc31148/a/lammps/src/lmp_mpi': double free or corruption (!prev): 0x00000000031eb0e0 ***
INFO:dpgen:job 6ec96f6e-e2f0-4832-8e5e-2d6db31ba8a0 terminated, submit again
INFO:dpgen:job e6f52053-7806-4467-808d-dd28b357af50 terminated, submit again
INFO:dpgen:job 5253a482-fc5a-434f-a0ab-132921c4f5c3 terminated, submit again
INFO:dpgen:job 8c3a9990-fb1e-473e-9851-7a07dc6e50e0 terminated, submit again
INFO:dpgen:job f18139e0-f8ca-4edd-8ce4-30b2b4b1c920 terminated, submit again
*** Error in `/public3/home/sc31148/a/lammps/src/lmp_mpi': double free or corruption (fasttop): 0x00000000028f9870 ***
INFO:dpgen:job 2c277044-bc55-4ad5-abf3-e1a0475d15ea terminated, submit again
INFO:dpgen:job d55941a2-8ca3-46fd-b043-e53c6d4aafee terminated, submit again
INFO:dpgen:job e0bc2f24-0589-4350-b96e-c057e14161cb terminated, submit again
INFO:dpgen:job de00f61b-2311-4606-b5fa-cff57cd1e53d terminated, submit again
INFO:dpgen:job 6ec96f6e-e2f0-4832-8e5e-2d6db31ba8a0 terminated, submit again
INFO:dpgen:job e6f52053-7806-4467-808d-dd28b357af50 terminated, submit again
INFO:dpgen:job d4a20368-ffda-490c-ae14-d75122d50bcb terminated, submit again
*** Error in `/public3/home/sc31148/a/lammps/src/lmp_mpi': double free or corruption (fasttop): 0x00000000027ae850 ***
INFO:dpgen:job f18139e0-f8ca-4edd-8ce4-30b2b4b1c920 terminated, submit again
INFO:dpgen:job 5253a482-fc5a-434f-a0ab-132921c4f5c3 terminated, submit again
INFO:dpgen:job 2c277044-bc55-4ad5-abf3-e1a0475d15ea terminated, submit again
INFO:dpgen:job 8c3a9990-fb1e-473e-9851-7a07dc6e50e0 terminated, submit again
INFO:dpgen:job de00f61b-2311-4606-b5fa-cff57cd1e53d terminated, submit again
INFO:dpgen:job 6ec96f6e-e2f0-4832-8e5e-2d6db31ba8a0 terminated, submit again
INFO:dpgen:job d55941a2-8ca3-46fd-b043-e53c6d4aafee terminated, submit again
INFO:dpgen:job e0bc2f24-0589-4350-b96e-c057e14161cb terminated, submit again
*** Error in `/public3/home/sc31148/a/lammps/src/lmp_mpi': double free or corruption (!prev): 0x00000000021507f0 ***
*** Error in `/public3/home/sc31148/a/lammps/src/lmp_mpi': double free or corruption (!prev): 0x00000000017fc7f0 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x81299)[0x2b21ffbf1299]
/lib64/libc.so.6(+0x39d10)[0x2b21ffba9d10]
/lib64/libc.so.6(+0x39d37)[0x2b21ffba9d37]
/public1/soft/gcc/4.9.2//lib64/libgomp.so.1(+0x6dc6)[0x2b21fff44dc6]
/public1/soft/gcc/4.9.2//lib64/libgomp.so.1(+0xeaff)[0x2b21fff4caff]
/public1/soft/gcc/4.9.2//lib64/libgomp.so.1(GOMP_parallel+0x3a)[0x2b21fff478ba]
/public1/home/sc31148/a/deepmd_root/lib/libdeepmd_op.so(_ZN15ProdVirialSeAOp7ComputeEPN10tensorflow15OpKernelContextE+0x8ab)[0x2b21f145fcfb]
INFO:dpgen:job e6f52053-7806-4467-808d-dd28b357af50 terminated, submit again
INFO:dpgen:job f18139e0-f8ca-4edd-8ce4-30b2b4b1c920 terminated, submit again
INFO:dpgen:job 5253a482-fc5a-434f-a0ab-132921c4f5c3 terminated, submit again
INFO:dpgen:job d4a20368-ffda-490c-ae14-d75122d50bcb terminated, submit again
*** Error in `/public3/home/sc31148/a/lammps/src/lmp_mpi': double free or corruption (!prev): 0x00000000020fd4c0 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x81299)[0x2ac6adac4299]
/lib64/libc.so.6(+0x39ce9)[0x2ac6ada7cce9]
/lib64/libc.so.6(+0x39d37)[0x2ac6ada7cd37]
/public1/soft/gcc/4.9.2//lib64/libgomp.so.1(+0x6dc6)[0x2ac6ade17dc6]
/public1/soft/gcc/4.9.2//lib64/libgomp.so.1(+0xeaff)[0x2ac6ade1faff]
/public1/soft/gcc/4.9.2//lib64/libgomp.so.1(GOMP_parallel+0x3a)[0x2ac6ade1a8ba]
/public1/home/sc31148/a/deepmd_root/lib/libdeepmd_op.so(_ZN15ProdVirialSeAOp7ComputeEPN10tensorflow15OpKernelContextE+0x8ab)[0x2ac69f332cfb]
INFO:dpgen:job de00f61b-2311-4606-b5fa-cff57cd1e53d terminated, submit again
Traceback (most recent call last):
  File "/public3/home/sc31148/.local/bin/dpgen", line 11, in <module>
    load_entry_point('dpgen==0.8.0', 'console_scripts', 'dpgen')()
  File "/public3/home/sc31148/.local/lib/python3.7/site-packages/dpgen/main.py", line 182, in main
    args.func(args)
  File "/public3/home/sc31148/.local/lib/python3.7/site-packages/dpgen/generator/run.py", line 2309, in gen_run
    run_iter (args.PARAM, args.MACHINE)
  File "/public3/home/sc31148/.local/lib/python3.7/site-packages/dpgen/generator/run.py", line 2284, in run_iter
    run_model_devi (ii, jdata, mdata)
  File "/public3/home/sc31148/.local/lib/python3.7/site-packages/dpgen/generator/run.py", line 1010, in run_model_devi
    errlog = 'model_devi.log')
  File "/public3/home/sc31148/.local/lib/python3.7/site-packages/dpgen/dispatcher/Dispatcher.py", line 91, in run_jobs
    while not self.all_finished(job_handler, mark_failure) :
  File "/public3/home/sc31148/.local/lib/python3.7/site-packages/dpgen/dispatcher/Dispatcher.py", line 215, in all_finished
    raise RuntimeError('Job %s failed for more than 3 times' % job_uuid)
RuntimeError: Job 6ec96f6e-e2f0-4832-8e5e-2d6db31ba8a0 failed for more than 3 times
So, could you tell me what I should do next?
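The log shows dpgen resubmitting each terminated job and finally raising `Job ... failed for more than 3 times`. A minimal sketch of that resubmit-with-limit pattern, using a hypothetical `run_with_retries` rather than dpgen's Dispatcher (whose exact counting may differ), shows why resubmission cannot help when every attempt hits the same deterministic crash, such as the double free in lmp_mpi above.

```python
from typing import Callable, Dict, List


def run_with_retries(job_ids: List[str], run_once: Callable[[str], bool],
                     max_fail: int = 3) -> Dict[str, int]:
    """Resubmit failed jobs; raise once a job has failed more than max_fail times."""
    fail_count = {jid: 0 for jid in job_ids}
    pending = list(job_ids)
    while pending:
        still_pending = []
        for jid in pending:
            if run_once(jid):  # True means the job finished successfully
                continue
            fail_count[jid] += 1
            if fail_count[jid] > max_fail:
                raise RuntimeError("Job %s failed for more than %d times" % (jid, max_fail))
            print("job %s terminated, submit again" % jid)
            still_pending.append(jid)
        pending = still_pending
    return fail_count
```

A flaky job that succeeds on its second attempt returns a failure count of 1, while a job that crashes on every run raises after its fourth attempt.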

Automatize virtual environment packing with pypi

Summary
Enable automatic virtual environment packing with pypi.

Details
Straighten out the entire workflow and automate it.

Issue with PLUMED in DPGEN procedure

Hello

I'm trying to run dpgen with PLUMED on aluminum cluster structures, but something went wrong in the model_devi step: it seems that the trajectory files were not generated. The error messages are as follows:

DPGEN Version : 0.8.1

Traceback (most recent call last):
  File "/home/zpou/anaconda3/envs/dpgen/bin/dpgen", line 8, in <module>
    sys.exit(main())
  File "/home/zpou/anaconda3/envs/dpgen/lib/python3.8/site-packages/dpgen/main.py", line 182, in main
    args.func(args)
  File "/home/zpou/anaconda3/envs/dpgen/lib/python3.8/site-packages/dpgen/generator/run.py", line 2340, in gen_run
    run_iter (args.PARAM, args.MACHINE)
  File "/home/zpou/anaconda3/envs/dpgen/lib/python3.8/site-packages/dpgen/generator/run.py", line 2315, in run_iter
    run_model_devi (ii, jdata, mdata)
  File "/home/zpou/anaconda3/envs/dpgen/lib/python3.8/site-packages/dpgen/generator/run.py", line 999, in run_model_devi
    dispatcher.run_jobs(mdata['model_devi_resources'],
  File "/home/zpou/anaconda3/envs/dpgen/lib/python3.8/site-packages/dpgen/dispatcher/Dispatcher.py", line 91, in run_jobs
    while not self.all_finished(job_handler, mark_failure) :
  File "/home/zpou/anaconda3/envs/dpgen/lib/python3.8/site-packages/dpgen/dispatcher/Dispatcher.py", line 226, in all_finished
    rjob['context'].download(task_chunks[idx], backward_task_files)
  File "/home/zpou/anaconda3/envs/dpgen/lib/python3.8/site-packages/dpgen/dispatcher/LocalContext.py", line 114, in download
    raise RuntimeError('do not find download file ' + rfile)
RuntimeError: do not find download file /home/zpou/dpgen_work/aa51e6f4-efb8-410a-95f4-7162524155ce/task.000.000001/dump.0.xyz

files.zip
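The exception is raised when a file that should be copied back from the work directory, here `dump.0.xyz`, does not exist. A minimal sketch, with a hypothetical helper, for listing which task directories lack an expected output before the download step; the list of file names to check is an assumption and should match what the run is configured to produce.

```python
from pathlib import Path
from typing import Iterable, List


def missing_outputs(work_dir: str, task_names: Iterable[str],
                    expected: Iterable[str]) -> List[str]:
    """Return 'task/file' paths under work_dir that do not exist."""
    expected = list(expected)  # iterated once per task
    missing = []
    for task in task_names:
        for fname in expected:
            if not (Path(work_dir) / task / fname).exists():
                missing.append(str(Path(task) / fname))
    return missing
```

Running it on the work directory from the error message would show whether all tasks or only some of them lack the trajectory file.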

Task 06 is slow and needs a lot of memory if the trajectory is large

Task 06 loads all model deviations from the MD simulations into memory and then converts them into lists of indices. If the data are large, this occupies a lot of memory and takes a long time to write out the files.

TODO: add an option to minimize memory usage (e.g., writing out the files could be skipped, since only the candidate list is needed).
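One way to bound memory, sketched here under assumptions, is to stream the model-deviation file line by line and keep only counts plus candidate step numbers. The sketch assumes the DeePMD-kit `model_devi.out` layout with the maximal force deviation in column 4 (0-based), and classifies frames below `f_trust_lo` as accurate, above `f_trust_hi` as failed, and the rest as candidates; dpgen's exact boundary handling may differ.

```python
from typing import Iterable, List, Tuple


def classify_stream(lines: Iterable[str], f_trust_lo: float, f_trust_hi: float,
                    f_col: int = 4) -> Tuple[int, int, List[int]]:
    """Stream model_devi lines; return (n_accurate, n_failed, candidate steps)."""
    n_accurate = n_failed = 0
    candidates = []
    for line in lines:
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank and header lines
        cols = line.split()
        step, max_devi_f = int(float(cols[0])), float(cols[f_col])
        if max_devi_f < f_trust_lo:
            n_accurate += 1
        elif max_devi_f > f_trust_hi:
            n_failed += 1
        else:
            candidates.append(step)
    return n_accurate, n_failed, candidates
```

Because it accepts any iterable, an open file handle can be passed directly (`with open("model_devi.out") as fh: classify_stream(fh, 0.05, 0.20)`), so only one line is held in memory at a time.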

About Installation

I installed deepmd-kit with "conda install" in a virtual environment that I created. How can I install dpgen in this conda virtual environment?

Automatize virtual environment packing with docker

Summary
Enable automatic virtual environment packing with docker.

Details
Create a Dockerfile for the entire workflow.

Terrible training accuracy for Al

Hello everyone,

I am a beginner with dpgen. I trained a model for fcc Al. However, when I tested the model, I got a significant deviation in the self-interstitial formation energy. Could anyone give me a few suggestions? The input files and results are listed below.

param.json in init_bulk
{
"stages": [1, 2, 3, 4],
"cell_type": "fcc",
"super_cell": [2, 2, 2],
"elements": ["Al"],
"from_poscar": true,
"from_poscar_path": "POSCAR",
"potcars": ["POTCAR"],
"relax_incar": "INCAR.rlx",
"md_incar": "INCAR.md",
"scale": [1.00],
"skip_relax": false,
"pert_numb": 50,
"md_nstep": 20,
"pert_box": 0.03,
"pert_atom": 0.01,
"coll_ndata": 5000,
"type_map": ["Al"],
"_comment": "that's all"
}

param.json in run
{
"type_map": ["Al"],
"mass_map": [27],

"init_data_prefix":	"../init/",

"init_data_sys":	[
"POSCAR.02x02x02/02.md/sys-0032/deepmd"
		],
"init_batch_size":	[
1
],
"sys_configs":	[
["/home/zhq/WORK/fzh/work/ml/test/Al-2/init/POSCAR.02x02x02/01.scale_pert/sys-0032/scale-1.000/00000[0-4]/POSCAR"],
["/home/zhq/WORK/fzh/work/ml/test/Al-2/init/POSCAR.02x02x02/01.scale_pert/sys-0032/scale-1.000/00000[5-9]/POSCAR"],
["/home/zhq/WORK/fzh/work/ml/test/Al-2/init/POSCAR.02x02x02/01.scale_pert/sys-0032/scale-1.000/00001*/POSCAR"],
["/home/zhq/WORK/fzh/work/ml/test/Al-2/init/POSCAR.02x02x02/01.scale_pert/sys-0032/scale-1.000/00002*/POSCAR"],
["/home/zhq/WORK/fzh/work/ml/test/Al-2/init/POSCAR.02x02x02/01.scale_pert/sys-0032/scale-1.000/00003*/POSCAR"],
["/home/zhq/WORK/fzh/work/ml/test/Al-2/init/POSCAR.02x02x02/01.scale_pert/sys-0032/scale-1.000/00004*/POSCAR"]
],
"_comment":		" 00.train ",
"numb_models":	4,
"default_training_param" : {
"model":{
"_comment": " model parameters",
"type_map":["Al"],
"descriptor":{
"type":			"se_a",
"sel":			[200],
"rcut_smth":		0.5,
"rcut":			6.0,
"neuron":		[25, 50, 100],
"resnet_dt":		false,
"axis_neuron":		12,
"seed":			1

},
"fitting_net":{
"neuron": [240, 240, 240],
"resnet_dt": true,
"sedd": 1
}},
"learning_rate":{
"type": "exp",
"start_lr": 0.001,
"decay_steps": 2000,
"decay_rate": 0.95
},
"loss":{
"start_pref_e": 0.02,
"limit_pref_e": 2,
"start_pref_f": 1000,
"limit_pref_f": 1,
"start_pref_v": 0.0,
"limit_pref_v": 0.0
},
"training":{
"coord_norm": true,
"type_fitting_net": false,
"_comment": " traing controls",
"systems": [],
"set_prefix": "set",
"stop_batch": 400000,
"batch_size": 1,
"seed": 0,
"_comment": " display and restart",
"_comment": " frequencies counted in batch",
"disp_file": "lcurve.out",
"disp_freq": 2000,
"numb_test": 4,
"save_freq": 20000,
"save_ckpt": "model.ckpt",
"load_ckpt": "model.ckpt",
"disp_training": true,
"time_training": true,
"profiling": false,
"profiling_file": "timeline.json",
"_comment": "that's all"}
},

"_comment":		" 01.model_devi ",
"_comment": "model_devi_skip: the first x of the recorded frames",
"model_devi_dt":		0.002,
"model_devi_skip":		0,
"model_devi_f_trust_lo":	0.05,
"model_devi_f_trust_hi":	0.20,
"model_devi_e_trust_lo":	1e10,
"model_devi_e_trust_hi":	1e10,
"model_devi_clean_traj":	true,
"model_devi_jobs":

[
{
"_idx": 0,
"ensemble": "npt",
"nsteps": 300,
"press": [
1.0, 10, 100
],
"sys_idx": [
0
],
"temps": [
50
],
"trj_freq": 10
},
{
"_idx": 1,
"ensemble": "npt",
"nsteps": 1000,
"press": [
1.0, 10, 100
],
"sys_idx": [
0, 1
],
"temps": [
50
],
"trj_freq": 10
},
{
"_idx": 2,
"ensemble": "npt",
"nsteps": 1000,
"press": [
1.0, 10, 100
],
"sys_idx": [
2, 3
],
"temps": [
50
],
"trj_freq": 10
},
{
"_idx": 3,
"ensemble": "npt",
"nsteps": 3000,
"press": [
1.0, 10, 100
],
"sys_idx": [
4, 5
],
"temps": [
50
],
"trj_freq": 10
},
{
"_idx": 4,
"ensemble": "npt",
"nsteps": 3000,
"press": [
1.0, 10, 100
],
"sys_idx": [
4, 5
],
"temps": [
50
],
"trj_freq": 10
}
],

"_comment":		" 02.fp ",
"fp_style":		"vasp",
"shuffle_poscar":	false,
"fp_task_max":	300,
"fp_task_min":	5,
"fp_pp_path":	"./",
"fp_pp_files":	["POTCAR"],
"fp_incar":         "/home/zhq/WORK/fzh/work/ml/test/Al-2/run/INCAR",
"_comment":		" that's all "

}

param.json in test
{
"_comment": "models",
"potcar_map": {
"Al": "potential/POTCAR"
},
"conf_dir": "confs/Al/std-fcc",
"key_id": "",
"task_type": "deepmd",
"task": "all",

"vasp_params": {
    "ecut": 650,
    "ediff": 1e-6,
    "kspacing": 0.15,
    "kgamma": false,
    "npar": 1,
    "kpar": 1,
    "_comment": " that's all "
},
"lammps_params": {
    "model_dir": "Al_model",
    "type_map": [
        "Al"
    ],
    "model_name": false,
    "model_param_type": false
},
"_comment": "00.equi",
"alloy_shift": false,
"_comment": "01.eos",
"vol_start": 12,
"vol_end": 22,
"vol_step": 0.5,
"_comment": "02.elastic",
"norm_deform": 2e-2,
"shear_deform": 5e-2,
"_comment": "03.vacancy",
"supercell": [
    3,
    3,
    3
],
"_comment": "04.interstitial",
"insert_ele": [
    "Al"
],
"reprod-opt": false,
"_comment": "05.surface",
"min_slab_size": 10,
"min_vacuum_size": 11,
"_comment": "pert xz to work around vasp bug...",
"pert_xz": 0.01,
"max_miller": 2,
"static-opt": false,
"relax_box": false,
"_comment": "06.phonon",
"supercell_matrix": [
    2,
    2,
    2
],
"band": "0 1 0  0.5 1 0.5  0.375 0.75 0.375  0  0  0  0.5 0.5 0.5",
"_comment": "that's all"

}

The result log:
DeepModeling

Version: 0.8.2.dev0+gf8d70a4.d20210204
Date: Feb-04-2021
Path: /home/zhq/.local/lib/python3.7/site-packages/dpgen

Dependency

 numpy     1.20.3   /apps/lib/anaconda/anaconda3/e5/lib/python3.7/site-packages/numpy
dpdata     0.1.19   /apps/lib/anaconda/anaconda3/e5/lib/python3.7/site-packages/dpdata-0.1.19-py3.7.egg/dpdata

pymatgen 2020.10.9.01 /apps/lib/anaconda/anaconda3/e5/lib/python3.7/site-packages/pymatgen
monty 4.0.2 /apps/lib/anaconda/anaconda3/e5/lib/python3.7/site-packages/monty
ase 3.19.1 /apps/lib/anaconda/anaconda3/e5/lib/python3.7/site-packages/ase
paramiko 2.7.2 /home/zhq/.local/lib/python3.7/site-packages/paramiko
custodian 2021.1.8 /home/zhq/.local/lib/python3.7/site-packages/custodian

Reference

Please cite:
Yuzhi Zhang, Haidi Wang, Weijie Chen, Jinzhe Zeng, Linfeng Zhang, Han Wang, and Weinan E,
DP-GEN: A concurrent learning platform for the generation of reliable deep learning
based potential energy models, Computer Physics Communications, 2020, 107206.

Description

/gpfs01/zhq_work/fzh/work/ml/test/Al-2/test/00.equi/Al/std-fcc/deepmd
conf_dir: EpA(eV) VpA(A^3)
confs/Al/std-fcc -3.7483 16.503

generate /gpfs01/zhq_work/fzh/work/ml/test/Al-2/test/01.eos/Al/std-fcc/deepmd/vol-12.00

generate /gpfs01/zhq_work/fzh/work/ml/test/Al-2/test/01.eos/Al/std-fcc/deepmd/vol-12.50

generate /gpfs01/zhq_work/fzh/work/ml/test/Al-2/test/01.eos/Al/std-fcc/deepmd/vol-13.00

generate /gpfs01/zhq_work/fzh/work/ml/test/Al-2/test/01.eos/Al/std-fcc/deepmd/vol-13.50

generate /gpfs01/zhq_work/fzh/work/ml/test/Al-2/test/01.eos/Al/std-fcc/deepmd/vol-14.00

generate /gpfs01/zhq_work/fzh/work/ml/test/Al-2/test/01.eos/Al/std-fcc/deepmd/vol-14.50

generate /gpfs01/zhq_work/fzh/work/ml/test/Al-2/test/01.eos/Al/std-fcc/deepmd/vol-15.00

generate /gpfs01/zhq_work/fzh/work/ml/test/Al-2/test/01.eos/Al/std-fcc/deepmd/vol-15.50

generate /gpfs01/zhq_work/fzh/work/ml/test/Al-2/test/01.eos/Al/std-fcc/deepmd/vol-16.00

generate /gpfs01/zhq_work/fzh/work/ml/test/Al-2/test/01.eos/Al/std-fcc/deepmd/vol-16.50

generate /gpfs01/zhq_work/fzh/work/ml/test/Al-2/test/01.eos/Al/std-fcc/deepmd/vol-17.00

generate /gpfs01/zhq_work/fzh/work/ml/test/Al-2/test/01.eos/Al/std-fcc/deepmd/vol-17.50

generate /gpfs01/zhq_work/fzh/work/ml/test/Al-2/test/01.eos/Al/std-fcc/deepmd/vol-18.00

generate /gpfs01/zhq_work/fzh/work/ml/test/Al-2/test/01.eos/Al/std-fcc/deepmd/vol-18.50

generate /gpfs01/zhq_work/fzh/work/ml/test/Al-2/test/01.eos/Al/std-fcc/deepmd/vol-19.00

generate /gpfs01/zhq_work/fzh/work/ml/test/Al-2/test/01.eos/Al/std-fcc/deepmd/vol-19.50

generate /gpfs01/zhq_work/fzh/work/ml/test/Al-2/test/01.eos/Al/std-fcc/deepmd/vol-20.00

generate /gpfs01/zhq_work/fzh/work/ml/test/Al-2/test/01.eos/Al/std-fcc/deepmd/vol-20.50

generate /gpfs01/zhq_work/fzh/work/ml/test/Al-2/test/01.eos/Al/std-fcc/deepmd/vol-21.00

generate /gpfs01/zhq_work/fzh/work/ml/test/Al-2/test/01.eos/Al/std-fcc/deepmd/vol-21.50

Vpa(A^3) EpA(eV)
11.9999981608503 -2.96662795170267
12.4999949243073 -2.97802583445048
12.9999983614591 -3.09448430830428
13.4999983961358 -3.32230518208628
13.9999959870801 -3.5283274458636
14.5000040385406 -3.6320900440121
15.0000041246356 -3.6964484536534
15.5000045857403 -3.7304097714782
16.0 -3.74457075952198
16.4999997474047 -3.74831572706045
16.9999987265498 -3.74521051825265
17.4999961808157 -3.73651305151505
18.0000045936601 -3.7231315655221
18.4999940008706 -3.7045569241317
19.0000055945374 -3.67865231543475
19.5000043580999 -3.64584828796787
19.9999947077013 -3.6097276969094
20.4999931235139 -3.57066598363155
20.9999979876931 -3.52542444079503
21.5000005488178 -3.4737049576875
gen with norm [-0.02, -0.01, 0.01, 0.02]
gen with shear [-0.05, -0.025, 0.025, 0.05]
111.73 50.42 50.42 0.00 0.00 0.00
50.42 111.73 50.42 0.00 0.00 0.00
50.42 50.42 111.73 0.00 0.00 0.00
0.00 0.00 0.00 35.33 0.00 0.00
0.00 0.00 0.00 0.00 35.33 0.00
0.00 0.00 0.00 0.00 0.00 35.33

Bulk Modulus BV = 70.86 GPa

Shear Modulus GV = 33.46 GPa

Youngs Modulus EV = 86.74 GPa

Poission Ratio uV = 0.30

generate /gpfs01/zhq_work/fzh/work/ml/test/Al-2/test/03.vacancy/Al/std-fcc/deepmd/struct-3x3x3-000

/gpfs01/zhq_work/fzh/work/ml/test/Al-2/test/03.vacancy/Al/std-fcc/deepmd

Structure: Vac_E(eV) E(eV) equi_E(eV)
struct-3x3x3-000: 0.618 -400.452 -401.070
task poscar: /gpfs01/zhq_work/fzh/work/ml/test/Al-2/test/04.interstitial/Al/std-fcc/deepmd/POSCAR

generate /gpfs01/zhq_work/fzh/work/ml/test/Al-2/test/04.interstitial/Al/std-fcc/deepmd/struct-Al-3x3x3-000

generate /gpfs01/zhq_work/fzh/work/ml/test/Al-2/test/04.interstitial/Al/std-fcc/deepmd/struct-Al-3x3x3-001

/gpfs01/zhq_work/fzh/work/ml/test/Al-2/test/04.interstitial/Al/std-fcc/deepmd/struct-Al-3x3x3-*
Insert_ele-Struct: Inter_E(eV) E(eV) equi_E(eV)
struct-Al-3x3x3-000: 0.720 -407.846 -408.566
struct-Al-3x3x3-001: 0.835 -407.731 -408.566

Best regards,
Zhongheng
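The reported defect energies follow E_f = E_defect - N * E_bulk_per_atom. A minimal worked check, assuming a 108-atom 3x3x3 fcc supercell (107 atoms with a vacancy, 109 with an interstitial) and the equilibrium energy per atom taken at higher precision from the EOS point at 16.5 Å^3 (-3.74831572706045 eV). The check reproduces `Vac_E` and `Inter_E` from the logged totals, which suggests that the low interstitial values come from the model's energies rather than from the post-processing arithmetic.

```python
# Equilibrium energy per atom of fcc Al, from the EOS output above (eV/atom)
EPA_BULK = -3.74831572706045


def formation_energy(e_defect: float, n_atoms: int, epa_bulk: float = EPA_BULK) -> float:
    """E_f = E_defect - n_atoms * E_bulk_per_atom, all in eV."""
    return e_defect - n_atoms * epa_bulk


# Assumed atom counts: 3x3x3 conventional fcc cell = 108 atoms
print(round(formation_energy(-400.452, 107), 3))  # vacancy: 0.618
print(round(formation_energy(-407.846, 109), 3))  # interstitial 000: 0.72
print(round(formation_energy(-407.731, 109), 3))  # interstitial 001: 0.835
```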

How to restart training from the NN of the last iteration?

DP-GEN trains the neural network from scratch in every iteration, and I wonder how to restart training from the models trained in the previous iteration.

A strange error

INFO:dpgen:-------------------------iter.000000 task 01--------------------------
INFO:dpgen:new submission of 6b33f4ab-ee6f-46a3-8b5e-010c8f50b303 for chunk 8aefb06c426e07a0a671a1e2488b4858d694a730
INFO:dpgen:new submission of 3dd38ba5-1aab-47b6-aa9d-a595219da418 for chunk e193a01ecf8d30ad0affefd332ce934e32ffce72
INFO:dpgen:new submission of 06bf77bf-c80b-4d54-b89d-5308703421e8 for chunk 6fc978af728d43c59faa400d5f6e0471ac850d4c
INFO:dpgen:new submission of b20f8073-24c8-4f68-a99c-02672ca961ca for chunk 221407c03ae5c73109cce71d27e24637824f3333
INFO:dpgen:job 6b33f4ab-ee6f-46a3-8b5e-010c8f50b303 finished
INFO:dpgen:job 06bf77bf-c80b-4d54-b89d-5308703421e8 finished
INFO:dpgen:job 3dd38ba5-1aab-47b6-aa9d-a595219da418 finished
INFO:dpgen:job b20f8073-24c8-4f68-a99c-02672ca961ca finished
INFO:dpgen:-------------------------iter.000000 task 02--------------------------
INFO:dpgen:-------------------------iter.000000 task 03--------------------------
Traceback (most recent call last):
  File "/nfs-share/home/1800011848/.local/bin/dpgen", line 8, in <module>
    sys.exit(main())
  File "/nfs-share/home/1800011848/.local/lib/python3.8/site-packages/dpgen/main.py", line 182, in main
    args.func(args)
  File "/nfs-share/home/1800011848/.local/lib/python3.8/site-packages/dpgen/generator/run.py", line 2309, in gen_run
    run_iter (args.PARAM, args.MACHINE)
  File "/nfs-share/home/1800011848/.local/lib/python3.8/site-packages/dpgen/generator/run.py", line 2278, in run_iter
    cont = make_model_devi (ii, jdata, mdata)
  File "/nfs-share/home/1800011848/.local/lib/python3.8/site-packages/dpgen/generator/run.py", line 726, in make_model_devi
    system = dpdata.System(os.path.join(conf_path, poscar_name), fmt = fmt, type_map = jdata['type_map'])
  File "/nfs-share/home/1800011848/.local/lib/python3.8/site-packages/dpdata/system.py", line 120, in __init__
    self.from_fmt(file_name, fmt, type_map=type_map, begin= begin, step=step, **kwargs)
  File "/nfs-share/home/1800011848/.local/lib/python3.8/site-packages/dpdata/system.py", line 137, in from_fmt
    func(self, file_name, **kwargs)
  File "/nfs-share/home/1800011848/.local/lib/python3.8/site-packages/dpdata/system.py", line 565, in from_vasp_poscar
    with open(file_name) as fp:
IsADirectoryError: [Errno 21] Is a directory: 'iter.000000/01.model_devi/confs/000.0000.poscar'

The path iter.000000/01.model_devi/confs/000.0000.poscar is a symbolic link that points to the root directory.
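The `IsADirectoryError` means the configuration path handed to `dpdata.System` resolved to a directory. A minimal sketch, using a hypothetical helper, for listing configuration links that do not resolve to a regular file together with their targets; tracing each bad target back to a `sys_configs` entry is left to the reader, and that the links are created from `sys_configs` is an assumption.

```python
import os
from typing import Dict, Iterable


def non_file_links(paths: Iterable[str]) -> Dict[str, str]:
    """Map each path that does not resolve to a regular file to its resolved target."""
    bad = {}
    for path in paths:
        # os.path.isfile follows symlinks, so a link to a directory is reported here
        if not os.path.isfile(path):
            bad[path] = os.path.realpath(path)
    return bad
```

For example, `non_file_links(glob.glob("iter.000000/01.model_devi/confs/*"))` would list every configuration link whose target is a directory or missing.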

Issue of dpgen on PBS system

Summary
I run DP-GEN (dpgen run) on localhost and want to perform the calculations (such as dp train and DFT calculations) on a remote HPC.
DP-GEN can distribute tasks to the remote HPC (batch system: PBS), but after the calculations finish, it fails to collect the data from the HPC.

DPGEN Version and Platform
Version: 0.9.1.dev0+g769ccf4.d20210504
Ubuntu 20.04

Details
Description

Traceback (most recent call last):
  File "/home/[email protected]/.local/bin/dpgen", line 8, in <module>
    sys.exit(main())
  File "/home/[email protected]/.local/lib/python3.8/site-packages/dpgen/main.py", line 175, in main
    args.func(args)
  File "/home/[email protected]/.local/lib/python3.8/site-packages/dpgen/generator/run.py", line 2410, in gen_run
    run_iter (args.PARAM, args.MACHINE)
  File "/home/[email protected]/.local/lib/python3.8/site-packages/dpgen/generator/run.py", line 2373, in run_iter
    run_train (ii, jdata, mdata)
  File "/home/[email protected]/.local/lib/python3.8/site-packages/dpgen/generator/run.py", line 529, in run_train
    dispatcher.run_jobs(mdata['train_resources'],
  File "/home/[email protected]/.local/lib/python3.8/site-packages/dpgen/dispatcher/Dispatcher.py", line 91, in run_jobs
    while not self.all_finished(job_handler, mark_failure) :
  File "/home/[email protected]/.local/lib/python3.8/site-packages/dpgen/dispatcher/Dispatcher.py", line 210, in all_finished
    status = rjob['batch'].check_status()
  File "/home/[email protected]/.local/lib/python3.8/site-packages/dpgen/dispatcher/PBS.py", line 25, in check_status
    raise RuntimeError ("status command qstat fails to execute. erro info: %s return code %d"
RuntimeError: status command qstat fails to execute. erro info: qstat: 10169.franklin01 Job has finished, use -x or -H to obtain historical job information
return code 35
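Here `qstat` exits with code 35 and reports that the job has finished, and the status check treats that as a failure. A minimal sketch of a status parser that treats this message as a finished job; the helper name, the state-letter mapping, and the position of the state column are assumptions about typical PBS `qstat` output, not dpgen's code. Querying finished jobs with `qstat -x`, as the message itself suggests, is the other option.

```python
def pbs_job_state(returncode: int, stdout: str, stderr: str) -> str:
    """Interpret the result of one `qstat <job_id>` call (mapping is assumed)."""
    if returncode != 0:
        # As in the log above, qstat errors out on finished jobs unless -x/-H is given
        if "Job has finished" in stderr:
            return "finished"
        raise RuntimeError("qstat failed (code %d): %s" % (returncode, stderr.strip()))
    lines = stdout.strip().splitlines()
    if not lines:
        return "unknown"
    # Assumed layout: header lines, then "<id> <name> <user> <time> <S> <queue>"
    state = lines[-1].split()[4]
    mapping = {"Q": "waiting", "H": "waiting", "W": "waiting",
               "R": "running", "E": "running",
               "F": "finished", "C": "finished"}
    return mapping.get(state, "unknown")
```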

find too many unsuccessfully terminated jobs

Hi, I am getting the following error in the second iteration:

find too many unsuccessfully terminated jobs

How should I proceed?

Thanks,
Mayank

find too many unsuccessfully terminated jobs

When I run 'dpgen run para.json machine.json', I get the following error:

OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: 0-11
OMP: Info #214: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #156: KMP_AFFINITY: 12 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #285: KMP_AFFINITY: topology layer "LL cache" is equivalent to "socket".
OMP: Info #285: KMP_AFFINITY: topology layer "L3 cache" is equivalent to "socket".
OMP: Info #285: KMP_AFFINITY: topology layer "L2 cache" is equivalent to "core".
OMP: Info #285: KMP_AFFINITY: topology layer "L1 cache" is equivalent to "core".
OMP: Info #191: KMP_AFFINITY: 1 socket x 6 cores/socket x 2 threads/core (6 total cores)
OMP: Info #216: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to socket 0 core 0 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to socket 0 core 0 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to socket 0 core 1 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to socket 0 core 1 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to socket 0 core 2 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to socket 0 core 2 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to socket 0 core 3 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to socket 0 core 3 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 8 maps to socket 0 core 4 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 9 maps to socket 0 core 4 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 10 maps to socket 0 core 5 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 11 maps to socket 0 core 5 thread 1
OMP: Info #252: KMP_AFFINITY: pid 7001 tid 7001 thread 0 bound to OS proc set 0
DeepModeling

Version: 0.8.1.dev2+g413d945
Date: Aug-09-2020
Path: /home/wzb198910/Software/anaconda3/lib/python3.8/site-packages/dpgen

Dependency

numpy 1.18.5 /home/wzb198910/Software/anaconda3/lib/python3.8/site-packages/numpy
dpdata 0.1.17 /home/wzb198910/Software/anaconda3/lib/python3.8/site-packages/dpdata
pymatgen 2020.8.3 /home/wzb198910/Software/anaconda3/lib/python3.8/site-packages/pymatgen
monty 3.0.4 /home/wzb198910/Software/anaconda3/lib/python3.8/site-packages/monty
ase 3.20.0 /home/wzb198910/Software/anaconda3/lib/python3.8/site-packages/ase
paramiko 2.7.1 /home/wzb198910/Software/anaconda3/lib/python3.8/site-packages/paramiko
custodian 2020.4.27 /home/wzb198910/Software/anaconda3/lib/python3.8/site-packages/custodian

Reference

Please cite:
Yuzhi Zhang, Haidi Wang, Weijie Chen, Jinzhe Zeng, Linfeng Zhang, Han Wang, and Weinan E,
DP-GEN: A concurrent learning platform for the generation of reliable deep learning
based potential energy models, Computer Physics Communications, 2020, 107206.

Description

INFO : start running
INFO : =============================iter.000000==============================
INFO : -------------------------iter.000000 task 00--------------------------
INFO : -------------------------iter.000000 task 01--------------------------
INFO : new submission of 462315b9-50c0-48d6-98b0-67bae1db53ce for chunk 8aefb06c426e07a0a671a1e2488b4858d694a730
INFO : new submission of 43b39538-8222-49a7-b9a8-b399d93d1127 for chunk e193a01ecf8d30ad0affefd332ce934e32ffce72
INFO : new submission of f4dc3041-215b-46a4-ae3f-67c64e272f66 for chunk 6fc978af728d43c59faa400d5f6e0471ac850d4c
INFO : new submission of f49ef967-0c47-41c9-b960-5ca26efc65c6 for chunk 221407c03ae5c73109cce71d27e24637824f3333
INFO : job f49ef967-0c47-41c9-b960-5ca26efc65c6 finished
INFO : job 462315b9-50c0-48d6-98b0-67bae1db53ce finished
INFO : job 43b39538-8222-49a7-b9a8-b399d93d1127 finished
INFO : job f4dc3041-215b-46a4-ae3f-67c64e272f66 finished
INFO : -------------------------iter.000000 task 02--------------------------
INFO : -------------------------iter.000000 task 03--------------------------
INFO : -------------------------iter.000000 task 04--------------------------
INFO : new submission of 782869a5-6cfa-47c0-ac09-9ab2775ce2d8 for chunk 5934a71cd58aa3719d6b4ab5368b50ce8bf3a54c
INFO : new submission of 5dc9c8d4-f832-4d11-9428-d0dcbbad79b1 for chunk 9b04ed091d6b748e0740bfb91043e477e45dad09
INFO : job 782869a5-6cfa-47c0-ac09-9ab2775ce2d8 finished
INFO : job 5dc9c8d4-f832-4d11-9428-d0dcbbad79b1 finished
INFO : -------------------------iter.000000 task 05--------------------------
INFO : -------------------------iter.000000 task 06--------------------------
INFO : system 000 candidate : 818 in 5050 16.20 %
INFO : system 000 failed : 4177 in 5050 82.71 %
INFO : system 000 accurate : 55 in 5050 1.09 %
INFO : system 000 accurate_ratio: 0.0109 thresholds: 1.0000 and 1.0000 eff. task min and max -1 20 number of fp tasks: 20
INFO : system 002 candidate : 282 in 5050 5.58 %
INFO : system 002 failed : 4707 in 5050 93.21 %
INFO : system 002 accurate : 61 in 5050 1.21 %
INFO : system 002 accurate_ratio: 0.0121 thresholds: 1.0000 and 1.0000 eff. task min and max -1 20 number of fp tasks: 20
INFO : system 004 candidate : 594 in 5050 11.76 %
INFO : system 004 failed : 4386 in 5050 86.85 %
INFO : system 004 accurate : 70 in 5050 1.39 %
INFO : system 004 accurate_ratio: 0.0139 thresholds: 1.0000 and 1.0000 eff. task min and max -1 20 number of fp tasks: 20
INFO : -------------------------iter.000000 task 07--------------------------
INFO : new submission of 2baf1807-5ba8-4362-830f-88fe0b216943 for chunk 16cc50909d40073045aaf94a03cbf9f28586cfcf
INFO : job 2baf1807-5ba8-4362-830f-88fe0b216943 finished
INFO : -------------------------iter.000000 task 08--------------------------
INFO : failed tasks: 0 in 60 0.00 %
INFO : failed frame: 60 in 60 100.00 %

Traceback (most recent call last):
File "/home/wzb198910/Software/anaconda3/bin/dpgen", line 8, in <module>
sys.exit(main())
File "/home/wzb198910/Software/anaconda3/lib/python3.8/site-packages/dpgen/main.py", line 182, in main
args.func(args)
File "/home/wzb198910/Software/anaconda3/lib/python3.8/site-packages/dpgen/generator/run.py", line 2311, in gen_run
run_iter (args.PARAM, args.MACHINE)
File "/home/wzb198910/Software/anaconda3/lib/python3.8/site-packages/dpgen/generator/run.py", line 2300, in run_iter
post_fp (ii, jdata)
File "/home/wzb198910/Software/anaconda3/lib/python3.8/site-packages/dpgen/generator/run.py", line 2168, in post_fp
post_fp_vasp(iter_index, jdata)
File "/home/wzb198910/Software/anaconda3/lib/python3.8/site-packages/dpgen/generator/run.py", line 1934, in post_fp_vasp
raise RuntimeError("find too many unsuccessfully terminated jobs")
RuntimeError: find too many unsuccessfully terminated jobs

How can I solve this problem?
Thanks.

PS: Ubuntu 20.04 + python 3.8 + deepmd-kit 1.2.0 + dpdata 0.1.17 + dpgen 0.8.1.dev2+g413d945

How to deal with this error?

When dpgen finished step 7 of iter.000000 and all fp tasks had completed successfully, an error like this appeared. Is something wrong with dpdata? Thanks a lot!

Ask for help: UserWarning: The batch sizes are not enough

Summary
My first question is how to use several data files with different numbers of atoms as input ("init_data_sys") to train the model. The second is how to use several structures with different numbers of atoms as initial structures ("sys_configs"). The details follow:


I use many structures and run the iterations with dpgen. With a single prepared data file, everything works for each structure. However, if I add two structure files, "Li14" and "Li8" (both created with dpdata from OUTCARs), I get this error:

/scratch/Anaconda3/envs/deepc/lib/python3.6/site-packages/dpgen/generator/run.py:242: UserWarning: The batch sizes are not enough. Assume auto for those not spefified.
warnings.warn("The batch sizes are not enough. Assume auto for those not spefified.")
Traceback (most recent call last):
File "/scratch/Anaconda3/envs/deepc/bin/dpgen", line 8, in <module>
sys.exit(main())
File "/scratch/Anaconda3/envs/deepc/lib/python3.6/site-packages/dpgen/main.py", line 182, in main
args.func(args)
File "/scratch/Anaconda3/envs/deepc/lib/python3.6/site-packages/dpgen/generator/run.py", line 2340, in gen_run
run_iter (args.PARAM, args.MACHINE)
File "/scratch/Anaconda3/envs/deepc/lib/python3.6/site-packages/dpgen/generator/run.py", line 2299, in run_iter
make_train (ii, jdata, mdata)
File "/scratch/Anaconda3/envs/deepc/lib/python3.6/site-packages/dpgen/generator/run.py", line 252, in make_train
assert (len(init_data_sys_) <= len(init_batch_size_))
AssertionError

I changed "init_batch_size" to 100, but it did not help. How can I add all the files to train the model?
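The assertion in the traceback compares the lengths of "init_data_sys" and "init_batch_size", which suggests that "init_batch_size" is expected to be a list with one entry per data system rather than a single larger number. A minimal sketch, assuming that one-to-one pairing, of padding the list with "auto" (as the warning text describes) and then applying the same length check; `pad_batch_sizes` is a hypothetical helper, not part of dpgen.

```python
def pad_batch_sizes(init_data_sys, init_batch_size):
    """Hypothetical helper: pair every data system with a batch size.

    Assumes "init_batch_size" holds one entry per "init_data_sys" entry
    and fills missing entries with "auto", as dpgen's warning describes.
    """
    sizes = list(init_batch_size)
    missing = len(init_data_sys) - len(sizes)
    if missing > 0:
        sizes.extend(["auto"] * missing)
    # Same condition as the assertion in make_train's traceback.
    assert len(init_data_sys) <= len(sizes)
    return sizes


print(pad_batch_sizes(["Li14", "Li8"], [2]))
```

Under this assumption, the param.json equivalent is to list one batch size per system, e.g. "init_data_sys": ["Li14", "Li8"] together with "init_batch_size": [2, 2].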

Detailed Description

The param.json I used:
{
"type_map": ["Li"],
"mass_map": [7.0],

"_comment": "initial data set for Training and the number of frames in each training batch",
"init_data_prefix": "~/scratch/deep/deecada/",
"init_data_sys": [
"Li14"
],
"init_batch_size": [
2
],

"_comment": "configurations for starting MD in Exploration and batch sizes when traning snapshots derived from these configs (if they were selected)",
"sys_configs_prefix": "~/scratch/deep/poscars/",
"sys_configs": [
[
"0000[0-9]/POSCAR"
],
[
"0001[0-9]/POSCAR"
]
],
"sys_batch_size": [
4,
4
],

"_comment": " 00.train ",
"numb_models": 4,

"default_training_param": {
"model": {
"type_map": ["Li"],
"descriptor": {
"type": "se_a",
"sel": [64],
"rcut_smth": 0.5,
"rcut": 5.0,
"neuron": [10,20,40],
"resnet_dt": false,
"axis_neuron": 12,
"seed": 0
},
"fitting_net": {
"neuron": [120,120,120],
"resnet_dt": true,
"coord_norm": true,
"type_fitting_net": false,
"seed": 0
}
},
"loss": {
"start_pref_e": 0.02,
"limit_pref_e": 1,
"start_pref_f": 1000,
"limit_pref_f": 1,
"start_pref_v": 0,
"limit_pref_v": 0
},
"learning_rate": {
"type": "exp",
"start_lr": 0.001,
"decay_steps": 1000,
"decay_rate": 0.95
},
"training": {
"systems": [],
"set_prefix": "set",
"stop_batch": 200000,
"batch_size": 1,
"seed": 1,
"_comment": "frequencies counted in batch",
"disp_file": "lcurve.out",
"disp_freq": 1000,
"numb_test": 4,
"save_freq": 1000,
"save_ckpt": "model.ckpt",
"load_ckpt": "model.ckpt",
"disp_training": true,
"time_training": true,
"profiling": false,
"profiling_file": "timeline.json"
}
},

"_comment": " 01.model_devi ",
"model_devi_dt": 0.002,
"model_devi_skip": 0,
"model_devi_f_trust_lo": 0.05,
"model_devi_f_trust_hi": 0.15,
"model_devi_clean_traj": false,
"model_devi_jobs": [
{
"sys_idx": [
0
],
"temps": [
50
],
"press": [
1
],
"trj_freq": 10,
"nsteps": 10000,
"ensemble": "nvt",
"_idx": "00"
},
{
"sys_idx": [
1
],
"temps": [
50
],
"press": [
1
],
"trj_freq": 10,
"nsteps": 30000,
"ensemble": "nvt",
"_idx": "01"
}
],

"_comment": " 02.fp ",
"fp_style": "vasp",
"shuffle_poscar": false,
"fp_task_max": 40,
"fp_task_min": 8,
"fp_pp_path": "./",
"fp_pp_files": [ "POTCAR_Li"],
"fp_incar": "INCAR_Li"

If the structures in "sys_configs" have different numbers of atoms:
"sys_configs_prefix": "~/scratch/deep/poscars/",
"sys_configs": [
[
"0000[0-9]/POSCAR"
],
[
"0001[0-9]/POSCAR"
]

I get this error:
INFO:dpgen:-------------------------iter.000001 task 08--------------------------
INFO:dpgen:failed tasks: 0 in 40 0.00 %
Traceback (most recent call last):
File "/scratch/Anaconda3/envs/deepc/bin/dpgen", line 8, in <module>
sys.exit(main())
File "/scratch/Anaconda3/envs/deepc/lib/python3.6/site-packages/dpgen/main.py", line 182, in main
args.func(args)
File "/scratch/Anaconda3/envs/deepc/lib/python3.6/site-packages/dpgen/generator/run.py", line 2340, in gen_run
run_iter (args.PARAM, args.MACHINE)
File "/scratch/Anaconda3/envs/deepc/lib/python3.6/site-packages/dpgen/generator/run.py", line 2329, in run_iter
post_fp (ii, jdata)
File "/scratch/Anaconda3/envs/deepc/lib/python3.6/site-packages/dpgen/generator/run.py", line 2197, in post_fp
post_fp_vasp(iter_index, jdata)
File "/scratch/Anaconda3/envs/deepc/lib/python3.6/site-packages/dpgen/generator/run.py", line 1928, in post_fp_vasp
all_sys.append(_sys)
File "/scratch/Anaconda3/envs/deepc/lib/python3.6/site-packages/dpdata/system.py", line 1316, in append
if not System.append(self, system):
File "/scratch/Anaconda3/envs/deepc/lib/python3.6/site-packages/dpdata/system.py", line 336, in append
raise RuntimeError('systems with inconsistent formula could not be append: %s v.s. %s' % (self.uniq_formula, system.uniq_formula))
RuntimeError: systems with inconsistent formula could not be append: Li10 v.s. Li3

Since the POSCARs under "sys_configs_prefix" have different numbers of atoms, how should I write this correctly?
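The traceback shows `dpdata.System.append` refusing to merge frames whose formulas differ (Li10 vs. Li3). Conceptually, labeled frames have to be collected into one system per formula; dpdata's `MultiSystems` class exists for that purpose. The sketch below only illustrates the grouping idea in plain Python: the frame representation (a dict with a list of element symbols) and both helper names are hypothetical and do not reproduce dpdata's API.

```python
from collections import Counter, defaultdict


def formula(symbols):
    """Build a formula string such as 'Li10' from a list of element symbols."""
    counts = Counter(symbols)
    return "".join(f"{el}{counts[el]}" for el in sorted(counts))


def group_by_formula(frames):
    """Hypothetical sketch: bucket frames so that each bucket has one formula.

    Appending only within a bucket avoids mixing Li10 and Li3 frames,
    which is what triggers the "inconsistent formula" error above.
    """
    groups = defaultdict(list)
    for frame in frames:
        groups[formula(frame["symbols"])].append(frame)
    return dict(groups)


frames = [{"symbols": ["Li"] * 10}, {"symbols": ["Li"] * 3}, {"symbols": ["Li"] * 10}]
for name, members in group_by_formula(frames).items():
    print(name, len(members))
```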

How to deal with this error [dpgen autotest make ... self.data['atom_types'] = np.argsort(idx)[self.data['atom_types']] IndexError: index 2 is out of bounds for axis 0 with size 2]

Summary

I am running “dpgen autotest make ...”, but I get the following output on screen:

(base) [bqfu@node01 ea05]$ /home/share/dp/bin/dpgen autotest make property3c.json
DeepModeling

Version: 0.9.1.dev0+g769ccf4.d20210311
Date: Mar-11-2021
Path: /home/bqfu/.local/lib/python3.7/site-packages/dpgen

Dependency

numpy 1.16.3 /home/bqfu/anaconda3/lib/python3.7/site-packages/numpy
dpdata 0.1.19 /home/bqfu/.local/lib/python3.7/site-packages/dpdata
pymatgen 2019.4.11 /home/bqfu/anaconda3/lib/python3.7/site-packages/pymatgen
monty 2021.3.3 /home/bqfu/.local/lib/python3.7/site-packages/monty
ase 3.21.1 /home/bqfu/.local/lib/python3.7/site-packages/ase
paramiko 2.7.2 /home/bqfu/.local/lib/python3.7/site-packages/paramiko
custodian 2021.2.8 /home/bqfu/.local/lib/python3.7/site-packages/custodian

Reference

Please cite:
Yuzhi Zhang, Haidi Wang, Weijie Chen, Jinzhe Zeng, Linfeng Zhang, Han Wang, and Weinan E,
DP-GEN: A concurrent learning platform for the generation of reliable deep learning
based potential energy models, Computer Physics Communications, 2020, 107206.

Description

----------------------------------3c----------------------------------
gen eos from 0.85 to 1.15 by every 0.01
gen with norm [-0.002, -0.001, 0.001, 0.002]
gen with shear [-0.005, -0.0025, 0.0025, 0.005]
gen vacancy with supercell [3, 3, 3]
gen interstitial with supercell [6, 6, 6] with element ['Si', 'C']
Traceback (most recent call last):
File "/home/share/dp/bin/dpgen", line 11, in <module>
sys.exit(main())
File "/home/bqfu/.local/lib/python3.7/site-packages/dpgen/main.py", line 175, in main
args.func(args)
File "/home/bqfu/.local/lib/python3.7/site-packages/dpgen/auto_test/run.py", line 57, in gen_test
run_task(args.TASK, args.PARAM, args.MACHINE)
File "/home/bqfu/.local/lib/python3.7/site-packages/dpgen/auto_test/run.py", line 28, in run_task
make_property(confs, inter_parameter, property_list)
File "/home/bqfu/.local/lib/python3.7/site-packages/dpgen/auto_test/common_prop.py", line 91, in make_property
inter.make_input_file(kk, prop.task_type(), prop.task_param())
File "/home/bqfu/.local/lib/python3.7/site-packages/dpgen/auto_test/Lammps.py", line 119, in make_input_file
lammps.cvt_lammps_conf(os.path.join(output_dir, 'POSCAR'), os.path.join(output_dir, 'conf.lmp'), lammps.element_list(self.type_map))
File "/home/bqfu/.local/lib/python3.7/site-packages/dpgen/auto_test/lib/lammps.py", line 29, in cvt_lammps_conf
d_poscar = dpdata.System(fin, fmt='vasp/poscar', type_map=type_map)
File "/home/bqfu/.local/lib/python3.7/site-packages/dpdata/system.py", line 123, in __init__
self.apply_type_map(type_map)
File "/home/bqfu/.local/lib/python3.7/site-packages/dpdata/system.py", line 405, in apply_type_map
self.check_type_map(type_map)
File "/home/bqfu/.local/lib/python3.7/site-packages/dpdata/system.py", line 401, in check_type_map
self.sort_atom_names(type_map=type_map)
File "/home/bqfu/.local/lib/python3.7/site-packages/dpdata/system.py", line 388, in sort_atom_names
self.data['atom_types'] = np.argsort(idx)[self.data['atom_types']]
IndexError: index 2 is out of bounds for axis 0 with size 2
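The last frame indexes a permutation array with the per-atom type indices read from the POSCAR. A minimal sketch, assuming the permutation has one entry per element of a two-element type_map while the POSCAR carries a third species slot (type index 2), reproduces the same NumPy error. The `check_species` pre-check is hypothetical: it flags species that are missing from type_map or repeated on the POSCAR species line, two situations that could plausibly leave a type index without a slot.

```python
import numpy as np


def check_species(poscar_species, type_map):
    """Hypothetical pre-check on a POSCAR species line against type_map."""
    problems = []
    seen = set()
    for name in poscar_species:
        if name not in type_map:
            problems.append(f"{name} is not in type_map {type_map}")
        if name in seen:
            problems.append(f"{name} appears more than once in the species line")
        seen.add(name)
    return problems


# Reproduce the failing indexing: a 2-entry permutation (two elements in
# type_map) indexed with atom types that include the index 2.
perm = np.argsort([1, 0])
atom_types = np.array([0, 1, 2])
try:
    perm[atom_types]
except IndexError as err:
    print("IndexError:", err)

print(check_species(["Si", "C", "Si"], ["Si", "C"]))
```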

dpgen install issue in a docker container

Hi, I intend to install dpgen in a Docker container without administrative privileges, so I tried installing it into a directory other than ./local/bin, e.g. pip install --prefix /ghome/jiangshuai/dpgen_test
However, even though the installation succeeds with no errors, running dpgen -h shows: ModuleNotFoundError: No module named 'dpgen'. Does anyone know how to fix this, or is there another way to install dpgen?
Thanks in advance.
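On a standard CPython build, `pip install --prefix DIR` places packages in a lib/pythonX.Y/site-packages directory under DIR, which is not on Python's default module search path; the `dpgen` script can therefore be found while its package cannot. A minimal sketch using the standard-library `sysconfig` module to compute that directory for the prefix in the question, so it can be added to `PYTHONPATH`. It assumes the stock "posix_prefix" install scheme; distributions that patch their schemes may use a different layout.

```python
import sysconfig


def prefix_site_packages(prefix):
    """Return the pure-Python site-packages directory for a --prefix install,
    assuming the standard 'posix_prefix' scheme."""
    return sysconfig.get_path(
        "purelib", scheme="posix_prefix", vars={"base": prefix, "platbase": prefix}
    )


prefix = "/ghome/jiangshuai/dpgen_test"
print(f"export PYTHONPATH={prefix_site_packages(prefix)}:$PYTHONPATH")
print(f"export PATH={prefix}/bin:$PATH")
```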

Run dpgen with ORCA

I create this issue to track the progress.
Thanks to Anguse (@AnguseZhang) for the guidance.
TODO:

Setup dpgen in our lab
add class orca in dpdata
add ORCA make_fp to generate ORCA input in dpgen
add example
add unittest in dpdata and dpgen
(optional) support SGE dispatcher like PBS in dpgen, SGE vs PBS

cc @mapleleaf-soar @elifzeng

A mistake when I fitted different systems

Summary

DPGEN Version and Platform

Job submission and computing cluster configuration

Expected Behavior

Actual Behavior

Steps to Reproduce

Further Information, Files, and Links

File "/data/apps/miniconda3/envs/deepmd/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1167, in _run
(np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (4,) for Tensor 'd_sea_t_natoms:0', which has shape '(5,)'
WARNING:tensorflow:From /data/apps/miniconda3/envs/deepmd/lib/python3.7/site-packages/tensorflow/python/compat/v2_compat.py:96: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
Traceback (most recent call last):
File "/data/apps/miniconda3/envs/deepmd/bin/dp", line 10, in <module>
sys.exit(main())
File "/data/apps/miniconda3/envs/deepmd/lib/python3.7/site-packages/deepmd/main.py", line 75, in main
freeze(args)
File "/data/apps/miniconda3/envs/deepmd/lib/python3.7/site-packages/deepmd/freeze.py", line 89, in freeze
freeze_graph(args.folder, args.output, args.nodes)
File "/data/apps/miniconda3/envs/deepmd/lib/python3.7/site-packages/deepmd/freeze.py", line 41, in freeze_graph
input_checkpoint = checkpoint.model_checkpoint_path
AttributeError: 'NoneType' object has no attribute 'model_checkpoint_path'

[BUG] SSH session not active

If paramiko cannot fix this issue, we may need to add a try...except block that retries the connection.
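A minimal sketch of the try...except retry proposed above, written against a generic `connect` callable rather than paramiko itself; `retry_call`, the attempt count, and the delay are illustrative assumptions, not dpgen's implementation.

```python
import time


def retry_call(connect, attempts=3, delay=1.0, exceptions=(Exception,)):
    """Call connect() until it succeeds, trying at most `attempts` times.

    Hypothetical helper: waits `delay` seconds between attempts and
    re-raises the last error if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except exceptions as err:
            last_error = err
            if attempt < attempts:
                time.sleep(delay)
    raise last_error


# Usage with a stand-in connection that fails twice before succeeding.
calls = {"n": 0}


def flaky_connect():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("SSH session not active")
    return "session"


print(retry_call(flaky_connect, attempts=3, delay=0.0))
```

In real use, `exceptions` would be narrowed to the connection errors actually observed (for example paramiko's `SSHException`) so that unrelated bugs are not silently retried.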

IndexError: list index out of range

Hi,

When running dpgen, it fails after the 2nd iteration with "IndexError: list index out of range".
Please suggest how to solve this.

Thanks,
Mayank

[BUG] username in Slurm's `_check_sub_limit` function is incorrect

Summary

In Slurm's _check_sub_limit function, the username is fetched by:

dpgen/dpgen/dispatcher/Slurm.py

Line 199 in 85a5f35

username = getpass.getuser()

This is incorrect: it returns the local username, not the username on the remote machine.

DPGEN Version and Platform

latest devel
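A minimal sketch of the distinction the issue draws: the Slurm query should use the login name configured for the remote machine and fall back to `getpass.getuser()` only when jobs run locally. The `remote_profile`/`username` keys and the exact `squeue` command line are assumptions for illustration; this does not reproduce `_check_sub_limit`.

```python
import getpass


def slurm_username(machine):
    """Hypothetical helper: prefer the remote login name over the local one."""
    profile = machine.get("remote_profile") or {}
    return profile.get("username") or getpass.getuser()


def pending_jobs_command(machine):
    # Illustrative squeue call: count this user's pending jobs, no header line.
    return f"squeue -u {slurm_username(machine)} -t PENDING -h | wc -l"


machine = {"batch": "slurm", "remote_profile": {"hostname": "cluster", "username": "alice"}}
print(pending_jobs_command(machine))
```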

Many Slurm SBATCH flags are not correctly defined.

To whom it may concern,

In the scripts ..../lib/python3.8/site-packages/dpgen/{dispatcher/Slurm.py,auto_test/lib/RemoteJob.py,remote/RemoteJob.py} many of the SBATCH flags starting with two dashes (--) are missing an equals sign (=) between the flag and the value of the flag.

For example in ..../lib/python3.8/site-packages/dpgen/remote/RemoteJob.py the blank space in column 42 of line 464 should be replaced by an equals sign (=). In other words the following line in RemoteJob.py

    ret += "#SBATCH --ntasks-per-node %d\n" % res['task_per_node']

should be

    ret += "#SBATCH --ntasks-per-node=%d\n" % res['task_per_node']

to generate a working slurm script.

Sincerely

John J. Low, Argonne National Laboratory
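A minimal sketch of generating the header in the `--flag=value` form recommended above. `sbatch_header` and the resource keys it reads are hypothetical, loosely modeled on the `resources` dictionaries shown elsewhere in this thread; it is not the code in RemoteJob.py or Slurm.py.

```python
def sbatch_header(res):
    """Hypothetical helper: render #SBATCH lines as --flag=value."""
    options = [
        ("nodes", res.get("numb_node")),
        ("ntasks-per-node", res.get("task_per_node")),
        ("partition", res.get("partition")),
        ("time", res.get("time_limit")),
    ]
    lines = ["#!/bin/bash -l"]
    for flag, value in options:
        if value not in (None, ""):
            # "=" joins flag and value so the option is parsed unambiguously.
            lines.append(f"#SBATCH --{flag}={value}")
    return "\n".join(lines) + "\n"


print(sbatch_header({"numb_node": 1, "task_per_node": 8, "partition": "pub"}))
```

Options that are not set are simply skipped, so the same helper works for partial resource dictionaries.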

Compatibility with Quantum Espresso

Hi. Would compatibility with QE (or GPAW) be something you might look at, or would it be difficult to include in the current implementation? From skimming the code, dpgen seems very dependent on VASP.

Automatically trigger virtual environment packing

Summary
Automatically trigger virtual environment packing under specific conditions (e.g., when the master branch is updated).

Details

Can dpgen init_bulk use multiple POSCARs?

I want to start from multiple structures; can "from_poscar" be a list?

How to discard some systems when running fp?

The model_devi step produced some very strange structures, so I want to delete them.
I deleted the work directories of these fp tasks, but when dpgen restarts, these jobs seem to be submitted again.

The problem of dpgen on Slurm system

I do not know what is wrong: the machine file or the installation?
The details are below:

My installation steps:
wget https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/Anaconda3-2020.02-Linux-x86_64.sh
conda create -n deepc python=3.6 libprotobuf==3.8.0
conda activate deepc
conda install deepmd-kit=*=cpu lammps-dp==*cpu -c deepmodeling
pip install pymatgen==2019.6.5 monty==2.0.4 ase==3.17.0 paramiko==2.6.0 custodian==2019.2.10 dpgen==0.8.1

There were no errors during installation, and the CH4 example finishes without errors on my desktop computer. However, it does not run properly on the supercomputer, whose OS version is CentOS Linux release 7.7.1908 (Core).

I run the CH4 example on slurm system by: dpgen run param.json machine-slurm.json >log 2>error
the record.dpgen:
0 0
0 1
0 2
0 3
0 4
0 5
0 6
0 7
0 8
1 0

The error file:
Traceback (most recent call last):
File "/home/elgao/scratch/anaconda3_bob/envs/deepc/bin/dpgen", line 8, in <module>
sys.exit(main())
File "/home/elgao/scratch/anaconda3_bob/envs/deepc/lib/python3.6/site-packages/dpgen/main.py", line 182, in main
args.func(args)
File "/home/elgao/scratch/anaconda3_bob/envs/deepc/lib/python3.6/site-packages/dpgen/generator/run.py", line 2340, in gen_run
run_iter (args.PARAM, args.MACHINE)
File "/home/elgao/scratch/anaconda3_bob/envs/deepc/lib/python3.6/site-packages/dpgen/generator/run.py", line 2303, in run_iter
run_train (ii, jdata, mdata)
File "/home/elgao/scratch/anaconda3_bob/envs/deepc/lib/python3.6/site-packages/dpgen/generator/run.py", line 530, in run_train
errlog = 'train.log')
File "/home/elgao/scratch/anaconda3_bob/envs/deepc/lib/python3.6/site-packages/dpgen/dispatcher/Dispatcher.py", line 91, in run_jobs
while not self.all_finished(job_handler, mark_failure) :
File "/home/elgao/scratch/anaconda3_bob/envs/deepc/lib/python3.6/site-packages/dpgen/dispatcher/Dispatcher.py", line 216, in all_finished
raise RuntimeError('Job %s failed for more than 3 times' % job_uuid)
RuntimeError: Job f01bba9e-181e-4fad-8b96-c61db8350cf5 failed for more than 3 times

the dpgen.log file:
2021-04-01 17:08:16,365 - INFO : =============================iter.000001==============================
2021-04-01 17:08:16,365 - INFO : -------------------------iter.000001 task 00--------------------------
2021-04-01 17:08:16,407 - INFO : -------------------------iter.000001 task 01--------------------------
2021-04-01 17:08:16,457 - INFO : new submission of f01bba9e-181e-4fad-8b96-c61db8350cf5 for chunk 8aefb06c426e07a0a671a1e2488b4858d694a730
2021-04-01 17:08:16,530 - INFO : new submission of f9cf7153-403f-4f3e-9c4e-928aae9010a8 for chunk e193a01ecf8d30ad0affefd332ce934e32ffce72
2021-04-01 17:08:16,576 - INFO : new submission of 9ac8ba18-fbda-4e38-98c4-a97637433a5f for chunk 6fc978af728d43c59faa400d5f6e0471ac850d4c
2021-04-01 17:08:16,618 - INFO : new submission of 604657a5-597a-4185-9b24-b6f62216e38b for chunk 221407c03ae5c73109cce71d27e24637824f3333
2021-04-01 17:09:16,797 - INFO : job f01bba9e-181e-4fad-8b96-c61db8350cf5 terminated, submit again
2021-04-01 17:09:16,875 - INFO : job f9cf7153-403f-4f3e-9c4e-928aae9010a8 terminated, submit again
2021-04-01 17:09:16,950 - INFO : job 9ac8ba18-fbda-4e38-98c4-a97637433a5f terminated, submit again
2021-04-01 17:09:17,013 - INFO : job 604657a5-597a-4185-9b24-b6f62216e38b terminated, submit again
2021-04-01 17:10:17,192 - INFO : job f01bba9e-181e-4fad-8b96-c61db8350cf5 terminated, submit again
2021-04-01 17:10:17,278 - INFO : job f9cf7153-403f-4f3e-9c4e-928aae9010a8 terminated, submit again
2021-04-01 17:10:17,344 - INFO : job 9ac8ba18-fbda-4e38-98c4-a97637433a5f terminated, submit again
2021-04-01 17:10:17,402 - INFO : job 604657a5-597a-4185-9b24-b6f62216e38b terminated, submit again
2021-04-01 17:11:17,538 - INFO : job f01bba9e-181e-4fad-8b96-c61db8350cf5 terminated, submit again
2021-04-01 17:11:17,653 - INFO : job f9cf7153-403f-4f3e-9c4e-928aae9010a8 terminated, submit again
2021-04-01 17:11:17,717 - INFO : job 9ac8ba18-fbda-4e38-98c4-a97637433a5f terminated, submit again
2021-04-01 17:11:17,786 - INFO : job 604657a5-597a-4185-9b24-b6f62216e38b terminated, submit again

In the f01bba9e-181e-4fad-8b96-c61db8350cf5 folder, four slurm-*.out files have the same content:
/home/XXX/scratch/anaconda3_bob/bin/conda: line 3: import: command not found
/home/XXX/scratch/anaconda3_bob/bin/conda: line 6: syntax error near unexpected token `sys.argv'
/home/XXX/scratch/anaconda3_bob/bin/conda: line 6: `if len(sys.argv) > 1 and sys.argv[1].startswith('shell.') and sys.path and sys.path[0] == '':'

The contents of /home/XXX/scratch/anaconda3_bob/bin/conda:
#!/home/elgao/scratch/anaconda3_bob/bin/python
# -*- coding: utf-8 -*-
import sys
# Before any more imports, leave cwd out of sys.path for internal 'conda shell.*' commands.
# see conda/conda#6549
if len(sys.argv) > 1 and sys.argv[1].startswith('shell.') and sys.path and sys.path[0] == '':
    # The standard first entry in sys.path is an empty string,
    # and os.path.abspath('') expands to os.getcwd().
    del sys.path[0]

if __name__ == '__main__':
    from conda.cli import main
    sys.exit(main())

The machine file machine-slurm.json:
{
"train": [
{
"machine": {
"batch": "slurm",
"work_path": "/home/XXX/scratch/deepmd_bob/dpwork"
},
"resources": {
"numb_node": 1,
"numb_gpu": 0,
"task_per_node": 8,
"with_mpi": false,
"name": "dp_pj",
"partition": "pub",
"time_limit": "3600:00:00",
"exclude_list": [],
"source_list": [
" ~/.bashrc","conda activate deepc"
],
"module_list": []
},
"command": "dp",
"group size": 1
}
],
"model_devi": [
{
"machine": {
"batch": "slurm",
"work_path": "/home/XXX/scratch/deepmd_bob/dpwork"
},
"resources": {
"numb_node": 1,
"numb_gpu": 0,
"task_per_node": 16,
"with_mpi": false,
"partition": "pub",
"name":"lmp",
"time_limit": "3600:00:00",
"exclude_list": [],
"source_list": [ " ~/.bashrc","conda activate deepc"],
"module_list": []
},
"command": "mpirun -np 8 /scratch/XXX/anaconda3_bob/envs/deepc/bin/lmp ",
"group_size": 2
}
],
"fp": [
{
"machine": {
"batch": "slurm",
"work_path": "/home/XXX/scratch/deepmd_bob/dpwork"
},
"resources": {
"numb_node": 1,
"numb_gpu": 0,
"task_per_node": 16,
"exclude_list": [],
"with_mpi": false,
"name": "aimd_zcb",
"source_list": [],
"module_list": [
"module load intel"
],
"partition": "pub",
"time_limit": "3600:00:00",
"_comment": "that's All"
},
"command": " srun -n 16 /project/XXX/00_apps/vasp6.1_isif_vtst_bob",
"group_size": 10
}
]
}

Please help me out of this.
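The slurm-*.out messages are what bash prints when it reads the conda executable (a Python script) as shell code. One way this can happen, assuming the job script is assembled by writing each "source_list" entry as a `source <entry>` line, is the entry "conda activate deepc": `source conda activate deepc` makes bash look up a file named `conda` on PATH and execute the conda Python script as shell commands. A minimal sketch of that assumed rendering, with a hypothetical helper name:

```python
def render_source_lines(source_list):
    """Hypothetical sketch: turn each source_list entry into a `source` line,
    assuming that is how the dispatcher builds the job script."""
    return [f"source {entry.strip()}" for entry in source_list]


for line in render_source_lines([" ~/.bashrc", "conda activate deepc"]):
    print(line)
```

Under that assumption, "source_list" should contain only paths to shell files; a command such as `conda activate deepc` would have to live inside such a file instead.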

Hi. After installing dpgen with pip, I ran the command and this problem showed up:

Traceback (most recent call last):
File "/home/dengbin/miniconda3/bin/dpgen", line 5, in <module>
from dpgen.main import main
File "/home/dengbin/.local/lib/python3.8/site-packages/dpgen/main.py", line 9, in <module>
from dpgen.generator.run import gen_run
File "/home/dengbin/.local/lib/python3.8/site-packages/dpgen/generator/run.py", line 42, in <module>
from dpgen.generator.lib.vasp import write_incar_dict
File "/home/dengbin/.local/lib/python3.8/site-packages/dpgen/generator/lib/vasp.py", line 5, in <module>
from pymatgen.io.vasp import Incar
File "/home/dengbin/.local/lib/python3.8/site-packages/pymatgen/__init__.py", line 14, in <module>
import ruamel.yaml as yaml

How to solve this problem?

Thank you!

An error when running iter.1

INFO:dpgen:=============================iter.000001==============================
INFO:dpgen:-------------------------iter.000001 task 00--------------------------
INFO:dpgen:-------------------------iter.000001 task 01--------------------------
INFO:dpgen:-------------------------iter.000001 task 02--------------------------
INFO:dpgen:-------------------------iter.000001 task 03--------------------------
Traceback (most recent call last):
File "/nfs-share/home/1800011848/.local/bin/dpgen", line 8, in <module>
sys.exit(main())
File "/nfs-share/home/1800011848/.local/lib/python3.8/site-packages/dpgen/main.py", line 182, in main
args.func(args)
File "/nfs-share/home/1800011848/.local/lib/python3.8/site-packages/dpgen/generator/run.py", line 2309, in gen_run
run_iter (args.PARAM, args.MACHINE)
File "/nfs-share/home/1800011848/.local/lib/python3.8/site-packages/dpgen/generator/run.py", line 2278, in run_iter
cont = make_model_devi (ii, jdata, mdata)
File "/nfs-share/home/1800011848/.local/lib/python3.8/site-packages/dpgen/generator/run.py", line 687, in make_model_devi
ss = sys_configs[idx]
IndexError: list index out of range
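The failing line indexes `sys_configs` with an index taken from the exploration settings, so one possible cause is a "sys_idx" in "model_devi_jobs" that points past the end of "sys_configs". A minimal pre-flight check, assuming the param.json layout shown earlier in this thread; `check_sys_idx` is hypothetical.

```python
def check_sys_idx(jdata):
    """Hypothetical pre-flight check: every sys_idx in model_devi_jobs must
    point at an existing entry of sys_configs."""
    n_configs = len(jdata.get("sys_configs", []))
    errors = []
    for job_no, job in enumerate(jdata.get("model_devi_jobs", [])):
        for idx in job.get("sys_idx", []):
            if not 0 <= idx < n_configs:
                errors.append(
                    f"model_devi_jobs[{job_no}]: sys_idx {idx} out of range "
                    f"(sys_configs has {n_configs} entries)"
                )
    return errors


jdata = {
    "sys_configs": [["0000[0-9]/POSCAR"], ["0001[0-9]/POSCAR"]],
    "model_devi_jobs": [{"sys_idx": [0]}, {"sys_idx": [1, 2]}],
}
print(check_sys_idx(jdata))
```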

fp_task_max

Hello all,

My "eff. task max" for fp labeling is always 100, even though I set "fp_task_max" much larger than 100.
Does anyone have an idea of what's going wrong?

Training with multiple compositions

Hi,

I am using dpgen to train on two different compositions. I ran "init_bulk" on both compositions and was able to go through training, exploration and labeling. However, while dpgen is collecting the labeled data in the 1st iteration, I get the following error:
RuntimeError: systems with inconsistent formula could not be append:

Is it true that the current version of dpgen or dpdata does not allow collecting labeled VASP data with different compositions, or is there a way around this error? From reading this paper, "10.1103/physrevmaterials.3.023804", I had the impression that this is possible, since a potential model for the Al-Mg system was trained on various Al-Mg ratios.

Problem when running the dpgen test on a local system

Python version: 3.8.5
deepmd-kit version: 1.x
dpgen version: 0.9.3.dev9+g00432d2

Problem description:
When I try running the example in dpgen-master/tests/generator, the following error occurs.
It seems something is wrong with jinput, but I didn't change the param-mg-vasp.json input file.
Could anyone tell me how to fix this? Thanks.

Description

------------
Traceback (most recent call last):
  File "/home/ben/.local/bin/dpgen", line 8, in <module>
    sys.exit(main())
  File "/home/ben/.local/lib/python3.8/site-packages/dpgen/main.py", line 175, in main
    args.func(args)
  File "/home/ben/.local/lib/python3.8/site-packages/dpgen/generator/run.py", line 2410, in gen_run
    run_iter (args.PARAM, args.MACHINE)
  File "/home/ben/.local/lib/python3.8/site-packages/dpgen/generator/run.py", line 2369, in run_iter
    make_train (ii, jdata, mdata)
  File "/home/ben/.local/lib/python3.8/site-packages/dpgen/generator/run.py", line 312, in make_train
    jinput['training']['systems'] = init_data_sys
KeyError: 'training'

The log file:
2021-05-06 13:49:57,770 - INFO : start running
2021-05-06 13:49:57,771 - INFO : =============================iter.000000==============================
2021-05-06 13:49:57,771 - INFO : -------------------------iter.000000 task 00--------------------------
(But I can use standalone dp to train a model.)

And here is my machine config file:

{
"train": [
	{
	"machine": {
		"batch": "shell",
		"work_path": "/home/ben/desktop/work/dpgen/test2/temp"
		},
	"resources": {
		"numb_gpu": 0,
		"task_per_node": 8,
		"partition": "cpu",
		"exclude_list": [],
		"mem_limit": 8,
		"source_list": [],
		"module_list": []
		},
	"command": "/home/ben/desktop/1/yes/bin/dp",
	"group_size": 1
	}
],
"model_devi": [
	{
	"machine": {
		"batch": "shell",
		"work_path": "/home/ben/desktop/work/dpgen/test2/temp"
	},
	"resources": {
		"numb_gpu": 0,
		"task_per_node": 8,
		"partition": "cpu",
		"exclude_list": [],
		"mem_limit": 8,
		"source_list": [],
		"module_list": []
		},
	"command": " ~/desktop/1/lammps/src/lmp_mpi",
	"group_size": 1
	}
],
"fp": [
	{
	"machine": {
		"batch": "shell",
		"work_path": "/home/ben/desktop/work/dpgen/test2/temp"
	},
	"resources": {
		"numb_gpu": 0,
		"task_per_node": 8,
		"with_mpi": false,
		"source_list": ["/home/ben/intel/parallel_studio_xe_2019.5.075/psxevars.sh"],
		"module_list": [],
		"partition": "cpu",
	},
	"command": "ulimit -s unlimited && mpirun -n 4 /home/ben/desktop/1/vasp.5.4.4/bin/vasp",
	"group_size": 30
	}
	]
}
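The failing statement writes into `jinput['training']`, and `jinput` is presumably built from the "default_training_param" section of param.json. The KeyError would then mean that this section has no "training" block, for example when the file follows a different training-input layout than the one this dpgen version expects. A minimal sketch of a check that reports what the section actually contains; `check_training_section` is hypothetical.

```python
def check_training_section(jdata):
    """Hypothetical check: return a message if default_training_param lacks
    the 'training' block that make_train writes into, else None."""
    params = jdata.get("default_training_param")
    if params is None:
        return "param.json has no 'default_training_param'"
    if "training" not in params:
        keys = ", ".join(sorted(params)) or "(none)"
        return f"'default_training_param' has no 'training' block; top-level keys: {keys}"
    return None


print(check_training_section({"default_training_param": {"systems": [], "stop_batch": 1000}}))
```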

Automatize virtual environment packing with conda.

Summary
Enable automatic virtual environment packing with conda.

Details
Straighten out the entire workflow and automate it.

How to interpret the number of lines in the ".shuffled.00.out" files

Hi, thank you for developing this package, which is very useful.

In the run step of dpgen, I found that it generates three files (in 02.fp) named res_accurate.shuffled.000.out, res_failed.shuffled.000.out and candidate.shuffled.000.out, each containing some configurations. Does it mean that the configurations (or structures) in res_accurate.* belong to the neighborhood of the training data, and the ones in res_failed.shuffled.*.out are far from the training data? Then what about the configurations in candidate.shuffled.*.out? How could I know how many of the generated configurations in DPMD simulations are "new" configurations in '01.model_devi' step?

The other question is about the exploration strategy. In the iterations reported in your paper (as shown in the attached figure), you didn't increase the temperatures of DPMD simulations until the labeling percentage approaches 0. Is there any reason behind this strategy? Is this better than exploring temperatures randomly?

Appreciate it again.
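In DP-GEN's exploration step, each frame is classified by the maximal deviation of the predicted forces across the models, compared with "model_devi_f_trust_lo" and "model_devi_f_trust_hi" (the thresholds that appear in the param.json earlier in this thread): below the lower bound it counts as accurate, at or above the upper bound as failed, and in between as a candidate for labeling. A minimal sketch of that rule; the exact boundary handling and the plain-list input are assumptions, and the default thresholds are simply the values from that param.json.

```python
def classify(max_devi_f, trust_lo, trust_hi):
    """Sketch of the accurate / candidate / failed split on max force deviation."""
    if max_devi_f < trust_lo:
        return "accurate"
    if max_devi_f >= trust_hi:
        return "failed"
    return "candidate"


def summarize(deviations, trust_lo=0.05, trust_hi=0.15):
    """Count how many frames fall into each category."""
    counts = {"accurate": 0, "candidate": 0, "failed": 0}
    for devi in deviations:
        counts[classify(devi, trust_lo, trust_hi)] += 1
    return counts


print(summarize([0.01, 0.08, 0.12, 0.30]))
```

Under this rule, only the candidate frames are passed on to first-principles labeling, which is consistent with the candidate, failed, and accurate percentages dpgen logs for each system.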

"clusters" needs to be included in the SLURM machine.json

Hello,

Similar to the "partition" flag that has to be defined in machine.json, our HPC's Slurm script also needs to explicitly define:
#SBATCH --clusters=

Could you add support for this option?

Thanks!

Setting up DPGEN to run on a university cluster

Hello,

After reading the recently published paper https://doi.org/10.1103/PhysRevMaterials.3.023804 on developing more accurate potentials for surfaces and bulks, I decided to set DPGEN up on the University of Pittsburgh's cluster. We use Slurm for job submission. However, since DPGEN seems to be built for a different cluster environment, I had to modify the source code to adapt it to ours.

Instead of submitting jobs from a personal computer, I need to submit them from the cluster's login node. In the "Initial data preparation" stage, jobs are submitted, but DPGEN does not recognize that they are still running: it marks them as "failed" and resubmits them three more times. I then printed a few variables from Dispatcher.py to check whether things are correct.
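One plausible reason (not confirmed here) that a running job is treated as finished or failed is that the status query cannot see it: on multi-cluster Slurm sites, `squeue` reports jobs of the local cluster unless `-M/--clusters` is passed, and the resources printed below include `'cluster': 'mpi'`. The sketch is not DP-GEN's Slurm backend; `job_is_active` and `parse_squeue_state` are hypothetical helpers built on the standard `squeue -h -j <id> -o %T` query.

```python
import subprocess

# Slurm states in which a job is still queued or running
ACTIVE_STATES = {"PENDING", "RUNNING", "CONFIGURING", "COMPLETING", "SUSPENDED"}


def parse_squeue_state(output):
    """Return the job state from `squeue -h -o %T` output, or None if absent."""
    for line in output.splitlines():
        line = line.strip()
        # With -M/--clusters, squeue may print a "CLUSTER: <name>" line first
        if not line or line.startswith("CLUSTER:"):
            continue
        return line
    return None


def job_is_active(job_id, cluster=None):
    """Ask squeue whether one job is still active; pass cluster on multi-cluster sites."""
    cmd = ["squeue", "-h", "-j", str(job_id), "-o", "%T"]
    if cluster:
        cmd += ["-M", cluster]
    out = subprocess.run(cmd, capture_output=True, text=True).stdout
    # An unknown or purged job id yields no state, i.e. "not active"
    return parse_squeue_state(out) in ACTIVE_STATES
```

If `squeue -j <id>` without `-M mpi` prints nothing for a job you can see with `squeue -M mpi`, this mismatch is a likely suspect.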

A few of the values are particularly concerning. This is what I get:
{'task_chunks': [['sys-0016']], 'job_list': [{'context': <dpgen.dispatcher.LocalContext.LocalContext object at 0x7fe470804e80>, 'batch': <dpgen.dispatcher.Slurm.Slurm object at 0x7fe4807e60f0>}], 'job_record': <dpgen.dispatcher.Dispatcher.JobRecord object at 0x7fe470ba0860>, 'command': ['srun --mpi=pmi2 vasp_std'], 'resources': {'task_per_node': 28, 'numb_node': 2, 'numb_gpu': 0, 'exclude_list': [], 'with_mpi': False, 'qos': 'short', 'source_list': [], 'module_list': ['intel/2017.1.132', 'intel-mpi/2017.1.132', 'mkl', 'fftw', 'vasp/5.4.4'], 'time_limit': '1:0:0', 'partition': 'opa', 'cluster': 'mpi', '_comment': "that's All", 'cpus_per_task': -1, 'mem_limit': -1, 'account': '', 'constraint_list': [], 'license_list': [], 'module_unload_list': [], 'envs': None, 'cuda_multi_tasks': False, 'allow_failure': False, 'cvasp': False}, 'outlog': 'log', 'errlog': 'err', 'backward_task_files': ['OUTCAR', 'CONTCAR']}

Most of it looks right, but job_list and job_record contain class instances (objects) rather than plain values. Is this correct? If not, what should they look like?
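For context, entries such as `<dpgen.dispatcher.Slurm.Slurm object at 0x...>` are simply Python's default representation of an instance whose class does not define `__repr__`; on their own they do not indicate an error. The sketch uses a stand-in class named `Slurm` (hypothetical, not DP-GEN's class) to show the same output.

```python
import re


class Slurm:
    """Stand-in class; any class without a custom __repr__ prints the same way."""


job = {"batch": Slurm()}
text = repr(job)
# Prints something like {'batch': <__main__.Slurm object at 0x7f...>}
print(text)

# The default form is "<module.ClassName object at 0xADDRESS>"
print(bool(re.search(r"<[\w.]*Slurm object at 0x[0-9a-fA-F]+>", text)))
```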

I have a few additional questions:

Do you think it is worth spending time trying to get it running on our cluster?
If yes, could one of the developers help us set it up here?
If not, what would be a better option? Submitting jobs manually seems difficult. Would AWS be a better alternative, since I assume it is independent of where you work?

It would be great to get some help with this. I am really looking forward to building reliable potentials for the system I am working on.

Best
Sid

Training a model from an already built .pb file

Hello,

I have been trying to fine-tune my DP model with some active-learning iterations. I have a .pb file that was generated by a regular DeePMD training (without dpgen). Now I want to start from the same .pb file and continue training it with further MD runs in DPGEN under different operating conditions.

I was a bit confused by the documentation. My understanding is that I should set "training_iter0_model_path" to my previous training metadata (the directories containing the model.ckpt* files) and set "training_init_model" to true. The weights and biases of the model at "training_iter0_model_path" would then initialize the first model of iter0. Is this correct?
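A minimal sketch of the param.json fragment described above, written as a Python dict and dumped to JSON. It assumes that `training_iter0_model_path` takes one directory per model (so its length matches `numb_models`) and that each directory holds the `model.ckpt*` files of the earlier training; the paths are hypothetical placeholders.

```python
import json

# Hypothetical layout: one directory per model, each containing model.ckpt*
# files from the earlier DeePMD-kit training (assumption, adjust to your setup)
numb_models = 4
param_fragment = {
    "numb_models": numb_models,
    "training_init_model": True,
    "training_iter0_model_path": [
        f"/path/to/old_training/{i:03d}" for i in range(numb_models)
    ],
}

print(json.dumps(param_fragment, indent=2))
```

Note that a frozen .pb file alone does not contain the checkpoint files; the sketch assumes the original training directories are still available.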

Another question: is there a way to bypass the init_data_sys part if I already have an initial model? That would save me time running DFT-MD simulations.

I hope to hear from you guys soon.

Thanks in advance.

Best
Siddarth

A mistake when running a model

Hello,

Does anyone know how to fix it?

INFO:dpgen:-------------------------iter.000000 task 01--------------------------
INFO:dpgen:new submission of 97ec709c-3ded-4696-87bd-91bf95574bea for chunk 8aefb06c426e07a0a671a1e2488b4858d694a730
INFO:dpgen:new submission of 8fb1cb88-7244-4da3-b9ae-83dd88426f1c for chunk e193a01ecf8d30ad0affefd332ce934e32ffce72
INFO:dpgen:new submission of 9de06c6f-ae00-4742-98cb-9dcd5a8cefcd for chunk 6fc978af728d43c59faa400d5f6e0471ac850d4c
INFO:dpgen:new submission of b058a8d3-2f18-405c-a179-8327e55905cb for chunk 221407c03ae5c73109cce71d27e24637824f3333
INFO:dpgen:job 97ec709c-3ded-4696-87bd-91bf95574bea finished
INFO:dpgen:job 8fb1cb88-7244-4da3-b9ae-83dd88426f1c finished
INFO:dpgen:job b058a8d3-2f18-405c-a179-8327e55905cb finished
INFO:dpgen:job 9de06c6f-ae00-4742-98cb-9dcd5a8cefcd finished
INFO:dpgen:-------------------------iter.000000 task 02--------------------------
INFO:dpgen:-------------------------iter.000000 task 03--------------------------
Traceback (most recent call last):
  File "/home/zhq/.local/bin/dpgen", line 10, in <module>
    sys.exit(main())
  File "/home/zhq/.local/lib/python3.7/site-packages/dpgen/main.py", line 182, in main
    args.func(args)
  File "/home/zhq/.local/lib/python3.7/site-packages/dpgen/generator/run.py", line 2346, in gen_run
    run_iter (args.PARAM, args.MACHINE)
  File "/home/zhq/.local/lib/python3.7/site-packages/dpgen/generator/run.py", line 2315, in run_iter
    cont = make_model_devi (ii, jdata, mdata)
  File "/home/zhq/.local/lib/python3.7/site-packages/dpgen/generator/run.py", line 737, in make_model_devi
    _make_model_devi_native(iter_index, jdata, mdata, conf_systems)
  File "/home/zhq/.local/lib/python3.7/site-packages/dpgen/generator/run.py", line 931, in _make_model_devi_native
    deepmd_version = deepmd_version)
  File "/home/zhq/.local/lib/python3.7/site-packages/dpgen/generator/lib/lammps.py", line 59, in make_lammps_input
    ret+= "neighbor %f bin\n"%(jdata["neighbor"])
KeyError: 'neighbor'

best regards,
Zhongheng
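The last frame of the traceback shows `make_lammps_input` reading `jdata["neighbor"]` directly, so this DP-GEN version presumably expects a "neighbor" key in param.json, and adding one there is a likely workaround. The sketch contrasts that with a defensive lookup; `neighbor_line` is a hypothetical helper, and the 2.0 fallback (LAMMPS's default skin distance in metal units, in Angstrom) is an assumption rather than DP-GEN's value.

```python
def neighbor_line(jdata, default_skin=2.0):
    """Format the LAMMPS 'neighbor <skin> bin' command.

    jdata["neighbor"] raises KeyError when the key is missing; .get() with a
    fallback avoids that. The 2.0 default is an illustrative assumption.
    """
    return "neighbor %f bin\n" % jdata.get("neighbor", default_skin)


print(neighbor_line({}), end="")                 # neighbor 2.000000 bin
print(neighbor_line({"neighbor": 1.0}), end="")  # neighbor 1.000000 bin
```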

About the KPOINTS

Hi,
I don't know how to set KPOINTS in the JSON files. Is there a parameter for this?
Thanks
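One way to avoid writing a KPOINTS file by hand is VASP's `KSPACING` tag in the INCAR template (`fp_incar` in param.json); whether your DP-GEN version also accepts an explicit KPOINTS file is not covered here. VASP derives the grid as N_i = max(1, ceil(|b_i| · 2π / KSPACING)), with b_i the reciprocal lattice vectors without the 2π factor. The sketch previews that grid for a given cell; `kspacing_grid` is a hypothetical helper.

```python
import math


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def kspacing_grid(cell, kspacing):
    """k-point grid VASP derives from KSPACING (1/Angstrom) for a cell in Angstrom.

    N_i = max(1, ceil(|b_i| * 2*pi / KSPACING)), with b_i = (a_j x a_k) / V.
    """
    a1, a2, a3 = cell
    volume = abs(dot(a1, cross(a2, a3)))
    recip = [cross(a2, a3), cross(a3, a1), cross(a1, a2)]
    grid = []
    for b in recip:
        norm = math.sqrt(dot(b, b)) / volume
        grid.append(max(1, math.ceil(norm * 2 * math.pi / kspacing)))
    return grid


# Illustrative conventional fcc Al cell, a = 4.05 Angstrom
print(kspacing_grid([[4.05, 0, 0], [0, 4.05, 0], [0, 0, 4.05]], 0.5))  # [4, 4, 4]
```

Because the grid scales with the cell, a single KSPACING value gives a consistent k-point density across the differently sized configurations DP-GEN generates.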

Error when using "pip install --user ." to install dpgen

I have successfully installed the dpgen package on another supercomputer before. But this time, when I ran "pip install --user ." on a different computer, I got this error:

ERROR: Could not install packages due to an EnvironmentError: [('/lustre3/jmxue_pkuhpc/software/lammps_software/dpgen/.git/objects/pack/pack-add722551b0cd46bae0c45e5a3f94d4a4578e1ca.pack', '/ram/tmp/pip-req-build-_ks46gq9/.git/objects/pack/pack-add722551b0cd46bae0c45e5a3f94d4a4578e1ca.pack', "[Errno 13] Permission denied: '/ram/tmp/pip-req-build-_ks46gq9/.git/objects/pack/pack-add722551b0cd46bae0c45e5a3f94d4a4578e1ca.pack'"), ('/lustre3/jmxue_pkuhpc/software/lammps_software/dpgen/.git/objects/pack/pack-add722551b0cd46bae0c45e5a3f94d4a4578e1ca.idx', '/ram/tmp/pip-req-build-_ks46gq9/.git/objects/pack/pack-add722551b0cd46bae0c45e5a3f94d4a4578e1ca.idx', "[Errno 13] Permission denied: '/ram/tmp/pip-req-build-_ks46gq9/.git/objects/pack/pack-add722551b0cd46bae0c45e5a3f94d4a4578e1ca.idx'")]

It seems I don't have permission, but I can create the directory with "mkdir -p ram/tmp/pip-req-build-_ks46gq9/.git/objects/pack/". I don't know why this error occurred. So I simply used "python setup.py install" (and installed dpdata the same way). That works, and the installed path is
“/home/jmxue_pkuhpc/lustre3/anaconda3/envs/deepMD/bin/dpgen”.

A mistake when I fitted different systems

Summary

DPGEN Version and Platform

Job submission and computing cluster configuration

Expected Behavior

Actual Behavior

Steps to Reproduce

Further Information, Files, and Links

Issue with LAMMPS jobs in DPGEN procedure

Hi,
I'm trying to use your package in order to train a neural network potential to simulate water.
The procedure seems to run correctly until LAMMPS has to generate new configurations with the newly trained potentials. dpgen.log gives the output below. It seems that stage 4 of the iteration (run_model_devi), which should submit the LAMMPS jobs, never happens. However, no errors are reported.

2020-12-05 18:31:13,398 - INFO : start running
2020-12-05 18:31:13,401 - INFO : =============================iter.000000==============================
2020-12-05 18:31:13,401 - INFO : -------------------------iter.000000 task 00--------------------------
2020-12-05 18:31:13,419 - INFO : -------------------------iter.000000 task 01--------------------------
2020-12-05 18:31:13,615 - INFO : new submission of e26935fc-bf92-41b8-89f0-608d0cf6a83d for chunk 8aefb06c426e07a0a671a1e2488b4858d694a730
2020-12-05 18:31:13,707 - INFO : new submission of 31803ae5-570c-42ba-9543-48a4fa40bcc4 for chunk e193a01ecf8d30ad0affefd332ce934e32ffce72
2020-12-05 18:31:13,828 - INFO : new submission of 4a9babda-ba80-4558-bc7f-d686547281fc for chunk 6fc978af728d43c59faa400d5f6e0471ac850d4c
2020-12-05 18:31:14,009 - INFO : new submission of 07831ff6-e1fd-4894-a8f2-5d908b206c5d for chunk 221407c03ae5c73109cce71d27e24637824f3333
2020-12-05 18:32:14,187 - INFO : job e26935fc-bf92-41b8-89f0-608d0cf6a83d finished
2020-12-05 18:33:14,413 - INFO : job 31803ae5-570c-42ba-9543-48a4fa40bcc4 finished
2020-12-05 18:34:14,588 - INFO : job 4a9babda-ba80-4558-bc7f-d686547281fc finished
2020-12-05 18:35:14,748 - INFO : job 07831ff6-e1fd-4894-a8f2-5d908b206c5d finished
2020-12-05 18:35:14,800 - INFO : -------------------------iter.000000 task 02--------------------------
2020-12-05 18:35:14,803 - INFO : -------------------------iter.000000 task 03--------------------------
2020-12-05 18:35:14,812 - INFO : -------------------------iter.000000 task 04--------------------------
2020-12-05 18:35:14,822 - INFO : -------------------------iter.000000 task 05--------------------------
2020-12-05 18:35:14,823 - INFO : -------------------------iter.000000 task 06--------------------------
2020-12-05 18:35:14,827 - INFO : -------------------------iter.000000 task 07--------------------------
2020-12-05 18:35:14,831 - INFO : -------------------------iter.000000 task 08--------------------------
2020-12-05 18:35:14,832 - INFO : =============================iter.000001==============================
2020-12-05 18:35:14,833 - INFO : -------------------------iter.000001 task 00--------------------------
2020-12-05 18:35:14,840 - INFO : -------------------------iter.000001 task 01--------------------------
2020-12-05 18:35:14,843 - INFO : -------------------------iter.000001 task 02--------------------------
2020-12-05 18:35:14,843 - INFO : -------------------------iter.000001 task 03--------------------------
2020-12-05 18:35:14,844 - INFO : finished

Thank you

Cesare
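To our understanding, tasks 00 to 08 of each DP-GEN iteration are make_train, run_train, post_train, make_model_devi, run_model_devi, post_model_devi, make_fp, run_fp and post_fp, so task 04 is run_model_devi. The hypothetical helper below counts "new submission" lines under each task header of a dpgen.log, which makes it easy to confirm that task 04 submitted nothing. The stop right after iter.000001 task 03 would be consistent with make_model_devi finding no further `model_devi_jobs` entry, though that is an inference, not something the log states.

```python
import re
from collections import defaultdict

# Assumed mapping of DP-GEN task numbers to stages within one iteration
STAGES = ["make_train", "run_train", "post_train",
          "make_model_devi", "run_model_devi", "post_model_devi",
          "make_fp", "run_fp", "post_fp"]

TASK_RE = re.compile(r"iter\.(\d+) task (\d+)")


def submissions_per_task(log_text):
    """Count 'new submission' lines under each (iteration, task) header."""
    counts = defaultdict(int)
    current = None
    for line in log_text.splitlines():
        m = TASK_RE.search(line)
        if m:
            current = (int(m.group(1)), int(m.group(2)))
            counts[current] += 0  # record the task even if it submits nothing
        elif current is not None and "new submission" in line:
            counts[current] += 1
    return dict(counts)


def report(log_text):
    for (it, task), n in sorted(submissions_per_task(log_text).items()):
        print(f"iter {it:06d} task {task:02d} ({STAGES[task]}): {n} submission(s)")


if __name__ == "__main__":
    sample = """INFO : -------------------------iter.000000 task 01--------------------------
INFO : new submission of e26935fc for chunk 8aefb06c
INFO : new submission of 31803ae5 for chunk e193a01e
INFO : -------------------------iter.000000 task 04--------------------------
INFO : -------------------------iter.000000 task 05--------------------------"""
    report(sample)
```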

[Feature Request] Check compatible version with DeePMD-kit

Summary
We should figure out which version of DeePMD-kit will be supported in DP-GEN.

Detailed Description

Problems occur when users write "param.json" or "machine.json" for the wrong DeePMD-kit version, e.g. "dp_train" vs. "dp train", or a DeePMD-kit "input.json" in the wrong format.

I propose to deprecate the support for DeePMD-kit < 1.0, and only provide examples for DeePMD-kit >= 1.0.

Meanwhile, we are going to release DeePMD-kit 2.0. Will its user interface differ significantly?
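A minimal sketch of the proposed policy: reject DeePMD-kit versions below 1.0 up front instead of failing later on a mismatched command or input format. `parse_version` and `train_command` are hypothetical helpers; the version string would come from wherever DP-GEN already tracks it (the traceback elsewhere in these issues shows a `deepmd_version` argument being passed around). Pre-release suffixes are simply dropped, which is a simplification.

```python
def parse_version(text):
    """Turn '1.2.0' or '2.0.0b1' into a comparable (major, minor, patch) tuple."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # drop suffixes such as 'b1' or 'rc0'
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def train_command(deepmd_version, minimum="1.0"):
    """Pick the training command, refusing versions older than `minimum`."""
    if parse_version(deepmd_version) < parse_version(minimum):
        raise ValueError(
            f"DeePMD-kit {deepmd_version} is older than {minimum} and is not supported")
    # DeePMD-kit >= 1.0 uses the single 'dp' entry point with subcommands
    return "dp train"


print(train_command("1.3.3"))  # dp train
```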

How to start dpgen exploration from an existing frozen_model.pb file

Hi,

I have a trained model for a system that works well, but it shows some issues at high temperature. I am wondering whether I could continue training with dpgen, starting from the available frozen_model.pb files. That would be of great help.

I also have a question about how the number of training steps (stop_batch) should be chosen in the param.json file.

Thanks,
Mayank
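One common way to reason about `stop_batch` is in epochs: each training step consumes `batch_size` frames, so one pass over the data takes roughly `n_frames / batch_size` steps. The sketch turns a target number of epochs into a step count; `estimate_stop_batch` is a hypothetical helper and the numbers are illustrative. It is only approximate, because DeePMD-kit samples systems with weights rather than sweeping frames in order, and batch sizes may differ per system.

```python
import math


def estimate_stop_batch(n_frames, batch_size, n_epochs):
    """Steps needed to see about n_frames frames n_epochs times.

    One step uses batch_size frames, so one epoch is about
    ceil(n_frames / batch_size) steps.
    """
    steps_per_epoch = math.ceil(n_frames / batch_size)
    return steps_per_epoch * n_epochs


# Hypothetical numbers: 20,000 labeled frames, batch size 1, 20 epochs
print(estimate_stop_batch(20000, 1, 20))  # 400000
```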

Error when running model_devi_jobs

My param.json is:

"model_devi_dt": 0.002,
  "model_devi_skip": 0,
  "model_devi_f_trust_lo":  0.05,
  "model_devi_f_trust_hi":  0.15,
  "model_devi_clean_traj":  false,
  "model_devi_jobs": [
    {
      "sys_idx": [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360],
      "temps": [
        30
      ],
      "press": [
        1
      ],
      "trj_freq": 5,
      "nsteps":   500,
      "ensemble": "npt",
      "_idx":     "00"
    }
  ],

During iter.000000, four jobs were submitted and one of them failed. The error trace is:

ERROR: Attempting to rescale a 0.0 temperature (../velocity.cpp:735)
Last command: velocity all create ${TEMP} 197386
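One plausible cause, not confirmed by the information in this issue, is a configuration with a single atom among the selected `sys_idx` entries. LAMMPS's default `compute temp` uses dim·N − dim degrees of freedom (center-of-mass motion removed), which is zero for N = 1, so the computed temperature is 0.0 and `velocity ... create` cannot rescale it. The hypothetical helpers below flag such systems from their atom counts (the counts are made up).

```python
def lammps_temperature_dof(n_atoms, dimension=3):
    """Degrees of freedom of LAMMPS's default 'compute temp': dim*N - dim."""
    return max(0, dimension * n_atoms - dimension)


def check_systems(atom_counts):
    """Return indices of systems whose temperature would be undefined (0 DOF)."""
    return [i for i, n in enumerate(atom_counts) if lammps_temperature_dof(n) == 0]


# Hypothetical atom counts for a few of the systems listed in sys_idx
print(check_systems([32, 1, 64, 108]))  # [1]
```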

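	# incar holds the text of md_incar; drop comments before looking for NSW
	nsw_flag = False
	for incar_line in incar.split('\n'):
	    # Strip comments ('#' or '!') so that a line like "# NSW = 0" is ignored
	    incar_line = incar_line.split('#')[0].split('!')[0]
	    # One line may hold several "KEY = VALUE" pairs separated by ';'
	    for stmt in incar_line.split(';'):
	        key, sep, value = stmt.partition('=')
	        if sep and key.strip().upper() == "NSW":
	            nsw_flag = True
	            nsw_steps = int(value.split()[0])
	            break
	    if nsw_flag:
	        break
	if nsw_flag and nsw_steps != md_nstep_jdata:
	    dlog.info("WARNING: your set-up for MD steps in PARAM and md_incar are not consistent!")
	    dlog.info("MD steps in PARAM is %d" % md_nstep_jdata)
	    dlog.info("MD steps in md_incar is %d" % nsw_steps)
	    dlog.info("DP-GEN will use settings in md_incar!")
	    jdata['md_nstep'] = nsw_steps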
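The bug reported for gen.py is that a commented-out tag such as "# NSW = 0" must not be read as a setting. A comment-aware INCAR parser is easy to isolate and test; `parse_incar` is a hypothetical helper, not an existing DP-GEN function, and keeping the first occurrence of a repeated tag is an assumption about the desired behavior.

```python
def parse_incar(text):
    """Parse INCAR text into {TAG: value string}, ignoring comments.

    Text after '#' or '!' is dropped, and ';' separates several tags on one
    line. If a tag repeats, the first occurrence is kept (assumption).
    """
    params = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].split("!", 1)[0]
        for stmt in line.split(";"):
            key, sep, value = stmt.partition("=")
            key = key.strip().upper()
            if sep and key and key not in params:
                params[key] = value.strip()
    return params


incar = "# NSW = 0\nSYSTEM = Al\nNSW = 10 ! MD steps\nIBRION = 0; POTIM = 2.0\n"
params = parse_incar(incar)
print(int(params["NSW"]))  # 10
```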


the result log

DeepModeling

Dependency

Reference

Please cite: Yuzhi Zhang, Haidi Wang, Weijie Chen, Jinzhe Zeng, Linfeng Zhang, Han Wang, and Weinan E, DP-GEN: A concurrent learning platform for the generation of reliable deep learning based potential energy models, Computer Physics Communications, 2020, 107206.

Description

generate /gpfs01/zhq_work/fzh/work/ml/test/Al-2/test/01.eos/Al/std-fcc/deepmd/vol-12.00
generate /gpfs01/zhq_work/fzh/work/ml/test/Al-2/test/01.eos/Al/std-fcc/deepmd/vol-12.50
generate /gpfs01/zhq_work/fzh/work/ml/test/Al-2/test/01.eos/Al/std-fcc/deepmd/vol-13.00
generate /gpfs01/zhq_work/fzh/work/ml/test/Al-2/test/01.eos/Al/std-fcc/deepmd/vol-13.50
generate /gpfs01/zhq_work/fzh/work/ml/test/Al-2/test/01.eos/Al/std-fcc/deepmd/vol-14.00
generate /gpfs01/zhq_work/fzh/work/ml/test/Al-2/test/01.eos/Al/std-fcc/deepmd/vol-14.50
generate /gpfs01/zhq_work/fzh/work/ml/test/Al-2/test/01.eos/Al/std-fcc/deepmd/vol-15.00
generate /gpfs01/zhq_work/fzh/work/ml/test/Al-2/test/01.eos/Al/std-fcc/deepmd/vol-15.50
generate /gpfs01/zhq_work/fzh/work/ml/test/Al-2/test/01.eos/Al/std-fcc/deepmd/vol-16.00
generate /gpfs01/zhq_work/fzh/work/ml/test/Al-2/test/01.eos/Al/std-fcc/deepmd/vol-16.50
generate /gpfs01/zhq_work/fzh/work/ml/test/Al-2/test/01.eos/Al/std-fcc/deepmd/vol-17.00
generate /gpfs01/zhq_work/fzh/work/ml/test/Al-2/test/01.eos/Al/std-fcc/deepmd/vol-17.50
generate /gpfs01/zhq_work/fzh/work/ml/test/Al-2/test/01.eos/Al/std-fcc/deepmd/vol-18.00
generate /gpfs01/zhq_work/fzh/work/ml/test/Al-2/test/01.eos/Al/std-fcc/deepmd/vol-18.50
generate /gpfs01/zhq_work/fzh/work/ml/test/Al-2/test/01.eos/Al/std-fcc/deepmd/vol-19.00
generate /gpfs01/zhq_work/fzh/work/ml/test/Al-2/test/01.eos/Al/std-fcc/deepmd/vol-19.50
generate /gpfs01/zhq_work/fzh/work/ml/test/Al-2/test/01.eos/Al/std-fcc/deepmd/vol-20.00
generate /gpfs01/zhq_work/fzh/work/ml/test/Al-2/test/01.eos/Al/std-fcc/deepmd/vol-20.50
generate /gpfs01/zhq_work/fzh/work/ml/test/Al-2/test/01.eos/Al/std-fcc/deepmd/vol-21.00
generate /gpfs01/zhq_work/fzh/work/ml/test/Al-2/test/01.eos/Al/std-fcc/deepmd/vol-21.50

Bulk Modulus BV = 70.86 GPa
Shear Modulus GV = 33.46 GPa
Youngs Modulus EV = 86.74 GPa
Poission Ratio uV = 0.30

generate /gpfs01/zhq_work/fzh/work/ml/test/Al-2/test/03.vacancy/Al/std-fcc/deepmd/struct-3x3x3-000
/gpfs01/zhq_work/fzh/work/ml/test/Al-2/test/03.vacancy/Al/std-fcc/deepmd
generate /gpfs01/zhq_work/fzh/work/ml/test/Al-2/test/04.interstitial/Al/std-fcc/deepmd/struct-Al-3x3x3-000
generate /gpfs01/zhq_work/fzh/work/ml/test/Al-2/test/04.interstitial/Al/std-fcc/deepmd/struct-Al-3x3x3-001

Details Description

Dependency

Reference

Please cite: Yuzhi Zhang, Haidi Wang, Weijie Chen, Jinzhe Zeng, Linfeng Zhang, Han Wang, and Weinan E, DP-GEN: A concurrent learning platform for the generation of reliable deep learning based potential energy models, Computer Physics Communications, 2020, 107206.

Description

I ran “dpgen autotest make ...”, but I get the following output on my screen:

(base) [bqfu@node01 ea05]$ /home/share/dp/bin/dpgen autotest make property3c.json
DeepModeling

Dependency

Reference

Please cite: Yuzhi Zhang, Haidi Wang, Weijie Chen, Jinzhe Zeng, Linfeng Zhang, Han Wang, and Weinan E, DP-GEN: A concurrent learning platform for the generation of reliable deep learning based potential energy models, Computer Physics Communications, 2020, 107206.

Description

# -*- coding: utf-8 -*-
# Before any more imports, leave cwd out of sys.path for internal 'conda shell.*' commands.
# see conda/conda#6549

