scm-nv / qmflows
This library tackles the construction and efficient execution of computational chemistry workflows.
License: Other
I think that functions should not be defined in the __init__.py modules; it causes unexpected behavior. Also, the Python community in general discourages the use of the exec function, because it can cause nasty bugs.
The properties.json files should have the following format:

    {property: {
        "parser": "awk" | "python" | "kfreader",
        "file_ext": "out" | "dat" | "hess" | etc.,
        "function": an awk script, the name of a function in parser.py, or a ["section", "property"] pair in the t21 file
    }}
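For concreteness, a hypothetical pair of entries in that format. The property names, the awk script, and the t21 section/property pair below are purely illustrative, not taken from the real properties.json files:

```json
{
    "energy": {
        "parser": "awk",
        "file_ext": "out",
        "function": "/Total energy/ {print $NF}"
    },
    "frequencies": {
        "parser": "kfreader",
        "file_ext": "t21",
        "function": ["Freq", "Frequencies"]
    }
}
```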
User Settings should take precedence over default Settings. Currently, however, both the user and the default values are printed in the case of functionals. For example, look at the XC section produced by the following script: it prints both PBE and LDA, whereas it should give preference to PBE (the user-specified functional). This should be fixed now that more people are starting to use our code.
```python
# Default imports
from qmworks import Settings, templates, run
from noodles import gather
from plams import Molecule
from qmworks.packages.SCM import adf
import plams
plams.init()

import os
import fnmatch
from os.path import join

path = "/home/ganga/Workflows/Workflows3/Ayers/WithoutDFTBHessian/Moleculeswithmul2"
files = os.listdir(path)
xyzFiles = filter(lambda x: fnmatch.fnmatch(x, "*.xyz"), files)
pathsXYZ = map(lambda x: join(path, x), xyzFiles)
molecules = [Molecule(name, 'xyz') for name in pathsXYZ]

settings = Settings()
settings.functional = "pbe"
settings.basis = "TZ2P"
settings.specific.adf.charge = "0 2"
settings.specific.adf.unrestricted = ""

job_list = []
for m in molecules:
    ts = adf(templates.ts.overlay(settings), m)
    job_list.append(adf(templates.freq.overlay(settings), ts.molecule))

wf = gather(*job_list)
results = run(wf, n_processes=1)
```
```
Scf
  Converge 1e-06
  Iterations 100
End
Xc
  Gga PBE
  Lda
End
End Input
```
Dear Johan,
I have the following problem with our current TS script. Look at the example below:
```
[14:59:46] Executing ac01_r2_DFTB.run
[14:59:57] Execution of ac01_r2_DFTB.run finished with returncode 154
[14:59:57] WARNING: Job ac01_r2_DFTB finished with nonzero return code
[14:59:57] Job ac01_r2_DFTB finished with status 'crashed'
```
So ac01_r2_DFTB crashed because it exceeded the maximum number of geometry optimization steps. That is a pure DFTB issue, nothing to do with qmworks. But look at the lines below:
```
Job et01_r2_DFTB started
[15:05:08] Starting et01_r2_DFTB.prerun()
[15:05:08] et01_r2_DFTB.prerun() finished
[15:05:08] Job et01_r2_DFTB previously run as ac01_r2_DFTB, using old results
[15:05:08] Copying results of ac01_r2_DFTB failed because of the following error: Using Results associated with crashed or failed job
```
To run et01_r2_DFTB, qmworks reuses ac01_r2_DFTB because the structure is the same in both reactions: if either of them fails, qmworks still automatically retrieves that information for the second job. Since the first run failed, the second job is simply not re-run, which leads to the loss of all files (except the .dill file) in the second job's folder.
Kind regards,
Satesh
I tried to run this script:
```python
h2o = rdkitTools.smiles2plams('O')
h2o_freq = gamess(templates.freq, h2o, job_name="freq",
                  work_dir="/home/lars/scr").hessian
s = Settings()
s.inithess = h2o_freq
h2o_opt = adf(templates.geometry.overlay(s), h2o, job_name="opt")
```
It crashed with:
```
....
  File "/home/lars/workspace/workflowengine/noodles/serial/registry.py", line 324, in encode
    raise NotImplementedError(msg)
NotImplementedError: Cannot encode [ 5.83587488e-01 -3.38289452e-03  9.93129190e-14 -2.97726953e-01
  ... (full 9x9 Hessian matrix elided) ...
  -6.96604910e-03]: encoder for type `ndarray` is not implemented.
```
So we now run into the issue of serializing the hessian.
Here we are back to our discussion on how to handle potentially big data arrays and their storage in HDF5.
How about the following solution:
We could add a serializer in packages.py for numpy arrays. This serializer would save the numpy array to an HDF5 file and return the path of that file.
What do you think?
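A sketch of what such a serializer could look like, following the encode/decode pattern already used in packages.py. To keep the sketch self-contained it writes a JSON side file; in qmworks the file would instead be an HDF5 dataset written with h5py (e.g. `h5py.File(path, "w").create_dataset("hessian", data=arr)`), and only the path would enter the noodles graph. The function names here are hypothetical:

```python
import json
import os
import tempfile

def encode_array(data, workdir):
    """Save `data` (a nested list of floats standing in for an ndarray)
    to a side file and return only its path for serialisation."""
    path = os.path.join(workdir, "hessian.json")
    with open(path, "w") as f:
        json.dump(data, f)
    return {"path": path}  # only the path enters the noodles graph

def decode_array(record):
    """Reload the array from the path stored in the serialised record."""
    with open(record["path"]) as f:
        return json.load(f)

workdir = tempfile.mkdtemp()
hessian = [[0.58, -0.003], [-0.003, 0.36]]
rec = encode_array(hessian, workdir)
assert decode_array(rec) == hessian
```

The key design point is that the big array never passes through the noodles encoder itself; the workflow graph only ever sees a small record containing the file path.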
The problem seems to be that the plams result object is stored in ORCA_Result.
Book some dedicated time, e.g. at the eScience Center to discuss this and implement a solution.
When a gamess job is started without a symmetry definition, plams selects C1 symmetry by default. However, it then crashes with the following error:
```
THE POINT GROUP OF THE MOLECULE IS C1
THE ORDER OF THE PRINCIPAL AXIS IS 0
*** ERROR!
BLANK CARD FOUND WHILE TRYING TO READ INPUT ATOM 1
POSSIBLE ERRORS INCLUDE:
1. C1 GROUP SHOULD NOT HAVE A BLANK CARD AFTER IT.
```
Plams puts an empty line after the line specifying the symmetry. Apparently this is wrong for C1, but fine for other symmetries such as Cs. Pretty weird behavior of GAMESS, but confirmed on slide 26 of http://www.msg.ameslab.gov/tutorials/gamessintro.pdf
Needs to be solved in plams/gamessjob.py, in the function print_molecule.
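A minimal sketch of the kind of fix needed. The function name print_molecule comes from the issue; the body below is a hypothetical illustration of the logic, not the actual plams code:

```python
def format_symmetry_block(point_group):
    """Return the symmetry line(s) for a GAMESS $DATA group.

    GAMESS expects a blank card after the symmetry line for every point
    group EXCEPT C1 (per the GAMESS intro tutorial cited above), so the
    blank line must be emitted conditionally.
    """
    lines = [point_group]
    if point_group.upper() != "C1":
        lines.append("")  # blank card required only for non-C1 groups
    return "\n".join(lines)
```

With this rule, `format_symmetry_block("Cs")` carries the trailing blank card while `format_symmetry_block("C1")` does not.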
The plams.init command is no longer necessary; see this issue. I have already removed it from the test suite, and now we should remove it from the examples.
Create a unit test that checks that jobs can be restarted.
Generate documentation for the main features: Settings, packages, and templates.
When calling h5py, one user received the following error:
```
Warning! ***HDF5 library version mismatched error***
The HDF5 header files used to compile this application do not match
the version used by the HDF5 library to which this application is linked.
Data corruption or segmentation faults may occur if the application continues.
This can happen when an application was compiled by one version of HDF5 but
linked with a different version of static or shared HDF5 library.
You should recompile the application or check your shared library related
settings such as 'LD_LIBRARY_PATH'.
You can, at your own risk, disable this warning by setting the environment
variable 'HDF5_DISABLE_VERSION_CHECK' to a value of '1'.
Setting it to 2 or higher will suppress the warning messages totally.
Headers are 1.8.10, library is 1.8.16

SUMMARY OF THE HDF5 CONFIGURATION
=================================

General Information:
  HDF5 Version: 1.8.16
  Configured on: Mon Apr 4 16:08:17 CDT 2016
  Configured by: [email protected]
  Configure mode: production
  Host system: x86_64-unknown-linux-gnu
  Uname information: Linux centos5x64.corp.continuum.io 2.6.18-400.1.1.el5 #1 SMP Thu Dec 18 00:59:53 EST 2014 x86_64 x86_64 x86_64 GNU/Linux
  Byte sex: little-endian
  Libraries: shared
  Installation point: /opt/anaconda1anaconda2anaconda3

Compiling Options:
  Compilation Mode: production
  C Compiler: /usr/bin/gcc ( gcc (GCC) 4.4.7 20120313 )
  CFLAGS:
  H5_CFLAGS: -std=c99 -pedantic -Wall -Wextra -Wundef -Wshadow -Wpointer-arith -Wbad-function-cast -Wcast-qual -Wcast-align -Wwrite-strings -Wconversion -Waggregate-return -Wstrict-prototypes -Wmissing-prototypes -Wmissing-declarations -Wredundant-decls -Wnested-externs -Winline -Wno-long-long -Wfloat-equal -Wmissing-format-attribute -Wmissing-noreturn -Wpacked -Wdisabled-optimization -Wformat=2 -Wunreachable-code -Wendif-labels -Wdeclaration-after-statement -Wold-style-definition -Winvalid-pch -Wvariadic-macros -Wnonnull -Winit-self -Wmissing-include-dirs -Wswitch-default -Wswitch-enum -Wunused-macros -Wunsafe-loop-optimizations -Wc++-compat -Wstrict-overflow -Wlogical-op -Wlarger-than=2048 -Wvla -Wsync-nand -Wframe-larger-than=16384 -Wpacked-bitfield-compat -O3
  AM_CFLAGS:
  CPPFLAGS:
  H5_CPPFLAGS: -D_GNU_SOURCE -D_POSIX_C_SOURCE=200112L -DNDEBUG -UH5_DEBUG_API
  AM_CPPFLAGS: -D_LARGEFILE64_SOURCE -D_LARGEFILE_SOURCE
  Shared C Library: yes
  Static C Library: no
  Statically Linked Executables: no
  LDFLAGS:
  H5_LDFLAGS:
  AM_LDFLAGS:
  Extra libraries: -lrt -lz -ldl -lm
  Archiver: ar
  Ranlib: ranlib
  Debugged Packages:
  API Tracing: no

Languages:
  Fortran: no
  C++: yes
  C++ Compiler: /usr/bin/g++ ( g++ (GCC) 4.4.7 20120313 )
  C++ Flags:
  H5 C++ Flags:
  AM C++ Flags:
  Shared C++ Library: yes
  Static C++ Library: no

Features:
  Parallel HDF5: no
  High Level library: yes
  Threadsafety: no
  Default API Mapping: v18
  With Deprecated Public Symbols: yes
  I/O filters (external): deflate(zlib)
  MPE: no
  Direct VFD: no
  dmalloc: no
  Clear file buffers before write: yes
  Using memory checker: no
  Function Stack Tracing: no
  Strict File Format Checks: no
  Optimization Instrumentation: no

Bye...
Aborted (core dumped)
```
The translation from generic to specific keywords is not implemented in handle_special_keywords for the Orca class. The first two keywords to translate should be:
Issued by Xiao
For example:
```python
mol = rdkitTools.smiles2plams('CC.CO')
smirks = '[O:1][H]>>[O:1]C'
products = rdkitTools.apply_smirks(mol, smirks)
```
This results in COC instead of the expected CC.COC.
The notebooks should explain all the concepts: merging settings, templates, workflows etc.
Needs to be solved in Noodles
Inside the packages module there are several functions related to the noodles serialisation procedure, and it is not clear whether they are actively used by Noodles. Can someone please comment on the following functions?
```python
# Import paths are assumed here; in qmworks these definitions live in packages.py.
import plams
from noodles import serial
from noodles.serial import AsDict, Registry, Serialiser


class SerMolecule(Serialiser):
    def __init__(self):
        super(SerMolecule, self).__init__(plams.Molecule)

    def encode(self, obj, make_rec):
        return make_rec(obj.as_dict())

    def decode(self, cls, data):
        return plams.Molecule.from_dict(**data)


class SerSettings(Serialiser):
    def __init__(self):
        super(SerSettings, self).__init__(Settings)

    def encode(self, obj, make_rec):
        return make_rec(obj.as_dict())

    def decode(self, cls, data):
        return Settings(data)


def registry():
    return Registry(
        parent=serial.base(),
        types={
            Package: AsDict(Package),
            plams.Molecule: SerMolecule(),
            Result: SerAutoStorable(Result),
            Settings: SerSettings()})
```
There are several component modules that are not exported by the package. Are they operational?
In the packages module, the function awk_file contains a try/except statement that does not name the exception it catches (a bare except).
I have added a new test called test_ethene (this test was originally created by a student). The workflow is failing with the following error:
```
  File "$HOME/escience/src/qmworks/qmworks/packages/packages.py", line 61, in __call__
    result = self.run_job(job_settings, mol, **kwargs)
  File "$HOME/escience/src/qmworks/qmworks/packages/SCM.py", line 47, in run_job
    settings=adf_settings).run()
  File "$HOME/miniconda/envs/qmworks/lib/python3.5/site-packages/plams/basejob.py", line 112, in run
    jobrunner._run_job(self, jobmanager)
  File "$HOME/miniconda/envs/qmworks/lib/python3.5/site-packages/plams/jobrunner.py", line 33, in wrapper
    func(self, *args, **kwargs)
  File "$HOME/miniconda/envs/qmworks/lib/python3.5/site-packages/plams/jobrunner.py", line 109, in _run_job
    if job._prepare(jobmanager):
  File "$HOME/miniconda/envs/qmworks/lib/python3.5/site-packages/plams/basejob.py", line 182, in _prepare
    prev = jobmanager._check_hash(self)
  File "$HOME/miniconda/envs/qmworks/lib/python3.5/site-packages/plams/jobmanager.py", line 100, in _check_hash
    h = job.hash()
  File "$HOME/miniconda/envs/qmworks/lib/python3.5/site-packages/plams/basejob.py", line 327, in hash
    h.update(self.get_input().encode())
  File "$HOME/miniconda/envs/qmworks/lib/python3.5/site-packages/plams/scmjob.py", line 182, in get_input
    self._parsemol()
  File "$HOME/miniconda/envs/qmworks/lib/python3.5/site-packages/plams/scmjob.py", line 272, in _parsemol
    self.settings.input.atoms['_'+str(i+1)] = ('%5i'%(i+1)) + atom.str(symbol=smb, suffix=suffix)
AttributeError: 'str' object has no attribute 'atoms'
```
The workflow correctly creates the structures and optimizes them, but it fails when Plams calls the function _parsemol to create a new input using the optimized geometries.
Currently a new plams.XXX folder is created every time a workflow is rerun. We would like to define an output folder, e.g. as a parameter of the run function, in which all results are collected.
Calls to the plams library should follow this namespace convention:
```python
from scm import plams
```
Currently we are printing everything (Info, Warnings, and Errors) to the standard output. A proper logger is required.
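A minimal sketch of such a logger using the standard library. The logger name and message format are illustrative, not a decided convention:

```python
import logging
import sys

# One library-level logger instead of bare print calls; callers can then
# raise the level to silence Info messages, or redirect them elsewhere.
logger = logging.getLogger("qmworks")
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(
    logging.Formatter("[%(asctime)s] %(levelname)s: %(message)s"))
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("Job started")
logger.warning("Job finished with nonzero return code")
```

A user script could then do `logging.getLogger("qmworks").setLevel(logging.WARNING)` to keep only warnings and errors.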
Currently we can name input jobs using the job_name keyword, but there is no such label on the result object. Users may want to do further processing of the result object using the job_name label.
Currently the Dirac interface is outdated and lacks examples.
Dear Johan and Lars,
I was running a workflow for TS search with DFTB frequencies (test1_withDFTBfreq.py) and it completed successfully (results in slurm-2471538.out).
I made a small modification to the workflow: I removed the DFTB frequency calculation in the middle of the workflow (test1_withoutDFTBfreq.py).
I ran the second script in the same folder as the previous one. The first few steps of the two workflows are identical, so their results were retrieved from cache.json from the previous run; the remaining jobs were run and their results were saved into a new plams folder.
When I try to print the table in the second workflow, it apparently cannot find the results of the first steps, because those results live in the first plams folder.
I believe noodles treats these as two different workflows despite their common jobs. Is there a way to copy the information retrieved from cache.json into the new plams folder, so that all the results of the new workflow stay together? Or am I missing something?
Could one of you please help me solve this issue?
Kind regards,
Satesh
Configure the repository and the account to do continuous integration using Travis.
Currently the absolute path to the output file is serialized. It is desirable to use a relative path instead, to make it easier to transfer the folder between users and/or machines.
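A sketch of the idea, assuming the workflow root directory is known at (de)serialisation time. The function names and the example paths are illustrative:

```python
import os

def to_portable(path, root):
    """Store paths relative to the workflow root so the results folder
    can be moved between machines without breaking them."""
    return os.path.relpath(path, start=root)

def from_portable(relpath, root):
    """Resolve a stored relative path back to an absolute one on load."""
    return os.path.join(root, relpath)

root = "/home/user/project"
abs_path = "/home/user/project/plams.0001/job/out"
rel = to_portable(abs_path, root)  # e.g. "plams.0001/job/out"
assert from_portable(rel, root) == abs_path
```

On another machine, the same relative path is simply resolved against that machine's own root directory.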
Currently there is not much in the tutorial.
The interface to pam from plams controls the parameters that can be passed via the command line. Those parameters are not accessible from QMWorks. We should make them available.
Noodles has a new feature that allows for easier conditional execution. Having code like this in QMworks would allow you to run several codes until one is successful.
```python
from noodles import schedule, quote, unquote


def find_first(pred, lst):
    """Receive a predicate (non-scheduled) and a list of promised objects.

    Promises are executed until we find one for which the predicate
    returns true.
    """
    if lst:
        return s_find_first(pred, lst[0], [quote(l) for l in lst[1:]])
    else:
        return None


@schedule
def s_find_first(pred, first, lst):
    if pred(first):
        return first
    elif lst:
        return s_find_first(pred, unquote(lst[0]), lst[1:])
    else:
        return None
```
Are we happy with the format of the generic2package.json files? At the moment a dotted key such as `["key1.key2"]` is read as `["key1"]["key2"]`. If possible it would be great if we could stick to the plams Settings somehow, but I don't know how to implement e.g. the `basis` becomes `basis.type` modification using just plams Settings:
```
"generic_key": {
    "key": "specific_key",
    "value": "specific_value"}
```
Any ideas?
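One way to read the fragment above: each generic key maps to a dotted specific key plus an optional fixed value. A small interpreter for that format, with an illustrative mapping (the translation table contents are hypothetical, not the real generic2package.json):

```python
def apply_generic(settings, translations, generic_key, user_value=None):
    """Expand a generic key into a nested specific key.

    `translations` follows the generic2package.json sketch: each generic
    key maps to a dotted specific key ("key") and, optionally, a fixed
    "value"; otherwise the user-supplied value is used.
    """
    entry = translations[generic_key]
    keys = entry["key"].split(".")      # "basis.type" -> ["basis", "type"]
    target = settings
    for k in keys[:-1]:
        target = target.setdefault(k, {})  # build the nested dicts
    target[keys[-1]] = entry.get("value", user_value)
    return settings

translations = {"basis": {"key": "basis.type"}}  # illustrative mapping
s = apply_generic({}, translations, "basis", user_value="TZ2P")
assert s == {"basis": {"type": "TZ2P"}}
```

This keeps the JSON flat while still producing the nested plams-Settings-style structure the packages expect.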
Since Plams molecule type now also starts counting atoms at 1, QMworks should adopt this counting as well.
In the Settings class we have several methods that modify the behaviour of plams.Settings. Do we still need these modifications after merging our changes from the escience branch into the main branch?
Some generic keywords are implemented for some packages, but not for others.
Dear All,
Qmworks gives an error when we type the name of a functional in upper-case letters (e.g. BLYP). It only works when we use all lower-case letters to define functionals (e.g. blyp instead of BLYP). We should make them case-insensitive.
Kind regards,
Satesh
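One possible fix, sketched as a lookup that normalises the user's spelling before the input is generated. The table of functionals below is partial and purely illustrative:

```python
# Canonical spellings keyed by lower-cased name (illustrative, incomplete)
KNOWN_FUNCTIONALS = {"blyp": "BLYP", "pbe": "PBE", "bp86": "BP86"}

def normalise_functional(name):
    """Map a user-supplied functional name to its canonical spelling,
    regardless of the case the user typed ('BLYP', 'blyp', 'Blyp', ...)."""
    try:
        return KNOWN_FUNCTIONALS[name.lower()]
    except KeyError:
        raise ValueError("Unknown functional: {!r}".format(name))

assert normalise_functional("BLYP") == "BLYP"
assert normalise_functional("blyp") == "BLYP"
```

Doing the lower-casing at this single entry point keeps user scripts case-insensitive without touching every package interface.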
Currently the job.hessian property can be extracted in ORCA but not in ADF. Add the mechanism to read the Hessian in ADF.
Currently, if a computation fails in qmworks, the whole workflow is suspended. It is desirable that the workflow continue running as long as it can, issuing a corresponding warning.
Replace the exec expressions in the tests.
Right now the error-handling mechanism is triggered when a property is requested from a Result object. If the library cannot get the queried property, it may raise an error that is difficult for the user to follow; see the get_property implementation.
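A sketch of a friendlier failure mode: wrap the low-level lookup so the user sees which property and which job failed, instead of a deep traceback. The names here (the wrapper, the fake result, and the getter) are hypothetical stand-ins, not the real qmworks API:

```python
import warnings

def get_property_safe(result, prop, getter):
    """Call the low-level `getter`; on failure, warn with the property
    and job name and return None instead of raising a deep traceback."""
    try:
        return getter(result, prop)
    except Exception as err:
        warnings.warn(
            "Could not retrieve property {!r} from job {}: {}".format(
                prop, getattr(result, "job_name", "<unknown>"), err))
        return None

class FakeResult:            # toy stand-in for a Result object
    job_name = "opt"

def failing_getter(result, prop):
    raise KeyError(prop)     # simulates a missing section in the output

assert get_property_safe(FakeResult(), "hessian", failing_getter) is None
```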
Dear All,
I am checking that the status is successful (status == 'successful') in my workflow before printing results. But I found that, for some reason, the status is 'copied' instead of 'successful' after workflow execution, so it obviously does not print the results for me.
Many thanks in advance,
Satesh
The Package class requires that its children implement the methods run_job and handle_special_keywords. These methods do not use the self object for any operation, although they belong to the Package class. Therefore these methods should be static.
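A sketch of the proposed interface change, on a heavily simplified stand-in for the real Package class (signatures are illustrative):

```python
from abc import ABC, abstractmethod

class Package(ABC):
    """Simplified sketch: the two hooks are abstract AND static, since
    they never touch self."""

    @staticmethod
    @abstractmethod
    def run_job(settings, mol, **kwargs):
        ...

    @staticmethod
    @abstractmethod
    def handle_special_keywords(settings, key, value, mol):
        ...

class EchoPackage(Package):
    @staticmethod
    def run_job(settings, mol, **kwargs):
        return (settings, mol)

    @staticmethod
    def handle_special_keywords(settings, key, value, mol):
        settings[key] = value
        return settings

# Static methods are callable without an instance:
assert EchoPackage.run_job({}, "H2O") == ({}, "H2O")
```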
We have been developing QMWORKS for almost a year and it is time to do a release. Note that we don't have a full-fledged package yet, but the package is gaining momentum and users need several features, the most important of which is a manual. So, I will start doing some code review together with an initial manual. I will also open issues about the things we must do urgently in order to accomplish our first release.
There are some packages, including CP2K, that report a normal termination even though the SCF did not converge. For example:
```
SCF WAVEFUNCTION OPTIMIZATION
Step Update method Time Convergence Total energy Change
------------------------------------------------------------------------------
..............................
196 P_Mix/Diag. 0.40E+00 11.4 165.68819872 4330.1476633861 2.12E+03
197 P_Mix/Diag. 0.40E+00 11.4 99.38644504 192.6044412835 -4.14E+03
198 P_Mix/Diag. 0.40E+00 11.4 94.46161095 2939.4526751424 2.75E+03
199 P_Mix/Diag. 0.40E+00 11.4 322.70007533 4201.4125916074 1.26E+03
200 P_Mix/Diag. 0.40E+00 11.4 165.15054963 3928.7731404706 -2.73E+02

Leaving inner SCF loop after reaching 200 steps.

Electronic density on regular grids: -2111.9999999764 0.0000000236
Core density on regular grids: 2111.9999997104 -0.0000002896
Total charge density on r-space grids: -0.0000002660
Total charge density g-space grids: -0.0000002660

Overlap energy of the core charge distribution: 0.00003969715808
Self energy of the core charge distribution: -9931.63094211405405
Core Hamiltonian energy: 4332.57900394370881
Hartree energy: 10407.72552904863733
Exchange-correlation energy: -879.90049010482494
Total energy: 3928.77314047062464

*** WARNING in qs_scf.F:479 :: SCF run NOT converged ***
```
These results are useless, but QMWorks keeps on running because CP2K reports a normal-termination message. The question is then: who is responsible for terminating the calculation, the programmer of the workflow or QMWorks? Any suggestions about how to implement this?
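One option is for the QMWorks result parser to scan the CP2K output for the non-convergence warning and flag the job as failed despite the zero return code. A sketch of that heuristic (the function name is hypothetical; the warning string is taken from the log above):

```python
# The marker CP2K prints when the SCF loop gives up (see the log above).
SCF_NOT_CONVERGED = "SCF run NOT converged"

def cp2k_scf_converged(output_text):
    """Return False when the CP2K output contains the non-convergence
    warning, even if the program itself terminated normally."""
    return SCF_NOT_CONVERGED not in output_text

good = "Total energy: -17.1\nSCF run converged in 12 steps"
bad = "*** WARNING in qs_scf.F:479 :: SCF run NOT converged ***"
assert cp2k_scf_converged(good)
assert not cp2k_scf_converged(bad)
```

The workflow engine could then treat such a job like any other crashed job and emit a warning instead of silently propagating useless energies downstream.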
QMWORKS fails with an error similar to:
```
Internal error encountered. Contact the developers:
<class 'NotImplementedError'> Cannot encode <noodles.files.path.Path object at
0x2aaac81ba6a0>: encoder for type `Path` is not implemented.
```
The error is raised by the noodles serializer.
In the Settings definition in the master branch we explicitly convert the name to lower case, while in the develop branch something different is going on. Are we going to enforce the conversion to lower case, or should the user be aware of the potential pitfalls due to the case-sensitive nature of the Settings?
The overlay test is failing in the develop branch because now, if a keyword is upper-case in the template and the user redefines it using lower case, the result is two branches with the same name but different case.