scm-nv / qmflows
This library tackles the construction and efficient execution of computational chemistry workflows.
License: Other
I think that functions should not be defined in the __init__.py modules; it causes unexpected behavior. Also, the Python community in general discourages the use of the exec function, because it can cause nasty bugs.
The properties.json files should have the following format:

    {property: {
        "parser": "awk" | "python" | "kfreader",
        "file_ext": "out" | "dat" | "hess" | etc.,
        "function": an awk script, the name of a function in parser.py, or a ["section", "property"] pair in the t21 file
    }}
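For concreteness, a hypothetical pair of entries in that format. The property names, the awk script, and the t21 section/property pair below are purely illustrative, not taken from the real properties.json files:

```json
{
    "energy": {
        "parser": "awk",
        "file_ext": "out",
        "function": "/Total energy/ {print $NF}"
    },
    "frequencies": {
        "parser": "kfreader",
        "file_ext": "t21",
        "function": ["Freq", "Frequencies"]
    }
}
```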
User Settings should take precedence over default Settings. Currently, however, both the user and the default values are printed in the case of functionals. For example, look at the XC section produced by the following script: it prints both PBE and LDA, whereas it should give preference to PBE (the user-specified functional). This should be fixed now that more people are starting to use our code.
```python
# Default imports
from qmworks import Settings, templates, run
from noodles import gather
from plams import Molecule
from qmworks.packages.SCM import adf
import plams
plams.init()

import os
import fnmatch
from os.path import join

path = "/home/ganga/Workflows/Workflows3/Ayers/WithoutDFTBHessian/Moleculeswithmul2"
files = os.listdir(path)
xyzFiles = filter(lambda x: fnmatch.fnmatch(x, "*.xyz"), files)
pathsXYZ = map(lambda x: join(path, x), xyzFiles)
molecules = [Molecule(name, 'xyz') for name in pathsXYZ]

settings = Settings()
settings.functional = "pbe"
settings.basis = "TZ2P"
settings.specific.adf.charge = "0 2"
settings.specific.adf.unrestricted = ""

job_list = []
for m in molecules:
    ts = adf(templates.ts.overlay(settings), m)
    job_list.append(adf(templates.freq.overlay(settings), ts.molecule))

wf = gather(*job_list)
results = run(wf, n_processes=1)
```
```
Scf
  Converge 1e-06
  Iterations 100
End
Xc
  Gga PBE
  Lda
End
End Input
```
Dear Johan,
I have the following problem with our current TS script. Look at the example below:
```
[14:59:46] Executing ac01_r2_DFTB.run
[14:59:57] Execution of ac01_r2_DFTB.run finished with returncode 154
[14:59:57] WARNING: Job ac01_r2_DFTB finished with nonzero return code
[14:59:57] Job ac01_r2_DFTB finished with status 'crashed'
```
So ac01_r2_DFTB crashed because it exceeded the maximum number of geometry optimization steps. That is a pure DFTB issue, nothing to do with qmworks. But look at the lines below:
```
Job et01_r2_DFTB started
[15:05:08] Starting et01_r2_DFTB.prerun()
[15:05:08] et01_r2_DFTB.prerun() finished
[15:05:08] Job et01_r2_DFTB previously run as ac01_r2_DFTB, using old results
[15:05:08] Copying results of ac01_r2_DFTB failed because of the following error: Using Results associated with crashed or failed job
```
To run et01_r2_DFTB, qmworks reuses ac01_r2_DFTB because the structure is the same in both reactions: if either of them fails, qmworks still automatically retrieves that information for the second job. Since the first run failed, the second job is simply not re-run, which leads to the loss of all files (except the .dill file) in the second job's folder.
Kind regards,
Satesh
I tried to run this script:
```python
h2o = rdkitTools.smiles2plams('O')
h2o_freq = gamess(templates.freq, h2o, job_name="freq",
                  work_dir="/home/lars/scr").hessian
s = Settings()
s.inithess = h2o_freq
h2o_opt = adf(templates.geometry.overlay(s), h2o, job_name="opt")
```
It crashed with:
```
....
  File "/home/lars/workspace/workflowengine/noodles/serial/registry.py", line 324, in encode
    raise NotImplementedError(msg)
NotImplementedError: Cannot encode [ 5.83587488e-01 -3.38289452e-03  9.93129190e-14 -2.97726953e-01
  ... (full 9x9 Hessian matrix elided) ...
  -6.96604910e-03]: encoder for type `ndarray` is not implemented.
```
So we now run into the issue of serializing the hessian.
Here we are back to our discussion on how to handle potentially big data arrays and their storage in HDF5.
How about the following solution:
We could add a serializer in packages.py for numpy arrays. This serializer would save the numpy array to an HDF5 file and return the path of that file.
What do you think?
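A sketch of what such a serializer could look like, following the encode/decode pattern already used in packages.py. To keep the sketch self-contained it writes a JSON side file; in qmworks the file would instead be an HDF5 dataset written with h5py (e.g. `h5py.File(path, "w").create_dataset("hessian", data=arr)`), and only the path would enter the noodles graph. The function names here are hypothetical:

```python
import json
import os
import tempfile

def encode_array(data, workdir):
    """Save `data` (a nested list of floats standing in for an ndarray)
    to a side file and return only its path for serialisation."""
    path = os.path.join(workdir, "hessian.json")
    with open(path, "w") as f:
        json.dump(data, f)
    return {"path": path}  # only the path enters the noodles graph

def decode_array(record):
    """Reload the array from the path stored in the serialised record."""
    with open(record["path"]) as f:
        return json.load(f)

workdir = tempfile.mkdtemp()
hessian = [[0.58, -0.003], [-0.003, 0.36]]
rec = encode_array(hessian, workdir)
assert decode_array(rec) == hessian
```

The key design point is that the big array never passes through the noodles encoder itself; the workflow graph only ever sees a small record containing the file path.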
The problem seems to be that the plams result object is stored in ORCA_Result.
Book some dedicated time, e.g. at the eScience Center to discuss this and implement a solution.
When a gamess job is started without a symmetry definition, plams selects C1 symmetry by default. However, it then crashes with the following error:
```
THE POINT GROUP OF THE MOLECULE IS C1
THE ORDER OF THE PRINCIPAL AXIS IS 0
*** ERROR!
BLANK CARD FOUND WHILE TRYING TO READ INPUT ATOM 1
POSSIBLE ERRORS INCLUDE:
1. C1 GROUP SHOULD NOT HAVE A BLANK CARD AFTER IT.
```
Plams puts an empty line after the line specifying the symmetry. Apparently this is wrong for C1, but fine for other symmetries such as Cs. Pretty weird behavior of GAMESS, but confirmed on slide 26 of http://www.msg.ameslab.gov/tutorials/gamessintro.pdf
Needs to be solved in plams/gamessjob.py, in the function print_molecule.
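A minimal sketch of the kind of fix needed. The function name print_molecule comes from the issue; the body below is a hypothetical illustration of the logic, not the actual plams code:

```python
def format_symmetry_block(point_group):
    """Return the symmetry line(s) for a GAMESS $DATA group.

    GAMESS expects a blank card after the symmetry line for every point
    group EXCEPT C1 (per the GAMESS intro tutorial cited above), so the
    blank line must be emitted conditionally.
    """
    lines = [point_group]
    if point_group.upper() != "C1":
        lines.append("")  # blank card required only for non-C1 groups
    return "\n".join(lines)
```

With this rule, `format_symmetry_block("Cs")` carries the trailing blank card while `format_symmetry_block("C1")` does not.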
The plams.init command is no longer necessary; see this issue. I have already removed it from the test suite, and now we should remove it from the examples.
Create a unit test that checks that jobs can be restarted.
Generate documentation for the main features: Settings, packages, and templates.
When calling h5py, one user received the following error:
```
Warning! ***HDF5 library version mismatched error***
The HDF5 header files used to compile this application do not match
the version used by the HDF5 library to which this application is linked.
Data corruption or segmentation faults may occur if the application continues.
This can happen when an application was compiled by one version of HDF5 but
linked with a different version of static or shared HDF5 library.
You should recompile the application or check your shared library related
settings such as 'LD_LIBRARY_PATH'.
You can, at your own risk, disable this warning by setting the environment
variable 'HDF5_DISABLE_VERSION_CHECK' to a value of '1'.
Setting it to 2 or higher will suppress the warning messages totally.
Headers are 1.8.10, library is 1.8.16

SUMMARY OF THE HDF5 CONFIGURATION
=================================

General Information:
  HDF5 Version: 1.8.16
  Configured on: Mon Apr 4 16:08:17 CDT 2016
  Configured by: [email protected]
  Configure mode: production
  Host system: x86_64-unknown-linux-gnu
  Uname information: Linux centos5x64.corp.continuum.io 2.6.18-400.1.1.el5 #1 SMP Thu Dec 18 00:59:53 EST 2014 x86_64 x86_64 x86_64 GNU/Linux
  Byte sex: little-endian
  Libraries: shared
  Installation point: /opt/anaconda1anaconda2anaconda3

Compiling Options:
  Compilation Mode: production
  C Compiler: /usr/bin/gcc ( gcc (GCC) 4.4.7 20120313 )
  CFLAGS:
  H5_CFLAGS: -std=c99 -pedantic -Wall -Wextra -Wundef -Wshadow -Wpointer-arith -Wbad-function-cast -Wcast-qual -Wcast-align -Wwrite-strings -Wconversion -Waggregate-return -Wstrict-prototypes -Wmissing-prototypes -Wmissing-declarations -Wredundant-decls -Wnested-externs -Winline -Wno-long-long -Wfloat-equal -Wmissing-format-attribute -Wmissing-noreturn -Wpacked -Wdisabled-optimization -Wformat=2 -Wunreachable-code -Wendif-labels -Wdeclaration-after-statement -Wold-style-definition -Winvalid-pch -Wvariadic-macros -Wnonnull -Winit-self -Wmissing-include-dirs -Wswitch-default -Wswitch-enum -Wunused-macros -Wunsafe-loop-optimizations -Wc++-compat -Wstrict-overflow -Wlogical-op -Wlarger-than=2048 -Wvla -Wsync-nand -Wframe-larger-than=16384 -Wpacked-bitfield-compat -O3
  AM_CFLAGS:
  CPPFLAGS:
  H5_CPPFLAGS: -D_GNU_SOURCE -D_POSIX_C_SOURCE=200112L -DNDEBUG -UH5_DEBUG_API
  AM_CPPFLAGS: -D_LARGEFILE64_SOURCE -D_LARGEFILE_SOURCE
  Shared C Library: yes
  Static C Library: no
  Statically Linked Executables: no
  LDFLAGS:
  H5_LDFLAGS:
  AM_LDFLAGS:
  Extra libraries: -lrt -lz -ldl -lm
  Archiver: ar
  Ranlib: ranlib
  Debugged Packages:
  API Tracing: no

Languages:
  Fortran: no
  C++: yes
  C++ Compiler: /usr/bin/g++ ( g++ (GCC) 4.4.7 20120313 )
  C++ Flags:
  H5 C++ Flags:
  AM C++ Flags:
  Shared C++ Library: yes
  Static C++ Library: no

Features:
  Parallel HDF5: no
  High Level library: yes
  Threadsafety: no
  Default API Mapping: v18
  With Deprecated Public Symbols: yes
  I/O filters (external): deflate(zlib)
  MPE: no
  Direct VFD: no
  dmalloc: no
  Clear file buffers before write: yes
  Using memory checker: no
  Function Stack Tracing: no
  Strict File Format Checks: no
  Optimization Instrumentation: no

Bye...
Aborted (core dumped)
```
The translation from generic to specific keywords is not implemented in handle_special_keywords for the Orca class. The first two keywords to translate should be:
Issued by Xiao
For example:
```python
mol = rdkitTools.smiles2plams('CC.CO')
smirks = '[O:1][H]>>[O:1]C'
products = rdkitTools.apply_smirks(mol, smirks)
```
This results in COC instead of the expected CC.COC.
The notebooks should explain all the concepts: merging settings, templates, workflows etc.
Needs to be solved in Noodles
Inside the packages module there are several functions related to the noodles serialisation procedure, and it is not clear whether they are actively used by Noodles. Can someone please comment on the following functions?
```python
# Import paths are assumed here; in qmworks these definitions live in packages.py.
import plams
from noodles import serial
from noodles.serial import AsDict, Registry, Serialiser


class SerMolecule(Serialiser):
    def __init__(self):
        super(SerMolecule, self).__init__(plams.Molecule)

    def encode(self, obj, make_rec):
        return make_rec(obj.as_dict())

    def decode(self, cls, data):
        return plams.Molecule.from_dict(**data)


class SerSettings(Serialiser):
    def __init__(self):
        super(SerSettings, self).__init__(Settings)

    def encode(self, obj, make_rec):
        return make_rec(obj.as_dict())

    def decode(self, cls, data):
        return Settings(data)


def registry():
    return Registry(
        parent=serial.base(),
        types={
            Package: AsDict(Package),
            plams.Molecule: SerMolecule(),
            Result: SerAutoStorable(Result),
            Settings: SerSettings()})
```
There are several component modules that are not exported by the package. Are they operational?
In the packages module, the function awk_file contains a try/except statement that does not name the exception it catches (a bare except).
I have added a new test called test_ethene (this test was originally created by a student). The workflow is failing with the following error:
```
  File "$HOME/escience/src/qmworks/qmworks/packages/packages.py", line 61, in __call__
    result = self.run_job(job_settings, mol, **kwargs)
  File "$HOME/escience/src/qmworks/qmworks/packages/SCM.py", line 47, in run_job
    settings=adf_settings).run()
  File "$HOME/miniconda/envs/qmworks/lib/python3.5/site-packages/plams/basejob.py", line 112, in run
    jobrunner._run_job(self, jobmanager)
  File "$HOME/miniconda/envs/qmworks/lib/python3.5/site-packages/plams/jobrunner.py", line 33, in wrapper
    func(self, *args, **kwargs)
  File "$HOME/miniconda/envs/qmworks/lib/python3.5/site-packages/plams/jobrunner.py", line 109, in _run_job
    if job._prepare(jobmanager):
  File "$HOME/miniconda/envs/qmworks/lib/python3.5/site-packages/plams/basejob.py", line 182, in _prepare
    prev = jobmanager._check_hash(self)
  File "$HOME/miniconda/envs/qmworks/lib/python3.5/site-packages/plams/jobmanager.py", line 100, in _check_hash
    h = job.hash()
  File "$HOME/miniconda/envs/qmworks/lib/python3.5/site-packages/plams/basejob.py", line 327, in hash
    h.update(self.get_input().encode())
  File "$HOME/miniconda/envs/qmworks/lib/python3.5/site-packages/plams/scmjob.py", line 182, in get_input
    self._parsemol()
  File "$HOME/miniconda/envs/qmworks/lib/python3.5/site-packages/plams/scmjob.py", line 272, in _parsemol
    self.settings.input.atoms['_'+str(i+1)] = ('%5i'%(i+1)) + atom.str(symbol=smb, suffix=suffix)
AttributeError: 'str' object has no attribute 'atoms'
```
The workflow correctly creates the structures and optimizes them, but it fails when Plams calls the function _parsemol to create a new input using the optimized geometries.
Currently a new plams.XXX folder is created every time a workflow is rerun. We would like to define an output folder, e.g. as a parameter of the run function, in which all results are collected.
Calls to the plams library should follow this namespace convention:
```python
from scm import plams
```
Currently we are printing everything (Info, Warnings, and Errors) to the standard output. A proper logger is required.
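A minimal sketch of such a logger using the standard library. The logger name and message format are illustrative, not a decided convention:

```python
import logging
import sys

# One library-level logger instead of bare print calls; callers can then
# raise the level to silence Info messages, or redirect them elsewhere.
logger = logging.getLogger("qmworks")
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(
    logging.Formatter("[%(asctime)s] %(levelname)s: %(message)s"))
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("Job started")
logger.warning("Job finished with nonzero return code")
```

A user script could then do `logging.getLogger("qmworks").setLevel(logging.WARNING)` to keep only warnings and errors.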
Currently we can name input jobs using the job_name keyword, but there is no such label on the result object. Users may want to do further processing of the result object using the job_name label.
Currently the Dirac interface is outdated and lacks examples.
Dear Johan and Lars,
I was running a workflow for TS search with DFTB frequencies (test1_withDFTBfreq.py) and it completed successfully (results in slurm-2471538.out).
I made a small modification to the workflow: I removed the DFTB frequency calculation in the middle of the workflow (test1_withoutDFTBfreq.py).
I ran the second script in the same folder as the previous one. The first few steps of the two workflows are identical, so their results were retrieved from cache.json from the previous run; the remaining jobs were run and their results were saved into a new plams folder.
When I try to print the table in the second workflow, it apparently cannot find the results of the first steps, because those results live in the first plams folder.
I believe noodles treats these as two different workflows despite their common jobs. Is there a way to copy the information retrieved from cache.json into the new plams folder, so that all the results of the new workflow stay together? Or am I missing something?
Could one of you please help me solve this issue?
Kind regards,
Satesh
Configure the repository and the account to do continuous integration using Travis.
Currently the absolute path to the output file is serialized. It is desirable to use a relative path instead, to make it easier to transfer the folder between users and/or machines.
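A sketch of the idea, assuming the workflow root directory is known at (de)serialisation time. The function names and the example paths are illustrative:

```python
import os

def to_portable(path, root):
    """Store paths relative to the workflow root so the results folder
    can be moved between machines without breaking them."""
    return os.path.relpath(path, start=root)

def from_portable(relpath, root):
    """Resolve a stored relative path back to an absolute one on load."""
    return os.path.join(root, relpath)

root = "/home/user/project"
abs_path = "/home/user/project/plams.0001/job/out"
rel = to_portable(abs_path, root)  # e.g. "plams.0001/job/out"
assert from_portable(rel, root) == abs_path
```

On another machine, the same relative path is simply resolved against that machine's own root directory.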
Currently there is not much in the tutorial.
The interface to pam from plams controls the parameters that can be passed via the command line. Those parameters are not accessible from QMWorks. We should make them available.
Noodles has a new feature that allows for easier conditional execution. Having code like this in QMworks would allow you to run several codes until one is successful.
```python
from noodles import schedule, quote, unquote


def find_first(pred, lst):
    """Receive a predicate (non-scheduled) and a list of promised objects.

    Promises are executed until we find one for which the predicate
    returns true.
    """
    if lst:
        return s_find_first(pred, lst[0], [quote(l) for l in lst[1:]])
    else:
        return None


@schedule
def s_find_first(pred, first, lst):
    if pred(first):
        return first
    elif lst:
        return s_find_first(pred, unquote(lst[0]), lst[1:])
    else:
        return None
```
Are we happy with the format of the generic2package.json files? At the moment a dotted key such as `["key1.key2"]` is read as `["key1"]["key2"]`. If possible it would be great if we could stick to the plams Settings somehow, but I don't know how to implement e.g. the `basis` becomes `basis.type` modification using just plams Settings:
```
"generic_key": {
    "key": "specific_key",
    "value": "specific_value"}
```
Any ideas?
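One way to read the fragment above: each generic key maps to a dotted specific key plus an optional fixed value. A small interpreter for that format, with an illustrative mapping (the translation table contents are hypothetical, not the real generic2package.json):

```python
def apply_generic(settings, translations, generic_key, user_value=None):
    """Expand a generic key into a nested specific key.

    `translations` follows the generic2package.json sketch: each generic
    key maps to a dotted specific key ("key") and, optionally, a fixed
    "value"; otherwise the user-supplied value is used.
    """
    entry = translations[generic_key]
    keys = entry["key"].split(".")      # "basis.type" -> ["basis", "type"]
    target = settings
    for k in keys[:-1]:
        target = target.setdefault(k, {})  # build the nested dicts
    target[keys[-1]] = entry.get("value", user_value)
    return settings

translations = {"basis": {"key": "basis.type"}}  # illustrative mapping
s = apply_generic({}, translations, "basis", user_value="TZ2P")
assert s == {"basis": {"type": "TZ2P"}}
```

This keeps the JSON flat while still producing the nested plams-Settings-style structure the packages expect.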
Since Plams molecule type now also starts counting atoms at 1, QMworks should adopt this counting as well.
In the Settings class we have several methods that modify the behaviour of plams.Settings. Do we still need these modifications after merging our changes from the escience branch into the main branch?
Some generic keywords are implemented for some packages, but not for others.
Dear All,
Qmworks gives an error when we type the name of a functional in upper-case letters (e.g. BLYP). It only works when we use all lower-case letters to define functionals (e.g. blyp instead of BLYP). We should make them case-insensitive.
Kind regards,
Satesh
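One possible fix, sketched as a lookup that normalises the user's spelling before the input is generated. The table of functionals below is partial and purely illustrative:

```python
# Canonical spellings keyed by lower-cased name (illustrative, incomplete)
KNOWN_FUNCTIONALS = {"blyp": "BLYP", "pbe": "PBE", "bp86": "BP86"}

def normalise_functional(name):
    """Map a user-supplied functional name to its canonical spelling,
    regardless of the case the user typed ('BLYP', 'blyp', 'Blyp', ...)."""
    try:
        return KNOWN_FUNCTIONALS[name.lower()]
    except KeyError:
        raise ValueError("Unknown functional: {!r}".format(name))

assert normalise_functional("BLYP") == "BLYP"
assert normalise_functional("blyp") == "BLYP"
```

Doing the lower-casing at this single entry point keeps user scripts case-insensitive without touching every package interface.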
Currently the job.hessian property can be extracted in ORCA but not in ADF. Add the mechanism to read the Hessian in ADF.
Currently, if a computation fails in qmworks, the whole workflow is suspended. It is desirable that the workflow continue running as long as it can, issuing a corresponding warning.
Replace the exec expressions in the tests.
Right now the error-handling mechanism is triggered when a property is requested from a Result object. If the library cannot get the queried property, it may raise an error that is difficult for the user to follow; see the get_property implementation.
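A sketch of a friendlier failure mode: wrap the low-level lookup so the user sees which property and which job failed, instead of a deep traceback. The names here (the wrapper, the fake result, and the getter) are hypothetical stand-ins, not the real qmworks API:

```python
import warnings

def get_property_safe(result, prop, getter):
    """Call the low-level `getter`; on failure, warn with the property
    and job name and return None instead of raising a deep traceback."""
    try:
        return getter(result, prop)
    except Exception as err:
        warnings.warn(
            "Could not retrieve property {!r} from job {}: {}".format(
                prop, getattr(result, "job_name", "<unknown>"), err))
        return None

class FakeResult:            # toy stand-in for a Result object
    job_name = "opt"

def failing_getter(result, prop):
    raise KeyError(prop)     # simulates a missing section in the output

assert get_property_safe(FakeResult(), "hessian", failing_getter) is None
```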
Dear All,
I am checking that the status is successful (status == 'successful') in my workflow before printing results. But I found that, for some reason, the status is 'copied' instead of 'successful' after workflow execution, so it obviously does not print the results for me.
Many thanks in advance,
Satesh
The Package class requires that its children implement the methods run_job and handle_special_keywords. These methods do not use the self object for any operation, although they belong to the Package class. Therefore these methods should be static.
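A sketch of the proposed interface change, on a heavily simplified stand-in for the real Package class (signatures are illustrative):

```python
from abc import ABC, abstractmethod

class Package(ABC):
    """Simplified sketch: the two hooks are abstract AND static, since
    they never touch self."""

    @staticmethod
    @abstractmethod
    def run_job(settings, mol, **kwargs):
        ...

    @staticmethod
    @abstractmethod
    def handle_special_keywords(settings, key, value, mol):
        ...

class EchoPackage(Package):
    @staticmethod
    def run_job(settings, mol, **kwargs):
        return (settings, mol)

    @staticmethod
    def handle_special_keywords(settings, key, value, mol):
        settings[key] = value
        return settings

# Static methods are callable without an instance:
assert EchoPackage.run_job({}, "H2O") == ({}, "H2O")
```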
We have been developing QMWORKS for almost a year and it is time to do a release. Note that we don't have a full-fledged package yet, but the package is gaining momentum and users need several features, the most important of which is a manual. So, I will start doing some code review together with an initial manual. I will also open issues about the things we must do urgently in order to accomplish our first release.
There are some packages, including CP2K, that report a normal termination even though the SCF did not converge. For example:
```
SCF WAVEFUNCTION OPTIMIZATION
Step Update method Time Convergence Total energy Change
------------------------------------------------------------------------------
..............................
196 P_Mix/Diag. 0.40E+00 11.4 165.68819872 4330.1476633861 2.12E+03
197 P_Mix/Diag. 0.40E+00 11.4 99.38644504 192.6044412835 -4.14E+03
198 P_Mix/Diag. 0.40E+00 11.4 94.46161095 2939.4526751424 2.75E+03
199 P_Mix/Diag. 0.40E+00 11.4 322.70007533 4201.4125916074 1.26E+03
200 P_Mix/Diag. 0.40E+00 11.4 165.15054963 3928.7731404706 -2.73E+02

Leaving inner SCF loop after reaching 200 steps.

Electronic density on regular grids: -2111.9999999764 0.0000000236
Core density on regular grids: 2111.9999997104 -0.0000002896
Total charge density on r-space grids: -0.0000002660
Total charge density g-space grids: -0.0000002660

Overlap energy of the core charge distribution: 0.00003969715808
Self energy of the core charge distribution: -9931.63094211405405
Core Hamiltonian energy: 4332.57900394370881
Hartree energy: 10407.72552904863733
Exchange-correlation energy: -879.90049010482494
Total energy: 3928.77314047062464

*** WARNING in qs_scf.F:479 :: SCF run NOT converged ***
```
These results are useless, but QMWorks keeps on running because CP2K reports a normal-termination message. The question is then: who is responsible for terminating the calculation, the programmer of the workflow or QMWorks? Any suggestions about how to implement this?
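One option is for the QMWorks result parser to scan the CP2K output for the non-convergence warning and flag the job as failed despite the zero return code. A sketch of that heuristic (the function name is hypothetical; the warning string is taken from the log above):

```python
# The marker CP2K prints when the SCF loop gives up (see the log above).
SCF_NOT_CONVERGED = "SCF run NOT converged"

def cp2k_scf_converged(output_text):
    """Return False when the CP2K output contains the non-convergence
    warning, even if the program itself terminated normally."""
    return SCF_NOT_CONVERGED not in output_text

good = "Total energy: -17.1\nSCF run converged in 12 steps"
bad = "*** WARNING in qs_scf.F:479 :: SCF run NOT converged ***"
assert cp2k_scf_converged(good)
assert not cp2k_scf_converged(bad)
```

The workflow engine could then treat such a job like any other crashed job and emit a warning instead of silently propagating useless energies downstream.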
QMWORKS fails with an error similar to:
```
Internal error encountered. Contact the developers:
<class 'NotImplementedError'> Cannot encode <noodles.files.path.Path object at
0x2aaac81ba6a0>: encoder for type `Path` is not implemented.
```
The error is raised by the noodles serializer.
In the Settings definition in the master branch we explicitly convert the name to lower case, while in the develop branch something different is going on. Are we going to enforce the conversion to lower case, or should the user be aware of the potential pitfalls due to the case-sensitive nature of the Settings?
The overlay test is failing in the develop branch because now, if a keyword is upper-case in the template and the user redefines it using lower case, the result is two branches with the same name but different case.