theochem / iodata

Python library for reading, writing, and converting computational chemistry file formats and generating input files.

Home Page: https://iodata.readthedocs.io/en/latest/index.html

License: GNU Lesser General Public License v3.0

Python 99.89% DIGITAL Command Language 0.11%
quantum-chemistry quantum-chemistry-programs quantum-chemistry-packages computational-chemistry computational-biology computational-physics file-format-converter data-parsing json-schema theoretical-chemistry

iodata's Introduction

IOData

About

IOData is a HORTON 3 module for input/output of quantum chemistry file formats. Documentation is here: https://iodata.readthedocs.io/en/latest/index.html

Citation

Please use the following citation in any publication using the IOData library:

"IOData: A python library for reading, writing, and converting computational chemistry file formats and generating input files.", T. Verstraelen, W. Adams, L. Pujal, A. Tehrani, B. D. Kelly, L. Macaya, F. Meng, M. Richer, R. Hernandez‐Esparza, X. D. Yang, M. Chan, T. D. Kim, M. Cools‐Ceuppens, V. Chuiko, E. Vohringer‐Martinez, P. W. Ayers, F. Heidar‐Zadeh, J Comput Chem. 2021; 42: 458–464.

Installation

In anticipation of the 1.0 release of IOData, install the latest git revision as follows:

python -m pip install git+https://github.com/theochem/iodata.git

Add the --user argument if you are not working in a virtual or conda environment. Note that there may be API changes between subsequent revisions.

See https://iodata.readthedocs.io/en/latest/install.html for full details.

iodata's People

Contributors

ali-tehrani avatar alimalek2000 avatar amandadumi avatar bradendkelly avatar evohringer avatar fanwangm avatar farnazh avatar fdroessler avatar kimt33 avatar leila-pujal avatar lmacaya avatar mcoolsce avatar msricher avatar paulwayers avatar peawagon avatar rayhe88 avatar richrick1 avatar sfias avatar shivupa avatar tczorro avatar thomaspigeon avatar tovrstra avatar wilhadams avatar

iodata's Issues

Deploy fails

@tovrstra have you seen this error before?

Current tag is: 0.1.1
/home/travis/.rvm/gems/ruby-2.2.7/gems/octokit-4.6.2/lib/octokit/client/releases.rb:86:in `initialize': No such file or directory @ rb_sysopen - dist/iodata-0.1.1.tar.gz (Errno::ENOENT)

It only happens on py27 for some weird reason

Setup `codecov`

We should set up codecov to check the number of lines instead of the percentage.

Move to pkg_resources

Forgot to update iodata to pkg_resources. It is still using __file__.

Maybe even jump to importlib.resources directly.
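A minimal sketch of what a direct jump to importlib.resources could look like (Python 3.9 or later). The helper name read_package_text is hypothetical, and the standard-library json package only stands in for a package that ships data files.

```python
from importlib.resources import files


def read_package_text(package: str, resource: str) -> str:
    """Return the text of a file shipped inside an installed package."""
    # files() returns a Traversable rooted at the package, so the lookup
    # does not depend on the current working directory.
    return files(package).joinpath(resource).read_text(encoding="utf-8")


# Stand-in example: read a source file of the stdlib "json" package.
text = read_package_text("json", "__init__.py")
```

The same call pattern would apply to data files bundled with iodata, assuming they are installed as package data.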

Drop character from basis ordering conventions for spherical shells.

This refers to the following convention:

iodata/iodata/basis.py

Lines 152 to 162 in 66c0553

{
(0, 'c'): ['1'],
(1, 'c'): ['x', 'y', 'z'],
# alphabetically ordered Cartesian functions
(2, 'c'): ['xx', 'xy', 'xz', 'yy', 'yz', 'zz'],
# or Wikipedia-ordered real solid spherical harmonics
# c = cosine-like
# s = sine-like
(2, 'p'): ['dc2', 'dc1', 'dc0', '-ds1', '-ds2'],
...
}

The convention for the pure functions can be simplified to

(2, 'p'): ['c2', 'c1', 'c0', '-s1', '-s2'],
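The renaming can be sketched as a small conversion function. This is only an illustration under assumptions: the function name drop_angmom_char is hypothetical, and the letter sequence for angular momenta is taken from the list of shell characters used elsewhere in these issues.

```python
# Angular momentum letters, indexed by the angular momentum quantum number.
ANGMOM_CHARS = "spdfghiklmnoqrtuvwxyzabce"


def drop_angmom_char(conventions):
    """Strip the angular-momentum letter from pure-function labels."""
    result = {}
    for (angmom, kind), labels in conventions.items():
        if kind != "p":
            # Cartesian labels stay as they are.
            result[(angmom, kind)] = list(labels)
            continue
        char = ANGMOM_CHARS[angmom]
        new_labels = []
        for label in labels:
            sign = "-" if label.startswith("-") else ""
            body = label.lstrip("-")
            # Remove the leading letter only when it matches the shell type
            # and something remains afterwards.
            if len(body) > 1 and body[0] == char:
                body = body[1:]
            new_labels.append(sign + body)
        result[(angmom, kind)] = new_labels
    return result


old = {
    (2, "c"): ["xx", "xy", "xz", "yy", "yz", "zz"],
    (2, "p"): ["dc2", "dc1", "dc0", "-ds1", "-ds2"],
}
new = drop_angmom_char(old)
```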

Add module attributes to file format modules, to generate format docs

At the moment, the following file is hand-edited (and outdated), yet very convenient for users to look up all the features of supported file formats:

https://github.com/theochem/iodata/blob/master/doc/formats.rst

When we add the following attributes to the file format modules, this page can be generated automatically:

  • INTEROPERATION: description of other codes that use this format for input or output.
  • ALWAYS_LOAD: IOData attributes that are always loaded.
  • MAY_LOAD: IOData attributes that are loaded when data is present in file.
  • REQUIRED_DUMP: IOData attributes required for dumping a file.
  • OPTIONAL_DUMP: IOData attributes that can be written to file too, if present.

These (constant) lists can also be used to generate more convenient error messages in the load_one and dump_one functions.
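A rough sketch of how such attributes could drive both the generated documentation and the error messages. The stand-in module fake_xyz, its attribute values, and the helper names format_doc and missing_for_dump are all hypothetical.

```python
from types import SimpleNamespace

# Stand-in for a file format module; the attribute values are made up.
fake_xyz = SimpleNamespace(
    INTEROPERATION=["(programs that use this format)"],
    ALWAYS_LOAD=["atcoords", "atnums"],
    MAY_LOAD=["title"],
    REQUIRED_DUMP=["atcoords", "atnums"],
    OPTIONAL_DUMP=["title"],
)

FIELDS = ["INTEROPERATION", "ALWAYS_LOAD", "MAY_LOAD", "REQUIRED_DUMP", "OPTIONAL_DUMP"]


def format_doc(name, module):
    """Build a small reStructuredText section describing one format."""
    lines = [name, "=" * len(name), ""]
    for attr in FIELDS:
        values = getattr(module, attr, [])
        lines.append(f"{attr}: {', '.join(values) if values else '(none)'}")
    return "\n".join(lines)


def missing_for_dump(module, data):
    """List the required attributes that are absent from a data dictionary."""
    return [name for name in module.REQUIRED_DUMP if data.get(name) is None]


doc = format_doc("XYZ", fake_xyz)
missing = missing_for_dump(fake_xyz, {"atnums": [1, 1]})
```

A dump function could then raise an error that names every entry in missing, instead of failing on the first absent attribute.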

Update installation instructions

The terminology in the install docs is not very clear. A few things to improve:

  • "Installation (from source)" should become "Build conda package (from source)". It should also be specified which source: a source release tar file or github clone, or do both work? One line could be added to explain how to install a locally built package. It is worth explaining that this is the safe alternative to a more conventional from-source installation.

  • "Installation (by hand)" should become "Installation (from source)" and should be a little more elaborate. For the version requirements, we can refer to the conda build script, rather than reproducing it here. We should also point out the difference between a source release and a git clone. (version.py) We should explicitly mention that this option is something we do not support in case it goes wrong. An installation from source can go wrong in many ways. This is only suitable for devs who know how to fix things when they break.

Naming of density matrices should be simplified

Our naming of density matrix attributes in IOData is too verbose. It may be useful to make a distinction between dm_full_scf, dm_spin_scf, dm_full_post_scf and dm_spin_post_scf. Anything beyond that adds more complication without being useful.

Add a few options to iodata-convert

  • -m, --many to convert files with multiple frames.
  • -i to select the input format.
  • -o to select the output format.
  • document the lists of possible input and output formats.
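The options listed above could be declared with argparse roughly as follows. This is a sketch, not the actual script: the dest names infmt and outfmt, the positional argument names, and the help strings are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Create the command-line parser for a conversion script."""
    parser = argparse.ArgumentParser(prog="iodata-convert")
    parser.add_argument("input", help="Path of the file to read.")
    parser.add_argument("output", help="Path of the file to write.")
    parser.add_argument(
        "-m", "--many", action="store_true",
        help="Convert files with multiple frames.",
    )
    parser.add_argument(
        "-i", dest="infmt", default=None,
        help="Input format; deduced from the file name when omitted.",
    )
    parser.add_argument(
        "-o", dest="outfmt", default=None,
        help="Output format; deduced from the file name when omitted.",
    )
    return parser


args = build_parser().parse_args(["-m", "-i", "xyz", "traj.dat", "out.xyz"])
```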

Intuitive representation of molecular orbitals

It's too complicated to make an Orbital instance right now. You need to assign the attributes by hand, which is really clunky, i.e.

a = Orbitals(*mol.occ_alpha)
a.coeffs[:] = mol.occ_alpha_coeffs
a.energies[:] = mol.occ_alpha_energies
a.occupations[:] = mol.occ_alpha_occs

Instead we should introduce a new parameter, orb_alpha_params (or something along those lines) to be fed into Orbitals in the same way we do GBasis (this also requires an API change to Orbitals within meanfield). Then the API becomes something like

Orbitals(**mol.occ_alpha_params)

ugrid attribute is inconvenient

At the moment data from a cube file is spread over two attributes: cube_data and ugrid, for which there is no real reason. We should either put these in one dictionary or have all different attributes.

File format API cleanups

Some more related suggestions to improve the API:

  • In modules in iodata.formats: load -> load_one, dump -> dump_one, for consistency with iodata.iodata.
  • In modules in iodata.formats: data argument for dump_one functions should be dictionaries to make these modules independent of iodata.iodata.
  • In iodata.iodata: add option to select file format. This is needed because different file formats sometimes use the same extension.

API Changes

Deprecations:

  • LockedH5 and the internal H5 file format have been removed.

Parameters:

  • The vasp dump_poscar routine now requires data to contain an attribute cell_frac which contains the return value of the cell.to_frac function.

Return values:

  • The obasis attribute no longer contains an instance of obasis. Rather it contains a dict of all elements necessary to instantiate GOBasis. Simply pass it via GOBasis(*obasis.values())
  • The orb_alpha attribute no longer contains an Orbital instance. It has been replaced by four attributes:
    • orb_alpha: a tuple of (nbasis, nfn). Sufficient to instantiate an Orbital class.
    • orb_alpha_coeffs: formerly Orbitals.coefficients
    • orb_alpha_energies: formerly Orbitals.energies
    • orb_alpha_occs: formerly Orbitals.occs
  • The cell attribute is no longer a Cell instance. Rather it contains a numpy array rvecs which is sufficient to instantiate a Cell instance.
  • The grid attribute no longer contains a UniformGrid instance. It now contains a dict sufficient to instantiate a UniformGrid instance. It can be used via UniformGrid(*grid.values())
  • Vasp IOData instances now require a gvecs attribute to write the coordinates.

Non-API breaking changes:

  • The molden coefficient normalization now uses internal code to calculate the overlap integrals. The dependency on gbasis has been removed.

Make it possible to pass options to load and dump functions

Adding arbitrary optional arguments to load and dump would make it possible to deal with variations of file formats, e.g. extra columns in XYZ files. Another use case can be found in #117.

We could add **kwargs to the load_* and dump_* functions and pass them on to corresponding functions in the file format modules.

@FarnazH Do you know of other relevant use cases?
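A toy sketch of the **kwargs forwarding proposed above. It does not reproduce the real function signatures: load_one_sketch, _load_xyz_lines and the extra_columns option are hypothetical names, and the loader works on a list of atom lines rather than on a file.

```python
def _load_xyz_lines(lines, extra_columns=()):
    """Parse XYZ atom lines; extra_columns names the columns after x, y, z."""
    atoms = []
    for line in lines:
        words = line.split()
        atom = {"symbol": words[0], "xyz": [float(w) for w in words[1:4]]}
        # Any additional columns are stored under the names given by the caller.
        for name, word in zip(extra_columns, words[4:]):
            atom[name] = float(word)
        atoms.append(atom)
    return atoms


# Registry mapping a format name to its loader function.
_LOADERS = {"xyz": _load_xyz_lines}


def load_one_sketch(lines, fmt, **kwargs):
    """Dispatch to a format loader and forward any format-specific options."""
    return _LOADERS[fmt](lines, **kwargs)


atoms = load_one_sketch(
    ["O 0.0 0.0 0.0 -0.8", "H 0.0 0.0 1.0 0.4"],
    "xyz",
    extra_columns=("charge",),
)
```

With this layout, an option that a format does not understand fails with a TypeError in the format-specific function, which keeps the generic dispatcher free of format knowledge.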

Add convenience properties to `MolecularOrbitals`: `coeffs_a`, ...

It may be useful to make coeffs_a and coeffs_b properties of the MolecularOrbitals class. Similarly, one could make energies_a, energies_b, occs_a and occs_b.

All of these should return None in the generalized case. In the restricted case, coeffs_b and energies_b should be None, and the elements of occs_a and occs_b should be at most one.

Originally posted by @tovrstra in #76
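A simplified sketch of such properties, limited to energies and occupation numbers and using plain lists instead of arrays. The class name MolecularOrbitalsSketch and its fields are assumptions, and so is the rule for splitting restricted occupations (the alpha orbital is filled first).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MolecularOrbitalsSketch:
    kind: str  # "restricted", "unrestricted" or "generalized"
    norba: int  # number of alpha orbitals (used for "unrestricted" only)
    energies: List[float]
    occs: List[float]

    @property
    def energies_a(self):
        if self.kind == "generalized":
            return None
        if self.kind == "restricted":
            return list(self.energies)
        return self.energies[: self.norba]

    @property
    def energies_b(self):
        # Only unrestricted orbitals carry separate beta energies.
        if self.kind == "unrestricted":
            return self.energies[self.norba :]
        return None

    @property
    def occs_a(self):
        if self.kind == "generalized":
            return None
        if self.kind == "restricted":
            # Assumption: alpha takes at most one electron per orbital.
            return [min(occ, 1.0) for occ in self.occs]
        return self.occs[: self.norba]

    @property
    def occs_b(self):
        if self.kind == "generalized":
            return None
        if self.kind == "restricted":
            # The remainder of each occupation goes to beta.
            return [occ - min(occ, 1.0) for occ in self.occs]
        return self.occs[self.norba :]


mo = MolecularOrbitalsSketch("restricted", 3, [-1.0, -0.5, 0.2], [2.0, 1.0, 0.0])
```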

Spherical contraction ordering documentation

I can't seem to figure out what the spherical contraction ordering is. The docstring in MolecularBasis says "Wikipedia-ordered real solid spherical harmonic", but a quick look at
https://en.wikipedia.org/wiki/Solid_harmonics and https://en.wikipedia.org/wiki/Spherical_harmonics and https://en.wikipedia.org/wiki/Atomic_orbital makes no mention of dc2, dc1, dc0, -ds1, -ds2, c2, c1, c0, -s1, -s2, s1, or s2. Am I supposed to read this by removing the d, c, and s characters and read the resulting integers as m value?

I remember talking with @tovrstra about this a while back but I can't seem to remember what these mean. Could the docstrings be updated to either explicitly state their forms or point towards a reference that does it?

Overlap Helper Module

The iodata.overlap_helper.py module contains a lot of pre-computed pure/Cartesian transformation matrices generated by horton.tools.harmonics.py. The module is hard to read in this form, so I was wondering why we didn't include the functions that generate the matrices directly?

Do something about slow test: test_carbon_gs_ae_uncontracted

This test is slow due to a combination of factors: coverage analysis, lots of computation in one test and slow implementation of the overlap integrals. We should look into options for speeding this one up, because it slows down the typical development cycle a lot.

Use atgradient instead of atforces

The two relevant formats with this information (fchk and wfx) use gradients instead of forces. Our goal is just to represent the data in such files as closely as possible, so gradients would be a better fit.

Write Molden files in double precision

The molden format does not specify how floats should be written. We can write orbital energies, occupation numbers and orbital coefficients in double precision scientific notation "%.15e".
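For illustration, this is what the suggested format string produces in Python. The helper name format_molden_float is hypothetical.

```python
def format_molden_float(value: float) -> str:
    """Format a float in scientific notation with 15 decimals in the mantissa."""
    return "%.15e" % value


# One line of space-separated values, as an example of the resulting layout.
line = " ".join(format_molden_float(x) for x in (0.1, -0.5, 2.0))
```

Note that "%.15e" yields 16 significant digits. If an exact round trip of every double-precision value were required, 17 significant digits ("%.16e") would be needed.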

Infrastructure for making input files for QC codes

We should have functions to easily write input files for a few quantum chemistry codes of interest (Gaussian, PSI4, ORCA). Suggested API:

from string import Template

# This is just a bare example template, probably lots of features can be added.
default_orca_template = """\
! ${lot} ${basis_name} ${runtype}

*xyz ${charge} ${spinmult}
${atom_lines}
*
"""

def write_input_orca(filename, iodata, template=None):
    """TODO docstring."""
    # some code here to translate iodata attributes to specific names and commands
    # of the QC code. At least the following should be present
    fields = vars(iodata)
    # which can be followed by some code that edits the fields dictionary.
    # For example, you may want to add an 'atom_lines' field that contains a multi-line
    # string with atomic elements, atomic coordinates, and possibly other things like
    # ghost-atom modifiers, etc.
    # Just make sure the values of the fields dictionary are not modified in-place.
    # In some cases, you want to use part of the filename as a field, e.g. to construct
    # the filename of the chk, molden or wfn file.
    # Finally, call a generic routine.
    if template is None:
        template = default_orca_template
    _write_input_generic(filename, fields, template)

def _write_input_generic(filename, fields, template):
    """Write a QC input file.
    
    Parameters
    ----------
    filename
        Filename of the input file to be written.
    fields
        Dictionary of parameters that could potentially be written into the input file.
    template
        A template string.
    """
    with open(filename, 'w') as f:
        f.write(Template(template).substitute(fields))

Required attributes for the IOData class are defined in #41.

Some remotely related code can be found here: https://github.com/theochem/horton/blob/master/horton/scripts/atomdb.py
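To make the atom_lines idea concrete, here is a small sketch that fills the bare ORCA template from above. The helper make_atom_lines, the column widths and the field values are illustrative assumptions; the coordinates are assumed to be already in the units that the QC code expects.

```python
from string import Template

ORCA_TEMPLATE = """\
! ${lot} ${basis_name} ${runtype}

*xyz ${charge} ${spinmult}
${atom_lines}
*
"""


def make_atom_lines(symbols, coords):
    """Return one line per atom: element symbol followed by x, y and z."""
    return "\n".join(
        f"{symbol:3s} {x:12.6f} {y:12.6f} {z:12.6f}"
        for symbol, (x, y, z) in zip(symbols, coords)
    )


# Example values for a hydrogen molecule; all of them are placeholders.
fields = {
    "lot": "HF",
    "basis_name": "STO-3G",
    "runtype": "SP",
    "charge": 0,
    "spinmult": 1,
    "atom_lines": make_atom_lines(
        ["H", "H"], [(0.0, 0.0, 0.0), (0.0, 0.0, 0.74)]
    ),
}
text = Template(ORCA_TEMPLATE).substitute(fields)
```

Template.substitute raises a KeyError when the template refers to a field that is missing, which gives an early failure for incomplete input data.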

Clean up iodata-convert script and use setuptools entrypoint.

This is just a reminder that it needs to be fixed. Entry points are more robust and make it easy to add unit tests for all code in the CLI scripts. While doing this, also fix the following:

  • move the exception handler for importing __version__ to the top of the module.
  • refresh the argparse strings; some of them are outdated.

Try sphinxcontrib-apidoc

This would allow us to just run sphinx-build without first running sphinx-apidoc. It allows sphinx-apidoc to be configured in doc/conf.py.
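A possible fragment for doc/conf.py, as a sketch. The option names are those of sphinxcontrib-apidoc; the paths and the excluded directory are assumptions about the repository layout.

```python
# Fragment for doc/conf.py. Paths are relative to the directory of conf.py.
extensions = ["sphinx.ext.autodoc", "sphinxcontrib.apidoc"]

apidoc_module_dir = "../iodata"   # package whose modules get documented
apidoc_output_dir = "pyapi"       # where the generated .rst files are written
apidoc_excluded_paths = ["test"]  # skip the unit tests (assumed location)
apidoc_separate_modules = True    # one page per module
```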

Add load & dump functions for atomic basis set definitions

This code was formerly present in HORTON2 in the module horton.gbasis.iobas. See https://github.com/theochem/horton/blob/master/horton/gbasis/iobas.py

The details should still be worked out:

  • Should we store such data also in the IOData object, or in its own class? In the latter case, how can we avoid code redundancy?
  • Which formats to support. Probably NWChem and Gaussian94 are sufficient?

Once this is in place, it would require little code to construct molecular basis sets within IOData.

Get rid of `# pylint: disable=no-member`

When this is implemented, we should also find a way to get rid of the # pylint: disable=no-member lines in most unit tests. This pylint warning is caused by the rather sloppy design of the IOData class. (Attributes are not initialized in __init__.)

To solve the no-member issue, one should in principle just initialize all attributes in the constructor __init__, but this would get very tedious for IOData. Some alternatives are discussed in the documentation of the attr package: https://www.attrs.org/en/stable/why.html This overview is mainly intended to advocate the usage of the attr package, so critical reading is advised. Still attr offers a lot of nice features for the IOData class, including:

Every file format should become a module with consistent function & attribute names

We should remove all file-format specific information and code in iodata/iodata.py because it is fragile. In the documentation of the file formats, doc/tech_ref_file_formats.rst, there is an even worse duplication of information, which is hard to maintain.

These issues can be solved by representing each file format by an object with:

  • name string with a human-readable name of the file format
  • load method (accepts file or filename) or None
  • dump method (accepts file or filename) or None
  • pattern an fnmatch pattern, e.g. to recognize an extension
  • read_always: list of names that are always read
  • read_optional: list of names that may be read
  • write_always: list of names required when writing a file
  • write_optional: list of names that can be stored when writing a file
  • ao_index_info: names of arrays whose indexes correspond to AOs, needed for fixing order and sign conventions. e.g. {'dm_full': (0, 1), 'orbitals': (0,)}
  • programs: list of known other programs that can read or write this file format
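One way to sketch such an object is a frozen dataclass with a lookup by file name pattern. The class name FileFormat, the function find_format and the example entries are hypothetical; only the field names come from the list above.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch
from typing import Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class FileFormat:
    name: str  # human-readable name of the format
    pattern: str  # fnmatch pattern, e.g. to recognize an extension
    load: Optional[Callable] = None
    dump: Optional[Callable] = None
    read_always: Tuple[str, ...] = ()
    read_optional: Tuple[str, ...] = ()
    write_always: Tuple[str, ...] = ()
    write_optional: Tuple[str, ...] = ()
    ao_index_info: Dict[str, Tuple[int, ...]] = field(default_factory=dict)
    programs: Tuple[str, ...] = ()


def find_format(filename, formats):
    """Return the first format whose pattern matches the file name."""
    for fmt in formats:
        if fnmatch(filename, fmt.pattern):
            return fmt
    raise ValueError(f"No file format matches {filename!r}")


# Example registry; the attribute lists are placeholders.
FORMATS = [
    FileFormat("XYZ", "*.xyz", read_always=("atcoords", "atnums")),
    FileFormat("Molden", "*.molden", ao_index_info={"orbitals": (0,)}),
]
fmt = find_format("water.xyz", FORMATS)
```

Because every format exposes the same fields, the generic load and dump code and the format documentation can both be driven by iterating over one list.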

Intuitive representation of molecular basis set

Our current representation of molecular Gaussian basis sets is not ideal:

  • not very intuitive
  • not very pythonic
  • mainly geared toward the FCHK format, but it requires non-intuitive manipulations for all other formats.

The Gaussian orbital basis can be described by a single attribute:

obasis of the type namedtuple('OBasisInfo', ['centers', 'shells', 'type', 'conventions'])

where:

  • centers is an array of shape (ncenter, 3)
  • shells is a list of objects of the type namedtuple('Shell', ['icenter', 'iatom', 'angmoms', 'exponents', 'contractions'])
  • type: any of 'cart' or 'pure'
  • conventions: a dictionary with as key an angular momentum character ('s', 'p', ...) and as value a list of basis function strings, e.g.
# alphabetically ordered Cartesian functions
['1'],
['x', 'y', 'z'],
['xx', 'xy', 'xz', 'yy', 'yz', 'zz'],
# wikipedia ordered real solid spherical harmonics
['s'],
['pc1', 'pc0', 'ps1'],
['dc2', 'dc1', 'dc0', 'ds1', 'ds2'],

Any of these strings can be prefixed with a minus sign (-) to denote sign conventions.

The fields of a Shell are:

  • icenter: an integer referring to a row of centers
  • iatom: an integer for the atom on which the shell is centered, or None. This could be convenient when implementing Pulay forces.
  • angmoms: a string where each character represents an angular momentum of one of the contractions in the shell. The length equals the number of contractions: len(angmoms)=ncon. Any of the following can be used: ["s", "p", "d", "f", "g", "h", "i", "k", "l", "m", "n", "o", "q", "r", "t", "u", "v", "w", "x", "y", "z", "a", "b", "c", "e"]
  • exponents: an array of exponents of primitives, with shape (nprim,).
  • contractions: an array with contraction coefficients, with shape (ncon, nprim). These coefficients assume that the primitives are L2-normalized, but contractions are not necessarily normalized.

Remarks:

  • A main downside of this idea is that IOData objects will become harder to store as npz files.
  • Attributes permutation and signs can be removed and should be replaced by convenience functions.

Example of an iterator over all basis functions:

def iterate_basis(obasis):
    for shell in obasis.shells:
        for angmom in shell.angmoms:
            for convention in obasis.conventions[angmom]:
                yield shell.icenter, shell.iatom, angmom, convention
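A small usage sketch of this iterator with the proposed namedtuples. All numerical values are made up, and plain lists are used where the proposal asks for arrays.

```python
from collections import namedtuple

OBasisInfo = namedtuple("OBasisInfo", ["centers", "shells", "type", "conventions"])
Shell = namedtuple(
    "Shell", ["icenter", "iatom", "angmoms", "exponents", "contractions"]
)


def iterate_basis(obasis):
    """Yield (icenter, iatom, angmom, convention) for every basis function."""
    for shell in obasis.shells:
        for angmom in shell.angmoms:
            for convention in obasis.conventions[angmom]:
                yield shell.icenter, shell.iatom, angmom, convention


# One center with a single SP shell: two contractions over two primitives.
obasis = OBasisInfo(
    centers=[[0.0, 0.0, 0.0]],
    shells=[Shell(0, 0, "sp", [1.0, 0.3], [[0.5, 0.5], [0.4, 0.6]])],
    type="cart",
    conventions={"s": ["1"], "p": ["x", "y", "z"]},
)
functions = list(iterate_basis(obasis))
```

The SP shell expands into one s function and three p functions, so functions has four entries.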
