Comments (4)
@fwmeng88 Actually, I just needed to get some geometry information out of a Gaussian log file, so it would be very useful to parse more information from it. As far as I can tell, we don't have any unmerged implementation, so please feel free to share your work.
from iodata.
Thanks for letting me know. @FarnazH
I have a working script for now, but I will refactor it and make a PR. The plan is to put all the information into the extra section of the dictionary.
Here is the script that I wrote:
import re

import pandas as pd
from iodata.utils import LineIterator

__author__ = "Fanwang Meng @ Ayers Lab"
__date__ = "2021.April.24"
__version__ = "0.0.2"


def extract_qm_log(log_fpath,
                   tag=None,
                   output_fname=None):
    """Extract quantum chemical descriptors from Gaussian optimization log file."""
    lit = LineIterator(log_fpath)
    data_dict = {}
    electro_spat_ext = []
    nuclear_repulsion_energies = []
    # R6Disp: Grimme-D2 Dispersion energy
    dispersion_energies = []
    # nuclear repulsion after empirical dispersion term
    nuclear_repulsion_dispersion = []
    # orbital eigenvalues; defined up front so the names always exist
    alpha_occ_eigenvalues = []
    alpha_virt_eigenvalues = []

    while True:
        try:
            line = next(lit).strip()
        except StopIteration:
            break

        # dipole moment
        if line.startswith("Dipole moment"):
            line = next(lit).strip()
            dipole_list = line.split()
            data_dict["dipole_x"] = float(dipole_list[1])
            data_dict["dipole_y"] = float(dipole_list[3])
            data_dict["dipole_z"] = float(dipole_list[5])
            data_dict["dipole_total"] = float(dipole_list[7])
        # quadrupole moment
        elif line.startswith("Quadrupole moment"):
            line = next(lit).strip()
            quadropole_list = line.split()
            data_dict["quadropole_xx"] = float(quadropole_list[1])
            data_dict["quadropole_yy"] = float(quadropole_list[3])
            data_dict["quadropole_zz"] = float(quadropole_list[5])
            line = next(lit).strip()
            quadropole_list = line.split()
            data_dict["quadropole_xy"] = float(quadropole_list[1])
            data_dict["quadropole_xz"] = float(quadropole_list[3])
            data_dict["quadropole_yz"] = float(quadropole_list[5])
        # electronic spatial extent (au)
        elif line.startswith("Electronic spatial extent"):
            electro_spat_ext.append(float(line.split()[-1]))
        # reset the lists so that only the last record is stored
        elif line.startswith("Population analysis using the SCF Density"):
            alpha_occ_eigenvalues = []
            alpha_virt_eigenvalues = []
        # The last value in the Alpha occ. eigenvalues gives the HOMO energy and the first
        # value in the Alpha virt. eigenvalues gives the LUMO energy.
        # HOMO
        elif line.startswith("Alpha  occ. eigenvalues --"):
            alpha_occ_eigenvalues.extend(line.split()[4:])
        # LUMO
        elif line.startswith("Alpha virt. eigenvalues --"):
            alpha_virt_eigenvalues.extend(line.split()[4:])
        # rotational constants
        elif line.startswith("Rotational constants (GHZ)"):
            rotational_constants = line.split()[3:]
            data_dict["rot_const_x"] = float(rotational_constants[0])
            data_dict["rot_const_y"] = float(rotational_constants[1])
            data_dict["rot_const_z"] = float(rotational_constants[2])
        # symmetry point group
        elif line.startswith("Full point group"):
            data_dict["point_group"] = line.split()[3]
        # nuclear repulsion energy in Hartrees
        elif line.startswith("nuclear repulsion energy"):
            nuclear_repulsion_energies.append(line.split()[-2])
        # R6Disp: Grimme-D2 Dispersion energy in Hartrees
        elif line.startswith("R6Disp: Grimme-D2 Dispersion energy"):
            dispersion_energies.append(line.split()[-2])
        # nuclear repulsion after empirical dispersion term
        elif line.startswith("Nuclear repulsion after empirical dispersion term"):
            nuclear_repulsion_dispersion.append(line.split()[-2])
        # PCM non-electrostatic energy
        elif line.startswith("PCM non-electrostatic energy"):
            data_dict["PCM_non_electrostatic_energy"] = float(line.split()[-2])
        # nuclear repulsion after PCM non-electrostatic terms
        elif line.startswith("Nuclear repulsion after PCM non-electrostatic terms"):
            data_dict["nuclear_repulsion_after_pcm"] = float(line.split()[-2])
        # KE, PE and EE
        elif line.startswith("KE="):
            data_dict["KE"] = float(line.split()[1].replace("D", "e"))
            data_dict["PE"] = float(line.split()[2].split("=")[-1].replace("D", "e"))
            data_dict["EE"] = float(line.split()[-1].replace("D", "e"))
        # SMD-CDS (non-electrostatic) energy, kcal/mol
        elif line.startswith("SMD-CDS (non-electrostatic) energy"):
            data_dict["SMD-CDS"] = float(line.split()[-1])
        # GePol: Number of generator spheres
        elif line.startswith("GePol: Number of generator spheres"):
            data_dict["GePol_num_gen_spheres"] = int(line.split()[-1])
        # GePol: Total number of spheres
        elif line.startswith("GePol: Total number of spheres "):
            data_dict["GePol_total_num_spheres"] = int(line.split()[-1])
        # GePol: Number of exposed spheres
        elif line.startswith("GePol: Number of exposed spheres"):
            data_dict["GePol_num_exposed_spheres"] = int(re.split(r"=|\(", line)[1])
        # GePol: Number of points
        elif line.startswith("GePol: Number of points ="):
            data_dict["GePol_num_points"] = int(line.split()[-1])
        # GePol: Average weight of points
        elif line.startswith("GePol: Average weight of points"):
            data_dict["GePol_average_weight"] = float(line.split()[-1])
        # GePol: Minimum weight of points
        elif line.startswith("GePol: Minimum weight of points"):
            data_dict["GePol_minimum_weight"] = float(line.split()[-1].replace("D", "e"))
        # GePol: Maximum weight of points
        elif line.startswith("GePol: Maximum weight of points"):
            data_dict["GePol_maximum_weight"] = float(line.split()[-1].replace("D", "e"))
        # GePol: Number of points with low weight
        elif line.startswith("GePol: Number of points with low weight"):
            data_dict["GePol_num_points_low_weight"] = int(line.split()[-1])
        # GePol: Fraction of low-weight points (<1% of avg)
        elif line.startswith("GePol: Fraction of low-weight"):
            data_dict["GePol_frac_low_weight"] = float(line.split()[-1].strip('%')) / 100
        # GePol: Cavity surface area, ang**2
        elif line.startswith("GePol: Cavity surface area"):
            data_dict["GePol_cavity_surface"] = float(line.split()[-2])
        # GePol: Cavity volume, ang**3
        elif line.startswith("GePol: Cavity volume"):
            data_dict["GePol_cavity_volume"] = float(line.split()[-2])

    # keep only the last record of the quantities that are printed repeatedly
    data_dict["electro_spat_ext"] = electro_spat_ext[-1]
    data_dict["HOMO"] = float(alpha_occ_eigenvalues[-1])
    data_dict["LUMO"] = float(alpha_virt_eigenvalues[0])
    data_dict["grimme_D2_dispersion_energy"] = float(dispersion_energies[-1])
    data_dict["nuclear_repulsion_energy"] = float(nuclear_repulsion_energies[-1])
    data_dict["nuclear_repulsion_dispersion"] = float(nuclear_repulsion_dispersion[-1])

    if tag is not None:
        data_dict = {k + "_" + tag: v for (k, v) in data_dict.items()}

    df = pd.DataFrame(data_dict, index=[0])
    if output_fname:
        if output_fname.endswith(".csv"):
            df.to_csv(output_fname, sep=",", index=False)
        elif output_fname.endswith(".xlsx") or output_fname.endswith(".xls"):
            df.to_excel(output_fname, index=False)

    return data_dict, df
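To make the token indexing in the script easier to follow, here is a small self-contained sketch of the two patterns it relies on: picking values out of a "label= value" line by position, and converting Fortran-style exponents (D instead of e). The sample lines only mimic the layout the script expects and their values are made up; `fortran_float` is a hypothetical helper, not part of iodata.

```python
def fortran_float(token):
    """Convert a Fortran-style number such as '7.5D+01' to a Python float."""
    return float(token.replace("D", "e"))


# Illustrative lines in the layout the script expects; the values are made up.
dipole_line = "X=   0.0000   Y=   0.0000   Z=  -1.8550  Tot=   1.8550"
energy_line = "KE= 7.5D+01 PE=-2.0D+02 EE= 3.7D+01"

# After split(), labels sit at even positions and values at odd positions.
tokens = dipole_line.split()
dipole = {
    "dipole_x": float(tokens[1]),
    "dipole_y": float(tokens[3]),
    "dipole_z": float(tokens[5]),
    "dipole_total": float(tokens[7]),
}

# "PE=-2.0D+02" stays a single token because of the minus sign, so the
# value has to be taken from the part after "=".
fields = energy_line.split()
energies = {
    "KE": fortran_float(fields[1]),
    "PE": fortran_float(fields[2].split("=")[-1]),
    "EE": fortran_float(fields[-1]),
}
```

Position-based indexing like this is compact but depends on the exact column layout, which is one reason to refactor before making a PR.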
@fwmeng88 This would be a welcome addition!
I have a small question, but it is likely not an issue: long chained elif statements can become slow because each line may be compared against every condition in the chain. I'm actually not sure how to do it any faster. Dictionary lookups could be faster for selecting the right parsing code for a line, but one would need to know which part of the line to use as the key, e.g.
def func1(line):
    # do something with line
    ...
def func2(line):
    # do something different with line
    ...
funcs = {"begin1": func1, "begin2": func2}
funcs[line[:6]](line)
This would not have a cost that scales linearly with the number of such functions.
The problem with my suggestion is that I don't see how to combine it with line.startswith.
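One possible way to combine a dictionary lookup with prefix matching is to compile all prefixes into a single regular expression and use the matched prefix as the dictionary key. The sketch below is only an illustration under that assumption: the handler names, `HANDLERS`, `PREFIX_RE` and `dispatch` are hypothetical, the sample lines are made up, and whether this is actually faster than the elif chain would need to be measured.

```python
import re


def parse_point_group(line, data):
    # same indexing as in the script above
    data["point_group"] = line.split()[3]


def parse_spatial_extent(line, data):
    data["electro_spat_ext"] = float(line.split()[-1])


# Map each line prefix to the function that parses such a line.
HANDLERS = {
    "Full point group": parse_point_group,
    "Electronic spatial extent": parse_spatial_extent,
}

# One alternation of all prefixes; longer prefixes come first so that a
# longer prefix wins over a shorter one that it starts with.
PREFIX_RE = re.compile(
    "|".join(re.escape(p) for p in sorted(HANDLERS, key=len, reverse=True))
)


def dispatch(line, data):
    """Call the handler whose prefix starts the line; return True if one matched."""
    match = PREFIX_RE.match(line)
    if match is None:
        return False
    HANDLERS[match.group(0)](line, data)
    return True


# Made-up sample lines, only to show the control flow.
data = {}
results = [
    dispatch(line, data)
    for line in [
        "Full point group                 C2V     NOp   4",
        "Electronic spatial extent (au):  <R**2>=             19.0000",
        "Some unrelated line",
    ]
]
```

Handlers that need to read follow-up lines, as the dipole and quadrupole branches do, would also need access to the line iterator, so their signature would have to be extended.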
Thanks for the suggestions. @tovrstra
In my experience, this parsing is fast enough: it takes about 5-10 seconds. I think I will keep this style for now and revisit it if it becomes a bottleneck.