doyle-lab-ucla / auto-qchem


77.0 5.0 14.0 9.34 MB

Auto-QChem is an automated workflow for the generation and storage of DFT calculations for organic molecules.

Home Page: https://doyle-lab-ucla.github.io/auto-qchem/

License: GNU General Public License v3.0

Python 99.71% Shell 0.21% JavaScript 0.08%

cheminformatics dft-calculations automation feature machine-learning

auto-qchem's Introduction

Quick links

Installation instructions

Database user instructions

Code base documentation

Repo with example jupyter notebooks

Database link

Auto-QChem paper (publisher), free pdf

Note to external users

Auto-QChem currently supports the Slurm scheduler at Princeton via slurm_manager.py and SGE/UGE-type schedulers at UCLA via sge_manager.py.

If you are an external user and your computational cluster uses either a Slurm or an SGE scheduler, you only need to make minor changes to the corresponding .py file so that you can log in with your institution's credentials.

If you are an external user and your computational cluster uses a different scheduler, you can still adapt one of these .py files to your cluster, but significant changes may be required, and unfortunately we won't be able to help without access to your cluster.

auto-qchem's People


andrewolal flyingdorothia aspirincode jugoetz beef-broccoli yujikaiya rnaimehaom cvalsecchi adrianm0 turkialturaifi catsci dkesada tjxj yinpokwong

auto-qchem's Issues

Consider adding a description to the repo and some tags

Not something I can add as a PR, but e.g. "Machine learning tools for the automated computation of chemical, thermochemical, and steric features of molecules" (adapted the snippet from https://github.com/b-shields/auto-QChem)

For tags, maybe add:

https://github.com/topics/cheminformatics

Unrelated, but for extra visibility you could also submit a PR to https://github.com/hsiaoyi0504/awesome-cheminformatics

How to do quantum computation when there are 3 or more non-bonded fragments?

molecule.py can only handle up to two non-bonded fragments.

For molecules with 3 or more non-bonded fragments, such as K2CO3, how can we do quantum calculations?
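For reference, counting non-bonded fragments is a connected-components problem on the bond graph, so the count itself is not inherently limited to two. Below is a minimal sketch, assuming atoms are numbered 0..n-1 and bonds are given as index pairs. `count_fragments` is a hypothetical helper, not part of molecule.py, and it does not address how the DFT input for a multi-fragment system should then be set up.

```python
from typing import Sequence, Tuple


def count_fragments(n_atoms: int, bonds: Sequence[Tuple[int, int]]) -> int:
    """Count non-bonded fragments (connected components) using union-find."""
    parent = list(range(n_atoms))

    def find(i: int) -> int:
        # Walk up to the root of i's fragment, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in bonds:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two fragments

    # Each distinct root corresponds to one fragment.
    return len({find(i) for i in range(n_atoms)})


# K2CO3 as two K atoms (0, 1) plus a carbonate unit (C=2, O=3..5).
print(count_fragments(6, [(2, 3), (2, 4), (2, 5)]))  # 3 fragments: K, K, CO3
```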

mongoDB multiple client connections for a single execution

This appears to be the case, and it may be why the app regularly crashes on AWS. Will try to enforce a single client that handles all DB operations.
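One way to enforce a single client is to create it lazily, once per process, and hand the same instance to every caller. pymongo's `MongoClient` is thread-safe and maintains its own connection pool, so one instance per process is the usual pattern. The sketch below is generic; `SharedClient` is a hypothetical name, and the factory you pass in (for example `lambda: MongoClient(host, port)`, with `host` and `port` as placeholders) is an assumption about how the app builds its client.

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class SharedClient(Generic[T]):
    """Create a client on first use and return that same instance afterwards."""

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._client: Optional[T] = None
        self._lock = threading.Lock()

    def get(self) -> T:
        # Double-checked locking: only the first caller pays for creation,
        # and concurrent first calls cannot create two clients.
        if self._client is None:
            with self._lock:
                if self._client is None:
                    self._client = self._factory()
        return self._client
```

All DB operations would then go through a single module-level holder (for example `db = SharedClient(lambda: MongoClient(host, port))` and `db.get()` at each call site) instead of constructing a new client per request.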

jobs

How to compute dft on computing cluster?

When I run sm.submit_jobs() in the Tutorial_creating_descriptor_sets.ipynb notebook, the following error occurs:

UnexpectedExit: Encountered a bad command exit code!

Command: 'cd /scratch/a1422a01/gaussian && sbatch /scratch/a1422a01/gaussian/C15H24N4O6S2_e5f2_conf_0.sh'

Exit code: 1

Stdout:

Stderr:

sbatch: error: Batch job submission failed: Unspecified error

On the computing cluster, I found that two files had been created:

C15H24N4O6S2_e5f2_conf_0.gjf
C15H24N4O6S2_e5f2_conf_0.sh

The C15H24N4O6S2_e5f2_conf_0.sh file contains the following:

#!/bin/bash
#SBATCH -N 1
#SBATCH --ntasks-per-node=8
#SBATCH -t 23:59:00
#SBATCH --constraint="skylake"

input=C15H24N4O6S2_e5f2_conf_0

#run the code
cd /scratch/a1422a01/gaussian
g16 ${input}.gjf

I would like the C15H24N4O6S2_e5f2_conf_0.sh file to look like this instead:

#!/bin/sh
#SBATCH -J test
#SBATCH -p ivy_V100_2 # Neurons use tesla_node partition.
#SBATCH -N 1
#SBATCH -n 20
#SBATCH -o gautest.o%j # Define the filename of the standard output
#SBATCH -e gautest.e%j # Define filename for standard error
#SBATCH -t 12:00:00
#SBATCH --gres=gpu:2
#SBATCH --comment gaussian
module load gaussian/g16

input=C15H24N4O6S2_e5f2_conf_0

#run the code
cd /scratch/a1422a01/gaussian
g16 ${input}.gjf

Could you help me modify the code in slurm_manager.py to produce this?

I tried modifying the latter part of slurm_manager.py, but without success.
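A possible approach is to stop hard-coding the `#SBATCH` header and build the script from a list of options that each site can configure. The sketch below is a hypothetical helper (`build_slurm_script` and its parameters are not auto-qchem's API); it only reproduces the script layout shown above, and wiring it into slurm_manager.py where the .sh file is written is left as an assumption about that file's structure.

```python
from typing import Iterable, Sequence


def build_slurm_script(input_name: str, workdir: str,
                       sbatch_options: Sequence[str],
                       modules: Iterable[str] = (),
                       shebang: str = "#!/bin/bash") -> str:
    """Render a Slurm batch script for one Gaussian input (hypothetical helper)."""
    lines = [shebang]
    # One '#SBATCH' line per option, passed through verbatim.
    lines += [f"#SBATCH {opt}" for opt in sbatch_options]
    # Site-specific environment setup, e.g. 'module load gaussian/g16'.
    lines += [f"module load {name}" for name in modules]
    lines += [
        "",
        f"input={input_name}",
        "",
        "#run the code",
        f"cd {workdir}",
        "g16 ${input}.gjf",  # plain string: ${input} is expanded by the shell
    ]
    return "\n".join(lines) + "\n"


# Reproduces the header requested above for this cluster.
script = build_slurm_script(
    "C15H24N4O6S2_e5f2_conf_0", "/scratch/a1422a01/gaussian",
    ["-J test", "-p ivy_V100_2", "-N 1", "-n 20",
     "-o gautest.o%j", "-e gautest.e%j", "-t 12:00:00",
     "--gres=gpu:2", "--comment gaussian"],
    modules=["gaussian/g16"],
    shebang="#!/bin/sh",
)
print(script)
```

Keeping the options as plain strings means any sbatch flag your site needs can be passed through without further code changes.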

How to get all the features provided in your previous paper

Our lab has Gaussian 16, so I ran opt, freq, and TD calculations using the Gaussian .gjf settings you provided (I changed only the molecule; all other settings are unchanged).

So far, I have used the gaussian_log_extractor you provided to obtain a partial set of features.

What else needs to be done to convert this into a complete feature DataFrame (or CSV)?

I would like molecule features like those in this file:
https://github.com/b-shields/edbo/blob/master/experiments/data/aryl_amination/aryl_halide_dft.csv

My current result is a dict like this:

{
    'descriptors': {'number_of_atoms': 12, 'charge': 0, 'multiplicity': 1......},
    'atom_descriptors': {'X': [-2.241815, -0.719692, 0.0, 1.460496, 2.......},
    'modes': {'Frequencies': [99.7815, 166.9938, 244.151, 347.9461, 393.1......},
    'mode_vectors': {'mode_number': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,......},
    'transitions': {'ES_transition': [161.05, 154.99, 154.72, 134.54, 132.51, 127.49, 127.4......},
    'labels': ['C', 'C', 'C', 'C', 'N', 'H', 'H', 'H', 'H', 'H', 'H', 'H']
}

Previous paper by your group: Bayesian reaction optimization as a tool for chemical synthesis
Corresponding Github: https://github.com/b-shields/edbo

Thank you for your kind assistance.
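For orientation, one plausible step toward a one-row-per-molecule table is to flatten the extractor's dict: keep the scalar `descriptors` as columns and add per-atom columns only for the atoms you care about. This is a sketch under assumptions: `flatten_descriptors` and `rows_to_csv` are hypothetical helpers, the `<label><index>_<descriptor>` column naming is made up, `modes`, `mode_vectors`, and `transitions` are ignored, and it does not claim to reproduce the exact columns of the edbo CSV.

```python
import csv
import io
from typing import Dict, List, Sequence


def flatten_descriptors(result: dict, atom_indices: Sequence[int]) -> Dict[str, object]:
    """Turn one extractor result into a flat {column: value} row."""
    row: Dict[str, object] = dict(result["descriptors"])  # molecule-level scalars
    labels = result["labels"]
    for i in atom_indices:
        tag = f"{labels[i]}{i + 1}"  # e.g. 'N5' for the 5th atom (1-based)
        for name, values in result["atom_descriptors"].items():
            row[f"{tag}_{name}"] = values[i]
    return row


def rows_to_csv(rows: List[Dict[str, object]]) -> str:
    """Write rows to CSV text, keeping columns in first-seen order."""
    fieldnames = list(dict.fromkeys(key for row in rows for key in row))
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)  # missing keys -> ""
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

With one such row per molecule, the resulting CSV (or `pandas.DataFrame(rows)`) has one line per molecule, which is the general shape of the edbo descriptor files.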

Slurm manager timeout error

When running the code in the Tutorial_creating_descriptor_sets.ipynb notebook (from the jupyter-notebooks repo), a ServerSelectionTimeoutError is raised.

I apologize in advance: I am a chemist with little programming experience, so please let me know if more detail is needed.
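ServerSelectionTimeoutError is raised by pymongo when it cannot reach a MongoDB server within its selection timeout, so a first diagnostic is to check whether the database host and port are reachable from your machine at all (firewall, VPN, or a server that is not running). A minimal sketch using only the standard library; `can_reach` is a hypothetical helper, and the host and port are whatever your auto-qchem DB configuration points at.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, DNS failure, ...
        return False


# Example (placeholders): can_reach("your-db-host", 27017)
```

If this returns False, the problem lies in network access to the database rather than in the notebook code.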

DB Interface: downloading descriptors downloads all molecules in database

After searching for a molecule, the download descriptors page downloads an Excel file containing all 17275 molecules rather than just the one searched for.

dependency issue that involves python 3.7, openbabel 2.4 and numpy 1.21

numpy 1.21 has a vulnerability; 1.22 is recommended, but it requires Python >= 3.8.

However, openbabel 2.4 seems to be more stable than openbabel 3.x, and openbabel 2.4 appears to work only with Python <= 3.7.

Set Python to >= 3.7 for now. In the future, we may move to Python >= 3.8, which requires openbabel 3.x and needs more stability testing.
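One concrete difference when moving from openbabel 2.4 to 3.x is the import path: 3.x ships pybel inside the `openbabel` package (`from openbabel import pybel`), while 2.x installs it as a top-level `pybel` module. A small compatibility shim lets the same code run on either version during testing. This is a sketch; `import_pybel` is a hypothetical helper, not something auto-qchem currently provides.

```python
import importlib
from types import ModuleType


def import_pybel() -> ModuleType:
    """Return pybel from Open Babel 3.x if available, else from Open Babel 2.x."""
    try:
        # Open Babel 3.x: pybel lives inside the openbabel package.
        return importlib.import_module("openbabel.pybel")
    except ImportError:
        # Open Babel 2.x: pybel is a top-level module.
        return importlib.import_module("pybel")
```

Usage: `pybel = import_pybel()` followed by calls such as `pybel.readstring("smi", "CCO")`, which exist in both major versions.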

DB Descriptors: duplicate molecules results in sum of molecule descriptors not Boltzmann average

When duplicate molecules exist, downloading Boltzmann-averaged descriptors returns the sum of both molecules' descriptors as a single molecule.

For example, a search for triethylamine SMILES: CCN(CC)CC results in two molecules:
https://autoqchem.org/?tag=ALL&solvent=ALL&functional=ALL&basis_set=ALL&substructure=&smiles=CCN%28CC%29CC

Downloading the global descriptors of these as a Boltzmann average gives a single molecule whose descriptors are the sum of both molecules' values rather than an average. For example, number_of_atoms is 44 instead of 22, and E is -584.151, the sum of -291.95 and -292.2.

Please feel free to correct me if my usage is wrong or this is expected functionality. Fantastic work by the way!
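For comparison, a Boltzmann average uses normalized weights w_i = exp(-(E_i - E_min)/kT) / Σ_j exp(-(E_j - E_min)/kT), so a property that is identical across conformers (such as number_of_atoms) must average to itself; getting 44 suggests two records were summed or the weights were not normalized. The sketch below only illustrates the expected behavior and is not the database's code. Assumptions: energies are in Hartree, T = 298.15 K, and conformers are grouped by a hypothetical molecule id so that separate records are averaged separately rather than merged.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

K_B_HARTREE_PER_K = 3.166811563e-6  # Boltzmann constant in Hartree/K


def boltzmann_weights(energies: List[float], temperature: float = 298.15) -> List[float]:
    """Normalized Boltzmann weights; energies in Hartree."""
    e_min = min(energies)  # shift by the minimum to avoid overflow
    kt = K_B_HARTREE_PER_K * temperature
    factors = [math.exp(-(e - e_min) / kt) for e in energies]
    total = sum(factors)
    return [f / total for f in factors]  # weights sum to 1


def boltzmann_average(conformers: List[Tuple[str, float, Dict[str, float]]],
                      temperature: float = 298.15) -> Dict[str, Dict[str, float]]:
    """conformers: (molecule_id, energy, descriptors) tuples, one per conformer."""
    by_molecule: Dict[str, List[Tuple[float, Dict[str, float]]]] = defaultdict(list)
    for mol_id, energy, descriptors in conformers:
        by_molecule[mol_id].append((energy, descriptors))

    averaged = {}
    for mol_id, items in by_molecule.items():
        weights = boltzmann_weights([e for e, _ in items], temperature)
        names = items[0][1].keys()
        averaged[mol_id] = {
            name: sum(w * d[name] for w, (_, d) in zip(weights, items))
            for name in names
        }
    return averaged
```

Whether duplicate entries for the same SMILES should be merged into one average or kept as separate molecules is a design decision; this sketch keeps them separate by id.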

Unable to connect to the server

To whom it may concern,

Hi! I'm a master's student at Imperial College London doing a research project involving auto-qchem. I have installed the autoqchem environment on my laptop, but I am unable to connect to your server using 'sm = slurm_manager(user='zuranski', host='della.princeton.edu')'. Do you still allow external users? Do I need to do something else to connect?
Here is a link to a video of the error:
https://drive.google.com/file/d/1wgDmkRbFidGKhxFOaYcYcaBwx_IvjG6g/view?usp=share_link

Sincerely,
Chenlin Cai
01708103

Error caused by capitalization (Density vs. density)

When using get_descriptors(), the error below occurred:

--> 258 string = re.search("Population.*?SCF density.*?(\sAlph.*?)\n\s*Condensed", text, re.DOTALL).group(1)
259 if self.descriptors['multiplicity'] == 1:
260 energies = [re.findall(f"({float_or_int_regex})", s_part) for s_part in string.split("Alpha virt.", 1)]

AttributeError: 'NoneType' object has no attribute 'group'

In the Gaussian output in our environment (Windows, G16), the target text reads:

Population analysis using the SCF Density

So I changed the line to:

string = re.search("Population.*?SCF [Dd]ensity.*?(\sAlph.*?)\n\s*Condensed", text, re.DOTALL).group(1)

and the error was fixed.
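The same fix can be packaged so that a missing match produces None (or a clear error) instead of `AttributeError: 'NoneType' object has no attribute 'group'`. This sketch reuses the corrected pattern from above as a raw string; `extract_orbital_block` is a hypothetical helper, not the extractor's actual method.

```python
import re
from typing import Optional

# "[Dd]ensity" accepts both "SCF density" and "SCF Density" output variants.
ORBITAL_BLOCK = re.compile(
    r"Population.*?SCF [Dd]ensity.*?(\sAlph.*?)\n\s*Condensed", re.DOTALL
)


def extract_orbital_block(text: str) -> Optional[str]:
    """Return the orbital-energy block, or None if the log lacks it."""
    match = ORBITAL_BLOCK.search(text)
    return match.group(1) if match else None
```

Passing `re.IGNORECASE` would also work, but it makes the whole pattern case-insensitive, whereas `[Dd]` relaxes only the one word known to vary.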

Abnormal molecule structure

I installed auto-qchem following the installation instructions (Windows) and ran the example notebook "framework_functionality_test.ipynb", but the generated chemical structure is highly distorted.

Is this a common problem when using Open Babel for conformer generation?
(I have no experience using Open Babel for this purpose.)

How can I fix it? (By installing a different version of Open Babel, or by using Linux instead of Windows?)

Here is my environment.

openbabel v2.4.1
Python 3.7
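To make "highly distorted" concrete when comparing environments, one quick check is to compute bonded distances from the generated coordinates and flag values far outside typical covalent bond lengths. Sketch only: `suspicious_bonds` is hypothetical, the coordinates are assumed to be in Å (e.g. copied from the generated .gjf), and the 0.8 to 2.2 Å window is a rough assumption that suits common organic bonds (C-H about 1.09 Å, C-C about 1.54 Å) but not necessarily bonds to heavier elements. It avoids `math.dist` so it also runs on Python 3.7.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def suspicious_bonds(coords: Sequence[Coord], bonds: Sequence[Tuple[int, int]],
                     min_len: float = 0.8,
                     max_len: float = 2.2) -> List[Tuple[int, int, float]]:
    """Return (atom_a, atom_b, length) for bonds outside [min_len, max_len] Å."""
    flagged = []
    for a, b in bonds:
        length = math.sqrt(sum((p - q) ** 2 for p, q in zip(coords[a], coords[b])))
        if not (min_len <= length <= max_len):
            flagged.append((a, b, round(length, 3)))
    return flagged
```

Running the same molecule through both environments and comparing the flagged bonds would help show whether the Windows build of Open Babel is the source of the distortion.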

autoqchem.openbabel_functions

Hi,

I am trying to run the "Fetch log files" step, but I get the following error message:

NameError Traceback (most recent call last)
in
1 # add fs_name to the table
----> 2 mol = input_to_OBMol(can, "string", "smi")
3 mol_df['file_base_name'] = mol_df['can'].map(mol_fs_name)

NameError: name 'input_to_OBMol' is not defined

I checked your GitHub repo but cannot find the openbabel_functions file. How can I fix this?

Thanks
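As a stopgap while the missing module is sorted out, a SMILES string can be read into an `OBMol` directly with Open Babel's `OBConversion`. This is a hypothetical stand-in (`smiles_to_obmol`), not a reconstruction of `input_to_OBMol`, whose exact behavior is not shown here; it tries the Open Babel 3.x import path first and falls back to 2.x.

```python
def smiles_to_obmol(smiles: str):
    """Parse a SMILES string into an Open Babel OBMol (hypothetical helper)."""
    try:
        from openbabel import openbabel as ob  # Open Babel 3.x
    except ImportError:
        import openbabel as ob  # Open Babel 2.x

    conv = ob.OBConversion()
    if not conv.SetInFormat("smi"):
        raise ValueError("SMILES input format is not available")
    mol = ob.OBMol()
    if not conv.ReadString(mol, smiles):
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    return mol
```

Note that `mol_fs_name` in the same cell must also be defined for that notebook line to run.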

Gaussian inputs for different computational parameters have the same name

Hi,
When generating molecular descriptors for molecules in different solvents, the input generator names each Gaussian input using only the InChIKey and the conformer index. As a result, when I launch computations of the same molecule in different solvents, the input files overwrite each other, which breaks the retrieval and upload steps. It might be possible to add the solvent or DFT level of theory to the input name to avoid this, but I am not sure whether that would cause problems in the job retrieval process.
All my best,
Jules
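A sketch of the suggested approach: fold the calculation settings into the base name so that each (molecule, conformer, settings) combination gets a distinct file. Hashing the settings keeps names short and free of characters that some solvent or method names might introduce. `input_base_name` and its parameters are hypothetical; whatever scheme is chosen, the job-retrieval code would have to build names with the same function to find the files again.

```python
import hashlib


def input_base_name(inchikey: str, conf_idx: int, functional: str,
                    basis_set: str, solvent: str = "None") -> str:
    """File base name that differs whenever the level of theory or solvent differs."""
    settings = f"{functional}/{basis_set}/{solvent}".lower()
    tag = hashlib.sha1(settings.encode("utf-8")).hexdigest()[:6]  # short settings tag
    return f"{inchikey}_{tag}_conf_{conf_idx}"
```

A 6-character hex tag is ample for distinguishing a handful of setting combinations; the full settings should still be stored alongside the job record.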

Known bug for v1.2.6

Bad port number for the DB connection; it needs to be converted to an int.
Not fixed because of the transition to Python 3.8 and auto-qchem 1.3.1, but if older versions are needed, this will have to be addressed in helper_classes.py.
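pymongo expects the port as an int, so a port read from a config file as a string has to be converted first. A minimal sketch of the fix for anyone patching an older version; `parse_port` is a hypothetical helper, and where exactly the port is read in helper_classes.py is not shown here.

```python
def parse_port(value) -> int:
    """Coerce a DB port from config (often a str) to a validated int."""
    port = int(value)  # ValueError for non-numeric strings such as "abc"
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return port
```

Usage: pass `parse_port(config_value)` instead of the raw config value when constructing the MongoDB client.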

500 Internal Server Error on Database Website

When visiting https://autoqchem.org/ the navbar is displayed but no content on the site appears.

Checking the console log shows this:

500 Internal Server Error
Internal Server Error
The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.
message: "Callback error updating page-content.children"