This README also serves as a design document. Any heading marked TODO may as well be wishful thinking.
cmcl is a high-level library for aggregating chemical datasets and computing a variety of features.
It includes a simple user interface for aggregating data from computational experiments and committing the results to a local database. It also includes tools for sharing this database with collaborators.
cmcl is built around pandas. This keeps data handling efficient as it passes through the queries, transformations, and mapping operations that the succinct and powerful pandas API enables.
- [ ] VASP parser
- [ ] sqlite rclone interface
- [ ] cmcl database
Tabular records of chemical formulas and associated observations need only be loaded as a dataframe; cmcl’s formula parser can then convert the formula strings into equivalent numerical descriptors.
- [X] descriptors->relational queries via pandas
- [ ] descriptors->Composition objects->pymatgen feature descriptors
- [ ] structure object generation and feature calculations
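As a rough illustration of the formula-to-descriptor step described above, the sketch below parses flat formula strings into per-element amounts using only the standard library. `parse_formula` and `composition_table` are hypothetical names chosen for this example, not cmcl's actual API, and nested groups such as parentheses are deliberately not handled.

```python
import re
from collections import Counter
from typing import Dict, List

# An element symbol followed by an optional integer or decimal amount.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?")


def parse_formula(formula: str) -> Dict[str, float]:
    """Return {element: amount} for a flat formula such as 'CsPbI3'.

    Hypothetical helper for illustration only; not part of cmcl.
    """
    counts: Counter = Counter()
    for symbol, amount in _TOKEN.findall(formula):
        # A missing amount means the element appears once.
        counts[symbol] += float(amount) if amount else 1.0
    return dict(counts)


def composition_table(formulas: List[str]) -> List[Dict[str, float]]:
    """Build one row per formula with one column per element (0.0 if absent)."""
    parsed = [parse_formula(f) for f in formulas]
    elements = sorted({el for row in parsed for el in row})
    return [{el: row.get(el, 0.0) for el in elements} for row in parsed]


rows = composition_table(["CsPbI3", "CsPbBr3", "Cs0.5Rb0.5PbI3"])
```

Because `rows` is a list of dictionaries with identical keys, it can be passed directly to `pandas.DataFrame(rows)` to obtain a composition dataframe that supports ordinary column queries.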
These compositions, themselves a dataframe, can then be further processed into other feature sets using a variety of libraries:
- mendeleev cite:&mendeleev2014
- matminer
- DScribe cite:&himanen-2020-dscrib
- MEGNet cite:&chen-2019-graph-networ
- [ ] compute the spectroscopic limited maximum efficiency (SLME) for photovoltaic absorbers, using the SL3ME implementation of cite:&yu-2012-ident-poten by @Idwillia on GitHub
- [ ]
cmcl exposes itself as a command-line tool for aggregating computational data from VASP and Quantum ESPRESSO experiment directory trees.
cmcl includes tools for randomly generating formulas from a set of rules, although it is usually better to plan an experiment systematically.
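A minimal sketch of rule-based formula generation, set beside its systematic counterpart, might look like the following. The ABX3 site lists and the function names are assumptions made for illustration; they are not cmcl's actual rules or API.

```python
import itertools
import random

# Illustrative rule set for ABX3 compounds (assumed, not taken from cmcl):
# each site may be occupied by any one of the listed species.
SITE_RULES = {
    "A": ["Cs", "Rb", "K"],
    "B": ["Pb", "Sn", "Ge"],
    "X": ["I", "Br", "Cl"],
}


def random_formula(rng):
    """Draw one species per site and assemble an ABX3 formula string."""
    a = rng.choice(SITE_RULES["A"])
    b = rng.choice(SITE_RULES["B"])
    x = rng.choice(SITE_RULES["X"])
    return f"{a}{b}{x}3"


def systematic_formulas():
    """Enumerate every A/B/X combination allowed by the rules."""
    combos = itertools.product(
        SITE_RULES["A"], SITE_RULES["B"], SITE_RULES["X"]
    )
    return [f"{a}{b}{x}3" for a, b, x in combos]


# Passing a seeded generator makes the random draw reproducible.
sample = random_formula(random.Random(0))
```

Enumerating the full grid with `systematic_formulas()` covers the rule space exactly once, which is why a planned sweep is usually preferable to random draws when the space is small enough to enumerate.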
cmcl includes some pretrained models, which may be used to infer the properties of chemistries.
cmcl is very early in development.
yogi can be installed into a standard Python environment. It is a Poetry project and may be installed using pip.
Then run your Python process or Jupyter kernel of choice and enjoy.
Yes Please.
To create a clean development environment, fork or clone the repository; the poetry.lock file will take care of dependency management.
$ cd /to/experiment/dir
$ python
>>> cmcl aggregate *
For collecting VASP results
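One way the aggregation step could locate calculations is to walk the experiment tree and keep every directory that holds a standard VASP output file. The sketch below assumes that approach; `find_vasp_runs` is a hypothetical helper, not cmcl's implementation, and it does not attempt to recognise Quantum ESPRESSO outputs, whose file names are chosen by the user.

```python
import tempfile
from pathlib import Path

# Standard VASP output file names searched for in each directory.
VASP_OUTPUTS = ("vasprun.xml", "OUTCAR")


def find_vasp_runs(root):
    """Return the sorted directories under root containing a VASP output."""
    root = Path(root)
    hits = {
        path.parent
        for name in VASP_OUTPUTS
        for path in root.rglob(name)  # recursive search, includes root itself
        if path.is_file()
    }
    return sorted(hits)


# Demonstration on a throwaway tree with two runs and one unrelated folder.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "run_a").mkdir()
    (root / "run_a" / "OUTCAR").touch()
    (root / "batch" / "run_b").mkdir(parents=True)
    (root / "batch" / "run_b" / "vasprun.xml").touch()
    (root / "notes").mkdir()
    (root / "notes" / "README.txt").touch()
    found = [p.relative_to(root).as_posix() for p in find_vasp_runs(root)]
```

Collecting directories into a set means a run that contains both `OUTCAR` and `vasprun.xml` is reported once.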
Use NOMAD for metadata generation and more?
cmcl will create a local database upon a call to a dataframe’s cmclwrite method. This database can then be freely populated with dataframes.
cmcl also provides a “push” method that allows users to choose a remote host and share local tables with it. cmcl is of the philosophy that ALL data is good data, so “pull” is implicit: the database only ever grows, and nothing is ever overwritten.
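The grow-only behaviour described above can be sketched with the standard-library `sqlite3` module: rows whose key is already stored are skipped rather than replaced. The table schema, the column names, and the `append_rows` helper are assumptions made for this example, and the numeric values are placeholders rather than measurements.

```python
import sqlite3


def append_rows(conn, table, rows):
    """Insert rows, skipping keys already present; return rows added.

    Hypothetical helper. `table` is interpolated into the SQL text, so it
    must come from trusted code, never from user input.
    """
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} "
        "(formula TEXT PRIMARY KEY, bandgap_ev REAL)"
    )
    before = conn.total_changes
    # OR IGNORE leaves an existing row untouched on a key conflict.
    conn.executemany(
        f"INSERT OR IGNORE INTO {table} (formula, bandgap_ev) VALUES (?, ?)",
        rows,
    )
    conn.commit()
    return conn.total_changes - before


conn = sqlite3.connect(":memory:")
# Placeholder values, not real band gaps.
first = append_rows(conn, "perovskites", [("CsPbI3", 1.0), ("CsPbBr3", 2.0)])
# The second batch repeats one key: only the new row is stored.
second = append_rows(conn, "perovskites", [("CsPbI3", 9.9), ("CsSnI3", 3.0)])
stored = dict(conn.execute("SELECT formula, bandgap_ev FROM perovskites"))
```

Under this scheme a pull is the same operation as a push with the roles of the two databases swapped. When working from dataframes, `DataFrame.to_sql(..., if_exists="append")` adds rows to an existing table in a similar spirit, although it does not skip duplicate keys by itself.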
$ rclone sync purduebox:/Mannodi_group_research_material/Perovskite\ Dataset/perovskites.db
cmcl implements OPTIMADE to provide an easy, universal query interface and, where possible, a publish option for sharing your data with global platforms.
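To give a sense of what an OPTIMADE query looks like, the sketch below builds a request URL for the `structures` endpoint using the filter language's `elements HAS ALL` and `nelements` terms. The base URL is a placeholder, `optimade_structures_url` is a hypothetical helper rather than cmcl's interface, and no network request is made.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def optimade_structures_url(base_url, elements, nelements=None):
    """Build a /v1/structures URL filtering on the elements present."""
    quoted = ", ".join(f'"{el}"' for el in elements)
    clauses = [f"elements HAS ALL {quoted}"]
    if nelements is not None:
        clauses.append(f"nelements={nelements}")
    # urlencode percent-escapes the quotes, commas, and equals sign.
    query = urlencode({"filter": " AND ".join(clauses)})
    return f"{base_url.rstrip('/')}/v1/structures?{query}"


# Placeholder provider address; substitute a real OPTIMADE base URL.
url = optimade_structures_url(
    "https://example.org/optimade", ["Cs", "Pb", "I"], nelements=3
)
# Decoding the query string recovers the human-readable filter.
decoded_filter = parse_qs(urlsplit(url).query)["filter"][0]
```

Because every OPTIMADE provider accepts the same filter grammar, the same filter string can be sent to different databases by changing only the base URL.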
Compare model predictions to experimental results for validation:
- cite:&almora-2020-devic-perfor meta-analysis of Perovskite PV devices.
- more literature compounds.
- Materials Zone aggregate database.
bibliographystyle:authordate1
bibliography:~/org/bibliotex/bibliotex.bib