materialsproject / emmet
Be a master builder of databases of material properties. Avoid the Kragle.
Home Page: https://materialsproject.github.io/emmet/
License: Other
As mentioned in the email thread, the recent database update has caused SEVERE issues with Co materials. E.g., layered LiCoO2 is 200 meV/atom above hull. @computron and I have debugged this and it is clear that this is because:
You can prove this is the case by searching for LiCoO2 in both prev.materialsproject.org and www.materialsproject.org.
I recommend the following corrective steps:
This is required due to some known bad calculations, e.g. task mp-1771082.
e.g. if the task has a structure with any abs(Cr moment) > 5
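A check like the one suggested above could be sketched as follows. This is only an illustration: the function name, the `(element, moment)` input shape, and the limits table are assumptions, with the Cr limit of 5 taken from the example above.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical table of the largest plausible |magnetic moment| per element.
# Only the Cr value comes from the issue text; extend as needed.
MAX_ABS_MOMENT: Dict[str, float] = {"Cr": 5.0}


def unphysical_moments(
    sites: Iterable[Tuple[str, float]],
    limits: Optional[Dict[str, float]] = None,
) -> List[Tuple[str, float]]:
    """Return the (element, moment) pairs whose |moment| exceeds the limit."""
    limits = MAX_ABS_MOMENT if limits is None else limits
    return [(el, m) for el, m in sites if el in limits and abs(m) > limits[el]]


# A task would be rejected if any site is flagged.
flagged = unphysical_moments([("Cr", 5.7), ("O", 0.1), ("Cr", -3.0)])
```

Elements without an entry in the table are never flagged, so the check stays conservative until limits are agreed on.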
/materials/formula_autocomplete
builtins.IndexError: tuple index out of range
Traceback (most recent call last):
File "/root/.local/lib/python3.9/site-packages/ddtrace/contrib/asgi/middleware.py", line 173, in __call__
return await self.app(scope, receive, wrapped_send)
File "/root/.local/lib/python3.9/site-packages/starlette/exceptions.py", line 82, in __call__
raise exc
File "/root/.local/lib/python3.9/site-packages/starlette/exceptions.py", line 71, in __call__
await self.app(scope, receive, sender)
File "/root/.local/lib/python3.9/site-packages/starlette/routing.py", line 656, in __call__
await route.handle(scope, receive, send)
File "/root/.local/lib/python3.9/site-packages/starlette/routing.py", line 259, in handle
await self.app(scope, receive, send)
File "/root/.local/lib/python3.9/site-packages/starlette/routing.py", line 61, in app
response = await func(request)
File "/root/.local/lib/python3.9/site-packages/fastapi/routing.py", line 216, in app
solved_result = await solve_dependencies(
File "/root/.local/lib/python3.9/site-packages/fastapi/dependencies/utils.py", line 527, in solve_dependencies
solved = await run_in_threadpool(call, **sub_values)
File "/root/.local/lib/python3.9/site-packages/starlette/concurrency.py", line 39, in run_in_threadpool
return await anyio.to_thread.run_sync(func, *args)
File "/root/.local/lib/python3.9/site-packages/anyio/to_thread.py", line 28, in run_sync
return await get_asynclib().run_sync_in_worker_thread(func, *args, cancellable=cancellable,
File "/root/.local/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 818, in run_sync_in_worker_thread
return await future
File "/root/.local/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 754, in run
result = context.run(func, *args)
File "/emmet-api/emmet/api/routes/materials/query_operators.py", line 360, in query
comp_red = comp.reduced_composition.items()
File "/root/.local/lib/python3.9/site-packages/pymatgen/core/composition.py", line 349, in reduced_composition
return self.get_reduced_composition_and_factor()[0]
File "/root/.local/lib/python3.9/site-packages/pymatgen/core/composition.py", line 359, in get_reduced_composition_and_factor
factor = self.get_reduced_formula_and_factor()[1]
File "/root/.local/lib/python3.9/site-packages/pymatgen/core/composition.py", line 384, in get_reduced_formula_and_factor
(formula, factor) = reduce_formula(d, iupac_ordering=iupac_ordering)
File "/root/.local/lib/python3.9/site-packages/pymatgen/core/composition.py", line 1183, in reduce_formula
factor = abs(gcd(*(int(i) for i in sym_amt.values())))
File "/root/.local/lib/python3.9/site-packages/monty/fractions.py", line 17, in gcd
n = numbers[0]
IndexError: tuple index out of range
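The traceback ends in `numbers[0]` on an empty tuple, which suggests the composition reaching `reduce_formula` had no elements. A minimal sketch of guarding against that case is shown below. It is a standalone stand-in written for illustration, not pymatgen's or monty's implementation, and the function name is an assumption.

```python
from functools import reduce
from math import gcd
from typing import Dict, Tuple


def reduce_amounts(sym_amt: Dict[str, int]) -> Tuple[Dict[str, int], int]:
    """Divide integer amounts by their greatest common divisor.

    An empty mapping is returned unchanged with a factor of 1 instead of
    indexing into an empty sequence.
    """
    if not sym_amt:
        return {}, 1
    # gcd of all amounts; fall back to 1 if every amount is zero.
    factor = reduce(gcd, sym_amt.values()) or 1
    return {sym: amt // factor for sym, amt in sym_amt.items()}, factor
```

The same guard could equally live in the query operator, by rejecting an empty formula before a composition is built.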
The MP DFT mixing scheme requires all ComputedEntry objects to have a unique entry_id in order to process them. Currently the entry_id is set equal to the material_id, which means that, for example, a GGA and an R2SCAN calculation for a particular material will have the same entry_id. It might be helpful to store some type of modified mpid such as mp-1234-GGA in entry.data to facilitate use of the mixing scheme in ThermoBuilder. Alternatively we could use these suffixed mpids as the entry_id in entries.<functional> by default, without changing the material_id in the task document.
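The suffixing idea can be sketched with plain dictionaries. The helper names and the `run_type` key are assumptions made for illustration; real entries would be pymatgen ComputedEntry objects.

```python
from typing import Dict, List


def suffixed_entry_id(material_id: str, run_type: str) -> str:
    """Build an id such as 'mp-1234-GGA' from a material id and a functional."""
    return f"{material_id}-{run_type}"


def assign_unique_entry_ids(entries: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return copies of the entries with a unique, suffixed entry_id."""
    seen = set()
    result = []
    for entry in entries:
        new_id = suffixed_entry_id(entry["material_id"], entry["run_type"])
        if new_id in seen:
            raise ValueError(f"Duplicate entry_id: {new_id}")
        seen.add(new_id)
        # The material_id itself is left untouched.
        result.append({**entry, "entry_id": new_id})
    return result


entries = assign_unique_entry_ids([
    {"material_id": "mp-1234", "run_type": "GGA"},
    {"material_id": "mp-1234", "run_type": "R2SCAN"},
])
```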
Flagging @arosen93 and @munrojm in case they have thoughts.
Hi MP devs, would there be any interest in exposing (presumably PBE) formation energies and hull distances via the MP OPTIMADE API? This would involve adding custom _mp_hull_distance (or whatever) fields and listing them in the config of your OPTIMADE server (which I guess is exterior to this repo).
OQMD currently provides this data with the _oqmd_stability field, which is very useful when using their OPTIMADE API as part of an experimental workflow, e.g., automated XRD refinement. Eventually, it would be great to get the big DFT databases to agree on a standard prefix for this kind of data so that cross-database queries for proposed stable materials can be performed.
Cheers!
/tasks/trajectory/mp-1354331/
builtins.ValueError: Out of range float values are not JSON compliant
Traceback (most recent call last):
File "/root/.local/lib/python3.9/site-packages/ddtrace/contrib/asgi/middleware.py", line 173, in __call__
return await self.app(scope, receive, wrapped_send)
File "/root/.local/lib/python3.9/site-packages/starlette/exceptions.py", line 82, in __call__
raise exc
File "/root/.local/lib/python3.9/site-packages/starlette/exceptions.py", line 71, in __call__
await self.app(scope, receive, sender)
File "/root/.local/lib/python3.9/site-packages/starlette/routing.py", line 656, in __call__
await route.handle(scope, receive, send)
File "/root/.local/lib/python3.9/site-packages/starlette/routing.py", line 259, in handle
await self.app(scope, receive, send)
File "/root/.local/lib/python3.9/site-packages/starlette/routing.py", line 61, in app
response = await func(request)
File "/root/.local/lib/python3.9/site-packages/fastapi/routing.py", line 250, in app
response = actual_response_class(response_data, **response_args)
File "/root/.local/lib/python3.9/site-packages/starlette/responses.py", line 49, in __init__
self.body = self.render(content)
File "/root/.local/lib/python3.9/site-packages/starlette/responses.py", line 174, in render
return json.dumps(
File "/usr/local/lib/python3.9/json/__init__.py", line 234, in dumps
return cls(
File "/usr/local/lib/python3.9/json/encoder.py", line 199, in encode
chunks = self.iterencode(o, _one_shot=True)
File "/usr/local/lib/python3.9/json/encoder.py", line 257, in iterencode
return _iterencode(o, 0)
ValueError: Out of range float values are not JSON compliant
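This is the error `json.dumps` raises when `allow_nan=False` and the payload contains NaN or infinity. One possible workaround, sketched here under the assumption that replacing such values with null is acceptable for trajectory data, is to sanitize the document before it is serialized. The helper name is hypothetical.

```python
import json
import math
from typing import Any


def replace_non_finite(obj: Any) -> Any:
    """Recursively replace NaN and +/-inf floats with None."""
    if isinstance(obj, float) and not math.isfinite(obj):
        return None
    if isinstance(obj, dict):
        return {k: replace_non_finite(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [replace_non_finite(v) for v in obj]
    return obj


payload = {"energies": [-1.5, float("nan"), float("inf")]}
body = json.dumps(replace_non_finite(payload), allow_nan=False)
```

Whether the non-finite values should instead be fixed at build time is a separate question; this only keeps the endpoint from failing.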
Even if a material does not have optional properties, this builder should set the has field in the target store to an empty list.
Rather than trying to manually document them (which is proving time-consuming), I propose including 1 sample output document as JSON for each store used by the builders. I think it'll be a lot easier for debugging and for writing new builders.
As title says, most likely in the aggregation step (fitting is unit tested).
My task collection includes valid task documents generated by different codes (specifically, VASP and Qchem). If I have both VASP and Qchem task docs for materials with the same formula (in this case, Cl), MaterialsBuilder fails to build a materials document.
The failure is apparently a result of a KeyError, since Qchem task docs do not contain an orig_inputs key. Once this error is encountered, it seems to prevent .get_items() from finding any new materials to update.
2021-02-11 14:44:09,466 - MaterialsBuilder - ERROR - 'orig_inputs'
2021-02-11 14:44:09,466 - MaterialsBuilder - INFO - No items to update
I admit this is probably a corner case, but I think it's important that we make MaterialsBuilder as robust as possible.
This is really a validation issue. MaterialsBuilder relies on the task_types collection to determine which tasks are valid, but 'valid' only means valid within a particular code. It seems like we need some additional information in task_types to indicate which code generated the task doc in the first place. Then MaterialsBuilder could just filter for VASP docs.
MaterialsBuilder ignores invalid task documents, or task documents of the wrong type, and builds materials documents from the valid (VASP) task documents.
MaterialsBuilder does not generate any materials docs for the material.
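Until task_types records the originating code, one defensive option is to skip task documents that lack the keys the builder needs rather than letting a KeyError abort the whole run. The sketch below works on plain dictionaries. The function name is an assumption and this is not how MaterialsBuilder is currently written.

```python
import logging
from typing import Any, Dict, Iterable, List

logger = logging.getLogger("MaterialsBuilder")


def usable_vasp_tasks(
    tasks: Iterable[Dict[str, Any]], required_key: str = "orig_inputs"
) -> List[Dict[str, Any]]:
    """Keep only task docs that contain the key the builder relies on."""
    usable = []
    for task in tasks:
        if required_key not in task:
            # Log and move on so one foreign doc does not block the others.
            logger.warning(
                "Skipping task %s: missing %r", task.get("task_id"), required_key
            )
            continue
        usable.append(task)
    return usable


tasks = [
    {"task_id": "vasp-1", "orig_inputs": {"incar": {}}},
    {"task_id": "qchem-1"},
]
kept = usable_vasp_tasks(tasks)
```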
Should make a builder to back out more material metadata:
1.) Find materials that differ by one missing atom
2.) Find materials that are frameworks - IE electrodes
3.) Find materials that are substitutions
4.) Find materials related by a doping transformation
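As a rough illustration of item 1, two compositions can be compared by counting the atoms that differ. This sketch looks only at unit-cell compositions given as element-to-count mappings. The function name is an assumption, and a real builder would also need structure matching.

```python
from collections import Counter
from typing import Mapping


def differ_by_one_atom(comp_a: Mapping[str, int], comp_b: Mapping[str, int]) -> bool:
    """True if the two compositions differ by exactly one atom in total."""
    a, b = Counter(comp_a), Counter(comp_b)
    # Counter subtraction keeps only positive counts, so the sum of both
    # directions is the symmetric difference in atom counts.
    diff = (a - b) + (b - a)
    return sum(diff.values()) == 1
```

A substitution such as Li for Na gives a difference of two (one atom removed, one added), so it is not reported as a missing atom.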
Need to implement a Robocrystallographer document and builder
I patched the production collection to include an nkpts field in each document, which is expected for the "# of K-points" cell of the "Structure Optimization" subsection of the "Calculation Summary" section on a material detail page (see screenshot below).
My procedure for the patch is shown below to indicate how I derived nkpts from currently-built fields.
from pymongo import UpdateOne

# Look up the structure-optimization task blessed for each material.
docs = list(db.materials.find({}, ["task_id", "blessed_tasks"]))
for d in docs:
    d["opt_id"] = d["blessed_tasks"][
        next(k for k in d["blessed_tasks"] if "Optimization" in k)
    ]

# Count the k-points of the first calculation in each optimization task.
rv = list(db.tasks.aggregate([
    {"$match": {"task_id": {"$in": [d["opt_id"] for d in docs]}}},
    {"$project": {
        "task_id": 1,
        "firstcalc_kpts": {
            "$arrayElemAt": ["$calcs_reversed.input.kpoints", -1]
        },
    }},
    {"$project": {
        "task_id": 1,
        "nkpts": {"$size": "$firstcalc_kpts.actual_points"},
    }},
]))
rv_map = {d["task_id"]: d["nkpts"] for d in rv}

# Write nkpts back to the materials documents in one bulk operation.
requests = []
for d in docs:
    requests.append(UpdateOne(
        {"task_ids": d["opt_id"]},
        {"$set": {"nkpts": rv_map[d["opt_id"]]}},
    ))
db_rw.materials.bulk_write(requests, ordered=False)
The Atomate drone is designed to run after a successful RunVasp firetask, so it misses edge cases where calculations have failed. We need to account for this, maybe in an EmmetVaspDrone?
Could contain both the material id and the blessed task id for provenance.
We should rotate the crystal to the "standard" convention in the website builder. Spglib already has this but we may need to add it to SpaceGroupAnalyzer
https://discuss.matsci.org/t/space-group-pbnm-pnma-62-bug-or-not/3301
Sample code provided below. This is not meant to provide any guarantees but might be useful as a first pass.
from warnings import warn

from atomate.vasp.drones import VaspDrone
from emmet.core.vasp.task import TaskDocument
from emmet.core.vasp.validation import ValidationDoc
from pymatgen.entries.compatibility import MaterialsProject2020Compatibility


def is_path_valid(path) -> bool:
    """
    If True, path _may_ be a valid VASP calculation for MP.
    Otherwise, will raise Exception.
    """
    # Step 1: parse the calculation directory with the atomate drone.
    drone = VaspDrone()
    try:
        doc = drone.assimilate(path)
    except Exception as exc:
        raise Exception(f"Atomate unable to parse this directory without changes: {exc}") from exc

    doc["task_id"] = "mp-00000"  # dummy task_id

    # Step 2: build the emmet task document.
    try:
        task_document = TaskDocument(**doc)
    except Exception as exc:
        raise Exception(f"Unable to construct a valid TaskDocument: {exc}") from exc

    # Step 3: run emmet's validation on the task document.
    try:
        validation_doc = ValidationDoc.from_task_doc(task_document)
    except Exception as exc:
        raise Exception(f"Unable to construct a valid ValidationDoc: {exc}") from exc

    if not validation_doc.valid:
        raise ValueError(f"Not valid: {validation_doc.reasons}")

    if validation_doc.warnings:
        warn(validation_doc.warnings)

    # Step 4: check that the MP2020 energy corrections can be applied.
    try:
        MaterialsProject2020Compatibility().process_entry(task_document.structure_entry)
    except Exception as exc:
        raise Exception(f"Unable to apply corrections: {exc}") from exc

    return True
@tschaume will defer to you on where best to put this
I've started working on the elastic builder in the updates branch and made some fixes to make it work with the latest maggma. This is still WIP, but I want to check what the plan is for the elastic builder; it seems it is not in the main branch. Do we need to migrate it to main from updates but haven't done it yet, or is there something else that needs to happen?
Rename Status enum to QChemState.
emmet/emmet-core/emmet/core/qchem/task.py
Line 24 in 839421b
The function task_type has moved to vasp/calc_type. This will cause https://github.com/materialsproject/emmet/blob/master/emmet-cli/emmet/cli/calc.py#L17 to fail.
Fix: change the import to
from emmet.core.vasp.calc_types import task_type
We of course eventually want to push this to the cheese shop. It appears that "emmet" is taken, but I'm not sure the underlying code is in active development -- I can't download it or find it elsewhere on the web. @i2y, would you consider letting us have the name for PyPI?
emmet/emmet/materials/electrodes.py
Lines 182 to 186 in f17eccf
emmet/emmet/materials/electrodes.py
Lines 215 to 229 in f17eccf
emmet/emmet/materials/electrodes.py
Lines 254 to 266 in f17eccf
emmet/emmet/materials/electrodes.py
Lines 276 to 280 in f17eccf
emmet/emmet/materials/electrodes.py
Lines 362 to 382 in f17eccf
Haven't tracked this down yet but see e.g. mp-546794 in the current db release
In materials.core it's reporting nsites as 48, but in materials (via the website builder) it's reporting nsites as 6 (the correct value is 48)
Thanks to @sivonxay for reporting
Interesting issue where ICSD compounds can be isotopically labeled (e.g. deuterium instead of hydrogen). We should sanitize this before it goes into our MongoDB of SNLs in the ICSD to Mongo adapter.
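A minimal sketch of such a sanitization step, assuming the species arrive as a list of symbol strings, is shown below. The mapping and function name are illustrative. Deuterium comes from the example above, and including tritium is an extra assumption.

```python
from typing import Dict, List

# Isotope labels mapped to their element symbol. "D" is from the issue;
# "T" (tritium) is added here as an assumption.
ISOTOPE_TO_ELEMENT: Dict[str, str] = {"D": "H", "T": "H"}


def sanitize_species(symbols: List[str]) -> List[str]:
    """Replace isotope labels with the underlying element symbol."""
    return [ISOTOPE_TO_ELEMENT.get(sym, sym) for sym in symbols]


cleaned = sanitize_species(["D", "O", "D"])
```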
It might make sense to inherit from Atomate's Drone and add in some functionality that is unique for ingesting VASP calculations from files for emmet.
composition and composition_reduced appear to be serialized to a string and not to a dictionary within the oxidation states document.
The requirements.txt for emmet-builders has an old version of maggma in it that is not compatible with pymongo 4. If confirmed to be fully compatible, maggma should ideally be set to 0.38.1.
New base model document that always contains:
It appears that the task_type field is blank or null for all non-GGA tasks in Knowhere/tasks. I discovered this when testing the API tasks endpoint on r2SCAN, SCAN, and PBEsol tasks. Checking the database collection gives
For example, try task_ids
For all of these, the task_type field is actually missing from the document.
We should include build data as a default in models:
Right now, sandboxing is still explicit and somewhat necessary for operation. The builders should work without requiring sandboxing.
This request comes up often enough that I think it's worth saving as an issue here.
I think we should consider tiered tagging. Off the top of my head, one possible scheme:
Database builder for surface properties. Will combine raw data from oriented unit cell calculations and slab calculations of different facets to generate a Wulff shape and obtain related properties such as the weighted_surface_energy, weighted_work_function, shape_factor, etc.
Is there a class or function in emmet that can take any kind of taskdoc, or maybe just a VASP calculation task doc, and analyze it for common types of anomalies that might occur in a calculation, e.g. too much relaxation?
Or maybe there should be a separate folder in the emmet/vasp directory called warnings, with a generic_warnings.py module for general warnings that can be applied to most calculations and elastic.py, surface.py, and diffraction.py modules for more specific warnings related to each dataset?
Line 339 in 36a3cbc
e.g. check out the doc returned for "mp-13", "elasticity" -- could result in a non-negligible hit on API performance, a rough measurement seems to show that this document for mp-13 alone is about 2.5 MB.
(Issue raised in response to an email from SP)
I'm currently trying to debug my SurfaceBuilder, but I seem to be running into an issue every time I try using it to post-process and insert data into my store. When initializing the SurfaceBuilder class, I pass in two stores, one containing raw data from VASP (called materials) and the other for holding post-processed data (surfprops_store). I have a fairly typical get_items() method in the SurfaceBuilder, and if the surfprops_store is empty (there is no data inserted yet) and I run my builder, it works just fine. However, if data already exists, the get_items() method will run into an issue with the following line of code:
f = self.materials.lu_filter(self.surfprops_store)
which causes a KeyError:
"""
/maggma/maggma/stores.py in lu_filter(self, targets)
95 targets = [targets]
96
---> 97 lu_list = [t.last_updated for t in targets]
98 return {self.lu_field: {"$gt": self.lu_func1}}
99
/maggma/maggma/stores.py in <listcomp>(.0)
95 targets = [targets]
96
---> 97 lu_list = [t.last_updated for t in targets]
98 return {self.lu_field: {"$gt": self.lu_func1}}
99
/maggma/maggma/stores.py in last_updated(self)
79 [(self.lu_field, pymongo.DESCENDING)]).limit(1), None)
80 # Handle when collection has docs but NoneType lu_field.
---> 81 return (self.lu_func0 if (doc and doc[self.lu_field])
82 else datetime.min)
83
KeyError: 'last_updated'
"""
I'm assuming the way lu_filter() works is that it reads through the store with the raw data (materials) and filters out any data that already exists in the post-processed store. But I can't seem to figure out why this would cause an error if data already exists in the post-processed store.
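The traceback fails at `doc[self.lu_field]`: the `doc and doc[...]` guard covers a stored value of None but still raises if the key is absent, which suggests the documents already in the target store were written without a last_updated field. A tolerant lookup could be sketched as follows, using plain dictionaries instead of a maggma Store. The function name is an assumption.

```python
from datetime import datetime
from typing import Any, Dict, Iterable


def newest_update(
    docs: Iterable[Dict[str, Any]], lu_field: str = "last_updated"
) -> datetime:
    """Return the latest timestamp, treating missing or None fields as absent."""
    stamps = [d[lu_field] for d in docs if d.get(lu_field) is not None]
    # Fall back to the earliest possible date so everything counts as newer.
    return max(stamps, default=datetime.min)


latest = newest_update([
    {"task_id": "a"},                                         # key missing
    {"task_id": "b", "last_updated": None},                   # value is None
    {"task_id": "c", "last_updated": datetime(2018, 1, 2)},
])
```

On the builder side, the simpler fix is probably to make sure every processed document carries a last_updated value before it is written to the target store.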
We're still facing a conundrum as to how to determine the best task for the energy in a materials doc.
Right now in #203, this was reverted back to the old logic:
The first part is the biggest issue. How hard of a requirement should this be? Should we do more careful structure matching first to make sure the statics reflect energy minima? What if we have a lone static that doesn't match to a structure opt?
If we change the builders/document formats in future, it'll be a lot easier to filter which docs were generated by which version of emmet, and update as appropriate.
Wasn't sure where to put this however... seems like a sensible place to put it would be in the maggma Store code, rather than having to update each individual builder, except maggma won't know the emmet version.
emmet.builder.SETTINGS and emmet.core.SETTINGS should be context vars that can be "locally" overridden by builders. This makes it easier to change settings on a builder-by-builder basis and know that you're affecting all appropriate utility functions, etc.
This should be fixed in jsanitize, which should convert a Python set to a list.
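The intended behavior can be illustrated with a small standalone function. It is a stand-in for illustration only, not monty's jsanitize, and sorting the resulting list for reproducible output is an added assumption.

```python
from typing import Any


def sets_to_lists(obj: Any) -> Any:
    """Recursively convert sets (and frozensets) to lists."""
    if isinstance(obj, (set, frozenset)):
        items = [sets_to_lists(v) for v in obj]
        try:
            return sorted(items)  # stable order when items are comparable
        except TypeError:
            return items
    if isinstance(obj, dict):
        return {k: sets_to_lists(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [sets_to_lists(v) for v in obj]
    return obj


sanitized = sets_to_lists({"tags": {"b", "a"}, "n": [1, {2}]})
```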
'allowDiskUsage' needs to be added to stores within the api.
This is happening before the mp_website build stage. Perhaps these are based on converted tasks? In any case, these material documents fail to deserialize to pymatgen objects if the "input" field is requested in a query.
db['materials.core'].count_documents(
{'inputs.structure_optimization.kpoints.@module': {'$regex': '^pymatgen.io.vaspio'}})
# -> 116
In the example MaterialsBuilder script under 'Running a Builder' in the root directory's README, I'm trying to understand why the MongoStore constructor receives the collection_name="materials" kwarg for instantiating the tasks_store object, and collection_name="tasks" for instantiating materials_store. Should these parameters be swapped? Or am I misunderstanding the conceptual/functional role of each Store instance?