
emmet's People

Contributors

acrutt, andrew-s-rosen, dbroberg, dependabot[bot], dwinston, esoteric-ephemera, espottesmith, fraricci, gpetretto, hmlli, jageo, janosh, jmmshn, jpalakapilly, kim-jiyoon, kmu, mattmcdermott, mjwen, mkhorton, montoyjh, munrojm, nisse3000, orionarcher, rdguha1995, rkingsbury, shyamd, tschaume, tsmathis, utf, yang-ruoxi


emmet's Issues

Bad Co data

As mentioned in the email thread, the recent database updates have caused SEVERE issues with Co materials. E.g., layered LiCoO2 is 200 meV/atom above the hull. @computron and I have debugged this, and it is clear that it is because:

  1. The new static runs are being done with Co high spin.
  2. These new runs are blessed despite having much higher energies (0.5 eV/atom) than previous tasks.

You can prove this is the case by searching for LiCoO2 in both prev.materialsproject.org and www.materialsproject.org.

I recommend the following corrective steps:

  1. Immediate - rebuild and release a database that only includes Co static runs if they are not more than 50 meV/atom higher in energy than the lowest energy among all previous relaxation and static tasks. This is of immediate priority; right now, MP is basically reporting garbage for all Co compounds.
  2. Immediate - write a validator that forbids new calculations with the same functional from being more than 50 meV/atom higher in energy than the lowest-energy structure, in all cases (see the sketch after this list).
  3. Within 1-2 months: Redo all Co static calculations in low spin.
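
A minimal sketch of the check in item 2, assuming we can look up the per-atom energies of all previously accepted relaxation and static tasks for the same material and functional (the function and variable names are illustrative, not existing emmet code):

# Hypothetical validator: reject a new run whose energy per atom is more than
# 50 meV/atom above the lowest energy seen so far for the same material and functional.
THRESHOLD_EV_PER_ATOM = 0.050

def energy_is_plausible(new_energy_per_atom: float,
                        previous_energies_per_atom: list[float],
                        threshold: float = THRESHOLD_EV_PER_ATOM) -> bool:
    if not previous_energies_per_atom:
        # Nothing to compare against; accept and let other validators decide.
        return True
    return new_energy_per_atom <= min(previous_energies_per_atom) + threshold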

formula autocomplete: tuple index out of range

/materials/formula_autocomplete

builtins.IndexError: tuple index out of range
Traceback (most recent call last):
  File "/root/.local/lib/python3.9/site-packages/ddtrace/contrib/asgi/middleware.py", line 173, in __call__
    return await self.app(scope, receive, wrapped_send)
  File "/root/.local/lib/python3.9/site-packages/starlette/exceptions.py", line 82, in __call__
    raise exc
  File "/root/.local/lib/python3.9/site-packages/starlette/exceptions.py", line 71, in __call__
    await self.app(scope, receive, sender)
  File "/root/.local/lib/python3.9/site-packages/starlette/routing.py", line 656, in __call__
    await route.handle(scope, receive, send)
  File "/root/.local/lib/python3.9/site-packages/starlette/routing.py", line 259, in handle
    await self.app(scope, receive, send)
  File "/root/.local/lib/python3.9/site-packages/starlette/routing.py", line 61, in app
    response = await func(request)
  File "/root/.local/lib/python3.9/site-packages/fastapi/routing.py", line 216, in app
    solved_result = await solve_dependencies(
  File "/root/.local/lib/python3.9/site-packages/fastapi/dependencies/utils.py", line 527, in solve_dependencies
    solved = await run_in_threadpool(call, **sub_values)
  File "/root/.local/lib/python3.9/site-packages/starlette/concurrency.py", line 39, in run_in_threadpool
    return await anyio.to_thread.run_sync(func, *args)
  File "/root/.local/lib/python3.9/site-packages/anyio/to_thread.py", line 28, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(func, *args, cancellable=cancellable,
  File "/root/.local/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 818, in run_sync_in_worker_thread
    return await future
  File "/root/.local/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 754, in run
    result = context.run(func, *args)
  File "/emmet-api/emmet/api/routes/materials/query_operators.py", line 360, in query
    comp_red = comp.reduced_composition.items()
  File "/root/.local/lib/python3.9/site-packages/pymatgen/core/composition.py", line 349, in reduced_composition
    return self.get_reduced_composition_and_factor()[0]
  File "/root/.local/lib/python3.9/site-packages/pymatgen/core/composition.py", line 359, in get_reduced_composition_and_factor
    factor = self.get_reduced_formula_and_factor()[1]
  File "/root/.local/lib/python3.9/site-packages/pymatgen/core/composition.py", line 384, in get_reduced_formula_and_factor
    (formula, factor) = reduce_formula(d, iupac_ordering=iupac_ordering)
  File "/root/.local/lib/python3.9/site-packages/pymatgen/core/composition.py", line 1183, in reduce_formula
    factor = abs(gcd(*(int(i) for i in sym_amt.values())))
  File "/root/.local/lib/python3.9/site-packages/monty/fractions.py", line 17, in gcd
    n = numbers[0]
IndexError: tuple index out of range
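
From the traceback, reduce_formula ends up calling monty's gcd() with no arguments because the parsed composition is empty (e.g. a query string containing no recognizable element symbols). A minimal sketch of a guard in the query operator, assuming it has the parsed Composition before calling reduced_composition (the helper name and error handling are illustrative):

from fastapi import HTTPException
from pymatgen.core import Composition

def reduced_items_or_400(formula: str):
    # Parse the user input and fail gracefully instead of letting an empty
    # composition reach reduce_formula(), which raises IndexError in gcd().
    try:
        comp = Composition(formula)
    except Exception:
        raise HTTPException(status_code=400, detail="Could not parse formula")
    if len(comp) == 0:
        raise HTTPException(status_code=400, detail="Formula contains no elements")
    return comp.reduced_composition.items()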

Thermo: mixing scheme requires unique entry_id

The MP DFT mixing scheme requires all ComputedEntry objects to have unique entry_ids in order to process them. Currently the entry_id is set equal to the material_id, which means that, for example, a GGA and an R2SCAN calculation for a particular material will have the same entry_id. It might be helpful to store some type of modified mpid such as mp-1234-GGA in entry.data to facilitate use of the mixing scheme in ThermoBuilder. Alternatively, we could use these suffixed mpids as the entry_id in entries.<functional> by default, without changing the material_id in the task document.
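
A minimal sketch of the suffixing idea (whether the suffixed id lives in entry.data or replaces entry_id is exactly the open question above; deriving the suffix from the run type is an assumption):

# Hypothetical sketch: give each entry an id that is unique per functional,
# so the mixing scheme can tell a GGA entry apart from an R2SCAN entry
# sharing the same material_id.
def suffixed_entry_id(material_id: str, run_type: str) -> str:
    return f"{material_id}-{run_type}"  # e.g. "mp-1234-GGA"

# Either store it alongside the entry ...
# entry.data["material_id_functional"] = suffixed_entry_id(material_id, run_type)
# ... or use it as the entry_id itself:
# entry.entry_id = suffixed_entry_id(material_id, run_type)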

Flagging @arosen93 and @munrojm in case they have thoughts.

Exposing formation energies and stability data through the OPTIMADE API

Hi MP devs, would there be any interest in exposing (presumably PBE) formation energies and hull distances via the MP OPTIMADE API? This would involve adding custom _mp_hull_distance (or whatever) fields and listing them in the config of your OPTIMADE server (which I guess is exterior to this repo).

OQMD currently provides this data with the _oqmd_stability field, which is very useful when using their OPTIMADE API as part of an experimental workflow, e.g., automated XRD refinement. Eventually, it would be great to get the big DFT databases to agree on a standard prefix for this kind of data so that cross-database queries for proposed stable materials can be performed.

Cheers!

out of range float values not JSON compliant

/tasks/trajectory/mp-1354331/

builtins.ValueError: Out of range float values are not JSON compliant
Traceback (most recent call last):
  File "/root/.local/lib/python3.9/site-packages/ddtrace/contrib/asgi/middleware.py", line 173, in __call__
    return await self.app(scope, receive, wrapped_send)
  File "/root/.local/lib/python3.9/site-packages/starlette/exceptions.py", line 82, in __call__
    raise exc
  File "/root/.local/lib/python3.9/site-packages/starlette/exceptions.py", line 71, in __call__
    await self.app(scope, receive, sender)
  File "/root/.local/lib/python3.9/site-packages/starlette/routing.py", line 656, in __call__
    await route.handle(scope, receive, send)
  File "/root/.local/lib/python3.9/site-packages/starlette/routing.py", line 259, in handle
    await self.app(scope, receive, send)
  File "/root/.local/lib/python3.9/site-packages/starlette/routing.py", line 61, in app
    response = await func(request)
  File "/root/.local/lib/python3.9/site-packages/fastapi/routing.py", line 250, in app
    response = actual_response_class(response_data, **response_args)
  File "/root/.local/lib/python3.9/site-packages/starlette/responses.py", line 49, in __init__
    self.body = self.render(content)
  File "/root/.local/lib/python3.9/site-packages/starlette/responses.py", line 174, in render
    return json.dumps(
  File "/usr/local/lib/python3.9/json/__init__.py", line 234, in dumps
    return cls(
  File "/usr/local/lib/python3.9/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/local/lib/python3.9/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
ValueError: Out of range float values are not JSON compliant
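
The trajectory document for this task presumably contains NaN or infinite floats, which the standard json encoder rejects when allow_nan is disabled (as Starlette's JSONResponse does). A minimal sketch of sanitizing the payload before serialization (where exactly this runs in the endpoint is an assumption):

import math

def sanitize_floats(obj):
    """Recursively replace non-finite floats (NaN, +/-inf) with None so the
    payload can be serialized by a strict JSON encoder."""
    if isinstance(obj, float) and not math.isfinite(obj):
        return None
    if isinstance(obj, dict):
        return {k: sanitize_floats(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [sanitize_floats(v) for v in obj]
    return obj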

Documentation: example output

Rather than trying to document the output documents manually (which is proving time-consuming), I propose including one sample output JSON document for each store used by the builders. I think it'll make debugging and writing new builders a lot easier.

Feature Request: make `MaterialsBuilder` robust to task docs from multiple codes

My task collection includes valid task documents generated by different codes (specifically, VASP and Qchem). If I have both VASP and Qchem task docs for materials with the same formula (in this case, Cl), MaterialsBuilder fails to build a materials document.

The failure is apparently a result of a KeyError, since Qchem task docs do not contain an orig_inputs key. Once this error is encountered, it seems to prevent .get_items() from finding any new materials to update.

2021-02-11 14:44:09,466 - MaterialsBuilder - ERROR - 'orig_inputs'
2021-02-11 14:44:09,466 - MaterialsBuilder - INFO - No items to update

I admit this is probably a corner case, but I think it's important that we make MaterialsBuilder as robust as possible.

This is really a validation issue. MaterialsBuilder relies on the task_types collection to determine which tasks are valid, but 'valid' only means valid within a particular code. It seems like we need some additional information in task_types to indicate what code generated the task doc in the first place. Then MaterialsBuilder could just filter for VASP docs.

Expected behavior:

MaterialsBuilder ignores invalid task documents, or task documents of the wrong type, and builds materials documents from the valid (VASP) task documents.

Actual behavior:

MaterialsBuilder does not generate any materials docs for the material.
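
A minimal sketch of the filtering idea above, assuming the task documents record which code produced them (the calc_code field is hypothetical; keying off the presence of orig_inputs is a possible fallback):

# Hypothetical query used in MaterialsBuilder.get_items() so that only
# VASP-style task docs reach the VASP-specific parsing.
vasp_task_query = {
    "$or": [
        {"calc_code": "VASP"},               # if such a field existed
        {"orig_inputs": {"$exists": True}},  # fallback: VASP-style docs only
    ]
}
task_ids = self.tasks.distinct(self.tasks.key, criteria=vasp_task_query)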

Structure metadata

We should make a builder to back out more material metadata:
  1. Find materials that differ by one missing atom
  2. Find materials that are frameworks, i.e., electrodes
  3. Find materials that are substitutions
  4. Find materials related by a doping transformation

Set nkpts field in materials.mp_website.MPBuilder.website store

I patched the production collection to include an nkpts field in each document, which is expected for the "# of K-points" cell of the "Structure Optimization" subsection of the "Calculation Summary" section on a material detail page (see screenshot below).

[screenshot of the "Structure Optimization" subsection of the "Calculation Summary" section on a material detail page]

My procedure for the patch is shown below to indicate how I derived nkpts from currently-built fields.

from pymongo import UpdateOne

# For each material, find the task_id of its blessed structure optimization.
docs = list(db.materials.find({}, ["task_id", "blessed_tasks"]))
for d in docs:
    d["opt_id"] = d["blessed_tasks"][next(k for k in d["blessed_tasks"]
                                          if "Optimization" in k)]

# Pull the number of actual k-points from the first calculation
# (last element of calcs_reversed) of each optimization task.
rv = list(db.tasks.aggregate([
    {"$match": {"task_id": {"$in": [d["opt_id"] for d in docs]}}},
    {"$project": {
        "task_id": 1,
        "firstcalc_kpts": {
            "$arrayElemAt": [
                "$calcs_reversed.input.kpoints",
                -1]
        }
    }},
    {"$project": {
        "task_id": 1,
        "nkpts": {"$size": "$firstcalc_kpts.actual_points"}}},
]))
rv_map = {d["task_id"]: d["nkpts"] for d in rv}

# Write nkpts back onto the corresponding materials documents.
requests = []
for d in docs:
    requests.append(UpdateOne(
        {"task_ids": d["opt_id"]},
        {"$set": {"nkpts": rv_map[d["opt_id"]]}}
    ))
db_rw.materials.bulk_write(requests, ordered=False)

Bug in Atomate Drone

The atomate drone is designed to run after a successful RunVasp firetask, so it misses edge cases where calculations have failed. We may need to account for this, perhaps in an EmmetVaspDrone?

Add convenience method to `emmet-cli` to validate a VASP calculation

Sample code is provided below. This is not meant to provide any guarantees, but it might be useful as a first pass.

from warnings import warn

from atomate.vasp.drones import VaspDrone
from emmet.core.vasp.task import TaskDocument
from emmet.core.vasp.validation import ValidationDoc
from pymatgen.entries.compatibility import MaterialsProject2020Compatibility

def is_path_valid(path) -> bool:
    """
    If True, path _may_ be a valid VASP calculation for MP.

    Otherwise, an Exception is raised.
    """

    drone = VaspDrone()
    try:
        doc = drone.assimilate(path)
    except Exception as exc:
        raise Exception(f"Atomate unable to parse this directory without changes: {exc}") from exc

    doc["task_id"] = "mp-00000"  # dummy task_id

    try:
        task_document = TaskDocument(**doc)
    except Exception as exc:
        raise Exception(f"Unable to construct a valid TaskDocument: {exc}") from exc

    try:
        validation_doc = ValidationDoc.from_task_doc(task_document)
    except Exception as exc:
        raise Exception(f"Unable to construct a valid ValidationDoc: {exc}") from exc

    if not validation_doc.valid:
        raise ValueError(f"Not valid: {validation_doc.reasons}")

    if validation_doc.warnings:
        warn(str(validation_doc.warnings))

    try:
        # We only care that the corrections can be applied without error.
        MaterialsProject2020Compatibility().process_entry(task_document.structure_entry)
    except Exception as exc:
        raise Exception(f"Unable to apply corrections: {exc}") from exc

    return True

@tschaume, I'll defer to you on where best to put this.

Elastic builder

I've started working on the elastic builder in the updates branch and made some fixes to make it work with the latest maggma. This is still a WIP, but I want to check what the plan is for the elastic builder, since it doesn't seem to be in the main branch. Does it still need to be migrated from updates to main, or does something else need to happen?

`task_type` is moved to `vasp/calc_type`

The function task_type has been moved to vasp/calc_types. This will cause

https://github.com/materialsproject/emmet/blob/master/emmet-cli/emmet/cli/calc.py#L17

to fail.

Fix: change to

from emmet.core.vasp.calc_types import task_type

PyPi package

We of course eventually want to push this to the cheese shop. It appears that "emmet" is taken, but I'm not sure the underlying code is in active development -- I can't download it or find it elsewhere on the web. @i2y, would you consider letting us have the name on PyPI?

Conflicts in electrodes.py

<<<<<<< HEAD
docs = [] # results
=======
docs = [] # results
>>>>>>> master

<<<<<<< HEAD
group_sbx = list(
    filter(
        lambda ent: (isbx in ent.data["_sbxn"])
        or (ent.data["_sbxn"] == ["core"]),
        group,
    )
)
self.logger.debug(
    f"Grouped entries in sandbox {', '.join([en.name for en in group_sbx])}"
)
=======
group_sbx = list(filter(lambda ent : (isbx in ent.data['_sbxn']) or (ent.data['_sbxn']==['core']), group))
self.logger.debug(f"Grouped entries in sandbox {isbx} -- {', '.join([en.name for en in group_sbx])}")
>>>>>>> master

<<<<<<< HEAD
d["battid"] = lowest_id + "_" + self.working_ion
# Only allow one sandbox value for each electrode
if isbx != "core":
    d["_sbxn"] = isbx
=======
if isbx == 'core':
    d['battid'] = lowest_id+'_'+self.working_ion
else:
    d['battid'] = lowest_id+'_'+self.working_ion+'_'+isbx
# Only allow one sandbox value for each electrode
d['_sbxn'] = [isbx]
>>>>>>> master

<<<<<<< HEAD
self.electro.update(docs=items, key="battid")
=======
self.electro.update(docs=items, key=['battid'])
>>>>>>> master

<<<<<<< HEAD
struct = Structure.from_dict(d["structure"])
en = ComputedStructureEntry(
    structure=struct,
    energy=d["thermo"]["energy"],
    parameters=d["calc_settings"],
    entry_id=d["task_id"],
)
if "_sbxn" in d:
    en.data["_sbxn"] = d["_sbxn"]
else:
    en.data["_sbxn"] = ["core"]
=======
struct = Structure.from_dict(d['structure'])
en = ComputedStructureEntry(structure=struct,
                            energy=d['thermo']['energy'],
                            parameters=d['calc_settings'],
                            entry_id=d['task_id'],
                            )
en.data['_sbxn'] = d['_sbxn']
>>>>>>> master

nsites issue, incorrect value in website builder

Haven't tracked this down yet, but see e.g. mp-546794 in the current db release.

In materials.core it's reporting nsites as 48, but in materials (via the website builder) it's reporting nsites as 6 (the correct value is 48).

Thanks to @sivonxay for reporting

Dealing with odd compositions in ICSD to Mongo

Interesting issue: ICSD compounds can be isotopically labeled (e.g., deuterium instead of hydrogen). We should sanitize this in the ICSD-to-Mongo adapter before it goes into our MongoDB of SNLs.
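
A minimal sketch of a formula-level sanitization step, assuming the isotope shows up as a standalone D (or T) symbol in the formula string (the helper name and regex are illustrative, not an existing emmet function):

import re

ISOTOPE_MAP = {"D": "H", "T": "H"}

def sanitize_isotopes(formula: str) -> str:
    """Illustrative sketch: map isotope symbols (D, T) back to H in a plain
    element-symbol + amount formula string before it is parsed into an SNL."""
    def _swap(match):
        symbol, amount = match.groups()
        return ISOTOPE_MAP.get(symbol, symbol) + amount
    return re.sub(r"([A-Z][a-z]?)(\d*\.?\d*)", _swap, formula)

# e.g. sanitize_isotopes("D2O") -> "H2O" and sanitize_isotopes("LiD") -> "LiH",
# while "Dy2O3" and "TiO2" are left unchanged.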

VaspDrone for emmet?

It might make sense to inherit from atomate's VaspDrone and add functionality that is unique to ingesting VASP calculations from files for emmet.
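
A minimal sketch of what that could look like, echoing the EmmetVaspDrone name floated in the atomate-drone issue above (purely illustrative; no such class exists yet):

from atomate.vasp.drones import VaspDrone

class EmmetVaspDrone(VaspDrone):
    """Hypothetical drone that layers emmet-specific handling on top of
    atomate's VaspDrone, e.g. catching failed or partially finished runs."""

    def assimilate(self, path):
        doc = super().assimilate(path)
        # Emmet-specific post-processing would go here (extra validation,
        # tagging failed runs instead of raising, etc.).
        return doc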

Update emmet-builders requirements.txt

The requirements.txt for emmet-builders pins an old version of maggma that is not compatible with pymongo 4. If confirmed to be fully compatible, maggma should ideally be bumped to 0.38.1.

missing task_type field for non-GGA calculations

It appears that the task_type field is blank or null for all non-GGA tasks in Knowhere/tasks. I discovered this when testing the API tasks endpoint on r2SCAN, SCAN, and PBEsol tasks. Checking the database collection gives

[screenshot of the tasks collection showing an empty/missing task_type for non-GGA tasks]

For example, try task_ids

  • mp-1942988
  • mp-1943782
  • mp-1536256

For all of these, the task_type field is actually missing from the document.
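
A quick way to enumerate the affected documents, assuming direct pymongo access to the tasks collection referenced above:

# Diagnostic query: count task documents whose task_type is missing, null, or empty.
missing = db.tasks.count_documents(
    {"$or": [
        {"task_type": {"$exists": False}},
        {"task_type": {"$in": [None, ""]}},
    ]}
)
print(f"{missing} task documents have no usable task_type")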

Tag materials with experimental citations

This request comes up often enough that I think it's worth saving as an issue here.

I think we should consider tiered tagging. Off the top of my head, one possible scheme:

  • tier 1: matches to experimental ordered structures. no false positives.
  • tier 2: orderings of experimental disordered structures. no false positives.
  • tier 3: matches ICSD structure, but we're not sure if it's an experimental structure

SurfacePropertiesBuilder

Database builder for surface properties. It will combine raw data from oriented unit cell calculations and slab calculations of different facets to generate a Wulff shape and obtain related properties such as the weighted_surface_energy, weighted_work_function, shape_factor, etc.
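
A minimal sketch of the core step, building the Wulff shape from per-facet surface energies with pymatgen (how those energies are derived from the oriented unit cell and slab tasks is the builder's job and is not shown; the dict layout below is illustrative):

from pymatgen.analysis.wulff import WulffShape

def summarize_wulff(lattice, surface_energies: dict) -> dict:
    # surface_energies: {(h, k, l): surface energy in J/m^2}, one entry per facet.
    miller_list = list(surface_energies)
    e_surf_list = [surface_energies[hkl] for hkl in miller_list]
    wulff = WulffShape(lattice, miller_list, e_surf_list)
    return {
        "weighted_surface_energy": wulff.weighted_surface_energy,
        "shape_factor": wulff.shape_factor,
        "surface_anisotropy": wulff.anisotropy,
    }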

A generalized warning generator for task documents?

Is there a class or function in emmet that can take any kind of task doc (or maybe just a VASP calculation task doc) and analyze it for common types of anomalies that might occur in a calculation, e.g., too much relaxation?

Or maybe there should be a separate folder in the emmet/vasp directory called warnings, with a generic_warnings.py module for general warnings that can be applied to most calculations, and elastic.py, surface.py, and diffraction.py modules for more specific warnings related to each dataset?
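
As a concrete example of the kind of check a generic_warnings.py module could hold, here is a sketch of a "too much relaxation" warning based on the volume change between input and output structures (the 30% threshold and the exact task-doc fields are assumptions):

from pymatgen.core import Structure

def check_excessive_relaxation(task_doc: dict, max_volume_change: float = 0.3):
    """Return a warning string if the cell volume changed by more than
    max_volume_change (fractional) during relaxation, else None.

    Assumes atomate-style docs with pymatgen-serialized structures under
    input.structure and output.structure."""
    initial = Structure.from_dict(task_doc["input"]["structure"])
    final = Structure.from_dict(task_doc["output"]["structure"])
    change = abs(final.volume - initial.volume) / initial.volume
    if change > max_volume_change:
        return f"Volume changed by {change:.0%} during relaxation"
    return None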

KeyError when running lu_field with a store already containing data

I'm currently trying to debug my SurfaceBuilder, but I run into an issue every time I try using it to post-process and insert data into my store. When initializing the SurfaceBuilder class, I pass in two stores, one containing raw data from VASP (called materials) and the other holding post-processed data (surfprops_store). I have a fairly typical get_items() method in the SurfaceBuilder, and if surfprops_store is empty (no data inserted yet) and I run my builder, it works just fine. However, if data already exists, the get_items() method runs into an issue with the following line of code:

f = self.materials.lu_filter(self.surfprops_store)

which causes a KeyError:
"""
/maggma/maggma/stores.py in lu_filter(self, targets)
95 targets = [targets]
96
---> 97 lu_list = [t.last_updated for t in targets]
98 return {self.lu_field: {"$gt": self.lu_func1}}
99

/maggma/maggma/stores.py in <listcomp>(.0)
95 targets = [targets]
96
---> 97 lu_list = [t.last_updated for t in targets]
98 return {self.lu_field: {"$gt": self.lu_func1}}
99

/maggma/maggma/stores.py in last_updated(self)
79 [(self.lu_field, pymongo.DESCENDING)]).limit(1), None)
80 # Handle when collection has docs but NoneType lu_field.
---> 81 return (self.lu_func0 if (doc and doc[self.lu_field])
82 else datetime.min)
83

KeyError: 'last_updated'
"""

I'm assuming the way lu_filter() works is that it reads through the store with the raw data (materials) and filters out any data that already exists in the post-processed store. But I can't figure out why this would cause an error if data already exists in the post-processed store.
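
From the traceback, the documents already sitting in surfprops_store appear to lack the store's lu_field ("last_updated"), so Store.last_updated hits a KeyError when lu_filter tries to read it. One workaround, sketched below, is to stamp that field on every document before writing to the target store (the field name follows the store's configuration; placing this in update_targets is an assumption):

from datetime import datetime

def update_targets(self, items):
    docs = list(items)
    for doc in docs:
        # Ensure the lu_field exists so lu_filter() can compare timestamps
        # on the next incremental run.
        doc["last_updated"] = datetime.utcnow()
    self.surfprops_store.update(docs)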

Better determination of which calc in Vasp MaterialsDoc

We still face a conundrum as to how to determine the best task for the energy in a materials doc.

Right now, in #203, this was reverted to the old logic:

  • Prefer statics over structure optimizations
  • Prefer spin-polarized
  • Prefer aspherical corrections

The first part is the biggest issue. How hard of a requirement should this be? Should we do more careful structure matching first to make sure the statics reflect energy minima? What if we have a lone static that doesn't match a structure opt?
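
For reference, a sketch of the reverted preference logic expressed as a sort key (the flag names on the task docs are assumptions; the real builder derives them from the task's input settings):

# Rank candidate tasks: statics beat structure optimizations, spin-polarized
# beats non-spin-polarized, aspherical corrections beat none, and the lowest
# energy per atom breaks remaining ties.
def task_quality_key(task: dict) -> tuple:
    return (
        task["task_type"] == "Static",
        task.get("is_spin_polarized", False),
        task.get("has_aspherical_corrections", False),
        -task["output"]["energy_per_atom"],
    )

# best_task = max(candidate_tasks, key=task_quality_key)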

Suggestion: include emmet version in all output docs

If we change the builders/document formats in future, it'll be a lot easier to filter which docs were generated by which version of emmet, and update as appropriate.

I wasn't sure where to put this, however. A sensible place would be in the maggma Store code, rather than having to update each individual builder, except that maggma won't know the emmet version.
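
A minimal sketch of stamping the version at the builder level, using importlib.metadata so it works regardless of where the version string is defined (the _emmet_version field name and the stamping location are assumptions):

from importlib.metadata import version

EMMET_CORE_VERSION = version("emmet-core")

def stamp_version(doc: dict) -> dict:
    # Hypothetical helper called on every document before update_targets().
    doc["_emmet_version"] = EMMET_CORE_VERSION
    return doc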

Fix and standardize settings context var

emmet.builder.SETTINGS and emmet.core.SETTINGS should be context vars that can be "locally" overridden by builders. This makes it easier to change settings on a builder-by-builder basis and know that you're affecting all appropriate utility functions, etc.
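
A minimal sketch of what that could look like with contextvars, assuming a pydantic-style EmmetSettings object with .copy(update=...) (the local_settings helper is hypothetical):

from contextlib import contextmanager
from contextvars import ContextVar

from emmet.core.settings import EmmetSettings

_settings_var: ContextVar[EmmetSettings] = ContextVar("emmet_settings", default=EmmetSettings())

@contextmanager
def local_settings(**overrides):
    """Temporarily override selected settings for the enclosing builder."""
    token = _settings_var.set(_settings_var.get().copy(update=overrides))
    try:
        yield _settings_var.get()
    finally:
        _settings_var.reset(token)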

Documentation: example MaterialsBuilder script

In the example MaterialsBuilder script under 'Running a Builder' in the root directory's README, I'm trying to understand why the MongoStore constructor receives the collection_name="materials" kwarg when instantiating the tasks_store object, and collection_name="tasks" when instantiating materials_store. Should these parameters be swapped? Or am I misunderstanding the conceptual/functional role of each Store instance?
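
For reference, this is how one would expect the two stores to be wired up if the kwargs are indeed swapped in the README (connection kwargs such as host and port are omitted; the database and key values are placeholders):

from maggma.stores import MongoStore

# Source store: the builder reads raw task documents from the "tasks" collection.
tasks_store = MongoStore(database="mp", collection_name="tasks", key="task_id")

# Target store: the builder writes materials documents to the "materials" collection.
materials_store = MongoStore(database="mp", collection_name="materials", key="material_id")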
