Generative models and NLP resources for materials synthesis

Public release of data and code for materials synthesis generation, along with NLP resources for materials science. 🎉

This code and data accompany the paper "Inorganic Materials Synthesis Planning with Literature-Trained Neural Networks."

Demo

demo.ipynb (or demo.html) contains a Python demo showcasing the fine-tuned word embeddings introduced in the paper. The demo also shows how to build and inspect the autoencoder models.

Annotated NER Data 📝

data/ner_annotations.json contains tokenized and labeled NER data for 235 synthesis recipes. Each annotated recipe has a "split" key whose value is "train", "test", or "dev"; in addition, five papers, used internally to measure inter-annotator agreement, are marked with a "metrics" split. These splits are only suggestions (they were assigned randomly), so we encourage others to split the data however they see fit. The file should be usable as-is for training NER models. Each annotated document contains equal-length arrays of tokens and their corresponding labels.
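A minimal sketch of reading this file and grouping recipes by split follows. The README only guarantees a "split" key plus equal-length token and label arrays, so the key names "tokens" and "labels", and the assumption that the top level of the file is a list of documents, are hypothetical; check the actual file and adjust them.

```python
import json
from collections import defaultdict

# Hypothetical key names: only the "split" key is documented in the README.
TOKENS_KEY = "tokens"
LABELS_KEY = "labels"


def load_ner_annotations(path="data/ner_annotations.json"):
    """Load the annotation file (assumed here to hold a list of documents)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def group_by_split(documents):
    """Group recipes by their suggested split as lists of (token, label) pairs."""
    splits = defaultdict(list)
    for doc in documents:
        tokens, labels = doc[TOKENS_KEY], doc[LABELS_KEY]
        # Each document should contain equal-length token and label arrays.
        if len(tokens) != len(labels):
            raise ValueError(f"length mismatch in a {doc['split']!r} document")
        splits[doc["split"]].append(list(zip(tokens, labels)))
    return dict(splits)
```

For example, `group_by_split(load_ner_annotations())["train"]` would return the suggested training recipes; since the splits are only suggestions, the "metrics" documents can be folded into whichever split you prefer.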

data/brat/ contains raw annotation files in the BRAT annotation format. You can load these into your own instance of BRAT and modify the annotations however you like! These files contain event/relation annotations as well (e.g., "heat" acts on "titania").
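BRAT's standoff .ann files are line-oriented and tab-separated: text-bound annotations (T lines) give a type and character offsets, event lines (E) name a type, a trigger, and role:argument pairs, and relation lines (R) link two annotations. The parser below is a minimal sketch that handles only those three line types; the function names are hypothetical, and the entity, role, and relation names in the test data are made up and may not match the labels used in data/brat/.

```python
from pathlib import Path


def parse_brat_ann(text):
    """Parse T (text-bound), E (event) and R (relation) lines of a BRAT .ann file."""
    entities, events, relations = {}, {}, {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        ann_id, body = fields[0], fields[1]
        if ann_id.startswith("T"):
            # Body is "Type start end"; discontinuous spans separate pairs with ';'.
            label, offsets = body.split(" ", 1)
            spans = [tuple(int(x) for x in part.split()) for part in offsets.split(";")]
            text_span = fields[2] if len(fields) > 2 else ""
            entities[ann_id] = {"type": label, "spans": spans, "text": text_span}
        elif ann_id.startswith("E"):
            # Body is "Type:TriggerId Role:ArgId Role:ArgId ...".
            head, *args = body.split()
            event_type, trigger = head.split(":")
            events[ann_id] = {"type": event_type, "trigger": trigger,
                              "args": [tuple(arg.split(":")) for arg in args]}
        elif ann_id.startswith("R"):
            # Body is "Type Arg1:Id Arg2:Id".
            rel_type, *args = body.split()
            relations[ann_id] = {"type": rel_type,
                                 "args": dict(arg.split(":") for arg in args)}
        # Attribute (A), normalization (N) and note (#) lines are ignored here.
    return entities, events, relations


def load_brat_dir(directory="data/brat"):
    """Parse every .ann file in a directory, keyed by file stem."""
    return {path.stem: parse_brat_ann(path.read_text(encoding="utf-8"))
            for path in sorted(Path(directory).glob("*.ann"))}
```

Resolving an event's trigger and argument IDs through the `entities` dict recovers statements like "heat" acting on "titania".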

NLP Resource Downloads 💽

Along with this work, we also open-source two pre-trained word embedding models: FastText and ELMo, each trained on our internal database of over 2.5 million materials science articles.

The FastText model is saved in the format used by the gensim Python library and can be loaded as a KeyedVectors object; see the gensim documentation for details. Note that our FastText model was trained on lowercased text only, so lowercase any text before looking up its vectors.
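Because the FastText vectors were trained on lowercased text, lookups should lowercase their input. The sketch below assumes the downloaded file was written with gensim's native save method, so that `gensim.models.KeyedVectors.load` can read it; the path is whatever you downloaded, and the helper names are hypothetical. gensim is imported lazily, so `lookup_lowercased` also works with any plain mapping from word to vector.

```python
def load_fasttext_vectors(path):
    """Load the released vectors (assumes a gensim-native save file)."""
    from gensim.models import KeyedVectors  # requires gensim to be installed
    return KeyedVectors.load(path)


def lookup_lowercased(vectors, tokens):
    """Return one vector per token, lowercasing to match the training text."""
    return [vectors[token.lower()] for token in tokens]


def similar_terms(vectors, word, topn=5):
    """Nearest neighbours by cosine similarity via gensim's most_similar."""
    return vectors.most_similar(word.lower(), topn=topn)
```

With a loaded gensim FastText object, character n-grams usually let it return vectors even for out-of-vocabulary words; a plain dict stand-in cannot do that.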

The ELMo model follows the weights/options file layout of the allenai/bilm-tf public GitHub repository. You can load the embeddings as described in that repository's README (or use the code in this repo, at models/token_classifier.py), substituting our weight and options files. We found that the default vocab.txt works fine, so it does not need to be replaced. Following the ELMo authors' recommendations, we do not lowercase text for ELMo, so you can compute word vectors for text as-is.
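The sketch below follows the character-input usage example from the allenai/bilm-tf README with our files swapped in. It needs TensorFlow 1.x (it relies on `tf.placeholder` and `tf.Session`) and the `bilm` package from that repository; the function names are hypothetical, the file paths are whatever you downloaded, and 50 is the max-characters-per-token value used in that README example. Tokens are passed through without lowercasing.

```python
def tokenize_as_is(sentence):
    """Whitespace tokenization with no lowercasing, as recommended for ELMo."""
    return sentence.split()


def elmo_embed(sentences, options_file, weight_file, vocab_file, max_chars=50):
    """Sketch of the bilm-tf character-input workflow with swapped-in files."""
    import tensorflow as tf  # TensorFlow 1.x API
    from bilm import Batcher, BidirectionalLanguageModel, weight_layers

    graph = tf.Graph()  # fresh graph so repeated calls do not clash
    with graph.as_default():
        batcher = Batcher(vocab_file, max_chars)  # default vocab.txt is fine
        char_ids = tf.placeholder("int32", shape=(None, None, max_chars))
        bilm = BidirectionalLanguageModel(options_file, weight_file)
        embeddings_op = bilm(char_ids)
        # Learned weighted combination of the biLM layers, as in the README example.
        elmo_input = weight_layers("input", embeddings_op, l2_coef=0.0)

        tokenized = [tokenize_as_is(s) for s in sentences]
        with tf.Session(graph=graph) as sess:
            sess.run(tf.global_variables_initializer())
            ids = batcher.batch_sentences(tokenized)
            # Returns an array shaped (batch, max_tokens, embedding_dim).
            return sess.run(elmo_input["weighted_op"], feed_dict={char_ids: ids})
```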

Links to the trained models/weights are as follows:

Neural Network Models/Data 🧠

models/action_generator.py contains the architecture for the conditional variational autoencoder (CVAE) used for synthesis action generation.

models/material_generator.py contains the architecture for the CVAE used for precursor generation.
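As a generic illustration of the pieces a conditional variational autoencoder combines (not the architecture in these files), the pure-Python sketch below shows the reparameterization trick, the closed-form KL term of the training loss, and generation by sampling the latent prior and concatenating a condition vector. The decoder callable and the contents of the condition vector (for example, some encoding of a target material) are assumptions, as are all function names.

```python
import math
import random


def reparameterize(mu, logvar, eps):
    """z = mu + sigma * eps with sigma = exp(0.5 * logvar); eps ~ N(0, I)."""
    return [m + math.exp(0.5 * lv) * e for m, lv, e in zip(mu, logvar, eps)]


def kl_to_standard_normal(mu, logvar):
    """Closed-form KL(N(mu, diag(sigma^2)) || N(0, I))."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def cvae_loss(reconstruction_error, mu, logvar):
    """Negative ELBO: reconstruction term plus KL term (unweighted here)."""
    return reconstruction_error + kl_to_standard_normal(mu, logvar)


def generate(decoder, condition, latent_dim, rng=random):
    """Sample z from the N(0, I) prior and decode the concatenation [z; condition]."""
    z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
    return decoder(z + list(condition))
```

During training the encoder would see both the input and the condition; at generation time only the condition is supplied, so different prior samples yield different candidate outputs for the same condition.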

models/token_classifier.py contains the architecture for the NER model. It also provides the methods used to load a pretrained ELMo model (via TensorFlow).
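A token classifier emits one label per token, so a common post-processing step is merging consecutive tokens into entity spans. A minimal sketch, assuming a plain (non-BIO) tag scheme in which a hypothetical "null" label marks tokens outside any entity; the function name is also hypothetical.

```python
def labels_to_spans(tokens, labels, outside="null"):
    """Merge runs of identical non-outside labels into (label, start, end, text) spans.

    `end` is exclusive. Adjacent entities that share a label are merged, a known
    limitation of plain (non-BIO) tags.
    """
    spans = []
    start = None
    # Append a sentinel outside label so the final run is flushed.
    for i, label in enumerate(list(labels) + [outside]):
        if start is not None and label != labels[start]:
            spans.append((labels[start], start, i, " ".join(tokens[start:i])))
            start = None
        if start is None and label != outside:
            start = i
    return spans
```

With BIO-style tags you would instead start a new span at every B- tag and strip the prefixes before comparing labels.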

models/paragraph_classifier.py contains the architecture and code used for the paragraph classifier model.

data/unsynth_recipes_w_citations.json collects the recipes suggested by the CVAE model for screening unsynthesized ABO3 perovskite compounds. The file also contains nearest-neighbor literature references suggested by the CVAE.
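The schema of this file is not described here, so rather than guessing key names, a quick way to get oriented is to print an outline of its nested structure. A minimal standard-library sketch with hypothetical function names; it describes only the first element of each list, assuming list elements share a shape.

```python
import json


def outline(obj, depth=0, max_depth=3):
    """Return indented lines describing the shape of parsed JSON."""
    pad = "  " * depth
    if depth >= max_depth:
        return [pad + "..."]
    if isinstance(obj, dict):
        lines = []
        for key, value in obj.items():
            lines.append(f"{pad}{key}: {type(value).__name__}")
            if isinstance(value, (dict, list)):
                lines.extend(outline(value, depth + 1, max_depth))
        return lines
    if isinstance(obj, list):
        lines = [f"{pad}list[{len(obj)}]"]
        if obj:  # describe only the first element
            lines.extend(outline(obj[0], depth + 1, max_depth))
        return lines
    return [f"{pad}{type(obj).__name__}"]


def load_and_outline(path="data/unsynth_recipes_w_citations.json", max_depth=3):
    """Load the file and return its outline as one printable string."""
    with open(path, encoding="utf-8") as f:
        return "\n".join(outline(json.load(f), max_depth=max_depth))
```

Running `print(load_and_outline())` from the repository root shows the top-level keys or list length before you write any code that depends on them.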

Citing 📚

If you use this work (e.g., the NER model, the generative models, the pre-trained embeddings), please cite the following work(s) as appropriate:

[Citation(s) to be included soon]

Contributors

eddotman, jppgks, zjensen262