SynthGPT

This repository contains the data and code for Large Language Models for Inorganic Synthesis Predictions by Seongmin Kim, Yousung Jung, and Joshua Schrier.

Organization

Input data and the predefined splits (training/cross-validation and train/test) are in the data_MP and data folders, for the synthesizability and precursor selection tasks, respectively.

Results are in the results_MP and results folders, for the synthesizability and precursor selection tasks, respectively. We have used a JSON format to facilitate interpretation of the results.
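As a rough sketch of how the JSON results might be inspected from Python: the helper names `load_results` and `summarize` are hypothetical, and the file names and schema inside results_MP and results are not described here, so the code only assumes that each folder holds `.json` files and reports their top-level shape.

```python
import json
from pathlib import Path


def load_results(folder):
    """Parse every *.json file directly inside `folder`, keyed by file stem."""
    results = {}
    for path in sorted(Path(folder).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            results[path.stem] = json.load(handle)
    return results


def summarize(results):
    """Report the top-level type of each parsed file, plus its length if it has one."""
    summary = {}
    for name, data in results.items():
        size = len(data) if isinstance(data, (dict, list, str)) else None
        summary[name] = (type(data).__name__, size)
    return summary
```

Calling `summarize(load_results("results"))` from the repository root would then list each result file with its container type, which is a reasonable first step before relying on any particular key.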

Prompts for the LLM are in the prompts folder as plain text files; they can also be found in the online Supporting Information file.

Source code is in the src folder; some haphazard tests are included in tests.

Instructions

Run the notebooks in the top-level directory in order. Mathematica code (.wls) uses Mathematica 14.0 and no other libraries. Python code (.py) uses Python 3.8.13 and requires the following libraries: NumPy (version == 1.22.3), PyTorch (version == 1.11.0), and Pymatgen (version == 2022.9.21).
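Before running the Python scripts, it may help to confirm that the environment matches the pinned versions. The following is a minimal sketch, not part of the repository: the helper names are hypothetical, and the distribution names `numpy`, `torch`, and `pymatgen` are assumed to be the usual PyPI names for the libraries listed above.

```python
from importlib import metadata

# Versions as pinned in the instructions; the distribution names are an assumption.
PINNED = {"numpy": "1.22.3", "torch": "1.11.0", "pymatgen": "2022.9.21"}


def installed_version(dist_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pinned, lookup=installed_version):
    """Return {name: (expected, found)} for every package that does not match its pin."""
    mismatches = {}
    for name, expected in pinned.items():
        found = lookup(name)
        # A local build tag such as "1.11.0+cu113" is treated as matching "1.11.0".
        if found is None or found.split("+")[0] != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in check_pins(PINNED).items():
        print(f"{name}: expected {expected}, found {found}")
```

An empty result from `check_pins(PINNED)` means all three libraries match. The `lookup` parameter exists only so the comparison can be exercised without the packages installed.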

The directory is organized around the order in which we performed the work, dividing the work into discrete tasks:

  • Precursor selection (scripts 00_Data_Curation.py - 07_Estimate_Perfect_Elemwise.py)
  • Synthesizability prediction (08_Data_Preparation_Synthesizability.wls - 11_Score_GPT_Outputs_Synthesizability.wls)
  • Evaluation of precursor rescoring results with GPT-4 (12a_SetupData_Combined.wls and 12b_Evaluate_Combined.wls) and by removing recommendations that do not consist of only allowed precursors (13_Precursor_Compliance.wls and 14_Evaluate_Combination_Retaining_Only_Allowed_Precursors.wls)
  • Evaluation of the effects of prompt modification on the synthesizability prediction. Each modification is evaluated on only the first 5000 test items. They include modifying the prompt to add additional specialization ("You are an expert oxide inorganic chemist...", 15a_Prompt_Modification_Oxide.wls), removing specialization ("You are a magician...", 15b_Prompt_Modification_Magician.wls), and alternative ways of expressing the positive-unlabelled training task ("...items labeled "U" could be positive or negative (i.e., synthesizable or unsynthesizable)", 15c_Prompt_Modification_Labeling.wls).
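The numeric prefixes above define the run order, with letter suffixes such as 12a and 12b ordering scripts within a step. As an illustrative sketch only (the helpers `run_order` and `command_for` are hypothetical and not part of the repository), the order and a plausible launch command per file type could be derived like this; it assumes `python` and `wolframscript` are on the PATH.

```python
import re
from pathlib import Path

# Matches prefixes such as "00_", "08_", "12a_", or "15c_".
PREFIX = re.compile(r"^(\d+)([a-z]?)_")


def run_order(names):
    """Sort .py/.wls script names by numeric prefix, then by optional letter suffix."""
    keyed = []
    for name in names:
        match = PREFIX.match(name)
        if match and name.endswith((".py", ".wls")):
            keyed.append((int(match.group(1)), match.group(2), name))
    return [name for _, _, name in sorted(keyed)]


def command_for(name):
    """Build an argument list for launching one script (assumed launchers)."""
    if name.endswith(".py"):
        return ["python", name]
    return ["wolframscript", "-file", name]


if __name__ == "__main__":
    # Print the planned commands for the current directory without executing them.
    for script in run_order(p.name for p in Path(".").iterdir()):
        print(" ".join(command_for(script)))
```

Printing the plan first, rather than executing it, makes it easy to check the order against the task list before committing to the run.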

Yes, this is different from the order in the paper. "Life can only be understood backwards; but it must be lived forwards." --Søren Kierkegaard

Cite

A preprint is available on ChemRxiv as doi:10.26434/chemrxiv-2024-9bmfj
