
Large Language Model for Scientific Discovery (LLM4SD)

LLM4SD is an open-source initiative that aims to leverage large language models for scientific discovery. We have now released the complete code 😆.

Code Description

QuickStart:

🌟 First, create the environment for running LLM4SD from requirements.txt, which lists all dependencies.

🌟 Second, put your OpenAI API key in the bash file before you run it. The OpenAI API is used to call GPT-4, which summarises the rules produced by knowledge inference and generates code automatically.
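If the bash file exports the key as an environment variable (an assumption; check the script you run), a Python step can fail fast with a clear message instead of failing on the first GPT-4 call. This minimal sketch assumes the variable is named OPENAI_API_KEY, which is the name the official openai Python client reads by default; the helper require_openai_key is hypothetical and not part of LLM4SD.

```python
import os
from typing import Mapping, Optional


def require_openai_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the OpenAI API key, or raise a clear error if it is missing.

    Hypothetical helper: assumes the bash file exports OPENAI_API_KEY.
    Pass a dict as `env` to check a mapping other than os.environ.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; add it to the bash file before running."
        )
    return key
```

In the bash file this would correspond to a line such as export OPENAI_API_KEY="your-key" placed before the python calls.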

To run the "bbbp", "bace", "clintox", "esol", "freesolv", "hiv" and "lipophilicity" tasks, run:

bash run_others.sh

To run the "Tox21" and "Sider" tasks, run:

bash run_tox21.sh
bash run_sider.sh

To run the "Qm9" task, run:

bash run_qm9.sh
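The task-to-script mapping above can be captured in a small lookup, for example when launching runs from Python. This is a hypothetical convenience helper, not part of the repository. Note that the README lists run_others.sh for all seven of its tasks together, so selecting it may run more than the one task you asked for.

```python
# Task and script names are taken from the QuickStart above;
# the lookup itself is a hypothetical helper, not part of LLM4SD.
OTHER_TASKS = ("bbbp", "bace", "clintox", "esol", "freesolv", "hiv", "lipophilicity")

SCRIPT_FOR_TASK = {
    **{task: "run_others.sh" for task in OTHER_TASKS},
    "tox21": "run_tox21.sh",
    "sider": "run_sider.sh",
    "qm9": "run_qm9.sh",
}


def script_for(task: str) -> str:
    """Return the bash script that runs `task` (case-insensitive)."""
    try:
        return SCRIPT_FOR_TASK[task.lower()]
    except KeyError:
        raise ValueError(f"unknown task: {task!r}") from None
```

For example, subprocess.run(["bash", script_for("Sider")], check=True) would launch run_sider.sh when called from the repository root.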

The LLM4SD Code Pipeline:

Each bash file runs LLM4SD in the following steps:

👉 "Knowledge synthesis from the literature": this step runs python synthesize.py. The synthesized rules are stored in the prior_knowledge folder.

👉 "Knowledge inference from data": this step runs python inference.py. The inferred rules are stored in the data_knowledge folder.

👉 "Inferred Knowledge Summarization": this step runs python summarize_rules.py. The summarized rules are stored in the summarized_inference_rules folder. The purpose of this step is to drop duplicate rules.

👉 "Automatic Code Generation & Evaluation": this step runs python auto_gen_and_eval.py, which uses GPT-4 to generate code automatically and then runs experiments to measure model performance. In practice, human experts would review the generated code before use; however, even when the generated code is evaluated directly without review, it achieves nearly the same performance.
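The four steps above run in a fixed order, each feeding the next. The sketch records that order and illustrates the goal of the summarization step with a deliberately simple stand-in: exact-match deduplication after normalizing case, whitespace and leading list numbers. This is not how summarize_rules.py works (GPT-4 performs the summarisation, which can also merge rules that are worded differently); PIPELINE, normalize_rule and drop_duplicate_rules are hypothetical names.

```python
import re

# (script, output folder) in the order the bash files run them.
PIPELINE = [
    ("synthesize.py", "prior_knowledge"),
    ("inference.py", "data_knowledge"),
    ("summarize_rules.py", "summarized_inference_rules"),
    ("auto_gen_and_eval.py", None),  # output location not stated in this README
]


def normalize_rule(rule: str) -> str:
    """Lower-case, strip a leading '1.' or '2)' marker, collapse whitespace."""
    rule = re.sub(r"^\s*\d+[.)]\s+", "", rule)
    return " ".join(rule.lower().split()).rstrip(".")


def drop_duplicate_rules(rules):
    """Keep the first occurrence of each rule, comparing normalized text.

    Simple stand-in for LLM-based summarization: it only catches
    near-verbatim duplicates, not paraphrases.
    """
    seen = set()
    unique = []
    for rule in rules:
        key = normalize_rule(rule)
        if key and key not in seen:
            seen.add(key)
            unique.append(rule.strip())
    return unique
```

Rules such as "1. Molecular weight above 500 lowers permeability." and "2) molecular weight above 500 lowers permeability" collapse to one entry, whereas a paraphrase would survive; catching paraphrases is why an LLM is used for this step.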

📓 Notes: We also provide an advanced automatic code generation tool based on the newly released OpenAI Assistant. If you would like to try the Assistant version of code generation, see the "code_gen.py" and "eval.py" files in the "LLM4SD-gpt4-demo" folder.

PS: To obtain an explanation, take the information provided by the trained interpretable model and structure it into a prompt that asks an LLM to explain the result, as shown in the paper.
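One way to structure such a prompt, as a minimal sketch: it assumes the interpretable model is linear over rule-based features, so each rule's contribution to a prediction can be taken as its weight times its feature value. The function build_explanation_prompt and the prompt wording are hypothetical and are not the prompt used in the paper.

```python
def build_explanation_prompt(molecule, prediction, rule_names, weights, values, top_k=3):
    """Rank rules by |weight * feature value| and format an LLM prompt.

    Assumes a linear interpretable model; all inputs are hypothetical.
    """
    if not (len(rule_names) == len(weights) == len(values)):
        raise ValueError("rule_names, weights and values must have equal length")
    contributions = [(name, w * v) for name, w, v in zip(rule_names, weights, values)]
    # Largest absolute contribution first, regardless of sign.
    contributions.sort(key=lambda item: abs(item[1]), reverse=True)
    lines = [
        f"Molecule: {molecule}",
        f"Model prediction: {prediction}",
        "Top rule contributions (weight x feature value):",
    ]
    for name, contribution in contributions[:top_k]:
        lines.append(f"- {name}: {contribution:+.3f}")
    lines.append("Explain in plain language how these rules lead to the prediction.")
    return "\n".join(lines)
```

For a tree-based interpretable model, the ranking would need a different attribution method; the prompt formatting itself stays the same.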

Direct Evaluation:

To directly evaluate the generated code for a specific task, run:

python eval.py --dataset ${dataset} --subtask "${subtask_name}" --model ${model_name} --knowledge_type ${knowledge_type} [--num_samples ${num_samples}]

The --num_samples option (the number of responses generated during inference) is only needed when evaluating inference code or combined code.
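When scripting many evaluations, the command above can be assembled as an argument list so that subtask names containing spaces or shell characters are passed safely. A minimal sketch: eval_command is a hypothetical helper, and because this README does not list the valid knowledge_type values, it does not validate them or decide on its own when --num_samples is required.

```python
def eval_command(dataset, subtask, model, knowledge_type, num_samples=None):
    """Build the eval.py argument list; hypothetical helper.

    Pass num_samples only for inference or combined code, as described above.
    """
    cmd = [
        "python", "eval.py",
        "--dataset", dataset,
        "--subtask", subtask,
        "--model", model,
        "--knowledge_type", knowledge_type,
    ]
    if num_samples is not None:
        if num_samples < 1:
            raise ValueError("num_samples must be a positive integer")
        cmd += ["--num_samples", str(num_samples)]
    return cmd
```

The list can be passed directly to subprocess.run(..., check=True), or turned into a copy-pasteable shell string with shlex.join.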

To directly evaluate all generated code across all tasks, run:

bash eval_code.sh

Architecture of LLM4SD

[Figure: architecture of LLM4SD]

Web-based application built on LLM4SD (to be released soon)

Comments that help us improve the web-based application are welcome ❗❗❗

1. Knowledge Synthesis (Derive Knowledge from Scientific Literature)

[Screenshots: Knowledge Synthesis in the web-based application]

2. Knowledge Inference (Derive Knowledge from Analyzing Scientific Data)

[Screenshots: Knowledge Inference in the web-based application]

3. Prediction with Explanation (Explaining How the Prediction Is Derived)

[Screenshots: Prediction with Explanation in the web-based application]
