Giter VIP home page Giter VIP logo

causal_llm's Introduction

Is Knowledge all Large Language Models Needed for Causal Reasoning?

This includes the original implementation of Is Knowledge all Large Language Models Needed for Causal Reasoning?

In this implementation, we encapsulate the series of novel experiments that generate counterfactual settings to estimate attribution scores of LLMs in the proposed causal attribution model. We also include two additional experiments: reversal causal inference and pairwise causal discovery. In each case, true discovery rate, false discovery rate, F1 score, and Structual Hamming Distance are also provided to improve the understanding of LLMs' ability to do causal reasoning.

Our implementation can be easily adapted to testing other LLMs and settings by defining the interface of them without modifying the source code.

Content

  1. Installation
  2. Quick Start
  3. Input Argument
  4. Run Example
  5. Output Explaination

Installation

Install dependent Python libraries by running the command below.

pip install -r requirements.txt

Please use python version >= 3.10.0, as the Match Case is not applied into the older version. Besides, you need to create .env file to store the openai api key.

Quick start

Once you finish the environment setup, run the command below to run the test script with the default input argument

python test.py

The default setup tests the full graph causal discovery capability of GPT4 turbo on the Sachs dataset.

Input Argument

  • llm: The LLMs model to be tested. User could choose from [gpt4, claude2, jurassic, llama2].
  • data: The path of the data to be tested. It needs to be the format of csv or txt file, separated by "\t".
  • relations: The path of the true causal relations for the data to be tested. It needs to be the format of a txt file. For example, if the true causal relationships are Father's Height -> Child's Height, Mother's Height -> Child's Height, and Child's Gender -> Child's Height. It should store [("Father's Height", " Child's Height"), ("Mother's Height", "Child's Height"), ("Child's Gender", "Child's Height")] in the text file.
  • task_type: There are two types of tasks: full_graph_discovery and two_variable_discovery.
  • linear_coefficient: The linear coefficient in the simulation setting of pairwise causal discovery.
  • df: The degree of freedom (parameter of Chi-square distribution) in the simulation setting of pairwise causal discovery.
  • count: The number of trials to be tested
  • max_tokens: The maximum number of tokens to be used in the evaluation (usually the context length of LLMs)
  • reserved_ratio: The ratio of the max_tokens to be used in the evaluation test (to control the amount of numerical data used in the trial)
  • model_name: The OpenAI GPT model to be tested. User could choose from [gpt-3.5-turbo, gpt-4, gpt-4-1106-preview]

Run Example

Full Graph Discovery

To test full causal graph discovery capability of GPT4 on the Sachs Data for 15 trials, run following command below

python test.py --llm gpt4 \
               --data ./data/sachs.txt \
               --relations ./relation/relation.txt \
               --task_type full_graph_discovery \
               --count 15 \
               --max_tokens 8000 \
               --reserved_ratio 0.5 \
               --model_name gpt4

To test pairwise causal discovery capability of GPT4 on the Sachs Data for 15 trials, run following command below

python test.py --llm gpt4 \
               --data ./data/sachs.txt \
               --relations ./relation/relation.txt \
               --task_type two_variable_discovery \
               --linear_coefficient 1 \
               --df 3 \
               --count 15 \
               --max_tokens 8000 \
               --reserved_ratio 0.5 \
               --model_name gpt4

Output Explaination

Illustration of output for full causal graph discovery (2 trials)

({1: {'tdr': (0.55, 0.04999999999999999), 'fdr': (0.45, 0.04999999999999999), 'F1': (0.5499950000458329, 0.049999999995833405), 'shd': (16.5, 0.5)}, 
  2: {'tdr': (0.72275, 0.17725000000000002), 'fdr': (0.27725, 0.17725000000000002), 'F1': (0.7227450000368034, 0.1772499999909743), 'shd': (16.0, 0.0)}, 
  3: {'tdr': (0.05, 0.05), 'fdr': (0.95, 0.04999999999999999), 'F1': (0.049997500124993745, 0.049997500124993745), 'shd': (29.5, 5.5)}, 
  4: {'tdr': (0.05, 0.05), 'fdr': (0.95, 0.04999999999999999), 'F1': (0.049997500124993745, 0.049997500124993745), 'shd': (21.5, 0.5)}, 
  5: {'tdr': (0.75, 0.25), 'fdr': (0.5, 0.0), 'F1': (0.5833286111509256, 0.0833336111009261), 'shd': (17.5, 1.5)}, 
  6: {'tdr': (0.0, 0.0), 'fdr': (1.0, 0.0), 'F1': (0.0, 0.0), 'shd': (20.0, 0.0)}}, 
  {'CAK': (0.5, 0.0), 'CAD': (-0.17275000000000001, 0.12725000000000003), 'MAD': (0.05, 0.05), 'MAK': (0.72275, 0.17725000000000002)})
  • 1 denotes the Raw Data setting
  • 2 denotes the Omit Data setting
  • 3 denotes the Omit Knowledge setting
  • 4 denotes the Reverse Causal Inference setting with reversed causal graph as the true causal graph
  • 5 denotes the Reverse Causal Inference setting with original causal graph as the true causah graph
  • 6 denotes the Random Guess setting
  • tdr: true discovery rate, fdr: false discovery rate, F1: F1 score, shd: Structual Hamming Distance
  • CAK: Conditional attribution of knowledge, CAD: Conditional attribution of data, MAK: Marginal attribution of Knowledge, MAD: Marginal attribution of data
  • (*, *) denotes the mean and the standard deviation with respect to total number of trials

Illustration of outout for pairwise causal graph discovery (2 trials)

{'cdm': 1.0, 'llm': 0.5, 'llm_native': 0.0}
  • cdm: mean accuracy of pairwise causal discovery by running LinGAM algorithm
  • llm: mean accuracy of pairwise causal discovery by instructing LLM the procedure of LinGAM
  • llm_native: mean accuracy of pairwise causal discovery by direclty asking LLM to generate the causal direction

causal_llm's People

Contributors

ncsulsj avatar jingwu6 avatar

Stargazers

Huiyang Yi avatar  avatar 鳴轉 avatar Jeff Carpenter avatar Hunter avatar Jingyuan Bao avatar Xin Du avatar Aditya Mavle avatar Madhumita Sushil avatar SANKAR N avatar Shu avatar  avatar  avatar Travis Li avatar

Watchers

Kostas Georgiou avatar Hunter avatar  avatar

Forkers

jeremite

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.