stark's Introduction

STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases

License: MIT

Interactive interface: STaRK SKB Explorer

NEWS

  • [May 2024] 🔥 We have augmented our benchmark with three high-quality human-generated query datasets, which are openly accessible. See more details in our updated arXiv paper!
  • [May 11th 2024] We upgraded our Amazon knowledge base and uploaded the datasets to Hugging Face. Now you can download the SKB data from our Hugging Face repo!
  • [May 9th 2024] We released the STaRK SKB Explorer on Hugging Face, an interactive interface for exploring our knowledge bases! A demo video will be out soon.
  • [May 7th 2024] We presented STaRK at the 2024 Stanford Annual Affiliates Meeting and the 2024 Stanford Data Science Conference.
  • [May 5th 2024] STaRK was covered by Marktechpost and the BAAI community (智源社区). Thanks for writing about our work!
  • [Apr 21st 2024] We released the STaRK benchmark.

What is STaRK?

STaRK is a large-scale semi-structured retrieval benchmark on Textual and Relational Knowledge Bases. Given a user query, the task is to extract nodes from the knowledge base that are relevant to the query.

Why STaRK?

  • Novel Task: Large language models have recently demonstrated significant potential in information retrieval tasks. Nevertheless, it remains an open question how effectively LLMs can handle the complex interplay between textual and relational requirements in queries.

  • Large-scale and Diverse KBs: We provide three large-scale knowledge bases across three areas, all constructed from public sources.

  • Natural-sounding and Practical Queries: The queries in our benchmark are crafted to incorporate rich relational information and complex textual properties, and they closely mirror questions in real-life scenarios, e.g., with flexible query formats and possibly extra context.

Access benchmark data

1) Env Setup

Create a conda environment with Python 3.8 and install the required packages from requirements.txt.

conda create -n stark python=3.8 
conda activate stark
pip install -r requirements.txt

2) Data loading

Demo: see load_dataset.ipynb for more examples.

from src.benchmarks.get_qa_dataset import get_qa_dataset
from src.benchmarks.get_semistruct import get_semistructured_data

dataset_name = 'amazon'

# Load the retrieval dataset
qa_dataset = get_qa_dataset(dataset_name)
idx_split = qa_dataset.get_idx_split()

# Load the knowledge base
kb = get_semistructured_data(dataset_name, download_processed=True)
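
Once loaded, you can inspect individual examples. The snippet below is a minimal sketch: the four-tuple returned by indexing the QA dataset and the kb.get_doc_info accessor follow the demo notebook, so treat them as assumptions and check load_dataset.ipynb for the authoritative usage.

# Inspect one retrieval example (return signature assumed from the demo notebook)
test_idx = int(idx_split['test'][0])
query, query_id, answer_ids, meta = qa_dataset[test_idx]
print(query, answer_ids)

# Fetch the text document of one answer node (accessor name assumed)
print(kb.get_doc_info(answer_ids[0]))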

Data of the Retrieval Task

Question-answer pairs for the retrieval task are included locally in data/{dataset}/stark_qa. We provide the official splits in data/{dataset}/split.
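
Each row pairs a query ID and query text with a list of ground-truth answer node IDs. As a minimal sketch of reading one row, assuming the file and column names below (check the files under data/{dataset}/stark_qa for the exact layout):

import ast
import pandas as pd

# File and column names are assumptions for illustration.
qa_df = pd.read_csv('data/amazon/stark_qa/stark_qa.csv')
# answer_ids is stored as a stringified list, e.g. "[334457, 334458, 334460, 334461]"
first = qa_df.iloc[0]
print(first['query'], ast.literal_eval(first['answer_ids']))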

Data of the Knowledge Bases

There are two ways to load the knowledge base data:

  • (Recommended) Instant downloading: The knowledge base data for all three benchmarks will be automatically downloaded and loaded when setting download_processed=True.
  • Process data from raw: We also provide all of our preprocessing code for transparency, so you can process the raw data from scratch by setting download_processed=False. In this case, processing STaRK-PrimeKG takes around 5 minutes, while STaRK-Amazon and STaRK-MAG may take around an hour (see the snippet below).
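
In code, the two options differ only in the download_processed flag:

from src.benchmarks.get_semistruct import get_semistructured_data

# Recommended: download the preprocessed knowledge base directly.
kb = get_semistructured_data('primekg', download_processed=True)

# For transparency: rebuild the knowledge base from the raw data
# (around 5 minutes for primekg; up to an hour for amazon and mag).
kb_raw = get_semistructured_data('primekg', download_processed=False)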

3) Evaluation on the benchmark

  • Our evaluation requires embedding the node documents into candidate_emb_dict.pt, a dictionary mapping node_id to torch.Tensor (see the sketch after this list). Query embeddings will be generated automatically if not available. You can run the following Python script to download the query and document embeddings generated by text-embedding-ada-002 (we provide them so you can run on our benchmark right away):

    python download_emb.py --dataset amazon --emb_dir emb/

    Or you can run the following code to generate the query or document embeddings yourself, e.g.:

    python generate_emb.py --dataset amazon --mode query --emb_dir emb/ --emb_model text-embedding-ada-002
    • dataset: one of amazon, mag, or primekg.
    • mode: the content to embed, one of query or doc (node documents).
    • emb_dir: the directory in which to store the embeddings.
    • emb_model: the name of the embedding model, such as text-embedding-ada-002 or text-embedding-3-large.
    • See generate_emb.py for other arguments.
  • Run the Python script for evaluation, e.g.:

    python eval.py --dataset amazon --model VSS --emb_dir emb/ --output_dir output/ --emb_model text-embedding-ada-002 --split test --save_pred 
    python eval.py --dataset amazon --model LLMReranker --emb_dir emb/ --output_dir output/  --emb_model text-embedding-ada-002 --split test --llm_model gpt-4-1106-preview --save_pred
    • dataset: the dataset to evaluate on, one of amazon, mag, or primekg.
    • model: the model to be evaluated, one of VSS, MultiVSS, or LLMReranker.
      • Please specify the name of the embedding model with the argument --emb_model.
      • If you are using LLMReranker, please put your API key at config/openai_api_key.txt or config/claude_api_key.txt and specify the LLM name with the argument --llm_model.
    • emb_dir: the directory where the embeddings are stored.
    • split: the split to evaluate on, one of train, val, test, or human_generated_eval (to evaluate on the human-generated query dataset).
    • output_dir: the directory to store evaluation outputs.
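
For intuition, VSS ranks candidate nodes by the similarity between the query embedding and each document embedding. The sketch below only illustrates that idea and is not the repo's eval.py; the exact file path under emb_dir is an assumption.

import torch
import torch.nn.functional as F

# candidate_emb_dict.pt maps node_id -> torch.Tensor; the path is an assumption.
candidate_emb_dict = torch.load('emb/amazon/candidate_emb_dict.pt')
node_ids = list(candidate_emb_dict.keys())
doc_embs = torch.stack([candidate_emb_dict[i].view(-1) for i in node_ids])

def vss_rank(query_emb, k=10):
    # Cosine similarity between the query and every candidate document.
    sims = F.cosine_similarity(query_emb.view(1, -1), doc_embs, dim=-1)
    top = torch.topk(sims, k)
    return [(node_ids[i], sims[i].item()) for i in top.indices.tolist()]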

Reference

Please consider citing our paper if you use our benchmark or code in your work:

@article{wu24stark,
    title        = {STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases},
    author       = {
        Shirley Wu and Shiyu Zhao and 
        Michihiro Yasunaga and Kexin Huang and 
        Kaidi Cao and Qian Huang and 
        Vassilis N. Ioannidis and Karthik Subbian and 
        James Zou and Jure Leskovec
    },
    eprinttype   = {arXiv},
    eprint       = {2404.13207},
    year         = {2024}
}

stark's People

Contributors

bechbd, wuyxin, zsyjosh


stark's Issues

What do the ids mean after each query?

Initially I thought these were "qualified product IDs", but then I realized the query ID is also in the list.

e.g.
334460,What are some Tercel women's cycling gloves made in China that you would recommend?,"[334457, 334458, 334460, 334461]"

I searched through the data folder but couldn't find an explanation of the data format.

Problems with Python 3.8

Hi,

You mentioned that Python 3.8 is required. I tried this on my Mac with conda and got an error:

ERROR: Ignored the following versions that require a different python version: 1.2.0 Requires-Python >=3.9; 1.2.1 Requires-Python >=3.9; 1.2.1rc1 Requires-Python >=3.9
ERROR: Could not find a version that satisfies the requirement cupy-cuda11x==12.2.0 (from versions: none)
ERROR: No matching distribution found for cupy-cuda11x==12.2.0

Full log

(base) tobiasoberrauch@Tobiass-MBP stark % conda create -n stark python=3.8
Channels:
 - defaults
Platform: osx-arm64
Collecting package metadata (repodata.json): done
Solving environment: done

Package Plan

environment location: /opt/anaconda3/envs/stark

added / updated specs:
- python=3.8

The following packages will be downloaded:

package                    |            build
---------------------------|-----------------
pip-23.3.1                 |   py38hca03da5_0         2.6 MB
python-3.8.19              |       hb885b13_0        12.5 MB
setuptools-68.2.2          |   py38hca03da5_0         934 KB
wheel-0.41.2               |   py38hca03da5_0         107 KB
xz-5.4.6                   |       h80987f9_0         372 KB
------------------------------------------------------------
                                       Total:        16.5 MB

The following NEW packages will be INSTALLED:

ca-certificates pkgs/main/osx-arm64::ca-certificates-2024.3.11-hca03da5_0
libcxx pkgs/main/osx-arm64::libcxx-14.0.6-h848a8c0_0
libffi pkgs/main/osx-arm64::libffi-3.4.4-hca03da5_0
ncurses pkgs/main/osx-arm64::ncurses-6.4-h313beb8_0
openssl pkgs/main/osx-arm64::openssl-3.0.13-h1a28f6b_0
pip pkgs/main/osx-arm64::pip-23.3.1-py38hca03da5_0
python pkgs/main/osx-arm64::python-3.8.19-hb885b13_0
readline pkgs/main/osx-arm64::readline-8.2-h1a28f6b_0
setuptools pkgs/main/osx-arm64::setuptools-68.2.2-py38hca03da5_0
sqlite pkgs/main/osx-arm64::sqlite-3.41.2-h80987f9_0
tk pkgs/main/osx-arm64::tk-8.6.12-hb8d0fd4_0
wheel pkgs/main/osx-arm64::wheel-0.41.2-py38hca03da5_0
xz pkgs/main/osx-arm64::xz-5.4.6-h80987f9_0
zlib pkgs/main/osx-arm64::zlib-1.2.13-h5a0b063_0

Proceed ([y]/n)?

Downloading and Extracting Packages:

Preparing transaction: done
Verifying transaction: | WARNING conda.core.path_actions:verify(1055): Unable to create environments file. Path not writable.
environment location: /Users/tobiasoberrauch/.conda/environments.txt

done
Executing transaction: \ WARNING conda.core.envs_manager:register_env(66): Unable to register environment. Path not writable or missing.
environment location: /opt/anaconda3/envs/stark
registry file: /Users/tobiasoberrauch/.conda/environments.txt
done

To activate this environment, use

$ conda activate stark

To deactivate an active environment, use

$ conda deactivate

(base) tobiasoberrauch@Tobiass-MBP stark % conda activate stark
pip install -r requirements.txt
Collecting anthropic==0.25.0 (from -r requirements.txt (line 1))
Downloading anthropic-0.25.0-py3-none-any.whl.metadata (18 kB)
Collecting async-timeout==4.0.3 (from -r requirements.txt (line 2))
Using cached async_timeout-4.0.3-py3-none-any.whl.metadata (4.2 kB)
Collecting attrs==23.1.0 (from -r requirements.txt (line 3))
Downloading attrs-23.1.0-py3-none-any.whl.metadata (11 kB)
Collecting bs4==0.0.1 (from -r requirements.txt (line 4))
Downloading bs4-0.0.1.tar.gz (1.1 kB)
Preparing metadata (setup.py) ... done
Collecting certifi==2023.7.22 (from -r requirements.txt (line 5))
Downloading certifi-2023.7.22-py3-none-any.whl.metadata (2.2 kB)
Collecting click==8.1.7 (from -r requirements.txt (line 6))
Using cached click-8.1.7-py3-none-any.whl.metadata (3.0 kB)
Collecting cmake==3.27.7 (from -r requirements.txt (line 7))
Downloading cmake-3.27.7-py2.py3-none-macosx_10_10_universal2.macosx_10_10_x86_64.macosx_11_0_arm64.macosx_11_0_universal2.whl.metadata (6.7 kB)
Collecting comm==0.1.4 (from -r requirements.txt (line 8))
Downloading comm-0.1.4-py3-none-any.whl.metadata (4.2 kB)
Collecting contourpy==1.1.1 (from -r requirements.txt (line 9))
Downloading contourpy-1.1.1-cp38-cp38-macosx_11_0_arm64.whl.metadata (5.9 kB)
ERROR: Ignored the following versions that require a different python version: 1.2.0 Requires-Python >=3.9; 1.2.1 Requires-Python >=3.9; 1.2.1rc1 Requires-Python >=3.9
ERROR: Could not find a version that satisfies the requirement cupy-cuda11x==12.2.0 (from versions: none)
ERROR: No matching distribution found for cupy-cuda11x==12.2.0

Error during processing of Amazon data from scratch

There is an error when reprocessing the raw "amazon" data from scratch, which appears to be caused by a missing reference to self.review_columns:

Traceback (most recent call last):
  File "/home/ec2-user/stark/main.py", line 7, in <module>
    kb = get_semistructured_data(dataset_name, download_processed=False)
  File "/home/ec2-user/stark/src/benchmarks/get_semistruct.py", line 9, in get_semistructured_data
    kb = AmazonSemiStruct(root=data_root,
  File "/home/ec2-user/stark/src/benchmarks/semistruct/amazon.py", line 113, in __init__
    processed_data = self._process_raw(categories)
  File "/home/ec2-user/stark/src/benchmarks/semistruct/amazon.py", line 344, in _process_raw
    node_info = self.construct_raw_node_info(df_meta_reduced, df_review_reduced, df_qa_reduced)
  File "/home/ec2-user/stark/src/benchmarks/semistruct/amazon.py", line 525, in construct_raw_node_info
    df_row_to_dict(df_i, colunm_names=review_columns \
NameError: name 'review_columns' is not defined
