CMATH

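Introduction

In this project we present the CMATH dataset, which contains 1.7k elementary-school-level math word problems with detailed annotations. The dataset is meant to serve as a benchmark for the following question: which grade of elementary school math does the math ability of today's popular large language models correspond to? We evaluated a range of popular large language models and found that only GPT-4 passes the math tests for all six grades (accuracy >= 60%). We also evaluated the robustness of these models by adding distracting information to problems in the CMATH dataset. Our results show that GPT-4 is the only model that remains robust.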
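The pass criterion stated above (accuracy >= 60% per grade) can be sketched in a few lines of Python. This is only an illustration: the function names, the `(grade, is_correct)` record layout, and the toy data are assumptions made for this example and are not taken from the repository.

```python
from collections import defaultdict

PASS_THRESHOLD = 0.60  # pass mark stated in the introduction: accuracy >= 60%


def accuracy_by_grade(records):
    """Return {grade: accuracy} from an iterable of (grade, is_correct) pairs."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for grade, is_correct in records:
        totals[grade] += 1
        correct[grade] += int(is_correct)
    return {g: correct[g] / totals[g] for g in totals}


def passed_grades(records, threshold=PASS_THRESHOLD):
    """Return the sorted list of grades whose accuracy meets the threshold."""
    acc = accuracy_by_grade(records)
    return sorted(g for g, a in acc.items() if a >= threshold)


# Toy data invented for illustration only (not real model results).
toy = [(1, True), (1, True), (1, False), (2, True), (2, False), (2, False)]
print(accuracy_by_grade(toy))
print(passed_grades(toy))
```

Under this reading, a model "passes all six grades" when `passed_grades` returns every grade from 1 to 6.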

Dataset

cmath_dev

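We are releasing the samples of the CMATH dataset in two batches. The first batch contains 600 samples, 100 per grade, and can be treated as a dev set. The remaining samples, which can be treated as a test set, will be released at the end of the year.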

Example

(Figure: a CMATH sample with its annotations)

Model performance comparison

(Figure: model performance comparison)

distractor

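To evaluate how robust large language models are to distracting information, we built a small "distractor set" of 60 samples. Each sample contains one original problem plus five "augmented problems" that we created by manually adding distracting information, for six problems in total.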
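The sample layout described above (one original problem plus five augmented problems) suggests a simple way to summarize robustness: compute accuracy separately for each of the six positions. The sketch below assumes a hypothetical in-memory layout and hypothetical names (`DistractorSample`, `accuracy_per_variant`); it does not reflect the repository's actual file format or the metric reported in the comparison figure.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DistractorSample:
    """Hypothetical container: one original problem and its augmented variants."""
    original: str
    augmented: List[str]  # five variants per sample in the distractor set

    def questions(self) -> List[str]:
        # Index 0 is the original problem, indices 1..5 are the augmented ones.
        return [self.original] + self.augmented


def accuracy_per_variant(results: List[List[bool]]) -> List[float]:
    """results[i][j] is True if the model solved variant j of sample i."""
    n = len(results)
    width = len(results[0])
    return [sum(r[j] for r in results) / n for j in range(width)]


# Toy outcomes invented for illustration only (two samples, six variants each).
toy_results = [
    [True, True, False, True, False, False],
    [True, False, False, True, True, False],
]
print(accuracy_per_variant(toy_results))
```

A model that stays robust would show roughly the same accuracy at every position, while a fragile one would drop on the augmented positions.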

Example

(Figure: a distractor sample)

Model performance comparison

(Figure: model performance comparison)

Code

We provide a script, eval.py, for automatically evaluating model-generated responses.
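The internals of eval.py are not described here, so the following is not its implementation. It is a minimal sketch of one common way to score free-text answers to math word problems, under the assumption that the final number in a response is the model's answer and that the reference answer is numeric. The names `extract_last_number` and `is_correct` are hypothetical.

```python
import re

# Matches integers and decimals, with an optional leading minus sign.
_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def extract_last_number(response):
    """Return the last number in the response as a float, or None if absent."""
    # Drop ASCII commas so that "1,250" is read as 1250 (a simplification:
    # this would also merge two numbers separated by a comma with no space).
    matches = _NUMBER.findall(response.replace(",", ""))
    return float(matches[-1]) if matches else None


def is_correct(response, gold, tol=1e-6):
    """Compare the extracted answer with the reference answer numerically."""
    pred = extract_last_number(response)
    return pred is not None and abs(pred - float(gold)) <= tol


print(is_correct("3 + 5 = 8, so the answer is 8.", "8"))
print(is_correct("The total is 1,250 yuan", "1250"))
print(extract_last_number("no digits here"))
```

Taking the last number is a heuristic and misfires when a response ends with a restated quantity from the problem, so any real evaluation should rely on the provided script.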

License

  • Source code: MIT license
  • Dataset: CC BY 4.0

Citation

@misc{wei2023cmath,
      title={CMATH: Can Your Language Model Pass Chinese Elementary School Math Test?}, 
      author={Tianwen Wei and Jian Luan and Wei Liu and Shuang Dong and Bin Wang},
      year={2023},
      eprint={2306.16636},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
