
japanese-llm-evaluation's Introduction

jrank: Ranking Japanese LLMs



This repository supports YuzuAI's Rakuda leaderboard of Japanese LLMs, which is a Japanese-focused version of LMSYS' LLM Judge.

Usage

Rakuda follows the same API as LLM Judge. First, prepare a list of questions on which you want to compare the models; these questions can be multi-turn. The default Rakuda question list is jrank/data/rakuda_v2/questions.jsonl (also available on Hugging Face).
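As a rough illustration of the question file, the sketch below reads a JSON Lines file with one question object per line. It assumes, without confirmation from this README, that each object follows the LLM Judge (MT-bench) layout with `question_id`, `category`, and a `turns` list holding one prompt per turn; check the actual questions.jsonl before relying on these field names. The helper `load_questions` is hypothetical.

```python
import json
import os
import tempfile


def load_questions(path):
    """Read one JSON object per non-empty line of a .jsonl file (hypothetical helper)."""
    questions = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                questions.append(json.loads(line))
    return questions


if __name__ == "__main__":
    # Hypothetical two-turn question in the assumed MT-bench-style layout.
    sample = {
        "question_id": 1,
        "category": "writing",
        "turns": ["日本の首都はどこですか？", "その都市の人口は？"],
    }
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "questions.jsonl")
        with open(path, "w", encoding="utf-8") as f:
            f.write(json.dumps(sample, ensure_ascii=False) + "\n")
        for q in load_questions(path):
            print(q["question_id"], len(q["turns"]), "turn(s)")
```

Multi-turn questions simply carry more than one entry in `turns`; answer generation would then feed each turn to the model in order.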

Then generate model answers to these questions using jrank/gen_model_answer.py:

python3 gen_model_answer.py --bench_name rakuda_v2 --model-path line-corporation/japanese-large-lm-1.7b-instruction-sft --model-id line-1.7b --conv_template ./templates/line.json

For API models, use gen_api_answer.py instead.

After generating model answers, generate judgements of these answers using gen_judgment.py:

python gen_judgment.py --bench-name rakuda_v2 --model-list chatntq-7b-jpntuned claude-2 gpt-3.5-turbo-0301-20230614 gpt-4-20230713 elyza-7b-fast-instruct elyza-7b-instruct jslm7b-instruct-alpha line-3.6b-sft rinna-3.6b-ppo rinna-3.6b-sft rwkv-world-jp-v1 stablebeluga2 weblab-10b-instruction-sft super-trin --parallel 2 --mode pairwise-n --judge-model claude-2 --n 2000

The --mode option determines which kind of judgement is performed. The default for Rakuda is pairwise-n, in which pairs of model answers are compared until n judgements (set with --n) have been collected.
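The README does not say how pairs are chosen in pairwise-n mode. One plausible scheme, shown here purely as an illustrative assumption and not as the repository's code, draws a random question and two distinct models for each match until n matches are queued; each match would then be sent to the judge model. The function name `sample_pairwise_matches` is hypothetical.

```python
import random
from itertools import combinations


def sample_pairwise_matches(question_ids, models, n, seed=0):
    """Return n (question_id, model_a, model_b) tuples drawn at random.

    Hypothetical sketch of a pairwise-n schedule; the real gen_judgment.py
    may sample differently (e.g. balancing pairs or avoiding repeats).
    """
    if len(models) < 2:
        raise ValueError("need at least two models to compare")
    rng = random.Random(seed)
    pairs = list(combinations(models, 2))  # every unordered model pair
    matches = []
    for _ in range(n):
        qid = rng.choice(question_ids)
        a, b = rng.choice(pairs)
        # Randomize presentation order to reduce the judge's position bias.
        if rng.random() < 0.5:
            a, b = b, a
        matches.append((qid, a, b))
    return matches


if __name__ == "__main__":
    models = ["claude-2", "gpt-4-20230713", "line-3.6b-sft"]
    for match in sample_pairwise_matches(list(range(1, 11)), models, n=5):
        print(match)
```

Sampling pairs uniformly keeps the number of judge calls fixed at n regardless of how many models are on the list, which is the practical appeal of a pairwise-n mode over judging every pair on every question.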

Finally, fit a Bradley-Terry model to these judgements to create a model ranking.

python make_ranking.py --bench-name rakuda_v2 --judge-model claude-2 --mode pairwise --compute mle --make-charts --bootstrap-n 500 --plot-skip-list rinna-3.6b-sft super-trin elyza-7b-instruct
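Under the Bradley-Terry model, each model i has a strength p_i > 0, and the probability that i beats j is p_i / (p_i + p_j). The sketch below fits maximum-likelihood strengths with the classic minorization-maximization (Zermelo/Hunter) iteration and estimates uncertainty by resampling judgements with replacement, loosely mirroring `--compute mle` and `--bootstrap-n`. It is not make_ranking.py's code: it assumes judgements are already reduced to (winner, loser) pairs, ignores ties, and adds a small pseudo-count (`prior`, an assumption of this sketch) so the estimate stays finite when a model never wins. Function names are hypothetical.

```python
import math
import random
from collections import defaultdict


def fit_bradley_terry(matches, models, prior=0.1, iters=1000, tol=1e-10):
    """Fit Bradley-Terry strengths from (winner, loser) pairs via MM iteration.

    Needs at least two models. `prior` adds pseudo-wins in both directions for
    every model pair; with prior=0, every model must have at least one win and
    one loss. Returns {model: log-strength}, centred so the logs sum to 0.
    """
    idx = {m: k for k, m in enumerate(models)}
    k = len(models)
    # wins[i][j] = (pseudo-)count of judgements where model i beat model j
    wins = [[prior if i != j else 0.0 for j in range(k)] for i in range(k)]
    for winner, loser in matches:
        wins[idx[winner]][idx[loser]] += 1.0
    p = [1.0] * k
    for _ in range(iters):
        new_p = []
        for i in range(k):
            total_wins = sum(wins[i])
            denom = sum(
                (wins[i][j] + wins[j][i]) / (p[i] + p[j])
                for j in range(k) if j != i
            )
            new_p.append(total_wins / denom)
        # Normalize so the geometric mean of the strengths is 1.
        g = math.exp(sum(math.log(x) for x in new_p) / k)
        new_p = [x / g for x in new_p]
        delta = max(abs(a - b) for a, b in zip(new_p, p))
        p = new_p
        if delta < tol:
            break
    return {m: math.log(p[idx[m]]) for m in models}


def bootstrap_bradley_terry(matches, models, n_boot=500, seed=0, **fit_kwargs):
    """Return {model: (2.5th, 97.5th percentile)} of bootstrapped log-strengths."""
    rng = random.Random(seed)
    samples = defaultdict(list)
    for _ in range(n_boot):
        resampled = [rng.choice(matches) for _ in matches]
        for m, s in fit_bradley_terry(resampled, models, **fit_kwargs).items():
            samples[m].append(s)
    intervals = {}
    for m, vals in samples.items():
        vals.sort()
        lo = vals[int(0.025 * (len(vals) - 1))]
        hi = vals[int(0.975 * (len(vals) - 1))]
        intervals[m] = (lo, hi)
    return intervals


if __name__ == "__main__":
    # Made-up judgements for illustration only (not real Rakuda results).
    matches = ([("A", "B")] * 7 + [("B", "A")] * 3 + [("B", "C")] * 6
               + [("C", "B")] * 4 + [("A", "C")] * 8 + [("C", "A")] * 2)
    models = ["A", "B", "C"]
    scores = fit_bradley_terry(matches, models)
    cis = bootstrap_bradley_terry(matches, models, n_boot=200)
    for m in sorted(models, key=scores.get, reverse=True):
        lo, hi = cis[m]
        print(f"{m}: {scores[m]:+.2f}  (95% CI {lo:+.2f} to {hi:+.2f})")
```

Only differences between log-strengths are meaningful; the centring is an arbitrary choice, and a leaderboard may rescale them (for example to an Elo-like scale) for display.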

japanese-llm-evaluation's People

Contributors

eltociear, leemengtw, nahi-ux, passaglia, sosuke115

japanese-llm-evaluation's Issues

Investigate the effect of special tokens

if model_id == "matsuo-lab/weblab-10b-instruction-sft":
    tokenizer.pad_token_id = 1
    tokenizer.eos_token_id = 0
    tokenizer.bos_token_id = tokenizer.pad_token_id

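  • Determine whether special tokens need to be set separately for each base LLM (i.e., whether they are likely to have a critical effect on performance).
  • If so, add code that automatically sets the appropriate special tokens.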
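One hypothetical way to act on the second bullet (automatically setting the right special tokens) is a lookup table of per-model overrides applied right after the tokenizer is loaded. The table `SPECIAL_TOKEN_OVERRIDES` and the helper `apply_special_token_overrides` are illustrative, not part of the repository; the only entry copies the weblab-10b values from the issue's snippet, and whether other base models need entries is exactly what the issue proposes to investigate.

```python
from types import SimpleNamespace

# Hypothetical per-model overrides; values copied from the issue's weblab-10b snippet.
SPECIAL_TOKEN_OVERRIDES = {
    "matsuo-lab/weblab-10b-instruction-sft": {
        "pad_token_id": 1,
        "eos_token_id": 0,
        "bos_token_id": 1,  # same as pad_token_id in the original snippet
    },
}


def apply_special_token_overrides(model_id, tokenizer):
    """Set any known special-token IDs for model_id on tokenizer; return the names changed."""
    overrides = SPECIAL_TOKEN_OVERRIDES.get(model_id, {})
    for name, value in overrides.items():
        setattr(tokenizer, name, value)
    return sorted(overrides)


if __name__ == "__main__":
    # A stand-in object; in practice you would pass the tokenizer loaded for model_id.
    tok = SimpleNamespace(pad_token_id=None, eos_token_id=2, bos_token_id=2)
    changed = apply_special_token_overrides("matsuo-lab/weblab-10b-instruction-sft", tok)
    print(changed, tok.pad_token_id, tok.eos_token_id, tok.bos_token_id)
```

Keeping the overrides in data rather than in `if model_id == ...` branches makes it easy to add a model once its correct token IDs have been confirmed, and models without an entry are left untouched.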
