llm-jeopardy

Season 39 of Jeopardy! concluded with the episode that aired on 2023-07-28. The final LLM vs. Double Jeopardy results are below.

  • Answers were re-written as questions, using the category when necessary
  • Only Double Jeopardy $2000 questions were prompted
  • Time averages shown include model loading
  • Only GGML format models were used
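The selection rule in the list above (only Double Jeopardy $2000 clues) can be sketched as a simple filter. This is a minimal illustration, not the framework's actual code: the record shape and the field names `round`, `value`, `category`, and `prompt` are assumptions made for the example, and the sample records are placeholders rather than real clues.

```javascript
// Hypothetical clue records. The field names (round, value, category, prompt)
// are assumptions for illustration, not the framework's actual schema.
const clues = [
  { round: "Jeopardy", value: 1000, category: "CATEGORY A", prompt: "placeholder prompt 1" },
  { round: "Double Jeopardy", value: 1200, category: "CATEGORY B", prompt: "placeholder prompt 2" },
  { round: "Double Jeopardy", value: 2000, category: "CATEGORY C", prompt: "placeholder prompt 3" },
  { round: "Double Jeopardy", value: 2000, category: "CATEGORY D", prompt: "placeholder prompt 4" },
];

// Keep only the clues that match the stated rule: Double Jeopardy round, $2000 value.
function selectClues(allClues) {
  return allClues.filter(
    (clue) => clue.round === "Double Jeopardy" && clue.value === 2000
  );
}

const selected = selectClues(clues);
console.log(selected.map((clue) => clue.category));
```

Filtering on both fields keeps the prompt set small and uniform in difficulty, which matches the constant total of 357 prompts per model in the results table.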

Thank you to TheBloke for massive and timely model conversions, to ggerganov and all llama.cpp developers, and to everyone creating such fantastic models and tools for LLMs!

Automated prompting and scoring framework to evaluate LLMs using updated human knowledge prompts

Install and run:
git clone https://github.com/aigoopy/llm-jeopardy.git
cd llm-jeopardy
npm install
node . --help

The llm-jeopardy framework uses llama.cpp for model execution and GGML models from Hugging Face. It has been updated with GGMLv3 models.

name percent modelcorrect modeltotal elapsed answerlen msize mdate
StableBeluga-2 70B-8_0 84.87 303 357 61.894 50.13 73.23 2023/07/27 23:37:53
Airoboros-Gpt4-l2-1.4.1 70B-8_0 84.59 302 357 51.987 21.11 73.23 2023/07/28 07:52:12
Airoboros-Gpt4-1.2 65B-8_0 82.07 293 357 40.233 12.80 69.37 2023/06/14 16:35:46
Airoboros-Gpt4-1.2 65B-5_1 80.67 288 357 37.950 12.92 48.97 2023/06/14 15:25:37
Airoboros-Gpt4-1.4 65B-8_0 79.27 283 357 45.639 21.29 69.37 2023/06/29 20:25:57
Airoboros-Gpt4-1.4 33B-8_0 75.35 269 357 23.282 17.86 34.56 2023/06/26 17:53:42
Airoboros-Gpt4-l2-1.4.1 13B-8_0 73.95 264 357 8.975 31.41 13.83 2023/07/24 12:29:49
WizardLM 30B-8_0 73.95 264 357 54.938 217.44 34.56 2023/06/06 21:08:15
Alpaca-Lora 65B-5_1 73.39 262 357 45.564 35.81 48.97 2023/05/20 12:57:30
Guanaco 65B-8_0 73.39 262 357 101.707 184.43 69.37 2023/05/26 08:46:34
WizardLM 30B-6_K 72.83 260 357 47.011 227.69 26.69 2023/06/06 19:03:43
GPT4-Alpaca-Lora 30B-8_0 72.55 259 357 49.301 159.72 34.56 2023/05/20 04:13:39
WizardLM-Unc 30B-8_0 72.55 259 357 49.972 166.25 34.56 2023/05/22 14:34:25
Upstage-Llama 30B-8_0 72.27 258 357 24.602 44.80 34.56 2023/07/20 00:49:04
Platypus 30B-8_0 72.27 258 357 25.235 21.94 34.56 2023/06/29 01:30:17
GPlatty 30B-8_0 71.99 257 357 28.163 34.95 34.56 2023/06/29 00:01:01
Wizard-Vicuna-Unc 30B-8_0 71.71 256 357 42.428 127.17 34.56 2023/05/30 04:33:26
Hippogriff 30B-8_0 71.71 256 357 48.153 156.24 34.56 2023/05/31 09:16:01
VicUnlocked-Alpaca 65B-8_0 71.71 256 357 101.302 172.31 69.37 2023/05/30 00:09:02
Llama-2 70B-8_0 71.43 255 357 98.502 141.85 73.23 2023/07/23 20:38:41
Chronoboros 33B-8_0 71.15 254 357 18.975 24.04 34.56 2023/07/10 09:16:27
Llama-Supercot 30B-8_0 71.15 254 357 36.285 93.50 34.56 2023/05/28 12:22:12
Alpaca-Lora 30B-8_0 70.59 252 357 31.041 62.35 34.56 2023/06/01 07:50:56
Samantha 33B-8_0 70.59 252 357 52.000 194.18 34.56 2023/05/29 10:18:08
GPT4-Alpaca-Lora-mlp 65B-5_1 70.31 251 357 74.840 149.92 48.97 2023/05/20 17:04:49
Guanaco 65B-5_1 70.31 251 357 84.868 186.22 48.97 2023/05/25 18:58:18
SuperPlatty 30B-8_0 70.03 250 357 19.865 23.56 34.56 2023/07/03 21:07:50
Epsilon 30B-8_0 70.03 250 357 46.188 167.89 34.56 2023/07/21 10:52:59
Ouroboros 13B-8_0 69.47 248 357 9.719 73.59 13.83 2023/07/21 12:24:28
Airoboros-Gpt4-1.4 13B-8_0 69.47 248 357 11.081 18.88 13.83 2023/06/22 08:32:58
Airochronos 33B-8_0 69.47 248 357 17.756 17.86 34.56 2023/07/10 22:07:12
Airoboros-Gpt4-1.2 13B-8_0 68.91 246 357 10.105 13.09 13.83 2023/06/16 13:03:21
Minotaur 13B-8_0 68.63 245 357 19.676 173.38 13.83 2023/06/08 21:45:25
Lazarus 30B-8_0 68.63 245 357 45.327 148.20 34.56 2023/06/07 15:58:57
WizardLM-Unc-Supercot 30B-8_0 68.63 245 357 45.739 147.25 34.56 2023/06/01 11:07:15
WizardLM-1.0 13B-8_0 68.07 243 357 25.956 230.32 13.83 2023/05/27 16:17:01
Bluemethod 13B-8_0 67.79 242 357 11.850 90.92 13.83 2023/07/21 15:44:28
Vicuna 33B-8_0 67.79 242 357 52.619 179.24 34.56 2023/06/30 16:08:40
WizardLM-Unc-1.0 13B-8_0 66.67 238 357 31.845 264.77 13.83 2023/06/20 07:44:48
WizardLM-1.1 13B-8_0 66.67 238 357 34.965 442.92 13.83 2023/07/07 16:35:42
Nous-Hermes 13B-8_0 66.39 237 357 16.097 102.92 13.83 2023/06/03 13:44:45
Tulu 30B-8_0 66.39 237 357 22.354 18.65 34.56 2023/06/10 21:47:05
Mythologic 13B-8_0 65.83 235 357 17.505 174.59 13.83 2023/07/17 10:44:11
Chronos-Hermes 13B-8_0 65.83 235 357 22.245 189.51 13.83 2023/06/13 11:02:08
Llama 30B-8_0 65.55 234 357 51.478 168.71 34.56 2023/05/20 19:50:17
Vicuna-1.3.0 13B-8_0 65.27 233 357 34.635 314.50 13.83 2023/06/25 11:15:58
Wizard-Mega 13B-8_0 64.99 232 357 20.941 172.95 13.83 2023/05/20 03:50:25
Chimera 13B-8_0 64.71 231 357 16.359 120.17 13.83 2023/06/03 13:08:37
Chronos-WizardLM-Unc-Sc 13B-8_0 64.71 231 357 23.355 203.90 13.83 2023/06/07 14:08:04
Gpt4-X-Vicuna 13B-8_0 64.43 230 357 21.964 192.33 13.83 2023/05/20 05:02:06
Wizard-Vicuna-Unc 13B-8_0 64.15 229 357 15.259 95.08 13.83 2023/05/20 02:05:09
Baize-v2 13B-8_0 64.15 229 357 22.704 187.64 13.83 2023/05/24 12:00:06
Airoboros-Gpt4-1.4 7B-8_0 63.31 226 357 6.674 21.17 7.16 2023/06/22 07:53:28
Manticore 13B-8_0 63.31 226 357 18.155 134.92 13.83 2023/05/20 14:17:21
Based 30B-8_0 63.31 226 357 24.734 36.87 34.56 2023/06/03 10:54:07
Hypermantis 13B-8_0 62.46 223 357 14.078 85.19 13.83 2023/06/03 00:38:54
Llama-2 13B-8_0 62.18 222 357 15.439 114.92 13.83 2023/07/18 17:36:27
Guanaco 7B-8_0 62.18 222 357 33.903 705.24 7.16 2023/05/25 20:18:25
Airoboros-Gpt4-1.2 7B-8_0 61.34 219 357 6.362 16.66 7.16 2023/06/16 12:45:31
Godzilla 30B-8_0 60.78 217 357 46.349 218.28 34.56 2023/07/09 12:43:22
Selfee 13B-8_0 59.94 214 357 26.389 183.36 13.83 2023/06/06 14:23:41
AlpacaCielo 13B-8_0 59.38 212 357 16.166 162.98 13.83 2023/07/24 20:17:42
Vigogne-Instruct 13B-8_0 59.10 211 357 19.068 126.99 13.83 2023/05/25 21:58:38
Wizard-Vicuna 13B-8_0 57.70 206 357 15.309 101.37 13.83 2023/05/20 02:44:04
WizardLM-Unc-1.0 7B-8_0 57.70 206 357 18.160 292.69 7.16 2023/06/18 12:59:11
Redmond-Puffin 13B-8_0 56.02 200 357 16.205 145.26 13.83 2023/07/19 10:59:10
Wizard-Vicuna-Unc 7B-8_0 55.46 198 357 8.141 81.47 7.16 2023/05/20 01:07:29
Baize-v2 7B-8_0 55.46 198 357 12.777 182.04 7.16 2023/05/24 11:38:45
UltraLM 13B-8_0 54.90 196 357 18.542 134.40 13.83 2023/06/29 21:37:25
Orca-mini-v2 7B-8_0 54.62 195 357 7.599 125.42 7.16 2023/07/04 08:19:54
Luna-AI-Llama2-Unc 7B-8_0 54.34 194 357 8.552 136.64 7.16 2023/07/19 21:01:02
Koala 13B-8_0 54.34 194 357 38.931 423.96 13.83 2023/05/20 05:33:31
VicUnlocked-LoRA 30B-8_0 54.06 193 357 80.979 263.08 34.56 2023/05/20 22:52:56
Llama-2 7B-8_0 52.38 187 357 9.036 140.79 7.16 2023/07/18 17:16:54
WizardLM 7B-8_0 52.38 187 357 13.171 193.13 7.16 2023/05/20 00:19:49
Airoboros-Gpt4-l2-1.4.1 7B-8_0 52.10 186 357 5.218 40.92 7.16 2023/07/24 11:37:06
Open-Llama-Instruct 13B-8_0 50.98 182 357 18.954 122.75 13.83 2023/06/20 14:21:08
Koala 7B-8_0 50.14 179 357 23.667 454.51 9.76 2023/05/20 00:45:54
Open-Orca-Preview1 13B-8_0 49.58 177 357 8.179 29.38 13.83 2023/07/12 21:32:00
GPT4All-Snoozy 13B-8_0 45.66 163 357 14.654 92.66 13.83 2023/05/20 03:14:27
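
The percent column in the rows above is consistent with modelcorrect / modeltotal × 100, rounded to two decimals. The following is a minimal sketch for parsing one row of the table and rechecking that figure. The helper names `parseRow` and `recomputePercent` are hypothetical and not part of llm-jeopardy. Because the name column contains a space (model name plus size and quantization), the sketch reads the fixed fields from the right-hand side.

```javascript
// Parse one results row into a record keyed by the table's header names.
// The last eight whitespace-separated tokens are the numeric columns plus
// the date and time that together make up mdate; everything before is the name.
function parseRow(line) {
  const parts = line.trim().split(/\s+/);
  if (parts.length < 10) {
    throw new Error(`Expected at least 10 fields, got ${parts.length}`);
  }
  const [percent, modelcorrect, modeltotal, elapsed, answerlen, msize, date, time] =
    parts.slice(-8);
  return {
    name: parts.slice(0, -8).join(" "),
    percent: Number(percent),
    modelcorrect: Number(modelcorrect),
    modeltotal: Number(modeltotal),
    elapsed: Number(elapsed),
    answerlen: Number(answerlen),
    msize: Number(msize),
    mdate: `${date} ${time}`,
  };
}

// Recompute the percentage from the raw counts, rounded to two decimals.
function recomputePercent(row) {
  return Number(((row.modelcorrect / row.modeltotal) * 100).toFixed(2));
}

const row = parseRow(
  "StableBeluga-2 70B-8_0 84.87 303 357 61.894 50.13 73.23 2023/07/27 23:37:53"
);
console.log(row.name, row.percent, recomputePercent(row));
// prints: StableBeluga-2 70B-8_0 84.87 84.87
```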
