Season 39 of Jeopardy! concluded with the 2023-07-28 air date. Final LLMs vs. Double Jeopardy! results are below.
- Answers were re-written as questions, using the category when necessary
- Only Double Jeopardy $2000 questions were prompted
- Time averages shown include model loading
- Only GGML format models were used
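To illustrate the first bullet, a rewritten clue might be assembled into a prompt like this. The template, helper name, and example clue below are illustrative assumptions, not the framework's exact format:

```javascript
// Hypothetical prompt template: the category is prepended so ambiguous
// questions still make sense to the model. Not llm-jeopardy's verbatim format.
function toPrompt(category, question) {
  return `Category: ${category}\nQuestion: ${question}\nAnswer:`;
}

console.log(toPrompt("WORLD RIVERS", "What river flows through Cairo?"));
```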
Thank you to TheBloke for massive and timely model conversions, to ggerganov and all llama.cpp developers, and to everyone creating such fantastic models and tools for LLMs!
llm-jeopardy: an automated prompting and scoring framework for evaluating LLMs with up-to-date human-knowledge prompts.
Install and run:

```sh
git clone https://github.com/aigoopy/llm-jeopardy.git
cd llm-jeopardy
npm install
node . --help
```
The llm-jeopardy framework uses llama.cpp for model execution with GGML-format models from Hugging Face; it has been updated to use GGMLv3 models.
model | % correct | correct | total | avg. time (s) | avg. answer length | size (GB) | model date |
---|---|---|---|---|---|---|---|
StableBeluga-2 70B-8_0 | 84.87 | 303 | 357 | 61.894 | 50.13 | 73.23 | 2023/07/27 23:37:53 |
Airoboros-Gpt4-l2-1.4.1 70B-8_0 | 84.59 | 302 | 357 | 51.987 | 21.11 | 73.23 | 2023/07/28 07:52:12 |
Airoboros-Gpt4-1.2 65B-8_0 | 82.07 | 293 | 357 | 40.233 | 12.80 | 69.37 | 2023/06/14 16:35:46 |
Airoboros-Gpt4-1.2 65B-5_1 | 80.67 | 288 | 357 | 37.950 | 12.92 | 48.97 | 2023/06/14 15:25:37 |
Airoboros-Gpt4-1.4 65B-8_0 | 79.27 | 283 | 357 | 45.639 | 21.29 | 69.37 | 2023/06/29 20:25:57 |
Airoboros-Gpt4-1.4 33B-8_0 | 75.35 | 269 | 357 | 23.282 | 17.86 | 34.56 | 2023/06/26 17:53:42 |
Airoboros-Gpt4-l2-1.4.1 13B-8_0 | 73.95 | 264 | 357 | 8.975 | 31.41 | 13.83 | 2023/07/24 12:29:49 |
WizardLM 30B-8_0 | 73.95 | 264 | 357 | 54.938 | 217.44 | 34.56 | 2023/06/06 21:08:15 |
Alpaca-Lora 65B-5_1 | 73.39 | 262 | 357 | 45.564 | 35.81 | 48.97 | 2023/05/20 12:57:30 |
Guanaco 65B-8_0 | 73.39 | 262 | 357 | 101.707 | 184.43 | 69.37 | 2023/05/26 08:46:34 |
WizardLM 30B-6_K | 72.83 | 260 | 357 | 47.011 | 227.69 | 26.69 | 2023/06/06 19:03:43 |
GPT4-Alpaca-Lora 30B-8_0 | 72.55 | 259 | 357 | 49.301 | 159.72 | 34.56 | 2023/05/20 04:13:39 |
WizardLM-Unc 30B-8_0 | 72.55 | 259 | 357 | 49.972 | 166.25 | 34.56 | 2023/05/22 14:34:25 |
Upstage-Llama 30B-8_0 | 72.27 | 258 | 357 | 24.602 | 44.80 | 34.56 | 2023/07/20 00:49:04 |
Platypus 30B-8_0 | 72.27 | 258 | 357 | 25.235 | 21.94 | 34.56 | 2023/06/29 01:30:17 |
GPlatty 30B-8_0 | 71.99 | 257 | 357 | 28.163 | 34.95 | 34.56 | 2023/06/29 00:01:01 |
Wizard-Vicuna-Unc 30B-8_0 | 71.71 | 256 | 357 | 42.428 | 127.17 | 34.56 | 2023/05/30 04:33:26 |
Hippogriff 30B-8_0 | 71.71 | 256 | 357 | 48.153 | 156.24 | 34.56 | 2023/05/31 09:16:01 |
VicUnlocked-Alpaca 65B-8_0 | 71.71 | 256 | 357 | 101.302 | 172.31 | 69.37 | 2023/05/30 00:09:02 |
Llama-2 70B-8_0 | 71.43 | 255 | 357 | 98.502 | 141.85 | 73.23 | 2023/07/23 20:38:41 |
Chronoboros 33B-8_0 | 71.15 | 254 | 357 | 18.975 | 24.04 | 34.56 | 2023/07/10 09:16:27 |
Llama-Supercot 30B-8_0 | 71.15 | 254 | 357 | 36.285 | 93.50 | 34.56 | 2023/05/28 12:22:12 |
Alpaca-Lora 30B-8_0 | 70.59 | 252 | 357 | 31.041 | 62.35 | 34.56 | 2023/06/01 07:50:56 |
Samantha 33B-8_0 | 70.59 | 252 | 357 | 52.000 | 194.18 | 34.56 | 2023/05/29 10:18:08 |
GPT4-Alpaca-Lora-mlp 65B-5_1 | 70.31 | 251 | 357 | 74.840 | 149.92 | 48.97 | 2023/05/20 17:04:49 |
Guanaco 65B-5_1 | 70.31 | 251 | 357 | 84.868 | 186.22 | 48.97 | 2023/05/25 18:58:18 |
SuperPlatty 30B-8_0 | 70.03 | 250 | 357 | 19.865 | 23.56 | 34.56 | 2023/07/03 21:07:50 |
Epsilon 30B-8_0 | 70.03 | 250 | 357 | 46.188 | 167.89 | 34.56 | 2023/07/21 10:52:59 |
Ouroboros 13B-8_0 | 69.47 | 248 | 357 | 9.719 | 73.59 | 13.83 | 2023/07/21 12:24:28 |
Airoboros-Gpt4-1.4 13B-8_0 | 69.47 | 248 | 357 | 11.081 | 18.88 | 13.83 | 2023/06/22 08:32:58 |
Airochronos 33B-8_0 | 69.47 | 248 | 357 | 17.756 | 17.86 | 34.56 | 2023/07/10 22:07:12 |
Airoboros-Gpt4-1.2 13B-8_0 | 68.91 | 246 | 357 | 10.105 | 13.09 | 13.83 | 2023/06/16 13:03:21 |
Minotaur 13B-8_0 | 68.63 | 245 | 357 | 19.676 | 173.38 | 13.83 | 2023/06/08 21:45:25 |
Lazarus 30B-8_0 | 68.63 | 245 | 357 | 45.327 | 148.20 | 34.56 | 2023/06/07 15:58:57 |
WizardLM-Unc-Supercot 30B-8_0 | 68.63 | 245 | 357 | 45.739 | 147.25 | 34.56 | 2023/06/01 11:07:15 |
WizardLM-1.0 13B-8_0 | 68.07 | 243 | 357 | 25.956 | 230.32 | 13.83 | 2023/05/27 16:17:01 |
Bluemethod 13B-8_0 | 67.79 | 242 | 357 | 11.850 | 90.92 | 13.83 | 2023/07/21 15:44:28 |
Vicuna 33B-8_0 | 67.79 | 242 | 357 | 52.619 | 179.24 | 34.56 | 2023/06/30 16:08:40 |
WizardLM-Unc-1.0 13B-8_0 | 66.67 | 238 | 357 | 31.845 | 264.77 | 13.83 | 2023/06/20 07:44:48 |
WizardLM-1.1 13B-8_0 | 66.67 | 238 | 357 | 34.965 | 442.92 | 13.83 | 2023/07/07 16:35:42 |
Nous-Hermes 13B-8_0 | 66.39 | 237 | 357 | 16.097 | 102.92 | 13.83 | 2023/06/03 13:44:45 |
Tulu 30B-8_0 | 66.39 | 237 | 357 | 22.354 | 18.65 | 34.56 | 2023/06/10 21:47:05 |
Mythologic 13B-8_0 | 65.83 | 235 | 357 | 17.505 | 174.59 | 13.83 | 2023/07/17 10:44:11 |
Chronos-Hermes 13B-8_0 | 65.83 | 235 | 357 | 22.245 | 189.51 | 13.83 | 2023/06/13 11:02:08 |
Llama 30B-8_0 | 65.55 | 234 | 357 | 51.478 | 168.71 | 34.56 | 2023/05/20 19:50:17 |
Vicuna-1.3.0 13B-8_0 | 65.27 | 233 | 357 | 34.635 | 314.50 | 13.83 | 2023/06/25 11:15:58 |
Wizard-Mega 13B-8_0 | 64.99 | 232 | 357 | 20.941 | 172.95 | 13.83 | 2023/05/20 03:50:25 |
Chimera 13B-8_0 | 64.71 | 231 | 357 | 16.359 | 120.17 | 13.83 | 2023/06/03 13:08:37 |
Chronos-WizardLM-Unc-Sc 13B-8_0 | 64.71 | 231 | 357 | 23.355 | 203.90 | 13.83 | 2023/06/07 14:08:04 |
Gpt4-X-Vicuna 13B-8_0 | 64.43 | 230 | 357 | 21.964 | 192.33 | 13.83 | 2023/05/20 05:02:06 |
Wizard-Vicuna-Unc 13B-8_0 | 64.15 | 229 | 357 | 15.259 | 95.08 | 13.83 | 2023/05/20 02:05:09 |
Baize-v2 13B-8_0 | 64.15 | 229 | 357 | 22.704 | 187.64 | 13.83 | 2023/05/24 12:00:06 |
Airoboros-Gpt4-1.4 7B-8_0 | 63.31 | 226 | 357 | 6.674 | 21.17 | 7.16 | 2023/06/22 07:53:28 |
Manticore 13B-8_0 | 63.31 | 226 | 357 | 18.155 | 134.92 | 13.83 | 2023/05/20 14:17:21 |
Based 30B-8_0 | 63.31 | 226 | 357 | 24.734 | 36.87 | 34.56 | 2023/06/03 10:54:07 |
Hypermantis 13B-8_0 | 62.46 | 223 | 357 | 14.078 | 85.19 | 13.83 | 2023/06/03 00:38:54 |
Llama-2 13B-8_0 | 62.18 | 222 | 357 | 15.439 | 114.92 | 13.83 | 2023/07/18 17:36:27 |
Guanaco 7B-8_0 | 62.18 | 222 | 357 | 33.903 | 705.24 | 7.16 | 2023/05/25 20:18:25 |
Airoboros-Gpt4-1.2 7B-8_0 | 61.34 | 219 | 357 | 6.362 | 16.66 | 7.16 | 2023/06/16 12:45:31 |
Godzilla 30B-8_0 | 60.78 | 217 | 357 | 46.349 | 218.28 | 34.56 | 2023/07/09 12:43:22 |
Selfee 13B-8_0 | 59.94 | 214 | 357 | 26.389 | 183.36 | 13.83 | 2023/06/06 14:23:41 |
AlpacaCielo 13B-8_0 | 59.38 | 212 | 357 | 16.166 | 162.98 | 13.83 | 2023/07/24 20:17:42 |
Vigogne-Instruct 13B-8_0 | 59.10 | 211 | 357 | 19.068 | 126.99 | 13.83 | 2023/05/25 21:58:38 |
Wizard-Vicuna 13B-8_0 | 57.70 | 206 | 357 | 15.309 | 101.37 | 13.83 | 2023/05/20 02:44:04 |
WizardLM-Unc-1.0 7B-8_0 | 57.70 | 206 | 357 | 18.160 | 292.69 | 7.16 | 2023/06/18 12:59:11 |
Redmond-Puffin 13B-8_0 | 56.02 | 200 | 357 | 16.205 | 145.26 | 13.83 | 2023/07/19 10:59:10 |
Wizard-Vicuna-Unc 7B-8_0 | 55.46 | 198 | 357 | 8.141 | 81.47 | 7.16 | 2023/05/20 01:07:29 |
Baize-v2 7B-8_0 | 55.46 | 198 | 357 | 12.777 | 182.04 | 7.16 | 2023/05/24 11:38:45 |
UltraLM 13B-8_0 | 54.90 | 196 | 357 | 18.542 | 134.40 | 13.83 | 2023/06/29 21:37:25 |
Orca-mini-v2 7B-8_0 | 54.62 | 195 | 357 | 7.599 | 125.42 | 7.16 | 2023/07/04 08:19:54 |
Luna-AI-Llama2-Unc 7B-8_0 | 54.34 | 194 | 357 | 8.552 | 136.64 | 7.16 | 2023/07/19 21:01:02 |
Koala 13B-8_0 | 54.34 | 194 | 357 | 38.931 | 423.96 | 13.83 | 2023/05/20 05:33:31 |
VicUnlocked-LoRA 30B-8_0 | 54.06 | 193 | 357 | 80.979 | 263.08 | 34.56 | 2023/05/20 22:52:56 |
Llama-2 7B-8_0 | 52.38 | 187 | 357 | 9.036 | 140.79 | 7.16 | 2023/07/18 17:16:54 |
WizardLM 7B-8_0 | 52.38 | 187 | 357 | 13.171 | 193.13 | 7.16 | 2023/05/20 00:19:49 |
Airoboros-Gpt4-l2-1.4.1 7B-8_0 | 52.10 | 186 | 357 | 5.218 | 40.92 | 7.16 | 2023/07/24 11:37:06 |
Open-Llama-Instruct 13B-8_0 | 50.98 | 182 | 357 | 18.954 | 122.75 | 13.83 | 2023/06/20 14:21:08 |
Koala 7B-8_0 | 50.14 | 179 | 357 | 23.667 | 454.51 | 9.76 | 2023/05/20 00:45:54 |
Open-Orca-Preview1 13B-8_0 | 49.58 | 177 | 357 | 8.179 | 29.38 | 13.83 | 2023/07/12 21:32:00 |
GPT4All-Snoozy 13B-8_0 | 45.66 | 163 | 357 | 14.654 | 92.66 | 13.83 | 2023/05/20 03:14:27 |
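For reference, the "% correct" column is simply correct / total, rounded to two decimals, e.g. StableBeluga-2's 303 of 357:

```javascript
// Reproduce the top row's percentage from its raw counts.
const percent = ((303 / 357) * 100).toFixed(2);
console.log(percent); // "84.87"
```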