
tree-of-thought-llm's Introduction

Official Repo of Tree of Thoughts (ToT)



Official implementation of the paper Tree of Thoughts: Deliberate Problem Solving with Large Language Models, with code, prompts, and model outputs. Also check out its 1-minute tweet thread.

Setup

  1. Set up an OpenAI API key and store it in the environment variable OPENAI_API_KEY (see the example shell command after these steps).

  2. Install the tot package in one of two ways:

  • Option 1: Install from PyPI
pip install tree-of-thoughts-llm
  • Option 2: Install from source
git clone https://github.com/princeton-nlp/tree-of-thought-llm
cd tree-of-thought-llm
pip install -r requirements.txt
pip install -e .  # install `tot` package
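
For step 1, a typical way to set the key in a bash-like shell (the key value is a placeholder):

export OPENAI_API_KEY=<your-api-key>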

Quick Start

The following minimal script attempts to solve the Game of 24 with the numbers 4 5 6 10 (it might be a bit slow, as it uses GPT-4):

import argparse
from tot.methods.bfs import solve
from tot.tasks.game24 import Game24Task

# ToT + BFS configuration matching the paper's Game of 24 setup:
# propose thoughts, value each one 3 times, and greedily keep the best 5 states per step
args = argparse.Namespace(
    backend='gpt-4', temperature=0.7, task='game24', naive_run=False,
    prompt_sample=None, method_generate='propose', method_evaluate='value',
    method_select='greedy', n_generate_sample=1, n_evaluate_sample=3,
    n_select_sample=5)

task = Game24Task()
ys, infos = solve(args, task, 900)  # index 900 is the puzzle "4 5 6 10" in 24.csv
print(ys[0])

And the output would be something like (note it's not deterministic, and sometimes the output can be wrong):

10 - 4 = 6 (left: 5 6 6)
5 * 6 = 30 (left: 6 30)
30 - 6 = 24 (left: 24)
Answer: (5 * (10 - 4)) - 6 = 24

Paper Experiments

Run experiments via sh scripts/{game24, text, crosswords}/{standard_sampling, cot_sampling, bfs}.sh. The exception is crosswords, where ToT uses a DFS algorithm instead, which can be run via scripts/crosswords/search_crosswords-dfs.ipynb.

The very simple run.py implements the ToT + BFS algorithm, as well as naive IO/CoT sampling. Some key arguments (an example invocation follows the list):

  • --naive_run: if True, run naive IO/CoT sampling instead of ToT + BFS.
  • --prompt_sample (choices=[standard, cot]): sampling prompt
  • --method_generate (choices=[sample, propose]): thought generator, whether to sample independent thoughts (used in Creative Writing) or propose sequential thoughts (used in Game of 24)
  • --method_evaluate (choices=[value, vote]): state evaluator, whether to value states independently (used in Game of 24) or vote on states together (used in Creative Writing)
  • --n_generate_sample: number of times to prompt for thought generation
  • --n_evaluate_sample: number of times to prompt for state evaluation
  • --n_select_sample: number of states to keep from each step (i.e. b in the paper's ToT + BFS algorithm)
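
For example, the paper's Game of 24 ToT experiment corresponds to a command like the following (the same invocation also appears verbatim in the issue logs below):

python run.py --task game24 --task_start_index 900 --task_end_index 1000 --method_generate propose --method_evaluate value --method_select greedy --n_evaluate_sample 3 --n_select_sample 5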

Paper Trajectories

logs/ contains all the trajectories from the paper's experiments, except for logs/game24/gpt-4_0.7_propose1_value3_greedy5_start900_end1000.json, which was reproduced after the paper (the original experiment was done in a notebook) and achieved a 69% score instead of the original 74% due to randomness in GPT decoding. We hope to aggregate multiple runs in the future to account for sampling randomness and update the paper, but this shouldn't affect the paper's main conclusions.

How to Add A New Task

Setting up a new task is easy, and mainly involves two steps.

  • Set up a new task class in tot/tasks/ and task files in tot/data/. See tot/tasks/game24.py for an example. Add the task to tot/tasks/__init__.py. (A minimal class skeleton is sketched after this list.)
  • Set up task-specific prompts in tot/prompts/. See tot/prompts/game24.py for an example. Depending on the nature of the task, choose --method_generate (choices=[sample, propose]) and --method_evaluate (choices=[value, vote]) and their corresponding prompts.
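
As a starting point, here is a minimal sketch of a new task class, modeled loosely on tot/tasks/game24.py. The class name, data, and method bodies are placeholders, and the wrapper names follow what tot/methods/bfs.py calls; check tot/tasks/base.py for the exact interface before relying on it.

from tot.tasks.base import Task

class MyTask(Task):
    def __init__(self):
        super().__init__()
        self.data = ['example input 0', 'example input 1']  # in practice, load from tot/data/
        self.steps = 2             # depth of the thought tree
        self.stops = ['\n', None]  # stop sequence for each step

    def __len__(self) -> int:
        return len(self.data)

    def get_input(self, idx: int) -> str:
        return self.data[idx]

    def test_output(self, idx: int, output: str) -> dict:
        # score `output` for instance `idx`, e.g. {'r': 1} if solved and {'r': 0} otherwise
        return {'r': 0}

    @staticmethod
    def propose_prompt_wrap(x: str, y: str = '') -> str:
        # wrap the input x and the partial thought chain y into a proposal prompt
        return f'{x}\n{y}'

    @staticmethod
    def value_prompt_wrap(x: str, y: str) -> str:
        # wrap a candidate state into a value-estimation prompt
        return f'{x}\n{y}'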

Citations

Please cite the paper and star this repo if you use ToT and find it interesting or useful, thanks! Feel free to contact [email protected] or open an issue if you have any questions.

@misc{yao2023tree,
      title={{Tree of Thoughts}: Deliberate Problem Solving with Large Language Models}, 
      author={Shunyu Yao and Dian Yu and Jeffrey Zhao and Izhak Shafran and Thomas L. Griffiths and Yuan Cao and Karthik Narasimhan},
      year={2023},
      eprint={2305.10601},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

tree-of-thought-llm's People

Contributors

bubalazi, karthikncode, marxoffice, yhyu13, ysymyth


tree-of-thought-llm's Issues

How to use custom inputs?

Is it possible to test out any of the tasks on custom input?
Say, for the Game24 task, I don't wish to use any of the records in the index but want to feed in a set of 4 numbers of my choice. How do I do that?
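
One hedged way to do this, assuming solve() only reads the puzzle via task.get_input(idx) (as in tot/methods/bfs.py) and that Game24Task keeps its puzzles in task.data (as in tot/tasks/game24.py):

from tot.tasks.game24 import Game24Task

task = Game24Task()
task.data = ['3 3 8 8']  # replace the loaded puzzles with your own numbers
# then call solve(args, task, 0) with args as in the Quick Start above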

Prompts in a standard format for tree appreciators from the future 🌲

This issue is to inquire about the possibility of using the prompts separately from the codebase, for application in critical infrastructure, and to see whether you would be open to publishing just the comprehensive prompts, in which case my team could help with some of the cleanup 😅

We use an unusual approach for prompt construction and provably safe query string handling (https://www.unison-lang.org/learn/the-big-idea/) and need to either establish or adopt a common way to integrate composable prompts from the Python ecosystem.

Experiment takes too long to run

4-22:40:47 /home/.../python3.9 run.py --task game24 --task_file_path 24.csv --task_start_index 900 --task_end_index 1000 --method_generate propose --method_evaluate value --method_select greedy --n_evaluate_sample 3 --n_select_sample 5

This is an experiment conducted before the GitHub update; it has been running for more than 4 days and has not finished.

My first step is"echo "export OPENAI_API_KEY='my-openal-api-key'" >> ~/.zshrc
source ~/.zshrc""
next step is "nohup /home/.../python3.9 run.py --task game24 --task_file_path 24.csv --task_start_index 900 --task_end_index 1000 --method_generate propose --method_evaluate value --method_select greedy --n_evaluate_sample 3 --n_select_sample 5 > game24bfs.log &"

Usage instructions without cli

I know you're unhappy with kyegomez's code, but that is a package that can be imported into a Python program. To replace it, you need:

  1. The ability to import this into other projects, or at least a way to execute that doesn't use the command line
  2. Clear usage instructions for how one might use this to solve an arbitrary prompt ("Combine the following numbers using only basic arithmetic operations (+,*,/,-) to get to some result.", etc)

Easy way to incorporate a local LLM?

Started down the path of trying to add a local LLM to models.py, but short of running an API clone of OpenAI, I'm not seeing an obvious way to do it without breaking things - has anyone done this yet who is willing to share code?
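
One hedged sketch, assuming an OpenAI-compatible local server (e.g. FastChat or a llama.cpp server) and the legacy openai Python client that models.py uses; the endpoint URL and model name are placeholders:

import openai

openai.api_key = "EMPTY"                      # local servers typically ignore the key
openai.api_base = "http://localhost:8000/v1"  # placeholder: your local OpenAI-compatible endpoint
# tot/models.py routes all calls through openai.ChatCompletion, so after this
# override, passing --backend <local-model-name> should hit the local server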

GPT3.5 ToT Performance is a lot lower

Hi! I tried to use GPT-3.5-turbo for the ToT experiment on Game24 and got similar results except for ToT. For both standard prompting and CoT I got answers close to what's in the paper (IO: 36%, CoT: 42%). But for ToT, without changing the script, I can only get 4% as opposed to 45% in the paper. I am wondering whether you have seen similar behavior from GPT-3.5. What may potentially cause this?

One quick glance over what's generated suggests that GPT3.5 is not as good at following the format. But the huge discrepancy is interesting.

Thanks!

Using Azure OpenAI Service

So... only a few developers have ever been given access to the GPT-4 API by OpenAI. Not everyone has GPT-4 API access; many of us have been waiting for over 3 months now.

Good news is that Azure is offering GPT4 access to businesses that have a corporate entity, website and Azure subscription ID.

Please let me know which part of the code to modify to make this work with Azure OpenAI Service. Maybe make the URL configurable and don't assume an OpenAI key. I'm not sure how Azure OpenAI service works as far as authenticated access is concerned, but it probably involves a secret key, too.
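
For reference, a hedged sketch of how the legacy openai Python client (which models.py uses) is typically pointed at Azure OpenAI; the resource name, API version, and deployment name below are placeholders:

import openai

openai.api_type = "azure"
openai.api_base = "https://<your-resource>.openai.azure.com/"  # placeholder resource URL
openai.api_version = "2023-05-15"                              # placeholder API version
openai.api_key = "<your-azure-key>"
# Azure addresses models by deployment, so the ChatCompletion calls in
# tot/models.py would also need engine="<your-gpt4-deployment>" instead of model="gpt-4"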

I can update this issue once I have access to Azure OpenAI Service....

UPDATE:

It's hopeless since they are tightly controlling who gets onboarded and which use cases, etc. In other words, a giant amount of red tape that idea-stage startups won't be able to go through.

However, there is this project https://github.com/acheong08/ChatGPT-to-API/ that converts the OpenAI web UI into an API by reusing the auth tokens. I wonder if your project can be made to work with it, and I think it should be doable.

Will update this thread as I'm sure there are many of us out there without GPT4 API access. Lots of ridiculous unwarranted red tape, a manufactured situation.

Thanks

Feature request: pyproject.toml & pypi package

The other repository has a PyPI release, but it doesn't even run, nor does the regular install work without editing some files. I would imagine it would be a pretty important feature to allow people to use this repo more easily via pip install (though it would have to be tree-of-thought-llm, as the other repo took the name tree-of-thought). Would be happy to help out with this; just making an issue to see whether the maintainers are already working on it or what the status of things is.

How can I build a chat bot to use with this ?

I would like to run data analysis with this, in a similar fashion to ChatGPT.
Do you know how I could go about doing that? I do have my GPT-4 API key.

Thank you in advance
Richard

Is there a way to get that other fake tree-of-thoughts repo taken down?

Sorry if this isn't the place to post this inquiry, but I had at first given up on your approach while thinking this spamming troll was you guys. Obviously, it's auto-generated fake code that doesn't really work, but it has a toxic effect on your reputation.
https://github.com/kyegomez/tree-of-thoughts

I feel sickened that this fraud is willing to make the many people he fooled lose faith in your research, robbing you of countless citations and potential future breakthroughs just for some GitHub stars. Can I post a reward somewhere for whoever shuts it down?

MiniCrosswords performance

Hi,

I have tried to reproduce the result on Crosswords with the given notebook, but the performance I got with GPT-4 is much lower than what is provided in the log. May I ask if you have tried to run it multiple times since GPT-4 was updated? My assumption is that there might have been some changes on GPT-4's end. Also, since there are multiple versions of the GPT-4 API provided by OpenAI, may I ask which one you were using?

Thanks in advance

Inquiry on the Game 24 result of Tree-of-Thought

I have recomputed the success rate for ToT, using the provided JSON file logs/game24/gpt-4_0.7_propose1_value3_greedy5_start900_end1000.json .

Based on the provided JSON file, the recomputed success rates of ToT (b=5) are as follows:

  • The average of 5 samples is 24%.
  • The average of at least one sample succeeding is 69%.

I suspect that the latter value is the success rate for ToT. However, it differs from the value (74%) reported in Table 2.
I also recomputed the other JSON files (standard and chain-of-thought), and they match the values reported in Table 2 exactly.

Is your method of evaluating the ToT's success rate different from other approaches, or did you upload a different file for it?

Thank you,
Jin

propose_prompt and value_prompt independently use?

Hi @ysymyth,

the prompts for game24 include "propose_prompt" and "value_prompt", and I'm not sure whether I should use these two prompts independently or whether there is a certain order.

If possible, could you use the following example from game24 to illustrate the order in which to enter these two prompts "propose_prompt" and "value_prompt":
4 4 6 8
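
A rough sketch of the order, following tot/methods/bfs.py: the propose prompt generates candidate next steps first, and the value prompt then scores each resulting state. The parsing below is simplified; the exact prompt fields are defined in tot/prompts/game24.py.

from tot.models import gpt
from tot.prompts.game24 import propose_prompt, value_prompt

x = '4 4 6 8'
# step 1: propose candidate next steps for the input
proposals = gpt(propose_prompt.format(input=x), n=1, stop=None)[0].split('\n')
# step 2: value each proposed state via its remaining numbers, e.g. "(left: 8 6 8)"
for y in proposals:
    left = y.split('left: ')[-1].rstrip(')')
    values = gpt(value_prompt.format(input=left), n=3, stop=None)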

Text generation task is not implemented as what the paper shows

It seems that the text generation does not follow the instruction shown in the paper. The paper mentions

We build a ToT with depth 2 (and only 1 intermediate thought step) — the LM first generates k = 5 plans and votes for the best one

But I cannot find the code for only generating plans and then valuing the generated plans. What the code implements is only a ToT with depth 1, meaning the passage is generated in one step.

I am wondering whether the finding above is true or not.

empty

Came here to check out the prompts but the repo was empty.

Backtracking support ?

I don't see where in your code the solve function allows backtracking, as described in your paper?

The performance of the trajectory is inconsistent with the paper

Game24 in Paper
  Standard: 7.3
  CoT: 4
  ToT: 74

Game24 in Trajectory
  Standard: 33 (×)
  CoT: 49 (×)
  ToT: 69 (explained in readme)

Crosswords in Paper (letter / word / game)
  Standard: 38.7 / 14 / 0
  CoT: 40.6 / 15.6 / 1
  ToT: 78 / 60 / 20

Crosswords in Trajectory (letter / word / game)
  Standard: 38.7 / 14 / 0 (√)
  CoT: 40.6 / 15.7 / 1 (√)
  ToT: 69 / 44 / 0 (×)

Run time

Thanks for sharing the code. Just wondering how long it usually takes to run the minimal script provided in your readme? I ran it for over 24 hours, but it didn't stop and nothing came out in the console.

Wow @kyegomez, thanks for sharing an implementation on cutting edge research like this! Amazing work, can’t wait to try it out

          Wow @kyegomez, thanks for sharing an implementation on cutting edge research like this! Amazing work, can’t wait to try it out

Originally posted by @paolorechia in #2 (comment)

This ain't a model or a dataset; it seems more like AutoGPT chains. The accuracy is 50/50 in two calls; making a dozen will be slower and even more expensive. Still, it's a good articulation of the reasoning.

openai.error.ServiceUnavailableError: The server is overloaded or not ready yet.

My model is gpt-3.5-turbo. I find this error when I run game24/bfs. The following is my real input and output:
D:\Python310>python D:\ProgramFiles\JetBrains\tree-of-thought-llm\run.py --task game24 --task_start_index 900 --task_end_index 1000 --method_generate propose --method_evaluate value --method_select greedy --n_evaluate_sample 3 --n_select_sample 5
Namespace(backend='gpt-3.5-turbo', temperature=0.7, task='game24', task_start_index=900, task_end_index=1000, naive_run=False, prompt_sample=None, method_generate='propose', method_evaluate='value', method_select='greedy', n_generate_sample=1, n_evaluate_sample=3, n_select_sample=5)
functools.partial(<function gpt at 0x000001F4FF198C10>, model='gpt-3.5-turbo', temperature=0.7)
-- new_ys --: ('5 * 6 = 30 (left: 4 30 10)\n', '10 - 4 = 6 (left: 6 5 10)\n', '10 - 5 = 5 (left: 4 6 5)\n', '6 / 5 = 1.2 (left: 4 1.2 10)\n', '6 + 10 = 16 (left: 4 5 16)\n', '6 - 4 = 2 (left: 2 5 10)\n', '6 - 5 = 1 (left: 4 1 10)\n', '10 / 5 = 2 (left: 4 2 10)\n', '6 / 4 = 1.5 (left: 1.5 5 10)\n', '4 + 5 = 9 (left: 6 9 10)\n', '5 + 6 = 11 (left: 4 11 10)\n', '10 / 4 = 2.5 (left: 2.5 5 10)\n', '4 * 5 = 20 (left: 6 20 10)\n', '6 + 10 = 16 (left: 4 5 16)\n', '10 - 4 = 6 (left: 6 5 10)\n', '10 - 5 = 5 (left: 4 6 5)\n', '6 - 4 = 2 (left: 2 5 10)\n', '6 - 5 = 1 (left: 4 1 10)\n')
-- sol values --: (3.0, 3.0, 3.0, 3.0, 2.001, 2.001, 2.001, 2.001, 2.001, 2.0, 1.002, 1.002, 0.003, 0, 0, 0, 0, 0)
-- choices --: ['5 * 6 = 30 (left: 4 30 10)\n', '10 - 4 = 6 (left: 6 5 10)\n', '10 - 5 = 5 (left: 4 6 5)\n', '6 / 5 = 1.2 (left: 4 1.2 10)\n', '6 + 10 = 16 (left: 4 5 16)\n']

-- new_ys --: ('5 * 6 = 30 (left: 4 30 10)\n30 - 10 = 20 (left: 4 20)\n', '6 / 5 = 1.2 (left: 4 1.2 10)\n10 / 4 = 2.5 (left: 2.5 4)\n', '6 / 5 = 1.2 (left: 4 1.2 10)\n10 - 4 = 6 (left: 1.2 6)\n', '6 / 5 = 1.2 (left: 4 1.2 10)\n4 - 1.2 = 2.8 (left: 2.8 10)\n', '5 * 6 = 30 (left: 4 30 10)\n4 + 30 = 34 (left: 10 34)\n', '5 * 6 = 30 (left: 4 30 10)\n4 * 30 = 120 (left: 10 120)\n', '5 * 6 = 30 (left: 4 30 10)\n10 / 4 = 2.5 (left: 2.5 30)\n', '10 - 4 = 6 (left: 6 5 10)\n10 - 6 = 4 (left: 4 5)\n', '10 - 4 = 6 (left: 6 5 10)\n10 / 5 = 2 (left: 2 6)\n', '10 - 5 = 5 (left: 4 6 5)\n6 - 5 = 1 (left: 1 6)\n', '10 - 5 = 5 (left: 4 6 5)\n5 + 4 = 9 (left: 6 9)\n', '10 - 5 = 5 (left: 4 6 5)\n4 / 5 = 0.8 (left: 0.8 5)\n', '6 / 5 = 1.2 (left: 4 1.2 10)\n4 + 1.2 = 5.2 (left: 5.2 10)\n', '6 / 5 = 1.2 (left: 4 1.2 10)\n4 / 1.2 = 3.33 (left: 3.33 10)\n', '6 / 5 = 1.2 (left: 4 1.2 10)\n4 * 1.2 = 4.8 (left: 4.8 10)\n', '6 / 5 = 1.2 (left: 4 1.2 10)\n10 - 1.2 = 8.8 (left: 4 8.8)\n', '6 + 10 = 16 (left: 4 5 16)\n16 - 4 = 12 (left: 5 12)\n', '6 + 10 = 16 (left: 4 5 16)\n16 / 4 = 4 (left: 5 4)\n', '6 + 10 = 16 (left: 4 5 16)\n16 / 5 = 3 (left: 4 3)\n', '5 * 6 = 30 (left: 4 30 10)\n30 / 4 = 7.5 (left: 10 7.5)\n', '10 - 4 = 6 (left: 6 5 10)\n6 + 5 = 11 (left: 10 11)\n', '10 - 4 = 6 (left: 6 5 10)\n5 + 10 = 15 (left: 6 15)\n', '10 - 5 = 5 (left: 4 6 5)\n4 + 6 = 10 (left: 5 10)\n', '10 - 5 = 5 (left: 4 6 5)\n6 - 4 = 2 (left: 2 5)\n', '10 - 5 = 5 (left: 4 6 5)\n4 - 5 = -1 (left: -1 5)\n', '6 / 5 = 1.2 (left: 4 1.2 10)\n10 + 4 = 14 (left: 1.2 14)\n', '10 - 5 = 5 (left: 4 6 5)\n4 * 6 = 24 (left: 5 24)\n', '10 - 5 = 5 (left: 4 6 5)\n6 / 4 = 1.5 (left: 1.5 5)\n', '10 - 5 = 5 (left: 4 6 5)\n5 * 6 = 30 (left: 4 30)\n', '10 - 5 = 5 (left: 4 6 5)\n6 / 5 = 1.2 (left: 1.2 5)\n', '10 - 5 = 5 (left: 4 6 5)\n5 + 6 = 11 (left: 4 11)\n', '6 + 10 = 16 (left: 4 5 16)\n16 - 5 = 11 (left: 4 11)\n', '5 * 6 = 30 (left: 4 30 10)\n30 - 4 = 26 (left: 10 26)\n', '5 * 6 = 30 (left: 4 30 10)\n10 + 4 = 14 (left: 14 30)\n', '5 * 6 = 30 (left: 4 30 10)\n4 * 10 = 40 (left: 40 30)\n', '10 - 5 = 5 (left: 4 6 5)\n5 * 4 = 20 (left: 6 20)\n', '6 + 10 = 16 (left: 4 5 16)\n4 + 5 = 9 (left: 9 16)\n', '6 + 10 = 16 (left: 4 5 16)\n5 + 4 = 9 (left: 9 16)\n', '6 + 10 = 16 (left: 4 5 16)\n4 * 5 = 20 (left: 20 16)\n', '6 + 10 = 16 (left: 4 5 16)\n5 * 4 = 20 (left: 20 16)\n', '10 - 4 = 6 (left: 6 5 10)\n6 * 5 = 30 (left: 10 30)\n')
-- sol values --: (40.001, 22.0, 21.001, 21.001, 20.002, 20.001, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 2.001, 2.001, 2.001, 2.001, 2.001, 2.0, 2.0, 1.002, 1.002, 1.002, 1.002, 1.001, 1.001, 0.003, 0.003, 0.003, 0.003, 0.003, 0.003, 0.003, 0.003, 0.002)
-- choices --: ['5 * 6 = 30 (left: 4 30 10)\n30 - 10 = 20 (left: 4 20)\n', '6 / 5 = 1.2 (left: 4 1.2 10)\n10 / 4 = 2.5 (left: 2.5 4)\n', '6 / 5 = 1.2 (left: 4 1.2 10)\n10 - 4 = 6 (left: 1.2 6)\n', '6 / 5 = 1.2 (left: 4 1.2 10)\n4 - 1.2 = 2.8 (left: 2.8 10)\n', '5 * 6 = 30 (left: 4 30 10)\n4 + 30 = 34 (left: 10 34)\n']

Traceback (most recent call last):
  File "D:\ProgramFiles\JetBrains\tree-of-thought-llm\run.py", line 90, in <module>
    run(args)
  File "D:\ProgramFiles\JetBrains\tree-of-thought-llm\run.py", line 44, in run
    ys, info = solve(args, task, i)
  File "D:\Python310\lib\site-packages\tot\methods\bfs.py", line 68, in solve
    values = get_values(task, x, new_ys, args.n_evaluate_sample)
  File "D:\Python310\lib\site-packages\tot\methods\bfs.py", line 23, in get_values
    value = get_value(task, x, y, n_evaluate_sample, cache_value=cache_value)
  File "D:\Python310\lib\site-packages\tot\methods\bfs.py", line 10, in get_value
    value_outputs = gpt(value_prompt, n=n_evaluate_sample, stop=None)
  File "D:\Python310\lib\site-packages\tot\models.py", line 28, in gpt
    return chatgpt(messages, model=model, temperature=temperature, max_tokens=max_tokens, n=n, stop=stop)
  File "D:\Python310\lib\site-packages\tot\models.py", line 36, in chatgpt
    res = completions_with_backoff(model=model, messages=messages, temperature=temperature, max_tokens=max_tokens, n=cnt, stop=stop)
  File "D:\Python310\lib\site-packages\backoff\_sync.py", line 105, in retry
    ret = target(*args, **kwargs)
  File "D:\Python310\lib\site-packages\tot\models.py", line 24, in completions_with_backoff
    return openai.ChatCompletion.create(**kwargs)
  File "D:\Python310\lib\site-packages\openai\api_resources\chat_completion.py", line 25, in create
    return super().create(*args, **kwargs)
  File "D:\Python310\lib\site-packages\openai\api_resources\abstract\engine_api_resource.py", line 153, in create
    response, _, api_key = requestor.request(
  File "D:\Python310\lib\site-packages\openai\api_requestor.py", line 230, in request
    resp, got_stream = self._interpret_response(result, stream)
  File "D:\Python310\lib\site-packages\openai\api_requestor.py", line 624, in _interpret_response
    self._interpret_response_line(
  File "D:\Python310\lib\site-packages\openai\api_requestor.py", line 667, in _interpret_response_line
    raise error.ServiceUnavailableError(
openai.error.ServiceUnavailableError: The server is overloaded or not ready yet.

Is the model the reason? Should I use gpt-3.5-turbo-16k or gpt-4 to solve this error?

I need my hands on this code ASAP. :-)

I'm a hard-core performance architect who retired from MSFT last year, now doing AI for fun: Stable Diffusion, TensorRT, torch.compile(), and coding in Python. I have a 4090 with an i9-13900K; if I need more memory to run a 65B model, I'll buy it. Today I got manticore-13b running on my system. To shoot for the moon, I want to get the best LLM that can fit on a high-end home PC and layer things like this onto it. I was just reading up on Reflexion when I saw this; I wonder whether its dynamic-memory and self-reflection focus duplicates what Tree of Thoughts does or could augment it. I can code/debug anything.

Since this GitHub has no discussion forum, I'll abuse the "Issues" tab to get my foot in the door, so I'll get a notification when the code drops.

Open source llms

Hello,

Is it possible to implement this with something other than an OpenAI key?

run problem

cheng in 🌐 cheng-virtual-machine in tree-of-thought-llm on  master via 🐍 v3.10.6
❯ python3 run.py --task game24 --task_file_path 24.csv  --backend gpt-3.5-turbo
Namespace(backend='gpt-3.5-turbo', temperature=0.7, task='game24', task_file_path='24.csv', task_start_index=900, task_end_index=1000, naive_run=False, prompt_sample=None, method_generate=None, method_evaluate=None, method_select=None, n_generate_sample=1, n_evaluate_sample=1, n_select_sample=1)
functools.partial(<function gpt at 0x7f504160cf70>, model='gpt-3.5-turbo', temperature=0.7)
Traceback (most recent call last):
  File "/home/cheng/tree-of-thought-llm/run.py", line 160, in <module>
    run(args)
  File "/home/cheng/tree-of-thought-llm/run.py", line 113, in run
    ys, info = solve(args, task, i)
  File "/home/cheng/tree-of-thought-llm/run.py", line 64, in solve
    new_ys = list(itertools.chain(*new_ys))
UnboundLocalError: local variable 'new_ys' referenced before assignment

on Ubuntu 22.04

'TextTask' object has no attribute 'propose_prompt_wrap'

Thank you so much for your work, it's great that tot code is open source.

I want to reproduce your performance on Text, but this problem occurred when I tried. I would like to ask what I set up wrong.

'TextTask' object has no attribute 'propose_prompt_wrap'

My setting is propose value and greedy, with start index = 0 and end index = 99

I look forward to hearing from you! Thanks again for this great and solid work!

Interesting prompting to replicate Tree of Thought

I asked GPT-4 to describe an algorithm to solve the Game of 24 and then used that algorithm to solve a certain set-up and it seemed to solve it on the first attempt:

[USER]: Solve the Game of 24 for the following numbers, 4, 13, 9, 10, using the following algorithm: (1) Representation: The numbers and operations are represented as a tree structure, where numbers are the leaf nodes and operations are the internal nodes. This is a flexible representation that can be expanded to include more numbers (leaf nodes) and operations (internal nodes).

(2) Node Expansion: Each node (operation) in the tree can be expanded by applying it to every pair of numbers (or sub-results) in the current list. This step can be scaled up to include more numbers and operations as required.

(3) Recursion: Recursion is a fundamental concept in computer science that's widely applicable in many problem-solving scenarios. The algorithm solves the problem by recursively reducing it to smaller, simpler instances of the same problem. In this case, it repeats the process until only one number is left and checks if that number is the target.

(4) Backtracking: Backtracking is another powerful technique used in a wide range of problems. When the algorithm realizes it's heading in the wrong direction (i.e., not towards the target), it backtracks and explores different paths. This ensures that all possible solutions are explored and the correct one is found if it exists

And GPT-4's response:

Let's solve this problem using the provided algorithm. Here are the numbers we have: 4, 13, 9, 10. Our target number is 24.

(1) Representation: Our starting list of numbers can be represented as a list [4, 13, 9, 10]. The operations we're going to use are addition (+), subtraction (-), multiplication (*), and division (/).

(2) Node Expansion: Each node can be expanded by applying an operation to every pair of numbers in our list. Let's first combine 10 and 13, using subtraction, because 13 - 10 = 3, which seems like it could lead us towards 24 when used with the other numbers.

So our list is now: [4, 9, 3].

(3) Recursion: We repeat the process above until only one number is left and checks if that number is the target. Now we combine 9 and 3, using multiplication, because 9 * 3 = 27.

Our list becomes: [4, 27].

(4) Backtracking: If we realize that we're heading in the wrong direction, we backtrack. Since we have no operations left to perform that can turn 4 and 27 into 24, we realize that we've made a wrong move. So, we backtrack.

Returning to our original numbers [4, 13, 9, 10], this time let's use a different approach.

We can see that 9 * 4 = 36 and then 36 - 10 = 26 and finally 26 - 2 = 24 where 2 comes from 13 - 11.

So the solution to this game of 24 with the numbers 4, 13, 9, 10 is: (9 * 4) - 10 - (13 - 11) = 24.

Problems reproducing the text example with GPT-3.5

Hello Shunyu, I'm currently trying to reproduce your code, but I've encountered an issue. I'm curious to know if it should work properly with GPT-3.5 Turbo. Unfortunately, I'm unable to reproduce it successfully.

Additionally, I'm wondering why the 'value' option of the 'method_evaluate' parameter doesn't seem to work for the text task. I would appreciate it if you could provide some insight into these matters. Thank you in advance for your response.

Here is the command I ran and its output:

python run.py --task text --task_file_path data_100_random_text.txt --prompt_sample cot --backend gpt-3.5-turbo --task_start_index 0 --task_end_index 1 --method_generate sample --method_select sample --method_evaluate vote --naive_run
Namespace(backend='gpt-3.5-turbo', temperature=0.7, task='text', task_file_path='data_100_random_text.txt', task_start_index=0, task_end_index=1, naive_run=True, prompt_sample='cot', method_generate='sample', method_evaluate='vote', method_select='sample', n_generate_sample=1, n_evaluate_sample=1, n_select_sample=1)

Errors when running sh scripts/game24/bfs.sh and when directly running run.py

Working with the release version available on GitHub, on Windows 10, with all Python requirements as well as pandas installed.

When running sh scripts/game24/bfs.sh from a bash terminal, the process gets stuck in the OpenAI library code and fails to make an API request. Trace:

$ sh scripts/game24/bfs.sh
Namespace(backend='gpt-3.5-turbo', temperature=0.7, task='game24', task_file_path='24.csv', task_start_index=900, task_end_index=1000, naive_run=False, prompt_sample=None, method_generate='propose', method_evaluate='value', method_select='greedy', n_generate_sample=1, n_evaluate_sample=3, n_select_sample=5)
functools.partial(<function gpt at 0x0000021440E7E200>, model='gpt-3.5-turbo', temperature=0.7)
-- new_ys --: ('6 * 5 = 30 (left: 4 6 30)\n', '10 - 6 = 4 (left: 4 5 4)\n', '10 - 4 = 6 (left: 5 6 10)\n', '6 / 4 = 1.5 (left: 5 1.5 10)\n', '10 - 5 = 5 (left: 4 6 5)\n', '4 + 5 = 9 (left: 6 9 10)\n', '5 + 6 = 11 (left: 4 11 10)\n', '6 - 4 = 2 (left: 2 6 10)\n')
-- sol values --: (22.0, 21.001, 3.0, 3.0, 3.0, 1.002, 1.002, 1.002)
-- choices --: ['6 * 5 = 30 (left: 4 6 30)\n', '10 - 6 = 4 (left: 4 5 4)\n', '10 - 4 = 6 (left: 5 6 10)\n', '6 / 4 = 1.5 (left: 5 1.5 10)\n', '10 - 5 = 5 (left: 4 6 5)\n']

Traceback (most recent call last):
  File "D:\Projects\tree-of-thought-llm-publish\run.py", line 160, in <module>
    run(args)
  File "D:\Projects\tree-of-thought-llm-publish\run.py", line 113, in run
    ys, info = solve(args, task, i)
    ^^^^^^^^^^^^^^^^^^^^
  File "D:\Projects\tree-of-thought-llm-publish\run.py", line 70, in solve
    values = get_values(task, x, new_ys, args.n_evaluate_sample)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Projects\tree-of-thought-llm-publish\run.py", line 27, in get_values
    value = get_value(task, x, y, n_evaluate_sample, cache_value=cache_value)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Projects\tree-of-thought-llm-publish\run.py", line 14, in get_value
    value_outputs = gpt(value_prompt, n=n_evaluate_sample, stop=None)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Projects\tree-of-thought-llm-publish\models.py", line 24, in gpt
    return chatgpt(messages, model=model, temperature=temperature, max_tokens=max_tokens, n=n, stop=stop)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Projects\tree-of-thought-llm-publish\models.py", line 32, in chatgpt
    res = completions_with_backoff(model=model, messages=messages, temperature=temperature, max_tokens=max_tokens, n=cnt, stop=stop)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\site-packages\backoff\_sync.py", line 105, in retry
    ret = target(*args, **kwargs)
    ^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Projects\tree-of-thought-llm-publish\models.py", line 20, in completions_with_backoff
    return openai.ChatCompletion.create(**kwargs)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\site-packages\openai\api_resources\chat_completion.py", line 25, in create
    return super().create(*args, **kwargs)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\site-packages\openai\api_resources\abstract\engine_api_resource.py", line 153, in create
    response, _, api_key = requestor.request(
    ^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\site-packages\openai\api_requestor.py", line 220, in request
    result = self.request_raw(
    ^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\site-packages\openai\api_requestor.py", line 520, in request_raw
    result = _thread_context.session.request(
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\site-packages\requests\sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\site-packages\requests\sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\site-packages\requests\adapters.py", line 486, in send
    resp = conn.urlopen(
    ^^^^^^^^^^^^^
  File "C:\Python311\Lib\site-packages\urllib3\connectionpool.py", line 790, in urlopen
    response = self._make_request(
    ^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\site-packages\urllib3\connectionpool.py", line 536, in _make_request
    response = conn.getresponse()
    ^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\site-packages\urllib3\connection.py", line 454, in getresponse
    httplib_response = super().getresponse()
    ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\http\client.py", line 1375, in getresponse
    response.begin()
  File "C:\Python311\Lib\http\client.py", line 318, in begin
    version, status, reason = self._read_status()
    ^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\http\client.py", line 279, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\socket.py", line 706, in readinto
    return self._sock.recv_into(b)
    ^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\ssl.py", line 1278, in recv_into
    return self.read(nbytes, buffer)
    ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\ssl.py", line 1134, in read
    return self._sslobj.read(len, buffer)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When running py run.py --task=game24 --task_file_path=24.csv ... with various choices of the n parameters, the process successfully makes API requests but fails at line 113 (the solve method, at select_ids). Trace:

$ py run.py --task=game24 --task_file_path=24.csv --prompt_sample=cot --method_generate=propose --method_evaluate=value --n_generate_sample=1 --n_evaluate_sample=1 --n_select_sample=1
Namespace(backend='gpt-3.5-turbo', temperature=0.7, task='game24', task_file_path='24.csv', task_start_index=900, task_end_index=1000, naive_run=False, prompt_sample='cot', method_generate='propose', method_evaluate='value', method_select=None, n_generate_sample=1, n_evaluate_sample=1, n_select_sample=1)
functools.partial(<function gpt at 0x0000021DAA4EE0C0>, model='gpt-3.5-turbo', temperature=0.7)
Traceback (most recent call last):
  File "D:\Projects\tree-of-thought-llm-publish\run.py", line 160, in <module>
    run(args)
  File "D:\Projects\tree-of-thought-llm-publish\run.py", line 113, in run
    ys, info = solve(args, task, i)
    ^^^^^^^^^^^^^^^^^^^^
  File "D:\Projects\tree-of-thought-llm-publish\run.py", line 78, in solve
    select_new_ys = [new_ys[select_id] for select_id in select_ids]
    ^^^^^^^^^^
UnboundLocalError: cannot access local variable 'select_ids' where it is not associated with a value

how to get the value

In game24, given four numbers, for example "4 5 6 10", the possible follow-up steps are: "4 + 5 = 9 (left: 6 9 10)\n",
"5 + 6 = 11 (left: 4 10 11)\n",
"6 + 10 = 16 (left: 4 5 16)\n",
"4 * 5 = 20 (left: 6 10 20)\n",
"5 * 6 = 30 (left: 4 10 30)\n",
"6 * 10 = 60 (left: 4 5 60)\n",
"4 - 5 = -1 (left: -1 6 10)\n",
"5 - 6 = -1 (left: 4 -1 10)\n",
"10 - 6 = 4 (left: 4 4 5)\n",
"10 - 4 = 6 (left: 5 6 6)\n",
"10 / 5 = 2 (left: 2 4 6)\n",
"6 / 4 = 1.5 (left: 1.5 5 10)\n"
Next, the value of each of these subsequent steps is obtained. I want to know how it is calculated.
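
A sketch of the core of how game24 computes it, closely following value_outputs_unwrap in tot/tasks/game24.py (simplified; the real function also guards some edge cases): the value prompt is sampled n_evaluate_sample times, the last line of each reply is read as sure/likely/impossible, and the mapped scores are summed.

value_map = {'impossible': 0.001, 'likely': 1, 'sure': 20}

def unwrap(value_outputs: list) -> float:
    # e.g. three samples ending in 'sure', 'likely', 'likely' -> 22.0,
    # matching the "-- sol values --" numbers printed in the logs above
    value_names = [out.split('\n')[-1].lower() for out in value_outputs]
    return sum(v * value_names.count(name) for name, v in value_map.items())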

License

Can you be explicit about your license, please?
Thanks for your great work.

Implement a mechanism for COEBS

https://github.com/dbruns/COEBS-Protocol/blob/main/protocol_specs.txt

This is the start of a repo to define the specs for COEBS. I'm not ready to share the most concise and incredibly effective prompts that I've come up with yet because I need to do a little more testing, but I believe any use of AI should involve some form of COEBS so that the results can be reconciled with the personal code of ethics and belief system of the user or organization using the ToT.

I believe this concept is a fundamental requirement for the safe use of AI. I'd be happy to find time in the near future to make a pull request, but I'd like to see what at least 1 other person can come up with that works before I submit one.

A Missing Default Argument in MiniCrosswordsTask

Hi, when I run scripts/crosswords/cot_sampling.sh (same for the standard one), there is an error message:

Traceback (most recent call last):
  File "... tree-of-thought-llm/run.py", line 69, in <module>
    run(args)
  File "... tree-of-thought-llm/run.py", line 10, in run
    task = get_task(args.task)
    ^^^^^^^^^^^^^^^^^^^
  File "... tree-of-thought-llm/src/tot/tasks/__init__.py", line 10, in get_task
    return MiniCrosswordsTask()
    ^^^^^^^^^^^^^^^^^^^^
TypeError: MiniCrosswordsTask.__init__() missing 1 required positional argument: 'file'

I'm wondering whether this issue might be caused by the missing default file argument in Line 162 of src/tot/tasks/crosswords.py - since in Line 10 of src/tot/tasks/__init__.py, it calls MiniCrosswordsTask() without any argument.

Many thanks!
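
A hypothetical one-line fix sketch for src/tot/tasks/crosswords.py would be to give file a default value (the filename below is a placeholder; use whichever data file ships in tot/data/crosswords/):

def __init__(self, file='mini0505.json'):  # placeholder default so MiniCrosswordsTask() works without arguments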

KeyError: 'r_word' when the model is gpt-3.5-turbo in crosswords

When I use gpt-3.5-turbo to run crosswords/standard_sampling.sh, if the task_index is 6, there will be an error:

Traceback (most recent call last):
  File "/home/yhli/codes/tree-of-thought-llm/run.py", line 174, in <module>
    run(args)
  File "/home/yhli/codes/tree-of-thought-llm/run.py", line 117, in run
    infos = [task.test_output(i, y) for y in ys]
  File "/home/yhli/codes/tree-of-thought-llm/run.py", line 117, in <listcomp>
    infos = [task.test_output(i, y) for y in ys]
  File "/home/yhli/codes/tree-of-thought-llm/tasks/crosswords.py", line 204, in test_output
    info['r'] = info['r_word']
KeyError: 'r_word'

I find this is due to the output including a note from gpt-3.5-turbo, such as:

G R A S P
E X T E N
S T A I N
A W E S T
K A R S T

Note: There can be multiple correct outputs for the same input as long as the words are valid and fit in the crossword grid.

This treats the last 5 lines as the final result, due to the code in tasks/crosswords.py, where output.strip().split('\n')[-5:] takes the last 5 lines.

def test_output(self, idx: int, output: str):
    self.env.reset(idx)
    output = output.split('Output:\n')[-1]
    info = {'r_word': 0, 'r_letter': 0, 'r_game': 0}
    for i, line in enumerate(output.strip().split('\n')[-5:], 1):
        letters = line.split(' ')[:5]
        word = ''.join(letters)
        word = word + '_' * (5 - len(word))
        action = f'h{i}. {word}'
        # print(action)
        _, _, _, info = self.env.step(action)
    info['r'] = info['r_word']
    return info

Thus, the info variable is {} due to the function step(self, action), where

if len(word) != 5:
    return 'Invalid! Word should have 5 letters.', 0, False, {}

So, I change output.strip().split('\n')[-5:] to output.strip().split('\n')[:5], and add an if statement as follows:

def test_output(self, idx: int, output: str):
    self.env.reset(idx)
    output = output.split('Output:\n')[-1]
    info = {'r_word': 0, 'r_letter': 0, 'r_game': 0}
    for i, line in enumerate(output.strip().split('\n')[:5], 1):
        letters = line.split(' ')[:5]
        word = ''.join(letters)
        word = word + '_' * (5 - len(word))
        action = f'h{i}. {word}'
        # print(action)
        _, _, _, info = self.env.step(action)
        if info == {}:
            info = {'r_word': 0, 'r_letter': 0, 'r_game': 0}
    info['r'] = info['r_word']
    return info

Is that ok?

Does sample selection require np.random.choice(replace=False)?

In the following selection code in run.py, do you need to add replace=False? Otherwise, if args.n_select_sample = 2, you might still get the same id selected twice, which you don't want (i.e., keeping choice 1 and choice 1, instead of keeping 2 different choices).

Current version:

# selection
        if args.method_select == 'sample':
            ps = np.array(values) / sum(values)
            select_ids = np.random.choice(ids, size=args.n_select_sample, p=ps).tolist()

Proposed version:

# selection
        if args.method_select == 'sample':
            ps = np.array(values) / sum(values)
            select_ids = np.random.choice(ids, size=args.n_select_sample, p=ps, replace=False).tolist()

Also, thank you for the great work you and your team have done here

Output interpretation

Here is a sample output from running game24 bfs.sh.

Forgive my ignorance, how can we interpret new_ys, sol values, and choices? Thanks!


functools.partial(<function gpt at 0x7f18d4f07250>, model='gpt-4', temperature=0.7)
-- new_ys --: ('1 + 2 = 3 (left: 3 4 7)\n', '2 * 1 = 2 (left: 2 4 7)\n', '4 / 2 = 2 (left: 1 2 7)\n', '7 - 1 = 6 (left: 2 4 6)\n', '7 - 2 = 5 (left: 1 4 5)\n', '7 / 1 = 7 (left: 2 4 7)\n', '4 - 1 = 3 (left: 2 3 7)\n', '4 - 2 = 2 (left: 1 2 7)\n', '7 - 4 = 3 (left: 1 2 3)\n')
-- sol values --: (3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 0.003)
-- choices --: ['1 + 2 = 3 (left: 3 4 7)\n', '2 * 1 = 2 (left: 2 4 7)\n', '4 / 2 = 2 (left: 1 2 7)\n', '7 - 1 = 6 (left: 2 4 6)\n', '7 - 2 = 5 (left: 1 4 5)\n']

-- new_ys --: ('1 + 2 = 3 (left: 3 4 7)\n3 * 4 = 12 (left: 7 12)\n', '2 * 1 = 2 (left: 2 4 7)\n2 + 4 = 6 (left: 6 7)\n', '2 * 1 = 2 (left: 2 4 7)\n2 * 4 = 8 (left: 7 8)\n', '7 - 1 = 6 (left: 2 4 6)\n2 + 4 = 6 (left: 6 6)\n', '1 + 2 = 3 (left: 3 4 7)\n3 + 4 = 7 (left: 7 7)\n', '1 + 2 = 3 (left: 3 4 7)\n4 - 3 = 1 (left: 1 7)\n', '1 + 2 = 3 (left: 3 4 7)\n7 - 3 = 4 (left: 4 4)\n', '1 + 2 = 3 (left: 3 4 7)\n7 - 4 = 3 (left: 3 3)\n', '1 + 2 = 3 (left: 3 4 7)\n4 / 3 = 1.33 (left: 1.33 7)\n', '1 + 2 = 3 (left: 3 4 7)\n7 / 3 = 2.33 (left: 2.33 4)\n', '1 + 2 = 3 (left: 3 4 7)\n7 / 4 = 1.75 (left: 1.75 3)\n', '2 * 1 = 2 (left: 2 4 7)\n4 - 2 = 2 (left: 2 7)\n', '2 * 1 = 2 (left: 2 4 7)\n7 - 2 = 5 (left: 4 5)\n', '2 * 1 = 2 (left: 2 4 7)\n7 - 4 = 3 (left: 2 3)\n', '2 * 1 = 2 (left: 2 4 7)\n4 / 2 = 2 (left: 2 7)\n', '4 / 2 = 2 (left: 1 2 7)\n1 + 2 = 3 (left: 3 7)\n', '4 / 2 = 2 (left: 1 2 7)\n2 - 1 = 1 (left: 1 7)\n', '4 / 2 = 2 (left: 1 2 7)\n7 - 1 = 6 (left: 2 6)\n', '4 / 2 = 2 (left: 1 2 7)\n7 - 2 = 5 (left: 1 5)\n', '4 / 2 = 2 (left: 1 2 7)\n7 / 1 = 7 (left: 2 7)\n', '4 / 2 = 2 (left: 1 2 7)\n2 * 1 = 2 (left: 2 7)\n', '7 - 1 = 6 (left: 2 4 6)\n4 / 2 = 2 (left: 2 6)\n', '7 - 1 = 6 (left: 2 4 6)\n6 - 2 = 4 (left: 4 4)\n', '7 - 1 = 6 (left: 2 4 6)\n6 / 2 = 3 (left: 3 4)\n', '7 - 1 = 6 (left: 2 4 6)\n4 - 2 = 2 (left: 2 6)\n', '7 - 2 = 5 (left: 1 4 5)\n1 + 4 = 5 (left: 5 5)\n', '7 - 2 = 5 (left: 1 4 5)\n4 - 1 = 3 (left: 3 5)\n', '7 - 2 = 5 (left: 1 4 5)\n4 / 1 = 4 (left: 1 5)\n', '7 - 2 = 5 (left: 1 4 5)\n5 - 1 = 4 (left: 4 4)\n', '7 - 2 = 5 (left: 1 4 5)\n5 - 4 = 1 (left: 1 1)\n', '7 - 2 = 5 (left: 1 4 5)\n5 / 1 = 5 (left: 4 5)\n')
-- sol values --: (1.002, 1.002, 1.002, 1.002, 0.003, 0.003, 0.003, 0.003, 0.003, 0.003, 0.003, 0.003, 0.003, 0.003, 0.003, 0.003, 0.003, 0.003, 0.003, 0.003, 0.003, 0.003, 0.003, 0.003, 0.003, 0.003, 0.003, 0.003, 0.003, 0.003, 0.003)
-- choices --: ['1 + 2 = 3 (left: 3 4 7)\n3 * 4 = 12 (left: 7 12)\n', '2 * 1 = 2 (left: 2 4 7)\n2 + 4 = 6 (left: 6 7)\n', '2 * 1 = 2 (left: 2 4 7)\n2 * 4 = 8 (left: 7 8)\n', '7 - 1 = 6 (left: 2 4 6)\n2 + 4 = 6 (left: 6 6)\n', '1 + 2 = 3 (left: 3 4 7)\n3 + 4 = 7 (left: 7 7)\n']

-- new_ys --: ('1 + 2 = 3 (left: 3 4 7)\n3 + 4 = 7 (left: 7 7)\n7 + 7 = 14 (left: 14)\n', '1 + 2 = 3 (left: 3 4 7)\n3 * 4 = 12 (left: 7 12)\n12 - 7 = 5 (left: 5 7)\n', '1 + 2 = 3 (left: 3 4 7)\n3 * 4 = 12 (left: 7 12)\n7 * 12 = 84 (left: 84)\n', '2 * 1 = 2 (left: 2 4 7)\n2 + 4 = 6 (left: 6 7)\n6 + 7 = 13 (left: 13)\n', '2 * 1 = 2 (left: 2 4 7)\n2 + 4 = 6 (left: 6 7)\n7 - 6 = 1 (left: 1)\n', '2 * 1 = 2 (left: 2 4 7)\n2 * 4 = 8 (left: 7 8)\n7 + 8 = 15 (left: 15)\n', '2 * 1 = 2 (left: 2 4 7)\n2 * 4 = 8 (left: 7 8)\n8 - 7 = 1 (left: 1)\n', '2 * 1 = 2 (left: 2 4 7)\n2 * 4 = 8 (left: 7 8)\n7 * 8 = 56 (left: 56)\n', '2 * 1 = 2 (left: 2 4 7)\n2 * 4 = 8 (left: 7 8)\n8 / 7 = 1.14 (left: 1.14)\n', '7 - 1 = 6 (left: 2 4 6)\n2 + 4 = 6 (left: 6 6)\n6 * 6 = 36 (left: 36)\n', '1 + 2 = 3 (left: 3 4 7)\n3 + 4 = 7 (left: 7 7)\n7 * 7 = 49 (left: 49)\n', '1 + 2 = 3 (left: 3 4 7)\n3 * 4 = 12 (left: 7 12)\n7 + 12 = 19 (left: 19)\n', '2 * 1 = 2 (left: 2 4 7)\n2 + 4 = 6 (left: 6 7)\n6 * 7 = 42 (left: 42)\n', '7 - 1 = 6 (left: 2 4 6)\n2 + 4 = 6 (left: 6 6)\n6 + 6 = 12 (left: 12)\n', '2 * 1 = 2 (left: 2 4 7)\n2 + 4 = 6 (left: 6 7)\n6 / 7 = 0.857 (left: 0.857)\n', '2 * 1 = 2 (left: 2 4 7)\n2 + 4 = 6 (left: 6 7)\n6 - 7 = -1 (left: -1)\n', '2 * 1 = 2 (left: 2 4 7)\n2 + 4 = 6 (left: 6 7)\n7 / 6 = 1.167 (left: 1.167)\n', '7 - 1 = 6 (left: 2 4 6)\n2 + 4 = 6 (left: 6 6)\n6 - 6 = 0 (left: 0)\n', '1 + 2 = 3 (left: 3 4 7)\n3 + 4 = 7 (left: 7 7)\n7 - 7 = 0 (left: 0)\n')
-- sol values --: (1.002, 0.003, 0.003, 0.003, 0.003, 0.003, 0.003, 0.003, 0.003, 0.003, 0.003, 0.002, 0.002, 0.002, 0.001, 0.0, 0.0, 0.0, 0.0)
-- choices --: ['1 + 2 = 3 (left: 3 4 7)\n3 + 4 = 7 (left: 7 7)\n7 + 7 = 14 (left: 14)\n', '1 + 2 = 3 (left: 3 4 7)\n3 * 4 = 12 (left: 7 12)\n12 - 7 = 5 (left: 5 7)\n', '1 + 2 = 3 (left: 3 4 7)\n3 * 4 = 12 (left: 7 12)\n7 * 12 = 84 (left: 84)\n', '2 * 1 = 2 (left: 2 4 7)\n2 + 4 = 6 (left: 6 7)\n6 + 7 = 13 (left: 13)\n', '2 * 1 = 2 (left: 2 4 7)\n2 + 4 = 6 (left: 6 7)\n7 - 6 = 1 (left: 1)\n']

MiniCrosswordsTask() troubles

Do you think you could provide an example of the code you have that runs the MiniCrosswordsTask()? I think I am doing something wrong as every time I run it I get the errors: "AttributeError: 'MiniCrosswordsTask' object has no attribute 'value_prompt_wrap'". Thank you so much-- I really appreciate it!

'value_prompt' and function 'propose_score'

Hi,
Thanks for your valuable work in prompting. I looked into the notebook bfs+ToT for the Crosswords game and I noticed that the imported 'value_prompt' and function 'propose_score' seem not used when calling dfs function. Since the bfs search with ToT in other tasks all has a generation step and an evaluation step. I am a little bit confused with this part in dfs+ToT but this part only calls GPT once in the function 'get_candidates_to_scores'. Could you please explain it to me? Thanks!

Bypassing Maximum 4097 Tokens.

Hi @ysymyth, can you help me bypass the "This model's maximum context length is 4097 tokens. However, your messages resulted in 4238 tokens. Please reduce the length of the messages." error? I have tried many times with these parameters:
n_generate_sample=5,
n_evaluate_sample=1,
n_select_sample=5

It always returns the error I mentioned above (with both models, GPT-4 and GPT-3.5-Turbo). Thank you.

To re-run your experiment on creative writing, what parameters should I use?
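
For reference, a hedged guess at the matching command, based on the paper's setup (sample 5 passages, vote 5 times, keep the best 1); the authoritative version is whatever scripts/text/bfs.sh contains:

python run.py --task text --task_start_index 0 --task_end_index 100 --method_generate sample --method_evaluate vote --method_select greedy --n_generate_sample 5 --n_evaluate_sample 5 --n_select_sample 1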
