Comments (6)
Interesting, from the file you provided, it does seem that GPT-4 became weaker. For example, in the second test case, previously the first three moves were
"actions": [
"h4. seder",
"v3. evade",
"v1. arise"
],
but now it is
"actions": [
"h4. seder",
"v3. evade",
"v4. seder"
],
I would be happy to rerun it at some point, and feel free to do your own analysis. Unfortunately, we cannot control GPT-4 or guarantee exact reproducibility of experiments run on it. Given that the crosswords test set is small (only 20 cases), running multiple trials would probably be safer.
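In case it helps, here is a minimal sketch of how results from several trials could be aggregated. It is not code from the repo: the function name `summarize_trials` and the outcome lists are hypothetical placeholders, and each trial is assumed to be a list of per-case success flags.

```python
from statistics import mean, stdev


def summarize_trials(trial_outcomes):
    """Return (mean success rate, std across trials).

    trial_outcomes: list of trials, each a list of booleans,
    one flag per test case (True = game solved).
    """
    rates = [sum(trial) / len(trial) for trial in trial_outcomes]
    # Sample std is undefined for a single trial, so report 0.0 in that case.
    spread = stdev(rates) if len(rates) > 1 else 0.0
    return mean(rates), spread


# Hypothetical outcomes for 3 trials over 20 cases (placeholder data only).
trials = [
    [True] * 4 + [False] * 16,
    [True] * 2 + [False] * 18,
    [True] * 3 + [False] * 17,
]
avg, spread = summarize_trials(trials)
print(f"mean success rate: {avg:.2f}, std across trials: {spread:.2f}")
```

Reporting the spread alongside the mean makes it easier to tell whether a drop between two runs is larger than the trial-to-trial noise.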
from tree-of-thought-llm.
Interesting. Can you share your command and results?
Yes,
I ran the cells in the ipynb file in the crosswords folder.
The result is attached below. We ran on the first 12 examples (0, 5, ..., 55) in the test set. Only example 40 (the 8th test case) has results with r_word = 1.0 or r_game = true. The file shares the same structure as in the repo.
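For reference, a rough sketch of how such a summary could be computed. The record layout here is an assumption for illustration only (one dict per example with `idx`, `r_word` and `r_game` keys); the actual log structure may differ, and every value below is a placeholder except the fact that example 40 was solved.

```python
def summarize(records):
    """List which examples were fully solved.

    records: list of dicts with hypothetical keys
    "idx" (example index), "r_word" (fraction of correct words)
    and "r_game" (True if the whole board is correct).
    """
    solved = [r["idx"] for r in records if r["r_game"]]
    all_words = [r["idx"] for r in records if r["r_word"] == 1.0]
    return {"n": len(records), "solved": solved, "all_words": all_words}


# Placeholder records; only the outcome for idx 40 reflects the run above.
records = [
    {"idx": 0, "r_word": 0.4, "r_game": False},
    {"idx": 40, "r_word": 1.0, "r_game": True},
    {"idx": 55, "r_word": 0.6, "r_game": False},
]
summary = summarize(records)
print(summary)
```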
tot-0.7-infoss_dfs_prune.txt
Thanks
I agree. It's hard to exactly reproduce the result with GPT-4. I also noticed that the experiments were finished in May. In that case, do you think running it with GPT4-0314 or other earlier versions would help with this issue?
probably. if you can run and share it'd be awesome!
closing for now --- feel free to open a new one if you have some new results to share!