Hi, I have tried to reproduce the result on Crosswords with the give

MiniCrosswords performance about tree-of-thought-llm HOT 6 CLOSED

mhd0528 commented on May 2, 2024

MiniCrosswords performance

from tree-of-thought-llm.

Comments (6)

ysymyth commented on May 2, 2024

Interesting, from the file you provided, it does seem that GPT-4 became weaker. For example, in the second test case, previously the first three moves were

            "actions": [
                "h4. seder",
                "v3. evade",
                "v1. arise"
            ],

but now it is

            "actions": [
                "h4. seder",
                "v3. evade",
                "v4. seder"
            ],

I would be happy to run it again sometime, and feel free to do some analysis. Unfortunately, we cannot control GPT-4 or guarantee exact reproducibility of experiments run on it. Given that the crosswords test set is small (only 20 cases), running multiple trials might be safer.
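For anyone doing this kind of analysis, the comparison of the two action traces above can be sketched roughly as follows. This is only an illustration: `first_divergence`, `old_run`, and `new_run` are hypothetical names, and the lists are copied from the two snippets quoted in this comment rather than loaded from the log files.

```python
from itertools import zip_longest


def first_divergence(old_actions, new_actions):
    """Return (index, old_move, new_move) for the first differing move.

    Returns None when both traces are identical. If one trace is shorter,
    the missing move is reported as None.
    """
    for i, (old, new) in enumerate(zip_longest(old_actions, new_actions)):
        if old != new:
            return i, old, new
    return None


# The first three moves of the second test case, as quoted above.
old_run = ["h4. seder", "v3. evade", "v1. arise"]
new_run = ["h4. seder", "v3. evade", "v4. seder"]

print(first_divergence(old_run, new_run))
```

Running the same check over every test case in two log files would show how early the search paths start to differ, which may help separate model drift from ordinary sampling noise.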

ysymyth commented on May 2, 2024

Interesting. Can you share your command and results?

mhd0528 commented on May 2, 2024

Yes,

I ran the cells in the ipynb file in the crosswords folder.

https://github.com/princeton-nlp/tree-of-thought-llm/blob/7d1bd7eb4dd656b325f6c70f2f9b1ded9a8368bf/scripts/crosswords/search_crosswords-dfs.ipynb

The result is attached below. We ran on the first 12 examples (0, 5, ..., 55) in the test set. Only example 40 (the 8th test case) has results with r_word = 1.0 or r_game = true. The file has the same structure as the one in the repo.
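A minimal sketch of how such a run could be summarized, assuming each record is a dict with top-level `r_word` and `r_game` keys. That layout, the `summarize` helper, and the `idx` field are assumptions for illustration, and the values below are placeholders rather than the contents of the attached file.

```python
def summarize(records):
    """Aggregate per-example metrics into run-level numbers."""
    n = len(records)
    if n == 0:
        return {"n": 0, "mean_r_word": 0.0, "games_solved": 0}
    mean_r_word = sum(r["r_word"] for r in records) / n
    games_solved = sum(1 for r in records if r["r_game"])
    return {"n": n, "mean_r_word": mean_r_word, "games_solved": games_solved}


# Indices 0, 5, ..., 55 give the 12 examples mentioned above.
test_indices = list(range(0, 60, 5))

# Placeholder records: only index 40 is marked solved, as described above;
# the 0.5 word-level scores are made up and are NOT the reported results.
records = [
    {"idx": i, "r_word": 1.0 if i == 40 else 0.5, "r_game": i == 40}
    for i in test_indices
]

summary = summarize(records)
print(summary)
```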
tot-0.7-infoss_dfs_prune.txt

Thanks

mhd0528 commented on May 2, 2024

I agree. It's hard to exactly reproduce the result with GPT-4. I also noticed that the experiments were finished in May. In that case, do you think running it with GPT4-0314 or another earlier version would help with this issue?

ysymyth commented on May 2, 2024

probably. if you can run and share it'd be awesome!

ysymyth commented on May 2, 2024

closing for now --- feel free to open a new one if you have some new results to share!
