Comments (9)
Hi, what we did was use a prompted PaLM-540B to generate trajectories with correct answers, then fine-tune PaLM-8B/62B on them. These are not shared, as PaLM is a Google-internal model, but you can similarly use GPT-3 trajectories for fine-tuning.
from react.
Was it that you had 3000 examples of these inputs and targets pair for fine-tuning the model?
Yes, that's right --- we use a prompted LLM to generate 3,000 such examples with correct answers.
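The data-generation step described above can be sketched as rejection sampling: prompt a large model for ReAct trajectories and keep only those whose final `Finish[...]` answer matches the gold answer. This is a minimal illustration, not the authors' code; `generate_fn` is a hypothetical stand-in for a call to the prompted LLM.

```python
def extract_answer(trajectory: str) -> str:
    """Pull the argument of the final Finish[...] action, if any."""
    marker = "Finish["
    start = trajectory.rfind(marker)
    if start == -1:
        return ""
    end = trajectory.find("]", start)
    return trajectory[start + len(marker):end]

def build_finetune_set(qa_pairs, generate_fn, num_examples=3000):
    """Keep trajectories whose final answer matches the gold answer.

    qa_pairs: iterable of (question, gold_answer) strings.
    generate_fn: hypothetical callable that returns a full
    Thought/Action/Observation trajectory for a question.
    """
    dataset = []
    for question, gold in qa_pairs:
        trajectory = generate_fn(question)
        if extract_answer(trajectory).strip() == gold.strip():
            dataset.append({"input": f"Question: {question}",
                            "target": trajectory})
        if len(dataset) >= num_examples:
            break
    return dataset
```

Since only trajectories that reach the correct answer are kept, the smaller model is distilled on successful reasoning traces only.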
Yes, we believe fine-tuning is very promising --- you can also check out Toolformer (https://arxiv.org/abs/2302.04761), which is also about fine-tuning!
Thanks for the reply. All the best with your next project!
@ysymyth I was asking for specific details of the fine-tuning process. For example, did you fine-tune the model by separating the data into individual (question, trajectory, answer) examples? I guess you would do this so that the data does not get mixed up during training, right?
did you fine-tune the model by separating the data into individual (question, trajectory, answer) examples?
Sorry, what do you mean exactly?
Example:
Input given to the model:
Question: What is the elevation range for the area that the eastern sector of the Colorado orogeny extends into?
Targets that you use to fine-tune the model on:
Thought 1: I need to search Colorado orogeny, find the area that the eastern sector of the Colorado orogeny extends into, then find the elevation range of the area.
Action 1: Search[Colorado orogeny]
Observation 1: The Colorado orogeny was an episode of mountain building (an orogeny) in Colorado and surrounding areas.
Thought 2: It does not mention the eastern sector. So I need to look up eastern sector.
Action 2: Lookup[eastern sector]
Observation 2: (Result 1 / 1) The eastern sector extends into the High Plains and is called the Central Plains orogeny.
Thought 3: The eastern sector of Colorado orogeny extends into the High Plains. So I need to search High Plains and find its elevation range.
Action 3: Search[High Plains]
Observation 3: High Plains refers to one of two distinct land regions:
Thought 4: I need to instead search High Plains (United States).
Action 4: Search[High Plains (United States)]
Observation 4: The High Plains are a subregion of the Great Plains. From east to west, the High Plains rise in elevation from around 1,800 to 7,000 ft (550 to 2,130 m).[3]
Thought 5: High Plains rise in elevation from around 1,800 to 7,000 ft, so the answer is 1,800 to 7,000 ft.
Action 5: Finish[1,800 to 7,000 ft]
[ADD END TOKEN HERE]
Was it that you had 3,000 examples of these (input, target) pairs for fine-tuning the model? I was just curious how you went about fine-tuning it.
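The (input, target) layout sketched above, including the end token, might be serialized like this. The field names and the `<eos>` token are assumptions for illustration, not the authors' exact format; substitute whatever end-of-sequence token your tokenizer defines.

```python
END_TOKEN = "<eos>"  # assumed; use your tokenizer's actual EOS token

def make_example(question: str, trajectory: str) -> dict:
    """Serialize one fine-tuning example: the question is the input,
    the full Thought/Action/Observation trajectory (terminated by an
    explicit end token) is the target."""
    return {
        "input": f"Question: {question}",
        "target": trajectory.rstrip() + END_TOKEN,
    }

example = make_example(
    "What is the elevation range for the area that the eastern sector "
    "of the Colorado orogeny extends into?",
    "Thought 1: I need to search Colorado orogeny...\n"
    "Action 5: Finish[1,800 to 7,000 ft]",
)
```

The end token lets the fine-tuned model learn when to stop generating after `Finish[...]`.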
I see, thank you for the information!
It's very interesting to see knowledge distillation applied to reasoning and acting in language models. You are prompt-engineering LLMs that have in-context learning (an emergent property of their size) to teach smaller models how to reason and act. From your paper, I believe ReAct could be another important task to add when instruction-tuning smaller language models to perform as well as LLMs.
Hey, sorry for asking in a closed issue, but it's very relevant to the question above.
Is it enough to just have (<Question>, <Thought, Action, Observation chain>) pairs in the dataset, or do we also need to add intermediate pairs, like (<Question, Thought, Action, Observation>, <Thought, Action, Observation>)?
I haven't tried to fine-tune the model yet, but my guess is that the model will be able to start answering the question, but might be bad at ReActing, i.e. taking previous observations into consideration. Am I wrong?
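The two dataset layouts in question can be sketched as follows, assuming a trajectory is a list of step strings. Function names are illustrative, not from the ReAct codebase.

```python
def full_pairs(question, steps):
    """Option A: one pair per question --- (Question, whole chain)."""
    return [(question, "\n".join(steps))]

def intermediate_pairs(question, steps):
    """Option B: one pair per step --- (Question + steps so far, next step)."""
    pairs = []
    for i, step in enumerate(steps):
        context = "\n".join([question] + steps[:i])
        pairs.append((context, step))
    return pairs
```

One relevant point: with standard causal-LM fine-tuning, the loss is applied token by token over the whole concatenated sequence, so each generated step is already conditioned on the question and all previous observations. Under that training setup, explicit intermediate pairs are typically redundant rather than required.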