
sotopia-pi's Issues

[FEAT]: Use 3rd party libraries as dependencies?

Description

A low-level question: to what extent do we need to work directly on the FastChat library?

As mentioned in Llama-recipes, fine-tuning only requires a line of code if the input file is in the right format. So is inheriting the whole FastChat repo overkill? As it stands, the CI pipeline is broken and the code structure is not very clear.

Additional Information

No response

[FEAT]: Explore the multi-turn capability of INST format of Together AI

Description

According to the Together AI docs, the INST format should support multi-turn dialogue directly, so there is no need to split the fine-tuning data points into (previous dialogue, current response) pairs. The goal of this feature is to experiment with 1) the original (previous dialogue, current response) pairs as fine-tuning dataset entries and 2) the entire dialogue as a single entry.
(screenshot: WechatIMG2283)
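The two dataset layouts to compare can be sketched as follows. This is a minimal illustration, not Together AI's actual schema: the `{"text": ...}` record shape and the toy dialogue are assumptions.

```python
# A toy dialogue as (speaker, utterance) pairs; real data would come from
# Sotopia episode logs.
turns = [
    ("agent1", "Hi, can we talk about the schedule?"),
    ("agent2", "Sure, what works for you?"),
    ("agent1", "How about Friday afternoon?"),
]

def split_entries(turns):
    """Variant 1: one training entry per response, conditioned on the
    dialogue so far -- the (previous dialogue, current response) split."""
    entries = []
    for i in range(1, len(turns)):
        history = " ".join(f"{s}: {u}" for s, u in turns[:i])
        _, utterance = turns[i]
        entries.append({"text": f"<s>[INST] {history} [/INST] {utterance}</s>"})
    return entries

def whole_dialogue_entry(turns):
    """Variant 2: the entire dialogue as a single multi-turn INST entry."""
    blocks = []
    for i in range(0, len(turns) - 1, 2):
        blocks.append(f"<s>[INST] {turns[i][1]} [/INST] {turns[i + 1][1]}</s>")
    return {"text": "".join(blocks)}
```

Variant 1 multiplies the number of entries by the number of responses per dialogue, while Variant 2 keeps one entry per episode; the experiment is whether Together AI's INST handling makes the two equivalent in practice.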

Additional Information

No response

[FEAT]: add data processing file

Description

Add a data processing file to convert Sotopia data into a Together AI-readable format.
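A minimal sketch of such a converter, assuming Together AI's JSONL fine-tuning format (one JSON object with a "text" key per line) and a hypothetical episode schema with "turns"/"utterance" fields:

```python
import json

def episode_to_record(episode):
    """Map one episode to a (dialogue history, final response) training record.
    The "turns" / "utterance" field names are assumptions about the log schema."""
    history = " ".join(t["utterance"] for t in episode["turns"][:-1])
    response = episode["turns"][-1]["utterance"]
    return {"text": f"<s>[INST] {history} [/INST] {response}</s>"}

def write_jsonl(episodes, out_path):
    """Write one JSON object per line, as Together AI fine-tuning expects."""
    with open(out_path, "w") as f:
        for ep in episodes:
            f.write(json.dumps(episode_to_record(ep)) + "\n")
```

The real script would additionally iterate over every turn of each episode (not just the last) and handle agent names and scenario context.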

Additional Information

No response

[BUG]: (Update) Together AI fine-tuning always stops at step 10 ❌ Need further explanations on how steps work in Together AI ✅

Description of the bug

All fine-tuning processes still stop at step 10. According to @lwaekfjlk, there is no early-stopping code in the source. The curves in the image below stop at different loss values, so further experiments are needed.
(screenshot: WechatIMG2322)

Steps To Reproduce

No response

Additional Information

Here is a comparison between two different fine-tuning settings:

  1. epochs = 2, batch = 4 -> epoch 1 stopped at step 5, epoch 2 stopped at step 10
  2. epochs = 4, batch = 32 -> epoch 1 stopped at step 3, epoch 2 at step 6, epoch 3 at step 9, epoch 4 at step 10
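One way to sanity-check this is to compute how many optimizer steps the runs should take if one step consumes one batch (an assumption; Together AI's step semantics are exactly what this issue is trying to pin down):

```python
import math

def expected_total_steps(n_examples, batch_size, epochs):
    """Total optimizer steps under the assumption that one step = one batch."""
    return epochs * math.ceil(n_examples / batch_size)
```

For the 380-example train set, these settings would predict 190 and 48 optimizer steps respectively, far more than the 10 observed. Together with the epoch boundaries above (5/10 for two epochs, 3/6/9/10 for four), this is consistent with the x-axis "step" being one of roughly ten evenly spaced logging checkpoints rather than an optimizer step; that is a hypothesis to verify against Together AI's documentation.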

[FEAT]: Add data point level filtering based on quality of all selected dialogues in train and test sets (whole dataset)

Description

For each set of scenarios split into easy and hard, we previously discussed whether to use only dialogues that reach high performance / high quality. If we add this filtering, we should clarify how "high quality" is defined. The simplest approach we currently adopt is to use the overall reward score, but this might be too coarse.
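Under the simplest definition, the filter is a one-liner; the "rewards"/"overall" field names and the 3.0 cutoff below are placeholders, not the actual schema or threshold:

```python
def filter_high_quality(episodes, min_overall_reward=3.0):
    """Keep only dialogues whose overall reward clears a threshold.
    Field names and the default cutoff are illustrative assumptions."""
    return [
        ep for ep in episodes
        if ep["rewards"]["overall"] >= min_overall_reward
    ]
```

A finer-grained alternative would threshold individual dimensions (e.g., goal completion) instead of the aggregate, which is the "too high level" concern above.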

Additional Information

No response

[FEAT]: Plans for evaluation and fine-tuning data format

Description

Currently we use plain dialogue (passing the other agent’s direct response instead of passing the prompt from environment agent) to fine-tune Llama 2 on Together AI. A qualitative analysis indicates that the fine-tuned llama-2-13b-chat performs well on selected difficult scenarios.

In order to call the fine-tuned model’s API and evaluate its performance based on Sotopia’s evaluation metrics, we may have two solutions:

Solution 1.

Keep the same fine-tuning data and revise the prompt template for the fine-tuned model during inference. A potential problem with this solution is an arguably unfair comparison between the fine-tuned model and the original Llama 2 model, since their inference-time prompts would differ.

Solution 2.

Modify the fine-tuning data to be consistent with the prompts for all other models. Fine-tune Llama 2 again with QA format instead of multi-turn INST format.

Before implementing Solution 2, we still have two key points to discuss and resolve:

Issue 1. Currently we only have episode logs in the database. We could either 1) re-run the GPT-4/GPT-4 dialogues and collect the environment prompts in the process, or 2) reverse the episode logs in the database into environment prompts following a specific template.
Issue 2. We currently have no data on the relative performance of multi-turn INST fine-tuning versus QA fine-tuning on Together AI. Once Issue 1 is resolved, we will experiment on the difference in performance between the two (#24).
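Option 2 of Issue 1 could look like the sketch below. The template wording and the episode field names are hypothetical stand-ins for whatever Sotopia's environment agent actually sends; only the overall reversal logic is the point.

```python
# Hypothetical environment-prompt template (not Sotopia's actual wording).
PROMPT_TEMPLATE = (
    "Imagine you are {agent}; your task is to act/speak as {agent} would.\n"
    "Scenario: {scenario}\n"
    "Conversation so far:\n{history}\n"
    "How would you respond?"
)

def log_to_prompt(episode, agent, turn_index):
    """Rebuild the environment prompt the agent would have seen before
    producing turn `turn_index`, from a stored episode log."""
    history = "\n".join(
        f'{t["speaker"]}: {t["utterance"]}' for t in episode["turns"][:turn_index]
    )
    return PROMPT_TEMPLATE.format(
        agent=agent, scenario=episode["scenario"], history=history
    )
```

The risk with this option is template drift: if the reconstruction does not match the environment agent's real prompt exactly, fine-tuning and inference prompts diverge again, which is the problem Solution 2 is meant to avoid.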

Discussion

We expect the evaluation result of the fine-tuned model trained on the Babel servers to align perfectly with the fine-tuned model on Together AI.

On a side note, we are curious about the motivation for using environment prompts from the third-party GPT-4 (e.g., why is prompting with previous dialogue history important? What goals can the multi-turn prompting format not achieve?).

Additional Information

No response

[FEAT]: Build Mistral finetuning and deploying pipeline

Description

First, we need to confirm which platform to use for fine-tuning.
Second, we need to run Llama-2-13b on this platform.
Third, we need to run Mistral 7B on this platform.

The output of this issue is a model checkpoint.

Additional Information

No response

[BUG]: Together AI fine-tuning does not work as expected

Description of the bug

Based on the wandb visualization, training always stops at the 10th step.

We have tried:

  1. changing the model name
  2. changing the suffix
  3. increasing the training data to more than 1000 examples

None of these worked.

Steps To Reproduce

No response

Additional Information

No response

[BUG]: Unstable model output when fine-tuned on INST format

Description of the bug

In the following example, the model outputs "[INST] What is the best way to eat a snowman?" after answering the question. This sentence does not appear in the fine-tuning dataset and looks like a hallucination.
(screenshot: WechatIMG2339)

Steps To Reproduce

Use either <s>[INST] question [/INST] answer</s> or <s>[INST] <<SYS>> <</SYS>> question [/INST] answer</s> on single-turn dialogues.
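Written out as format strings, the two single-turn templates from the reproduction are (question/answer/system values are placeholders):

```python
def inst_format(question, answer):
    """Plain single-turn Llama-2 INST template."""
    return f"<s>[INST] {question} [/INST] {answer}</s>"

def inst_sys_format(question, answer, system=""):
    """INST template with an (empty, as in the reproduction) system block."""
    return f"<s>[INST] <<SYS>> {system} <</SYS>> {question} [/INST] {answer}</s>"
```

If the closing `</s>` is missing or mis-tokenized in the training data, the model never learns a stop boundary and will keep generating, which would explain the stray "[INST] ..." continuation; that is a hypothesis worth checking against the prepared JSONL.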

Additional Information

No response

[FEAT]: Difficulty-based Train-test split of Sotopia data

Description

The current split of fine-tuning data is based on task difficulty/variance:

All GPT-4/GPT-4 dialogues (with tag "gpt-4_gpt-4_v0.0.1_clean"): 90 * 5 = 450

Train set: 76 (easy scenarios) * 5 = 380
Test set: 14 (hard scenarios) * 5 = 70

Hard scenarios are chosen via EnvironmentList.find(EnvironmentList.name == "hard_env_set").all()
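The split itself reduces to partitioning episodes by whether their scenario is in the hard set (obtained from the EnvironmentList query above). A sketch, where the "environment" key is an assumption about the episode schema:

```python
def split_by_difficulty(episodes, hard_env_ids):
    """Partition episodes: easy scenarios -> train, hard scenarios -> test."""
    train = [ep for ep in episodes if ep["environment"] not in hard_env_ids]
    test = [ep for ep in episodes if ep["environment"] in hard_env_ids]
    return train, test
```

With 90 scenarios at 5 episodes each, a 76/14 scenario partition yields the 380/70 train/test counts above.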

An agreement on this archived issue should resolve issues #20, #5, and #6.

Additional Information

No response
