Comments (8)
So I've compared two different Alpaca 7B models on the SQuAD dataset:

| Dataset | Model | Squad(Mini) F1 |
|---|---|---|
| Original Alpaca | samwit/alpaca7B-lora | 34.63 |
| Cleaned Alpaca | tloen/alpaca-lora-7b | 49.64 |
At least on the surface, it appears the cleaning & curation we've been doing has helped significantly.
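For context, the Squad(Mini) F1 reported here is the standard SQuAD token-overlap F1 between the predicted and gold answers. A minimal self-contained sketch (function names are mine, not from the repo's eval.py, and the normalization is the usual SQuAD recipe of lowercasing and stripping articles and punctuation):

```python
import re
from collections import Counter

def normalize(text):
    """Lowercase, drop articles and punctuation, split into tokens
    (the standard SQuAD answer normalization)."""
    text = text.lower()
    text = re.sub(r"\b(a|an|the)\b", " ", text)   # drop articles
    text = re.sub(r"[^a-z0-9 ]", " ", text)       # drop punctuation
    return text.split()

def squad_f1(prediction, reference):
    """Token-overlap F1 between a predicted and a gold answer string."""
    pred, gold = normalize(prediction), normalize(reference)
    common = Counter(pred) & Counter(gold)        # per-token min counts
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)
```

Scores in the tables are then just this F1 averaged over the evaluated examples.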
from alpacadatacleaned.
Just added the PIQA benchmark, and also redid the scoring of the SQuAD bench:

| Dataset | Model (Hugging Face) | Parameters | SquadMini (F1) | PIQA (acc) |
|---|---|---|---|---|
| Original Alpaca | samwit/alpaca7B-lora | 7b | 74.271 | |
| Cleaned Alpaca (Mar 26) | tloen/alpaca-lora-7b | 7b | 75.629 | |
| Cleaned Alpaca (Mar 31) | yahma/alpaca-7b-lora | 7b | 76.388 | |
| GPT4All | nomic-ai/gpt4all-lora | 7b | 72.643 | |
Note: PIQA benchmark has issues. Do not use it yet.
Decided to standardize on EleutherAI's lm-evaluation-harness instead. Here are the new results:
| Dataset | Model | Parameters | WikiText (ppl) | MNLI (acc) | PIQA (acc_norm) |
|---|---|---|---|---|---|
| Original Alpaca | samwit/alpaca7B-lora | 7b (LoRA) | 9.5396 | 38.33 | 78.51 |
| Cleaned Alpaca (Mar 26) | tloen/alpaca-lora-7b | 7b (LoRA) | 9.4885 | 51.6 | 79.33 |
| GPT4All | nomic-ai/gpt4all-lora | 7b (LoRA) | 10.09 | 38.97 | 78.40 |
Not sure why the model trained on the cleaned dataset scored so high in the MNLI benchmark. I ran the test multiple times to confirm.
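For readers unfamiliar with the harness's columns: WikiText ppl is the exponential of the negative mean log-likelihood, and acc_norm on multiple-choice tasks like PIQA picks the candidate with the best length-normalized log-likelihood (plain acc uses the unnormalized sum). A simplified sketch of both metrics (the real harness normalizes WikiText ppl per word and acc_norm by character length; this is just the idea):

```python
import math

def perplexity(token_logprobs):
    """WikiText-style perplexity: exp of the negative mean log-likelihood.
    (Simplified: normalized per token here, per word in the harness.)"""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def pick_choice(choice_logprobs, choice_lengths, normalize=True):
    """Multiple-choice scoring as in PIQA/MNLI: score each candidate
    continuation by its total log-likelihood (acc) or by its
    length-normalized log-likelihood (acc_norm), then take the argmax."""
    scores = [lp / n if normalize else lp
              for lp, n in zip(choice_logprobs, choice_lengths)]
    return max(range(len(scores)), key=scores.__getitem__)
```

Note that acc and acc_norm can disagree: a longer candidate with a worse total log-likelihood can still win after length normalization.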
https://s3.amazonaws.com/static.nomic.ai/gpt4all/2023_GPT4All-J_Technical_Report_2.pdf
I have mentioned a few options previously in this issue: tloen/alpaca-lora#147
Just FYI: I re-ran the SquadMini bench on a model I fine-tuned on the March 31 release of the cleaned dataset and got an avg F1 score of 55.229.
May I ask a question? Two weeks ago you reported a Squad(Mini) F1 of 49.64 with 'tloen/alpaca-lora-7b', but last week the same model scored 75.629. Why are the two results so different? I have tried this model and got a Squad(Mini) F1 of around 55.07.
> May I ask a question? Two weeks ago you reported a Squad(Mini) F1 of 49.64 with 'tloen/alpaca-lora-7b', but last week the same model scored 75.629. Why are the two results so different? I have tried this model and got a Squad(Mini) F1 of around 55.07.
The SQUAD MINI score calculations were re-done in that time. Anyhow, going forward, we are ditching the benchmark eval.py and using the lm-evaluation-harness from EleutherAI. The scores reported in the main README are directly from the lm-evaluation-harness report.
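For anyone wanting to reproduce the README numbers, an invocation along these lines should work (flag names are from the 2023-era lm-evaluation-harness CLI and the model path is a placeholder for your merged base+LoRA checkpoint; check the harness README for the current interface before running):

```shell
# Sketch of an lm-evaluation-harness run (EleutherAI).
# The pretrained path is a placeholder, not an actual checkpoint name.
python main.py \
    --model hf-causal \
    --model_args pretrained=/path/to/merged-alpaca-7b \
    --tasks wikitext,mnli,piqa \
    --device cuda:0
```

The harness prints a results table per task, which is what the README scores are taken from.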
Related Issues (20)
- overall approach
- Incorrect key string in alpaca_data_cleaned.json
- Idea about better cleaning
- Correct or potentially to be cleaned?
- How are you going about cleaning?
- Separate instructions by functionality
- What about starting a crowdfunding campaign to collect money to run the examples against GPT-4?
- Diffs as data
- good job
- Contributing to the dataset curation with Argilla and the Alpaca Garbage collector
- Is there a boost in performance for full fine-tuning versus LoRA?
- Identify code snippet in "input" fields
- Command to run the evaluation
- PIQA dataset's metric
- Is the "alpaca_data_cleaned_archive.json" file having all cleaned data?
- The MNLI score in lm-evaluation-harness
- Where is the 9k cleaned alpaca data in the paper Alpagasus?
- How to format dataset fields in model prompt?
- Chinese sft data