Comments (8)

gururise commented on August 10, 2024

So I've compared two different Alpaca 7B models on the SQuAD dataset:

| Dataset | Model | Squad(Mini) F1 |
|---|---|---|
| Original Alpaca | samwit/alpaca7B-lora | 34.63 |
| Cleaned Alpaca | tloen/alpaca-lora-7b | 49.64 |

At least on the surface, it appears the cleaning & curation we've been doing has helped significantly.

from alpacadatacleaned.

gururise commented on August 10, 2024

Just added the PIQA benchmark and redid the scoring of the SQuAD bench:

| Dataset | Hugging Face model | Parameters | SquadMini (F1) | PIQA (acc) |
|---|---|---|---|---|
| Original Alpaca | samwit/alpaca7B-lora | 7b | 74.271 | 50.5 |
| Cleaned Alpaca (Mar 26) | tloen/alpaca-lora-7b | 7b | 75.629 | 54.0 |
| Cleaned Alpaca (Mar 31) | yahma/alpaca-7b-lora | 7b | 76.388 | 52.6 |
| GPT4All | nomic-ai/gpt4all-lora | 7b | 72.643 | 49.5 |

Note: PIQA benchmark has issues. Do not use it yet.

gururise commented on August 10, 2024

Decided to standardize by using the lm-eval-harness by EleutherAI instead. Here are the new results:

| Dataset | Model | Parameters | WikiText (ppl) | MNLI (acc) | PIQA (acc norm) |
|---|---|---|---|---|---|
| Original Alpaca | samwit/alpaca7B-lora | 7b (lora) | 9.5396 | 38.33 | 78.51 |
| Cleaned Alpaca (Mar 26) | tloen/alpaca-lora-7b | 7b (lora) | 9.4885 | 51.6 | 79.33 |
| GPT4All | nomic-ai/gpt4all-lora | 7b (lora) | 10.09 | 38.97 | 78.40 |

Not sure why the model trained on the cleaned dataset scored so much higher on the MNLI benchmark. I ran the test multiple times to confirm.

claysauruswrecks commented on August 10, 2024

https://s3.amazonaws.com/static.nomic.ai/gpt4all/2023_GPT4All-J_Technical_Report_2.pdf

claysauruswrecks commented on August 10, 2024

I have mentioned a few options previously in this issue: tloen/alpaca-lora#147

gururise commented on August 10, 2024

Just FYI: I re-ran the SquadMini bench on a model I fine-tuned on the March 31 release of the cleaned dataset and got an average F1 score of 55.229.

YukinoshitaKaren commented on August 10, 2024

May I ask a question? You got a 'Squad(Mini) F1' of 49.64 with 'tloen/alpaca-lora-7b' two weeks ago, and 75.629 with the same model last week. Why are the two results so different? I tried this model and got around 55.07 Squad(Mini) F1.

gururise commented on August 10, 2024

> May I ask a question? You got a 'Squad(Mini) F1' of 49.64 with 'tloen/alpaca-lora-7b' two weeks ago, and 75.629 with the same model last week. Why are the two results so different? I tried this model and got around 55.07 Squad(Mini) F1.

The SquadMini score calculation was redone between those two runs. Going forward, we are dropping the benchmark eval.py in favor of the lm-evaluation-harness from EleutherAI. The scores reported in the main README come directly from the lm-evaluation-harness report.
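
For anyone puzzled by how the same model can land at very different F1 numbers: the thread does not say what exactly changed in the scoring, so the following is only a minimal sketch of a SQuAD-style token-overlap F1, not the code from eval.py. The function names `normalize_answer` and `token_f1` and the `normalize` flag are hypothetical, added for illustration. It shows one plausible source of such gaps: whether answers are normalized (lowercased, punctuation and articles removed) before tokens are compared.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def token_f1(prediction: str, reference: str, normalize: bool = True) -> float:
    """Token-level F1 between one prediction and one reference answer.

    `normalize` is a hypothetical switch used here only to show how much
    the preprocessing step alone can move the score.
    """
    if normalize:
        pred_tokens = normalize_answer(prediction).split()
        ref_tokens = normalize_answer(reference).split()
    else:
        pred_tokens = prediction.split()
        ref_tokens = reference.split()

    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)

    # Multiset intersection counts each shared token at most as often
    # as it appears on both sides.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(token_f1("The Eiffel Tower.", "eiffel tower"))
    print(token_f1("The Eiffel Tower.", "eiffel tower", normalize=False))
```

With normalization the two answers above match exactly, while without it they share no tokens at all. Differences like this (or in how verbose model outputs are trimmed before scoring) could account for spreads of the size discussed here, which is one reason a shared harness makes numbers easier to compare.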
