
notus's Introduction

💨 Notus

Banner: Notus, the wind god of the south, depicted as a warm, swirling southern breeze carrying paper planes across a backdrop of warm colors with hints of blue and green.

Notus is a collection of fine-tuned models using SFT, DPO, SFT+DPO, and other RLAIF/RLHF techniques, following a data-first, human-centric approach, since that's what we do best at Argilla.

Notus models are intended to be used as assistants via chat-like applications, and are evaluated with Chat (MT-Bench, AlpacaEval) and Academic (Open LLM Leaderboard) benchmarks for a direct comparison with other similar LLMs.
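
As a quick illustration of the chat-style usage, here is a minimal sketch with the transformers pipeline, assuming the argilla/notus-7b-v1 checkpoint from the Hub collection linked below; generation parameters are illustrative.

import torch
from transformers import pipeline

# Load a Notus checkpoint as a chat assistant (model id assumed from the Hub collection below)
generator = pipeline(
    "text-generation",
    model="argilla/notus-7b-v1",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain DPO in one paragraph."},
]

# Format the conversation with the model's own chat template before generating
prompt = generator.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = generator(prompt, max_new_tokens=256, do_sample=True, temperature=0.7)
print(outputs[0]["generated_text"])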

The Notus name comes from the ancient Greek god Notus, as a nod to Zephyr, which is named after the ancient Greek god Zephyrus; the difference being that Notus is the god of the south wind and Zephyr the god of the west wind. More information at https://en.wikipedia.org/wiki/Anemoi.

Being able to fine-tune LLMs while keeping a data-first approach wouldn't have been possible without the invaluable help of the open source community and all the amazing resources out there intended for the general public. We are very grateful for that, and we hope that our work can be useful for others as well.

🎩 h/t to the HuggingFace H4 team for their amazing work on the alignment-handbook, and for the fruitful discussions we had with them and their support.

News

  • December 1st, 2023: Notus 7B v1 is released! 🎉 It uses the same DPO fine-tuning approach as Zephyr 7B Beta, but changes how the UltraFeedback data is binarized: pairs are built from the average of the different preference criteria ratings instead of the critique's overall score (a sketch of this binarization is shown below). Notus 7B improved on both AlpacaEval and the LM Eval Harness compared to Zephyr 7B Beta, while the MT-Bench results were on par. More information at v1/.
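
A rough sketch of that binarization, assuming a simplified UltraFeedback-like row layout where each completion carries per-criterion annotations with a numeric Rating; the real openbmb/UltraFeedback field names and types should be double-checked before running.

import random
from statistics import mean

def binarize_by_average(row, rng=random.Random(42)):
    """Build a (chosen, rejected) pair from the average of the criteria ratings."""
    def avg_rating(completion):
        # Average the per-criterion ratings (helpfulness, honesty, etc.)
        # instead of trusting the critique's overall_score.
        return mean(float(a["Rating"]) for a in completion["annotations"].values())

    ranked = sorted(row["completions"], key=avg_rating, reverse=True)
    chosen = ranked[0]                 # best response by average rating
    rejected = rng.choice(ranked[1:])  # a random lower-ranked response, as in the Zephyr recipe
    return {"chosen": chosen["response"], "rejected": rejected["response"]}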

Resources

🤗 HuggingFace Hub Collection

💬 Chat UI

Citation

Since most of the content is ported / adapted from huggingface/alignment-handbook, we recommend citing their work.

@misc{alignment_handbook2023,
  author = {Lewis Tunstall and Edward Beeching and Nathan Lambert and Nazneen Rajani and Alexander M. Rush and Thomas Wolf},
  title = {The Alignment Handbook},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/huggingface/alignment-handbook}}
}

Additionally, if you find any of the contents of this repository useful, please feel free to use the following BibTeX citation as well:

@misc{notus2023,
  author = {Alvaro Bartolome and Gabriel Martin and Daniel Vila},
  title = {Notus},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/argilla-io/notus}}
}

Note

Alphabetically ordered by last name due to equal contribution.

notus's People

Contributors

alvarobartt, gabrielmbmb


notus's Issues

Run DPO step with multibinarized dataset

One important open question, especially for distilabel, is: does generating more chosen/rejected pairs improve the DPO process? Notus, Zephyr, and Tulu all use just the best response as chosen and a random lower-rated one as rejected.

We need to run an experiment to better understand how multibinarization impacts the model.

The dataset is ready:
https://huggingface.co/datasets/argilla/notus-uf-dpo-multibinarized

It contains one pair with the best response as chosen, and then several additional pairs with lower-rated responses as rejected; so instead of generating just one sample per UltraFeedback row (choosing a random rejected), we generated between 1 and 3 pairs per row (see the sketch below).
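
A sketch of the multibinarization idea, assuming each completion already carries an average rating; field names such as completions, avg_rating, and response are illustrative and should be checked against the linked dataset.

def multibinarize(row):
    """Emit one (chosen, rejected) pair per lower-rated response, instead of a single pair."""
    ranked = sorted(row["completions"], key=lambda c: c["avg_rating"], reverse=True)
    best = ranked[0]
    pairs = []
    for worse in ranked[1:]:
        if worse["avg_rating"] < best["avg_rating"]:  # skip responses tied with the best one
            pairs.append({"chosen": best["response"], "rejected": worse["response"]})
    # With 4 completions per UltraFeedback row, this typically yields between 1 and 3 pairs
    return pairs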

[Question] Are you planning to support a multilingual model?

Hi everyone!

Thanks for opening up such a great project for building high-quality data and LLMs like Notux and Notus!
I appreciate your commitment to publishing everything you do.
I have no doubt that you are making great progress in the development of AI worldwide.

Let me ask the following questions:

  • Do you have any plans to support a multilingual model?
  • Or can I contribute somehow to building an LLM that supports Japanese?
    I tested Japanese and the result was close to perfect.
    I feel Notus has basic knowledge of Japan, but its Japanese is not natural.

I think creating a multilingual model could accelerate the development of open LLMs on a world scale.

I hope you can reply!
Thanks!

Curate UltraFeedback dataset's overall_score

Based on our curation efforts, we spotted a bug in the overall_score of the UltraFeedback AI critique. TL;DR: responses that should get the lowest score (1 or less) end up with a high score (10, 8.0, or 7.5, who knows!). Our initial work with Notus shows that by using something other than the overall score, we can train a better model.

In this task, we want to thoroughly clean up the original dataset to make sure others build on an error-free dataset. I have myself curated a few hundred examples (sorting by chosen score = 10), and most of the responses getting a 10 are totally useless according to the rationale (natural-language) explanation.

The objective is as follows:

  1. Using this dataset, take the best_overall_score_response column, get the critique text, and run it through a very simple sentiment analysis (I suggest starting with TextBlob's, because it's really fast and the rationales are very expressive when the response is really bad).
  2. Add this sentiment score to the dataset as a new column, best_overall_score_response_critique_sentiment (see the sketch after this list).
  3. Based on this new dataset, identify the examples that get a high overall_score but a bad sentiment.
  4. Iterate as much as we can to really narrow down those problematic cases. I'd strongly suggest using the Argilla UI with sort and filters to adjust quickly.
  5. Once we know the problematic cases, we have several choices; the best I can think of is to reduce their overall_score (dividing by 10 :-) ) in the completions object.
  6. Now that we have a clean dataset, we can use it to experiment further (compare rating vs. critique, etc.) and, most importantly, share it with the community so people can build on a clean version!
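
A minimal sketch of steps 1–3, assuming the dataset is loadable from the Hub and that best_overall_score_response exposes the critique text and the overall_score; the dataset id and the exact field layout are assumptions to verify before running.

from datasets import load_dataset
from textblob import TextBlob

# Hypothetical dataset id; replace with the dataset linked in step 1
ds = load_dataset("argilla/ultrafeedback-curation", split="train")

def add_critique_sentiment(row):
    critique = row["best_overall_score_response"]["critique"]
    # polarity is in [-1, 1]; strongly negative rationales score close to -1
    row["best_overall_score_response_critique_sentiment"] = TextBlob(critique).sentiment.polarity
    return row

ds = ds.map(add_critique_sentiment)

# Step 3: flag rows whose overall_score is high but whose critique reads clearly negative
suspicious = ds.filter(
    lambda row: row["best_overall_score_response"]["overall_score"] >= 8.0
    and row["best_overall_score_response_critique_sentiment"] < 0.0
)
print(f"{len(suspicious)} potentially mis-scored examples")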

More details about the initial analysis on the dataset readme.

Please keep us posted as you start and iterate!

Run SFT step with the chosen responses from the rating-binarized data

I'd recommend running an experiment:

Using the base SFT Zephyr model, run SFT on the chosen responses of our UF dataset. Since we know there are several issues with the original train_prefs split, we should evaluate whether their result (SFT on the chosen responses not improving the overall recipe) still holds.
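
A sketch of how the SFT data could be prepared: keep only the prompt plus its chosen response and format it with the base model's chat template, so it can feed any SFT recipe (e.g. the alignment-handbook one). The dataset id and column names are assumptions based on the discussion above.

from datasets import load_dataset
from transformers import AutoTokenizer

# Zephyr's SFT base model; its chat template is used to format the examples
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/mistral-7b-sft-beta")

# Assumed dataset id for our rating-binarized UltraFeedback preferences
ds = load_dataset("argilla/ultrafeedback-binarized-preferences", split="train")

def to_sft_text(row):
    # If "chosen" is already a list of chat messages, pass it to the template directly instead
    messages = [
        {"role": "user", "content": row["prompt"]},
        {"role": "assistant", "content": row["chosen"]},
    ]
    return {"text": tokenizer.apply_chat_template(messages, tokenize=False)}

sft_ds = ds.map(to_sft_text, remove_columns=ds.column_names)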
