louisbrulenaudet / tax-retrieval-benchmark

An implementation of the TaxRetrievalBenchmark task for the 🤗 Massive Text Embedding Benchmark (MTEB) framework.

Home Page: https://huggingface.co/louisbrulenaudet

License: Apache License 2.0

Languages: Jupyter Notebook 100.00%

Topics: benchmark, droit, embeddings, fiscal, fiscalite, information-retrieval, mteb, rag, retrieval, retrieval-augmented-generation

tax-retrieval-benchmark's Introduction

Massive Text Embedding Benchmark for French Taxation 🤗

In this notebook, we explore how to add a new task to the Massive Text Embedding Benchmark (MTEB). MTEB is an open-source framework for evaluating and benchmarking text embedding models across a diverse set of tasks and languages.

The task we will be integrating is the TaxRetrievalBenchmark, a retrieval task focused on retrieving relevant tax articles or content based on provided queries. This task is particularly useful in the legal and financial domains, where accurate and efficient retrieval of relevant information is crucial. To add this task to the MTEB framework, we will follow a structured approach:

  • Understanding the task: We will start by analyzing the TaxRetrievalBenchmark task, its data format, and the evaluation metrics used to assess model performance.
  • Preparing the data: Next, we will preprocess the data from the HuggingFace Hub, converting it to the MTEB format. This step involves organizing the corpus, queries, and relevant document information into the required data structures.
  • Implementing the task class: We will then implement the TaxRetrievalBenchmark class, which inherits from the AbsTaskRetrieval class provided by the MTEB framework. This class will encapsulate the task-specific logic, including data loading, metadata management, and evaluation methods.
  • Integrating with MTEB: Finally, we will integrate the TaxRetrievalBenchmark class into the MTEB framework, so that it can be run alongside other tasks when evaluating embedding models.
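
The data preparation step can be illustrated with a minimal sketch. It assumes the retrieval layout commonly used by MTEB retrieval tasks: three dictionaries keyed by split, named corpus, queries and relevant_docs. The row field names ("query", "title", "document"), the sample rows and the helper to_mteb_format are hypothetical and only stand in for records loaded from the HuggingFace Hub; the notebook itself may organize the data differently.

```python
from collections import defaultdict

# Hypothetical rows standing in for records loaded from the HuggingFace Hub.
# The field names are assumptions, not the actual dataset schema.
SAMPLE_ROWS = [
    {"query": "question about article A", "title": "Article A", "document": "Text of article A."},
    {"query": "question about article B", "title": "Article B", "document": "Text of article B."},
    {"query": "another question about article A", "title": "Article A", "document": "Text of article A."},
]


def to_mteb_format(rows, split="test"):
    """Convert (query, document) rows into corpus, queries and relevant_docs.

    Assumed target layout:
      corpus[split][doc_id]              -> {"title": str, "text": str}
      queries[split][query_id]           -> str
      relevant_docs[split][query_id]     -> {doc_id: relevance score}
    """
    corpus, queries = {}, {}
    relevant_docs = defaultdict(dict)
    doc_ids = {}  # document text -> doc id, so repeated documents are stored once

    for i, row in enumerate(rows):
        text = row["document"]
        if text not in doc_ids:
            doc_ids[text] = f"d{len(doc_ids)}"
            corpus[doc_ids[text]] = {"title": row.get("title", ""), "text": text}

        query_id = f"q{i}"
        queries[query_id] = row["query"]
        # Mark the paired document as relevant (score 1) for this query.
        relevant_docs[query_id][doc_ids[text]] = 1

    return {split: corpus}, {split: queries}, {split: dict(relevant_docs)}


corpus, queries, relevant_docs = to_mteb_format(SAMPLE_ROWS)
```

Deduplicating on document text keeps the corpus small when several queries point to the same article; a real dataset with stable article identifiers would use those identifiers instead of generated ones.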

By adding the TaxRetrievalBenchmark task to the MTEB framework, we contribute to its growing collection of tasks and give researchers and practitioners a way to evaluate embedding models on French tax retrieval. This notebook also serves as a practical guide for anyone interested in extending MTEB with new tasks.

Citing this project

If you use this code in your research, please use the following BibTeX entry.

@misc{louisbrulenaudet2024,
  author =       {Louis Brulé Naudet},
  title =        {Massive Text Embedding Benchmark for French Taxation},
  year =         {2024}
}

Feedback

If you have any feedback, please reach out at [email protected].
