
continuous-eval examples

This repo contains end-to-end examples of GenAI/LLM applications and evaluation pipelines set up using continuous-eval.

Check out the continuous-eval repo and documentation for more information.

Example Name               | App Framework | Eval Framework  | Description
-------------------------- | ------------- | --------------- | -----------
Simple RAG                 | LangChain     | continuous-eval | Simple QA chatbot over select Paul Graham essays
Complex RAG                | LangChain     | continuous-eval | Complex QA chatbot over select Paul Graham essays
Simple Tools               | LlamaIndex    | continuous-eval | Math question solver using simple tools
Context Augmentation Agent | LlamaIndex    | continuous-eval | QA over the Uber financial dataset using agents
Sentiment Classification   | LlamaIndex    | continuous-eval | Single-label classification of sentence sentiment

Installation

git clone https://github.com/relari-ai/examples.git && cd examples
poetry install

Add your LLM API keys to a .env file (see .env.example for reference). The keys needed depend on the application:

  • COHERE_API_KEY for the Cohere rerankers in the RAG examples
  • GOOGLE_API_KEY for all LLM calls
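Before running an example, it can help to confirm the keys are actually set. Below is a minimal, stdlib-only check, assuming the .env file uses simple KEY=VALUE lines. `parse_env_file` and `missing_keys` are hypothetical helpers, not part of this repo, and the examples themselves may load .env differently (for instance with python-dotenv).

```python
from __future__ import annotations

from pathlib import Path


def parse_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines from a .env file (hypothetical helper)."""
    env: dict[str, str] = {}
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        # Tolerate shell-style "export KEY=VALUE" lines.
        if key.startswith("export "):
            key = key[len("export "):].strip()
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        env[key] = value
    return env


def missing_keys(env: dict[str, str], required: list[str]) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in required if not env.get(key)]
```

For example, `missing_keys(parse_env_file(Path(".env")), ["GOOGLE_API_KEY", "COHERE_API_KEY"])` returns an empty list when both keys are set. COHERE_API_KEY only matters if you run the RAG examples.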

Get started

Each application folder (examples/[langchain|llamaindex]/APP_NAME/) contains the following files:

  • pipeline.py defines the application pipeline and the evaluation metrics / tests.
  • app.py contains the LLM application. Run it to generate the outputs (saved as results.jsonl).
  • eval.py runs the metrics / tests defined in pipeline.py (results saved as metrics_results.json and test_results.json).
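Because eval.py scores the outputs that app.py saves, the two scripts run in that order. Below is a minimal sketch of driving both from Python, assuming each script can be launched directly with the current interpreter (e.g. inside `poetry run` or `poetry shell`). `run_example` is a hypothetical helper, and whether the scripts expect the repo root or the app folder as their working directory is an assumption, exposed here as the `cwd` parameter.

```python
import subprocess
import sys
from pathlib import Path
from typing import Optional


def run_example(app_dir: Path, cwd: Optional[Path] = None) -> None:
    """Run app.py, then eval.py, for one example folder (hypothetical helper)."""
    for script in ("app.py", "eval.py"):
        # sys.executable is the interpreter running this helper, so under
        # `poetry run` the scripts see the same virtualenv and dependencies.
        # check=True raises CalledProcessError if a script exits non-zero,
        # so eval.py never starts after a failed app.py run.
        subprocess.run([sys.executable, str(app_dir / script)], cwd=cwd, check=True)
```

For example, `run_example(Path("examples/langchain/APP_NAME"))`, with APP_NAME replaced by an actual example folder. Failing fast with `check=True` keeps eval.py from scoring a results.jsonl left over from an earlier run.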

Depending on the application, the repo also provides the source data for the application (documents and embeddings in a Chroma vectorstore) and for the evaluation (a golden dataset). The evaluation golden dataset always consists of two files:

  • dataset.jsonl contains the inputs (questions) and reference module outputs (ground truths).
  • manifest.yaml defines the structure of the dataset for the evaluators.
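Files with the .jsonl extension, such as dataset.jsonl and results.jsonl, conventionally use the JSON Lines format: one JSON object per line. Below is a minimal stdlib reader for such files. `read_jsonl` is a hypothetical helper, and it deliberately assumes nothing about field names, which each example defines (and which manifest.yaml describes for the evaluators).

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterator


def read_jsonl(path: Path) -> Iterator[Dict[str, Any]]:
    """Yield one record per non-blank line of a JSON Lines file (hypothetical helper)."""
    with path.open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate empty lines, e.g. at the end of the file
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                # Report the offending line so a malformed file is easy to fix.
                raise ValueError(f"{path}:{line_no}: invalid JSON") from exc
            yield record
```

For example, `records = list(read_jsonl(Path("dataset.jsonl")))` loads the golden dataset as a list of dicts, one per question.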

To try out different metrics, tweak the metrics and tests defined in pipeline.py.

Contributors

pantonante, yisz
