chateval's Projects
A platform for the warehousing and evaluation of neural open-domain chatbot models.
A public evaluation tool for non-task-driven neural open-domain chatbots.
Label, clean and enrich text datasets with LLMs.
BARTScore: Evaluating Generated Text as Text Generation
Code to publish HITs on Mechanical Turk to collect human baselines
A benchmark dataset for evaluating dialog system and natural language generation metrics.
BotSIM: a data-efficient, end-to-end Bot SIMulation toolkit for the evaluation, diagnosis, and improvement of commercial chatbots.
Scripts for ChatEval and Dialog Annotation
Chatbot comparison webapp built using React.
ConTurE is a human-chatbot dataset that contains turn-level annotations for assessing the quality of chatbot responses.
D-score: a framework for open-domain automatic dialogue evaluation.
Code for ACL 2021 main conference paper "Conversations Are Not Flat: Modeling the Dynamic Information Flow across Dialogue Utterances".
Evaluate your dialog model with 17 metrics! (see paper)
NLG and NLU for dialogue processing
Efficient Annotation of Scalar Labels
A microservice for the automatic evaluation of neural chatbot models, supporting multiple automated evaluation methods (including embedding-based metrics).
Source Code of Paper "GPTScore: Evaluate as You Desire"
kani (カニ) is a highly hackable microframework for chat-based language models with tool usage/function calling.
A suite of tools for managing crowdsourcing tasks from the inception through to data packaging for research use
Code and data for the paper "Investigating Evaluation of Open-Domain Dialogue Systems With Human Generated Multiple References", SIGdial 2019.
Code for the paper "Learning an Unreferenced Metric for Online Dialogue Evaluation", ACL 2020
The dataset and code released with the NAACL 2018 paper "RankME: Reliable Human Ratings for Natural Language Generation".
All experiment and evaluation code for the decoding diversity project.