
FeedbackImpactOnDialogueEval

Welcome to the GitHub repository for the paper "Rethinking the Evaluation of Dialogue Systems: Effects of User Feedback on Crowdworkers and LLMs" (SIGIR 2024).

This repository contains the data and analysis tools used in our study of how user feedback affects the evaluation processes of crowdworkers and Large Language Models (LLMs) in the context of dialogue systems. The project aims to provide insight into how user feedback influences the assessment accuracy of both crowdworkers and LLMs.

Additionally, we conduct further analysis to understand which metrics benefit from the inclusion of the follow-up utterance and on which metrics human assessors perform better than LLM assessors.

Overview

Our study investigates two primary methodologies for assessing Task-Oriented Dialogue Systems (TDSs):

  1. Evaluations that incorporate the user's follow-up utterance (see the sketch after this list).
  2. Evaluations conducted without the user's follow-up utterance.
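
A minimal sketch of the two conditions, assuming an illustrative build_prompt helper; the prompt wording and dialogue text here are placeholders, not the study's actual prompts (those live in Prompts/):

    # Minimal sketch of the two evaluation conditions. The prompt wording and
    # the build_prompt helper are illustrative assumptions; the study's actual
    # prompts are in Prompts/.
    def build_prompt(user_query: str, system_response: str,
                     follow_up: str = "") -> str:
        """Assemble an assessment prompt, optionally appending the follow-up."""
        prompt = (
            "Rate the relevance of the system response to the user query.\n"
            f"User: {user_query}\n"
            f"System: {system_response}\n"
        )
        if follow_up:  # condition 1: follow-up utterance included
            prompt += f"User (follow-up): {follow_up}\n"
        return prompt + "Score (1-5):"

    # Condition 1: evaluation with the user's follow-up utterance.
    with_followup = build_prompt("Find a vegan restaurant nearby.",
                                 "Green Table is a vegan bistro 0.3 miles away.",
                                 follow_up="Great, that's exactly what I wanted!")
    # Condition 2: the same dialogue without it.
    without_followup = build_prompt("Find a vegan restaurant nearby.",
                                    "Green Table is a vegan bistro 0.3 miles away.")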

Repository Structure

  • Data/
    • CrowdworkerAnnotations/: Contains dialogue data annotated by human crowdworkers.
    • LLMAnnotations/: Contains dialogue data annotated by LLMs.
  • Prompts/: Prompts used for dialogue aspect assessment by gpt-3.5-turbo.
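
As a hedged illustration, the sketch below shows how a template from Prompts/ might be sent to gpt-3.5-turbo with the official OpenAI Python client; the file name relevance.txt and the {dialogue} placeholder are assumptions, so adapt them to the actual files in Prompts/:

    # Sketch of querying gpt-3.5-turbo with a prompt template from Prompts/.
    # The file name "relevance.txt" and the {dialogue} placeholder are
    # hypothetical; check Prompts/ for the real templates.
    from pathlib import Path
    from openai import OpenAI  # pip install openai

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    template = Path("Prompts/relevance.txt").read_text()  # hypothetical file name
    prompt = template.format(dialogue="User: ...\nSystem: ...")

    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic scoring
    )
    print(response.choices[0].message.content)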

Data Description

The datasets in this repository are divided into two main parts:

  • Crowdworkers: Data capturing how crowdworkers' evaluations are influenced by direct user feedback.
  • LLMs: Data reflecting how Large Language Model evaluations are influenced by direct user feedback.

Each dataset includes:

  • user_id: Identifier for the user session.
  • dialogue_id: Identifier for each dialogue instance.
  • feedback: Actual feedback provided by users.
  • evaluations: Responses and evaluation scores from crowdworkers and LLMs.
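
A minimal loading sketch is shown below; the file names and CSV format are assumptions, so check the Data/ subdirectories for the actual layout:

    # Sketch of loading one file from each annotation set. File names and the
    # CSV format are assumptions; inspect Data/ for the actual layout.
    import pandas as pd

    crowd = pd.read_csv("Data/CrowdworkerAnnotations/annotations.csv")  # hypothetical
    llm = pd.read_csv("Data/LLMAnnotations/annotations.csv")            # hypothetical

    # Each record should expose the fields listed above.
    print(crowd[["user_id", "dialogue_id", "feedback", "evaluations"]].head())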

Evaluation Aspects of Dialogue Systems

This section outlines the criteria used to evaluate dialogue responses in our study. Each aspect plays a significant role in the comprehensive assessment of Task-Oriented Dialogue Systems (TDSs).

  • Relevance: How relevant the responses are to the initial query.
  • Usefulness: How useful the responses are in the dialogue context.
  • Interestingness: How engaging the responses are.
  • Explanation Quality: How clear and helpful the system's explanations are.
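
To illustrate the kind of per-aspect comparison this enables, the sketch below computes crowdworker-LLM agreement with Cohen's kappa; the file paths and column names are assumptions about the data layout, not the study's exact analysis code:

    # Sketch of per-aspect agreement between crowdworker and LLM scores using
    # Cohen's kappa. File paths and column names are assumptions.
    import pandas as pd
    from sklearn.metrics import cohen_kappa_score

    crowd = pd.read_csv("Data/CrowdworkerAnnotations/annotations.csv")  # hypothetical
    llm = pd.read_csv("Data/LLMAnnotations/annotations.csv")            # hypothetical

    aspects = ["relevance", "usefulness", "interestingness", "explanation_quality"]
    merged = crowd.merge(llm, on="dialogue_id", suffixes=("_crowd", "_llm"))
    for aspect in aspects:
        kappa = cohen_kappa_score(merged[f"{aspect}_crowd"], merged[f"{aspect}_llm"])
        print(f"{aspect}: Cohen's kappa = {kappa:.2f}")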
