Crowd-Kit: Computational Quality Control for Crowdsourcing

Crowd-Kit is a powerful Python library that implements commonly-used aggregation methods for crowdsourced annotation and offers the relevant metrics and datasets. We strive to implement functionality that simplifies working with crowdsourced data.

Currently, Crowd-Kit contains:

  • implementations of commonly-used aggregation methods for categorical, pairwise, textual, and segmentation responses;
  • metrics of uncertainty, consistency, and agreement with aggregate;
  • loaders for popular crowdsourced datasets.

Also, the learning subpackage contains PyTorch implementations of deep learning from crowds methods and advanced aggregation algorithms.

Installing

To install Crowd-Kit, run the following command: pip install crowd-kit. If you also want to use the learning subpackage, type pip install crowd-kit[learning].

If you are interested in contributing to Crowd-Kit, use Pipenv to install the library with its dependencies: pipenv install --dev. We use pytest for testing.

Getting Started

This example shows how to use Crowd-Kit for categorical aggregation using the classical Dawid-Skene algorithm.

First, let us do all the necessary imports.

from crowdkit.aggregation import DawidSkene
from crowdkit.datasets import load_dataset

import pandas as pd

Then, read your annotations into a pandas DataFrame with the columns task, worker, and label. Alternatively, you can download an example dataset:

df = pd.read_csv('results.csv')  # should contain columns: task, worker, label
# df, ground_truth = load_dataset('relevance-2')  # or download an example dataset

Then, you can aggregate the workers' responses using the fit_predict method, which follows the scikit-learn estimator convention:

aggregated_labels = DawidSkene(n_iter=100).fit_predict(df)
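
To make concrete what aggregation does with the task, worker, and label columns, here is a minimal sketch of plain majority vote using only the Python standard library. It is not Crowd-Kit's implementation. The toy annotations and the majority_vote helper are hypothetical names introduced for illustration, and ties are simply resolved in favor of the label seen first.

```python
from collections import Counter, defaultdict

# Hypothetical toy annotations as (task, worker, label) triples,
# mirroring the three columns expected in the DataFrame above.
annotations = [
    ("t1", "w1", "cat"),
    ("t1", "w2", "cat"),
    ("t1", "w3", "dog"),
    ("t2", "w1", "dog"),
    ("t2", "w2", "dog"),
    ("t2", "w3", "cat"),
]


def majority_vote(rows):
    """Return the most frequent label per task (illustrative helper)."""
    votes = defaultdict(Counter)
    for task, _worker, label in rows:
        votes[task][label] += 1
    # most_common(1) yields the top label; equal counts keep first-seen order.
    return {task: counts.most_common(1)[0][0] for task, counts in votes.items()}


aggregated = majority_vote(annotations)
print(aggregated)
```

Majority vote treats every worker as equally reliable. Dawid-Skene differs in that it also estimates how reliable each worker is and weights their responses accordingly.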

More usage examples

Implemented Aggregation Methods

Below is the list of currently implemented methods, including the already available (✅) and in progress (🟡).

Categorical Responses

Method Status
Majority Vote ✅
One-coin Dawid-Skene ✅
Dawid-Skene ✅
Gold Majority Vote ✅
M-MSR ✅
Wawa ✅
Zero-Based Skill ✅
GLAD ✅
KOS ✅
MACE ✅
BCC 🟡

Multi-Label Responses

Method Status
Binary Relevance ✅

Textual Responses

Method Status
RASA ✅
HRRASA ✅
ROVER ✅

Image Segmentation

Method Status
Segmentation MV ✅
Segmentation RASA ✅
Segmentation EM ✅

Pairwise Comparisons

Method Status
Bradley-Terry ✅
Noisy Bradley-Terry ✅

Learning from Crowds

Method Status
CrowdLayer ✅
CoNAL ✅

Citation

@misc{CrowdKit,
  author    = {Ustalov, Dmitry and Pavlichenko, Nikita and Tseitlin, Boris},
  title     = {{Learning from Crowds with Crowd-Kit}},
  year      = {2023},
  publisher = {arXiv},
  eprint    = {2109.08584},
  eprinttype = {arxiv},
  eprintclass = {cs.HC},
  url       = {https://arxiv.org/abs/2109.08584},
  language  = {english},
}

Questions and Bug Reports

  • To report a bug, post an issue on the Toloka/bugreport page.
  • To find answers to common questions or start a new discussion, join our English-speaking Slack community.

License

© Crowd-Kit team authors, 2020–2023. Licensed under the Apache License, Version 2.0. See LICENSE file for more details.
