Crowd-Kit is a Python library that implements commonly used aggregation methods for crowdsourced annotation and provides the relevant metrics and datasets. We strive to implement functionality that simplifies working with crowdsourced data.
Currently, Crowd-Kit contains:
- implementations of commonly used aggregation methods for categorical, pairwise, textual, and segmentation responses;
- metrics of uncertainty, consistency, and agreement with the aggregate;
- loaders for popular crowdsourced datasets.
Also, the `learning` subpackage contains PyTorch implementations of deep learning from crowds methods and advanced aggregation algorithms.
To install Crowd-Kit, run the following command: `pip install crowd-kit`. If you also want to use the `learning` subpackage, type `pip install crowd-kit[learning]`.
If you are interested in contributing to Crowd-Kit, use Pipenv to install the library with its dependencies: `pipenv install --dev`. We use pytest for testing.
This example shows how to use Crowd-Kit for categorical aggregation using the classical Dawid-Skene algorithm.
First, let us do all the necessary imports.
from crowdkit.aggregation import DawidSkene
from crowdkit.datasets import load_dataset
import pandas as pd
Then, read your annotations into a pandas DataFrame with the columns `task`, `worker`, and `label`. Alternatively, you can download an example dataset:
df = pd.read_csv('results.csv') # should contain columns: task, worker, label
# df, ground_truth = load_dataset('relevance-2') # or download an example dataset
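
To make the expected input layout concrete, here is a minimal sketch that uses only the Python standard library. The CSV contents, the task and worker identifiers, and the helper `read_annotations` are hypothetical and are not part of Crowd-Kit; the sketch only illustrates the one-row-per-response layout with the `task`, `worker`, and `label` columns described above.

```python
import csv
import io

# Hypothetical contents of results.csv: one row per (task, worker, label) response.
RESULTS_CSV = """task,worker,label
t1,w1,relevant
t1,w2,relevant
t1,w3,irrelevant
t2,w1,irrelevant
t2,w3,irrelevant
"""

REQUIRED_COLUMNS = {"task", "worker", "label"}


def read_annotations(text):
    """Parse CSV text into a list of dicts and check the required columns."""
    reader = csv.DictReader(io.StringIO(text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return list(reader)


rows = read_annotations(RESULTS_CSV)
n_tasks = len({row["task"] for row in rows})
print(f"{len(rows)} responses for {n_tasks} tasks")
```

A file with this header and layout should load with the `pd.read_csv` call shown above.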
Then, you can aggregate the workers' responses using the `fit_predict` method, which follows the scikit-learn naming convention:
aggregated_labels = DawidSkene(n_iter=100).fit_predict(df)
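
For readers who want to see what this kind of aggregation does, below is a simplified pure-Python sketch of the Dawid-Skene expectation-maximization idea. It is not Crowd-Kit's implementation: the function `dawid_skene`, its `smoothing` parameter, and the toy responses are assumptions made for illustration, and a production implementation would typically work in log space to avoid numerical underflow.

```python
def dawid_skene(responses, n_iter=100, smoothing=1e-6):
    """Illustrative Dawid-Skene EM. responses: iterable of (task, worker, label).

    Returns a dict mapping each task to its most probable label.
    """
    responses = list(responses)
    tasks = sorted({t for t, _, _ in responses})
    workers = sorted({w for _, w, _ in responses})
    labels = sorted({l for _, _, l in responses})

    # Initialise task posteriors with normalised vote counts (soft majority vote).
    probas = {t: dict.fromkeys(labels, 0.0) for t in tasks}
    for t, _, label in responses:
        probas[t][label] += 1.0
    for t in tasks:
        total = sum(probas[t].values())
        probas[t] = {l: v / total for l, v in probas[t].items()}

    for _ in range(n_iter):
        # M-step: estimate label priors and one confusion matrix per worker.
        priors = {l: sum(probas[t][l] for t in tasks) / len(tasks) for l in labels}
        confusion = {
            w: {true: dict.fromkeys(labels, smoothing) for true in labels}
            for w in workers
        }
        for t, w, observed in responses:
            for true in labels:
                confusion[w][true][observed] += probas[t][true]
        for w in workers:
            for true in labels:
                row_total = sum(confusion[w][true].values())
                for observed in labels:
                    confusion[w][true][observed] /= row_total

        # E-step: recompute task posteriors from priors and confusion matrices.
        scores = {t: dict(priors) for t in tasks}
        for t, w, observed in responses:
            for true in labels:
                scores[t][true] *= confusion[w][true][observed]
        for t in tasks:
            total = sum(scores[t].values())
            probas[t] = {l: s / total for l, s in scores[t].items()}

    return {t: max(labels, key=lambda l: probas[t][l]) for t in tasks}


# Toy data: workers w1-w3 always agree, w4 disagrees on tasks t1 and t4.
toy_responses = [
    ("t1", "w1", "cat"), ("t1", "w2", "cat"), ("t1", "w3", "cat"), ("t1", "w4", "dog"),
    ("t2", "w1", "dog"), ("t2", "w2", "dog"), ("t2", "w3", "dog"), ("t2", "w4", "dog"),
    ("t3", "w1", "cat"), ("t3", "w2", "cat"), ("t3", "w3", "cat"), ("t3", "w4", "cat"),
    ("t4", "w1", "dog"), ("t4", "w2", "dog"), ("t4", "w3", "dog"), ("t4", "w4", "cat"),
]
toy_result = dawid_skene(toy_responses)
print(toy_result)
```

Unlike a plain majority vote, the per-worker confusion matrices let the model down-weight workers whose answers are frequently inconsistent with the inferred labels.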
Below is the list of currently implemented methods, including the already available (✅) and in progress (🟡).

Categorical responses:

Method | Status |
---|---|
Majority Vote | ✅ |
One-coin Dawid-Skene | ✅ |
Dawid-Skene | ✅ |
Gold Majority Vote | ✅ |
M-MSR | ✅ |
Wawa | ✅ |
Zero-Based Skill | ✅ |
GLAD | ✅ |
KOS | ✅ |
MACE | ✅ |
BCC | 🟡 |

Multi-label responses:

Method | Status |
---|---|
Binary Relevance | ✅ |

Textual responses:

Method | Status |
---|---|
RASA | ✅ |
HRRASA | ✅ |
ROVER | ✅ |

Image segmentation:

Method | Status |
---|---|
Segmentation MV | ✅ |
Segmentation RASA | ✅ |
Segmentation EM | ✅ |

Pairwise comparisons:

Method | Status |
---|---|
Bradley-Terry | ✅ |
Noisy Bradley-Terry | ✅ |
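
As an illustration of what pairwise aggregation computes, the following sketch fits Bradley-Terry scores with the classical minorization-maximization update in pure Python. It is not Crowd-Kit's implementation or API: the function `bradley_terry`, the `(winner, loser)` input format, and the toy comparisons are assumptions chosen for brevity.

```python
from collections import defaultdict


def bradley_terry(comparisons, n_iter=100):
    """Illustrative Bradley-Terry fit. comparisons: iterable of (winner, loser).

    Returns a dict mapping each item to a score; the scores sum to 1.
    """
    comparisons = list(comparisons)
    items = sorted({item for pair in comparisons for item in pair})

    wins = defaultdict(int)     # number of wins per item
    matches = defaultdict(int)  # number of comparisons per unordered pair
    for winner, loser in comparisons:
        wins[winner] += 1
        matches[frozenset((winner, loser))] += 1

    scores = {item: 1.0 / len(items) for item in items}
    for _ in range(n_iter):
        updated = {}
        for i in items:
            # Sum over opponents j of n_ij / (p_i + p_j).
            denom = sum(
                matches[frozenset((i, j))] / (scores[i] + scores[j])
                for j in items
                if j != i and frozenset((i, j)) in matches
            )
            updated[i] = wins[i] / denom if denom > 0 else scores[i]
        total = sum(updated.values())
        scores = {item: value / total for item, value in updated.items()}
    return scores


# Toy data: "a" usually beats "b", "b" usually beats "c", and "a" beats "c".
toy_comparisons = (
    [("a", "b")] * 3 + [("b", "a")]
    + [("b", "c")] * 3 + [("c", "b")]
    + [("a", "c")] * 2
)
bt_scores = bradley_terry(toy_comparisons)
print(sorted(bt_scores, key=bt_scores.get, reverse=True))
```

Sorting the items by score yields a ranking that aggregates all pairwise judgments.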

Learning from crowds (the `learning` subpackage):

Method | Status |
---|---|
CrowdLayer | ✅ |
CoNAL | ✅ |
- Ustalov D., Pavlichenko N., Tseitlin B. Learning from Crowds with Crowd-Kit. 2023. arXiv: 2109.08584 [cs.HC].
@misc{CrowdKit,
author = {Ustalov, Dmitry and Pavlichenko, Nikita and Tseitlin, Boris},
title = {{Learning from Crowds with Crowd-Kit}},
year = {2023},
publisher = {arXiv},
eprint = {2109.08584},
eprinttype = {arxiv},
eprintclass = {cs.HC},
url = {https://arxiv.org/abs/2109.08584},
language = {english},
}
- To report a bug, post an issue on the Toloka/bugreport page.
- To find answers to common questions or start a new discussion, join our English-speaking Slack community.
© Crowd-Kit team authors, 2020–2023. Licensed under the Apache License, Version 2.0. See LICENSE file for more details.