tse22 (bhpachulski)

This repository contains a dataset of flaky tests associated with GitHub projects written in five different programming languages.

This dataset is the basis of the paper "Test Flakiness Across Programming Languages", which investigates test flakiness across programming languages.

This repository is organized as follows:
/
├── data/ <= where the spreadsheet with the data can be found
└── src/ <= where the script to extract the issues can be found

Script (under /src)

We used a Python script that queries the GitHub API to collect issues related to flakiness. The script applies the following filters to issues:

From projects written in C, Go, Python, Java, or JavaScript;
Contain the keywords "Flaky" and "Test";
Have the label "bug";
Have the status "closed".
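The filters above could be expressed as a GitHub issue-search query. The following is a minimal sketch, not the original script: it assumes the filters map onto GitHub's search qualifiers `is:issue`, `is:closed`, `label:bug`, and `language:`, and the helper names `build_query` and `build_search_url` are hypothetical.

```python
from urllib.parse import urlencode

# Languages covered by the dataset.
LANGUAGES = ["C", "Go", "Python", "Java", "JavaScript"]


def build_query(language: str) -> str:
    """Build a GitHub issue-search query for one language (hypothetical helper).

    Assumption: "Flaky"/"Test" become plain keywords (search is case-insensitive)
    and the remaining filters become search qualifiers.
    """
    return f"flaky test is:issue is:closed label:bug language:{language}"


def build_search_url(language: str, page: int = 1, per_page: int = 100) -> str:
    """Return a URL for the REST endpoint GET /search/issues (100 is the API's page-size maximum)."""
    params = {"q": build_query(language), "per_page": per_page, "page": page}
    return "https://api.github.com/search/issues?" + urlencode(params)


if __name__ == "__main__":
    for lang in LANGUAGES:
        print(build_search_url(lang))
```

Keeping the query builder separate from any HTTP code makes it easy to inspect the exact query sent for each language.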

The script caps results at 300 issues per programming language. The output is a spreadsheet (.csv) with 1500 issues, 300 for each programming language.
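The cap of 300 issues per language and the combined CSV output can be sketched as follows. This is an illustration under assumptions, not the authors' code: `fetch_page` is a hypothetical callable returning one page of issues (for example, one page of search results), and the CSV columns `language`, `repository`, and `url` are chosen for the example.

```python
import csv
import io
from typing import Callable, Dict, List

Issue = Dict[str, str]


def collect_capped(fetch_page: Callable[[int], List[Issue]], cap: int = 300) -> List[Issue]:
    """Request pages 1, 2, ... until `cap` issues are collected or a page comes back empty."""
    issues: List[Issue] = []
    page = 1
    while len(issues) < cap:
        batch = fetch_page(page)
        if not batch:  # no further results for this language
            break
        issues.extend(batch)
        page += 1
    return issues[:cap]  # trim any overshoot from the last page


def to_csv_text(issues_by_language: Dict[str, List[Issue]]) -> str:
    """Render all issues as CSV text, one row per issue tagged with its language."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["language", "repository", "url"])
    writer.writeheader()
    for language, issues in issues_by_language.items():
        for issue in issues:
            writer.writerow({"language": language,
                             "repository": issue.get("repository", ""),
                             "url": issue.get("url", "")})
    return buf.getvalue()
```

With five languages and 300 issues each, the resulting text has 1500 data rows plus a header; it can be saved with `open("issues.csv", "w", newline="", encoding="utf-8")`.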

Spreadsheet (under /data)

A flaky test is associated with (1) a root cause that explains the reason for flakiness and (2) a fix strategy that explains how developers addressed flakiness. We aimed to classify 100 root causes and 100 fix strategies for each language. In total, we classified 591 root causes and 500 fix strategies.

The data is available in a spreadsheet organized into three tabs:
    (1) Data
    (2) Problem
    (3) Solution

The structure of each tab is described below.

Data

This tab contains the issue data, organized in the following columns:

ID: Unique identifier for each issue.
Repository / Project: Name of the GitHub repository where the issue was reported.
Language: Programming language of the project associated with the issue.
Status: Whether the issue reports a genuinely flaky test ("T"), is a false positive ("F"), or could not be determined ("ND").
Label: Label assigned to the issue.
Issue status: Status of the issue on GitHub.
Year: Year the issue was created.
URL Issue: Link to the issue.
M1: Number of days until the issue was closed.
M2: Number of comments posted until the issue was closed.
Domain: Application domain of the project.
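Year, M1, and M2 can be derived from fields that the GitHub REST API returns for an issue (`created_at`, `closed_at`) and for its comments (`created_at`). The sketch below is an assumption about how such values could be computed, not necessarily how the dataset computed them: it truncates partial days for M1 and, for M2, counts only comments posted no later than the closing time. The function names are hypothetical.

```python
from datetime import datetime
from typing import Dict, List

GITHUB_TS = "%Y-%m-%dT%H:%M:%SZ"  # timestamp format used by the GitHub REST API


def _parse(ts: str) -> datetime:
    return datetime.strptime(ts, GITHUB_TS)


def year_created(issue: Dict[str, str]) -> int:
    """Year: calendar year of `created_at`."""
    return _parse(issue["created_at"]).year


def m1_days_to_close(issue: Dict[str, str]) -> int:
    """M1: whole days between `created_at` and `closed_at` (partial days truncated)."""
    return (_parse(issue["closed_at"]) - _parse(issue["created_at"])).days


def m2_comments_to_close(issue: Dict[str, str], comments: List[Dict[str, str]]) -> int:
    """M2: number of comments created at or before the moment the issue was closed."""
    closed = _parse(issue["closed_at"])
    return sum(1 for c in comments if _parse(c["created_at"]) <= closed)
```

Counting comments up to the closing time, rather than using the issue's total `comments` field, excludes discussion that happens after an issue is closed.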

The spreadsheet is also divided into Problem and Solution sections, each with the following columns:

Category: The assigned category (a root cause under Problem, a fix strategy under Solution).
Reviewers: Which of the authors reviewed the issue.
Description: An excerpt from the issue that supports the choice of category.

Problem

This tab is the reference the authors used to decide which problem (root cause) category each issue belongs to. It has the following columns:

References: Previous work that reported the problem category.
Root Cause Category: Name used by previous authors for a given problem category.
Description: Description found in the works.
Support: Number of times this Root Cause Category occurs in the "Data" tab.
[n]: Number of times this Root Cause Category occurs in reference work [n].

Finally, row 17 contains the sum of each column.
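The Support column and its sum row can be reproduced from the category labels in the "Data" tab. This is a minimal sketch that assumes the problem categories have been exported as a list of strings and that blank cells correspond to unclassified issues; the function names and the example labels in the usage note are hypothetical. It only covers the Support column, not the [n] columns.

```python
from collections import Counter
from typing import Dict, Iterable


def support_counts(categories: Iterable[str]) -> Counter:
    """Support: how often each category label occurs, skipping blank cells."""
    return Counter(c.strip() for c in categories if c and c.strip())


def with_total(support: Counter) -> Dict[str, int]:
    """Order categories by support and append a 'Total' entry, like the sum row."""
    table = dict(support.most_common())
    table["Total"] = sum(support.values())
    return table
```

For example, `with_total(support_counts(["Async Wait", "Concurrency", "", "Async Wait"]))` gives `{"Async Wait": 2, "Concurrency": 1, "Total": 3}`.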

Solution

This tab is the reference the authors used to decide which solution (fix strategy) category each issue belongs to. It has the following columns:

Support: Number of times the category occurs in the "Data" tab.
References: What previous work has reported this category of solution.
Root Cause Category: Problem category belonging to this solution category.
Fix Strategy Category: Solution category.
Reported Works: Previous works that reported the solution category.
Killer Description: Excerpts from the works that support the decision.
Example: Practical example for the solution category.

Row 27 of this tab contains the sum of the Support column. From row 29 onward, the tab lists solution categories that we did not find in our data.
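Separating the solution categories found in the data from those that were not found amounts to checking each category's support. A minimal sketch, assuming the full list of solution categories and a mapping from category to support are already available; `split_by_support` and the example category names are hypothetical.

```python
from typing import Dict, List, Tuple


def split_by_support(catalogue: List[str],
                     support: Dict[str, int]) -> Tuple[List[str], List[str], int]:
    """Return (categories found in the data, categories not found, sum of supports)."""
    found = [c for c in catalogue if support.get(c, 0) > 0]
    missing = [c for c in catalogue if support.get(c, 0) == 0]
    total = sum(support.get(c, 0) for c in found)
    return found, missing, total
```

Preserving the catalogue order in both lists keeps the output aligned with the order of rows in the tab.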

Research Questions

For the answers to the research questions (RQs), refer to the spreadsheet, which lists each RQ together with tables of the corresponding results.
