
tse22

This repository contains a dataset of flaky tests associated with GitHub projects written in five different programming languages.

This dataset is the basis of the paper
    Test Flakiness Across Programming Languages,
which investigates the phenomenon of flakiness across programming languages.

This repository is organized as follows:
/
├── data/    <= where the spreadsheet with the data can be found
└── src/     <= where the script to extract the issues can be found

Script (under /src)

We used a Python script that queries the GitHub API to collect issues related to flakiness. The script keeps only issues that:

  1. Belong to projects written in C, Go, Python, Java, or JavaScript;
  2. Contain the keywords "Flaky" and "Test";
  3. Have the label "bug";
  4. Have the status "closed".
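The four filters above can be expressed as a single GitHub issue-search query. The sketch below assumes the script used the public `search/issues` endpoint with the standard search qualifiers `is:issue`, `label:`, `state:`, and `language:`; the names `build_query`, `search_url`, and `SEARCH_ENDPOINT` are hypothetical, and the original script may phrase the query differently.

```python
from urllib.parse import urlencode

# The five languages covered by the dataset.
LANGUAGES = ["C", "Go", "Python", "Java", "JavaScript"]

SEARCH_ENDPOINT = "https://api.github.com/search/issues"


def build_query(language: str) -> str:
    """Search query for closed, bug-labelled issues mentioning "flaky" and "test".

    GitHub search ANDs the free-text terms; the qualifiers restrict the result
    type (issues, not pull requests), the label, the state, and the repository
    language. (Hypothetical helper, not the original script.)
    """
    return f"flaky test is:issue label:bug state:closed language:{language}"


def search_url(language: str, page: int, per_page: int = 100) -> str:
    """Request URL for one page of results; GitHub caps per_page at 100."""
    params = {"q": build_query(language), "per_page": per_page, "page": page}
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"
```

For example, `[search_url(lang, 1) for lang in LANGUAGES]` yields the first-page URL for each language.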

The script caps the results at 300 issues per programming language, so its output is a spreadsheet (.csv) with 1500 issues, 300 for each programming language.
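The 300-issue cap maps naturally onto paging: the search endpoint returns at most 100 items per page, so three full pages are enough. The sketch below is an assumption about how this could be done, not the original script. `collect` keeps requesting pages until it has `cap` items or a page comes back empty, and `to_csv` writes one row per issue using hypothetical column names taken from fields of the GitHub search response (`repository_url`, `title`, `html_url`, `created_at`, `closed_at`, `comments`).

```python
import csv
import io
import json
import urllib.request
from typing import Callable, Iterable

CAP = 300  # maximum number of issues kept per language


def fetch_page(url: str) -> list[dict]:
    """Fetch one page of search results (unauthenticated, so heavily rate-limited)."""
    req = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["items"]


def collect(get_page: Callable[[int], list[dict]], cap: int = CAP) -> list[dict]:
    """Request pages 1, 2, ... until `cap` items are gathered or a page is empty."""
    issues: list[dict] = []
    page = 1
    while len(issues) < cap:
        items = get_page(page)
        if not items:
            break
        issues.extend(items)
        page += 1
    return issues[:cap]


def to_csv(rows: Iterable[tuple[str, dict]]) -> str:
    """Render (language, issue) pairs as CSV text; column names are hypothetical."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["language", "repository", "title", "url",
                     "created_at", "closed_at", "comments"])
    for language, issue in rows:
        # repository_url looks like https://api.github.com/repos/<owner>/<repo>
        repo = "/".join(issue["repository_url"].split("/")[-2:])
        writer.writerow([language, repo, issue["title"], issue["html_url"],
                         issue["created_at"], issue["closed_at"], issue["comments"]])
    return buf.getvalue()
```

`collect` takes the page fetcher as a parameter so the paging logic can be exercised without network access; in practice it would wrap `fetch_page`, e.g. `collect(lambda p: fetch_page(base_url + f"&page={p}"))`, where `base_url` is a hypothetical search URL without a page parameter.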


Spreadsheet (under /data)

A flaky test is associated with (1) a root cause that explains why the test is flaky and (2) a fix strategy that describes how developers addressed the flakiness. We aimed to classify 100 root causes and 100 fix strategies for each language. In total, we classified 591 root causes and 500 fix strategies.

The data is available in a spreadsheet organized into three tabs:
    (1) Data
    (2) Problem
    (3) Solution

The structure of each tab is described below.

Data

This tab contains the issue data, organized in the following columns:

  • ID: Unique identifier of the issue.
  • Repository / Project: Name of the GitHub repository the issue belongs to.
  • Language: Programming language of the project.
  • Status: Whether the issue describes a flaky test: true ("T"), false positive ("F"), or not determined ("ND").
  • Label: Label assigned to the issue.
  • Issue status: Status of the issue on GitHub.
  • Year: Year the issue was created.
  • URL Issue: Link to the issue.
  • M1: Number of days until the issue was closed.
  • M2: Number of comments until the issue was closed.
  • Domain: Application domain of the project.
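Assuming M1 is derived from the issue's creation and closing timestamps (the README does not say how it was computed) and that the Data tab is exported to CSV with the column headers listed above, M1 could be computed and summarized per language as sketched below. The function names are hypothetical; rows with Status "F" or "ND" are excluded because they are not confirmed flaky tests.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime
from statistics import median

# GitHub timestamps use this ISO 8601 form, e.g. "2020-01-01T10:00:00Z".
GITHUB_TS = "%Y-%m-%dT%H:%M:%SZ"


def days_to_close(created_at: str, closed_at: str) -> int:
    """M1-style metric: whole days elapsed between creation and closing."""
    opened = datetime.strptime(created_at, GITHUB_TS)
    closed = datetime.strptime(closed_at, GITHUB_TS)
    return (closed - opened).days  # partial days are truncated


def median_m1_by_language(csv_text: str) -> dict[str, float]:
    """Median M1 per language over confirmed flaky issues (Status == "T") only."""
    groups: dict[str, list[int]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["Status"] == "T":
            groups[row["Language"]].append(int(row["M1"]))
    return {lang: median(values) for lang, values in groups.items()}
```

For instance, an issue opened at `2020-01-01T10:00:00Z` and closed at `2020-01-03T09:00:00Z` gets an M1 of 1, since only 47 hours have elapsed.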

The classification columns of this tab are further divided into a Problem group and a Solution group, each containing:

  • Category: The category assigned to the issue (a root cause under Problem, a fix strategy under Solution).
  • Reviewers: The author who reviewed the issue.
  • Description: An excerpt from the issue that supports the chosen category.

Problem

This tab lists the root-cause categories reported in previous work, which the authors used as a reference when deciding how to categorize the problem behind each issue. Its columns are:

  • References: Previous work that reported the problem category.
  • Root Cause Category: Name used by previous authors for a given problem category.
  • Description: Description found in the works.
  • Support: Number of times this Root Cause Category occurred in the "Data" tab.
  • [n]: Number of times this Root Cause Category occurred in reference work [n].

Finally, row 17 contains the sum of each column.
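The Support column and its total are a frequency count over the root-cause categories assigned to issues in the Data tab. The following minimal sketch assumes those category labels have already been read into a list of strings. The function names are hypothetical, and the category names in the usage example are illustrative rather than taken from the spreadsheet.

```python
from collections import Counter


def support(categories: list[str]) -> Counter:
    """Count how often each root-cause category was assigned.

    Blank cells (issues without a root-cause classification) are skipped.
    """
    return Counter(c.strip() for c in categories if c.strip())


def support_total(counts: Counter) -> int:
    """Sum of the Support column, i.e. the value in the totals row."""
    return sum(counts.values())
```

With `counts = support(["Async Wait", "Concurrency", "Async Wait", ""])`, `counts["Async Wait"]` is 2 and `support_total(counts)` is 3.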

Solution

This tab lists the fix-strategy categories reported in previous work, which the authors used as a reference when deciding how to categorize the solution applied to each issue. Its columns are:

  • Support: Number of times the category occurred in the "Data" tab.
  • References: Previous work that reported this solution category.
  • Root Cause Category: Problem category associated with this solution category.
  • Fix Strategy Category: Solution category.
  • Reported Works: Previous works that reported the solution category.
  • Killer Description: Excerpts from the works that support the decision.
  • Example: Practical example for the solution category.

Row 27 of this tab contains the sum of the supports. Starting at row 29, the tab lists the solution categories that we did not find in our data.
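This layout puts categories with support first, then the totals row, then categories that never occur in the data. It can be reproduced by comparing the reference fix-strategy categories against the fix strategies assigned in the Data tab. The sketch below assumes both are available as lists of strings, and the names are hypothetical. Observed categories outside the reference list still count toward the total.

```python
from collections import Counter


def split_by_support(reference: list[str],
                     observed: list[str]) -> tuple[dict[str, int], list[str], int]:
    """Split reference categories by whether they occur in the data.

    Returns (found, missing, total): `found` maps each reference category with
    non-zero support to its count, `missing` lists reference categories with
    zero support (in reference order), and `total` is the sum of all supports.
    """
    counts = Counter(observed)  # Counter returns 0 for unseen keys
    found = {c: counts[c] for c in reference if counts[c] > 0}
    missing = [c for c in reference if counts[c] == 0]
    return found, missing, sum(counts.values())
```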


Research Questions

For the research questions (RQs), refer to the spreadsheet, which presents each RQ together with tables of its results.

