
DPMiner : Mining Repository Tool

DPMiner is an integrated framework that can collect various types of data required for defect prediction through a single program.


What is DPMiner

1. Repository list

DPMiner extracts a list of repository URLs that match user-specified conditions from GitHub, the open-source repository hosting service.

To extract the URL list, DPMiner uses the Search API, one of the GitHub REST APIs. The Search API takes the conditions encoded as a query and returns up to 100 repository URLs per page. A single search returns at most 1,000 results, so DPMiner collects all repository URLs that match the conditions by issuing several queries and merging their results.

Possible conditions

  • commit count base
  • creation date
  • recent date
  • number of forks
  • language

An authentication token (auth token) is also required.
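The query-splitting approach above can be sketched in Python. This is not DPMiner's code: the mapping from DPMiner options to GitHub search qualifiers (`language:`, `forks:`, `created:`, and `pushed:` for the recent date) is an assumption, and `build_query`, `split_dates`, and `page_url` are hypothetical helpers. Splitting a date range into smaller windows keeps each query under the Search API's 1,000-result cap; the commit-count condition has no search qualifier and is left out.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

SEARCH_URL = "https://api.github.com/search/repositories"


def build_query(language=None, forks=None, created=None, pushed=None):
    """Join GitHub search qualifiers into one query (assumed option mapping)."""
    qualifiers = {"language": language, "forks": forks,
                  "created": created, "pushed": pushed}
    return " ".join(f"{name}:{value}" for name, value in qualifiers.items() if value)


def split_dates(start, end, step_days):
    """Split [start, end] into consecutive 'YYYY-MM-DD..YYYY-MM-DD' windows."""
    windows = []
    current = start
    while current <= end:
        window_end = min(current + timedelta(days=step_days - 1), end)
        windows.append(f"{current.isoformat()}..{window_end.isoformat()}")
        current = window_end + timedelta(days=1)
    return windows


def page_url(query, page, per_page=100):
    """URL of one result page; 100 is the largest page size the API allows."""
    return SEARCH_URL + "?" + urlencode({"q": query, "per_page": per_page, "page": page})


if __name__ == "__main__":
    # One query per ~quarter of creation dates; each query would then be
    # paged until it runs out of results.
    for window in split_dates(date(2019, 1, 1), date(2020, 1, 15), 90):
        print(page_url(build_query(language="java", forks="10..200", created=window), 1))
```

Each request would also carry the `-auth` token in an `Authorization` header, and a query's pages would be fetched until a page returns fewer than 100 items.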

2. Patch

The patch function collects bug-fixing commits (BFCs). There are three ways to identify BFCs:

  • Jira
    Jira is an issue tracker. Each issue carries a label that indicates its type and a status that indicates its progress. DPMiner collects the hashes of commits associated with issues whose label is bug and whose status is Closed. The Jira key is the project key that prefixes issue IDs, for example JUDDI in the Jira example below.

  • GitHub Issue
    GitHub provides issues for project management: they help track version upgrades, defect reports, and feature enhancements, and each issue is marked as open or closed. DPMiner treats commits associated with closed issues labeled as bugs as BFCs.

  • Commit message
    Developers write commit messages that describe each commit so that they can maintain the project and collaborate efficiently. DPMiner treats a commit whose message contains the keywords "bug" and "fix" as a BFC and collects its hash.
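The commit-message rule and one common way of linking Jira issues to commits can be sketched as follows. Assumptions, not taken from DPMiner: commits arrive as (hash, message) pairs, keywords match as case-insensitive substrings, and a commit belongs to a Jira issue when its message mentions the issue ID, such as `JUDDI-12`. The function names are hypothetical.

```python
import re


def keyword_bfcs(commits, keywords=("bug", "fix"), require_all=False):
    """Hashes of commits whose message contains the keywords.

    Matching is a case-insensitive substring test. require_all selects
    'every keyword' versus 'any keyword'; the README does not say which.
    """
    hits = []
    for commit_hash, message in commits:
        text = message.lower()
        found = [k.lower() in text for k in keywords]
        if all(found) if require_all else any(found):
            hits.append(commit_hash)
    return hits


def jira_bfcs(commits, project_key, closed_bug_issues):
    """Hashes of commits whose message mentions a closed bug issue such as 'JUDDI-12'."""
    key_pattern = re.compile(rf"\b{re.escape(project_key)}-\d+\b")
    return [commit_hash for commit_hash, message in commits
            if any(key in closed_bug_issues for key in key_pattern.findall(message))]


if __name__ == "__main__":
    log = [("a1", "Fix bug in parser"), ("b2", "Add feature"),
           ("c3", "fix typo"), ("d4", "JUDDI-12 null check")]
    print(keyword_bfcs(log))                    # ['a1', 'c3']  (any keyword)
    print(keyword_bfcs(log, require_all=True))  # ['a1']        (both keywords)
    print(jira_bfcs(log, "JUDDI", {"JUDDI-12"}))  # ['d4']
```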

3. BIC

After collecting bug-fixing commits (BFCs) with one of the methods described in Patch, DPMiner identifies bug-introducing commits (BICs) with the SZZ algorithm. The framework provides two SZZ variants:

  • B-SZZ
    The basic SZZ algorithm. It runs git blame on the lines that the bug-fixing commit modified, in the revision just before the fix, and reports the commits that last changed those lines as bug-introducing commits.

  • AG-SZZ
    AG-SZZ uses an annotation graph, ignores blank lines, formatting changes, and comments, and discards outlier BFCs that modify too many files at once. The annotation graph is built from the first commit up to the bug-fixing commit; a depth-first search (DFS) is then run from each fixed line to find the line that introduced the defect.
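The first step of B-SZZ, choosing which lines to blame, can be sketched by parsing a bug-fixing commit's diff. This is an illustration under assumptions, not DPMiner's implementation: the input is the output of `git show --unified=0 <bfc> -- <path>` for one file, the optional `skip_cosmetic` filter is only a crude stand-in for AG-SZZ's handling of blank lines and comments (no annotation graph is built), and the helper names are hypothetical.

```python
import re

# Unified-diff hunk header, e.g. "@@ -12,3 +12,4 @@"; group 1 = pre-fix start line.
HUNK = re.compile(r"^@@ -(\d+)(?:,\d+)? \+\d+(?:,\d+)? @@")


def lines_to_blame(diff_text, skip_cosmetic=False):
    """Pre-fix line numbers that a BFC deleted or modified in one file.

    diff_text must be a single-file diff generated with --unified=0, so the
    only lines inside hunks are '-' (old) and '+' (new) lines.
    skip_cosmetic drops blank and comment-only lines.
    """
    result, old_line, in_hunk = [], 0, False
    for line in diff_text.splitlines():
        header = HUNK.match(line)
        if header:
            in_hunk, old_line = True, int(header.group(1))
        elif in_hunk and line.startswith("-"):
            content = line[1:].strip()
            cosmetic = not content or content.startswith(("//", "/*", "*"))
            if not (skip_cosmetic and cosmetic):
                result.append(old_line)
            old_line += 1  # only '-' lines advance the pre-fix line counter
    return result


def blame_command(bfc_hash, path, line_no):
    """git blame of one line in the parent of the BFC (the pre-fix version)."""
    return ["git", "blame", "--porcelain", "-L", f"{line_no},{line_no}",
            f"{bfc_hash}^", "--", path]
```

Running each `blame_command(...)` through `subprocess.run` and reading the 40-character hash at the start of the porcelain output gives a BIC candidate for that line. AG-SZZ's outlier rule would additionally skip BFCs whose changed-file count exceeds some threshold; the README does not state DPMiner's value.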

4. Metric

Metrics describe the source code and its changes and are used as features for defect prediction. DPMiner collects three kinds:

  • Characteristic Vector
    A characteristic vector represents the structural change of the source code.

  • Bag of Words
    Bag of Words splits the source code and commit messages into words and counts how often each word occurs.

  • Metadata
    Metadata consists of 25 types of data, such as the number of modified lines and added lines.
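A rough illustration of the Bag of Words metric and two of the metadata counts (added and deleted lines). The tokenization rule, identifier-like words lowercased with numbers and punctuation dropped, is an assumption; DPMiner's own tokenizer and its full set of 25 metadata features may differ.

```python
import re
from collections import Counter

# Identifier-like tokens; numbers and punctuation are dropped (an assumption).
WORD = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def bag_of_words(*texts):
    """Lowercase word counts over any mix of source code and commit messages."""
    counts = Counter()
    for text in texts:
        counts.update(token.lower() for token in WORD.findall(text))
    return counts


def added_deleted(diff_text):
    """(added, deleted) line counts in a unified diff, ignoring file headers."""
    added = deleted = 0
    in_hunk = False
    for line in diff_text.splitlines():
        if line.startswith("diff --git"):
            in_hunk = False  # a new file section starts with '---'/'+++' headers
        elif line.startswith("@@"):
            in_hunk = True
        elif in_hunk and line.startswith("+"):
            added += 1
        elif in_hunk and line.startswith("-"):
            deleted += 1
    return added, deleted
```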

How to build with Gradle

 $ ./gradlew distZip 

or

 $ gradle distZip 

After the command finishes, unzip "build/distributions/DPMiner.zip".
The executables are in build/distributions/DPMiner/bin.
There are two executables: DPMiner.bat and DPMiner.
On Windows, use DPMiner.bat; on Linux or macOS, use DPMiner.

If you have trouble building with gradlew, generate the wrapper first:

$ gradle wrapper

Options

Common options

Option   Description
-i*      input (GitHub repository URL)
-o*      output path
  • * : -i and -o are required (findrepo takes only -o).

1. Repository list

Command : findrepo

Option   Description            Usage
-c       creation date          -c 2019-01-01..2020-01-15
-cb      commit count base      -cb less500 or -cb over500
-d       recent date            -d 2019-01-01..2020-06-30
-f       number of forks        -f 10..200
-l       language               -l java
-auth*   authentication token   -auth "Auth Token"
-o*      output path            -o /Users/Desktop/repository
  • * : -auth and -o are required.
findrepo -o /Users/Desktop/repository -l java -auth "Auth Token" 
findrepo -o /Users/Desktop/repository -c 2019-01-01..2020-01-15 -f 10..200 -auth "Auth Token"
findrepo -o /Users/Desktop/repository -d 2019-01-01..2020-06-30 -cb over500 -auth "Auth Token"

2. Patch

Command : patch

Mode option              Sub-option
-ij  Jira                -jk*  Jira key
-ik  commit message      -k    bug keywords (default: bug,fix)
-ig  GitHub issue        -l    bug issue label (default: bug)
  • One of -ij, -ik, or -ig is required.
  • * : -jk is required when using -ij.
Jira example
//patch -i "Github URL" -o "local directory path"/"ProjectName"/patch -ij -jk "Jira Key"
patch -i https://github.com/apache/juddi -o /Users/Desktop/juddi/patch -ij -jk JUDDI 
GitHub issue example (-l option)
//patch -i "Github URL" -o "local directory path"/"ProjectName"/patch -ig -l "issue keyword"
patch -i https://github.com/apache/camel-quarkus -o /Users/Desktop/camel-quarkus/patch -ig 
patch -i https://github.com/google/guava -o /Users/Desktop/guava/patch -ig -l type=defect
Commit message example (-k option)
//patch -i "Github URL" -o "local directory path"/"ProjectName"/patch -ik -k "bug keyword"
patch -i https://github.com/facebook/facebook-android-sdk -o /Users/Desktop/facebook-android-sdk/patch -ik
patch -i https://github.com/facebook/facebook-android-sdk -o /Users/Desktop/facebook-android-sdk/patch -ik -k help 
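To run `patch` over many projects, the command line can be assembled by a script. The sketch below uses only the options documented above and enforces the two rules under the table: choosing `mode` emits exactly one of -ij, -ik, or -ig, and -ij fails without a Jira key. It assumes the subcommand is passed as the first argument to the launcher built in build/distributions/DPMiner/bin; `patch_args` is a hypothetical helper.

```python
DPMINER = "build/distributions/DPMiner/bin/DPMiner"  # launcher from the build step


def patch_args(github_url, out_dir, mode, jira_key=None, keywords=None, label=None):
    """argv for a DPMiner `patch` run; mode is 'jira', 'commit', or 'issue'."""
    args = [DPMINER, "patch", "-i", github_url, "-o", out_dir]
    if mode == "jira":
        if not jira_key:
            raise ValueError("-jk (Jira key) is required with -ij")
        args += ["-ij", "-jk", jira_key]
    elif mode == "commit":
        args.append("-ik")
        if keywords:  # omitted -> DPMiner default bug,fix
            args += ["-k", keywords]
    elif mode == "issue":
        args.append("-ig")
        if label:  # omitted -> DPMiner default bug
            args += ["-l", label]
    else:
        raise ValueError("mode must be 'jira', 'commit', or 'issue'")
    return args


if __name__ == "__main__":
    cmd = patch_args("https://github.com/apache/juddi", "/Users/Desktop/juddi/patch",
                     "jira", jira_key="JUDDI")
    print(" ".join(cmd))
    # To run it: import subprocess; subprocess.run(cmd, check=True)
```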

3. BIC

Command : bic (takes the same options as patch)

-z value   SZZ algorithm
BSZZ       B-SZZ, git blame (default)
AGSZZ      AG-SZZ, annotation graph
  • The -z option is optional.
Jira example (BSZZ)
//bic -i "Github URL" -o "local directory path"/"ProjectName"/patch -ij -jk "Jira Key" -z "SZZ Mode"
bic -i https://github.com/apache/juddi -o /Users/Desktop/juddi/patch -ij -jk JUDDI

GitHub issue example (BSZZ)
//bic -i "Github URL" -o "local directory path"/"ProjectName"/patch -ig -l "issue keyword"
bic -i https://github.com/google/guava -o /Users/Desktop/guava/patch -ig -l type=defect
Commit message example (BSZZ)
//bic -i "Github URL" -o "local directory path"/"ProjectName"/patch -ik -k "bug keyword"
bic -i https://github.com/facebook/facebook-android-sdk -o /Users/Desktop/facebook-android-sdk/patch -ik 
AG-SZZ and B-SZZ example (Jira)
//bic -i "Github URL" -o "local directory path"/"ProjectName"/patch -ij -jk "Jira Key" -z BSZZ
bic -i https://github.com/apache/juddi -o /Users/Desktop/juddi/patch -ij -jk JUDDI
bic -i https://github.com/apache/juddi -o /Users/Desktop/juddi/patch -ij -jk JUDDI -z BSZZ

//bic -i "Github URL" -o "local directory path"/"ProjectName"/patch -ij -jk "Jira Key" -z AGSZZ
bic -i https://github.com/apache/juddi -o /Users/Desktop/juddi/patch -ij -jk JUDDI -z AGSZZ

4. Metric

Command : metric

Option   Description
-bp*     path to the BIC CSV file
  • * : -bp is required.
  • Metrics can only be collected from the B-SZZ output file (BIC_BSZZ_"ProjectName".csv).
Metric example
 //metric  -i "Github URL" -o "local directory path"/metric -bp "BIC file path"/BIC_BSZZ_"ProjectName.csv"
metric  -i https://github.com/apache/juddi -o /Users/Desktop/metric -bp /Users/Desktop/BIC_BSZZ_juddi.csv 
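Because `metric` only accepts B-SZZ output, the last two steps can be chained: run `bic` without `-z` (B-SZZ is the default), then pass the resulting CSV to `metric` through `-bp`. The file name `BIC_BSZZ_<ProjectName>.csv` follows the metric example; the directory `bic` writes it to, the launcher invocation, and the helper names are assumptions.

```python
from pathlib import Path

DPMINER = "build/distributions/DPMiner/bin/DPMiner"  # launcher from the build step


def project_name(github_url):
    """'https://github.com/apache/juddi' -> 'juddi'."""
    return github_url.rstrip("/").split("/")[-1]


def bic_then_metric(github_url, jira_key, bic_dir, metric_dir):
    """argv lists for a B-SZZ `bic` run followed by `metric` on its CSV."""
    name = project_name(github_url)
    # No -z: B-SZZ is the default, and metric only accepts B-SZZ output.
    bic = [DPMINER, "bic", "-i", github_url, "-o", str(bic_dir), "-ij", "-jk", jira_key]
    # Assumed output location; check where your bic run actually writes the CSV.
    bic_csv = Path(bic_dir) / f"BIC_BSZZ_{name}.csv"
    metric = [DPMINER, "metric", "-i", github_url, "-o", str(metric_dir),
              "-bp", str(bic_csv)]
    return bic, metric


if __name__ == "__main__":
    for cmd in bic_then_metric("https://github.com/apache/juddi", "JUDDI",
                               "/Users/Desktop/juddi/bic", "/Users/Desktop/metric"):
        print(" ".join(cmd))  # or: subprocess.run(cmd, check=True)
```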

Contributors

sungbin, lamb0711, kongsubin, oneweek-hi, binarywoo27, hguisel, lifove, sukjinkim
