cache's Introduction

The dataset of the paper titled "Context-Aware Code Change Embedding for Better Patch Correctness Assessment".

This is the online repository of the paper "Context-Aware Code Change Embedding for Better Patch Correctness Assessment". We release the source code of Cache, the patches used in our evaluation, as well as the experiment results.

  • Patches: two patch benchmarks included in our study.

    • Small: The 1,183 deduplicated patches from Tian's ASE20 paper and Wang's ASE20 paper.

    • Large: The patches we collected ourselves, consisting of 49,694 patches from RepairThemAll (ground truth labeled by Tian et al.) and ManySStuBs.

  • Results

    • RQ1: The detailed result files for RQ1, named in the format [model]_[classifier].csv. For example, the file BERT_DT.csv in the folder Tian's_dataset holds the results for patches from Tian's study embedded by BERT and classified by a Decision Tree.

      • Tian's_dataset: The detailed result files on Tian's dataset.
      • Cache_dataset: The detailed result files on our own dataset.
      • Cross_dataset: The detailed result files of representation learning techniques when training on our own dataset and testing on Tian's dataset.
    • RQ2: The detailed result files in RQ2.

      • Wang_Cache.csv: The detailed result of Cache on the dataset from Wang's ASE20 paper.
      • ODS_Cache.csv: The detailed result of Cache on the dataset from Xiong's ICSE18 paper. We compare directly against the results reported by the authors of ODS on 139 patches from Xiong's paper, since the data and source code of ODS are unavailable.
  • Source: The source code and lib for running Cache.
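The RQ1 naming scheme described above ([model]_[classifier].csv) can be decoded mechanically when collecting results across folders. The following is a minimal sketch, not part of the released code: the helper name `parse_result_name` is hypothetical, and it assumes the classifier abbreviation never contains an underscore. It is meant for the RQ1 files only; the RQ2 files (for example Wang_Cache.csv) follow a different convention.

```python
from pathlib import Path
from typing import NamedTuple


class ResultName(NamedTuple):
    model: str       # embedding model, e.g. "BERT"
    classifier: str  # classifier abbreviation, e.g. "DT"


def parse_result_name(path: str) -> ResultName:
    """Split a '[model]_[classifier].csv' file name into its two parts.

    Hypothetical helper: assumes the classifier part has no underscore.
    """
    stem = Path(path).stem
    # Split on the last underscore so a model name containing '_' stays intact.
    model, sep, classifier = stem.rpartition("_")
    if not sep or not model or not classifier:
        raise ValueError("not in [model]_[classifier].csv form: " + path)
    return ResultName(model, classifier)


print(parse_result_name("Tian's_dataset/BERT_DT.csv"))
```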

Prerequisites

  • Java 1.7
  • Python 3.6
  • Defects4j 1.2
  • Bugs.jar
  • Bears
  • QuixBugs

## Preprocessing

Extract the buggy and fixed files from each patch

git clone https://github.com/bugs-dot-jar/bugs-dot-jar  # Bugs.jar benchmark
git clone https://github.com/bears-bugs/bears-benchmark  # Bears benchmark
git clone https://github.com/jkoppel/QuixBugs # QuixBugs benchmark
# Follow the instructions in https://github.com/rjust/defects4j to install Defects4J 1.2
python3 genOverfittingPatches.py
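To recover the buggy and fixed versions of a file, one first needs to know which file a patch touches. The sketch below illustrates that step for patches in unified diff format by reading the `---`/`+++` file headers. It is an illustration only and does not reproduce genOverfittingPatches.py, whose logic is not shown here; the function names `patched_files` and `_clean` are hypothetical, and the sketch assumes no removed line in a hunk itself starts with `--- `.

```python
from typing import List, Optional, Tuple


def _clean(field: str) -> str:
    """Drop a trailing timestamp and the git-style 'a/' or 'b/' prefix."""
    path = field.split("\t", 1)[0].strip()
    if path.startswith(("a/", "b/")):
        path = path[2:]
    return path


def patched_files(patch_text: str) -> List[Tuple[str, str]]:
    """Return (old_path, new_path) pairs named in a unified diff's headers."""
    pairs = []  # type: List[Tuple[str, str]]
    old = None  # type: Optional[str]
    for line in patch_text.splitlines():
        if line.startswith("--- "):
            old = _clean(line[4:])
        elif line.startswith("+++ ") and old is not None:
            pairs.append((old, _clean(line[4:])))
            old = None
    return pairs


example = """\
--- a/src/main/java/Foo.java
+++ b/src/main/java/Foo.java
@@ -1,3 +1,3 @@
-    return a - b;
+    return a + b;
"""
print(patched_files(example))
```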

Generate the AST paths

We reuse the AST path extractor implemented by JetBrains Research. To run ASTMiner, execute the following command:

java -jar ./lib/astminer_revised.jar pathContexts --lang java --project path/to/project --output path/to/results --maxL L --maxW W --maxContexts C --maxTokens T --maxPaths P

For example:

java -Xms64g -Xmx128g -jar ./lib/astminer_revised.jar pathContexts --lang java --project ./materials --output ./dataset --maxH 9 --maxW 2 --maxContexts 200 --maxTokens 500 --maxPaths 500

Note that the memory the preprocessor needs depends on the number of files and on the parameters. It usually takes more than 60 GB; we preprocessed our dataset on a server with 128 GB of memory.
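Because the heap size and path limits usually need adjusting per machine and per dataset, it can help to assemble the command programmatically. The following is a minimal sketch under stated assumptions: the function name `astminer_command` and its keyword names are hypothetical, while the flags and default values are copied from the example invocation above (including `--maxH`, as that example uses).

```python
import shlex
from typing import List


def astminer_command(project: str, output: str, *,
                     max_h: int = 9, max_w: int = 2,
                     max_contexts: int = 200, max_tokens: int = 500,
                     max_paths: int = 500,
                     xms: str = "64g", xmx: str = "128g",
                     jar: str = "./lib/astminer_revised.jar") -> List[str]:
    """Build the argument list for the revised ASTMiner jar (hypothetical helper)."""
    return [
        "java", "-Xms" + xms, "-Xmx" + xmx, "-jar", jar,
        "pathContexts", "--lang", "java",
        "--project", project, "--output", output,
        "--maxH", str(max_h), "--maxW", str(max_w),
        "--maxContexts", str(max_contexts),
        "--maxTokens", str(max_tokens),
        "--maxPaths", str(max_paths),
    ]


cmd = astminer_command("./materials", "./dataset")
# Print the command for inspection; quoting keeps paths with spaces safe.
print(" ".join(shlex.quote(arg) for arg in cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

On a machine with less memory, the same helper can be called with smaller values, for example `astminer_command("./materials", "./dataset", xms="8g", xmx="16g", max_contexts=50)`; whether such settings suffice depends on the dataset.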

Generate the sub-token level vocabulary

python3 genSubtokenVocab.py
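For readers unfamiliar with sub-token vocabularies, the idea is to split each identifier into its camelCase or snake_case parts and index the parts rather than whole identifiers. The sketch below illustrates the general technique only; it is not the logic of genSubtokenVocab.py, which is not shown here. The function names, the `<PAD>`/`<UNK>` entries, and the `min_count` parameter are all assumptions made for illustration.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List

# Acronym runs (HTTP), capitalized or lowercase words (File, name), digit runs.
_SUBTOKEN = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+")


def split_subtokens(identifier: str) -> List[str]:
    """Split an identifier into lowercase sub-tokens."""
    return [part.lower() for part in _SUBTOKEN.findall(identifier)]


def build_vocab(identifiers: Iterable[str], min_count: int = 1) -> Dict[str, int]:
    """Map each sufficiently frequent sub-token to an integer index."""
    counts = Counter(tok for ident in identifiers for tok in split_subtokens(ident))
    vocab = {"<PAD>": 0, "<UNK>": 1}  # assumed special entries
    # Most frequent first; ties broken alphabetically for a stable ordering.
    for tok, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if n >= min_count:
            vocab[tok] = len(vocab)
    return vocab


print(split_subtokens("parseHTTPResponse"))
print(build_vocab(["getFileName", "setFileName", "fileCount"]))
```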

Training

python3 main.py

Contributors

  • ringbo
