
APIBench

APIBench is a benchmark for evaluating the performance of API recommendation approaches, released with the paper "Revisiting, Benchmarking and Exploring API Recommendation: How Far Are We?".

Download the Benchmark

Because GitHub does not host large datasets, the benchmark dataset is available for download from Zenodo.

Benchmark Details

APIBench currently contains two sub-datasets, APIBench-Q and APIBench-C, for evaluating query-based and code-based API recommendation approaches, respectively. Each sub-dataset has a Java version and a Python version.

Task Definition

Here we give the definitions of query-based and code-based API recommendation:

Query-based API recommendation: Approaches for query-based API recommendation aim to recommend relevant APIs to developers given a query that describes a programming requirement in natural language. These approaches tell developers which API to use for a programming task.

Code-based API recommendation: Approaches for code-based API recommendation aim to predict the next API given the code surrounding the point of prediction. They can directly improve coding efficiency.
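To make the code-based setting concrete, the sketch below splits a source string at a recommendation point into the code before it and the code after it, which together form the surrounding context. The helper name `split_at_point` and the convention that line numbers are 1-based and column numbers 0-based are assumptions for illustration, not a documented APIBench format.

```python
def split_at_point(source, lineno, colno):
    """Split source code into (before, after) at a recommendation point.

    Hypothetical convention: lineno is 1-based and colno is a 0-based
    character offset within that line.
    """
    lines = source.splitlines(keepends=True)
    if not 1 <= lineno <= len(lines):
        raise ValueError(f"line {lineno} is out of range")
    # Characters in all earlier lines (with their line endings) plus the column.
    offset = sum(len(line) for line in lines[: lineno - 1]) + colno
    return source[:offset], source[offset:]
```

For instance, `split_at_point("import os\npath = os.getcwd()\n", 2, 7)` returns `("import os\npath = ", "os.getcwd()\n")`, so an approach would see `import os` and `path = ` as context and be expected to predict `os.getcwd`.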

APIBench-Q

APIBench-Q is located in the folder APIBench/APIBench_Q. It contains a Python folder and a Java folder holding the Python and Java versions of APIBench-Q.

Each version currently contains two JSON files:

Original{Java,Python}Queries.json: It contains the original queries, the corresponding APIs and API classes, and the source each query was collected from (Stack Overflow or tutorial websites).

Reformulated{Java,Python}Queries.json: It contains all the elements in Original{Java,Python}Queries.json, plus the reformulated queries derived from the original queries. The reformulated queries are wrapped in @.
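Because the reformulated queries are marked only by surrounding @ characters, a small regular expression is enough to pull them out of a string. This is a sketch that assumes each query is enclosed in exactly one pair of @ and contains no @ itself; the JSON field holding these strings is not documented here, so loading is left out, and `extract_reformulated` is a hypothetical helper name.

```python
import re

# Assumption: each reformulated query is enclosed in a pair of '@' characters
# and contains no '@' itself, e.g. "@read text file line by line@".
_WRAPPED = re.compile(r"@([^@]+)@")


def extract_reformulated(text):
    """Return the @-wrapped reformulated queries found in a string, in order."""
    return [match.strip() for match in _WRAPPED.findall(text)]
```

For example, `extract_reformulated("@read text file@ @iterate over file lines@")` returns `["read text file", "iterate over file lines"]`.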

We currently apply the following query reformulation techniques to the original queries:

Technique Source
RACK https://github.com/masud-technope/RACK-Replication-Package
NLP2API https://github.com/masud-technope/NLP2API-Replication-Package
SEQUER https://github.com/kbcao/sequer
Google Prediction Service http://suggestqueries.google.com/complete/search?
NLPAUG https://github.com/makcedward/nlpaug

Number of queries in APIBench-Q:

Language   Original   Expanded   Modified
Python     4,309      173,517    224,068
Java       6,563      400,126    341,276
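The table can also be read as a ratio of reformulated queries to original queries. The short computation below assumes that the Expanded and Modified columns both count reformulated variants of the original queries; the names `COUNTS` and `summarize` are illustrative.

```python
# Query counts copied from the APIBench-Q table above. Assumption: the
# Expanded and Modified columns both count reformulated variants of the
# Original queries.
COUNTS = {
    "Python": {"Original": 4_309, "Expanded": 173_517, "Modified": 224_068},
    "Java": {"Original": 6_563, "Expanded": 400_126, "Modified": 341_276},
}


def summarize(counts):
    """Return total queries and reformulations per original query for each language."""
    summary = {}
    for lang, row in counts.items():
        reformulated = row["Expanded"] + row["Modified"]
        summary[lang] = {
            "total": row["Original"] + reformulated,
            "per_original": reformulated / row["Original"],
        }
    return summary


for lang, stats in summarize(COUNTS).items():
    print(f"{lang}: {stats['total']:,} queries, "
          f"{stats['per_original']:.1f} reformulations per original query")
```

Under that assumption it prints 401,894 queries in total for Python and 747,965 for Java, or roughly 92 and 113 reformulations per original query.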

APIBench-C

APIBench-C is located in the folder APIBench/APIBench_C. It contains a Python folder and a Java folder holding the Python and Java versions of APIBench-C.

Each version currently contains two zip files and one JSON file:

{Java,Python}_MetaData.zip: It contains the metadata JSON files for the code in each domain. The keywords long, normal and short denote functions of extremely long, moderate and extremely short length, respectively.

{Java,Python}_Code.zip: It contains the source code of the repositories we mined from GitHub in April 2021. Only .py and .java files are retained.

{Java,Python}RepoInfo.json: It contains information about the GitHub repositories we collected, including lines of code, code ratio, domain, files, forks, stars, addresses, etc.
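The metadata archives can be inspected without unpacking them. The sketch below uses Python's standard `zipfile` module to group the JSON members by the long, normal and short keywords. It assumes, hypothetically, that the keyword appears as a separate word in each file name (for example `web_long.json`), which should be checked against the actual archive; `group_metadata_members` is an illustrative name.

```python
import re
import zipfile
from collections import defaultdict

# Hypothetical: assumes the length keyword appears as a separate word in the file name.
LENGTH_KEYWORDS = ("long", "normal", "short")


def group_metadata_members(zip_source):
    """Group the .json members of a {Java,Python}_MetaData.zip by length keyword.

    zip_source may be a path or a binary file object. Members whose file name
    contains none of the keywords are collected under "other".
    """
    groups = defaultdict(list)
    with zipfile.ZipFile(zip_source) as zf:
        for name in zf.namelist():
            if not name.endswith(".json"):
                continue  # skip directories and non-JSON members
            base = name.rsplit("/", 1)[-1].lower()
            words = set(re.split(r"[^a-z]+", base))
            key = next((k for k in LENGTH_KEYWORDS if k in words), "other")
            groups[key].append(name)
    return dict(groups)
```

A call such as `group_metadata_members("Python_MetaData.zip")` would then return a dictionary mapping each keyword to the matching member paths.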

The metadata files have the following structure:

"$filename$": {
  "$classname$" or "@Global@": {
    "$funcname$": {
      "@FuncLoc@": [
        $startlineno$,
        $endlineno$
      ],
      "$APIname$": [
        [
          $lineno$,
          $columnno$,
          "pure" or "attr",             # whether called from a class
          "Front", "Middle" or "Back"   # location of recommendation point
        ],
        ...,
        "Standard", "User-defined", "Popular" or "Unknown"   # category of APIs
      ]
    }
  }
}
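Since each API entry mixes call-site lists with a trailing category string, it can help to flatten the nested structure before analysis. The sketch below walks a metadata dictionary with the structure above and yields one record per API call site. The function names and record keys are hypothetical, and it assumes every call site has exactly the four fields shown and that the category is always the last element of the API list.

```python
import json


def load_metadata(path):
    """Load one APIBench-C metadata JSON file into a dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def iter_api_calls(metadata):
    """Yield one flat record per API call site in a metadata dict.

    Walks file -> class or "@Global@" -> function, skips the "@FuncLoc@"
    entry, and treats the last element of each API list as its category.
    """
    for filename, scopes in metadata.items():
        for scope, functions in scopes.items():
            for funcname, entries in functions.items():
                start, end = entries.get("@FuncLoc@", (None, None))
                for api, value in entries.items():
                    if api == "@FuncLoc@":
                        continue
                    *sites, category = value  # call sites first, category last
                    for lineno, colno, call_kind, position in sites:
                        yield {
                            "file": filename,
                            "scope": scope,
                            "function": funcname,
                            "func_lines": (start, end),
                            "api": api,
                            "line": lineno,
                            "column": colno,
                            "call_kind": call_kind,  # "pure" or "attr"
                            "position": position,    # "Front", "Middle" or "Back"
                            "category": category,
                        }
```

For example, `[r for r in iter_api_calls(md) if r["category"] == "Standard"]` keeps only the call sites of standard-library APIs in a loaded metadata dictionary `md`.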

Statistics of APIBench-C:

[Figure: Statistics of APIBench-C]

Baseline Results

To facilitate further research on API recommendation and reduce the burden of re-implementing different baselines, we release all the evaluation results and outputs of 11 baselines along with 4 IDEs discussed in the paper.

Please see the experiment_results folder for detailed information.

Cite Us

If you use our benchmark dataset and related experiment results or code, please cite us:

@misc{peng2021revisiting,
      title={Revisiting, Benchmarking and Exploring API Recommendation: How Far Are We?}, 
      author={Yun Peng and Shuqing Li and Wenwei Gu and Yichen Li and Wenxuan Wang and Cuiyun Gao and Michael Lyu},
      year={2021},
      eprint={2112.12653},
      archivePrefix={arXiv},
      primaryClass={cs.SE}
}

Contact

If you have any questions, please contact [email protected].


Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.