
TypeEvalPy

A Micro-benchmarking Framework for Python Type Inference Tools

📌 Features:

  • 📜 Contains 154 code snippets to test and benchmark.
  • 🏷️ Offers 845 type annotations across a diverse set of Python functionalities.
  • 📂 Organized into 18 distinct categories targeting various Python features.
  • 🚢 Seamlessly manages the execution of containerized tools.
  • 🔄 Efficiently transforms inferred types into a standardized format (sketched below).
  • 📊 Automatically produces meaningful metrics for in-depth assessment and comparison.
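
For orientation, the standardized output is a JSON file listing type annotations, one entry per annotated location. The exact schema is defined by the ground-truth files shipped with the benchmark; the entry below is only an illustrative sketch of a single function-parameter annotation, with field names and values chosen for illustration:

    [
      {
        "file": "main.py",
        "line_number": 1,
        "col_offset": 11,
        "function": "add_one",
        "parameter": "x",
        "type": ["int"]
      }
    ]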

๐Ÿ› ๏ธ Supported Tools

Supported โœ… In-progress ๐Ÿ”ง Planned ๐Ÿ’ก
HeaderGen Intellij PSI MonkeyType
Jedi Pyre Pyannotate
Pyright PySonar2
HiTyper Pytype
Scalpel TypeT5
Type4Py
GPT-4
Ollama


๐Ÿ† TypeEvalPy Leaderboard

Below is a comparison of exact matches across the tools. For ML-based tools, results are also reported for top-n predictions, i.e. the correct type appears among the tool's n most likely predictions.

Rank ๐Ÿ› ๏ธ Tool Top-n Function Return Type Function Parameter Type Local Variable Type Total
1 HeaderGen 1 186 56 322 564
2 Jedi 1 122 0 293 415
3 Pyright 1 100 8 297 405
4 HiTyper 1
3
5
163
173
175
27
37
37
179
225
229
369
435
441
5 HiTyper (static) 1 141 7 102 250
6 Scalpel 1 155 32 6 193
7 Type4Py 1
3
5
39
103
109
19
31
31
99
167
174
157
301
314

(Auto-generated from the analysis run on 20 Oct 2023)


๐Ÿ†๐Ÿค– TypeEvalPy LLM Leaderboard

Below is a comparison showcasing exact matches for LLMs.

Rank ๐Ÿ› ๏ธ Tool Function Return Type Function Parameter Type Local Variable Type Total
1 GPT-4 225 85 465 775
2 codellama:13b-instruct 199 75 425 699
3 GPT 3.5 Turbo 188 73 429 690
4 codellama:34b-instruct 190 52 425 667
5 phind-codellama:34b-v2 182 60 399 641
6 codellama:7b-instruct 171 72 384 627
7 dolphin-mistral 184 76 356 616
8 codebooga 186 56 354 596
9 llama2:70b 168 55 342 565
10 HeaderGen 186 56 321 563
11 wizardcoder:13b-python 170 74 317 561
12 llama2:13b 153 40 283 476
13 mistral:instruct 155 45 250 450
14 mistral:v0.2 155 45 248 448
15 vicuna:13b 153 35 260 448
16 vicuna:33b 133 29 267 429
17 wizardcoder:7b-python 103 48 254 405
18 llama2:7b 140 34 216 390
19 HiTyper 163 27 179 369
20 wizardcoder:34b-python 140 43 178 361
21 orca2:7b 117 27 184 328
22 vicuna:7b 131 17 172 320
23 orca2:13b 113 19 166 298
24 tinyllama 3 0 23 26
25 phind-codellama:34b-python 5 0 15 20
26 codellama:13b-python 0 0 0 0
27 codellama:34b-python 0 0 0 0
28 codellama:7b-python 0 0 0 0

(Auto-generated from the analysis run on 14 Jan 2024)


๐Ÿณ Running with Docker

1๏ธโƒฃ Clone the repo

git clone https://github.com/secure-software-engineering/TypeEvalPy.git

2๏ธโƒฃ Build Docker image

docker build -t typeevalpy .

3๏ธโƒฃ Run TypeEvalPy

๐Ÿ•’ Takes about 30mins on first run to build Docker containers.

๐Ÿ“‚ Results will be generated in the results folder within the root directory of the repository. Each results folder will have a timestamp, allowing you to easily track and compare different runs.

Correlation of the generated CSV files to the tables in the ICSE paper:

  • Table 1 in the paper is derived from three auto-generated CSV tables:
    • paper_table_1.csv - exact matches by type category.
    • paper_table_2.csv - exact matches for the 18 micro-benchmark categories.
    • paper_table_3.csv - sound and complete values for the tools.
  • Table 2 in the paper is based on one CSV table:
    • paper_table_5.csv - exact matches with top-n values for machine-learning tools.

Additionally, two CSV tables are generated that are not included in the paper:

  • paper_table_4.csv - sound and complete values for the 18 micro-benchmark categories.
  • paper_table_6.csv - sensitivity analysis.

A short sketch for loading these CSVs is given at the end of this section. To run the full benchmark:
docker run \
      -v /var/run/docker.sock:/var/run/docker.sock \
      -v ./results:/app/results \
      typeevalpy

🔧 Optionally, run analysis on specific tools:

docker run \
      -v /var/run/docker.sock:/var/run/docker.sock \
      -v ./results:/app/results \
      typeevalpy --runners headergen scalpel

๐Ÿ› ๏ธ Available options: headergen, pyright, scalpel, jedi, hityper, type4py, hityperdl

🤖 Running TypeEvalPy with LLMs

TypeEvalPy integrates with LLMs through Ollama, streamlining their management. Begin by setting up your environment:

  • Create Configuration File: Copy the config_template.yaml from the src directory and rename it to config.yaml.

In config.yaml, configure the following (an illustrative sketch follows this list):

  • openai_key: your key for accessing OpenAI's models.
  • ollama_url: the URL for your Ollama instance. For simplicity, we recommend deploying Ollama using their Docker container. Get started with Ollama here.
  • prompt_id: set this to questions_based_2 for optimal performance, based on our tests.
  • ollama_models: select a list of model tags from the Ollama library. For smoother runs, make sure each model is pre-downloaded with the ollama pull command (e.g. ollama pull codellama:13b-instruct).
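
An illustrative config.yaml could look like the sketch below. The authoritative reference is config_template.yaml in the src directory; the key names here mirror the list above, the URL assumes Ollama's default port, and the model tags are only examples:

    openai_key: <your-openai-api-key>
    ollama_url: http://localhost:11434
    prompt_id: questions_based_2
    ollama_models:
      - codellama:13b-instruct
      - mistral:instruct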

With the config.yaml configured, run the following command:

docker run \
      -v /var/run/docker.sock:/var/run/docker.sock \
      -v ./results:/app/results \
      typeevalpy --runners ollama
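
Before kicking off a long run, it can help to verify that the Ollama instance behind ollama_url is reachable and already has the configured models pulled. A minimal sketch, assuming the requests package is available and that the instance exposes Ollama's /api/tags endpoint for listing local models:

    import requests

    # Use the same value as ollama_url in config.yaml (default port assumed).
    OLLAMA_URL = "http://localhost:11434"

    # /api/tags returns the models available locally in the Ollama instance.
    response = requests.get(f"{OLLAMA_URL}/api/tags", timeout=10)
    response.raise_for_status()
    local_models = [model["name"] for model in response.json().get("models", [])]
    print("Locally available Ollama models:", local_models)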

Running From Source

1. 📥 Installation

  1. Clone the repo

    git clone https://github.com/secure-software-engineering/TypeEvalPy.git
  2. Install Dependencies and Set Up Virtual Environment

    Run the following commands to create and activate a virtual environment and install the dependencies.

    python3 -m venv .env
    source .env/bin/activate
    pip install -r requirements.txt

2. 🚀 Usage: Running the Analysis

  1. Navigate to the src Directory

    cd src
  2. Execute the Analyzer

    Run the following command to start the benchmarking process on all tools:

    python main_runner.py

    or

    Run analysis on specific tools

    python main_runner.py --runners headergen scalpel
    

๐Ÿค Contributing

Thank you for your interest in contributing! To add support for a new tool, please use the Docker templates provided in our repository. After implementing and testing your tool, submit a pull request (PR) with a descriptive message. Our maintainers will review your submission and merge it.

To get started with integrating your tool, please follow the guide here: docs/Tool_Integration_Guide.md


โญ๏ธ Show Your Support

Give a โญ๏ธ if this project helped you!


TypeEvalPy Issues

Some feedback on improving TypeEvalPy

Hi @ashwinprasadme and @Samzcodez,

First of all, great job! I have managed to run and evaluate Type4Py on the micro-benchmark with almost one command. The automation is great! That said, I have some comments that might help improve the user experience with TypeEvalPy, especially with the installation process.

  • Currently, one needs to run setup.sh, i.e., a shell script, to install the benchmark. This works, but in general people in the community are a bit hesitant to run a shell script for security reasons. Therefore, I suggest creating a proper Python package with a setup.py file, which installs the required dependencies and adds a command to the user's environment, e.g., typeevalpy. Then the user can run the benchmark with one command, say, typeevalpy run, or even typeevalpy run type4py if one wants to run the benchmark for only one tool. This way, people only install a Python package, which should also be platform-agnostic, at least for Linux distributions.
  • I could not find the .results folder containing the benchmark output. It would also be helpful to log the path to the results folder so that the user can find it easily.
  • After running hityperdl, I also encountered this error below, although I could see the benchmark results at the end of the run.
b"Python is running inside a Docker container\n2023-09-08 14:11:01,441 - runner - INFO - Command returned non-zero exit status: [Errno 2] No such file or directory: '/tmp/micro-benchmark/python_features/builtins/switch/._main_INFERREDTYPES.json' for file: /tmp/micro-benchmark/python_features/builtins/switch/main.py\n"

In general, I like the TypeEvalPy benchmark, and it is pretty easy to use.
