Dopamin: Transformer-based Comment Classifiers through Domain Post-training and Multi-level layer aggregation

This repository contains our implementation for training, testing, and using Dopamin, our submission to the NLBSE'24 Tool Competition: Code Comment Classification.

Quickstart Guide

Set up

Clone Dopamin repo:

git clone https://github.com/FSoft-AI4Code/Dopamin.git
cd Dopamin

Python >= 3.8

Install requirements: pip install -r requirements.txt
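Before installing the requirements, it can help to confirm that the interpreter meets the stated minimum version. The following is a minimal sketch; the helper name `python_is_supported` is hypothetical and is not part of this repository.

```python
import sys

# Minimum version stated in the set-up notes above.
MIN_PYTHON = (3, 8)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if python_is_supported():
    print("Python version is sufficient.")
else:
    print("Python >= 3.8 is required; found %d.%d" % tuple(sys.version_info[:2]))
```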

Note: We trained the model on 2 NVIDIA A100 GPUs with a batch size of 32 per GPU, for a total batch size of 64. Replication may not be feasible on a single GPU with a batch size of 64.
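When only one GPU is available, a common way to approximate a total batch size of 64 is gradient accumulation. The sketch below shows only the arithmetic. The function `effective_batch_size` and the accumulation setting are illustrative assumptions, not options of this repository, and matching the total batch size this way does not guarantee identical results.

```python
def effective_batch_size(per_device_batch_size, num_devices, grad_accum_steps=1):
    """Number of samples that contribute to one optimizer update."""
    if min(per_device_batch_size, num_devices, grad_accum_steps) < 1:
        raise ValueError("all arguments must be positive integers")
    return per_device_batch_size * num_devices * grad_accum_steps


# Setup described in the note: 2 GPUs with 32 samples each.
reported_setup = effective_batch_size(32, 2)

# Hypothetical single-GPU setup: 32 samples per step, accumulated over 2 steps.
single_gpu_setup = effective_batch_size(32, 1, grad_accum_steps=2)

print(reported_setup, single_gpu_setup)  # 64 64
```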

Data preparation

Create data for the post-training stage:

python process_data.py --save_dir ./code-comment-classification/processed_data/all --post_training

Create the training and evaluation sets:

python process_data.py --save_dir ./code-comment-classification/processed_data/valid --validation

Create the original data (without a validation split):

python process_data.py --save_dir ./code-comment-classification/processed_data/novalid
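The three data preparation commands above can also be run from a single script. This is a minimal sketch that reuses only the script name, flags, and paths shown above; the helper names `build_commands` and `run_all` are hypothetical, and the script assumes it is run from the repository root.

```python
import subprocess
import sys

BASE_DIR = "./code-comment-classification/processed_data"

# (sub-directory, extra flag) pairs copied from the three commands above.
DATASETS = [
    ("all", "--post_training"),
    ("valid", "--validation"),
    ("novalid", None),
]


def build_commands(python=sys.executable):
    """Build one process_data.py invocation per dataset variant."""
    commands = []
    for subdir, flag in DATASETS:
        cmd = [python, "process_data.py", "--save_dir", f"{BASE_DIR}/{subdir}"]
        if flag is not None:
            cmd.append(flag)
        commands.append(cmd)
    return commands


def run_all(dry_run=True):
    """Print each command and, unless dry_run is set, execute it."""
    for cmd in build_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first failing step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```

With `dry_run=True` the script only prints the commands, which is a cheap way to check the paths before processing any data.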

Training

All training and evaluation scripts can be found in the training directory.

Post-training stage

python training/autorun.py --output_dir ./models/Dopamin_post_training --post_training

A post-trained model is available at dopamin-post-training. To reuse it, skip this stage.

Training Dopamin for each category

  1. Train the model with the validation set to find the best checkpoint step:
python training/autorun.py --output_dir ./models/Dopamin_valid --validation
  2. Train the model on the original training data, stopping at the optimal step found in step 1:
python training/autorun.py --output_dir ./models/Dopamin --optimal_step_dir ./models/Dopamin_valid
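The idea behind the two steps above is to pick the checkpoint step that scores best on the validation set, then retrain on the full training data up to that step. The sketch below illustrates only the selection logic. How `training/autorun.py` actually records and reads the optimal step is not shown here, so the function `best_checkpoint_step` and the scores are hypothetical, with numbers invented purely for illustration.

```python
def best_checkpoint_step(scores_by_step):
    """Return the training step with the highest validation score.

    Ties are resolved in favour of the earliest step.
    """
    if not scores_by_step:
        raise ValueError("no validation scores recorded")
    return min(scores_by_step, key=lambda step: (-scores_by_step[step], step))


# Invented validation scores for one category, keyed by training step.
example_scores = {100: 0.61, 200: 0.68, 300: 0.68, 400: 0.66}

optimal_step = best_checkpoint_step(example_scores)
print(optimal_step)  # 200
```

Preferring the earliest step on ties is a design choice in this sketch: it keeps the second training run as short as possible.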

Evaluation

To evaluate Dopamin, refer to the evaluation notebook, or use the script:

python training/predict.py --model_name codebert-hsum \
                           --model_path ./models/Dopamin \

All model checkpoints are publicly available on the Hugging Face Hub (Dopamin) for replication purposes.

Citation

@software{
  Dopamin_2024,
  author = {Hai, Nam Le and Bui, Nghi DQ},
  year = {2024},
  title = {Dopamin: Transformer-based Comment Classifiers through Domain Post-training and Multi-level layer aggregation},
  url = {https://github.com/FSoft-AI4Code/Dopamin},
  huggingface= {https://huggingface.co/collections/Fsoft-AIC/dopamin-6575bdeb7068a850897e4404}
}
