[EACL 2024] HierarchyNet: Learning to Summarize Source Code with Heterogeneous Representations


Existing code summarization approaches primarily leverage Abstract Syntax Trees (ASTs) and sequential information from source code to generate code summaries, while often overlooking the interplay of dependencies among code elements and the code hierarchy. However, effective summarization requires a holistic analysis of code snippets from three distinct aspects: lexical, syntactic, and semantic information. In this paper, we propose a novel code summarization approach utilizing Heterogeneous Code Representations (HCRs) and our specially designed HierarchyNet. HCRs capture essential code features at the lexical, syntactic, and semantic levels within a hierarchical structure. HierarchyNet processes each layer of the HCR separately, employing a Heterogeneous Graph Transformer, a Tree-based CNN, and a Transformer Encoder. In addition, our approach demonstrates superior performance compared to fine-tuned pre-trained models, including CodeT5 and CodeBERT, as well as large language models used in zero/few-shot settings, such as StarCoder and CodeGen.

Environment


All source code is written in Python. Besides PyTorch, we also use other libraries such as DGL, scikit-learn, pandas, and jsonlines.
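As a quick sanity check before running anything, the libraries named above can be probed for importability. The snippet below is a minimal sketch, not part of the repository: the helper name `missing_modules` is hypothetical, and the list covers only the libraries mentioned in this section (the full dependency set may be larger).

```python
from importlib.util import find_spec

# Import names (not pip package names) for the libraries listed above.
# PyTorch is imported as "torch" and scikit-learn as "sklearn".
REQUIRED_MODULES = ["torch", "dgl", "sklearn", "pandas", "jsonlines"]


def missing_modules(names):
    """Return the top-level module names that cannot be found (hypothetical helper)."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

This only checks that each module can be located; it does not verify versions or that PyTorch and DGL were built for the same CUDA setup.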

Run


  1. Datasets: All the datasets used in the paper are publicly accessible.

  2. Data preprocessing: The preprocessing folder is used to prepare the data in the proper format before training. See that folder for more information.

  3. Configuration: Modify the configuration file in the folder c2nl/configs so that all the paths are valid.

  4. Train the model:


cd c2nl

bash main/train.sh
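The requirement in step 3 that all configured paths be valid can be checked before launching training. The sketch below is an illustration under stated assumptions: it assumes the configuration has already been loaded into a nested Python dict (the actual file format in c2nl/configs is not described here), and that path-like settings use keys ending in "path", "dir", or "file". The function names and the key convention are hypothetical and should be adapted to the real configuration.

```python
import os

# Assumption: keys ending with one of these suffixes hold filesystem paths.
PATH_SUFFIXES = ("path", "dir", "file")


def iter_path_entries(config, prefix=""):
    """Yield (dotted_key, value) for string values whose key looks like a path."""
    if isinstance(config, dict):
        for key, value in config.items():
            dotted = f"{prefix}.{key}" if prefix else str(key)
            if isinstance(value, (dict, list)):
                yield from iter_path_entries(value, dotted)
            elif isinstance(value, str) and str(key).lower().endswith(PATH_SUFFIXES):
                yield dotted, value
    elif isinstance(config, list):
        for index, item in enumerate(config):
            yield from iter_path_entries(item, f"{prefix}[{index}]")


def find_missing_paths(config):
    """Return (dotted_key, value) pairs whose path does not exist on disk."""
    return [(key, value) for key, value in iter_path_entries(config)
            if not os.path.exists(value)]


if __name__ == "__main__":
    # Hypothetical configuration used only to demonstrate the check.
    example = {"data": {"train_path": os.getcwd(), "vocab_file": "no_such_vocab.txt"}}
    for key, value in find_missing_paths(example):
        print(f"Invalid path for '{key}': {value}")
```

Output locations that the training script creates itself (for example, a checkpoint directory) may legitimately not exist yet, so a reported entry is a prompt to look, not necessarily an error.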

Experimental Results


  1. Baselines: The examined baselines are grouped into three categories:

  • Training from scratch: PA-former, CAST, NCS

  • Fine-tuning pretrained models: CodeT5, CodeBERT

  • In-context learning: StarCoder and CodeGen-Multi 2B

  2. Results (tables tab1 and tab2): HierarchyNet outperforms all the baselines by large margins on all the datasets. Our evaluations show that HierarchyNet, with its hierarchical architecture and use of dependency information, significantly improves performance on code summarization.

