
Multilingual Code Co-Evolution Using Large Language Models

This repo hosts the code and data for the following FSE 2023 paper:

Title: Multilingual Code Co-Evolution Using Large Language Models

Authors: Jiyang Zhang, Pengyu Nie, Junyi Jessy Li, Milos Gligoric

@inproceedings{ZhangETAL23Codeditor,
  author = {Zhang, Jiyang and Nie, Pengyu and Li, Junyi Jessy and Gligoric, Milos},
  title = {Multilingual Code Co-Evolution Using Large Language Models},
  booktitle = {Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering},
  year = {2023},
}

News

May 2024: The fine-tuned EditsTranslation models are released on 🤗 Hugging Face! 🔥 cs2java and java2cs

How to Use

from transformers import T5ForConditionalGeneration, AutoTokenizer

# Fine-tuned EditsTranslation checkpoint for the Java -> C# direction
checkpoint = "EngineeringSoftware/EditsTranlation-java2cs"

# Download (or load from cache) the tokenizer and model weights
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = T5ForConditionalGeneration.from_pretrained(checkpoint)

# Java input code
code_input = """class HelloWorld { public static void main(String[] args) { System.out.println("Hello, World!")"""

# Tokenize, generate up to 200 tokens, and decode without special tokens
input_ids = tokenizer(code_input, return_tensors="pt").input_ids
generated_ids = model.generate(input_ids, max_length=200)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
# output: <INSERT>; } } ;<INSERT_END> class HelloWorld { public static void main(String[] args) { System.out.println("Hello, World!") ; } } ;
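The decoded string above appears to contain an edit span (`<INSERT> ... <INSERT_END>`) followed by the full target code. Judging only from this one example, a minimal sketch for separating the two might look like the following. The `<TAG> ... <TAG_END>` pattern is an assumption generalized from the single `<INSERT>` span shown, and `split_edit_output` is a hypothetical helper, not part of the released model or this repo.

```python
import re
from typing import List, Tuple

# Assumption: every edit span has the form <TAG> ... <TAG_END>, generalized from
# the <INSERT> ... <INSERT_END> span in the example output above.
EDIT_SPAN = re.compile(r"<([A-Z_]+)>(.*?)<\1_END>", re.DOTALL)


def split_edit_output(text: str) -> Tuple[List[Tuple[str, str]], str]:
    """Hypothetical helper: split decoded output into (edit spans, target code)."""
    matches = list(EDIT_SPAN.finditer(text))
    edits = [(m.group(1), m.group(2).strip()) for m in matches]
    # Everything after the last edit span is treated as the generated target code.
    code = text[matches[-1].end():] if matches else text
    return edits, code.strip()


output = '<INSERT>; } } ;<INSERT_END> class HelloWorld { public static void main(String[] args) { System.out.println("Hello, World!") ; } } ;'
edits, code = split_edit_output(output)
print(edits)  # [('INSERT', '; } } ;')]
print(code)
```

If the real output format uses other tags, only the `EDIT_SPAN` pattern would need adjusting.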

Introduction

This repo contains the code and artifacts for reproducing the experiments in Multilingual Code Co-Evolution Using Large Language Models. In this work, we introduce Codeditor for co-evolving software implemented in multiple programming languages.

The code includes:

  • scripts for processing the dataset
  • scripts for training and evaluating codeditor models

The artifacts include:

  • Java to C# raw paired changes
  • Java to C# translation dataset processed for codeditor models

Data Downloads

All our data is hosted on UTBox via a shared folder.

Code for Processing Fine-tuning Data

We provide a sample script to process the datasets for edit translation. It requires the raw data files in raw_data/.

cd python/
python -m deltr.collector.DataProcessor edit_translation_data_process --exp cs2java --src_lang cs --tgt_lang java
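The internals of `DataProcessor` are not described here. Purely as an illustration, the sketch below uses Python's standard `difflib` to serialize a code change into a token-level edit sequence. The `<DELETE>`, `<REPLACE_OLD>`, `<REPLACE_NEW>` and `<REPLACE_END>` tag names are assumptions modeled on the `<INSERT> ... <INSERT_END>` span in the model output above, and `edit_sequence` is hypothetical; this is not necessarily how the repo builds its training data.

```python
from difflib import SequenceMatcher
from typing import List


def edit_sequence(old: str, new: str) -> str:
    """Hypothetical sketch: encode old -> new as a token-level edit sequence."""
    a, b = old.split(), new.split()
    parts: List[str] = []
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "insert":
            parts += ["<INSERT>", *b[j1:j2], "<INSERT_END>"]
        elif op == "delete":
            parts += ["<DELETE>", *a[i1:i2], "<DELETE_END>"]
        elif op == "replace":
            parts += ["<REPLACE_OLD>", *a[i1:i2], "<REPLACE_NEW>", *b[j1:j2], "<REPLACE_END>"]
        # "equal" spans are unchanged and therefore omitted.
    return " ".join(parts)


print(edit_sequence("int x = 1 ;", "int x = 2 ;"))
# <REPLACE_OLD> 1 <REPLACE_NEW> 2 <REPLACE_END>
```

Splitting on whitespace keeps the sketch short; a real pipeline would more likely use a language-aware tokenizer.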

Code for Training and Evaluating Models

Train ML models

cd python/
python -m deltr.coditT5.CodeT5 fit --exp_dir ${MODELS_DIR}/${model_name}/${dataset} --data.dataset ${dataset} --data.model ${model_name} --config configs/coditT5.yaml

# Example: python -m deltr.coditT5.CodeT5 fit --exp_dir models/edit-translation/java2cs --data.dataset java2cs --data.model edit-translation --config configs/coditT5.yaml

Results are written to models/${model_name}/${dataset}/, where:

  • model/: stores the trained model.

  • logs/: stores logs during training.
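Because the `${...}` placeholders in these commands are easy to mistype, one option is to assemble the command in Python. The sketch below reuses only the module path and flags from the commands in this README; `codet5_command` and `result_dirs` are hypothetical helpers, and actually running the command assumes you are inside `python/` with the repo's dependencies installed.

```python
from pathlib import Path
from typing import Dict, List


def codet5_command(subcommand: str, model_name: str, dataset: str,
                   models_dir: str = "models") -> List[str]:
    """Hypothetical helper: build the `fit`/`predict` command shown in this README."""
    if subcommand not in ("fit", "predict"):
        raise ValueError(f"unsupported subcommand: {subcommand}")
    exp_dir = Path(models_dir) / model_name / dataset
    return [
        "python", "-m", "deltr.coditT5.CodeT5", subcommand,
        "--exp_dir", str(exp_dir),
        "--data.dataset", dataset,
        "--data.model", model_name,
        "--config", "configs/coditT5.yaml",
    ]


def result_dirs(model_name: str, dataset: str, models_dir: str = "models") -> Dict[str, Path]:
    """Locations where, per this README, training writes the model and logs."""
    exp_dir = Path(models_dir) / model_name / dataset
    return {"model": exp_dir / "model", "logs": exp_dir / "logs"}


cmd = codet5_command("fit", "edit-translation", "java2cs")
print(" ".join(cmd))
# To actually train (from inside python/):
#   import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list to `subprocess.run` also avoids shell quoting issues.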

Run ML models for inference

This requires the dataset at data/${model_name}/${dataset}/ and the trained model at models/${model_name}/${dataset}/model/.

cd python/
python -m deltr.coditT5.CodeT5 predict --exp_dir ${MODELS_DIR}/${model_name}/${dataset} --data.dataset ${dataset} --data.model ${model_name} --config configs/coditT5.yaml

Results are written to models/${model_name}/${dataset}/, where:

  • output.hyp: the predictions.
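To sanity-check predictions, one could compare `output.hyp` against reference outputs line by line. The sketch below assumes `output.hyp` holds one prediction per line in dataset order; the reference list and the whitespace-insensitive exact-match metric are illustrative assumptions, not the repo's evaluation code, and the helper names are hypothetical.

```python
from pathlib import Path
from typing import List


def read_predictions(path: Path) -> List[str]:
    # Assumption: output.hyp stores one prediction per line, in dataset order.
    return path.read_text(encoding="utf-8").splitlines()


def normalize(code: str) -> str:
    """Collapse runs of whitespace so formatting differences are ignored."""
    return " ".join(code.split())


def exact_match(hyps: List[str], refs: List[str]) -> float:
    """Fraction of predictions identical to their reference after normalization."""
    if len(hyps) != len(refs):
        raise ValueError(f"{len(hyps)} predictions vs {len(refs)} references")
    if not refs:
        return 0.0
    hits = sum(normalize(h) == normalize(r) for h, r in zip(hyps, refs))
    return hits / len(refs)


# Usage (path follows the README layout; the references are up to you):
# hyps = read_predictions(Path("models/edit-translation/java2cs/output.hyp"))
# print(exact_match(hyps, refs))
```

Exact match is strict; token-level metrics such as BLEU give partial credit for near-miss predictions.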

codeditor's Issues

Some source code is missing

When attempting to run the command

python -m deltr.coditT5.CodeT5 fit --exp_dir models/edit-translation/java2cs --data.dataset java2cs --data.model edit-translation --config configs/coditT5.yaml

an error occurred. At line 31, CodeT5.py attempts to import compute_bleu_scores with

from deltr.eval.evaluate import compute_bleu_scores
However, no eval directory exists within the deltr directory.

After reading your paper, I noticed the mention of the coditT5 model and suspected that the missing file had been left out of the repository. I located the necessary file and, after some minor adjustments, placed it in codeditor. I then encountered another issue:

(cdt) ----------------------------------------------------------------------------------------------------------
~/codeditor/python (public*) » python -m deltr.coditT5.CodeT5 fit --exp_dir models/edit-translation/java2cs --data.dataset java2cs --data.model edit-translation --config  configs/coditT5.yaml
Traceback (most recent call last):
  File "/home/yeren/miniconda3/envs/cdt/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/yeren/miniconda3/envs/cdt/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/yeren/codeditor/python/deltr/coditT5/CodeT5.py", line 451, in <module>
    DefaultLightningCLI(
  File "/home/yeren/codeditor/python/deltr/coditT5/utils.py", line 54, in __init__
    super().__init__(*args, **kwargs)
  File "/home/yeren/miniconda3/envs/cdt/lib/python3.8/site-packages/seutil/../pytorch_lightning/utilities/cli.py", line 515, in __init__
    self.setup_parser(run, main_kwargs, subparser_kwargs)
  File "/home/yeren/miniconda3/envs/cdt/lib/python3.8/site-packages/seutil/../pytorch_lightning/utilities/cli.py", line 551, in setup_parser
    self._add_subcommands(self.parser, **subparser_kwargs)
  File "/home/yeren/miniconda3/envs/cdt/lib/python3.8/site-packages/seutil/../pytorch_lightning/utilities/cli.py", line 624, in _add_subcommands
    subcommand_parser = self._prepare_subcommand_parser(trainer_class, subcommand, **kwargs.get(subcommand, {}))
  File "/home/yeren/miniconda3/envs/cdt/lib/python3.8/site-packages/seutil/../pytorch_lightning/utilities/cli.py", line 632, in _prepare_subcommand_parser
    self._add_arguments(parser)
  File "/home/yeren/miniconda3/envs/cdt/lib/python3.8/site-packages/seutil/../pytorch_lightning/utilities/cli.py", line 589, in _add_arguments
    self.add_arguments_to_parser(parser)
  File "/home/yeren/codeditor/python/deltr/coditT5/utils.py", line 120, in add_arguments_to_parser
    parser.add_lr_scheduler_args(types, nested_key, link_to)
  File "/home/yeren/miniconda3/envs/cdt/lib/python3.8/site-packages/seutil/../pytorch_lightning/utilities/cli.py", line 213, in add_lr_scheduler_args
    self.add_subclass_arguments(lr_scheduler_class, nested_key, **kwargs)
  File "/home/yeren/miniconda3/envs/cdt/lib/python3.8/site-packages/jsonargparse/_signatures.py", line 506, in add_subclass_arguments
    raise ValueError(f"Expected 'baseclass' argument to be a class or a tuple of classes: {baseclass}")
ValueError: Expected 'baseclass' argument to be a class or a tuple of classes: ()

Unsure whether my changes caused this problem, I cloned coditT5 and attempted fine-tuning there, but hit the same error. Any help resolving this would be greatly appreciated.

Would model weights be available on HuggingFace?

Hello!

In your paper, you describe several fine-tuned models in addition to Codeditor (e.g., CodeT5-Translation, CodeT5-Update). Have you considered sharing the weights for some of these models, similar to how CoditT5 was shared on HuggingFace at https://huggingface.co/JiyangZhang/CoditT5?

I'd like to reproduce some of the results and then try the trained models on private Java and C# repos; it would be great not to have to train new models :)
