madaan / pie-perf
Training language models to make programs faster
Home Page: https://pie4perf.com
https://github.com/madaan/pie-perf/blob/main/README.md?plain=1#L18 says that details on the problems can be found at the path mentioned there, but I couldn't work out where to find them. Any pointers would be greatly appreciated, @madaan.
For the following problems with test cases in public_test_cases, the input.*.txt and output.*.txt files do not start at index 0:
p01875
p02069
p02067
p00754
p02068
p02072
p02871
p01895
p02074
p02224
p01660
p02197
p01516
p01589
p00000
p01779
p01581
p01588
p01969
p00685
p00683
p01590
p02064
p03978
p02857
p02226
p02076
p02071
p01664
p02227
p02070
p02077
p02592
p01584
p01515
p01972
p01585
p01582
p00696
p01918
This causes an error in the run_eval.py script, which expects the indices to start at 0.
One solution is to simply rename these files.
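For anyone hitting the same issue, here is a minimal sketch of such a renaming pass. It assumes a layout of `public_test_cases/<problem_id>/input.N.txt` and `output.N.txt` (the directory layout and the helper name `reindex_tests` are my own; adjust the glob pattern if the dataset is organized differently):

```python
from pathlib import Path

def reindex_tests(problem_dir):
    """Rename input.N.txt / output.N.txt files so indices start at 0.

    Assumes files named like input.3.txt, output.3.txt inside problem_dir.
    Files are sorted by their numeric index and renamed to 0, 1, 2, ...
    Since each new index is <= the old one and targets are assigned in
    increasing order, no rename overwrites a file that is still pending.
    """
    problem_dir = Path(problem_dir)
    for prefix in ("input", "output"):
        files = sorted(
            problem_dir.glob(f"{prefix}.*.txt"),
            key=lambda p: int(p.name.split(".")[1]),
        )
        for new_idx, path in enumerate(files):
            target = problem_dir / f"{prefix}.{new_idx}.txt"
            if path != target:
                path.rename(target)
```

Running this once per affected problem directory (e.g. `reindex_tests("public_test_cases/p01875")`) should make the indices compatible with what run_eval.py expects.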
Thank you for sharing the dataset and code!
I couldn't find the pretrained model trained with this data. Could you provide the CodeLlama 13B w/ FineTune checkpoint used for the results in Table 1 of the paper?
Also, is it possible to provide the synthetic data generated using GPT?
Thank you.
@madaan
Thanks for the evaluation script for the CodeNet dataset!
I am trying to evaluate some predictions on the dataset with the command:
python3 src/codenet_eval/run_eval.py --eval_config eval_files/example_eval_config.yaml
In the output report file, all input_* and generated_answers_* columns are either null or 0. I tried to submit the file (generated by setting the temp_dir option of run_eval.py) to the AtCoder website, and it compiled and ran successfully.
As an attachment I have uploaded one line of the jsonl file and the yaml config file. Thanks if you can take a look at this.
I checked the whole paper but couldn't find any information about how you post-process the generated code.
Is there any post-processing step?
Thanks for any reply.
Hello. Thank you so much for the great research!
I had a question while reading the paper, so I'm leaving it here.
I read Section 5, "Analysis of Generated Code Edits", with great interest, and was wondering whether more detailed information is available for the subsection "Comparing CODEX and CODEGEN", since it only reports the number of code edits each model generated (i.e., optimized successfully). For example, I would like to see the list of problems that CODEX successfully optimized, along with the detailed %OPT/SpeedUp/%RTR for each problem.
Again, thank you so much for this great work.