Comments (7)
Thanks for catching the bug. I think we forgot to add a "> 0" on line 29.
I'll push the bug fix soon.
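The code at line 29 is not quoted in this thread, so the following is only a sketch of why a "> 0" check matters when averaging per-test outcomes. It assumes outcomes are stored as True/False for pass/fail plus negative codes (-1, -2) for errors; per_problem_average is a hypothetical helper, not the evaluation script's own function.

```python
import numpy as np


def per_problem_average(outcomes):
    """Fraction of test cases passed for one problem (hypothetical helper).

    Assumes outcomes hold True/False for pass/fail and negative codes
    (-1, -2) for errors. Without the "> 0", those codes would be averaged
    in as negative numbers and could drag the score below zero.
    """
    arr = np.asarray(outcomes)
    return float(np.mean(arr > 0))


outcomes = [True, True, -2, -1]
print(np.mean(np.asarray(outcomes)))  # -0.25: error codes counted as negatives
print(per_problem_average(outcomes))  # 0.5: only passing tests count
```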
As for your other question: I thought it was a good metric, even though it ultimately wasn't used, for measuring how often the model produces code that runs at all rather than gibberish. In my opinion, compile errors are worse than runtime errors. Loosely speaking, the split can also be seen as measuring syntax ("grammar") errors versus semantic errors.
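To make the grammar-versus-semantics distinction concrete, here is a minimal sketch of classifying a single candidate program. It assumes Python candidates and the outcome codes -2 (compile error) and -1 (runtime error); classify is a hypothetical helper, not the APPS harness, and a real harness would also feed test inputs, enforce timeouts, and sandbox execution.

```python
COMPILE_ERROR = -2  # assumed outcome code for code that does not compile
RUNTIME_ERROR = -1  # assumed outcome code for code that crashes when run


def classify(source):
    """Return COMPILE_ERROR, RUNTIME_ERROR, or True if the program runs.

    Compiling only checks that the text is valid Python (the "grammar"
    level); executing it surfaces semantic errors such as NameError.
    WARNING: this runs untrusted code in-process; do not use it on real
    model output without a sandbox.
    """
    try:
        code = compile(source, "<candidate>", "exec")
    except (SyntaxError, ValueError):  # ValueError: e.g. null bytes on older Pythons
        return COMPILE_ERROR
    try:
        exec(code, {})  # fresh globals; builtins are added automatically
    except Exception:
        return RUNTIME_ERROR
    return True


print(classify("print(1 +)"))              # -2: syntax error
print(classify("x = undefined_name + 1"))  # -1: NameError at run time
print(classify("x = 1 + 1"))               # True: runs cleanly
```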
BTW, APPS is now available on the Hugging Face Hub at https://huggingface.co/datasets/codeparrot/apps, and we're currently adding the evaluation metric.
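For anyone trying the Hub copy, here is a small sketch of decoding one record. It assumes, based on the dataset card rather than anything stated in this thread, that the solutions and input_output fields are JSON-encoded strings (the latter an object with "inputs" and "outputs" lists) and that either may be empty. The record below is made up; a real one would come from datasets.load_dataset, whose exact arguments depend on your datasets version. parse_apps_record is a hypothetical helper.

```python
import json


def parse_apps_record(record):
    """Decode the JSON-encoded fields of one APPS record (assumed layout).

    Assumes `solutions` is a JSON list of source strings and
    `input_output` is a JSON object with "inputs" and "outputs" lists;
    either field may be an empty string when missing.
    """
    solutions = json.loads(record["solutions"]) if record["solutions"] else []
    io = json.loads(record["input_output"]) if record["input_output"] else {}
    return solutions, io.get("inputs", []), io.get("outputs", [])


# A made-up record in the assumed layout (not real dataset content).
record = {
    "question": "Read n and print n * 2.",
    "solutions": json.dumps(["n = int(input())\nprint(n * 2)"]),
    "input_output": json.dumps({"inputs": ["3\n"], "outputs": ["6\n"]}),
}
solutions, inputs, outputs = parse_apps_record(record)
print(len(solutions), inputs, outputs)
```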
I'll make the changes you suggested, but feel free to open a pull request too. Thanks for looking through it and adding it to Hugging Face!
Great, I'll open a PR! I saw that you already changed it, thanks!
Okay, with the examples and documentation in place, I think it is now working correctly and as intended, so this issue is good to close. Feel free to reopen it if something was missed.
Also, regarding the comment about the expressions: the following works provided they are NumPy arrays:
import numpy as np
a = [-2, -1, 0, 1, -2]
a[a == -2]  # outputs -2, which is not what we expect: a == -2 compares the whole list to -2 and gives False, and a[False] is a[0]
b = np.asarray(a)
b == -2  # outputs array([ True, False, False, False,  True])
# Boolean-mask indexing then returns only the matching elements, which is what we expect (length 2):
b[b == -2]  # outputs array([-2, -2])
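Since the evaluation results discussed here mix True/False with negative error codes, one subtlety is worth a quick check. The sketch below assumes that encoding (True/False for pass/fail, -1 for a runtime error, -2 for a compile error) and shows that NumPy promotes such a mixed list to integers, so mask comparisons like the ones above still pick out the right entries.

```python
import numpy as np

# Per-test outcomes in the assumed encoding:
# True/False for pass/fail, -1 for a runtime error, -2 for a compile error.
outcomes = [True, False, -1, -2, True]

arr = np.asarray(outcomes)
# Mixing bools with ints promotes the whole array to an integer dtype,
# so True becomes 1 and False becomes 0.
print(arr.dtype.kind, arr.tolist())  # i [1, 0, -1, -2, 1]

# Comparing with 1/0 (equivalently True/False) therefore still selects
# passes and fails; np.count_nonzero counts matches without building a
# sub-array first.
passes = int(np.count_nonzero(arr == 1))
fails = int(np.count_nonzero(arr == 0))
print(passes, fails)
```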
Thank you for your reply and for the fix! Regarding the comment above: since tmp_results is defined as a list in the function, maybe we could use
res.extend(np.array(results[index]))
here (apps/eval/test_one_solution.py, line 27 in d5c8e99) and
tmp_results = np.array(res)
here (apps/eval/test_one_solution.py, line 30 in d5c8e99).
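Putting the two suggestions together, a minimal sketch of the tally might look like the following. tally is a hypothetical stand-in for the counting part of the evaluation script, not its actual code; it assumes results maps each problem index to a list of per-test outcomes coded as True, False, -1 (runtime error) and -2 (compile error).

```python
import numpy as np


def tally(results):
    """Count outcome codes across all problems (hypothetical helper).

    results: dict mapping problem index -> list of per-test outcomes,
    assumed to use True (pass), False (fail), -1 (runtime error) and
    -2 (compile error).
    """
    res = []
    for index in results:
        res.extend(np.asarray(results[index]))
    # Converting to an array is what makes the masks below element-wise;
    # on a plain list, `res == -2` would just be False.
    tmp_results = np.array(res)
    return {
        "compile_errors": int(np.sum(tmp_results == -2)),
        "runtime_errors": int(np.sum(tmp_results == -1)),
        # True/False are promoted to 1/0 when mixed with the error codes.
        "failures": int(np.sum(tmp_results == 0)),
        "successes": int(np.sum(tmp_results == 1)),
        "total": len(tmp_results),
    }


print(tally({0: [True, True, False], 1: [-2], 2: [True, -1]}))
```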
Related Issues
- Show a data instance in the readme
- evaluation on multiple solutions at once causes memory leak
- Nan test case average
- Test case average of solutions in real dataset
- Running instructions
- Request for scripts of fine-tuning
- Problems with fine-tuning
- Problems With APPS
- Too Long Problems
- Unable to run pre-trained (1.5B) model on test set
- answer_type calculation is different for train/val and eval
- Steps About Generated Code Solutions Post-processing
- About Solutiions' validity
- check5 in function "run_test" seem to bring some wrong result
- Can this dataset test for chatgpt?(gpt 3.5?)
- Problem in ground-truth solutions
- Asking for scripts for pre-processing
- Request for pretrained models
- DeepSpeed config and TrainingArguments mismatch