
Comments (8)

loubnabnl commented on July 29, 2024

Hi, this isn't an error in the dataset. To load it on the Hugging Face Hub and respect some format constraints, we had to save the solutions and input_output columns in JSON format, which led to this behaviour. The README of the dataset shows how to load the solutions and input_output columns correctly: https://huggingface.co/datasets/codeparrot/apps#how-to-use-it
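As a minimal sketch of what decoding those columns can look like, assuming each of the two columns holds a JSON-encoded string: the helper name `decode_sample`, the hand-made `record`, and the handling of empty strings are illustrative assumptions, not part of the dataset's tooling.

```python
import json


def decode_sample(sample):
    """Return a copy of an APPS-style record with its JSON string columns decoded."""
    decoded = dict(sample)
    for column in ("solutions", "input_output"):
        raw = sample.get(column, "")
        # Assumption: an empty string means the problem has no data for this column.
        decoded[column] = json.loads(raw) if raw else None
    return decoded


# Hand-made record standing in for one dataset row (hypothetical values).
record = {
    "problem_id": 0,
    "solutions": json.dumps(["print(input())\n", "import sys\nprint(sys.stdin.readline().strip())\n"]),
    "input_output": json.dumps({"inputs": ["hi\n"], "outputs": ["hi\n"]}),
}

decoded = decode_sample(record)
first_solution = decoded["solutions"][0]  # a plain Python string, already unescaped
```

Decoding with `json.loads` gives a list of solution strings directly, so no manual slicing or splitting of the raw column is needed.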

from apps.

xksteven commented on July 29, 2024

We filtered the solutions when we collected them to check that they pass the test cases. However, they may be horribly inefficient, for example requiring several gigabytes of RAM or taking a really long time to execute (sometimes minutes). This varied with the source of the ground-truth solutions. We didn't filter further to keep only the optimal or near-optimal solutions.
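Because a correct solution can still run for minutes, an evaluation loop may want a time limit per solution. The following is only a sketch under stated assumptions: `run_with_timeout` is a hypothetical helper, not the APPS evaluation script, and it bounds run time only, not memory.

```python
import subprocess
import sys


def run_with_timeout(source, stdin_text, timeout_s=5.0):
    """Run a Python solution in a child process; return its stdout, or None on timeout."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", source],  # execute the solution source directly
            input=stdin_text,
            capture_output=True,
            text=True,
            timeout=timeout_s,  # the child is killed once this limit is exceeded
        )
    except subprocess.TimeoutExpired:
        return None
    return result.stdout


fast = run_with_timeout("print(int(input()) * 2)", "21\n")
slow = run_with_timeout("while True: pass", "", timeout_s=0.5)  # never finishes on its own
```

A `None` result then marks a solution as too slow for the chosen limit rather than as wrong, which keeps the two failure modes apart when reading the pass rate.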


sindhura97 commented on July 29, 2024

I see. I was running the evaluation on code generated by simply copying the first ground-truth solution, like this:

from datasets import load_dataset
import json

ds = load_dataset("codeparrot/apps", split="train")
examples = {}
for eg in ds:
    # Strip the outer '["' and '"]', then split the raw JSON string on the separator
    for sol in eg['solutions'][2:-2].split('", "'):
        sol = sol.replace('\\n', '\n')
        examples[eg['problem_id']] = [sol]
        print('=' * 10)
        print(sol)
        break  # keep only the first solution
with open('results/all_codes_orig_train.json', 'w') as f:
    json.dump(examples, f)

When I ran the evaluation on these solutions, I got only 60%. Does this seem right?


xksteven commented on July 29, 2024

You may need to select a different solution to test. I can rerun the evaluation script to see how many of the solutions are optimal. I might not be able to get to it for a while, though, as I'll need compute in the background with sufficient RAM to re-evaluate all of the solutions, so I can't really give an ETA.


sindhura97 commented on July 29, 2024

Okay. By the way, I see that the low 60% is due to the '\' character being used unnecessarily at some places in the solutions.


sindhura97 commented on July 29, 2024

Update: Doing sol = sol.replace('\\n', '\n').replace('\\"', '"').replace('\\r', '').replace('\\\\', '\\').replace('\\t', '\t') has pushed it to >95% when I tested on the first few training samples.
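One possible reason chained replacements stop short of 100% is that their order matters: a solution whose source itself contains a backslash escape is encoded with a doubled backslash, and replacing the newline escape first corrupts it. The sketch below compares a chained version, written in the spirit of the update above, with `json.loads` on hand-made strings. `manual_unescape` and the sample solutions are illustrative assumptions, and the result says nothing definitive about the actual dataset.

```python
import json


def manual_unescape(text):
    # Chained replacements; each step runs over the output of the previous one.
    return (text.replace('\\n', '\n')
                .replace('\\"', '"')
                .replace('\\r', '')
                .replace('\\\\', '\\')
                .replace('\\t', '\t'))


# A simple solution: both approaches recover it.
simple = 'x = "hi"\nprint(x)\n'
# A solution whose source contains a backslash-n escape inside a string literal.
tricky = 'print("a\\nb")\n'

raw_simple = json.dumps([simple])
raw_tricky = json.dumps([tricky])

simple_manual = manual_unescape(raw_simple[2:-2])  # slice off the outer '["' and '"]'
tricky_manual = manual_unescape(raw_tricky[2:-2])
tricky_json = json.loads(raw_tricky)[0]
```

Here `simple_manual` matches `simple`, while `tricky_manual` differs from `tricky` because the escape inside the string literal was turned into a real line break; `tricky_json` matches `tricky` exactly.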


xksteven commented on July 29, 2024

Great, thanks for the information. I thought we did that in our preprocessing, but maybe something happened and it got removed.

Feel free to make a PR with the changes if you have time :)


xksteven commented on July 29, 2024

@loubnabnl Thanks for the input! Leaving the issue closed.
