Comments (8)

loubnabnl commented on July 2, 2024

You can use:

accelerate launch main.py \
--model codellama/CodeLlama-7b-Instruct-hf \
--tasks humanevalsynthesize-python \
--do_sample False \
--batch_size 1 \
--allow_code_execution \
--save_generations \
--trust_remote_code \
--prompt codellama \
--save_generations_path generations_humanevalsynthesizepython_codellama.json \
--metric_output_path evaluation_humanevalsynthesizepython_codellama.json \
--max_length_generation 2048 \
--precision fp16

from bigcode-evaluation-harness.

loubnabnl commented on July 2, 2024

Hi, to use an instruction version of the HumanEval prompt, you can use the HumanEvalSynthesize task (the one used for instruction models on the leaderboard when evaluating on Python). For example:

accelerate launch main.py \
--model bigcode/octocoder \
--tasks humanevalsynthesize-python \
--do_sample True \
--temperature 0.2 \
--n_samples 20 \
--batch_size 5 \
--allow_code_execution \
--save_generations \
--trust_remote_code \
--prompt octocoder \
--save_generations_path generations_humanevalsynthesizepython_octocoder.json \
--metric_output_path evaluation_humanevalsynthesizepython_octocoder.json \
--max_length_generation 2048 \
--precision bf16

To change how the instruction prompt is built, update the --prompt argument. Check the code for the list of options (i.e., the transformations that we apply to HumanEval prompts to make them instruction-friendly).

phqtuyen commented on July 2, 2024

Thanks @loubnabnl, do we have to specify an instruction token for this task? Much appreciated.

phqtuyen commented on July 2, 2024

Also, do you mind sharing the exact settings to replicate the CodeLlama Instruct performance? Thank you so much.

loubnabnl commented on July 2, 2024

If your model uses different tokens, you'll need to build a new prompt and update the code. See this PR, which adds the codellama prompt: https://github.com/bigcode-project/bigcode-evaluation-harness/pull/130/files

phqtuyen commented on July 2, 2024

Ah, I just want to replicate the performance of codellama-instruct on the HF leaderboard https://huggingface.co/spaces/bigcode/bigcode-models-leaderboard . Do you know what config/args they ran the evaluation with? Also, is the reported number for "humanevalsynthesize-python"? Thanks.

phqtuyen commented on July 2, 2024

Thank you. Another minor detail: the HF leaderboard says this is the setting they use: "All models were evaluated with the bigcode-evaluation-harness with top-p=0.95, temperature=0.2, max_length_generation 512, and n_samples=50." Here you use fp16; is this correct? Much appreciated.

loubnabnl commented on July 2, 2024

The displayed models were indeed evaluated in that setting, but we've found greedy decoding to give results close to top-p sampling with 50 samples, so you can use greedy to speed up the evaluation. Note that HumanEvalSynthesize requires a sequence length of 2048, not 512.
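
For anyone who would rather run the sampling setting quoted from the leaderboard than the greedy shortcut, a rough sketch of the command might look like the one below. It reuses the flags from the commands earlier in this thread and raises the length to 2048. Two things here are assumptions rather than something confirmed above: `--top_p` is taken to be the harness flag for top-p sampling, and `--batch_size 10` is an arbitrary value. The `RUN` variable is a hypothetical switch added only for this sketch, so the script prints the command by default instead of launching it.

```shell
#!/bin/sh
# Sketch: leaderboard-style sampling (top-p 0.95, temperature 0.2, 50 samples)
# combined with the 2048 sequence length that HumanEvalSynthesize needs.
MODEL="codellama/CodeLlama-7b-Instruct-hf"

# Assumptions: --top_p is the harness flag for top-p sampling, and
# --batch_size 10 is an arbitrary choice (lower it if memory is tight).
CMD="accelerate launch main.py \
  --model $MODEL \
  --tasks humanevalsynthesize-python \
  --do_sample True \
  --temperature 0.2 \
  --top_p 0.95 \
  --n_samples 50 \
  --batch_size 10 \
  --allow_code_execution \
  --save_generations \
  --trust_remote_code \
  --prompt codellama \
  --save_generations_path generations_humanevalsynthesizepython_codellama.json \
  --metric_output_path evaluation_humanevalsynthesizepython_codellama.json \
  --max_length_generation 2048 \
  --precision fp16"

# Dry run by default: print the command so it can be reviewed first.
echo "$CMD"

# RUN is a hypothetical switch for this sketch; set RUN=1 to launch for real.
if [ "${RUN:-0}" = "1" ]; then
  $CMD
fi
```

Run it once as-is to inspect the printed command, then again with `RUN=1` from the root of the harness checkout. Expect this to take much longer than the greedy run, since it generates 50 samples per problem.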
