
show_your_work's Introduction

Show Your Work: Improved Reporting of Experimental Results

This repository contains code for computing expected max validation performance curves, as introduced in Show Your Work: Improved Reporting of Experimental Results.

Machine learning and NLP research often involves searching over hyperparameters. Most commonly, this is done by training N models on a set of training data, evaluating each of the N models on a held-out validation set, and choosing the best of the N models to evaluate on a test set. Often this final test number is all that is reported, even though there is a lot of useful information in the other experiments. The code in this repository is meant as a way to visualize the N validation results.

Usage

To generate expected max curves, put a list containing the performance (accuracy, F1, or your measure of choice) of the N trained models on validation data into the main method of plot.py.

The code has a few options for better visualization:

1. Log-scale the X-axis.
2. Shade the variance (when comparing multiple curves, the shading can be distracting).
3. Scale the X-axis by the average runtime per trial (when comparing approaches with very different runtimes, it can be more appropriate to scale by total time spent rather than by number of trials).
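For a rough sense of what this produces, here is a minimal self-contained sketch of computing and plotting an expected max curve. The function names, example scores, and plotting calls below are illustrative only, not the actual plot.py interface:

import numpy as np
import matplotlib.pyplot as plt

def expected_max_curve(scores):
    # Expected value of the max of n i.i.d. draws from the empirical
    # distribution of `scores`, for every budget n = 1..N.
    v = np.sort(np.asarray(scores, dtype=float))
    N = len(v)
    cdf = np.arange(1, N + 1) / N               # empirical CDF at each sorted score
    prev = np.concatenate(([0.0], cdf[:-1]))    # CDF just below each sorted score
    curve = []
    for n in range(1, N + 1):
        p_max = cdf ** n - prev ** n            # P(max of n draws equals v_i)
        curve.append(float(np.sum(v * p_max)))
    return curve

# Example: validation accuracies of N = 10 random-search trials (made-up numbers).
val_scores = [0.71, 0.74, 0.69, 0.80, 0.77, 0.73, 0.79, 0.75, 0.72, 0.78]
curve = expected_max_curve(val_scores)

plt.plot(range(1, len(curve) + 1), curve)
plt.xscale("log")                               # option 1: log-scale the X-axis
plt.xlabel("hyperparameter trials")
plt.ylabel("expected max validation accuracy")
plt.show()

At a budget of one trial this reduces to the mean of the scores, and as the budget approaches N it approaches the best observed score, which is the shape described in the next section.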

Understanding expected max performance

The X-axis represents the number of hyperparameter trials (or time, if the average time for the experiments is included).

The Y-axis represents the expected max performance for a given X (number of hyperparameter trials). In other words: "If I train X models, what is the expected performance of the best one?"

The leftmost point on the curve (X = 1) is the average across the N validation scores.

The shading shows +/- one sample standard error (closely related to the standard deviation), and it is clipped so that it never extends beyond the observed min and max.

If two curves cross, then which model performs best depends on the budget, so a blanket claim like "Model A outperforms Model B" is ill-defined.
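Written out, the quantity plotted on the Y-axis can be expressed as follows (this is my reading of the paper's empirical-CDF formulation; treat the notation as a paraphrase rather than a verbatim quote):

\mathbb{E}\left[\max_{i \in \{1,\dots,n\}} V_i \;\middle|\; n\right]
  = \sum_{i=1}^{N} v_i \left( \hat{P}(V \le v_i)^n - \hat{P}(V < v_i)^n \right)

where v_1, ..., v_N are the N observed validation scores and \hat{P} is their empirical CDF. Setting n = 1 recovers the mean of the v_i (the leftmost point above), while larger n places more weight on the best observed scores.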

Assumptions

This was designed as a tool for reporting the expected max performance for budgets n <= N. We leave forecasting performance with larger budgets (n > N) to future work.

This method for computing expected max performance assumes i.i.d. draws. In practice, that means it is appropriate when the hyperparameters were chosen with random search, which is generally the recommended approach. The calculation of the expected max may not be correct if the hyperparameters were chosen using manual search, grid search, or Bayesian optimization.
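For example, a random-search loop whose trials are independent draws from fixed distributions (so the i.i.d. assumption holds) might look roughly like the sketch below; train_and_evaluate is a hypothetical stand-in for your own training and validation routine:

import random

def sample_hyperparameters():
    # Every trial samples independently from the same fixed distributions.
    return {
        "learning_rate": 10 ** random.uniform(-5, -2),
        "dropout": random.uniform(0.0, 0.5),
        "hidden_size": random.choice([128, 256, 512]),
    }

def train_and_evaluate(hps):
    # Placeholder: train a model with `hps` and return its validation score.
    return random.random()

val_scores = [train_and_evaluate(sample_hyperparameters()) for _ in range(50)]
# val_scores is now a list of N = 50 i.i.d. validation results, suitable input
# for the expected max computation above.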

If you have too few points, estimating statistics like the expected max might not be a good idea, and you should just report the values directly.

Citation

If you use this repository for your research, please cite:

@inproceedings{showyourwork,
 author = {Jesse Dodge and Suchin Gururangan and Dallas Card and Roy Schwartz and Noah A. Smith},
 title = {Show Your Work: Improved Reporting of Experimental Results},
 year = {2019},
 booktitle = {Proceedings of EMNLP},
}

show_your_work's People

Contributors

dodgejesse


show_your_work's Issues

Clarification on some equations

Hi,

Thank you very much for the nice paper and for sharing the code. I'm currently trying to understand some of the maths behind the calculation. In particular, I have questions about the following equations:
1. [equation image]
Could you please clarify what n is here? It doesn't seem to be the parameter we vary. According to https://github.com/dodgejesse/show_your_work/blob/master/expected_max_performance.py#L4, it seems to be N or B in the paper.

2. [equation image]
Maybe I'm missing something, but what is V_i? Could it be that it's just V?

3. [equation image]
This is related to my first question. Maybe I'm wrong, but shouldn't E[ · | n = 1 ] be the mean of {v_1, ..., v_B}?

Thank you very much in advance for your clarification!

Cheers,
Pat
