
specurve

Most recent update: 2024-03-31

specurve is a Stata command for Specification Curve Analysis.

Installation & update

Run the following command in Stata:

net install specurve, from("https://raw.githubusercontent.com/mgao6767/specurve/master") replace
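By default, specurve estimates each model with reghdfe, and cmd() can switch to ivreghdfe or ppmlhdfe, so whichever of these you use must also be installed. The following is a minimal sketch for installing any that are missing. It assumes all of them can be obtained from SSC, that reghdfe relies on ftools, and that ivreghdfe relies on ivreg2 and ranktest. If an install fails, check that package's own installation notes.

```stata
* Sketch: install the estimation commands specurve can call, if missing.
* Assumption: every package listed here is available from SSC.
foreach pkg in ftools reghdfe ivreg2 ranktest ivreghdfe ppmlhdfe {
    * -which- sets a nonzero _rc when the command cannot be found
    capture which `pkg'
    if _rc {
        ssc install `pkg', replace
    }
}
```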

Example usage & output

Setup

. use "http://www.stata-press.com/data/r13/nlswork.dta", clear
. copy "https://mingze-gao.com/specurve/example_config_nlswork_reghdfe.yml" ., replace

Regressions with reghdfe

Basic usage

. specurve using example_config_nlswork_reghdfe.yml, saving(specurve_demo)

The output is

[specurve] 11:43:54 - 84 total specifications to estimate.
[specurve] 11:43:54 - Estimating model 1 of 84
[specurve] 11:43:54 - Estimating model 2 of 84
......
[specurve] 11:43:57 - Estimating model 84 of 84
[specurve] 11:43:57 - 81 out of 84 models have point estimates significant at 1% level.
[specurve] 11:43:57 - 84 out of 84 models have point estimates significant at 5% level.
[specurve] 11:43:57 - Plotting specification curve...
file specurve_demo.gph saved
[specurve] 11:43:58 - Completed.
[specurve] use frame change specurve to see results
[specurve] use frame change default to switch back to current frame

[Figure: example_reghdfe]

Display options

In this first example, we

  • turn off the benchmark line,
  • set the number of decimals to display to 4,
  • set the number of y-axis ticks/labels to 8, and
  • display the coefficients in descending order.
. specurve using example_config_nlswork_reghdfe.yml, nob yticks(8) rounding(.0001) desc

[Figure: example_reghdfe_with_options]

Given that we have only a single dependent variable and a single focal variable, displaying them in the lower panel is redundant. In this second example, we

  • turn off the display of the dependent variable, and
  • turn off the display of the focal variable.
. specurve using example_config_nlswork_reghdfe.yml, nodependent nofocal

[Figure: example_reghdfe_with_options_hide_dep_focal]

IV regressions with ivreghdfe

. copy "https://mingze-gao.com/specurve/example_config_nlswork_ivreghdfe.yml" ., replace
. specurve using example_config_nlswork_ivreghdfe.yml, cmd(ivreghdfe) rounding(0.01) title("IV regression with ivreghdfe")

[Figure: example_ivreghdfe]

Check help specurve in Stata for a step-by-step guide.

Poisson pseudo-likelihood regression with ppmlhdfe

ppmlhdfe is supported. Examples to be added.

Advanced usage

Sometimes, we are interested in combinations of controls. We can use the controlvariablebygroup option to present a more concise plot. See Issue 2 for a related discussion.

The plot below demonstrates the difference: the left one does not set controlvariablebygroup and the right one does. With, say, 2^6 = 64 models, the left plot needs 64 lines to show the specifications, whereas the right one needs only 6.

[Figure: example_controlvariablebygroup]

However, to achieve this, we MUST make a small change to the configuration file: each control variable choice must be labeled in "comma-followed-by-space" style. This allows the program to parse the combination of control variables that forms the current model specification. By default, the program treats each entire label as uniquely identifying one specification of control variables.

To produce the example, use the following code.

. copy https://mingze-gao.com/specurve/example_config_nlswork_reghdfe_2.yml ., replace
. specurve using example_config_nlswork_reghdfe_2.yml, controlvariablebygroup
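The 2^6 = 64 count above arises because each of the 6 optional control sets is independently included or excluded. The sketch below enumerates those combinations in plain Stata. It assumes one baseline set that is always included, and the variable lists are hypothetical placeholders drawn from nlswork. It illustrates the counting only and is not how specurve builds specifications internally.

```stata
* Sketch: enumerate all combinations of 6 optional control sets.
* Hypothetical inputs: a baseline set that is always included and six
* optional sets (one variable each here, for brevity).
local baseline "age grade collgrad"
local optional "union hours wks_work tenure south c_city"
local k : word count `optional'
local nspec = 0
forvalues i = 0/`=2^`k' - 1' {
    local controls "`baseline'"
    forvalues j = 1/`k' {
        * Bit j-1 of i decides whether optional set j is included
        if mod(floor(`i' / 2^(`j' - 1)), 2) == 1 {
            local thisset : word `j' of `optional'
            local controls "`controls' `thisset'"
        }
    }
    local ++nspec
    display "specification `nspec': `controls'"
}
display "`nspec' specifications in total"
```

With controlvariablebygroup, each of these optional sets gets its own row in the lower panel instead of one row per combination.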

Post estimation

Estimation results are saved in the frame named "specurve".

Use frame change specurve to check the results.

Use frame change default to switch back to the original dataset.

Syntax

specurve using filename, [width(real) height(real) relativesize(real) scale(real) title(string) saving(name) name(string) descending outcmd output benchmark(real) nobenchmark nodependent nofocal nofixedeffect nocluster nocondition noci99 noci95 rounding(real) yticks(int) ymin(real) ymax(real) cmd(name) keepsingletons controlvariablebygroup]

Options

Option                  Description
width(real)             set width of the specification curve plot.
height(real)            set height of the specification curve plot.
relativesize(real)      set the size of the coefficients panel relative to the entire plot. Defaults to 0.6.
scale(real)             resize text, markers, and line widths.
title(string)           set graph title.
saving(name)            save graph as name.
name(string)            set graph name as string.
descending              plot coefficients in descending order.
outcmd                  display the full regression command.
output                  display all regression outputs.
benchmark(real)         set the benchmark level. Defaults to 0.
nobenchmark             turn off the benchmark line.
nodependent             turn off the display of the dependent variable.
nofocal                 turn off the display of the focal variable.
nofixedeffect           turn off the display of fixed effects.
nocluster               turn off the display of standard error clustering.
nocondition             turn off the display of conditions.
noci99                  turn off the display of 99% confidence intervals.
noci95                  turn off the display of 95% confidence intervals.
rounding(real)          set the rounding of y-axis labels and hence the number of decimal places to display. Defaults to 0.001.
yticks(int)             set the number of ticks/labels to display on the y-axis. Defaults to 5.
ymin(real)              set the min tick of the y-axis. Default is automatically set.
ymax(real)              set the max tick of the y-axis. Default is automatically set.
cmd(name)               set the command used to estimate models. Defaults to reghdfe. Can be one of reghdfe, ivreghdfe or ppmlhdfe.
keepsingletons          keep singleton groups. Only useful when using reghdfe.
controlvariablebygroup  treat the labels of control variables in the configuration file as combinations of groups, instead of each label indicating a distinct group. See the example above to better understand the difference.

Update log

2024-03-31:

  • Add support for ppmlhdfe. Simply use specurve ..., cmd(ppmlhdfe); no other changes are needed. Note, however, that this version does not support the exposure and offset options of ppmlhdfe.
  • Thanks to Leonhard Friedel from WHU Otto Beisheim School of Management for suggesting the features.

2024-03-07:

  • Fix a bug where options noci99 and noci95 had no effect.

2024-03-03:

  • Add options noci99 and noci95 to hide the 99% and 95% confidence intervals, respectively.
  • Thanks to Kausik Chaudhuri from University of Leeds for suggesting the feature.

2024-02-18:

  • Allow labels in "control variables" to indicate combinations of variables, instead of each label indicating a unique specification of control variables. Fix #2.
  • Improve the legend. Now it shows point estimates of different significance levels.
  • Fix a typo in the help file.
  • Thanks to Leonhard Friedel from WHU Otto Beisheim School of Management for suggesting the features.

2024-01-31:

  • Add options to individually control what (not) to display in the lower panel. For example, nodependent turns off the display of dependent variable.
  • Thanks to Victor van Pelt from WHU Otto Beisheim School of Management for suggesting the feature.

2023-10-27:

  • Fix a bug about y-axis labels display due to early rounding.

2023-10-26:

  • Added options ymin and ymax to manually specify the range of the y-axis.
  • Thanks to Jonas Happel from Frankfurt School of Finance & Management for suggesting the feature.

2023-07-02:

  • Added an option to turn off benchmark line.
  • Added an option to decide the number of ticks and labels on the y-axis.
  • Thanks to John Iselin from University of Maryland for suggesting the features.

2023-06-22:

  • Added a dependency check for reghdfe and ivreghdfe.
  • Thanks to Brittany O'Duffy from Oxford Internet Institute at University of Oxford for identifying the bug.

2023-04-22:

  • Fix a bug that mistakes the hashtag in Stata interactions (e.g., var1#var2) for inline comments.
  • Thanks to Kenneth Shores from University of Delaware for identifying the bug and suggesting solutions.

2023-04-03:

  • Preserve the order of choices in each group as specified in the configuration file.
  • Allow no conditions specified in the configuration file.
  • Thanks to Christopher Whaley from RAND Corp for suggesting the improvement.

2023-04-02:

  • Allow keepsingletons option for reghdfe.
  • Thanks to Ken P.Y. Wang from National Taiwan University for suggesting the improvement.

2023-02-13:

  • Remove Python dependencies.
  • Thanks to Germán Guerra from the National Institute of Public Health (Mexico) and Kausik Chaudhuri from University of Leeds for numerous installation tests, which ultimately led me to rewrite specurve in pure Stata.

Troubleshooting

  • When following the help file, Stata reports error "file example_config_nlswork_1.yml could not be opened".

This is usually a permission error: Stata does not have write permission to the current working directory, so it cannot download the example configuration file. You can solve it by changing the working directory to one where you have write access.
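For example, you can check the current directory with pwd, switch to a writable folder, and download the example configuration file again. The sketch below uses Stata's temporary directory, c(tmpdir), which is normally writable. That choice is an assumption, and any folder you can write to works equally well. The file name matches the one used in the reghdfe examples above.

```stata
* Show the current working directory
pwd
* Switch to Stata's temporary directory (one writable location; any
* folder you have write access to will do)
cd "`c(tmpdir)'"
* Download the example configuration file again
copy "https://mingze-gao.com/specurve/example_config_nlswork_reghdfe.yml" ., replace
```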

Thanks to

Uri Simonsohn, Joseph Simmons and Leif D. Nelson for their paper "Specification Curve Analysis" (Nat Hum Behav, 2020, and the earlier working paper), which first proposed the specification curve.

Rawley Heimer from Boston College who visited our discipline in 2019 and introduced the Specification Curve Analysis to us in the seminar on research methods.

Martin Andresen from University of Oslo, who wrote speccurve, and Hans Sievertsen from University of Bristol, who wrote a speccurve demo.

Note

This Stata command was originally developed while I was writing my first paper, Gao, M., Leung, H., & Qiu, B. (2021) "Organization capital and executive performance incentives", published in the Journal of Banking & Finance.

The earlier version depended on Stata 16's Python integration and a range of external Python packages, which caused many compatibility issues.

The current version removes the Python dependency and implements everything in Stata Mata, from parsing the configuration file and composing specifications to estimating models and plotting the specification curve.

If there's any issue (likely), please contact me at [email protected]

Used in

As far as I know, specurve is used in


specurve's Issues

Problem

What can I do to solve the following error?

. specurve using appliaction.yml
File "", line 1
(results_file=)
^
SyntaxError: invalid syntax
(296 lines skipped)
(error occurred while loading specurve.ado)
r(7102);

And this is my configuration file:

# Each choice is independent from others
Choices:
  Dependent Variable: # reserved keyword
    - Number of retweets: y
  Focal Variable: # reserved keyword
    - Delay: delay
  Control Variables: # reserved keyword
    - Baseline1: a1 a2 a3 s1 s2 lnfol add
    - Baseline2: a1 a2 a3 s1 s2 lnfol add gen
    - Baseline3: a1 a2 a3 s1 s2 lnfol add gen comm
    - Baseline3: a1 a2 a3 s1 s2 lnfol add gen comm comp

Combination of controls

Currently, when making specifications, specurve chooses one option from each choice group (e.g., a particular set of control variables, a particular fixed effects choice, a particular standard error clustering method, etc.). This is intentional.
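Because one choice is drawn from each group, the total number of specifications is the product of the group sizes. The sketch below is a quick way to check this before running specurve. The group sizes are hypothetical placeholders, so replace them with the number of choices in each group of your own configuration file.

```stata
* Sketch: number of specifications when one choice is taken from each group.
* Hypothetical group sizes, in order: dependent variables, focal variables,
* control-variable sets, fixed-effects choices, clustering choices.
local sizes "1 1 4 2 2"
local nspec = 1
foreach s of local sizes {
    local nspec = `nspec' * `s'
}
display "`nspec' specifications"
```

This multiplicative growth is also why combining control sets within a group, as discussed next, quickly inflates the number of rows in the lower panel.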

However, in some cases we may only be interested in combining control variables within the "control variables" group. For example, suppose we run 64 regressions with one dependent and one independent variable, the same fixed effects, and so on, and we have 7 sets of controls. One of these sets must always be included, while for the other 6 we want all combinations, giving 2^6 = 64 regressions. Technically, we need only 7 lines of controls, each with an indicator marking whether that set is used. However, if we list all 64 combinations, the graph becomes too messy, as shown below:

[Figure: specification curve listing all 64 control-variable combinations]

A temporary "fix" is to use the relativesize option to shrink the upper panel. This may produce a better-looking figure but does not solve the issue: we still have 64 lines of labels for the control variables.


I don't yet have a good solution for this. Because the program was initially written to choose one option from each choice group, it does not generalize easily to combinations within a group. Adding to the difficulty, this requires a more complicated YAML configuration file. For example, one possible solution is to have

Choices:
  Control Variables - Always:
    - Baseline: age grade collgrad wks_ue ttl_exp 
  Control Variables - Optional: # cause 2^n combinations, n is number of choices in the group
    - Union: union
    - Other: hours wks_work

But for compatibility, I need to evaluate its impact on the existing "Control Variables" group, where the normal choose-one rule applies. Alternatively, I can allow for extra instructions in comments such as:

Choices:
  Control Variables:
    - Baseline: age grade collgrad wks_ue ttl_exp # @always
    - Union: union # @for_combination
    - Other: hours wks_work # @for_combination

This latter approach may work better, but it would also require significant refactoring of the existing code.

Unfortunately, I don't have time now to rewrite the code. Reintroducing the Python dependency may solve the problem quite easily, but I'll probably stay with a pure Stata implementation. Let me think more about this issue and possible solutions.
