reghdfe is a Stata package that estimates linear regressions with multiple levels of fixed effects. It works as a generalization of the built-in areg, xtreg,fe and xtivreg,fe regression commands. Its objectives are similar to those of the R package lfe by Simen Gaure. Its features include:
- A novel and robust algorithm that efficiently absorbs multiple fixed effects. It improves on the work of Abowd et al. (2002), Guimaraes and Portugal (2010), and Simen Gaure (2013). This algorithm works particularly well on "hard cases" that converge very slowly (or fail to converge) with the existing algorithms.
- Extremely fast compared to similar Stata programs.
  - With one fixed effect and clustered standard errors, it is 3-4 times faster than areg and xtreg,fe (see benchmarks). Note: speed improvements in Stata 14 have reduced this gap.
  - With multiple fixed effects, it is at least an order of magnitude faster than the alternatives (reg2hdfe, a2reg, felsdvreg, res2fe, etc.). Note: a recent paper by Somaini and Wolak, 2015 reported that res2fe was faster than reghdfe in some scenarios (namely, with only two fixed effects, where the second fixed effect was low-dimensional). This is no longer correct for the current version of reghdfe, which outperforms res2fe even on the authors' benchmark (with a low-dimensional second fixed effect; see the benchmark results and the Stata code).
- Allows two- and multi-way clustering of standard errors, as described in Cameron et al (2011)
- Allows an extensive list of robust variance estimators (thanks to the avar package by Kit Baum and Mark Schaffer).
- Works with instrumental-variable and GMM estimators (such as two-step-GMM, LIML, etc.) thanks to the ivreg2 routine by Baum, Schaffer and Stillman.
- Allows multiple heterogeneous slopes (e.g. a separate slope coefficient for each individual).
- Supports all standard Stata features:
  - Frequency, probability, and analytic weights.
  - Time-series and factor variables.
  - Fixed effects and cluster variables can be expressed as factor interactions, for both convenience and speed (e.g. directly using state#year instead of previously using egen group to generate the state-year combination).
  - Postestimation commands such as predict and test.
- Allows precomputing results with the cache() option, so subsequent regressions are faster.
- If requested, saves the point estimates of the fixed effects (caveat emptor: these fixed effects may be neither consistent nor identifiable; see the Abowd paper for an introduction to the topic).
- Calculates the degrees-of-freedom lost due to the fixed effects (beyond two levels of fixed effects this is still an open problem, but we provide a conservative upper bound).
- Avoids common pitfalls, by excluding singleton groups (see notes), computing correct within- adjusted-R-squares (see initial discussion), etc.
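A few of the features listed above can be tried with a short session. This is only a minimal sketch: it assumes reghdfe is already installed, and it uses Stata's bundled auto dataset with arbitrarily chosen variables, none of which come from this readme.

```stata
* Illustrative sketch only: the auto dataset and the variable choices are
* assumptions made for this example.
sysuse auto, clear

* Two sets of fixed effects (turn and rep78), standard errors clustered on turn
reghdfe price weight length, absorb(turn rep78) vce(cluster turn)

* A fixed effect written directly as a factor interaction, rather than as a
* variable built beforehand with egen group()
reghdfe price weight length, absorb(foreign#rep78)
```

The second call absorbs one fixed effect per foreign-by-rep78 cell without creating a new variable first, which is the convenience the factor-interaction feature refers to.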
Sergio Correia
Fuqua School of Business, Duke University
Email: [email protected]
This package wouldn't have existed without the invaluable feedback and contributions of Paulo Guimaraes, Amine Ouazad, Mark Schaffer and Kit Baum. Also invaluable are the great bug-spotting abilities of many users.
reghdfe is a free contribution to the research community, like a paper. Please cite it as such:
Sergio Correia, 2015. reghdfe: Stata module for linear and instrumental-variable/GMM regression absorbing multiple levels of fixed effects.
This is the readme file for developing the reghdfe project, which comprises the reghdfe package and the underlying hdfe package. The help files and tutorials are available here (work in progress).
Latest version
- Version 3.2.1
- Date: July 25, 2015
The latest stable release (version 3.2.1) can be downloaded from SSC with
cap ado uninstall reghdfe
ssc install reghdfe
The latest dev. release (3.2.x) can be installed with
cap ado uninstall reghdfe
net install reghdfe, from(http://scorreia.com/stata/reghdfe)
It can also be installed manually:
- Download the zipfile
- Extract it into a folder (e.g. C:\SOMEFOLDER)
- Run the following, replacing SOMEFOLDER with the folder you picked:
cap ado uninstall reghdfe
net install reghdfe, from("C:\SOMEFOLDER")
To find out which version you have installed, type reghdfe, version.
hdfe is a routine that facilitates absorbing multiple fixed effects in other Stata packages. It is similar to avar in that it is a building-block routine that other packages may call (for instance, see regife and poi2hdfe).
The latest stable release (version 3.2.1) can be downloaded from SSC with
cap ado uninstall hdfe
ssc install hdfe
The latest dev. release (3.2.x) can be installed with
cap ado uninstall hdfe
net install hdfe, from(http://scorreia.com/stata/reghdfe)
It can also be installed manually:
- Download the zipfile
- Extract it into a folder (e.g. C:\SOMEFOLDER)
- Run the following, replacing SOMEFOLDER with the folder you picked:
cap ado uninstall hdfe
net install hdfe, from("C:\SOMEFOLDER")
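Once hdfe is installed, the building-block idea can be illustrated as follows. This is a hedged sketch, not a reference: the generate() option, the RESID_ stub, and the resulting variable names are assumptions that should be checked against help hdfe, and the auto dataset is used only for illustration.

```stata
* Illustrative sketch only: option names and the RESID_ stub are assumptions
* to verify with -help hdfe-.
sysuse auto, clear

* Partial out the turn and trunk fixed effects from each listed variable,
* storing the transformed copies under the (hypothetical) RESID_ prefix
hdfe price weight length, absorb(turn trunk) generate(RESID_)

* A calling package could now run its own estimator on the transformed data;
* plain OLS is used here as a stand-in
regress RESID_price RESID_weight RESID_length
```

The slope estimates from the last step should match those of a regression that absorbs the same fixed effects, but the standard errors will not, since regress does not know how many degrees of freedom the fixed effects used up.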
- 3.2 Fixed bug where a slopes-only model (i.e. no constant or intercepts) returned incorrect alphas (estimates for the fixed effects). Note that the estimates for the betas were unaffected. Thanks to Matthieu Gomez for the bug report
- 3.1 Improved syntax for the cache() and stage() options
- 3.0 Three key changes: i) faster underlying algorithm (symmetric transforms and cg acceleration perform much better on "hard" cases); ii) slow parts rewritten in mata; iii) simpler syntax
- 2.2 [internal] murphy-topel (unadjusted, robust, cluster), double-or-nothing IV/control function
- 2.1 removed _cons. If you really want to see the constant, run summarize on the first fixed effect. The last version that supported constants is available with net from https://raw.githubusercontent.com/sergiocorreia/reghdfe/866f85551b77fe7fda2af0aafccbbf87f8a01987/package/
- 4.0 Improve underlying algorithm with GT preconditioning
- 5.0 Increase features for recovering the fixed effects. For instance, bootstrapping the standard errors, a better algorithm (Kaczmarz) for recovering the point estimates, and a wider set of statistics for the standard errors. If you currently require any of those, I recommend the lfe package by Simen Gaure (for R users) and the reg package by Matthieu Gomez (for Julia users).
- 6.0 Additional variance-covariance estimators. In particular, Conley spatial HAC (http://freigeist.devmag.net/category/economics/econometrics) and Cattaneo-Jansson-Newey heteroskedasticity-and-many-covariates robust errors (similar to vce(robust) but correcting for the fact that the number of covariates is increasing asymptotically, which solves Stock and Watson's critique).
Contributors and pull requests are more than welcome. There are a number of extension possibilities, such as estimating standard errors for the fixed effects using bootstrapping, exact computation of degrees-of-freedom for more than two HDFEs, and further improvements in the underlying algorithm.
For clarity, the source code is spread across several files and folders (in the source folder). To modify and rebuild the package, do the following:
- Download the entire project to your computer (through the "Clone Desktop" or "Download ZIP" buttons on the right).
- Uninstall any existing versions of reghdfe (ado uninstall reghdfe in Stata).
- Make any changes you want to the files in that folder. You can run reghdfe without problems as long as the working directory is in that folder.
- To build the package, run the build.py file (in the build folder), using Python 3.x. This Python script will carefully combine all the files and update the version/date.
- Install it using net install reghdfe, from(PATH_OF_THE_PACKAGE_FOLDER)
- Finally, you can upload it back to GitHub and submit a pull request.