Replicates the results of nowcasting housing sales from Google queries, using a Bayesian structural time-series model (Choi & Varian, 2009, 2012).
- Scott, S. L., & Varian, H. R. (2014). "Predicting the present with Bayesian structural time series". International Journal of Mathematical Modelling and Numerical Optimisation, 5(1-2), 4-23.
- Scott, S. L., & Varian, H. R. (2015). "Bayesian variable selection for nowcasting economic time series". In Economic analysis of the digital economy (pp. 119-135). University of Chicago Press.
- [More intuitive] Varian, H. R. (2014). "Big data: New tricks for econometrics". Journal of Economic Perspectives, 28(2), 3-28.
Nowcasting - Current values of a series such as housing sales are usually published with a lag, so timely estimates are needed. This motivates using Google queries, which are available in nearly real time, as potential predictors. With Google Correlate, we can retrieve the hundred search keywords most correlated with our target time series (housing sales).
The model decomposes the target time series into different components: i) time components (trend, seasonality, etc.); ii) a regression component (Google predictors).
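To make the idea of picking the most correlated queries concrete, here is a minimal sketch in Python with NumPy. It is not how Google Correlate works internally; it only ranks candidate series by Pearson correlation with the target. The function name `rank_by_correlation`, the query names, and the synthetic data are all hypothetical stand-ins.

```python
import numpy as np

def rank_by_correlation(target, candidates, names, top_k=3):
    """Rank candidate series by absolute Pearson correlation with the target.

    candidates has shape (n_obs, n_series); names labels its columns.
    """
    target = np.asarray(target, dtype=float)
    candidates = np.asarray(candidates, dtype=float)
    t = target - target.mean()
    c = candidates - candidates.mean(axis=0)
    corr = (c.T @ t) / (np.linalg.norm(c, axis=0) * np.linalg.norm(t))
    order = np.argsort(-np.abs(corr))[:top_k]
    return [(names[i], float(corr[i])) for i in order]

# Synthetic stand-in for monthly housing sales and three query series
# (assumption: real data would come from the sales and query sources).
rng = np.random.default_rng(0)
n = 120
sales = np.sin(np.arange(n) * 2 * np.pi / 12) + 0.1 * rng.standard_normal(n)
queries = np.column_stack([
    sales + 0.2 * rng.standard_normal(n),    # strongly related query
    rng.standard_normal(n),                  # unrelated noise
    -sales + 0.5 * rng.standard_normal(n),   # negatively related query
])
ranking = rank_by_correlation(
    sales, queries, ["query_a", "query_b", "query_c"], top_k=2
)
print(ranking)
```

Ranking by absolute correlation is a choice made for this sketch: it keeps strongly negative predictors, which a regression can still use.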
- Structural Time-series model (Kalman Filter) for time components
- Spike-and-Slab Regression for regression components
- Markov Chain Monte Carlo simulation
This method enables us to decompose the time series and analyse the contribution of each component to the target time series.
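As an illustration of the first item only, the following is a minimal sketch of a Kalman filter for the simplest structural model, a local level (random-walk level plus observation noise). It assumes the two variances are known and fixed, whereas in the full Bayesian model they would be sampled by MCMC. The function name `local_level_filter` and the synthetic series are hypothetical.

```python
import numpy as np

def local_level_filter(y, obs_var, level_var, init_mean=0.0, init_var=1e6):
    """Kalman filter for y_t = mu_t + eps_t, mu_t = mu_{t-1} + eta_t.

    obs_var is Var(eps_t) and level_var is Var(eta_t); both are assumed known.
    Returns the filtered mean of mu_t given y_1..y_t.
    """
    mean, var = init_mean, init_var
    filtered = np.empty(len(y))
    for t, obs in enumerate(y):
        # Predict: a random-walk level keeps its mean, its variance grows.
        var = var + level_var
        # Update: blend prediction and observation using the Kalman gain.
        gain = var / (var + obs_var)
        mean = mean + gain * (obs - mean)
        var = (1.0 - gain) * var
        filtered[t] = mean
    return filtered

# Synthetic example: a slowly drifting level observed with noise.
rng = np.random.default_rng(1)
true_level = 10 + np.cumsum(0.1 * rng.standard_normal(200))
y = true_level + rng.standard_normal(200)
filtered = local_level_filter(y, obs_var=1.0, level_var=0.01)
```

The large `init_var` acts as a diffuse prior, so the first filtered value is essentially the first observation.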
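For reference, the decomposition can be written in the form commonly used for a basic structural model with a regression component. The notation below is an assumption of this note, not taken from the code in this repository: $\mu_t$ is the trend level, $\delta_t$ its slope, $\tau_t$ the seasonal effect with $S$ seasons, and $x_t$ the vector of Google predictors.

```latex
\begin{aligned}
y_t      &= \mu_t + \tau_t + \beta^{\top} x_t + \varepsilon_t, \\
\mu_t    &= \mu_{t-1} + \delta_{t-1} + u_t, \\
\delta_t &= \delta_{t-1} + v_t, \\
\tau_t   &= -\sum_{s=1}^{S-1} \tau_{t-s} + w_t,
\end{aligned}
```

Here $\varepsilon_t$, $u_t$, $v_t$ and $w_t$ are independent Gaussian errors, and the contribution of each component is the corresponding term in the first equation.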
Incremental fit plot of housing sales, obtained by adding in turn:
- Trend
- Seasonality
- First and second most important Google keywords
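The idea behind an incremental fit can be sketched as follows: add the components one at a time and track how the error of the partial fit shrinks. This is only an illustration on synthetic data, using mean absolute error as the measure. The function name `incremental_fit`, the component names, and the component values are hypothetical, not output of the fitted model.

```python
import numpy as np

def incremental_fit(y, contributions):
    """Add component contributions in order; report MAE after each addition.

    contributions maps a component name to its array of contributions,
    in the order the components should be added.
    """
    y = np.asarray(y, dtype=float)
    fit = np.zeros_like(y)
    results = []
    for name, contrib in contributions.items():
        fit = fit + np.asarray(contrib, dtype=float)
        results.append((name, float(np.mean(np.abs(y - fit)))))
    return results

# Synthetic components (assumption: in practice these would be the
# posterior mean contributions of each model component).
rng = np.random.default_rng(2)
n = 120
components = {
    "trend": np.linspace(10.0, 12.0, n),
    "seasonality": np.sin(np.arange(n) * 2 * np.pi / 12),
    "keyword_1": 0.5 * rng.standard_normal(n),
    "keyword_2": 0.2 * rng.standard_normal(n),
}
y = sum(components.values()) + 0.05 * rng.standard_normal(n)

for name, mae in incremental_fit(y, components):
    print(f"after adding {name}: MAE = {mae:.3f}")
```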
One should bear in mind that this data is high-dimensional. Not all Google queries are meaningful predictors, so we need a mechanism for variable selection; the spike-and-slab approach is used here. Predictors with higher inclusion probabilities are more important.
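An inclusion probability can be read off the MCMC output as the share of draws in which a predictor's coefficient is nonzero. The sketch below assumes the draws are available as an array with exact zeros for excluded predictors. The function name `inclusion_probabilities`, the query names, and the tiny hand-made array of draws are hypothetical.

```python
import numpy as np

def inclusion_probabilities(beta_draws, names):
    """Share of MCMC draws in which each coefficient is nonzero.

    beta_draws has shape (n_draws, n_predictors); a coefficient of exactly
    zero means the predictor was excluded in that draw.
    """
    beta_draws = np.asarray(beta_draws, dtype=float)
    probs = (beta_draws != 0).mean(axis=0)
    order = np.argsort(-probs, kind="stable")
    return [(names[i], float(probs[i])) for i in order]

# Four illustrative draws for three predictors (made-up values).
draws = [
    [0.8, 0.0, 0.0],
    [0.7, 0.0, 0.1],
    [0.9, 0.0, 0.2],
    [0.6, 0.2, 0.0],
]
result = inclusion_probabilities(draws, ["query_a", "query_b", "query_c"])
print(result)
# [('query_a', 1.0), ('query_c', 0.5), ('query_b', 0.25)]
```

With real output there would be thousands of draws, and it is usual to discard an initial burn-in portion before computing these shares.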