fable's Introduction

fable

The R package fable provides a collection of commonly used univariate and multivariate time series forecasting models including exponential smoothing via state space models and automatic ARIMA modelling. These models work within the fable framework, which provides the tools to evaluate, visualise, and combine models in a workflow consistent with the tidyverse.

Installation

You can install the stable version from CRAN:

install.packages("fable")

You can install the development version from GitHub:

# install.packages("remotes")
remotes::install_github("tidyverts/fable")

Installing this software requires a compiler.

Example

library(fable)
library(tsibble)
library(tsibbledata)
library(lubridate)
library(dplyr)
aus_retail %>%
  filter(
    State %in% c("New South Wales", "Victoria"),
    Industry == "Department stores"
  ) %>% 
  model(
    ets = ETS(box_cox(Turnover, 0.3)),
    arima = ARIMA(log(Turnover)),
    snaive = SNAIVE(Turnover)
  ) %>%
  forecast(h = "2 years") %>% 
  autoplot(filter(aus_retail, year(Month) > 2010), level = NULL)

Learning to forecast with fable

  • The pkgdown site describes all models provided by fable, and how they are used: https://fable.tidyverts.org/
  • The online textbook Forecasting: Principles and Practice provides an introduction to time series forecasting using fable: https://otexts.com/fpp3/ (WIP)

Getting help

  • Questions about forecasting can be asked on Cross Validated.

  • Answers to common questions about the fable package can often be found on Stack Overflow. You can also ask for help there if your question isn’t already answered. A minimal reproducible example that describes your issue is the best way to ask for help!

fable's Issues

summary.fable error

This no longer works:

fpp2::auscafe %>% 
  as_tsibble() %>%
  ETS(value) %>% 
  forecast() %>%
  summary()

Partially consistent to tidymodels

I think that some tidy modelling principles mentioned here are applicable to fablelite and fable.

At the top-level interface, some arguments could be renamed to follow the tidymodels convention (snake_case for arguments). Users would then not need to memorise two sets of arguments between tidyverts and tidymodels.

fable installation fails on Windows 10

Issue: fable installation fails due to issues with tsibblestats
Setup: Windows 10, Microsoft R 3.5.0, RTools 3.5.0
Steps to reproduce: devtools::install_github("tidyverts/fable")

Output:

> devtools::install_github("tidyverts/fable")
Downloading GitHub repo tidyverts/fable@master
from URL https://api.github.com/repos/tidyverts/fable/zipball/master
Installing fable
Downloading GitHub repo tidyverse/ggplot2@master
from URL https://api.github.com/repos/tidyverse/ggplot2/zipball/master
Installing ggplot2
"C:/PROGRA~1/MICROS~2/ROPEN~1/R-35~1.0/bin/x64/R" --no-site-file --no-environ --no-save --no-restore  \
  --quiet CMD INSTALL  \
  "C:/Users/xxxx/AppData/Local/Temp/RtmpymMFFc/devtools37d0121b6723/tidyverse-ggplot2-79e8b45"  \
  --library="C:/Users/xxxx/Documents/R/win-library/3.5" --install-tests 

* installing *source* package 'ggplot2' ...
** R
** data
*** moving datasets to lazyload DB
** inst
** tests
** byte-compile and prepare package for lazy loading
Error in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]) : 
  namespace 'rlang' 0.2.0 is being loaded, but >= 0.2.1 is required
ERROR: lazy loading failed for package 'ggplot2'
* removing 'C:/Users/xxxx/Documents/R/win-library/3.5/ggplot2'
* restoring previous 'C:/Users/xxxx/Documents/R/win-library/3.5/ggplot2'
In R CMD INSTALL
Installation failed: Command failed (1)
Downloading GitHub repo tidyverts/tsibblestats@master
from URL https://api.github.com/repos/tidyverts/tsibblestats/zipball/master
Installing tsibblestats
Downloading GitHub repo tidyverse/ggplot2@master
from URL https://api.github.com/repos/tidyverse/ggplot2/zipball/master
Skipping ggplot2, it is already being installed.
"C:/PROGRA~1/MICROS~2/ROPEN~1/R-35~1.0/bin/x64/R" --no-site-file --no-environ --no-save --no-restore  \
  --quiet CMD INSTALL  \
  "C:/Users/xxxx/AppData/Local/Temp/RtmpymMFFc/devtools37d067f17852/tidyverts-tsibblestats-ff2ed9d"  \
  --library="C:/Users/xxxx/Documents/R/win-library/3.5" --install-tests 

* installing *source* package 'tsibblestats' ...
** R
** byte-compile and prepare package for lazy loading
Error : object 'autolayer' is not exported by 'namespace:ggplot2'
ERROR: lazy loading failed for package 'tsibblestats'
* removing 'C:/Users/xxxx/Documents/R/win-library/3.5/tsibblestats'
In R CMD INSTALL
Installation failed: Command failed (1)
"C:/PROGRA~1/MICROS~2/ROPEN~1/R-35~1.0/bin/x64/R" --no-site-file --no-environ --no-save --no-restore  \
  --quiet CMD INSTALL  \
  "C:/Users/xxxx/AppData/Local/Temp/RtmpymMFFc/devtools37d044ca70cb/tidyverts-fable-d852dd8"  \
  --library="C:/Users/xxxx/Documents/R/win-library/3.5" --install-tests 

ERROR: dependency 'tsibblestats' is not available for package 'fable'
* removing 'C:/Users/xxxx/Documents/R/win-library/3.5/fable'
In R CMD INSTALL
Installation failed: Command failed (1)

NSE with dynamic model specification and automatic modelling

There is a conflict of intention between dynamic model creation and automatic forecasting:

library(fable)
value <- as.formula('log(value) ~ pdq(0,1,1) + PDQ(0,1,1)')
USAccDeaths %>% ARIMA(log(value) ~ pdq(0,1,1) + PDQ(0,1,1))
USAccDeaths %>% ARIMA(value)

For the second call to ARIMA, it is unclear whether value should refer to the formula defined above or to the value variable contained in the dataset (as_tsibble(USAccDeaths)).

First pointed out in this issue: tidyverts/fasster#28

Possible solution: attempt evaluation of the formula, if it is a formula, do nothing. If it is anything else, do automatic modelling with the input treated as an expression.
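
The possible solution above can be sketched in base R. This is only an illustration of the dispatch logic, not fable's implementation: resolve_spec() is a hypothetical helper, and the "formula" / "automatic" labels are made up for the example.

```r
# Hypothetical helper (not part of fable): decide how a user-supplied
# model specification should be interpreted.
resolve_spec <- function(spec, env = parent.frame()) {
  expr <- substitute(spec)
  # Attempt evaluation in the caller's environment; failures become NULL.
  value <- tryCatch(eval(expr, env), error = function(e) NULL)
  if (inherits(value, "formula")) {
    # The input is (or evaluates to) a formula: use it unchanged.
    list(type = "formula", spec = value)
  } else {
    # Anything else: keep the unevaluated expression for automatic modelling.
    list(type = "automatic", spec = expr)
  }
}

value <- as.formula("log(value) ~ pdq(0,1,1) + PDQ(0,1,1)")
resolve_spec(value)$type                        # formula stored in a variable
resolve_spec(log(value) ~ pdq(0, 1, 1))$type    # formula written inline
resolve_spec(log(y))$type                       # plain expression
```

Under this rule a formula stored in a variable takes priority over a dataset column of the same name, which is one way (among several) to resolve the ambiguity.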

Reconciliation API

A function aggregate is used to provide aggregation structures to the tsibble, which can then be modelled and forecasted univariately. The univariate forecasts can then be reconciled according to the aggregation structure using reconcile.

tsbl %>% 
  aggregate %>% # Name TBD
  model %>% 
  forecast %>% 
  reconcile

[question] Rolling Forecasts

How would I implement rolling forecasts in fable? Specifically, I want one-step forecasts with re-estimation.

The following is a simple example using a random walk forecast (RW()).

library(tsibbledata)
library(fable)
library(forecast)
small_ed <- elecdemand %>% slice(1:5) 

small_ed %>% 
  slice(1:3) %>% 
  fable::RW(Demand) %>% 
  forecast(h = 2) %>% 
  summary %>% 
  right_join(small_ed) %>% 
  select(index, mean, Demand)

This gives the following:

# A tsibble: 5 x 3 [30MINUTE]
  index                mean Demand
  <dttm>              <dbl>  <dbl>
1 2014-01-01 00:00:00 NA      3.91
2 2014-01-01 00:30:00 NA      3.67
3 2014-01-01 01:00:00 NA      3.50
4 2014-01-01 01:30:00  3.50   3.34
5 2014-01-01 02:00:00  3.50   3.20

In a rolling forecast setting, however, the last forecast (the mean for 02:00) would be 3.34 rather than 3.50.

Edit:

I realize that this example can be solved with lag(), but if I were to use an ARIMA model or some other alternative it would be nice to have a solution that would extend to other models.
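
A minimal base R sketch of what is being asked for, independent of fable: the model is re-estimated at every forecast origin and only the one-step forecast is kept. rolling_one_step(), rw_forecast() and arima_forecast() are hypothetical names introduced for illustration, and the ARIMA(0,1,1) order is an arbitrary choice.

```r
# Rolling-origin one-step forecasts with re-estimation at every origin.
# `fit_forecast` is any function that takes the training values and
# returns a single one-step-ahead forecast.
rolling_one_step <- function(y, init, fit_forecast) {
  n <- length(y)
  stopifnot(init >= 1, init < n)
  fc <- rep(NA_real_, n)
  for (t in seq(init, n - 1)) {
    # Train on observations 1..t, forecast observation t + 1.
    fc[t + 1] <- fit_forecast(y[seq_len(t)])
  }
  fc
}

# Random walk: the forecast is the last observed value.
rw_forecast <- function(train) train[length(train)]

# ARIMA(0,1,1), re-estimated on each training window.
arima_forecast <- function(train) {
  fit <- stats::arima(train, order = c(0, 1, 1))
  as.numeric(predict(fit, n.ahead = 1)$pred)
}

# The five Demand values from the question, with the first three as the
# initial training window.
demand <- c(3.91, 3.67, 3.50, 3.34, 3.20)
data.frame(Demand = demand,
           mean = rolling_one_step(demand, init = 3, rw_forecast))

# The same loop with a re-estimated ARIMA model on a longer series.
deaths <- as.numeric(USAccDeaths)
arima_fc <- rolling_one_step(deaths, init = 60, arima_forecast)
```

Because the model is passed in as a function, the same loop extends to other models by swapping fit_forecast.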

Model checklist

Models

Arima

  • Functional
  • Reimplement
  • Methods

ETS

  • Functional
  • Reimplement
  • Methods

naive / rwf / snaive

  • Functional
  • Reimplement
  • Methods

tslm

  • Functional
  • Reimplement
  • Methods

BATS/TBATS

  • Functional
  • Reimplement
  • Methods

baggedModel (tidyverts/fabletools#217)

  • Functional
  • Reimplement
  • Methods

dshw

  • Functional
  • Reimplement
  • Methods

croston

  • Functional
  • Reimplement
  • Methods

stlm

  • Functional
  • Reimplement
  • Methods

meanf

  • Functional
  • Reimplement
  • Methods

thetaf

  • Functional
  • Reimplement
  • Methods

modelAR

  • Functional
  • Reimplement
  • Methods

nnetar

  • Functional
  • Reimplement
  • Methods

arfima

  • Functional
  • Reimplement
  • Methods

splinef

  • Functional
  • Reimplement
  • Methods

Wishlist

VAR

  • Functional
  • Reimplement
  • Methods

VARMA

  • Functional
  • Reimplement
  • Methods

Fix object sizes of environments

It seems that the environments of functions are storing too much information (such as the input dataset), which is inflating the object size.

ACF, PACF, CCF

A summary of some thoughts with @earowang...

  1. Should the correlation family of functions exist within tidyforecast, tsibble, or some other tidystats package?
  2. Is it possible to structure the ACF output in a way suitable for a tsibble? Does it make sense to have the ACF lag as the tsibble's index? Perhaps tsibble should support some form of interval/period as the index.

Workflow Using Fable

I really enjoy the idea of fable. I am a fan of tidyverse tools and am happy to see the tidyverts working to bring tidy tools to forecasting.

I am struggling with making a workflow that is analogous to the workflow I would normally use in a non-forecast setting. I understand that forecasting is a different animal so exact translation may not be possible, but I would like your suggestions on how to approach the grouped time-series forecast problem set-up below.

I think that this is best explained by example, but essentially I am looking for convenient ways to view the following:

  1. Model Coefficients
  2. Model metrics such as AIC/BIC
  3. Predicted values on test set
  4. Evaluation of forecast accuracy

Example in a non-forecast setting

library(dplyr)
library(tidyr)
library(purrr)
library(broom)
library(modelr)

# data set up: randomly assign roughly 85% of rows to the training set
my_iris <- iris %>% 
  mutate(train_test = ifelse(rbinom(n = n(), size = 1, prob = .85) == 1,
                             "train", "test")) 

#set up model function
model_by_group <- function(df){
  lm(Sepal.Length ~ Sepal.Width,data = df %>% filter(train_test == "train"))
}

#store model, preds, coeffs, and model metrics in one dataframe
model_df <- my_iris %>% 
  group_by(Species) %>% 
  nest() %>% 
  mutate(model     = map(data, model_by_group),
         pred      = map2(data, model, modelr::add_predictions),
         coeffs    = map(model, broom::tidy),
         glance    = map(model, broom::glance)
         )

#want to view model coeffs
model_df %>% 
  unnest(coeffs)

#want to view model metrics such as r.squared
model_df %>% 
  unnest(glance)

#want to view predicted values on test set
model_df %>% 
  unnest(pred) %>% 
  filter(train_test == "test")

#want avg absolute error
model_df %>% 
  unnest(pred) %>% 
  filter(train_test == "test") %>% 
  mutate(ae = abs(pred-Sepal.Length)) %>% 
  group_by(Species) %>% 
  summarise(mae = mean(ae))

Attempt in a forecast setting

library(fable)
library(tsibbledata)
library(purrr)

#set up model function
fcast_model <- function(df){
  df %>% filter(train_test == "train") %>% ETS(avg_temp) %>% forecast
}

#data set up
hottness <- nycflights13::weather %>% 
  select(origin, time_hour, temp, humid, precip) %>% 
  as_tsibble(key = id(origin), index = time_hour) %>% 
  index_by(year_month = yearmonth(time_hour)) %>% # monthly aggregates
  group_by(origin) %>% 
  summarise(avg_temp = mean(temp, na.rm = TRUE)) %>% 
  group_by(origin) %>% 
  # test on last 3 months
  mutate(train_test = ifelse(year_month > "2013-09-01", "test", "train"))

model_df <- hottness %>% 
  group_by(origin) %>% 
  nest() %>% #nest by origin
  mutate(fcast = map(data, fcast_model)) %>% 
  unnest(fcast) 

#want to view model coeffs

#want to view model metrics such as AIC/BIC

#want to view predicted values on test set

#want avg absolute error on test set
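
For comparison, here is a sketch of how the four items might be obtained, assuming a recent release of fable in which models are fitted with model() (as in the README example above) rather than by piping data straight into ETS(). It uses aus_retail instead of the weather data, and the train/test cut-off of December 2016 is an arbitrary choice for the example.

```r
library(fable)
library(tsibble)
library(tsibbledata)
library(dplyr)

# Two keyed series, as in the README example.
retail <- aus_retail %>%
  filter(
    State %in% c("New South Wales", "Victoria"),
    Industry == "Department stores"
  )

# Hold out the last two years (2017 and 2018) as a test set.
train <- retail %>% filter(Month <= yearmonth("2016 Dec"))

# One ETS model per key combination.
fit <- train %>% model(ets = ETS(Turnover))

# 1. Model coefficients
tidy(fit)

# 2. Model metrics such as AIC/BIC
glance(fit) %>% select(State, .model, AIC, BIC)

# 3. Predicted values on the test set
fc <- fit %>% forecast(h = "2 years")
fc

# 4. Forecast accuracy, computed against the full data (includes MAE)
accuracy(fc, retail)
```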

Implement refit()

Similarly to the model argument from forecast, refit() allows a model to be applied to new datasets.
A key argument for this functionality is reestimate, which defines if the model parameters should be re-estimated to suit the new dataset.

This function is closely linked with stream(). The key difference is that refit() does not condition on earlier data (replacing the dataset entirely), and stream() extends the model fit with the introduction of new future data.
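
The role of reestimate can be illustrated with base R alone. This is a sketch of the idea, not the proposed fable API: refit_arima() is a hypothetical name, and it relies on the fixed argument of stats::arima() to hold the parameters at their previous estimates.

```r
# Apply an existing ARIMA model to a new dataset.
# reestimate = FALSE keeps the old coefficients; TRUE estimates them again.
refit_arima <- function(fit, new_data, order, reestimate = FALSE) {
  if (reestimate) {
    stats::arima(new_data, order = order)
  } else {
    stats::arima(new_data, order = order,
                 fixed = coef(fit), transform.pars = FALSE)
  }
}

deaths <- as.numeric(USAccDeaths)
old_data <- deaths[1:48]
new_data <- deaths[25:72]

fit <- stats::arima(old_data, order = c(0, 1, 1))

kept    <- refit_arima(fit, new_data, order = c(0, 1, 1))
updated <- refit_arima(fit, new_data, order = c(0, 1, 1), reestimate = TRUE)

coef(fit)      # original estimate
coef(kept)     # same coefficient, applied to the new data
coef(updated)  # coefficient re-estimated on the new data
```

In both cases the new dataset replaces the old one entirely, which is the distinction drawn above between refit() and stream().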

Automatic response variable selection

If the data has only one column other than the index and keys, we should allow it to be the default column for modelling without giving a warning. If there is more than one non-index, non-key column, the modelling function should probably return an error when the formula is not specified, rather than picking the first such column.

@robjhyndman
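
The rule proposed in this issue can be sketched on a plain data frame. default_response() is a hypothetical helper written for illustration; in practice the index and key names would come from the tsibble itself.

```r
# Pick the response column only when the choice is unambiguous.
default_response <- function(data, index, key = character()) {
  measured <- setdiff(names(data), c(index, key))
  if (length(measured) == 1) {
    # Exactly one candidate: use it silently.
    return(measured)
  }
  stop(
    "Cannot choose a response automatically from ", length(measured),
    " candidate columns; please specify a formula.",
    call. = FALSE
  )
}

one_measure <- data.frame(Month = 1:3, State = "NSW", Turnover = c(10, 12, 11))
default_response(one_measure, index = "Month", key = "State")

two_measures <- cbind(one_measure, Employees = c(5, 5, 6))
try(default_response(two_measures, index = "Month", key = "State"))
```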

Better visualisation of multivariate ts/model/fc objects

Related issue: #25

For fable to be able to visualise multivariate forecasts, more thought is needed to adequately display enough information without visual clutter.

It may be necessary to limit the number of supported series/models/forecasts on a single plot.

ts_model not correctly displayed

Should show ETS models in model column

> fpp2::prisonLF %>%
+   mutate(qtr=yearquarter(t)) %>%
+   select(-t) %>%
+   as_tsibble(index=qtr, key=id(state,gender,legal)) %>%
+   ETS(count)
# A mable: 32 models [1QUARTER]
# Keys:    state, gender, legal [32]
   state gender legal     data               model         
   <fct> <fct>  <fct>     <list>             <list>        
 1 ACT   Female Remanded  <tsibble [48 × 2]> <S3: ts_model>
 2 ACT   Female Sentenced <tsibble [48 × 2]> <S3: ts_model>
 3 ACT   Male   Remanded  <tsibble [48 × 2]> <S3: ts_model>
 4 ACT   Male   Sentenced <tsibble [48 × 2]> <S3: ts_model>
 5 NSW   Female Remanded  <tsibble [48 × 2]> <S3: ts_model>
 6 NSW   Female Sentenced <tsibble [48 × 2]> <S3: ts_model>
 7 NSW   Male   Remanded  <tsibble [48 × 2]> <S3: ts_model>
 8 NSW   Male   Sentenced <tsibble [48 × 2]> <S3: ts_model>
 9 NT    Female Remanded  <tsibble [48 × 2]> <S3: ts_model>
10 NT    Female Sentenced <tsibble [48 × 2]> <S3: ts_model>
# ... with 22 more rows

Change name to fable

I think we can now do this. It would be nice to have a hex design before I talk about it in Boulder.

Write STL()

Should return a tsibble with decomposed components

... --- ...

ANOMALY DETECTED. ON OUR END FUNS ARE COMING THROUGH IN CAPS. CHECK CAPS LOCK KEY AND RE-TRANSMIT. CONFIRM?

STL() API suggestions

https://github.com/tidyverts/tidyforecast/blob/86ce90ecf550ecfb5ba7bd5d04b4c8f731eae3f4/R/stl.R#L16

  1. If the argument x takes a bare variable, it would be better to declare it as x with no default rather than x = NULL. Inside the function, rlang::quo_is_missing(enquo(x)) checks whether it is missing, which avoids unnecessary computation with eval_tidy().
  2. Usually x suggests a data frame or an object. I would prefer to name it value, taking a numeric variable.
  3. If x is not specified, the first variable is currently picked after removing the index. tsibble provides a convenient function, measured_vars(), to list all the measured variables.
  4. Does it only deal with a univariate time series? What about a tsibble with multiple series?
  5. seasonal.periods requires a vector of frequencies (like 24 for hourly, 365 for daily, etc.). This brings back the old style, in which users always struggle to find the appropriate time series frequencies. Can we improve it by accepting natural language such as c("hour", "day") and computing the frequencies inside the function?
  6. What if the tsibble already has columns called "Trend" and "Seasonal"? Are we going to overwrite them? Is there any way to let users name the "Seasonal", "Trend" and "Residuals" columns when calling the function?
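
Suggestion 5 could look something like the following base R sketch. period_to_frequency() is a hypothetical helper, and it assumes the observation interval is known in seconds and that only fixed-length periods (minute, hour, day, week) are needed.

```r
# Convert period names into the number of observations per period,
# given the interval between observations in seconds.
period_to_frequency <- function(periods, interval_seconds) {
  seconds <- c(minute = 60, hour = 3600, day = 86400, week = 604800)
  unknown <- setdiff(periods, names(seconds))
  if (length(unknown) > 0) {
    stop("Unknown period(s): ", paste(unknown, collapse = ", "), call. = FALSE)
  }
  seconds[periods] / interval_seconds
}

# Half-hourly data: 2 observations per hour, 48 per day.
period_to_frequency(c("hour", "day"), interval_seconds = 1800)
```

Calendar periods of varying length, such as months and years, would need separate handling and are left out of this sketch.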

Vignettes

  • Workflow / getting started
  • Transformations
  • Extending fable (AR(1), subset_AR)
  • Model special functions

Speed up fable

  • Avoid using the pipe
  • Use compact-purrr instead of purrr.
  • Remove forecast/dplyr/ggplot dependencies.

Output object structure

@robjhyndman
@earowang and I had a discussion today regarding the returned values from models.

There are two main ideas/possibilities that we have come up with (with preference for option 2):

  1. Return a tibble (mable?) which contains a column of data necessary for producing the model (time index, response, and xreg), and a column containing the model itself (parameter estimates, etc.).
  2. Return a tibble (mable?) containing the user-provided data, and a column containing everything in (1).

We also discussed the need for a verb that nests a tsibble by its keys, which could be used to provide names for the newly introduced model columns.

Error: Irregular time series provided

The above error occurred with my own tsibbles piped to fable - but also with trying to replicate Rob Hyndman's UseR! talk.

Specifically:

cafe <- as_tsibble(fpp2::auscafe)
cafe %>% ARIMA(log(value) ~ pdq(2,1,1) + PDQ(2,1,2))

raises

Error: Irregular time series provided

For my own data, a tsibble with a two-element key, df %>% ETS(count) raised a similar error:

Error in mutate_impl(.data, dots) : 
  Evaluation error: Irregular time series provided.

For cafe and df, the tsibble has continuous monthly observations.

Any suggestions?

thanks
Chris
