Giter VIP home page Giter VIP logo

nowcaster's Introduction

nowcaster

License: GPL (>= 3)

nowcaster is an R package for “nowcasting” epidemiological time series. Every single system of notification has an intrinsic delay, nowcaster can estimate how many counts of any epidemiological data of interest (i.e., daily cases and death counts) by fitting a negative binomial model to the time steps of delay between the onset date of the event, (i.e., date of first symptoms for cases or date of occurrence of death) and the date of report (i.e., date of notification of the case or death).

nowcaster is based on the R-INLA and INLA packages for “Integrated Nested Laplace Approximation” algorithm to Bayesian inference. INLA is a fast alternative to other methods for Bayesian inference like MCMC. An introduction to INLA can be found here.

nowcaster is built for epidemiological emergency use, it was constructed for the Brazilian Severe Acute Respiratory Illness (SARI) surveillance database (SIVEP-Gripe).

Installing

To install nowcaster package simply run the code below in R:

if( !require(nowcaster, quietly = T) ){
  devtools::install_github("https://github.com/covid19br/nowcaster")
}

After installing you can load the typical library:

library(nowcaster)

First example on LazyData

When the package is loaded it provides a LazyData file, sariBH, it is an anonymized record of Severe Acute Respiratory Illness notified in the city of Belo Horizonte, from March 2020 to April 2022. To load it basically, write:

# Loading Belo Horizonte SARI dataset
data(sragBH)

And we take a look at the data:

head(sragBH)
## # A tibble: 6 x 7
##   DT_SIN_PRI DT_DIGITA  CLASSI_FIN EVOLUCAO CO_MUN_RES Idade fx_etaria
##   <date>     <date>          <dbl>    <dbl>      <dbl> <dbl> <fct>    
## 1 2020-02-11 2020-03-05          4        1     310620    59 50 - 59  
## 2 2020-01-21 2020-02-06          4        1     310620    79 70 - 79  
## 3 2020-03-30 2020-04-17          4        1     310620    72 70 - 79  
## 4 2020-03-26 2020-04-02          4        1     310620    82 80 +     
## 5 2020-03-20 2020-04-13          4        1     310620    50 50 - 59  
## 6 2020-04-07 2020-04-22          5        1     310620    74 70 - 79

It is a data.frame with 7 variables and 65,404 observations. We will make use of only the first two columns, “DT_SIN_PRI” (date of onset symptoms) and “DT_DIGITA” (recording date) as well as the column “Idade” (age in years) to make age-structured nowcasting.

Non-structured data

Now we call the nowcasting function, it has by default the parametrization to take the data and estimate it with non-structured data form. The estimate fits a negative binomial distribution, NegBinom(\lambda_{t,d}, \phi), to the cases count at the time t with delay d, \phi is the dispersion parameter. The rate \lambda_{t,d} is then parameterized in a log-linear format by a constant term added by structured delay random effects and structured time random effects. Hence, the model is given by the following:

\begin{equation} Y_{t,d} \sim NegBinom(\lambda_{t,d}, \phi), \\ \log(\lambda_{t,d}) = \alpha + \beta_t + \gamma_d, \\ t=1,2,\ldots,T, \\ d=1,2,\ldots,D, \end{equation}

where the intercept \alpha follows is Gaussian distribution with a very large variance, \beta_t is follows a second-order random walk with precision \tau_\beta, \gamma_d a first-order random walk with precision \tau_\gamma. The model is then completed by INLA default prior distributions for \phi, \tau_\beta, and \tau_\gamma. See nbinom, rw1, and rw2 INLA help pages.

The call of the function is straightforward, it simply needs a dataset as input, here the LazyData is loaded in the namespace of the package. The function has 3 mandatory parameters, dataset for the parsing of the dataset to be nowcasted, date_onset for parsing the column name which is the date of onset of symptoms, and date_report which parses the column name for the date of report of the cases. Here these columns are “DT_SIN_PRI” and “DT_DIGITA”, respectively.

nowcasting_bh_no_age <- nowcasting_inla(dataset = sragBH, 
                                        date_onset = DT_SIN_PRI, 
                                        date_report = DT_DIGITA, 
                                        data.by.week = T)
head(nowcasting_bh_no_age$total)
## # A tibble: 6 x 7
##    Time dt_event   Median    LI    LS   LIb   LSb
##   <int> <date>      <dbl> <dbl> <dbl> <dbl> <dbl>
## 1    17 2021-12-13    625  621    632   623   627
## 2    18 2021-12-20    695  687    709   692   699
## 3    19 2021-12-27    812  801    829   808   817
## 4    20 2022-01-03    886  871    906   880   893
## 5    21 2022-01-10    819  799.   847   811   826
## 6    22 2022-01-17    631  608    661   623   641

This calling will return only the nowcasting estimate and its Confidence Interval (CI) for two different Credible intervals, LIb and LSb are the max and min CI, respectively, with credibility of 50% and LI and LS is the max and min CI, respectively, with a credibility of 95%.

nowcasting_inla has the option to return the curve when the nowcasting estimate was set as the window of action of the model, if the data.by.week parameter is flagged as TRUE it returns on the second element of the output list the summarized data by week.

library(ggplot2)
library(dplyr)

dados_by_week <- nowcasting_bh_no_age$data |> 
  filter(date_onset >= (Sys.Date()-270)) |> 
  group_by(date_onset) |> 
  summarise(n = n())

dados_by_week |> 
  ggplot()+
  geom_line(data = dados_by_week, aes(date_onset, y = n, col = 'Observed'))+
  theme_bw()+
  theme(legend.position = "bottom", axis.text.x = element_text(angle = 90)) +
  scale_color_manual(values = c('grey50', 'black'), name = '')+
  scale_x_date(date_breaks = '2 weeks', date_labels = '%V/%y', name = 'Date in Weeks')+
  labs(x = '', y = 'Nº Cases')

After this element is groped by and summarized by the onset of symptoms date, here DT_SIN_PRI, it is the epidemiological curve observed. For example, how the estimate compares with the observed curve, we plot the estimate and the epidemiological curve all together.

nowcasting_bh_no_age$total |> 
  ggplot(aes(x = dt_event, y = Median, col = 'Nowcasting')) +
  geom_line(data = dados_by_week, aes(date_onset, y = n, col = 'Observed'))+
  geom_ribbon(aes(ymin = LI, ymax = LS, col = NA), alpha = 0.2, show.legend = F)+
  geom_line()+
  theme_bw()+
  theme(legend.position = "bottom", axis.text.x = element_text(angle = 90)) +
  scale_color_manual(values = c('grey50', 'black'), name = '')+
  scale_x_date(date_breaks = '2 weeks', date_labels = '%V/%y', name = 'Date in Weeks')+
  labs(x = '', y = 'Nº Cases')

This is an example where the estimate was done without considering any type of structure in data, which is the first assumption for the nowcasting.

Nowcasting as a tool to support decision-making

Nowcasting a rising curve or a curve at any other moment can give quantitative support for decision-making, during the public health crises, the most needed is a way to anticipate, at least, what it is happening at the moment. Nowcasting is the tool for this type of questioning and can give insights on the data to support the needed decisions.

We start this section by cutting the original data at a moment of apparent decaying of the SARI hospitalization, for the city of Belo Horizonte, which had a prompt starting response to the COVID-19 pandemic. The pressure on the health system took more time than the rest of the country, and the data at the same time were showing a decay. We filter all cases entered until the 4th of July 2020 by the date of digitalization, a date that the cases show up in the database.

library(tidyverse)
library(lubridate)

## To see Nowcasting as if we were on the verge of rise in the curve
srag_now<-sragBH |> 
  filter(DT_DIGITA <= "2020-07-04")

data_by_week<-data.w_no_age(dataset = srag_now, 
                         date_onset = DT_SIN_PRI, 
                         date_report = DT_DIGITA) |> 
  group_by(date_onset) |> 
  tally()

data_by_week |>  
  ggplot(aes(x = date_onset, 
             y = n))+
  geom_line()+
  theme_bw()+
  labs(x = 'Date of onset of symptons', y = 'Nº Cases')+
  scale_color_manual(values = c('grey50', 'black'), name = '')+
  scale_x_date(date_breaks = '2 weeks', date_labels = '%V/%y', name = 'Date in Weeks')

On this filtered data, we estimate the cases already that started its date of onset of symptoms but were not yet reported, so there not in the database. We just pass to the nowcasting_inla function, the dataset filtered, flag for the columns where are the date_onset and date_report, we add the flag for the function return the epidemic curve by epiweek.

nowcasting_bh_no_age <- nowcasting_inla(dataset = srag_now, 
                                        date_onset = DT_SIN_PRI, 
                                        date_report = DT_DIGITA, 
                                        data.by.week = T)
head(nowcasting_bh_no_age$data)
## # A tibble: 6 x 3
##   date_report date_onset Delay
##   <date>      <date>     <dbl>
## 1 2020-02-29  2020-02-08     3
## 2 2020-02-01  2020-01-18     2
## 3 2020-04-11  2020-03-28     2
## 4 2020-03-28  2020-03-21     1
## 5 2020-04-11  2020-03-14     4
## 6 2020-04-18  2020-04-04     2

Before we see the result of the nowcasting estimate we take a look on the intermediate part of the process of nowcasting, the delay triangle, which sets the objects for nowcasting. The delay triangle is only a a table where each unique amount of delay, (i.e. integer numbers of days or weeks) has passed between the date of onset and the date of report spread over each date of onset. The part that is closer to the present has less counts and has a lower amount of delay, this is trivial due to, the the system takes time to process the cases, the newer cases are lesser then the older ones, which already have time to be processed.

From the data in a weekly format, we mount the counts of cases by the amount of delay. By tabling the delay amount against the data of the onset of the first symptoms, we see the pattern of the delay for the cases.

data_triangle <- nowcasting_bh_no_age$data |> 
  filter(Delay < 30) |> 
  arrange(desc(Delay))
delay_triangle<-table(data_triangle$date_onset, 
                      rev(data_triangle$Delay),
                      dnn = list("Date of Onset", "Delay"))
head(delay_triangle)
##              Delay
## Date of Onset  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
##    2019-12-28  5 13  1  1  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0
##    2020-01-04  3 15  1  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
##    2020-01-11  1 13  4  1  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0
##    2020-01-18  5 17  3  0  0  0  0  0  0  0  1  0  0  0  1  0  0  0  0  0  0
##    2020-01-25  5  7  5  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
##    2020-02-01  3 12  3  1  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0

We just look at the amount of cases with 30 weeks of delay or less, it is the default maximum delay considered at nowcasting estimation. It can be changed by the parameter Dmax.

If this element is grouped by and summarized by the onset of symptoms date, here DT_SIN_PRI, it is the epidemiological curve observed. For example, we plot the estimate and the epidemiological curve all together.

library(ggplot2)

dados_by_week <- nowcasting_bh_no_age$data %>% 
  group_by(date_onset) %>% 
  summarise(n = n())

nowcasting_bh_no_age$total |> 
  ggplot(aes(x = dt_event, y = Median, col = 'Nowcasting')) +
  geom_line(data = dados_by_week, aes(date_onset, y = n, col = 'Observed'))+
  geom_ribbon(aes(ymin = LI, ymax = LS, col = NA), alpha = 0.2, show.legend = F)+
  geom_line()+
  theme_bw()+
  theme(legend.position = "bottom", axis.text.x = element_text(angle = 90)) +
  scale_color_manual(values = c('grey50', 'black'), name = '')+
  scale_x_date(date_breaks = '2 weeks', date_labels = '%V/%y', name = 'Date in Weeks')+
  labs(x = '', y = 'Nº Cases')

As expected, the nowcasting estimated a rise on a curve when it was observed a decaying. Adding to the plot what has happened in that period, with the data inserted posteriorly the period for when the nowcasting estimated the rising in the curve for SARI hospitalizations.

nowcasting_bh_no_age$total %>% 
  ggplot(aes(x = dt_event, y = Median, col = 'Nowcasting')) +
  geom_line(data = dados_by_week, aes(date_onset, y = n, col = 'Observed'))+
  geom_ribbon(aes(ymin = LI, ymax = LS, col = NA), alpha = 0.2, show.legend = F)+
  geom_line()+
  geom_line( data = sragBH %>%
               filter(DT_SIN_PRI <= "2020-07-04") %>% 
               mutate(
                 D_SIN_PRI_2 = DT_SIN_PRI - as.numeric(format(DT_SIN_PRI, "%w"))
               ) %>% 
               group_by(D_SIN_PRI_2) %>% 
               tally(), 
             mapping = aes(x = D_SIN_PRI_2, y = n, 
                           color = "Observed after one year")) + 
  theme_bw() +
  theme(legend.position = "bottom", axis.text.x = element_text(angle = 90)) +
  scale_color_manual(values = c('grey50', 'black', 'red'), 
                     name = '')+
  scale_x_date(date_breaks = '2 weeks', date_labels = '%V/%y', name = 'Date in Weeks')+
  labs(x = '', y = 'Nº Cases')

This ends the first simple example when estimating the already started events but not yet reported (i.e. nowcasting). The relevance of nowcasting for public health decisions is given by the understanding that what is present in the databases is only a picture of the real-time situation. The above graph can help policymakers on what decisions to take in the face of a rising curve of hospitalizations.

Structured data, Age

For the structured data the nowcasting_inla() fits again a Negative binomial distribution to the cases count at the time t with delay d. Differently, from the non-structured case, the model now gives random effects on the delay distribution and and time distribution by each of the age class chosen by the user to break the data. The model has the form now:

\begin{equation}Y_{t,d,a} \sim  NegBinom(\lambda_{t,d,a}, \phi), \\ \log(\lambda_{t,d,a}) =  \alpha_a + \beta_{t,a} + \gamma_{d,a}, \\ \quad t=1,2,\ldots,T, \\ d=1,2,\ldots,D, \\ a=1,2,\ldots,A, \end{equation}

where each age class, a, has an intercept \alpha_a following a Gaussian distribution with a very large variance, the time-age random effects, \beta_{t,a}, follow a joint multivariate Gaussian distribution with a separable variance components an independent Gaussian term for the age classes with precision \tau_{age,\beta} and a second-order random walk term for the time with precision \tau_{\beta}. Analogously, the delay-age random effects, \gamma_{d,a}, follow a joint multivariate Gaussian distribution with a separable variance components an independent Gaussian term for the age classes with precision \tau_{age,\gamma} and a first-order random walk term for the time with precision \tau_{\gamma}. The model is then completed by INLA default prior distributions for \phi, \tau_{age,\beta}, \tau_{age,\gamma}, \tau_{\beta} and \tau_\gamma. See nbinom, iid, rw1, and rw2 INLA help pages.

This new model corrects the delay taking into account the effects of age classes and the interactions of each age class between time and also delay. Now the model needs a flag indicating which column on the dataset will be used to break the data into age classes and how the age classes will be split. This is given by the parameters age_col and bins_age. We pass three additional parameters, data.by.week to return the epidemiological curve out of a window of action of the nowcasting estimate and return.age to inform we desire a nowcasting result in two ways, the total aggregation estimate and the age-stratified estimate. The calling of the function has the following form:

nowcasting_bh_age <- nowcasting_inla(dataset = sragBH, 
                                     bins_age = "10 years",
                                     data.by.week = T, 
                                     date_onset = DT_SIN_PRI, 
                                     date_report = DT_DIGITA,
                                     age_col = Idade)

Each of the estimates returned by nowcasting_inla has the same form as in the non-structured case. On the nowcasting estimates, it returns a data.frame with the posterior median and 50% and 95% credible intervals, (LIb and LSb) and (LI and LS) respectively.

library(ggplot2)

dados_by_week <- nowcasting_bh_age$data |> 
  filter(date_onset >= (Sys.Date()-270)) |>
  group_by(date_onset) |> 
  summarise(n = n())


nowcasting_bh_age$total |>  
  ggplot(aes(x = dt_event, y = Median, col = 'Median'))+
  geom_line()+
  geom_line(data = dados_by_week, aes(date_onset, y = n))+
  geom_ribbon(aes(ymin = LI, ymax = LS, col = 'IC'), alpha = 0.2)+
  theme_bw()+
  theme(legend.position = "bottom", axis.text.x = element_text(angle = 90))+
  scale_color_manual(values = c('grey90', 'black'), name = '')+
  scale_x_date(date_breaks = '2 weeks', date_labels = '%V/%y', name = 'Date in Weeks')+
  labs(x = '', y = 'Nº Cases')

Comparing the estimates

We can compare the estimates by each of the strategies, we plot the two estimates together:

nowcasting_bh_no_age$total$type <- "No Age structured"
nowcasting_bh_age$total$type <- "Age structured"


nowcasting_bh_total <- nowcasting_bh_age$total |> 
  full_join(nowcasting_bh_no_age$total)

nowcasting_bh_total |> 
  ggplot(aes(x = dt_event, y = Median, col = type))+
  geom_line(show.legend = F)+
  geom_ribbon(aes(ymin = LI, ymax = LS, fill = NULL), alpha = 0,
              show.legend = T)+
  theme_bw()+
  theme(legend.position = "bottom", axis.text.x = element_text(angle = 90))+
  scale_color_manual(values = c('grey60', 'grey90'), name = '')+
  scale_fill_manual(values = c('grey60', 'grey90'), name = '')+
  scale_x_date(date_breaks = '2 weeks', date_labels = '%V/%y', name = 'Date in Weeks')+
  labs(x = '', y = 'Nº Cases')

The estimates give different CIs, this is due to a better fitting when considering random effects by age class for the delays at time, this has to do with the different capabilities to respond at different ages. This is an empirical finding of this model.

sessionInfo()
## R version 4.1.2 (2021-11-01)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19042)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=Portuguese_Brazil.1252  LC_CTYPE=Portuguese_Brazil.1252   
## [3] LC_MONETARY=Portuguese_Brazil.1252 LC_NUMERIC=C                      
## [5] LC_TIME=Portuguese_Brazil.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] lubridate_1.8.0 forcats_0.5.1   stringr_1.4.0   purrr_0.3.4    
##  [5] readr_2.1.2     tidyr_1.2.1     tibble_3.1.8    tidyverse_1.3.2
##  [9] dplyr_1.0.10    ggplot2_3.3.6   nowcaster_0.2.2 badger_0.2.1   
## 
## loaded via a namespace (and not attached):
##  [1] httr_1.4.3          jsonlite_1.7.3      splines_4.1.2      
##  [4] tmvnsim_1.0-2       modelr_0.1.8        sn_2.0.1           
##  [7] assertthat_0.2.1    BiocManager_1.30.18 rvcheck_0.2.1      
## [10] sp_1.5-0            highr_0.9           stats4_4.1.2       
## [13] yulab.utils_0.0.5   googlesheets4_1.0.0 cellranger_1.1.0   
## [16] yaml_2.2.2          numDeriv_2016.8-1.1 pillar_1.8.1       
## [19] backports_1.4.1     lattice_0.20-45     glue_1.6.1         
## [22] digest_0.6.29       RColorBrewer_1.1-3  rvest_1.0.2        
## [25] colorspace_2.0-3    htmltools_0.5.2     Matrix_1.3-4       
## [28] pkgconfig_2.0.3     broom_1.0.0         haven_2.4.3        
## [31] scales_1.2.1        tzdb_0.2.0          MatrixModels_0.5-0 
## [34] googledrive_2.0.0   generics_0.1.3      farver_2.1.1       
## [37] ellipsis_0.3.2      withr_2.5.0         cli_3.1.1          
## [40] mnormt_2.0.2        crayon_1.5.1        readxl_1.3.1       
## [43] magrittr_2.0.2      evaluate_0.15       fs_1.5.2           
## [46] fansi_1.0.3         xml2_1.3.3          tools_4.1.2        
## [49] hms_1.1.1           gargle_1.2.0        lifecycle_1.0.2    
## [52] reprex_2.0.1        munsell_0.5.0       compiler_4.1.2     
## [55] rlang_1.0.5         grid_4.1.2          rstudioapi_0.13    
## [58] INLA_21.11.22       labeling_0.4.2      rmarkdown_2.13     
## [61] gtable_0.3.0        DBI_1.1.2           R6_2.5.1           
## [64] knitr_1.39          fastmap_1.1.0       utf8_1.2.2         
## [67] rprojroot_2.0.3     dlstats_0.1.5       desc_1.4.1         
## [70] stringi_1.7.6       Rcpp_1.0.7          parallel_4.1.2     
## [73] vctrs_0.4.1         dbplyr_2.1.1        tidyselect_1.1.2   
## [76] xfun_0.29

nowcaster's People

Contributors

beatrizmilz avatar lsbastos avatar pitmonticone avatar rafalopespx avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar

nowcaster's Issues

Need to verify the date formats

We need on this line to verify which are date formats, the actual problem it is the onset dates are moved to the last complete day of the week .

Maybe a solution:

turn back the dates to its original format, which it is the inverser of the formula in dados_w()

create `nowcasting_cluster()`

Create new internal functions for nowcasting with and without spatial structured data, nowcasting_spatial for spatialy structured data and the existing nowcasting_age & nowcasting_no_age for non-spatial structured data

  • nowcasting_spatial created and implemented
  • stable and tested for the README

Implement Hessian testing

Hessian test is constructed to avoid problems when the INLA solver stack in negative eigenvalues, to avoid it we can work on the length iteration steps. The test can be implemented through a while test until it reach a certain amount of tries. Follows a schematic test to be put on the nowcasting_age and nowcasting_no_age functions:

hess.min <- -1
h.value <- 0.01
h.trials <- 0.001
trials <- 0
while (hess.min <= 0 & trials < 50){
    
    # Running the Negative Binomial model in INLA
    output0 <- inla(model, family = "nbinomial", data = dados.age,
                    control.predictor = list(link = 1, compute = T),
                    control.compute = list(config = T, waic=F, dic=F, openmp.strategy='huge'),
                    control.family = list(
                      hyper = list("theta" = list(prior = "loggamma", 
                                                  param = c(0.001, 0.001))
                      )
                    ),
                    num.threads = 6,
                    control.inla = list(h = h.value),
                    ...
                    
    )
    hess.start <- which(output0$logfile == 'Eigenvalues of the Hessian')
    hess.min <- min(as.numeric(output0$logfile[(hess.start+1):(hess.start+3)]))
    h.value <- h.trials + 0.001
    h.trials <- h.value
    trials <- trials + 1
    
  }

Forecasting article

Create an article performing a forecasting using the parameter K > 0

  • create a rmarkdown performing the forecasting
  • build article
  • build site

INLA Requeriments warning

Require INLA with the NAMESPACE and Warning if it is not installed:

  • INLA load with the NAMESPACE
  • Warning if it is not installed

Silent and Return.age

  • Silent parameter is not working properly needs fixing
  • Return.age is obsolete, needs fixing

Add observed number of cases into nowcaster outputs

Add a column with the observed number of cases in the nowcaster outputs.

> aaa$total
# A tibble: 15 × 7
    Time dt_event   Median    LI    LS   LIb   LSb
   <int> <date>      <dbl> <dbl> <dbl> <dbl> <dbl>
 1    17 2019-10-26    154  154   154    154  154 
 2    18 2019-11-02    173  173   173    173  173 

Last date to nowcast

Need to check the last date in the dataset and the last date in the nowcast estimates. It seems they don't match properly.

Nowcasting functions that accept any type of data

To build more cleaverly the package, without problems when calling nowcasting_inla(), create another functions that will by called by nowcasting_inla to make nowcasting without stratifying and make nowcasting with clustering specified data.

Tentative names, nowcasting_no_stratum() and nowcasting_cluster(), add the parameters to call those functions on nowcasting_inla(), these new functions will have a structure similar to nowcasting_age()

  • #6
  • #19
  • implement above functions on nowcasting_inla()
  • test stability of any kind of data passed to nowcasting_inla()

Covariate feature

Implement and test a model with covariate of tweets counts and google trends searches on the main function nowcasting_inla:

  • creates new shadow function nowcasting_covariates()
  • flag on the main function nowcasting_inla
  • all together

Review articles in the package site

To guarantee a well written documentation and understandable articles, we have to review the articles presented in the site of the package:

  • Get started!
  • Forecasting
  • Nowcasting for decision-making
  • Structured data

Imports

Stabilize the Imports for the package, now is import by the export specification on the NAMESPACE:

  • Verify Imports needed
  • NAMESPACE update
  • Warning to certify if INLA package is installed

Bug on forecasting K steps ahead

There is a bug on the use of parameter K. When the last week is incomplete and K = 0 the last week to correct delay is the last complete week. But when K>0, it should forecast the current (and incomplete) week and the following K-1 weeks, but it seems to forecast K weeks beyond the current. As as example, when K=1, it should forecast the current (and incomplete) week but it forecasts two weeks ahead, the current one + 1.

data(sragBH)

sragBH20 <- sragBH %>% 
  filter(
    # Excluding typing date before symptoms onset
    DT_SIN_PRI <= DT_DIGITA, 
    # Excluding no typing date
    !is.na(DT_DIGITA),
    # cases reported in 2020
    year(DT_DIGITA) == 2020
  ) 

max(sragBH20$DT_DIGITA)
weekdays(max(sragBH20$DT_DIGITA))

epiweek(max(sragBH20$DT_DIGITA))
# Incomplete week (recording ends on Thursday)

# Nowcasting ignores an incomplete week (OK)
nowk0 = nowcasting_inla(dataset = sragBH20,
                        date_onset = DT_SIN_PRI,
                        date_report = DT_DIGITA,
                        silent = T)
nowk0$total %>% 
  mutate(
    epiweek = epiweek(dt_event)
  )  %>% tail()



# Week to forecast: week 53 only, but it forecast weeks 53/20 and 1/21
nowk1 = nowcasting_inla(dataset = sragBH20,
                        date_onset = DT_SIN_PRI,
                        date_report = DT_DIGITA,
                        K = 1,
                        silent = T)
nowk1$total %>% 
  mutate(
    epiweek = epiweek(dt_event)
  )  %>% tail()


# Weeks to forecast: week 53 and 1, but it forecast weeks 53/20, 1 and 2/21
nowk2 = nowcasting_inla(dataset = sragBH20,
                        date_onset = DT_SIN_PRI,
                        date_report = DT_DIGITA,
                        K = 2,
                        silent = T)
nowk2$total %>% 
  mutate(
    epiweek = epiweek(dt_event)
  ) %>% tail()

Trajectory parameter

Implement a trajectory flag on nowcasting_inla to give back the option trajectory devolution

Slope function to calculate the tendency of the curve

Through a simple linear model adjusted to the trajectories, we can infer the tendency of the epidemic curve, to do so we can construct functions that take the posteriors output from the main function nowcasting_inla and adjust the linear model and return the slope of the trajectories, to give a more quantitative tendency inference:

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.