
fhmm's Issues

estimation output

  • sorting: put the state with the highest mu first, i.e. order the states by mu in descending order
  • design the estimation result output in the txt-file (names, elements, order)
  • Hessian computation
  • AIC and BIC computation (see the sketch below)
  • check whether iterlim was exceeded; if so, increase it
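
A minimal sketch of the AIC/BIC and iterlim steps, assuming the negative log-likelihood nLL_hmm is minimised with nlm() and that theta_start, observations, npar and nobs are provided as shown (the names are illustrative):

fit <- nlm(f = nLL_hmm, p = theta_start, observations = observations,
           hessian = TRUE, iterlim = 200)
if (fit$iterations >= 200) warning("iterlim reached, consider increasing it")
ll  <- -fit$minimum                 # maximised log-likelihood
aic <- -2 * ll + 2 * npar           # npar = number of estimated parameters
bic <- -2 * ll + log(nobs) * npar   # nobs = number of observations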

Documentation issues

  • set_controls() should come second, since it is important for prepare_data(). Clarify: an object of class RprobitB_controls or fHMM_controls?
  • prepare_data() should come third, because it is where the class fHMM_data is introduced, which is needed for the plot function that currently comes second. This could be confusing.
  • decode_states: I would write "the most likely" state sequence. The reference to Rprobit_B is unclear.
  • fHMM_events(): it is not entirely clear to me what this function does exactly. Given events, does it check that they are suitable for being read in?
  • fHMM_parameters: "A tpm of dimension controls$states[1]." – I would spell out tpm (transition probability matrix). The difference between mu and mus_star does not become clear.

Plot of SDDs for simulated HHMM gives odd x-scale

Try

controls = list(
  id            = "test", 
  sdds          = c("normal","normal"),
  states        = c(3,2),
  time_horizon  = c(100,30),
  at_true       = TRUE,
  overwrite     = TRUE,
  seed          = 4
)

and see that this gives an odd x-scale in sdds.pdf. The x-axis limits should instead be set based on the limits of the state-dependent distributions.
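
A possible fix, sketched here under the assumption of normal state-dependent distributions with estimated means mus and standard deviations sigmas: derive the x-limits from quantiles of the fitted distributions instead of from the data range.

# choose plotting limits from the 1% and 99% quantiles of the fitted SDDs
lower <- qnorm(0.01, mean = mus, sd = sigmas)
upper <- qnorm(0.99, mean = mus, sd = sigmas)
xlim  <- c(min(lower), max(upper))
x     <- seq(xlim[1], xlim[2], length.out = 200)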

Is there a built-in function to graph the simulated data?

First of all - I love the package! I struggled with some not-so-user-friendly packages in the past, but this is really something else!

To my question: is there an existing function in the package to visualize/plot only the simulated data, but with the same structure, i.e. visualizing the state scales underneath?

If not - is the simulated data bundled together with the data I fitted the model on in the data.rds file?

[screenshot of the loaded data object showing logReturns and dataRaw]

In the picture above, logReturns and dataRaw each have 2714 elements. I only modelled a year, so this has to be all of the data, right? So I only need to fetch the last 365 elements and then I'm fine?

I do hope I was clear enough. If there is any confusion, just leave a quick comment and I will try to explain further.

Thanks in advance,

Carlos

Incorporate covariates

Incorporate covariates into the state process(es) to determine which factors affect the probabilities of switching to bearish and bullish markets, respectively (just an idea, perhaps something for later versions of the package!).
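
One common way to realise this (a sketch only, not part of the package): link the transition probabilities to a covariate via a logit link, so that the transition probability matrix becomes time-varying. The coefficient values and names below are hypothetical.

# 2-state transition probability matrix driven by a covariate z via a logit link
tpm_t <- function(z, beta12 = c(-2, 0.5), beta21 = c(-2, -0.5)) {
  p12 <- plogis(beta12[1] + beta12[2] * z)  # P(switch from state 1 to state 2)
  p21 <- plogis(beta21[1] + beta21[2] * z)  # P(switch from state 2 to state 1)
  matrix(c(1 - p12, p12,
           p21,     1 - p21), nrow = 2, byrow = TRUE)
}
tpm_t(z = 0.3)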

visualization

  • make visualization more flexible for any number of states
  • additional parameter: a vector of dates and labels to highlight in the plot (e.g. the Lehman bankruptcy)
  • check that plots don't get overwritten

likelihood computation

  • check that nLL_hmm works
  • check that nLL_hhmm works
  • transformation of thetaUncon to thetaCon with a dedicated function (see the sketch after this list)
  • nLL_hmm can also be called from nlm
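
A minimal sketch of such a transformation for a 2-state Gaussian HMM; the parameter layout of thetaUncon (two tpm entries on the logit scale, two means, two log standard deviations) is an assumption for illustration only.

thetaUncon2thetaCon <- function(thetaUncon) {
  gamma_12 <- plogis(thetaUncon[1])   # unconstrained -> (0, 1)
  gamma_21 <- plogis(thetaUncon[2])
  list(
    Gamma = matrix(c(1 - gamma_12, gamma_12,
                     gamma_21,     1 - gamma_21), nrow = 2, byrow = TRUE),
    mu    = thetaUncon[3:4],
    sigma = exp(thetaUncon[5:6])      # unconstrained -> positive
  )
}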

check_controls

Write a function "check_controls" that checks the control parameters and prints information about the model formulation.

Data on coarse scale

Problem

Averaging log-returns on the coarse scale does not seem to be the best idea. It is hard for the code to detect different states / state switches in this type of data.

Idea

Include a parameter in controls to select the type of coarse-scale data (e.g. sum of absolute values, mean, average of absolute values). Plot the coarse-scale data in ts.pdf to see if this yields better data.
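
A sketch of how such a control could work; the control name data_cs_type appears in the "Flexible FS time horizon" issue further below, the aggregation function itself is illustrative.

# aggregate the fine-scale log-returns x of one coarse-scale period
aggregate_cs <- function(x, type = c("mean", "mean_abs", "sum_abs")) {
  type <- match.arg(type)
  switch(type,
         mean     = mean(x),
         mean_abs = mean(abs(x)),
         sum_abs  = sum(abs(x)))
}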

Calculation of Hessian

Use the option hessian = FALSE in nlm and only hessian = TRUE in the final estimation run. This should give a speed improvement.
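
A sketch of the idea, assuming several optimization runs over different starting values (all names are illustrative):

# skip the Hessian during the search runs ...
runs <- lapply(start_values, function(theta0)
  nlm(f = nLL_hmm, p = theta0, observations = observations, hessian = FALSE))
best <- runs[[which.min(sapply(runs, `[[`, "minimum"))]]
# ... and compute it only once, for the best run
final <- nlm(f = nLL_hmm, p = best$estimate, observations = observations,
             hessian = TRUE)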

Create R-package

  • Add a folder called "R" that contains all .R files and a folder called "src" that contains all .cpp files (this is the folder structure required for an R package). Here is a cheat sheet on creating packages that could be useful for the development: https://github.com/rstudio/cheatsheets/raw/master/package-development.pdf.
  • Create a separate .R file for each function.
  • Write documentation for each function using roxygen tags, see comment below.
  • Where functions from other packages are used, use packageName::functionName() instead of functionName() to avoid conflicts.
  • Choose a name for the package: fHMM
  • Create an R package hex sticker.
  • Write the DESCRIPTION file.

control "data"

Give controls parameter "data" which is a list containing all parameters related to data processing/simulation. Update documentation.

Wrong state-dependent distribution with 3 states

When running this code:

# simulated HMM -----------------------------------------------------------
seed = 1
controls = list(
  states  = 3,
  sdds    = "gamma",
  horizon = 500,
  fit     = list("runs" = 100)
)
controls %<>% set_controls
data = prepare_data(controls, seed = seed)
data %>% summary
data %>% plot
model = fit_model(data, ncluster = 1, seed = seed) %>%
  decode_states %>%
  compute_residuals
summary(model)
model %<>% reorder_states(state_order = 1:3)
compare(model)
model %>% plot("ll")
model %>% plot("sdds")

[plot of the fitted state-dependent distributions]

The third state is unfortunately not identified correctly. I also ran the same code once with 1000 runs, but that did not change the result at all.

Odd behaviour for fixed dfs

Try the fixed-dfs model

controls = list(
  id = "test",
  sdds = c("t(Inf)",NA),
  states = c(2,0),
  time_horizon = c(100,NA),
  seed = 1
)

and see that two states cannot be identified. However, the dfs-flexible model

controls = list(
  id = "test",
  sdds = c("t",NA),
  states = c(2,0),
  time_horizon = c(100,NA),
  seed = 1
)

works.

Improve numerical optimization

  • Early stopping of non-promising optimization runs.
  • Parallelise numerical optimization runs.
    • Set the number of cores in controls via ncores.
    • In check_controls, read out the available number of cores; give a warning if the requested number is not (all - 1) and an error if too many (>= all) cores are used.
    • Divide all runs into ncores batches: the last one has ceiling(runs/ncores) runs, all others floor(runs/ncores) runs. Implement a progress bar for the last batch. ncores must not exceed runs. See the sketch below.
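
A sketch of the batching and parallelisation logic with the parallel package, assuming runs and ncores are read from controls; run_single_optimization() is a hypothetical helper that performs one optimization run (in practice it and nLL_hmm would have to be exported to the workers via parallel::clusterExport):

stopifnot(ncores <= runs)
# all batches get floor(runs/ncores) runs, the last one takes the remainder
base    <- floor(runs / ncores)
sizes   <- c(rep(base, ncores - 1), runs - base * (ncores - 1))
batches <- split(seq_len(runs), rep(seq_len(ncores), times = sizes))
cl      <- parallel::makeCluster(ncores)
results <- parallel::parLapply(cl, batches, function(ids)
  lapply(ids, run_single_optimization))
parallel::stopCluster(cl)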

Ideas

A collection of ideas on how to further extend the code:

  • Functionality to loop over different numbers of states.
  • How to deal with NA values in empirical data ("Close" may not exist for every time point, and two data sets may not share all closing days)?
  • Include a comparison between the true states and the predicted states of simulated data in a contingency table.
  • Possibility to extract any column from the dataset, not only "Close".
  • Extend to allow fixing the degrees of freedom on one scale only.
  • Show the progress bar before the first iteration.
  • Give an error if any number of states equals 1.
  • Download new data automatically from https://finance.yahoo.com/. Write a function download_data in 'data.R' (see the sketch below).
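
For the last point, a sketch of what download_data() could look like; the Yahoo Finance CSV endpoint used below is an assumption and may change or require authentication:

download_data <- function(symbol, from, to, file = paste0(symbol, ".csv")) {
  p1  <- as.integer(as.POSIXct(from, tz = "UTC"))   # start date as Unix time
  p2  <- as.integer(as.POSIXct(to, tz = "UTC"))     # end date as Unix time
  url <- paste0("https://query1.finance.yahoo.com/v7/finance/download/", symbol,
                "?period1=", p1, "&period2=", p2, "&interval=1d&events=history")
  utils::download.file(url, destfile = file, quiet = TRUE)
  utils::read.csv(file)
}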

sim

simulate an HMM or an HHMM (depending on whether N = 0 or N != 0)

Flexible FS time horizon

For empirical data, implement that the fine-scale horizon can be monthly / quarterly. This leads to different fine-scale chunk sizes. In this case, give a warning if controls[["data_cs_type"]] is not in c("mean", "mean_abs").
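
A sketch of how the dates of the empirical data could be split into monthly or quarterly fine-scale chunks (function and argument names are illustrative):

fs_chunks <- function(dates, horizon = c("monthly", "quarterly")) {
  horizon <- match.arg(horizon)
  key <- if (horizon == "monthly") {
    format(dates, "%Y-%m")
  } else {
    paste0(format(dates, "%Y"), "-", quarters(dates))
  }
  split(dates, key)   # list of fine-scale chunks with (generally) unequal sizes
}
# example: fs_chunks(as.Date(c("2020-01-02", "2020-02-03", "2020-04-01")), "quarterly")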

control "nlm"

Give controls parameter "nlm" which is a list containing all parameters that can be passed to nlm. Update documentation.

update on readData

  • process two different sources of data
  • truncation: find the nearest date if the truncation point does not exist; allow NA for no truncation
  • print out which data is used and how
  • check in check_controls whether the correct empirical data is supplied for the HMM and the HHMM

PRS of simulated HHMM with normal SDDs on FS look odd

Try

controls = list(
  id            = "test", 
  sdds          = c("normal","normal"),
  states        = c(3,2),
  time_horizon  = c(100,30),
  at_true       = TRUE,
  overwrite     = TRUE,
  seed          = 4
)

and see that the pseudo-residuals of the fine scale are not normal.

Extend for other SDDs

Extend code for other state-dependent distributions:

  • t: t-distribution
  • t(x): t-distribution with x fixed degrees of freedom (this replaces fix_dfs in controls; see the parsing sketch below)
  • norm: normal distribution, i.e. t(Inf)
  • gamma: gamma distribution

Include control sdd (character vector of length two).
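
A sketch of how a fixed-dfs specification like "t(5)" could be parsed (the return structure is an assumption for illustration):

parse_sdd <- function(sdd) {
  if (grepl("^t\\(", sdd)) {
    dfs <- as.numeric(sub("^t\\((.*)\\)$", "\\1", sdd))  # e.g. 5 or Inf
    list(name = "t", fixed_dfs = dfs)
  } else {
    list(name = sdd, fixed_dfs = NULL)                   # dfs are estimated
  }
}
parse_sdd("t(Inf)")   # fixed dfs = Inf, i.e. the normal case
parse_sdd("gamma")    # no fixed dfs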

Access the predicted values in a list/array

I'm doing an analysis of our predictions and I would like to access the output in a list/array so that I can compute MAPE, MSE and some other indicators. I can't seem to find it - where is it?

Examples in .Rd-files

Add small executable examples to the main .Rd files to illustrate the use of the exported functions, but also to enable automatic testing.

Simulated values

I'm raising the issue again: I can't find the simulated data in the provided model files, just the graphical representation. I've looked through the output files and I can't find it.

@return for each .Rd file

  • Add @return to the roxygen tags and explain the functions' results in the documentation (see the example after this list).
  • Write about the structure of the output (class) and also what the output means. (If a function does not return a value, document that too).
  • Missing @return tag in
    • apply_viterbi.Rd
    • check_decoding.Rd
    • create_visuals.Rd
    • download_data.Rd
    • fit_hmm.Rd
    • plot_ll.Rd
    • plot_sdd.Rd
    • plot_ts.Rd
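
A sketch of what such a @return tag could look like; the described return value is illustrative, not taken from the package's actual documentation:

#' Fit a hidden Markov model
#'
#' @return
#' A list with the estimated parameters, the log-likelihood value at the
#' optimum, and the Hessian.
fit_hmm <- function(data, controls) {
  # ...
}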

Warnings when using other datasets

download_data("dax", "^GDAXI", path=".")

download_data("hk", "HEN3.DE", path=".")

horizon: 2020-01-02 to 2021-03-01

Warnings:

  1. possibly unidentified states (C.6)
  2. events ignored (V.2)
