mtsdatamodel's Introduction

MTSDataModel

MTSDataModel is a Python class that stores and manipulates economic multivariate time series data. Essentially, it is a wrapper around the pandas library: the core of an MTSDataModel object is a pandas data frame that stores the data, and the class provides several data manipulation methods that operate on that data frame.

Installation

The conda environment for MTSDataModel is created according to the instructions in NoobQuant condaenv. The specific installation command for the mts environment is:

mamba create --name mts anaconda python=3.6.7 numpy=1.15.2 numpy-base=1.15.2 tzlocal=2.0.0 pandas=0.24.1 seaborn=0.11.0 rpy2==2.9.4 r=3.6.0 r-base=3.6.0 r-essentials=3.6.0 r-tidyverse=1.2.1 rtools=3.4.0 r-rjsdmx=2.1_0 r-seasonal=1.7.0 rstudio=1.1.456 r-wavelets=0.3_0.1 r-xlconnect=1.0.3

Loading data into data model

Data is read in from a .csv file. The file needs to be in long format and contain the following columns:

  • date column (string)
  • level 1 name column for variable/feature, e.g. GDP (string)
  • level 2 name column for entity, e.g. country (string)
  • value column (numeric).

It is assumed for the input data that

  • each individual time series does not contain breaks
  • NAs can be present at start and end of individual time series.

MTSDataModel initializes to a data frame with two-level multi-index columns. The first level represents variable names. The second level represents entities (e.g. country, company).
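The long-to-wide step described above can be sketched with plain pandas. This is not the MTSDataModel constructor itself; the column names (date, variable, entity, value) and the sample values are assumptions made up for illustration.

```python
import io

import pandas as pd

# Hypothetical long-format input; the column names are assumptions,
# not names required by MTSDataModel.
csv_text = """date,variable,entity,value
2000-03-31,GDP,DEU,100.0
2000-06-30,GDP,DEU,101.5
2000-03-31,GDP,FRA,80.0
2000-06-30,GDP,FRA,80.9
2000-03-31,Credit,DEU,50.0
2000-06-30,Credit,DEU,51.2
"""

long_df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# Move variable and entity from rows to columns: dates stay on the index,
# (variable, entity) pairs become two-level multi-index columns.
wide_df = (
    long_df.set_index(["date", "variable", "entity"])["value"]
    .unstack(["variable", "entity"])
    .sort_index(axis=1)
)

print(wide_df.columns.nlevels)  # number of column levels
print(wide_df["GDP"])           # all entities available for one variable
```

Selecting on the first column level (wide_df["GDP"]) returns every entity for that variable, which is the access pattern the two-level layout is meant to support.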

Data manipulations

Several data manipulation operations can be performed on the initialized data frame within an MTSDataModel object. These operations can be applied to different variables and/or entities.

Variables and entities selection

Data manipulations are performed via class methods. Level 1 names (variables) must be specified and passed into the methods via the list variables. For level 2 names (entities), the following rules are used:

  • Implicit entities selection: when the list entities is left unspecified, methods operate only on entities for which all input variables are present.
  • Explicit entities selection: when the list entities is explicitly specified, methods operate only on the specified variable/entity pairs (the cross-product of the two input lists). If any variable/entity pair is not present in the data, an error is raised.
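The two selection rules can be sketched as a standalone helper. The function name select_entities and its signature are hypothetical and not part of MTSDataModel; the sketch only assumes a data frame with (variable, entity) multi-index columns.

```python
import pandas as pd


def select_entities(df, variables, entities=None):
    """Return the entities to operate on (hypothetical helper, not library API)."""
    present = set(df.columns)  # set of (variable, entity) tuples
    if entities is None:
        # Implicit selection: keep only entities that have every variable.
        all_entities = df.columns.get_level_values(1).unique()
        return [e for e in all_entities
                if all((v, e) in present for v in variables)]
    # Explicit selection: every pair in the cross-product must exist.
    missing = [(v, e) for v in variables for e in entities
               if (v, e) not in present]
    if missing:
        raise KeyError(f"Variable/entity pairs not in data: {missing}")
    return list(entities)


columns = pd.MultiIndex.from_tuples(
    [("GDP", "DEU"), ("GDP", "FRA"), ("Credit", "DEU")]
)
df = pd.DataFrame([[1.0, 2.0, 3.0]], columns=columns)

# FRA has no Credit series, so implicit selection drops it.
implicit = select_entities(df, ["GDP", "Credit"])
# Both pairs exist, so explicit selection returns the list unchanged.
explicit = select_entities(df, ["GDP"], entities=["DEU", "FRA"])
print(implicit, explicit)
```

Asking explicitly for a missing pair, for example select_entities(df, ["Credit"], entities=["FRA"]), raises a KeyError in this sketch instead of silently continuing.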

Data pre-processing

Methods available for pre-processing of data are

  • DeflateVariables()
  • DetrendVariables()

Feature engineering

Methods available for feature engineering are

  • MRADecomposition()
  • SumVariables()
  • ReduceVariableDimension()

mtsdatamodel's People

Contributors

vvoutilainen

mtsdatamodel's Issues

Bug with default entities selection when variable not calculated for certain entity

Update 20190521
Possible fix in SumVariables(): added 5 lines that, when no entities are selected, get the entities for which all given variables exist. Seems to work.

This fix needs to be applied to the other methods that have the same problem as well! A similar approach can be used for them if we construct a separate loop over variables at the start. This entity selection should be wrapped in a separate static method and applied to all of them.

Init post
Default entity selection

if entities is None:
    entities = list(np.unique(self.df.columns.get_level_values(1).values))

is not working for (at least) the following methods:

  • SumVariables()
  • DetrendVariables()
  • MRADecomposition()
  • ReduceVariableDimension()

Example case: MRADecomposition() is performed on entity DEU:

variables = ['Credit_def_ld1', 'ResidentialPrices_ld1', 'StockPrices_def_ld1']
do.MRADecomposition(variables, entities=['DEU'], levels=6, expanding=False)

Later we try to perform SumVariables() on the default entities:

variables = {'L45': ['StockPrices_def_ld1_wl4', 'StockPrices_def_ld1_wl5', 'Credit_def_ld1_wl4', 'Credit_def_ld1_wl5', 'ResidentialPrices_ld1_wl4', 'ResidentialPrices_ld1_wl5']}
do.SumVariables(variables)

For entities that were not included in MRADecomposition(), SumVariables() produces a variable whose values are 0 throughout, although it should raise an error.

Merging to data frame with no columns

In ExpandingSampleCalc(), in the first iteration of the counter loop, the line

resultframefull = pd.merge(resultframefull, crtresultframe, left_index=True, right_index=True, how='left')

causes a warning because the columns in resultframefull are not a multi-index. Functionality seems to be correct despite the warning.

Before the merge, resultframefull is an empty frame with just an index and no columns. After the merge we currently force the columns to a multi-index, so the warning disappears in the remaining iterations, but this should be corrected so that there is no warning in the first iteration either.

Use this example to reproduce: how can gg be given multi-index columns when there is no data in it?
df = do.ReturnDf()
gg = pd.DataFrame(index=df.index)
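One option, offered as a sketch rather than a verified fix for ExpandingSampleCalc(), is to create the empty frame with an empty two-level MultiIndex as its columns, so both sides of the merge have the same number of column levels. The small frame below stands in for do.ReturnDf(); its contents and the level names are assumptions.

```python
import pandas as pd

# Stand-in for do.ReturnDf(): a small frame with two-level columns.
index = pd.to_datetime(["2000-03-31", "2000-06-30", "2000-09-30"])
columns = pd.MultiIndex.from_tuples(
    [("GDP", "DEU")], names=["variable", "entity"]
)
df = pd.DataFrame([[1.0], [2.0], [3.0]], index=index, columns=columns)

# Empty frame whose (zero) columns already form a two-level multi-index.
empty_columns = pd.MultiIndex.from_arrays(
    [[], []], names=["variable", "entity"]
)
gg = pd.DataFrame(index=df.index, columns=empty_columns)

# Both frames now have two column levels when merged on the index.
merged = pd.merge(gg, df, left_index=True, right_index=True, how="left")
print(gg.columns.nlevels, merged.columns.nlevels)
```

Whether this removes the warning in the pinned pandas version of the mts environment would still need to be checked there.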

Add checks to all methods

Add checks similar to those in SumVariables() to the other methods as well.
Related to #1.

It is better to raise an explicit error when something does not work the way the user specified it.

Crisis dummies data type should be fixed

It seems that endogenous dummy variables are treated as integers. However, in the vulnerability horizon treatment we need to introduce NaNs, which causes these columns to become floats. We could probably use the new nullable integer type (https://pandas.pydata.org/pandas-docs/stable/user_guide/integer_na.html), but in this context it is probably enough to use floats. This should, however, be properly defined; that is, all numerical columns in MTSDataModel should be defined as floats.
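The dtype change is easy to reproduce outside MTSDataModel. The series below is a made-up crisis dummy, and the mask is an assumed stand-in for the observations that the vulnerability horizon treatment would blank out.

```python
import pandas as pd

# Hypothetical crisis dummy stored as integers.
dummy = pd.Series([0, 1, 0, 0], name="crisis")

# Blanking out observations with NaN forces an upcast to a float dtype.
blank = pd.Series([False, False, True, True])
masked = dummy.mask(blank)

# Declaring the column as float from the start keeps the dtype stable.
as_float = dummy.astype("float64")
masked_float = as_float.mask(blank)

# The nullable integer alternative: missing values without leaving integers.
nullable = pd.array([0, 1, None, None], dtype="Int64")

print(dummy.dtype, masked.dtype, masked_float.dtype, nullable.dtype)
```

Defining every numerical column as float up front means the dtype no longer depends on whether NaNs have been introduced yet.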

SumVariables() differs from other methods

It does not really make sense that SumVariables() differs from the other methods in how it takes in variables. The idea was to let the dict key designate the name of the resulting new variable, with the variables themselves passed in as a list in the dict value. To match the other methods, it makes more sense to pass in two lists, variables and entities, and a mandatory third input designating the name, similar to ReduceVariableDimension().

The drawback is that this prohibits summing multiple variable combinations in one go, but the other methods work this way as well.
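The proposed calling convention could look roughly like the following standalone function. It is a sketch of the interface only: the name sum_variables, the parameter order, and the explicit check for missing pairs are assumptions, not the existing SumVariables() implementation.

```python
import pandas as pd


def sum_variables(df, variables, entities, name):
    """Sum `variables` per entity into a new variable `name` (hypothetical)."""
    out = df.copy()
    for entity in entities:
        cols = [(v, entity) for v in variables]
        # Fail loudly instead of producing an all-zero result.
        missing = [c for c in cols if c not in df.columns]
        if missing:
            raise KeyError(f"Variable/entity pairs not in data: {missing}")
        out[(name, entity)] = df[cols].sum(axis=1)
    return out


columns = pd.MultiIndex.from_tuples(
    [("A_wl4", "DEU"), ("A_wl5", "DEU"), ("A_wl4", "FRA")]
)
df = pd.DataFrame([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], columns=columns)

# One new variable per call, named by the third argument.
result = sum_variables(df, ["A_wl4", "A_wl5"], ["DEU"], "L45")
print(result[("L45", "DEU")])
```

Summing a second combination then takes a second call with a different name, which is the trade-off noted above.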

Once this is done, do #1

Vulnerability horizon treatment

The data model needs a method to reduce the sample to a given vulnerability horizon.

Questions:

  • Where should this method go? Under class MTSModel? It does not really belong to that particular class, since the predictive modelling parts are not just data handling anymore.
