mlr3fda's Issues

TODOs

  • Fix bug for tfd_reg in PipeOpFFS and add a test
  • Use param_set$get_values() instead of param_set$values in PipeOpFFS
  • Use tag "required" and initial values instead of defaults
  • Allow for arbitrary feature extractors in PipeOpFFS
  • Improve the speed of PipeOpFFS
  • Check that the tf package can be installed in the latest R version
  • Readme says at the top that we only support tfd_irreg but this is false
  • Add some predefined task that includes some irregular data.

Multidimensional Feature Extraction

Instead of extracting trends over time for each feature (e.g. blood pressure, respiratory rate, etc.) individually, this topic focuses on extracting multidimensional trends (including but not necessarily limited to two dimensions), e.g. correlation between individual features over time. Since the measurements of the individual features are not necessarily performed at the same time points, this topic includes a strategy for aligning the features’ time axes (e.g. by binning, etc.); see example data excerpt below.
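One way to picture the alignment step is the following minimal sketch in base R. It assumes fixed-width bins and uses made-up blood pressure and respiratory rate values; `bin_mean` is a hypothetical helper, not part of mlr3fda, and binning is only one of several possible alignment strategies.

```r
# Two features measured at different, irregular time points (made-up data).
bp = data.frame(time = c(0.5, 1.2, 2.7, 3.1, 4.8, 5.5),
                value = c(120, 118, 125, 130, 128, 135))
rr = data.frame(time = c(0.9, 2.2, 3.6, 4.1, 5.9),
                value = c(14, 15, 18, 17, 20))

# Hypothetical helper: average a feature within time bins; empty bins become NA.
bin_mean = function(time, value, breaks) {
  bin = cut(time, breaks = breaks, include.lowest = TRUE, right = FALSE)
  as.numeric(tapply(value, bin, mean))
}

breaks = seq(0, 6, by = 2)  # three bins: [0, 2), [2, 4), [4, 6]
bp_binned = bin_mean(bp$time, bp$value, breaks)
rr_binned = bin_mean(rr$time, rr$value, breaks)

# Correlate the aligned series, skipping bins where either feature is missing.
cor_bp_rr = cor(bp_binned, rr_binned, use = "complete.obs")
```

The bin width trades resolution against the number of empty bins, so it would likely need to be a tunable parameter.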

Feature extraction for different intervals in `PipeOpFFS`

To find good intervals for the extracted features, there are two possibilities:

  1. Tune the lower and upper bound
  2. Calculate the feature (e.g. mean, var) for different lower and upper bounds and then do feature selection.

We probably also want to allow the latter, i.e. the lower and upper parameters of PipeOpFFS can also be vectors:
lower = c(1, 2); upper = c(2, 3) or lower = 1:10; upper = Inf should both be possible.

Maybe we should hold off on this until the C code is merged (or not).
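To make the intended semantics concrete, here is a rough base R sketch of how vector-valued bounds could behave for a single curve. `window_means` is a hypothetical helper and the naming scheme for the resulting features is an assumption; it does not reflect how PipeOpFFS is implemented.

```r
# Hypothetical helper: mean of one curve within each window [lower[i], upper[i]].
window_means = function(arg, value, lower, upper) {
  # Recycle the shorter bound so that e.g. lower = 1:10, upper = Inf works.
  n = max(length(lower), length(upper))
  lower = rep_len(lower, n)
  upper = rep_len(upper, n)
  out = vapply(seq_len(n), function(i) {
    inside = arg >= lower[i] & arg <= upper[i]
    if (any(inside)) mean(value[inside]) else NA_real_
  }, numeric(1))
  # One feature per window, named after its bounds (naming is an assumption).
  names(out) = sprintf("mean_%s_%s", lower, upper)
  out
}

arg = c(1, 2, 3, 4)
value = c(10, 20, 30, 40)
res_pairs = window_means(arg, value, lower = c(1, 2), upper = c(2, 3))
res_open = window_means(arg, value, lower = 1:3, upper = Inf)
```

Each window yields one column, so a subsequent feature selection step can pick the informative intervals.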

Full README

Have a few examples and figures in the README. This is really important for outside visibility.

Size of task prototype

When creating a graph learner that has as input a task with tf columns, the data_prototype that is saved in the learner's state after training contains the arg and value vectors as well as the evaluator and other metadata defined in tf.
This unnecessarily blows up the size of learner states in a way that was not intended.

I think this should be fixed in tf, i.e. 0-length tf vectors should drop discardable metadata.

Task print raising error

Description

Printing a task raises the following error. It occurs independently of the task used, and only when mlr3fda is loaded:

Error in map_values(x, r_types, p_types) :
Assertion on 'new' failed: Must be of type 'atomic vector', not 'NULL'.

The error occurs because task_feature_types is expected to be named, and register_mlr3 removes the names because the union function drops them:

  mlr_reflections$task_feature_types = c(
    lgl = "logical", int = "integer", dbl = "numeric", chr = "character", fct = "factor", ord = "ordered", pxc = "POSIXct"
  )
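A possible names-preserving way to extend the vector is sketched below in base R. The short names `tfr` and `tfi` are assumptions chosen for illustration, and the snippet works on a plain vector rather than on mlr_reflections itself.

```r
feature_types = c(lgl = "logical", int = "integer", dbl = "numeric")
# Short names below are assumptions for illustration only.
fda_types = c(tfr = "tfd_reg", tfi = "tfd_irreg")

# Per the description above, the names were lost in this call;
# inspect the result on your R version.
names(union(feature_types, fda_types))

# Appending only the entries that are not present yet keeps the names.
merged = c(feature_types, fda_types[!(fda_types %in% feature_types)])
names(merged)
```

Running the append a second time on `merged` adds nothing, so the registration stays idempotent.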

Reproducible example

library("mlr3fda")

task = tsk("fuel")
task

Functional Response

We should perhaps think about introducing a TaskFunctionalRegr.
As an output, this task would have one of the functional feature classes we have designed.

There is software that allows directly modeling functional responses, e.g. FDboost, and using keras/torch it's quite easy to model.

backend for functional features

A potentially more efficient storage for (regular) functional features might be a DataBackendDuckDB that stores a table for the normal features and a table for each functional feature.

Speed up FFS case with different window per fextractor

The current problem is that PipeOpFFS becomes slower when we specify different lower and upper bounds per fextractor.

Now lower can be a list with different values per fextractor and the same for upper.

Functional Principal Component Analysis

Besides the simple feature extraction through PipeOpFFS, another important method is functional PCA.

We can implement a PipeOpFPCA similar to PipeOpFFS that does a functional PCA.

I think the tfb_fpc function can be used here.
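For intuition only, the following sketch approximates functional PCA for curves observed on a common regular grid by running ordinary PCA (base R `prcomp`) on the n x t matrix of values. This is a swapped-in, discretised stand-in, not tfb_fpc, and it ignores quadrature weights; the simulated data and all variable names are assumptions.

```r
set.seed(1)
grid = seq(0, 1, length.out = 50)
n = 30

# Simulated curves: random mixtures of two smooth components plus a little noise.
scores_true = cbind(rnorm(n, sd = 2), rnorm(n, sd = 0.5))
curves = scores_true[, 1] %o% sin(2 * pi * grid) +
  scores_true[, 2] %o% cos(2 * pi * grid) +
  matrix(rnorm(n * length(grid), sd = 0.05), nrow = n)

# PCA on the discretised curves; rows are observations, columns are grid points.
pca = prcomp(curves, center = TRUE, scale. = FALSE)

n_comp = 2
fpc_scores = pca$x[, seq_len(n_comp), drop = FALSE]  # one row of scores per curve
explained = pca$sdev^2 / sum(pca$sdev^2)             # variance share per component
```

In a PipeOp, the fitted decomposition would be stored in the state during training and applied to new curves at prediction time, for which `predict(pca, newdata)` is the base R analogue.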

  • Add PipeOp to overview in readme

Test edge cases for PipeOpFFA

  • What happens when there are no observations in the specified window?
  • Can there ever be NAs in the tfd_irreg for args for which values exist?
  • What happens when the specified window is (partly) outside of the domain?
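The cases above could be pinned down with assertions along these lines. This is a base R sketch against a hypothetical `window_mean` helper, not against the actual PipeOp, and returning NA for an empty window is one possible convention rather than the decided behaviour.

```r
# One irregular curve: measurement points (arg) and values, one value missing.
arg = c(0, 1, 2, 5)
value = c(3, NA, 7, 9)

# Hypothetical helper: mean of the non-missing values with arg in [lower, upper].
window_mean = function(arg, value, lower, upper) {
  inside = !is.na(value) & arg >= lower & arg <= upper
  if (!any(inside)) return(NA_real_)  # assumed convention: empty window gives NA
  mean(value[inside])
}

# No observations in the window.
stopifnot(is.na(window_mean(arg, value, 3, 4)))
# The only observation in the window is NA.
stopifnot(is.na(window_mean(arg, value, 1, 1)))
# Window partly outside the domain [0, 5]: only the overlap contributes.
stopifnot(window_mean(arg, value, 4, 10) == 9)
# Window entirely outside the domain.
stopifnot(is.na(window_mean(arg, value, 6, 8)))
```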

`functional` class design

In a first step, let us only consider 1-D functional data.
There are several layers of complexity we could consider

  • functional_matrix: A functional feature is an n x t matrix. We assume that all measurements are taken at the same t time points.
    Not supported by data.table
  • functional_list: A functional feature is a list of n functions:
    • variable length lists,
    • lists with a time component (e.g. a time-series)
    • Something from tidyfun

Proposal: Each of those has class functional and perhaps a more specific second class.

What do we need:

  • A converter to each sub-class
    • as_functional_matrix
    • as_functional_list
  • A converter to flatten (S3-method) to a matrix.
    • flatten_functional (S3) for the two sub-classes above
  • PipeOpFlatten: Converts from functional to a data.table of doubles
  • PipeOpExtractFeature: Applies a function to each functional, e.g. `mean`.
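A rough base R sketch of what the two sub-classes and the flatten method could look like follows. The internal layout (a list with `values` and `args`), the NA padding for irregular lengths, and the helper `extract_feature` are all assumptions made for illustration.

```r
# Regular case: an n x t matrix plus the shared time points.
as_functional_matrix = function(x, args = seq_len(ncol(x))) {
  stopifnot(is.matrix(x), is.numeric(x), length(args) == ncol(x))
  structure(list(values = x, args = args),
    class = c("functional_matrix", "functional"))
}

# Irregular case: one numeric vector per observation, lengths may differ.
as_functional_list = function(x) {
  stopifnot(is.list(x), all(vapply(x, is.numeric, logical(1))))
  structure(list(values = x), class = c("functional_list", "functional"))
}

# S3 generic that converts any functional to a plain matrix.
flatten_functional = function(x, ...) UseMethod("flatten_functional")

flatten_functional.functional_matrix = function(x, ...) x$values

# Pad shorter functions with NA so that every row has the same length.
flatten_functional.functional_list = function(x, ...) {
  len = max(lengths(x$values))
  do.call(rbind, lapply(x$values, function(v) c(v, rep(NA_real_, len - length(v)))))
}

# Hypothetical helper: apply a summary function to each functional, e.g. mean.
extract_feature = function(x, fun, ...) apply(flatten_functional(x), 1, fun, ...)

fl = as_functional_list(list(c(1, 2, 3), c(4, 5)))
flat = flatten_functional(fl)
means = extract_feature(fl, mean, na.rm = TRUE)
```

Because both sub-classes inherit from `functional`, a PipeOp can dispatch on the common class and leave the storage details to the methods.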

Research:

  • What functional_list type do we want?
