mlr-org / mlr3fda

Functional Data Analysis for mlr3

Home Page: https://mlr3fda.mlr-org.com/

License: GNU Lesser General Public License v3.0

R 100.00%

data-analysis data-analysis-in-r functional-data machine-learning data-science mlr3 r r-package

mlr3fda's Issues

TODOs

Fix bug for tfd_reg in PipeOpFFS and add a test
Use param_set$get_values() instead of param_set$values in PipeOpFFS
Use tag "required" and initial values instead of defaults
Allow for arbitrary feature extractors in PipeOpFFS
Improve the speed of PipeOpFFS
Check that the tf package can be installed in the latest R version
Readme says at the top that we only support tfd_irreg but this is false
Add some predefined task that includes some irregular data.

Benchmark for functional data

Some datasets we can use to benchmark the methods

Multidimensional Feature Extraction

Instead of extracting trends over time for each feature (e.g. blood pressure, respiratory rate, etc.) individually, this topic focuses on extracting multidimensional trends (including but not necessarily limited to two dimensions), e.g. correlation between individual features over time. Since the measurements of the individual features are not necessarily performed at the same time points, this topic includes a strategy for aligning the features’ time axes (e.g. by binning, etc.); see example data excerpt below.
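
One way to read the binning strategy mentioned above is sketched here in base R. The two series, their values, and the bin width are made up for illustration, and `bin_mean` is a hypothetical helper, not part of mlr3fda or tf.

```r
# Two hypothetical vital signs measured at different, irregular time points.
bp_time  <- c(0.5, 1.2, 2.8, 3.1, 4.6, 5.9)
bp_value <- c(120, 122, 130, 131, 127, 125)
rr_time  <- c(0.9, 2.1, 3.7, 4.2, 5.5)
rr_value <- c(14, 16, 19, 18, 17)

# Align both series on a shared grid of bins (assumed width 2 over [0, 6]).
breaks <- seq(0, 6, by = 2)

# Hypothetical helper: mean value per bin; empty bins become NA.
bin_mean <- function(time, value, breaks) {
  bins <- cut(time, breaks = breaks, include.lowest = TRUE)
  as.numeric(tapply(value, bins, mean))
}

bp_binned <- bin_mean(bp_time, bp_value, breaks)
rr_binned <- bin_mean(rr_time, rr_value, breaks)

# A simple two-dimensional feature: correlation of the aligned series.
feature_cor <- cor(bp_binned, rr_binned, use = "complete.obs")
```

Binning trades temporal resolution for alignment; interpolating both series onto a common grid would be an alternative.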

Tasks

https://cran.r-project.org/web/packages/fds/index.html

example tasks should not be stored in tf format

This means we always need to update the data when something in the tf package changes

Spline-based representation of functional data

https://tidyfun.github.io/tf/reference/tfb_spline.html

Use same pkgdown template as other mlr3 repositories

Feature extraction for different intervals in `PipeOpFFS`

To find good intervals for the extracted features, there are two possibilities:

Tune the lower and upper bound
Calculate the feature (e.g. mean, var) for different lower and upper bounds and then do feature selection.

We probably also want to allow the latter, i.e. the lower and upper parameter of PipeOpFFS can also be vectors:
lower = c(1, 2); upper = c(2, 3) or lower = 1:10; upper = Inf should both be possible.

Maybe we should wait with this until the C Code is merged (or not).
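
A minimal base R sketch of the vectorized bounds idea, assuming a single curve stored as plain `arg` and `value` vectors. `window_features` is a hypothetical name and does not reflect how PipeOpFFS is implemented.

```r
# One hypothetical curve observed at arg = 1, ..., 10.
arg   <- 1:10
value <- c(2, 4, 6, 8, 10, 12, 14, 16, 18, 20)

# Hypothetical helper: applies `fun` to the values inside each window
# [lower[i], upper[i]]. Shorter bounds are recycled, so a scalar upper
# bound can be combined with a vector of lower bounds.
window_features <- function(arg, value, lower, upper, fun = mean) {
  n <- max(length(lower), length(upper))
  lower <- rep_len(lower, n)
  upper <- rep_len(upper, n)
  vapply(seq_len(n), function(i) {
    in_window <- arg >= lower[i] & arg <= upper[i]
    # Return NA when the window contains no evaluations.
    if (any(in_window)) fun(value[in_window]) else NA_real_
  }, numeric(1))
}

two_windows <- window_features(arg, value, lower = c(1, 2), upper = c(2, 3))
ten_windows <- window_features(arg, value, lower = 1:10, upper = Inf)
```

Each window yields one candidate feature, which a later feature selection step could then filter.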

PipeOpSmooth

Feature: Convert irreg to reg data

Similar to flatfun but stay in functional data type
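
As a rough illustration of what converting irregular to regular data could mean for a single curve, the following base R sketch linearly interpolates onto a shared grid with `approx()`. The data and the grid are assumptions; an actual PipeOp would operate on tf vectors and stay in the functional data type.

```r
# One hypothetical curve observed at irregular arguments.
arg   <- c(0.0, 0.7, 1.5, 3.2, 4.0)
value <- c(1.0, 1.7, 2.5, 4.2, 5.0)

# Assumed regular target grid shared by all curves.
grid <- seq(0, 4, by = 1)

# Linear interpolation; rule = 1 yields NA for grid points outside the
# observed range instead of extrapolating.
regular <- approx(x = arg, y = value, xout = grid, rule = 1)$y
```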

Have an example including a plot in the README

How to create a Task with functional features
A figure (I guess we should use usethis::use_readme_rmd())

Creation of tasks feels slow

Feature Request for flatfun

Additional option to determine grid: give lower, upper and resolution

rename master to main

decide whether the "ids" should be part of the functional vector

Full README

Have a few examples and figures in the README. This is really important for outside visibility

Add overview of all pipeops in readme

Flattening of irregular data

Size of task prototype

When creating a graph learner that has as input a task with tf columns, the data_prototype that is saved in the learner's state after training contains the arg and value vectors as well as the evaluator and other metadata defined in tf.
This unnecessarily blows up the size of learner states in a way that was not intended.

I think this should be fixed in tf, i.e. 0-length tf vectors should drop discardable metadata.

Task print raising error

Description

Printing a task raises the following error, regardless of the task used. It only occurs when mlr3fda is loaded:

Error in map_values(x, r_types, p_types) :
Assertion on 'new' failed: Must be of type 'atomic vector', not 'NULL'.

The error occurs because task_feature_types is expected to be named, and register_mlr3 removes the names because the union function drops them:

  mlr_reflections$task_feature_types = c(
    lgl = "logical", int = "integer", dbl = "numeric", chr = "character", fct = "factor", ord = "ordered", pxc = "POSIXct"
  )
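
The name-dropping behavior of `union()` can be reproduced in base R without mlr3. In the sketch below, the added entry `tfr = "tfd_regular"` is an assumed example of a functional feature type, and the `c()` plus `duplicated()` combination is only one possible way to keep the names, not necessarily the fix that was applied.

```r
feature_types <- c(lgl = "logical", int = "integer", dbl = "numeric")

# union() returns an unnamed vector, so the names are lost.
dropped <- union(feature_types, c(tfr = "tfd_regular"))

# Possible alternative: combine with c() and drop duplicated values,
# which keeps the names of the remaining entries.
combined <- c(feature_types, tfr = "tfd_regular")
kept <- combined[!duplicated(combined)]
```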

Reproducible example

library("mlr3fda")

task = tsk("fuel")
task

add tf to packages of FDA pipeops

Documentation: add mlr3book chapter

What is functional data
How do I get my data into the tf data, which data types are there and how do they work
How does the package work
Showcase the use of the PipeOps
Other comments:
We can take inspiration from: https://arxiv.org/pdf/1911.07511.pdf

zzz

Add functional to features
Add 1-2 example tasks: https://github.com/mlr-org/mlr/blob/3f70ac162d764eca87a6a2c122fb7989a1bd2d4a/inst/makeData.R#L85

Special PipeOp for tfd_irreg -> tfd_reg

PipeOpFFS: Implement differently?

With irregular data the implementation might not be optimal.

Integrate mlr3fda into mlr3 website

load mlr3fda, so pipeops are listed in overview

pipeops should respect test roles

MIMIC-III

use mean.tf, var.tf etc.

Use mlr3pipelines helper functions in tests

Look at mlr for inspiration / ideas

remove remainders of old roxygen descriptions

PipeOp for correlation between features

tf_crossprod from the tf package can be used

Book chapter on mlr3fda

overload hash_input for functional vectors

Functional Response

We should perhaps think about introducing a TaskFunctionalRegr.
As an output, this task would have one of the functional feature classes we have designed.

There is software that allows directly modeling functional responses, e.g. FDboost, and using keras/torch it's quite easy to model.

Fabian will create package like tidyfun with minimal dependencies, should we just use this datatype?

PipeOpFFS: Address edge cases with no evaluations for an individual

mlr3fda/R/PipeOpFFS.R

Line 239 in 14b8da8

ffind = function(x, left = -Inf, right = Inf) {

assumes that x has length > 0
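
A guard for the empty case could look roughly like the sketch below. `find_window` is a hypothetical stand-in written for illustration; it is not the package's `ffind()`, whose body is not shown here, and returning `NA` indices is an assumed convention.

```r
# Hypothetical helper: indices of the first and last element of a sorted
# vector x that fall inside [left, right].
find_window <- function(x, left = -Inf, right = Inf) {
  # Guard: an individual without any evaluations.
  if (length(x) == 0L) {
    return(c(NA_integer_, NA_integer_))
  }
  inside <- which(x >= left & x <= right)
  # Guard: evaluations exist, but none lie inside the window.
  if (length(inside) == 0L) {
    return(c(NA_integer_, NA_integer_))
  }
  c(inside[1L], inside[length(inside)])
}

empty_case   <- find_window(numeric(0))
regular_case <- find_window(c(1, 2, 3, 4), left = 2, right = 3)
outside_case <- find_window(c(1, 2), left = 5, right = 6)
```

Callers can then map `NA` indices to an `NA` feature value instead of failing.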

backend for functional features

A potentially more efficient storage for (regular) functional features might be a DataBackendDuckDB that stores a table for the normal features and a table for each functional feature.

Create custom check functions

don't warn when creating duplicated column names in PipeOpFDAExtract

Maybe only output this as debug info or not at all (?)

Installation / Deployment to runiverse

In case tf is not on CRAN by the end of March, we should enable installation of mlr3fda through r-universe, as is done e.g. for mlr3proba (https://github.com/mlr-org/mlr3proba#installation).

Check whether tf vectors work with PipeOpFeatureUnion

PipeOpFeatureUnion relies on unlist() (https://github.com/mlr-org/mlr3pipelines/blob/dae03afd8efa83621a495a058f42cfc2ea8d7357/R/PipeOpFeatureUnion.R#L184-L191). It would be a good idea to check whether this can tell whether different tf vectors are real duplicates.

Speed up FFS case with different window per fextractor

The current problem is that when we want to specify different lower and upper sizes per fextractor the PipeOpFFS becomes slower.

Now lower can be a list with different values per fextractor and the same for upper.

Functional Principal Component Analysis

Besides the simple feature extraction through PipeOpFFS, another important method is functional PCA.

We can implement a PipeOpFPCA similar to PipeOpFFS that does a functional PCA.

I think the tfb_fpc function can be used here.
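
To illustrate the kind of output such a PipeOp might produce, here is a base R sketch that approximates functional PCA by running `prcomp()` on curves evaluated on a common regular grid. This is a discretized stand-in, not `tfb_fpc`; the simulated curves and the number of components are assumptions.

```r
set.seed(1)
grid <- seq(0, 1, length.out = 20)

# 30 hypothetical curves: random mixtures of two shapes plus small noise.
weights <- matrix(rnorm(30 * 2), ncol = 2)
shapes  <- rbind(sin(2 * pi * grid), cos(2 * pi * grid))
curves  <- weights %*% shapes + matrix(rnorm(30 * 20, sd = 0.05), nrow = 30)

# Ordinary PCA on the discretized curves (rows = curves, columns = grid points).
pca <- prcomp(curves, center = TRUE, scale. = FALSE)

# Keep the scores of the first components as scalar features per curve.
n_comp <- 2
fpc_scores <- pca$x[, seq_len(n_comp), drop = FALSE]
```

The score matrix has one row per curve and one column per component, so it could replace the functional column in a downstream learner.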

Add PipeOp to overview in readme

Test edge cases for PipeOpFFA

What happens when there are no observations in the specified window?
Can there ever be NAs in the tfd_irreg for args for which values exist?
What happens when the specified window is (partly) outside of the domain?

allow offset for window

`functional` class design

In a first step, let us only consider 1-D functional data.
There are several layers of complexity we could consider:

~~functional_matrix: A functional is a n x t matrix. We assume that all values are made at the same t different time points.~~
Not supported by data.table
functional_list: A functional feature is a list of n functions:
- variable length lists,
- lists with a time component (e.g. a time-series)
- Something from tidyfun

Proposal: Each of those has class functional and perhaps a more specific second class.

What do we need:

A converter to each sub-class
- as_functional_matrix
- as_functional_list
A converter to flatten (S3-method) to a matrix.
- flatten_functional (S3, for the two sub-classes above)
PipeOpFlatten: Converts from functional to a data.table of doubles
PipeOpExtractFeature: Applies a function to each functional, e.g. `mean`.
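
The proposal above could be prototyped with plain S3 classes, as in the following sketch. It only covers the variable-length list variant; the NA padding in the flatten method is an assumption, and none of this reflects the design that was eventually adopted.

```r
# Hypothetical constructor: a list of numeric vectors carrying the general
# class "functional" and the more specific class "functional_list".
as_functional_list <- function(x) {
  stopifnot(is.list(x), all(vapply(x, is.numeric, logical(1))))
  structure(x, class = c("functional_list", "functional"))
}

# S3 generic for flattening a functional into a matrix.
flatten_functional <- function(x, ...) UseMethod("flatten_functional")

# Method for variable-length lists: shorter functions are padded with NA,
# so the result is an n x t matrix (n functions, t = longest length).
flatten_functional.functional_list <- function(x, ...) {
  x <- unclass(x)
  n_max <- max(lengths(x), 0L)
  t(vapply(x, function(v) as.numeric(v)[seq_len(n_max)], numeric(n_max)))
}

f <- as_functional_list(list(c(1, 2, 3), c(4, 5)))
flat <- flatten_functional(f)

# Feature extraction: apply a function to each functional.
means <- vapply(unclass(f), mean, numeric(1))
```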

Research:

What functional_list type do we want?