mlr-org / mlr3fda
Functional Data Analysis for mlr3
Home Page: https://mlr3fda.mlr-org.com/
License: GNU Lesser General Public License v3.0
tfd_reg in PipeOpFFS and add a test
param_set$get_values() instead of param_set$values in PipeOpFFS
"required" and initial values instead of defaults
PipeOpFFS
PipeOpFFS
tf package can be installed in the latest R version
tfd_irreg but this is false
Some datasets we can use to benchmark the methods
Instead of extracting trends over time for each feature (e.g. blood pressure, respiratory rate) individually, this topic focuses on extracting multidimensional trends (including but not necessarily limited to two dimensions), e.g. correlation between individual features over time. Since the measurements of the individual features are not necessarily performed at the same time points, this topic includes a strategy for aligning the features’ time axes (e.g. by binning); see example data excerpt below.
This means we always need to update the data when something in the tf package changes.
To find good intervals for the extracted features, there are two possibilities:
lower and upper bound
lower and upper bounds and then do feature selection.
We probably also want to allow the latter, i.e. the lower and upper parameters of PipeOpFFS can also be vectors:
lower = c(1, 2); upper = c(2, 3) or lower = 1:10; upper = Inf should both be possible.
Maybe we should wait with this until the C code is merged (or not).
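One way to read the vector-valued lower and upper proposal is that both arguments are recycled to a common length and each resulting pair defines one extraction window. The base R sketch below illustrates that reading only. make_windows() and window_means() are hypothetical helpers and not part of PipeOpFFS, and the mean is just a stand-in for whatever feature is extracted.

```r
# Hypothetical helper: recycle lower and upper to a common length,
# so that each row describes one extraction window.
make_windows = function(lower, upper) {
  n = max(length(lower), length(upper))
  stopifnot(n %% length(lower) == 0, n %% length(upper) == 0)
  data.frame(lower = rep_len(lower, n), upper = rep_len(upper, n))
}

# Hypothetical helper: mean of the observed values inside each window.
window_means = function(arg, value, windows) {
  vapply(seq_len(nrow(windows)), function(i) {
    keep = arg >= windows$lower[i] & arg <= windows$upper[i]
    mean(value[keep])
  }, numeric(1))
}

# One toy curve observed at three argument values.
arg = 1:3
value = c(10, 20, 30)

# Case 1: paired bounds, two windows [1, 2] and [2, 3].
w1 = make_windows(lower = c(1, 2), upper = c(2, 3))
m1 = window_means(arg, value, w1)

# Case 2: ten lower bounds, a single upper bound recycled to all of them.
w2 = make_windows(lower = 1:10, upper = Inf)
```

Under this reading, every window yields one candidate feature, and a later feature selection step can decide which windows to keep.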
Similar to flatfun but stay in functional data type
usethis::use_readme_rmd()
Additional option to determine grid: give lower, upper and resolution
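If resolution is read as the spacing between neighbouring grid points (an assumption; it could equally mean the number of points), the grid follows directly from base R's seq(). The name make_grid() is hypothetical and only used for illustration.

```r
# Hypothetical helper: equally spaced grid from lower to upper,
# assuming resolution is the step size between grid points.
make_grid = function(lower, upper, resolution) {
  stopifnot(is.finite(lower), is.finite(upper), lower <= upper, resolution > 0)
  seq(lower, upper, by = resolution)
}

grid = make_grid(lower = 0, upper = 1, resolution = 0.25)
```

Note that with this definition the last grid point falls below upper whenever upper - lower is not a multiple of resolution.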
Have a few examples and figures in the README. This is really important for outside visibility.
When creating a graph learner that has as input a task with tf columns, the data_prototype that is saved in the learner's state after training contains the arg and value vectors as well as the evaluator and other metadata defined in tf.
This unnecessarily blows up the size of learner states in a way that was not intended.
I think this should be fixed in tf, i.e. 0-length tf vectors should drop discardable metadata.
Printing a task gives the following error, independent of the task used; it only occurs when mlr3fda is loaded:
Error in map_values(x, r_types, p_types) :
Assertion on 'new' failed: Must be of type 'atomic vector', not 'NULL'.
The error occurs because the task_feature_types are expected to be named, and register_mlr3 removes the names because the union function drops names:
mlr_reflections$task_feature_types = c(
  lgl = "logical", int = "integer", dbl = "numeric", chr = "character", fct = "factor", ord = "ordered", pxc = "POSIXct"
)
library("mlr3fda")
task = tsk("fuel")
task
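The name-dropping behaviour of union() can be reproduced in base R without mlr3. The sketch below uses a shortened copy of the feature type vector; the short names tfr and tfi for the functional types are hypothetical. It also shows one possible way to extend a named vector while keeping the names, which is not necessarily the fix used in the package.

```r
# Shortened stand-in for the named feature type vector.
feature_types = c(lgl = "logical", int = "integer", dbl = "numeric")

# Hypothetical short names for the functional feature types.
new_types = c(tfr = "tfd_reg", tfi = "tfd_irreg")

# union() returns an unnamed vector, so the short names are lost.
dropped = union(feature_types, new_types)
names(dropped)

# c() keeps the names; duplicated() then removes repeated values.
combined = c(feature_types, new_types)
kept = combined[!duplicated(combined)]
names(kept)
```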
What is functional data
How do I get my data into tf, which data types are there, and how do they work
How does the package work
Showcase the use of the PipeOps
Other comments:
We can take inspiration from: https://arxiv.org/pdf/1911.07511.pdf
functional to features
With irregular data the implementation might not be optimal.
tf_crossprod from the tf package can be used.
We should perhaps think about introducing a TaskFunctionalRegr.
As an output, this task would have one of the functional feature classes we have designed.
There is software that allows directly modeling functional responses, e.g. FDboost, and using keras/torch it is quite easy to model.
Line 239 in 14b8da8
A potentially more efficient storage for (regular) functional features might be a DataBackendDuckDB that stores a table for the normal features and a table for each functional feature.
Maybe only output this as debug info or not at all (?)
In case tf is not on CRAN by the end of March, we should enable installation of mlr3fda through r-universe, as is done e.g. for mlr3proba (https://github.com/mlr-org/mlr3proba#installation).
PipeOpFeatureUnion relies on unlist() (https://github.com/mlr-org/mlr3pipelines/blob/dae03afd8efa83621a495a058f42cfc2ea8d7357/R/PipeOpFeatureUnion.R#L184-L191). It would be a good idea to check whether this can tell whether different tf vectors are real duplicates.
The current problem is that when we want to specify different lower and upper sizes per fextractor, the PipeOpFFS becomes slower.
Now lower can be a list with different values per fextractor, and the same for upper.
Besides the simple feature extraction through PipeOpFFS, another important method is functional PCA.
We can implement a PipeOpFPCA similar to PipeOpFFS that does a functional PCA.
I think the tfb_fpc function can be used here.
In a first step, let us only consider 1-D functional data.
There are several layers of complexity we could consider:
functional_matrix: A functional is an n x t matrix. We assume that all values are measured at the same t different time points.
data.table
functional_list: A functional feature is a list of n functions:
Something from tidyfun
Proposal: Each of those has class functional and perhaps a more specific second class.
What do we need:
as_functional_matrix
as_functional_list
flatten_functional (S3 for the two sub-classes above)
PipeOpFlatten: Converts from functional to a data.table of doubles
PipeOpExtractFeature: Applies a function to each functional, e.g. mean.
Research:
functional_list type do we want?
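The functional_matrix part of the proposal can be sketched in base R with S3 classes. This is only an illustration of the idea, not an existing implementation: the internal layout (a list with values and arg), the prefix argument and extract_feature() are assumptions, and a data.frame is returned instead of a data.table to keep the example free of dependencies.

```r
# Constructor: n x t numeric matrix, one row per observation,
# all rows observed at the same t argument values.
as_functional_matrix = function(x, arg = seq_len(ncol(x))) {
  stopifnot(is.matrix(x), is.numeric(x), length(arg) == ncol(x))
  structure(list(values = x, arg = arg),
    class = c("functional_matrix", "functional"))
}

# S3 generic; a functional_list method could be added in the same way.
flatten_functional = function(x, ...) UseMethod("flatten_functional")

# Flatten to one numeric column per argument value.
flatten_functional.functional_matrix = function(x, prefix = "f", ...) {
  out = as.data.frame(x$values)
  names(out) = paste0(prefix, "_", seq_along(x$arg))
  out
}

# Hypothetical stand-in for what PipeOpExtractFeature would do:
# apply a summary function to each observation (row).
extract_feature = function(x, fun) {
  stopifnot(inherits(x, "functional_matrix"))
  apply(x$values, 1, fun)
}

fm = as_functional_matrix(matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, byrow = TRUE))
flat = flatten_functional(fm)
means = extract_feature(fm, mean)
```

The shared class functional lets generic code test inherits(x, "functional"), while the more specific second class selects the method.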