modal-inria / cfda Goto Github PK
View Code? Open in Web Editor NEWCategorical Functional Data Analysis
Home Page: https://modal-inria.github.io/cfda
Categorical Functional Data Analysis
Home Page: https://modal-inria.github.io/cfda
mail from Caroline Peltier
In compute_optimal_encoding : if the ids are not in the alphabetic order, labels of principal components are permuted (and the results are wrong)
# CFDA
#====================
# We simulated a simple dataset to understand how cfda works
# We obtained a bug when the ids in the dataset are not entered in the alphabetical order (due to the use of "unique" function ?)
# The following code illustrates the bug using the same dataset with permuted lines
# A proposition of correction is at the end of the code
# The same kind of bug could occur in bootstrap functions... I haven't tested them.
# Generating all lines of the dataset : one line per evaluation: P1, P2 and P3 represent 3 given products that are evaluated along time by 3 subjects (S1, S2 and S3)
all_dat_P1S1=data.frame(state=c("A","B","A"),time=c(0,0.5,1),id=rep("P1S1",3))
all_dat_P1S2=data.frame(state=c("A","B","A"),time=c(0,0.4,1),id=rep("P1S2",3))
all_dat_P1S3=data.frame(state=c("A","B","A"),time=c(0,0.6,1),id=rep("P1S3",3))
all_dat_P2S1=data.frame(state=c("A","C","A"),time=c(0,0.5,1),id=rep("P2S1",3))
all_dat_P2S2=data.frame(state=c("A","C","A"),time=c(0,0.4,1),id=rep("P2S2",3))
all_dat_P2S3=data.frame(state=c("A","C","A"),time=c(0,0.6,1),id=rep("P2S3",3))
all_dat_P3S1=data.frame(state=c("C","A","A"),time=c(0,0.5,1),id=rep("P3S1",3))
all_dat_P3S2=data.frame(state=c("C","A","A"),time=c(0,0.4,1),id=rep("P3S2",3))
all_dat_P3S3=data.frame(state=c("C","A","A"),time=c(0,0.6,1),id=rep("P3S3",3))
# Original dataset
all_dat_a=rbind(all_dat_P1S1,all_dat_P1S2,all_dat_P1S3,all_dat_P2S1,all_dat_P2S2,all_dat_P2S3, all_dat_P3S1,all_dat_P3S2,all_dat_P3S3 )
# Same dataset with permuted lines
all_dat_b=rbind(all_dat_P2S1,all_dat_P2S2,all_dat_P2S3, all_dat_P3S1,all_dat_P3S2,all_dat_P3S3,all_dat_P1S1,all_dat_P1S2,all_dat_P1S3 )
# Visualization of both dataset : no problem, same visualization for both (permuted or not) dataset
plotData(all_dat_a)
plotData(all_dat_b)
# Computing optimal encoding
basisobj <- create.bspline.basis(c(0, 1), nbasis = 6,norder=1)
fmca_a <- compute_optimal_encoding(all_dat_a, basis=basisobj, nCores = 1)
fmca_b <- compute_optimal_encoding(all_dat_b, basis=basisobj, nCores = 1)
# visualizing the bug
plotComponent(fmca_a)
plotComponent(fmca_b) # wrong label !
plot(fmca_a,harm=1)+theme_bw() # this encoding allows a good interpretation given the original data presented in plotData(all_dat_a) :
# the first axis separates the "blue" before frome the blue or green after (that is to say P3 from P1 and P2) => good interpretation for fmca_a
# Visualizing rownames of pcs and the origin of the bug
fmca_a$pc[,1:3]
fmca_b$pc[,1:3] # wrong rownames
# Using compute_optimal_encoding2 (same as compute_optimal_enconding, but with
# uniqueId=unique(data$id) is replaced by
# uniqueId=levels(factor(data$id)). See below
fmca_c <- compute_optimal_encoding2(all_dat_a, basis=basisobj, nCores = 1)
fmca_d <- compute_optimal_encoding2(all_dat_b, basis=basisobj, nCores = 1)
fmca_c$pc[,1:3]
fmca_d$pc[,1:3] # ok, bug corrected
plotComponent(fmca_c)
plotComponent(fmca_d) # ok, bug corrected
#====================================
# compute_optimal_encoding2
#===================================
compute_optimal_encoding2=function (data, basisobj, computeCI = TRUE, nBootstrap = 50,
propBootstrap = 1, nCores = max(1, ceiling(parallel:::detectCores()/2)),
verbose = TRUE, ...)
{
t1 <- proc.time()
cfda:::checkData(data)
cfda:::checkDataBeginTime(data)
cfda:::checkDataEndTmax(data)
if (!is.basis(basisobj)) {
stop("basisobj is not a basis object.")
}
if (any(is.na(nCores)) || !cfda:::is.whole.number(nCores) || (nCores <
1)) {
stop("nCores must be an integer > 0.")
}
cfda:::checkLogical(verbose, "verbose")
cfda:::checkLogical(computeCI, "computeCI")
if (computeCI) {
if (any(is.na(nBootstrap)) || (length(nBootstrap) !=
1) || ! cfda:::is.whole.number(nBootstrap) || (nBootstrap <
1)) {
stop("nBootstrap must be an integer > 0.")
}
if (any(is.na(propBootstrap)) || !is.numeric(propBootstrap) ||
(length(propBootstrap) != 1) || (propBootstrap >
1) || (propBootstrap <= 0)) {
stop("propBootstrap must be a real between 0 and 1.")
}
}
pt <- estimate_pt(data)
if (verbose) {
cat("######### Compute encoding #########\n")
}
out <- cfda:::stateToInteger(data$state)
data$state <- out$state
label <- out$label
rm(out)
# /!\ v============================TO REMOVE
#uniqueId <- unique(data$id)
#----------------------------------------
# ================================TO ADD
uniqueId=levels(factor(data$id))
#==========================
nId <- length(uniqueId)
nCores <- min(max(1, nCores), parallel:::detectCores() - 1)
Tmax <- max(data$time)
K <- length(label$label)
nBasis <- basisobj$nbasis
if (verbose) {
cat(paste0("Number of individuals: ", nId, "\n"))
cat(paste0("Number of states: ", K, "\n"))
cat(paste0("Basis type: ", basisobj$type, "\n"))
cat(paste0("Number of basis functions: ", nBasis,
"\n"))
cat(paste0("Number of cores: ", nCores, "\n"))
}
V <- cfda:::computeVmatrix(data, basisobj, K, nCores, verbose, ...)
Uval <- cfda:::computeUmatrix(data, basisobj, K, nCores, verbose,
...)
fullEncoding <- cfda:::computeEncoding(Uval, V, K, nBasis, uniqueId,
label, verbose, manage0 = TRUE)
if (computeCI) {
signRef <- cfda:::getSignReference(fullEncoding$alpha)
bootEncoding <- cfda:::computeBootStrapEncoding(Uval, V, K,
nBasis, label, nId, propBootstrap, nBootstrap, signRef,
verbose)
if (length(bootEncoding) == 0) {
warning("All bootstrap samples return an error. Try to change the basis.")
out <- c(fullEncoding, list(V = V, basisobj = basisobj,
label = label, pt = pt))
}
else {
varAlpha <- cfda:::computeVarianceAlpha(bootEncoding, K,
nBasis)
out <- c(fullEncoding, list(V = V, basisobj = basisobj,
label = label, pt = pt, bootstrap = bootEncoding,
varAlpha = varAlpha))
}
}
else {
out <- c(fullEncoding, list(V = V, basisobj = basisobj,
label = label, pt = pt))
}
class(out) <- "fmca"
t2 <- proc.time()
out$runTime <- as.numeric((t2 - t1)[3])
if (verbose) {
cat(paste0("Run Time: ", round(out$runTime, 2),
"s\n"))
}
return(out)
}
invisible(return(NULL))
seems not to work as intended:cfda:::checkData(d_JK2)
NULL
It would seem that invisible(NULL)
is in fact intended.
Currently, we compute U and V coeff for each individual separately.
But when the number of unique time values is not large (cf. biofam2: only 16 time values), some computation are repeated among individuals.
investigate the use of dplyr and a tibble data.frame to ease the code
It would seem that the internal coding of factors could be more efficiently used than the current implementation in stateToInteger and refactorCategorical.
it would not be better to have a coercion function which transforms data to a suitable data format for the CFD analysis and
which has its associated print, summary, plot methods and hence no summary_cfd method is required. It would seem that some checks should be included such as having the time sequences within individuals suitably sorted.
Make a custom version of compute_optimel_encoding for
https://github.com/modal-inria/cfda_shiny
that can manage shiny progressbar
see
https://stackoverflow.com/questions/31990447/r-shiny-progress-bar-for-pblapply-functions
free version of travis now has a credit system and I have used all credits.
Need to switch to github action to have a ci for linux and osx
Input:
y <- c(1, 3, 4, 6, 7, 8, 7, 6)
thresholds <- c(2, 4, 6, 8)
labels <- c("A", "B", "C", "D", "E")
Output:
y
converted to categorical (before the cfda format)
c("A", "B", "B", "C", "D", "D", "D", "C")
right or left open interval ?
we use snow
instead of parallel
to easily print information during the foreach loop. But it seems snow
is being discontinued...
Investigate a way to have progress bar with parallel
package
https://stackoverflow.com/questions/10903787/how-can-i-print-when-using-dopar
https://giorgilab.org/2019/10/20/progress-bars-and-parallelization-in-r/
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.