modal-inria / cfda Goto Github PK

View Code? Open in Web Editor NEW

3.0 3.0 2.0 20.33 MB

Categorical Functional Data Analysis

Home Page: https://modal-inria.github.io/cfda

R 100.00%

categorical-data functional-data-analysis hacktoberfest r

cfda's People

Contributors

Stargazers

Watchers

Forkers

f-bsc fbassac

cfda's Issues

plotly version of graph

wrong id labels in pc names when ids are not in alphabetical order

mail from Caroline Peltier

In compute_optimal_encoding : if the ids are not in the alphabetic order, labels of principal components are permuted (and the results are wrong)

# CFDA
#====================

# We simulated a simple dataset to understand how cfda works 
# We obtained a bug when the ids in the dataset are not entered in the alphabetical order (due to the use of "unique" function ?)
# The following code illustrates the bug using the same dataset with permuted lines
# A proposition of correction is at the end of the code
# The same kind of bug could occur in bootstrap functions... I haven't tested them.

# Generating all lines of the dataset : one line per evaluation:  P1, P2 and P3 represent 3 given products that are evaluated along time by 3 subjects (S1, S2 and S3)
all_dat_P1S1=data.frame(state=c("A","B","A"),time=c(0,0.5,1),id=rep("P1S1",3))
all_dat_P1S2=data.frame(state=c("A","B","A"),time=c(0,0.4,1),id=rep("P1S2",3))
all_dat_P1S3=data.frame(state=c("A","B","A"),time=c(0,0.6,1),id=rep("P1S3",3))
all_dat_P2S1=data.frame(state=c("A","C","A"),time=c(0,0.5,1),id=rep("P2S1",3))
all_dat_P2S2=data.frame(state=c("A","C","A"),time=c(0,0.4,1),id=rep("P2S2",3))
all_dat_P2S3=data.frame(state=c("A","C","A"),time=c(0,0.6,1),id=rep("P2S3",3))
all_dat_P3S1=data.frame(state=c("C","A","A"),time=c(0,0.5,1),id=rep("P3S1",3))
all_dat_P3S2=data.frame(state=c("C","A","A"),time=c(0,0.4,1),id=rep("P3S2",3))
all_dat_P3S3=data.frame(state=c("C","A","A"),time=c(0,0.6,1),id=rep("P3S3",3))

# Original dataset
all_dat_a=rbind(all_dat_P1S1,all_dat_P1S2,all_dat_P1S3,all_dat_P2S1,all_dat_P2S2,all_dat_P2S3, all_dat_P3S1,all_dat_P3S2,all_dat_P3S3 )

# Same dataset with permuted lines
all_dat_b=rbind(all_dat_P2S1,all_dat_P2S2,all_dat_P2S3, all_dat_P3S1,all_dat_P3S2,all_dat_P3S3,all_dat_P1S1,all_dat_P1S2,all_dat_P1S3 )

# Visualization of both dataset : no problem, same visualization for both (permuted or not) dataset
plotData(all_dat_a)
plotData(all_dat_b)

# Computing optimal encoding
basisobj <- create.bspline.basis(c(0, 1), nbasis = 6,norder=1)
fmca_a <- compute_optimal_encoding(all_dat_a, basis=basisobj, nCores = 1)
fmca_b <- compute_optimal_encoding(all_dat_b, basis=basisobj, nCores = 1)

# visualizing the bug
plotComponent(fmca_a)
plotComponent(fmca_b) # wrong label !
plot(fmca_a,harm=1)+theme_bw() # this encoding allows a good interpretation given the original data presented in plotData(all_dat_a) : 
# the first axis separates the "blue" before frome the blue or green after  (that is to say P3 from P1 and P2) => good interpretation for fmca_a


# Visualizing rownames of pcs and the origin of the bug
fmca_a$pc[,1:3]
fmca_b$pc[,1:3] # wrong  rownames

# Using compute_optimal_encoding2 (same as compute_optimal_enconding, but with 
# uniqueId=unique(data$id) is replaced by
# uniqueId=levels(factor(data$id)). See below

fmca_c <- compute_optimal_encoding2(all_dat_a, basis=basisobj, nCores = 1)
fmca_d <- compute_optimal_encoding2(all_dat_b, basis=basisobj, nCores = 1)

fmca_c$pc[,1:3]
fmca_d$pc[,1:3] # ok, bug corrected

plotComponent(fmca_c)
plotComponent(fmca_d) # ok, bug corrected


#====================================
# compute_optimal_encoding2
#===================================
compute_optimal_encoding2=function (data, basisobj, computeCI = TRUE, nBootstrap = 50, 
                                    propBootstrap = 1, nCores = max(1, ceiling(parallel:::detectCores()/2)), 
                                    verbose = TRUE, ...) 
{
  t1 <- proc.time()
  cfda:::checkData(data) 
  cfda:::checkDataBeginTime(data)
  cfda:::checkDataEndTmax(data)
  if (!is.basis(basisobj)) {
    stop("basisobj is not a basis object.")
  }
  if (any(is.na(nCores)) || !cfda:::is.whole.number(nCores) || (nCores < 
                                                                1)) {
    stop("nCores must be an integer > 0.")
  }
  cfda:::checkLogical(verbose, "verbose") 
  cfda:::checkLogical(computeCI, "computeCI") 
  if (computeCI) {
    if (any(is.na(nBootstrap)) || (length(nBootstrap) != 
                                   1) || ! cfda:::is.whole.number(nBootstrap) || (nBootstrap < 
                                                                                 1)) {
      stop("nBootstrap must be an integer > 0.")
    }
    if (any(is.na(propBootstrap)) || !is.numeric(propBootstrap) || 
        (length(propBootstrap) != 1) || (propBootstrap > 
                                         1) || (propBootstrap <= 0)) {
      stop("propBootstrap must be a real between 0 and 1.")
    }
  }
  pt <- estimate_pt(data)
  if (verbose) {
    cat("######### Compute encoding #########\n")
  }
  out <- cfda:::stateToInteger(data$state)
  data$state <- out$state
  label <- out$label
  rm(out)
  # /!\ v============================TO REMOVE
  #uniqueId <- unique(data$id)
  #----------------------------------------
  
  # ================================TO ADD
  uniqueId=levels(factor(data$id))
  #==========================
  nId <- length(uniqueId)
  nCores <- min(max(1, nCores), parallel:::detectCores() - 1)
  Tmax <- max(data$time)
  K <- length(label$label)
  nBasis <- basisobj$nbasis
  if (verbose) {
    cat(paste0("Number of individuals: ", nId, "\n"))
    cat(paste0("Number of states: ", K, "\n"))
    cat(paste0("Basis type: ", basisobj$type, "\n"))
    cat(paste0("Number of basis functions: ", nBasis, 
               "\n"))
    cat(paste0("Number of cores: ", nCores, "\n"))
  }
  V <- cfda:::computeVmatrix(data, basisobj, K, nCores, verbose, ...)
  Uval <- cfda:::computeUmatrix(data, basisobj, K, nCores, verbose, 
                                ...)
  fullEncoding <- cfda:::computeEncoding(Uval, V, K, nBasis, uniqueId, 
                                         label, verbose, manage0 = TRUE)
  if (computeCI) {
    signRef <- cfda:::getSignReference(fullEncoding$alpha)
    bootEncoding <- cfda:::computeBootStrapEncoding(Uval, V, K, 
                                                    nBasis, label, nId, propBootstrap, nBootstrap, signRef, 
                                                    verbose)
    if (length(bootEncoding) == 0) {
      warning("All bootstrap samples return an error. Try to change the basis.")
      out <- c(fullEncoding, list(V = V, basisobj = basisobj, 
                                  label = label, pt = pt))
    }
    else {
      varAlpha <- cfda:::computeVarianceAlpha(bootEncoding, K, 
                                              nBasis)
      out <- c(fullEncoding, list(V = V, basisobj = basisobj, 
                                  label = label, pt = pt, bootstrap = bootEncoding, 
                                  varAlpha = varAlpha))
    }
  }
  else {
    out <- c(fullEncoding, list(V = V, basisobj = basisobj, 
                                label = label, pt = pt))
  }
  class(out) <- "fmca"
  t2 <- proc.time()
  out$runTime <- as.numeric((t2 - t1)[3])
  if (verbose) {
    cat(paste0("Run Time: ", round(out$runTime, 2), 
               "s\n"))
  }
  return(out)
}

investigate the use of invisible()

The use of invisible(return(NULL)) seems not to work as intended:

cfda:::checkData(d_JK2)
NULL

It would seem that invisible(NULL) is in fact intended.

Improve computation time of U and V

Currently, we compute U and V coeff for each individual separately.
But when the number of unique time values is not large (cf. biofam2: only 16 time values), some computation are repeated among individuals.

Idea

find all unique time values
compute and store integrals between 2 consecutive time values
compute the matrix U and V by summing the needed integrals

dplyr

investigate the use of dplyr and a tibble data.frame to ease the code

improve the current implementation of stateToInteger/refactorCategorical with factors

It would seem that the internal coding of factors could be more efficiently used than the current implementation in stateToInteger and refactorCategorical.

Create a class for data managing the data format

it would not be better to have a coercion function which transforms data to a suitable data format for the CFD analysis and
which has its associated print, summary, plot methods and hence no summary_cfd method is required. It would seem that some checks should be included such as having the time sequences within individuals suitably sorted.

print
summary

Create a function to transform quantitative functional data to cfda data.frame

Input:

a fda object or a matrix containing the records of n individuals at p timestamps
a vector of threshold
associated labels

y <- c(1, 3, 4, 6, 7, 8, 7, 6)
thresholds <- c(2, 4, 6, 8)
labels <- c("A", "B", "C", "D", "E")

Output:

cfda data.frame

y converted to categorical (before the cfda format)

c("A", "B", "B", "C", "D", "D", "D", "C")

right or left open interval ?

Change the way the class of an object is checked

https://developer.r-project.org/Blog/public/2019/11/09/when-you-think-class.-think-again/#if-you-think-class.-think-again-rather-inherits.-.-or-is.

Use parallel instead of snow ?

we use snow instead of parallel to easily print information during the foreach loop. But it seems snow is being discontinued...

Investigate a way to have progress bar with parallel package

https://stackoverflow.com/questions/10903787/how-can-i-print-when-using-dopar
https://giorgilab.org/2019/10/20/progress-bars-and-parallelization-in-r/

Recommend Projects

React

A declarative, efficient, and flexible JavaScript library for building user interfaces.
Vue.js

🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
Typescript

TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
TensorFlow

An Open Source Machine Learning Framework for Everyone
Django

The Web framework for perfectionists with deadlines.
Laravel

A PHP framework for web artisans
D3

Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

javascript

JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
web

Some thing interesting about web. New door for the world.
server

A server is a program made to process requests and deliver data to clients.
Machine learning

Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Visualization

Some thing interesting about visualization, use data art
Game

Some thing interesting about game, make everyone happy.

Recommend Org

Facebook

We are working to build community through open source technology. NB: members must have two-factor auth.
Microsoft

Open source projects and samples from Microsoft.
Google

Google ❤️ Open Source for everyone.
Alibaba

Alibaba Open Source for everyone
D3

Data-Driven Documents codes.
Tencent

China tencent open source team.