Giter VIP home page Giter VIP logo

cfda's People

Contributors

quentin62 avatar

Stargazers

 avatar  avatar  avatar

Watchers

 avatar  avatar

Forkers

f-bsc fbassac

cfda's Issues

wrong id labels in pc names when ids are not in alphabetical order

mail from Caroline Peltier

In compute_optimal_encoding : if the ids are not in the alphabetic order, labels of principal components are permuted (and the results are wrong)

# CFDA
#====================

# We simulated a simple dataset to understand how cfda works 
# We obtained a bug when the ids in the dataset are not entered in the alphabetical order (due to the use of "unique" function ?)
# The following code illustrates the bug using the same dataset with permuted lines
# A proposition of correction is at the end of the code
# The same kind of bug could occur in bootstrap functions... I haven't tested them.

# Generating all lines of the dataset : one line per evaluation:  P1, P2 and P3 represent 3 given products that are evaluated along time by 3 subjects (S1, S2 and S3)
all_dat_P1S1=data.frame(state=c("A","B","A"),time=c(0,0.5,1),id=rep("P1S1",3))
all_dat_P1S2=data.frame(state=c("A","B","A"),time=c(0,0.4,1),id=rep("P1S2",3))
all_dat_P1S3=data.frame(state=c("A","B","A"),time=c(0,0.6,1),id=rep("P1S3",3))
all_dat_P2S1=data.frame(state=c("A","C","A"),time=c(0,0.5,1),id=rep("P2S1",3))
all_dat_P2S2=data.frame(state=c("A","C","A"),time=c(0,0.4,1),id=rep("P2S2",3))
all_dat_P2S3=data.frame(state=c("A","C","A"),time=c(0,0.6,1),id=rep("P2S3",3))
all_dat_P3S1=data.frame(state=c("C","A","A"),time=c(0,0.5,1),id=rep("P3S1",3))
all_dat_P3S2=data.frame(state=c("C","A","A"),time=c(0,0.4,1),id=rep("P3S2",3))
all_dat_P3S3=data.frame(state=c("C","A","A"),time=c(0,0.6,1),id=rep("P3S3",3))

# Original dataset
all_dat_a=rbind(all_dat_P1S1,all_dat_P1S2,all_dat_P1S3,all_dat_P2S1,all_dat_P2S2,all_dat_P2S3, all_dat_P3S1,all_dat_P3S2,all_dat_P3S3 )

# Same dataset with permuted lines
all_dat_b=rbind(all_dat_P2S1,all_dat_P2S2,all_dat_P2S3, all_dat_P3S1,all_dat_P3S2,all_dat_P3S3,all_dat_P1S1,all_dat_P1S2,all_dat_P1S3 )

# Visualization of both dataset : no problem, same visualization for both (permuted or not) dataset
plotData(all_dat_a)
plotData(all_dat_b)

# Computing optimal encoding
basisobj <- create.bspline.basis(c(0, 1), nbasis = 6,norder=1)
fmca_a <- compute_optimal_encoding(all_dat_a, basis=basisobj, nCores = 1)
fmca_b <- compute_optimal_encoding(all_dat_b, basis=basisobj, nCores = 1)

# visualizing the bug
plotComponent(fmca_a)
plotComponent(fmca_b) # wrong label !
plot(fmca_a,harm=1)+theme_bw() # this encoding allows a good interpretation given the original data presented in plotData(all_dat_a) : 
# the first axis separates the "blue" before frome the blue or green after  (that is to say P3 from P1 and P2) => good interpretation for fmca_a


# Visualizing rownames of pcs and the origin of the bug
fmca_a$pc[,1:3]
fmca_b$pc[,1:3] # wrong  rownames

# Using compute_optimal_encoding2 (same as compute_optimal_enconding, but with 
# uniqueId=unique(data$id) is replaced by
# uniqueId=levels(factor(data$id)). See below

fmca_c <- compute_optimal_encoding2(all_dat_a, basis=basisobj, nCores = 1)
fmca_d <- compute_optimal_encoding2(all_dat_b, basis=basisobj, nCores = 1)

fmca_c$pc[,1:3]
fmca_d$pc[,1:3] # ok, bug corrected

plotComponent(fmca_c)
plotComponent(fmca_d) # ok, bug corrected


#====================================
# compute_optimal_encoding2
#===================================
compute_optimal_encoding2=function (data, basisobj, computeCI = TRUE, nBootstrap = 50, 
                                    propBootstrap = 1, nCores = max(1, ceiling(parallel:::detectCores()/2)), 
                                    verbose = TRUE, ...) 
{
  t1 <- proc.time()
  cfda:::checkData(data) 
  cfda:::checkDataBeginTime(data)
  cfda:::checkDataEndTmax(data)
  if (!is.basis(basisobj)) {
    stop("basisobj is not a basis object.")
  }
  if (any(is.na(nCores)) || !cfda:::is.whole.number(nCores) || (nCores < 
                                                                1)) {
    stop("nCores must be an integer > 0.")
  }
  cfda:::checkLogical(verbose, "verbose") 
  cfda:::checkLogical(computeCI, "computeCI") 
  if (computeCI) {
    if (any(is.na(nBootstrap)) || (length(nBootstrap) != 
                                   1) || ! cfda:::is.whole.number(nBootstrap) || (nBootstrap < 
                                                                                 1)) {
      stop("nBootstrap must be an integer > 0.")
    }
    if (any(is.na(propBootstrap)) || !is.numeric(propBootstrap) || 
        (length(propBootstrap) != 1) || (propBootstrap > 
                                         1) || (propBootstrap <= 0)) {
      stop("propBootstrap must be a real between 0 and 1.")
    }
  }
  pt <- estimate_pt(data)
  if (verbose) {
    cat("######### Compute encoding #########\n")
  }
  out <- cfda:::stateToInteger(data$state)
  data$state <- out$state
  label <- out$label
  rm(out)
  # /!\ v============================TO REMOVE
  #uniqueId <- unique(data$id)
  #----------------------------------------
  
  # ================================TO ADD
  uniqueId=levels(factor(data$id))
  #==========================
  nId <- length(uniqueId)
  nCores <- min(max(1, nCores), parallel:::detectCores() - 1)
  Tmax <- max(data$time)
  K <- length(label$label)
  nBasis <- basisobj$nbasis
  if (verbose) {
    cat(paste0("Number of individuals: ", nId, "\n"))
    cat(paste0("Number of states: ", K, "\n"))
    cat(paste0("Basis type: ", basisobj$type, "\n"))
    cat(paste0("Number of basis functions: ", nBasis, 
               "\n"))
    cat(paste0("Number of cores: ", nCores, "\n"))
  }
  V <- cfda:::computeVmatrix(data, basisobj, K, nCores, verbose, ...)
  Uval <- cfda:::computeUmatrix(data, basisobj, K, nCores, verbose, 
                                ...)
  fullEncoding <- cfda:::computeEncoding(Uval, V, K, nBasis, uniqueId, 
                                         label, verbose, manage0 = TRUE)
  if (computeCI) {
    signRef <- cfda:::getSignReference(fullEncoding$alpha)
    bootEncoding <- cfda:::computeBootStrapEncoding(Uval, V, K, 
                                                    nBasis, label, nId, propBootstrap, nBootstrap, signRef, 
                                                    verbose)
    if (length(bootEncoding) == 0) {
      warning("All bootstrap samples return an error. Try to change the basis.")
      out <- c(fullEncoding, list(V = V, basisobj = basisobj, 
                                  label = label, pt = pt))
    }
    else {
      varAlpha <- cfda:::computeVarianceAlpha(bootEncoding, K, 
                                              nBasis)
      out <- c(fullEncoding, list(V = V, basisobj = basisobj, 
                                  label = label, pt = pt, bootstrap = bootEncoding, 
                                  varAlpha = varAlpha))
    }
  }
  else {
    out <- c(fullEncoding, list(V = V, basisobj = basisobj, 
                                label = label, pt = pt))
  }
  class(out) <- "fmca"
  t2 <- proc.time()
  out$runTime <- as.numeric((t2 - t1)[3])
  if (verbose) {
    cat(paste0("Run Time: ", round(out$runTime, 2), 
               "s\n"))
  }
  return(out)
}

investigate the use of invisible()

  • The use of invisible(return(NULL)) seems not to work as intended:
cfda:::checkData(d_JK2)
NULL

It would seem that invisible(NULL) is in fact intended.

Improve computation time of U and V

Currently, we compute U and V coeff for each individual separately.
But when the number of unique time values is not large (cf. biofam2: only 16 time values), some computation are repeated among individuals.

Idea

  • find all unique time values
  • compute and store integrals between 2 consecutive time values
  • compute the matrix U and V by summing the needed integrals

dplyr

investigate the use of dplyr and a tibble data.frame to ease the code

Create a class for data managing the data format

it would not be better to have a coercion function which transforms data to a suitable data format for the CFD analysis and
which has its associated print, summary, plot methods and hence no summary_cfd method is required. It would seem that some checks should be included such as having the time sequences within individuals suitably sorted.

Create a function to transform quantitative functional data to cfda data.frame

Input:

  • a fda object or a matrix containing the records of n individuals at p timestamps
  • a vector of threshold
  • associated labels
y <- c(1, 3, 4, 6, 7, 8, 7, 6)
thresholds <- c(2, 4, 6, 8)
labels <- c("A", "B", "C", "D", "E")

Output:

  • cfda data.frame

y converted to categorical (before the cfda format)

c("A", "B", "B", "C", "D", "D", "D", "C")  

right or left open interval ?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.