evabic's Introduction

  • 👋 Hi, I’m @abichat
  • 👀 I’m interested in R, statistical methods and biological applications
  • 💼 I’m working at Servier.
  • 📫 LinkedIn

evabic's People

Contributors

abichat, bisaloo

evabic's Issues

`ebc_tidy_by_threshold()` is very slow for large vectors

library(evabic)
library(magrittr)

set.seed(1)
n <- 10000
fake_data <- data.frame(name        = paste("species", 1:n, sep = "_"),
                        value       = (1:n)/n,
                        true_status = sample(c(TRUE, FALSE), size = n, replace = TRUE))

## Call ebc_tidy_by_threshold() to compute the AUC (fast up to n = 1000)
tictoc::tic()
run_1 <- ebc_tidy_by_threshold(detection_values = with(fake_data, setNames(value, name)), 
                              true              = with(fake_data, name[true_status]), 
                              all               = with(fake_data, name),
                              measures          = c('TPR', "FPR", "FDR"), 
                              direction         = "<=")
tictoc::toc() ## 49.75 s

## Direct method that bypasses ebc_tidy()
## Prototyped only for direction = "<=" and measures = c("TPR", "FPR", "FDR"), but easy to adapt to other cases
## For convenience, I use a few dplyr functions, although they are not strictly necessary, and I do not reproduce the preprocessing
my_ebc_tidy_by_threshold <- function(detection_values, true, all) {
  N <- length(detection_values)
  N_true <- length(true)
  d <- data.frame(
    ID        = names(detection_values), 
    threshold = detection_values, 
    status    = names(detection_values) %in% true
  ) %>% 
    ## Sort the data to compute TP / FP / FN / TN iteratively
    dplyr::arrange(threshold)  %>% ## use desc(threshold) if direction is ">" or ">="
    dplyr::mutate(TP  = cumsum(status),          ## Number of TP when using the current threshold; adjust to exclude the current row if direction = "<" instead of "<="
                  FP  = seq_len(N) - TP,         ## Number of FP when using the current threshold
                  FN  = N_true - TP,             ## Number of FN when using the current threshold
                  TN  = N - TP - FP - FN,        ## Number of TN when using the current threshold
                  FDR = FP / pmax((FP + TP), 1),
                  TPR = TP / (TP + FN),          ## Recall / sensitivity
                  FPR = FP / (TN + FP)           ## 1 - specificity
    )
  ## Add code here if other measures are requested
  ## Remove rows corresponding to duplicate scores by keeping only the last one (use rev twice to keep last one instead of first one)
  rows_to_exclude <- d$threshold %>% rev() %>% duplicated() %>% rev()
  d <- d[!rows_to_exclude, ]
  d
}

tictoc::tic()
run_2 <- my_ebc_tidy_by_threshold(detection_values = with(fake_data, setNames(value, name)), 
                                  true              = with(fake_data, name[true_status]), 
                                  all               = with(fake_data, name))
tictoc::toc() # 0.043 s

This is probably because a lot of useless computation goes on when using lapply() and computing many related quantities. The prototype my_ebc_tidy_by_threshold() is a proof of concept for a faster implementation: it sorts the data once, computes the basic quantities (TP, TN, FP, FN) for each threshold with cumulative sums, and then derives the metrics from TP, TN, FP and FN. It does not handle border cases yet (for example, when computing the AUC, one should add c(0, 0) and c(1, 1) to c(TPR, FPR)), but it is much faster.
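
The border case mentioned above can be sketched in base R. This is only an illustration under stated assumptions: roc_auc_trapezoid() is a hypothetical helper, not an evabic function, and it assumes the AUC is obtained with the trapezoidal rule over the (FPR, TPR) points.

```r
# Hypothetical helper (not part of evabic): trapezoidal AUC from ROC points,
# with the border points (0, 0) and (1, 1) added before integrating.
roc_auc_trapezoid <- function(fpr, tpr) {
  x <- c(0, fpr, 1)
  y <- c(0, tpr, 1)
  # Order the points along the FPR axis (ties broken by TPR)
  o <- order(x, y)
  x <- x[o]
  y <- y[o]
  # Trapezoidal rule: width times mean height between consecutive points
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

# Toy ROC points, made up for the example
fpr <- c(0, 0.5, 0.5)
tpr <- c(0.5, 0.5, 1)
auc <- roc_auc_trapezoid(fpr, tpr)
print(auc)
```

With the columns returned by the prototype, the call would presumably look like roc_auc_trapezoid(run_2$FPR, run_2$TPR).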

ebc_tidy() fails when detected and true are unnamed logical vectors

library(evabic)
ebc_tidy(detected = c(TRUE, TRUE, FALSE), true = c(TRUE, FALSE, TRUE), all = letters[1:3])

(In addition, ebc_tidy() depends on the argument all only through m.)

A simple fix would be to rewrite n2lc() as

n2lc <- function(x, all) {
  if (is.logical(x)) {
    if (!is.null(names(x))) {
      return(names(x)[x])
    } else {
      ## Assumes all and x are in the same order; without name information, that's reasonable
      return(all[x])
    }
  } else {
    return(x)
  }
}
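
As a quick check of the proposal, the sketch below repeats the suggested n2lc() so that it runs on its own, then applies it to an unnamed logical vector, a named logical vector and a character vector. The inputs (all_ids and the three vectors) are made up for the example.

```r
# Copy of the n2lc() proposed in this issue, repeated so the example is standalone
n2lc <- function(x, all) {
  if (is.logical(x)) {
    if (!is.null(names(x))) {
      return(names(x)[x])
    } else {
      return(all[x])
    }
  } else {
    return(x)
  }
}

all_ids <- letters[1:3]  # hypothetical universe of elements

from_unnamed <- n2lc(c(TRUE, TRUE, FALSE), all_ids)              # positions mapped through all
from_named   <- n2lc(c(a = TRUE, b = FALSE, c = TRUE), all_ids)  # names take precedence
from_chr     <- n2lc(c("a", "c"), all_ids)                       # returned unchanged

print(from_unnamed)  # "a" "b"
print(from_named)    # "a" "c"
print(from_chr)      # "a" "c"
```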

AUC negative when direction = ">"

Hey,

It's not a big issue, but I got a negative AUC when calling ebc_AUC() with the parameter direction = ">".
A minimal example, based on the example from your README:
ebc_AUC(detection_values = 1-pvalues, true = predictors, m = 7, direction = ">")

Thanks for the package, it has been useful to me!
Benoit
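
One plausible cause, which the issue does not confirm, is the order of the ROC points: a trapezoidal rule applied to points sorted by decreasing FPR returns the same area with the opposite sign. The sketch below only illustrates that sign flip. trapezoid() is a hypothetical helper and the points are made up; this is not evabic's actual code.

```r
# Hypothetical helper (not evabic code): plain trapezoidal rule, no reordering
trapezoid <- function(x, y) {
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

# Toy ROC points, made up for the example
fpr <- c(0, 0.25, 0.5, 1)
tpr <- c(0, 0.5, 1, 1)

area_increasing <- trapezoid(fpr, tpr)            # points ordered by increasing FPR
area_decreasing <- trapezoid(rev(fpr), rev(tpr))  # same points, reversed order

print(area_increasing)  # 0.75
print(area_decreasing)  # -0.75
```

If this is indeed what happens, sorting the points by FPR before integrating, or taking the absolute value of the result, would give a positive AUC.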

Add support for named boolean vectors

Add support for named boolean vectors as the detected argument of all ebc_* functions. As a nice side effect, this would allow all the functions to compute m automatically.
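
A minimal sketch of the idea, assuming a named logical vector holds one entry per candidate element. unpack_detected() is a hypothetical helper, not an existing evabic function, and the element names are made up.

```r
# Hypothetical helper (not part of evabic): derive detected, all and m
# from a single named logical vector.
unpack_detected <- function(detected) {
  stopifnot(is.logical(detected), !is.null(names(detected)))
  list(
    detected = names(detected)[detected],  # names of the elements flagged TRUE
    all      = names(detected),            # every candidate element
    m        = length(detected)            # total number of candidates
  )
}

res <- unpack_detected(c(x1 = TRUE, x2 = FALSE, x3 = TRUE, x4 = FALSE))
print(res$detected)  # "x1" "x3"
print(res$m)         # 4
```

The resulting detected and m could then be passed on to the existing ebc_* functions.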
