Comments (14)

MarcinKosinski commented on August 16, 2024

@krzyslom No idea. What happens in FSelector?

from fselectorrcpp.

MarcinKosinski commented on August 16, 2024

Actually, we could expose this as an input parameter: 'discretize.dependent = FALSE', or something with a catchier name.

zzawadz commented on August 16, 2024

The MDL algorithm (our default for discretization) uses conditional entropy to find the split points. To calculate conditional entropy we need both an independent and a dependent variable. The question is: what should play the role of the dependent variable when the variable being discretized is itself the dependent variable?
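
To illustrate why a second variable is needed, here is a minimal sketch of the split-selection step only: it scores each candidate cut point of a numeric `x` by the conditional entropy of a class vector `y`. The helper names (`entropy`, `split_entropy`, `best_cut`) are made up for this example and are not part of FSelectorRcpp; the MDL stopping criterion is left out, and `x` is assumed to have at least two distinct values.

```r
# Shannon entropy (in nats) of a discrete vector.
entropy <- function(v) {
  p <- table(v) / length(v)
  p <- p[p > 0]
  -sum(p * log(p))
}

# Conditional entropy of y given the binary split x <= cut:
# the size-weighted average of the class entropy on each side.
split_entropy <- function(x, y, cut) {
  left <- x <= cut
  w <- mean(left)
  w * entropy(y[left]) + (1 - w) * entropy(y[!left])
}

# Try every midpoint between consecutive distinct values of x and
# return the cut with the lowest conditional entropy.
best_cut <- function(x, y) {
  vals <- sort(unique(x))
  cuts <- (vals[-length(vals)] + vals[-1]) / 2
  scores <- vapply(cuts, function(ct) split_entropy(x, y, ct), numeric(1))
  list(cut = cuts[which.min(scores)], entropy = min(scores))
}

x <- c(1, 2, 3, 10, 11, 12)
y <- factor(c("a", "a", "a", "b", "b", "b"))
res <- best_cut(x, y)
```

Without a `y` there is nothing for `split_entropy()` to score, which is exactly the problem when `x` is the dependent variable itself.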

MarcinKosinski commented on August 16, 2024

@zzawadz maybe the dependent variable can be split by quantiles. Maybe this is what FSelector does?

MarcinKosinski commented on August 16, 2024

@krzyslom so for the dependent variable FSelector uses FSelector:::equal.frequency.binning.discretization

> FSelector:::information.gain.body
function (formula, data, type = c("infogain", "gainratio", "symuncert"), 
    unit) 
{
    type = match.arg(type)
    new_data = get.data.frame.from.formula(formula, data)
    new_data = discretize.all(formula, new_data)
    attr_entropies = sapply(new_data, entropyHelper, unit)
    class_entropy = attr_entropies[1]
    attr_entropies = attr_entropies[-1]
    joint_entropies = sapply(new_data[-1], function(t) {
        entropyHelper(data.frame(cbind(new_data[[1]], t)), unit)
    })
    results = class_entropy + attr_entropies - joint_entropies
    if (type == "gainratio") {
        results = ifelse(attr_entropies == 0, 0, results/attr_entropies)
    }
    else if (type == "symuncert") {
        results = 2 * results/(attr_entropies + class_entropy)
    }
    attr_names = dimnames(new_data)[[2]][-1]
    return(data.frame(attr_importance = results, row.names = attr_names))
}
<environment: namespace:FSelector>
> FSelector:::discretize.all
function (formula, data) 
{
    new_data = get.data.frame.from.formula(formula, data)
    dest_column_name = dimnames(new_data)[[2]][1]
    if (!is.factor(new_data[[1]])) {
        new_data[[1]] = equal.frequency.binning.discretization(new_data[[1]], 
            5)
    }
    new_data = supervised.discretization(formula, data = new_data)
    new_data = get.data.frame.from.formula(formula, new_data)
    return(new_data)
}
<environment: namespace:FSelector>
> FSelector:::equal.frequency.binning.discretization
function (data, bins) 
{
    bins = as.integer(bins)
    if (!is.numeric(data)) 
        stop("Data must be numeric")
    if (bins < 1) 
        stop("Number of bins too small")
    complete = complete.cases(data)
    ord = order(data)
    len = length(data[complete])
    blen = len/bins
    new_data = data
    p1 = p2 = 0
    for (i in 1:bins) {
        p1 = p2 + 1
        p2 = round(i * blen)
        new_data[ord[p1:min(p2, len)]] = i
    }
    return(factor(new_data))
}
<environment: namespace:FSelector>
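
For already-discrete columns, the core of `information.gain.body` above is the identity `class_entropy + attr_entropies - joint_entropies`. A minimal standalone sketch of that identity follows; `entropy` and `info_gain` are illustrative names, not FSelector or FSelectorRcpp functions, and natural logarithms are assumed.

```r
# Shannon entropy (in nats) of a discrete vector.
entropy <- function(v) {
  p <- table(v) / length(v)
  p <- p[p > 0]
  -sum(p * log(p))
}

# Information gain of attribute x about class y:
# H(y) + H(x) - H(x, y), with the joint entropy taken over
# the observed (x, y) combinations.
info_gain <- function(x, y) {
  entropy(y) + entropy(x) - entropy(interaction(x, y, drop = TRUE))
}

x <- c("a", "a", "b", "b")
ig_same  <- info_gain(x, c(0, 0, 1, 1))  # y fully determined by x
ig_indep <- info_gain(x, c(0, 1, 0, 1))  # y independent of x
```

When `y` is fully determined by `x` the gain equals the entropy of `y`; when the two are independent it is zero.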

Could we add this as an option to FSelectorRcpp?

information_gain(equal = FALSE)
@param equal A logical; whether to discretize the dependent variable with \code{equal frequency binning discretization} or to leave the dependent variable undiscretized.

This looks very similar (if not identical) to binning by quantile()s.
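
A rough way to check how close the two approaches are is to compare them on one column. The sketch below uses a simplified re-implementation of the equal-frequency binning quoted above (assuming no missing values and at least as many observations as bins) next to a `quantile()` + `cut()` version (assuming the quantile breaks are unique). Both helper names are invented for this comparison.

```r
# Simplified equal-frequency binning: order the values, then assign
# consecutive runs of (roughly) n / bins observations to each bin.
equal_freq_bins <- function(x, bins = 5) {
  ord <- order(x)
  n <- length(x)
  out <- integer(n)
  upper <- round(seq_len(bins) * n / bins)
  lower <- c(1, upper[-bins] + 1)
  for (i in seq_len(bins)) {
    out[ord[lower[i]:upper[i]]] <- i
  }
  factor(out)
}

# Quantile-based binning: cut at the empirical quantiles.
quantile_bins <- function(x, bins = 5) {
  breaks <- quantile(x, probs = seq(0, 1, length.out = bins + 1))
  cut(x, breaks = breaks, include.lowest = TRUE, labels = FALSE)
}

x <- iris$Sepal.Length
ef <- equal_freq_bins(x)
qb <- quantile_bins(x)

# Cross-tabulate the two assignments to see where they disagree.
print(table(equal_frequency = ef, quantile = qb))
```

One difference to keep in mind: the equal-frequency version assigns by rank, so tied values can land in different bins, whereas `cut()` always puts equal values in the same bin.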

krzyslom commented on August 16, 2024

Done.

MarcinKosinski commented on August 16, 2024

Perfect job @krzyslom!
Do we have any tests that verify the new feature? If so, please provide 2-3 examples here and then close the issue (provided NEWS.md has been updated with a note about this feature).

PS. This is the usual way to report that an issue is 'DONE', so you could do these things anyway, without being asked. Let's keep improving :)!

krzyslom commented on August 16, 2024

@MarcinKosinski
Yes, the added test is in the penultimate commit. I don't understand, do you want me to provide a few examples here?

MarcinKosinski commented on August 16, 2024

@krzyslom yeah, so that we don't have to install the package and launch RStudio to verify that it is actually working :)

MarcinKosinski commented on August 16, 2024

I still don't believe this is done :P Do I really need to install the package and write my own examples to verify?

zzawadz commented on August 16, 2024

@krzyslom ping?

krzyslom commented on August 16, 2024

@zzawadz I'll look into this next weekend.

krzyslom commented on August 16, 2024

@zzawadz @MarcinKosinski

Warning and plain factorization with equal = FALSE (default)

FSelectorRcpp::information_gain(Sepal.Length ~ ., iris)
#> Warning in .information_gain.data.frame(x = x, y = y, type = type, equal
#> = equal, : Dependent variable is a numeric! It will be converted to
#> factor with simple factor(y). We do not discretize dependent variable
#> in FSelectorRcpp by default! You can choose equal frequency binning
#> discretization by setting equal argument to TRUE.
#> 
#>     attributes importance
#> 1  Sepal.Width  0.0000000
#> 2 Petal.Length  0.5237759
#> 3  Petal.Width  0.4908546
#> 4      Species  0.6078468

Same results for equal = TRUE and FSelector

FSelectorRcpp::information_gain(Sepal.Length ~ ., iris, equal = TRUE)
#>     attributes importance
#> 1  Sepal.Width  0.0000000
#> 2 Petal.Length  0.6645210
#> 3  Petal.Width  0.4350477
#> 4      Species  0.4871368

FSelector::information.gain(Sepal.Length ~ ., iris)
#>              attr_importance
#> Sepal.Width        0.0000000
#> Petal.Length       0.6645210
#> Petal.Width        0.4350477
#> Species            0.4871368

MarcinKosinski commented on August 16, 2024

Awesome. Thank you for the effort. Great job!
