MarcinKosinski commented on August 16, 2024

@krzyslom I think FSelectorRcpp completely removes rows with NAs. Can you provide a summary of FSelector's behaviour in this case?

Following FSelector:::information.gain.body -> FSelector:::discretize.all -> FSelector:::supervised.discretization, I see:

function (formula, data) 
{
    data = get.data.frame.from.formula(formula, data)
    complete = complete.cases(data[[1]])
    all.complete = all(complete)
    if (!all.complete) {
        new_data = data[complete, , drop = FALSE]
        result = Discretize(formula, data = new_data, na.action = na.pass)
        return(result)
    }
    else {
        return(Discretize(formula, data = data, na.action = na.pass))
    }
}
<environment: namespace:FSelector>

This shows that FSelector removes only the rows where the dependent variable is NA.
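
To make the difference concrete, here is a minimal base R sketch. It does not call FSelector or RWeka; the toy data frame `d` and its column names are made up for illustration, and it assumes the response sits in the first column, as `data[[1]]` in the function above suggests. It compares the response-only filter with dropping every row that contains any NA.

```r
# Hypothetical toy data: response y first, two explanatory variables after it.
d <- data.frame(
  y  = factor(c("a", "b", NA, "a", "b")),
  x1 = c(1.0, NA, 3.0, 4.0, 5.0),
  x2 = c(NA, 2.0, 3.0, 4.0, 5.0)
)

# Same filter as in supervised.discretization: only the first column is checked.
keep_response_only <- complete.cases(d[[1]])
d_response <- d[keep_response_only, , drop = FALSE]

# Row-wise filter over all columns: any NA anywhere drops the row.
keep_all <- complete.cases(d)
d_all <- d[keep_all, , drop = FALSE]

nrow(d_response)                    # 4: only row 3 (NA in y) is dropped
nrow(d_all)                         # 2: rows 1, 2 and 3 are dropped
anyNA(d_response[, c("x1", "x2")])  # TRUE: NAs in x1 and x2 are still there
```

With the response-only filter, the NAs in the explanatory variables survive and are passed on to the discretization step.
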
So the only thing left is to check how FSelector (through its interface to RWeka::Discretize) deals with NAs in the explanatory variables:

> RWeka::Discretize
An R interface to Weka class 'weka.filters.supervised.attribute.Discretize', which has
information

  An instance filter that discretizes a range of numeric attributes in the dataset
  into nominal attributes. Discretization is by Fayyad & Irani's MDL method (the
  default).

  For more information, see:

  Usama M. Fayyad, Keki B. Irani: Multi-interval discretization of continuousvalued
  attributes for classification learning. In: Thirteenth International Joint
  Conference on Articial Intelligence, 1022-1027, 1993.

  Igor Kononenko: On Biases in Estimating Multi-Valued Attributes. In: 14th
  International Joint Conference on Articial Intelligence, 1034-1040, 1995.

  BibTeX:

  @INPROCEEDINGS{Fayyad1993,
    publisher = {Morgan Kaufmann Publishers},
    year = {1993},
    pages = {1022-1027},
    author = {Usama M. Fayyad and Keki B. Irani},
    title = {Multi-interval discretization of continuousvalued attributes for
      classification learning},
    volume = {2},
    booktitle = {Thirteenth International Joint Conference on Articial Intelligence},
  }

  @INPROCEEDINGS{Kononenko1995,
    year = {1995},
    pages = {1034-1040},
    PS = {http://ai.fri.uni-lj.si/papers/kononenko95-ijcai.ps.gz},
    author = {Igor Kononenko},
    title = {On Biases in Estimating Multi-Valued Attributes},
    booktitle = {14th International Joint Conference on Articial Intelligence},
  }

Argument list:
  x(formula, data, subset, na.action, control = NULL)

Returns objects inheriting from classes:
  Discretize data.frame
