Comments (14)
@krzyslom No idea. What happens in the FSelector?
from fselectorrcpp.
Actually, we could have this as an input parameter: 'discretize.dependent = FALSE' or something more sexy.
from fselectorrcpp.
Because the MDL algorithm (our default for discretization) uses conditional entropy to find the split points. To calculate conditional entropy we need both an independent and a dependent variable. The question is: what should play the role of the dependent variable when the variable being discretized is the dependent variable itself?
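To make the point about needing two variables concrete, here is a minimal base-R sketch of conditional entropy. The helpers `entropy` and `cond_entropy` are hypothetical names used only for illustration; they are not taken from FSelector or FSelectorRcpp, and the natural logarithm is an assumption.

```r
# Base-R sketch; `entropy` and `cond_entropy` are hypothetical helpers,
# not functions from FSelector or FSelectorRcpp.
entropy <- function(x) {
  p <- table(x) / length(x)
  p <- p[p > 0]               # drop empty levels so 0 * log(0) never appears
  -sum(p * log(p))
}

# H(Y | X) = sum over levels of X of P(X = x) * H(Y | X = x)
cond_entropy <- function(y, x) {
  groups <- split(y, x, drop = TRUE)
  weights <- lengths(groups) / length(y)
  sum(weights * vapply(groups, entropy, numeric(1)))
}

y      <- factor(c("a", "a", "b", "b"))
x_good <- factor(c("l", "l", "r", "r"))   # determines y completely
x_bad  <- factor(c("l", "r", "l", "r"))   # tells us nothing about y

cond_entropy(y, x_good)  # 0
cond_entropy(y, x_bad)   # log(2), the same as entropy(y)
```

Without a second variable there is nothing to condition on, which is why a supervised split search cannot be applied to the dependent variable directly.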
from fselectorrcpp.
@zzawadz maybe the dependent variable can be split by quantiles? Maybe this is what FSelector does?
from fselectorrcpp.
@krzyslom so for the dependent variable FSelector uses FSelector:::equal.frequency.binning.discretization:
> FSelector:::information.gain.body
function (formula, data, type = c("infogain", "gainratio", "symuncert"),
    unit)
{
    type = match.arg(type)
    new_data = get.data.frame.from.formula(formula, data)
    new_data = discretize.all(formula, new_data)
    attr_entropies = sapply(new_data, entropyHelper, unit)
    class_entropy = attr_entropies[1]
    attr_entropies = attr_entropies[-1]
    joint_entropies = sapply(new_data[-1], function(t) {
        entropyHelper(data.frame(cbind(new_data[[1]], t)), unit)
    })
    results = class_entropy + attr_entropies - joint_entropies
    if (type == "gainratio") {
        results = ifelse(attr_entropies == 0, 0, results/attr_entropies)
    }
    else if (type == "symuncert") {
        results = 2 * results/(attr_entropies + class_entropy)
    }
    attr_names = dimnames(new_data)[[2]][-1]
    return(data.frame(attr_importance = results, row.names = attr_names))
}
<environment: namespace:FSelector>
> FSelector:::discretize.all
function (formula, data)
{
    new_data = get.data.frame.from.formula(formula, data)
    dest_column_name = dimnames(new_data)[[2]][1]
    if (!is.factor(new_data[[1]])) {
        new_data[[1]] = equal.frequency.binning.discretization(new_data[[1]],
            5)
    }
    new_data = supervised.discretization(formula, data = new_data)
    new_data = get.data.frame.from.formula(formula, new_data)
    return(new_data)
}
<environment: namespace:FSelector>
> FSelector:::equal.frequency.binning.discretization
function (data, bins)
{
    bins = as.integer(bins)
    if (!is.numeric(data))
        stop("Data must be numeric")
    if (bins < 1)
        stop("Number of bins too small")
    complete = complete.cases(data)
    ord = order(data)
    len = length(data[complete])
    blen = len/bins
    new_data = data
    p1 = p2 = 0
    for (i in 1:bins) {
        p1 = p2 + 1
        p2 = round(i * blen)
        new_data[ord[p1:min(p2, len)]] = i
    }
    return(factor(new_data))
}
<environment: namespace:FSelector>
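For readers skimming the dump above: the three scores in information.gain.body are all built from the class entropy, the attribute entropy and their joint entropy. A small base-R sketch of that arithmetic for two already-discrete vectors follows. The names `entropy` and `scores` are hypothetical, and the natural logarithm is an assumption (the real function takes a `unit` argument).

```r
# Base-R sketch of the three scores; helper names are hypothetical and
# this is not the FSelector implementation.
entropy <- function(x) {
  p <- table(x) / length(x)
  p <- p[p > 0]               # ignore empty cells
  -sum(p * log(p))
}

scores <- function(y, x) {
  h_class <- entropy(y)
  h_attr  <- entropy(x)
  # joint entropy H(X, Y), computed on the observed (x, y) combinations
  h_joint <- entropy(interaction(x, y, drop = TRUE))
  gain <- h_class + h_attr - h_joint
  c(infogain  = gain,
    gainratio = if (h_attr == 0) 0 else gain / h_attr,
    symuncert = 2 * gain / (h_attr + h_class))
}

y <- factor(c("a", "a", "b", "b"))
scores(y, factor(c("l", "l", "r", "r")))  # attribute determines the class
scores(y, factor(c("l", "r", "l", "r")))  # attribute is uninformative
```

In the first call the gain equals the class entropy, so both ratios come out as 1; in the second call all three scores are 0.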
Could we add this as an option to FSelectorRcpp?
information_gain(equal = FALSE)
@param equal A logical; whether to discretize the dependent variable with \code{equal frequency binning discretization} or to leave the dependent variable undiscretized.
This looks very similar (if not the same) to binning by quantile().
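As a rough way to check the quantile() comparison, here is a base-R sketch that bins a numeric vector at its empirical quantiles. `quantile_bins` is a hypothetical helper, not part of either package. One difference worth keeping in mind: the FSelector function printed above assigns bins by sorted position, so equal values can end up in different bins, while cutting at quantile breaks always keeps equal values together.

```r
# Hypothetical helper: bin a numeric vector at its empirical quantiles.
quantile_bins <- function(x, bins = 5) {
  breaks <- quantile(x, probs = seq(0, 1, length.out = bins + 1), na.rm = TRUE)
  # unique() guards against duplicated breaks when the data has many ties
  cut(x, breaks = unique(breaks), include.lowest = TRUE, labels = FALSE)
}

y <- iris$Sepal.Length
bins <- quantile_bins(y, 5)

# Bin sizes are only approximately equal because of ties in the data.
table(bins)

# Every distinct value of y falls into exactly one bin.
tapply(bins, y, function(b) length(unique(b)))
```

So the two approaches should agree on data without ties and may differ slightly around bin boundaries otherwise.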
from fselectorrcpp.
Done.
from fselectorrcpp.
Perfect job @krzyslom !
Do we have any tests that verify the new feature? If so, please provide 2-3 examples here and then close the issue (if NEWS.md is updated with the extra content about this feature).
PS. This is the regular way to state that an issue is 'DONE', so you could do these things anyway, without being asked. Let's keep improving :)!
from fselectorrcpp.
@MarcinKosinski
Yes, the added test is in the penultimate commit. I don't understand, do you want me to provide a few examples here?
from fselectorrcpp.
@krzyslom yeah, so that we don't have to install the package and launch RStudio to verify that it is actually working :)
from fselectorrcpp.
I still don't believe this is done :P Do I really need to install the package and write my own examples to verify?
from fselectorrcpp.
@krzyslom ping?
from fselectorrcpp.
@zzawadz I'll look into this next weekend.
from fselectorrcpp.
Warning and plain factorization with equal = FALSE (default):
FSelectorRcpp::information_gain(Sepal.Length ~ ., iris)
#> Warning in .information_gain.data.frame(x = x, y = y, type = type, equal
#> = equal, : Dependent variable is a numeric! It will be converted to
#> factor with simple factor(y). We do not discretize dependent variable
#> in FSelectorRcpp by default! You can choose equal frequency binning
#> discretization by setting equal argument to TRUE.
#>
#>     attributes importance
#> 1  Sepal.Width  0.0000000
#> 2 Petal.Length  0.5237759
#> 3  Petal.Width  0.4908546
#> 4      Species  0.6078468
Same results for equal = TRUE and FSelector:
FSelectorRcpp::information_gain(Sepal.Length ~ ., iris, equal = TRUE)
#>     attributes importance
#> 1  Sepal.Width  0.0000000
#> 2 Petal.Length  0.6645210
#> 3  Petal.Width  0.4350477
#> 4      Species  0.4871368
FSelector::information.gain(Sepal.Length ~ ., iris)
#>              attr_importance
#> Sepal.Width        0.0000000
#> Petal.Length       0.6645210
#> Petal.Width        0.4350477
#> Species            0.4871368
from fselectorrcpp.
Awesome. Thank you for the effort. Great job.
from fselectorrcpp.