juliastats / mlbase.jl

A set of functions to support the development of machine learning algorithms

License: MIT License

Julia 100.00%

mlbase.jl's Issues

Performance evaluation features

@lindahua, I would recommend adding the following features to Performance evaluation package:

High-level:

Calculation of AUC (exact, and approximated at a subset of threshold values)
Statistical comparison of the difference between two ROC curves (the pROC package for R is a good reference point)
Bootstrapped confidence intervals for the calculated statistics
More functions that calculate statistics; those included in the ROCR package for R are a good reference. The minimal required set, in my opinion, would be rate of positive predictions, lift, and cost (based on the costs of fp and fn)
Ability to calculate ROCNums not by threshold but by percentage of sample size. This is currently not available because it requires interpolation when several observations have exactly the same score and the requested percentage falls in the middle of their range. Consider a vector of scores [1, 1, 1, 1, 2, 2, 2, 2] and asking for the ROC at 10% of the sample; this problem is very typical for decision trees

Low-level:

The current interface is not very convenient for plotting (we get a vector of ROCNums). The functions operating on ROCNums (like false_negative etc.) could have methods that also work on Vector{ROCNums}
It would be useful to also store in ROCNums the threshold at which it was calculated (along with information about the ord used)

I would suggest discussing which of these match your idea of this package. I am also open to implementing some of these additions.
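
For reference on the first item, exact AUC can be computed without choosing any thresholds: it equals the fraction of positive/negative pairs that are ranked correctly, with ties counted as one half. The following is a minimal sketch under the assumption of Boolean labels; `auc_pairwise` is a hypothetical name, not part of MLBase, and the quadratic loop favors clarity over speed.

```julia
# Exact AUC via pairwise comparison (Mann-Whitney formulation).
# Hypothetical helper for illustration; not an MLBase function.
function auc_pairwise(labels::AbstractVector{Bool}, scores::AbstractVector{<:Real})
    length(labels) == length(scores) ||
        throw(DimensionMismatch("labels and scores must have equal length"))
    pos = scores[labels]      # scores of positive observations
    neg = scores[.!labels]    # scores of negative observations
    (isempty(pos) || isempty(neg)) &&
        throw(ArgumentError("need at least one positive and one negative"))
    total = 0.0
    for p in pos, n in neg
        # A correctly ordered pair counts 1, a tie counts 1/2.
        total += p > n ? 1.0 : (p == n ? 0.5 : 0.0)
    end
    return total / (length(pos) * length(neg))
end

auc_pairwise([false, false, true, true], [0.1, 0.4, 0.35, 0.8])  # 0.75
```

Because ties contribute one half, tied scores such as those produced by decision trees need no special interpolation in this formulation.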

abstract normalizer type

Please consider adding an abstract type for normalizers (standardizers), so that additional implementations (MinMax, MAD, etc.) can be properly subtyped.

abstract Standardize

type ZScore <: Standardize
    dim::Int
    mean::Vector{Float64}
    scale::Vector{Float64}
end

type MinMax{T} <: Standardize
    dim::Int    
    min::T
    max::T
end
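
The snippet above uses the type syntax of Julia 0.3/0.4. In Julia 1.x the same proposal might be sketched as follows. The `rescale` function is a hypothetical shared interface added only to show how dispatch on the abstract type would work; it is not part of MLBase. Note that `struct` is immutable, whereas the old `type` keyword corresponds to `mutable struct`.

```julia
# Julia 1.x spelling of the proposed hierarchy (illustrative sketch).
abstract type Standardize end

struct ZScore <: Standardize
    dim::Int
    mean::Vector{Float64}
    scale::Vector{Float64}
end

struct MinMax{T<:Real} <: Standardize
    dim::Int
    min::T
    max::T
end

# Hypothetical common interface: one method per concrete normalizer.
rescale(s::ZScore, x::AbstractVector{<:Real}) = (x .- s.mean) ./ s.scale
rescale(s::MinMax, x::AbstractVector{<:Real}) = (x .- s.min) ./ (s.max - s.min)

rescale(MinMax(3, 0.0, 10.0), [0.0, 5.0, 10.0])  # [0.0, 0.5, 1.0]
```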

Can a new version be tagged? It has been a while.

Failing to load MLBase in Julia v0.4

I need a working NMF package, which depends on MLBase, but MLBase fails to load in a way I do not understand. On line 217 of MLBase/src/perfeval.jl there is a type assertion with a tuple in the argument list, which does not work in Julia v0.4 but works in v0.3. Why is this so? If "preds::(PV,SV)" is changed to "preds", the error moves elsewhere, indicating that the "type tuple" is the problem. Why? How should this be fixed? One cannot simply remove them all, as other functions depend on them.

Regards Johan
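
For context, Julia 0.4 changed how tuple types are written: the 0.3 form (PV,SV) became Tuple{PV,SV}, so the annotation has to be rewritten rather than removed. A minimal sketch in current syntax follows; `describe_pair` is a hypothetical stand-in for the affected method, not the actual code in perfeval.jl.

```julia
# Julia 0.3:   f{PV,SV}(preds::(PV,SV))        (tuple type as a tuple of types)
# Julia 0.4+:  f{PV,SV}(preds::Tuple{PV,SV})   (explicit Tuple{...} type)
# Julia 0.6+ additionally moves the type parameters into a `where` clause:
function describe_pair(preds::Tuple{PV,SV}) where {PV<:AbstractVector,SV<:AbstractVector}
    labels, scores = preds
    return (length(labels), length(scores))
end

describe_pair(([1, 2], [0.3, 0.7]))  # (2, 2)
```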

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!

Port to 0.7/1.0

An update might be in order. Tests are passing for 0.7, but the package fails to precompile in 1.0.

Tag new version

Could someone go ahead and tag a new version? Travis keeps failing for me on 0.4 because this package is listed in my REQUIRE file. As far as I can tell from local testing, the issue is already fixed on master.

Example for cross_validate fails

julia> (c, v, inds) = cross_validate(
        inds -> compute_center(data[:, inds]),          # training function
        (c, inds) -> compute_rmse(c, data[:, inds]),    # evaluation function
        n,              # total number of samples
        Kfold(n, 5),    # cross validation plan: 5-fold
        Reverse)        # smaller score indicates better model
ERROR: `cross_validate` has no method matching cross_validate(::Function, ::Function, ::Int64, ::Kfold, ::ReverseOrdering{ForwardOrdering})

It looks like the five-argument version was removed at some point.

using MLBase is failing with the latest julia distribution

julia> using MLBase
Warning: Method definition takeTuple{Any,Int64} in module Base at iterator.jl:132 overwritten in module Iterators at /users/stadius/vjumutc/.julia/v0.4/Iterators/src/Iterators.jl:50.

WARNING: deprecated syntax "(T=>Int)[]" at /users/stadius/vjumutc/.julia/v0.4/MLBase/src/classification.jl:132.
Use "Dict{T,Int}()" instead.
ERROR: LoadError: LoadError: ArgumentError: invalid type for argument preds in method definition for roc at /users/stadius/vjumutc/.julia/v0.4/MLBase/src/perfeval.jl:217
 in include at ./boot.jl:250
 in include_from_node1 at ./loading.jl:129
 in include at ./boot.jl:250
 in include_from_node1 at ./loading.jl:129
 in reload_path at ./loading.jl:153
 in _require at ./loading.jl:68
 in require at ./loading.jl:51
while loading /users/stadius/vjumutc/.julia/v0.4/MLBase/src/perfeval.jl, in expression starting on line 217
while loading /users/stadius/vjumutc/.julia/v0.4/MLBase/src/MLBase.jl, in expression starting on line 76

Reason might be in JuliaLang/julia@ef06211

Control randomness of Kfold

Among other reasons, it would be useful for the sake of reproducibility if we could control the randomness in Kfold's behaviour. For instance:

Kfold(10,2) # call first time
Kfold([10, 2, 7, 5, 8, 1, 9, 6, 4, 3], 2, 5.0)

Kfold(10,2) # when called again
Kfold([2, 6, 10, 8, 5, 7, 3, 1, 4, 9], 2, 5.0)

Perhaps something along the lines of Kfold(10,2; seed = 123) would be nice?

Or perhaps there are reasons that speak against controlling the randomness?

Many thanks for this wonderful package.
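
One way such control could look is to accept a random number generator as an argument instead of a bare seed. The sketch below is a standalone illustration: `seeded_kfold` and its `rng` keyword are hypothetical and not part of MLBase. If Kfold draws its permutation from the global RNG, which is an assumption here, calling Random.seed! immediately before Kfold(10, 2) may already give repeatable folds.

```julia
using Random

# Hypothetical helper: returns k vectors of training indices for n samples.
# The same rng state always yields the same folds.
function seeded_kfold(n::Integer, k::Integer; rng::AbstractRNG = Random.default_rng())
    2 <= k <= n || throw(ArgumentError("require 2 <= k <= n"))
    perm = randperm(rng, n)
    # Fold boundaries in the permutation, e.g. [0, 5, 10] for n = 10, k = 2.
    bounds = round.(Int, range(0, n; length = k + 1))
    # Training set of fold i = everything outside perm[bounds[i]+1 : bounds[i+1]].
    return [sort(vcat(perm[1:bounds[i]], perm[bounds[i+1]+1:end])) for i in 1:k]
end

folds_a = seeded_kfold(10, 2; rng = MersenneTwister(123))
folds_b = seeded_kfold(10, 2; rng = MersenneTwister(123))
folds_a == folds_b  # true: identical seeds give identical folds
```

Passing an RNG object rather than a seed keeps the global random state untouched, which matters when several components draw random numbers.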

Error in the roc function?

Hi, I don't think the roc function is working properly. Here's an example.

gt = [1,2,1,2,1,2,1,2,1,2]
pr = [1,1,1,1,1,1,1,1,1,1]
roc(gt, pr)

This produces

p = 10
n = 0
tp = 5
tn = 0
fp = 0
fn = 0

These numbers don't look right to me. Or perhaps I'm using it incorrectly, in which case updated documentation would be useful. The current documentation says:

roc(gt::AbstractArray{T,1} where T<:Integer, pr::AbstractArray{T,1} where T<:Integer) in MLBase at /Users/youngjaewoo/.julia/v0.6/MLBase/src/perfeval.jl:162
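
One reading that reproduces the numbers above, offered as an assumption rather than a statement about MLBase's source: with integer inputs, any ground-truth label greater than 0 is treated as a positive class and 0 as the negative class, so labels 1 and 2 are both "positive". The sketch below is a hypothetical re-implementation of that counting rule (`count_roc` is not an MLBase function).

```julia
# Counting rule assumed here (not confirmed against MLBase's source):
#   gt > 0  -> positive instance; correct if pr == gt, missed if pr == 0
#   gt == 0 -> negative instance; correct if pr == 0, false alarm otherwise
function count_roc(gt::AbstractVector{<:Integer}, pr::AbstractVector{<:Integer})
    length(gt) == length(pr) ||
        throw(DimensionMismatch("gt and pr must have equal length"))
    p = n = tp = tn = fp = fn = 0
    for (g, q) in zip(gt, pr)
        if g > 0
            p += 1
            if q == g
                tp += 1
            elseif q == 0
                fn += 1
            end   # a wrong positive label is counted as neither tp nor fn
        else
            n += 1
            if q == 0
                tn += 1
            else
                fp += 1
            end
        end
    end
    return (p = p, n = n, tp = tp, tn = tn, fp = fp, fn = fn)
end

count_roc([1, 2, 1, 2, 1, 2, 1, 2, 1, 2], fill(1, 10))
# (p = 10, n = 0, tp = 5, tn = 0, fp = 0, fn = 0)
```

Under this reading, recoding the labels as 0 (negative) and 1 (positive) before the call would give the binary counts the example seems to expect.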

Add random train-test splitting

It can be implemented via the sample family of functions from StatsBase. An example implementation with an sklearn-like interface is here. If it's okay, I can make a PR; what holds me back is that I'm a newcomer and may have just missed an existing, obvious way to do it.

EDIT: supporting several arrays simultaneously would also be a nice addition. I'll work on this if it's considered useful.
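
A minimal sketch of such a split is shown below. It uses randperm from the Random standard library instead of StatsBase's sample so that it runs without extra packages; `train_test_split` and its keywords are hypothetical names. Returning index vectors rather than data is one way to support several arrays at once, since the same indices can be applied to each of them.

```julia
using Random

# Hypothetical helper: returns (train_indices, test_indices) for n samples.
function train_test_split(n::Integer; test_fraction::Real = 0.25,
                          rng::AbstractRNG = Random.default_rng())
    n >= 2 || throw(ArgumentError("need at least two samples"))
    0 < test_fraction < 1 || throw(ArgumentError("test_fraction must be in (0, 1)"))
    perm = randperm(rng, n)
    # Keep at least one sample on each side of the split.
    ntest = clamp(round(Int, test_fraction * n), 1, n - 1)
    return sort(perm[ntest+1:end]), sort(perm[1:ntest])
end

X = rand(3, 10)          # features, one column per sample
y = rand(10)             # targets
train, test = train_test_split(10; test_fraction = 0.3, rng = MersenneTwister(1))
X_train, y_train = X[:, train], y[train]
X_test, y_test = X[:, test], y[test]
```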

Issues with Julia v0.5

Hi all,
I have found that I need to comment out
using ArrayViews in MLBase.jl due to a clash with Julia's own function in v0.5.

Tomas

crossval.jl example fails

I am running Julia v0.4.5 under Mac OS X.
The crossval.jl example fails with the following error:

ERROR: LoadError: MethodError: `cross_validate` has no method matching cross_validate(::Function, ::Function, ::Int64, ::MLBase.Kfold, ::Base.Order.ReverseOrdering{Base.Order.ForwardOrdering})
Closest candidates are:
  cross_validate(::Function, ::Function, ::Int64, ::Any)
  cross_validate(::Function, ::Function, ::Integer, ::Any)
 in include at ../julia/lib/julia/sys.dylib
 in include_from_node1 at ../julia/lib/julia/sys.dylib
 in process_options at ../julia/lib/julia/sys.dylib
 in _start at ../julia/lib/julia/sys.dylib
while loading ~/.julia/v0.4/MLBase/examples/crossval.jl, in expression starting on line 32

Documentation site is broken

ReadTheDocs is completely broken

Tag a new version to avoid deprecation warnings on 0.4?

Stratified Sampling

Great work with MLBase; it has proved very helpful!

Are there any thoughts on implementing stratified sampling in the future?

Merge with StatsBase?

This package on its own is not all that discoverable, plus a lot of the methodology is also relevant to "classical" statistics, not just to machine learning (e.g. cross validation, classification, etc.). Thoughts?

cc @nalimilan

confusmat() errors if class contains 0

If the classes in gt or pred contain 0, the function confusmat fails with the following error.

gr=[0,1,0,1]
pr=[0,0,1,1]
confusmat(2, gr, pr)

ERROR: BoundsError: attempt to access 2×2 Array{Int64,2} at index [0,0]
 in confusmat(::Int64, ::Array{Int64,1}, ::Array{Int64,1}) at /Users/abisen/.julia/v0.5/MLBase/src/perfeval.jl:17

whereas if the classes do not contain 0, everything works:

gr=[1,2,1,2]
pr=[1,1,2,2]
confusmat(2, gr, pr)

2×2 Array{Int64,2}:
 1  1
 1  1
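
The BoundsError suggests that labels are used directly as 1-based matrix indices, so a label of 0 has no valid row or column. A possible workaround is to map the labels onto 1:k first. The sketch below is standalone: `encode_labels` and `confusion_counts` are hypothetical helpers, not MLBase functions (MLBase's own label-encoding utilities, where available, would serve the same purpose).

```julia
# Hypothetical helper: map arbitrary class labels onto the indices 1:k.
function encode_labels(classes::AbstractVector, xs::AbstractVector)
    index = Dict(c => i for (i, c) in enumerate(classes))
    return [index[x] for x in xs]
end

# Hypothetical stand-in for a confusion matrix: C[i, j] counts samples
# with ground truth i that were predicted as j.
function confusion_counts(k::Integer, gt::AbstractVector{<:Integer},
                          pr::AbstractVector{<:Integer})
    C = zeros(Int, k, k)
    for (g, p) in zip(gt, pr)
        C[g, p] += 1
    end
    return C
end

classes = [0, 1]
gr = encode_labels(classes, [0, 1, 0, 1])   # [1, 2, 1, 2]
pr = encode_labels(classes, [0, 0, 1, 1])   # [1, 1, 2, 2]
confusion_counts(2, gr, pr)                 # [1 1; 1 1]
```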

Move package to JuliaML

As per the discussion in JuliaML/MLUtils.jl#2, and specifically @nalimilan's comment, we would like to move this package to the JuliaML org and repurpose it for our needs, if that's OK with you.

Package compatibility caps

Ref: https://discourse.julialang.org/t/package-compatibility-caps/15301

cross_validate use case?

I'm confused about the use case for cross_validate. It doesn't seem useful to return the fold corresponding to the maximum or minimum value of evalfun, since this essentially just tells you which fold is easiest. I think it would make more sense to return an array containing the result of applying evalfun to the model returned by estfun for each split. That way the user can do things like average out-of-sample accuracy, compute variance, etc. Thoughts?
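
The proposed behaviour can be sketched as follows. `fold_scores` is a hypothetical function written for illustration; it is not MLBase's cross_validate, and it assumes the folds are given as a collection of training-index vectors.

```julia
using Statistics

# Hypothetical helper: train on each fold's training indices, evaluate on the
# remaining indices, and return one score per fold.
function fold_scores(estfun, evalfun, n::Integer, train_sets)
    scores = Float64[]
    for train_inds in train_sets
        test_inds = setdiff(1:n, train_inds)
        model = estfun(train_inds)
        push!(scores, evalfun(model, test_inds))
    end
    return scores
end

data = [1.0, 2.0, 3.0, 4.0]
train_sets = [[1, 2], [3, 4]]   # two hand-picked folds
scores = fold_scores(
    inds -> mean(data[inds]),                            # "model" = training mean
    (m, inds) -> sqrt(mean((data[inds] .- m) .^ 2)),     # RMSE on held-out data
    length(data), train_sets)

mean(scores), std(scores)   # summary statistics are left to the caller
```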

Overlap with MachineLearning.jl?

Is there any level of cooperation in MLBase with MachineLearning.jl? Perhaps a difference in scope?
