mlbase.jl's People

Contributors

aiorla, alexmorley, ararslan, asafmanela, bigcrunsh, femtocleaner[bot], iainnz, jumutc, lendle, lindahua, lsindoni, maximsch2, ngiann, simonster, thomlake, tkelman, zacsketches

mlbase.jl's Issues

Performance evaluation features

@lindahua, I would recommend adding the following features to the performance evaluation part of the package:

High-level:

  1. Calculation of AUC (exact and approximated at a subset of threshold values)
  2. Statistical comparison of the difference between 2 ROC curves (the pROC package from R is a good reference point)
  3. Bootstrapped confidence intervals for calculated statistics
  4. Adding more functions that calculate statistics; those included in the ROCR package from R are a good reference. The minimal required stats in my opinion would be rate of positive predictions, lift and cost (based on the cost of fp and fn)
  5. Ability to calculate ROCNum not by threshold but by % of sample size (this is currently not available as it requires interpolation when there are several observations with exactly the same score and the % of sample is in the middle of their range; consider a vector of scores [1, 1, 1, 1, 2, 2, 2, 2] and asking about ROC at 10% of the sample - this problem is very typical for decision trees)

Low-level:

  1. The current interface is not very simple to use for plotting (we get a vector of ROCNums). The functions operating on ROCNums (like false_negative etc.) could have methods that also work on Vector{ROCNums}
  2. It would be useful to also store in ROCNums the threshold at which it was calculated (along with information about the ord used)

I would suggest discussing which of these match your idea of this package. I am also open to implementing some of these additions.
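As a rough illustration of item 1 in the high-level list, the exact AUC can be computed from the rank-sum (Mann-Whitney) statistic, which handles tied scores by giving them their average rank. This is a minimal sketch in plain Julia, not MLBase code; the name auc_exact and its arguments are hypothetical.

```julia
# Exact AUC from the rank-sum (Mann-Whitney U) statistic.
# `scores` are classifier scores; `labels` is true for the positive class.
# Hypothetical helper for illustration; not an MLBase function.
function auc_exact(scores::AbstractVector{<:Real}, labels::AbstractVector{Bool})
    n = length(scores)
    n == length(labels) || throw(ArgumentError("scores and labels must have equal length"))
    npos = count(labels)
    nneg = n - npos
    (npos > 0 && nneg > 0) || throw(ArgumentError("both classes must be present"))

    order = sortperm(scores)
    ranks = zeros(Float64, n)
    i = 1
    while i <= n
        # Find the run of tied scores starting at sorted position i.
        j = i
        while j < n && scores[order[j + 1]] == scores[order[i]]
            j += 1
        end
        avg = (i + j) / 2              # average rank shared by the tied run
        for k in i:j
            ranks[order[k]] = avg
        end
        i = j + 1
    end

    u = sum(ranks[labels]) - npos * (npos + 1) / 2
    return u / (npos * nneg)
end

# Tied scores as in item 5 of the list, with made-up labels.
auc_exact([1, 1, 1, 1, 2, 2, 2, 2],
          [false, false, false, true, true, true, true, false])  # 0.75
```

Ties contribute one half per positive/negative pair, which is why a tree-like score vector with few distinct values still gives a well-defined result.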

abstract normalizer type

Please consider adding an abstract type for normalizers (standardizers), so that additional implementations (MinMax, MAD, etc.) can be properly subtyped.

abstract Standardize

type ZScore <: Standardize
    dim::Int
    mean::Vector{Float64}
    scale::Vector{Float64}
end

type MinMax{T} <: Standardize
    dim::Int
    min::T
    max::T
end
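The snippet above uses the pre-0.6 keywords abstract and type. In Julia 1.x syntax the same proposal could be sketched as follows; the transform methods are hypothetical additions that only show how dispatch on the abstract type might be used.

```julia
# The proposal in Julia 1.x syntax:
# `abstract type ... end` and `struct` replace `abstract` and `type`.
abstract type Standardize end

struct ZScore <: Standardize
    dim::Int
    mean::Vector{Float64}
    scale::Vector{Float64}
end

struct MinMax{T} <: Standardize
    dim::Int
    min::T
    max::T
end

# Hypothetical `transform` methods, one per normalizer subtype.
transform(z::ZScore, x::AbstractVector) = (x .- z.mean) ./ z.scale
transform(m::MinMax, x::AbstractVector) = (x .- m.min) ./ (m.max .- m.min)

transform(ZScore(2, [1.0, 2.0], [2.0, 4.0]), [3.0, 6.0])  # [1.0, 1.0]
```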

Failing to load MLBase in Julia v0.4

I need a working NMF package, which depends on MLBase, but MLBase fails to load in a way I do not understand. On line 217 of MLBase/src/perfeval.jl there is a type annotation with a tuple in the argument list, which works in Julia v0.3 but not in v0.4. Why is this so? If "preds::(PV,SV)" is changed to "preds", the error moves elsewhere, indicating that the tuple type is the problem. Why? How should this be fixed? One cannot simply remove them all, as other functions depend on them.

Regards Johan

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!

Port to 0.7/1.0

An update might be in order. Tests are passing for 0.7, but the package fails to precompile in 1.0.

Tag new version

Could someone go ahead and tag a new version? Travis keeps failing for me on 0.4 because I have this package in my REQUIRE file. As far as I can test locally, the issue seems to be already fixed in master.

Example for cross_validate fails

julia> (c, v, inds) = cross_validate(
        inds -> compute_center(data[:, inds]),          # training function
        (c, inds) -> compute_rmse(c, data[:, inds]),    # evaluation function
        n,              # total number of samples
        Kfold(n, 5),    # cross validation plan: 5-fold
        Reverse)        # smaller score indicates better model
ERROR: `cross_validate` has no method matching cross_validate(::Function, ::Function, ::Int64, ::Kfold, ::ReverseOrdering{ForwardOrdering})

It looks like the 5-argument version was removed at some point.

using MLBase is failing with the latest julia distribution

julia> using MLBase
Warning: Method definition takeTuple{Any,Int64} in module Base at iterator.jl:132 overwritten in module Iterators at /users/stadius/vjumutc/.julia/v0.4/Iterators/src/Iterators.jl:50.

WARNING: deprecated syntax "(T=>Int)[]" at /users/stadius/vjumutc/.julia/v0.4/MLBase/src/classification.jl:132.
Use "Dict{T,Int}()" instead.
ERROR: LoadError: LoadError: ArgumentError: invalid type for argument preds in method definition for roc at /users/stadius/vjumutc/.julia/v0.4/MLBase/src/perfeval.jl:217
 in include at ./boot.jl:250
 in include_from_node1 at ./loading.jl:129
 in include at ./boot.jl:250
 in include_from_node1 at ./loading.jl:129
 in reload_path at ./loading.jl:153
 in _require at ./loading.jl:68
 in require at ./loading.jl:51
while loading /users/stadius/vjumutc/.julia/v0.4/MLBase/src/perfeval.jl, in expression starting on line 217
while loading /users/stadius/vjumutc/.julia/v0.4/MLBase/src/MLBase.jl, in expression starting on line 76

Reason might be in JuliaLang/julia@ef06211
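The ArgumentError is consistent with the tuple-type syntax change in Julia v0.4, where a tuple type is written Tuple{A,B} rather than (A,B). The sketch below shows the newer annotation style; describe_preds is a hypothetical function for illustration and does not reproduce MLBase's roc method. The where clause additionally requires Julia 0.6 or later.

```julia
# Julia >= 0.4: a tuple-typed argument is annotated with Tuple{...}.
# In v0.3 the same annotation was written as `preds::(PV,SV)`.
# `describe_preds` is a hypothetical function used only for illustration.
function describe_preds(preds::Tuple{PV,SV}) where {PV<:AbstractVector,SV<:AbstractVector}
    labels, scores = preds
    return (length(labels), length(scores))
end

describe_preds(([1, 2, 1], [0.9, 0.2, 0.7]))  # (3, 3)
```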

Control randomness of Kfold

Amongst other reasons, it would be useful for the sake of reproducibility if we could control the randomness in Kfold's behaviour. For instance:

Kfold(10,2) # call first time
Kfold([10, 2, 7, 5, 8, 1, 9, 6, 4, 3], 2, 5.0)

Kfold(10,2) # when called again
Kfold([2, 6, 10, 8, 5, 7, 3, 1, 4, 9], 2, 5.0)

Perhaps something along the lines of Kfold(10,2; seed = 123) would be nice?

Or perhaps there are reasons that speak against controlling the randomness?

Many thanks for this wonderful package.
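One way such a seed could work is sketched below in plain Julia. seeded_folds is a hypothetical helper, not MLBase's Kfold; it only shows that permuting with an explicitly seeded RNG makes the partition repeatable. If Kfold draws from the global RNG (an assumption), calling Random.seed!(123) immediately before Kfold(10, 2) might have a similar effect.

```julia
using Random

# Hypothetical helper: split a seeded random permutation of 1:n into k folds.
# Not MLBase's Kfold; it only illustrates how a seed makes the partition repeatable.
function seeded_folds(n::Integer, k::Integer; seed::Integer = 123)
    rng = MersenneTwister(seed)
    perm = randperm(rng, n)
    # Fold i takes every k-th element of the permutation, starting at position i.
    return [perm[i:k:n] for i in 1:k]
end

a = seeded_folds(10, 2; seed = 123)
b = seeded_folds(10, 2; seed = 123)
a == b  # true: same seed, same folds
```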

Error in the roc function?

Hi, I don't think the roc function is working properly. Here's an example.

gt = [1,2,1,2,1,2,1,2,1,2]
pr = [1,1,1,1,1,1,1,1,1,1]
roc(gt, pr)

This produces

p = 10
n = 0
tp = 5
tn = 0
fp = 0
fn = 0

The above doesn't look right to me. Or perhaps I'm using it wrongly, in which case updated documentation would be useful. The current documentation says,

roc(gt::AbstractArray{T,1} where T<:Integer, pr::AbstractArray{T,1} where T<:Integer) in MLBase at /Users/youngjaewoo/.julia/v0.6/MLBase/src/perfeval.jl:162
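For comparison, the counts one would expect under the assumption that label 2 is the positive class and label 1 the negative class can be worked out by hand. The sketch below is a hypothetical helper in plain Julia, not MLBase's roc, and says nothing about which label convention roc actually uses.

```julia
# Hypothetical helper, not MLBase's roc: count binary outcomes for a chosen positive label.
function binary_counts(gt::AbstractVector, pr::AbstractVector; positive = 2)
    length(gt) == length(pr) || throw(ArgumentError("gt and pr must have equal length"))
    tp = tn = fp = fn = 0
    for (g, q) in zip(gt, pr)
        if g == positive
            q == positive ? (tp += 1) : (fn += 1)
        else
            q == positive ? (fp += 1) : (tn += 1)
        end
    end
    return (p = tp + fn, n = tn + fp, tp = tp, tn = tn, fp = fp, fn = fn)
end

gt = [1, 2, 1, 2, 1, 2, 1, 2, 1, 2]
pr = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
binary_counts(gt, pr)  # (p = 5, n = 5, tp = 0, tn = 5, fp = 0, fn = 5)
```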

Add random train-test splitting

It can be implemented via the sample family of functions from StatsBase. An example implementation with an sklearn-like interface is here. If it's okay, I can make a PR; what holds me back is that I'm a newcomer and may have just missed an existing and obvious way to do it.

EDIT: another nice addition would be to support several arrays simultaneously -- I'll work on this if it's considered useful.
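A minimal sketch of such a split, including the several-arrays case, is shown below. It uses randperm from the Random standard library rather than StatsBase's sample, treats columns as observations (an assumption), and relies on Random.default_rng(), which needs Julia 1.3 or later. The name train_test_split and its keyword arguments are hypothetical.

```julia
using Random

# Hypothetical helper with an sklearn-like interface; not part of MLBase.
# Splits the columns (observations) of each array using one shared random permutation,
# so the arrays stay aligned with each other.
function train_test_split(arrays::AbstractMatrix...; test_fraction::Real = 0.25,
                          rng::AbstractRNG = Random.default_rng())
    isempty(arrays) && throw(ArgumentError("at least one array is required"))
    n = size(arrays[1], 2)
    all(a -> size(a, 2) == n, arrays) ||
        throw(ArgumentError("arrays must have the same number of columns"))
    0 < test_fraction < 1 || throw(ArgumentError("test_fraction must be in (0, 1)"))

    perm = randperm(rng, n)
    ntest = round(Int, test_fraction * n)
    test_idx = perm[1:ntest]
    train_idx = perm[ntest+1:end]
    # One (train, test) pair per input array.
    return [(a[:, train_idx], a[:, test_idx]) for a in arrays]
end

X = reshape(collect(1.0:20.0), 2, 10)   # 10 observations, 2 features
y = reshape(collect(1:10), 1, 10)       # observation ids as a 1-row matrix
(Xtr, Xte), (ytr, yte) = train_test_split(X, y; test_fraction = 0.3,
                                          rng = MersenneTwister(1))
```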

Issues with Julia v0.5

Hi all,
I have found that I need to comment out using ArrayViews in MLBase.jl due to a clash with Julia's own function in v0.5.

Tomas

crossval.jl example fails

I am running julia v0.4.5 under Mac OS X.
crossval.jl example fails with the following error:

ERROR: LoadError: MethodError: `cross_validate` has no method matching cross_validate(::Function, ::Function, ::Int64, ::MLBase.Kfold, ::Base.Order.ReverseOrdering{Base.Order.ForwardOrdering})
Closest candidates are:
  cross_validate(::Function, ::Function, ::Int64, ::Any)
  cross_validate(::Function, ::Function, ::Integer, ::Any)
 in include at ../julia/lib/julia/sys.dylib
 in include_from_node1 at ../julia/lib/julia/sys.dylib
 in process_options at ../julia/lib/julia/sys.dylib
 in _start at ../julia/lib/julia/sys.dylib
while loading ~/.julia/v0.4/MLBase/examples/crossval.jl, in expression starting on line 32

Stratified Sampling

Great work with MLBase, proves very helpful!

Are there any thoughts on implementing stratified sampling in the future?
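For illustration, stratified fold assignment could look roughly like the sketch below: indices are shuffled within each class and dealt out to the folds in round-robin order, so that each class is spread as evenly as possible. stratified_folds is a hypothetical helper in plain Julia (1.3 or later), not an MLBase function.

```julia
using Random

# Hypothetical helper, not an MLBase function: assign each observation to one of k folds
# so that every class is spread as evenly as possible across the folds.
function stratified_folds(labels::AbstractVector, k::Integer;
                          rng::AbstractRNG = Random.default_rng())
    folds = [Int[] for _ in 1:k]
    next = 1                                   # fold that receives the next observation
    for c in unique(labels)
        idx = shuffle(rng, findall(==(c), labels))
        for i in idx
            push!(folds[next], i)
            next = next == k ? 1 : next + 1    # round-robin over folds
        end
    end
    return folds
end

labels = vcat(fill(:a, 6), fill(:b, 4))
folds = stratified_folds(labels, 2; rng = MersenneTwister(0))
```

With 6 observations of class :a and 4 of class :b, each of the two folds receives 3 of :a and 2 of :b.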

Merge with StatsBase?

This package on its own is not all that discoverable, plus a lot of the methodology is also relevant to "classical" statistics, not just to machine learning (e.g. cross validation, classification, etc.). Thoughts?

cc @nalimilan

confusmat() errors if class contains 0

If the classes in gt or pred contain 0, the function confusmat fails with the following error.

gr=[0,1,0,1]
pr=[0,0,1,1]
confusmat(2, gr, pr)

ERROR: BoundsError: attempt to access 2×2 Array{Int64,2} at index [0,0]
 in confusmat(::Int64, ::Array{Int64,1}, ::Array{Int64,1}) at /Users/abisen/.julia/v0.5/MLBase/src/perfeval.jl:17

Whereas if the classes do not contain 0, everything works:

gr=[1,2,1,2]
pr=[1,1,2,2]
confusmat(2, gr, pr)

2×2 Array{Int64,2}:
 1  1
 1  1
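The BoundsError suggests that confusmat indexes the matrix directly by label, so labels are expected to lie in 1:k. One workaround is to map arbitrary labels to 1:k first. The sketch below does this in plain Julia; confusion_matrix is a hypothetical helper, not MLBase's confusmat. Under the same assumption, shifting the labels (confusmat(2, gr .+ 1, pr .+ 1)) should also avoid the error.

```julia
# Hypothetical helper, not MLBase's confusmat: build a confusion matrix for arbitrary
# labels by first mapping them to the indices 1:k.
function confusion_matrix(gt::AbstractVector, pr::AbstractVector)
    length(gt) == length(pr) || throw(ArgumentError("gt and pr must have equal length"))
    classes = sort(unique(vcat(gt, pr)))
    index = Dict(c => i for (i, c) in enumerate(classes))
    k = length(classes)
    cm = zeros(Int, k, k)
    for (g, p) in zip(gt, pr)
        cm[index[g], index[p]] += 1    # rows: ground truth, columns: prediction
    end
    return classes, cm
end

gr = [0, 1, 0, 1]
pr = [0, 0, 1, 1]
classes, cm = confusion_matrix(gr, pr)  # classes == [0, 1], cm == [1 1; 1 1]
```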

cross_validate use case?

I'm confused about the use case for cross_validate. It doesn't seem useful to return the fold corresponding to the maximum or minimum value of evalfun, since this essentially just tells you which fold is easiest. I think it would make more sense to return an array containing the result of applying evalfun to the model returned by estfun for each split. That way the user can do things like average the out-of-sample accuracy, compute the variance, etc. Thoughts?
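The behaviour proposed here, one score per split, can be sketched in plain Julia as follows. cv_scores is a hypothetical helper, not MLBase's cross_validate; it borrows the estfun/evalfun calling convention from the examples earlier on this page and needs Julia 1.3 or later for Random.default_rng().

```julia
using Random, Statistics

# Hypothetical helper, not MLBase's cross_validate: return one score per fold.
# estfun(train_inds) -> model; evalfun(model, test_inds) -> score.
function cv_scores(estfun, evalfun, n::Integer, k::Integer;
                   rng::AbstractRNG = Random.default_rng())
    perm = randperm(rng, n)
    scores = Float64[]
    for i in 1:k
        test_inds = sort(perm[i:k:n])          # fold i is held out
        train_inds = setdiff(1:n, test_inds)   # the rest is used for training
        model = estfun(train_inds)
        push!(scores, evalfun(model, test_inds))
    end
    return scores
end

# Toy usage: the "model" is the mean of the training data,
# the score is the RMSE on the held-out fold.
data = collect(1.0:20.0)
scores = cv_scores(
    inds -> mean(data[inds]),
    (m, inds) -> sqrt(mean(abs2, data[inds] .- m)),
    length(data), 5; rng = MersenneTwister(42))

mean(scores), std(scores)   # summary statistics over the folds
```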
