juliastats / MLBase.jl
A set of functions to support the development of machine learning algorithms
License: MIT License
@lindahua, I would recommend adding the following features to the performance evaluation package:
High-level:
- ROC curve support (the pROC package from R is a good reference point).
- Additional performance statistics (the ROCR package from R is a good reference; the minimal required stats in my opinion would be rate of positive predictions, lift, and cost (based on the cost of FP and FN)).
- Computing ROCNums not by threshold but by % of sample size. This is currently not available, as it requires interpolation when there are several observations with exactly the same score and the % of sample falls in the middle of their range; consider a vector of scores [1, 1, 1, 1, 2, 2, 2, 2] and asking for ROC at 10% of the sample. This problem is very typical for decision trees.
Low-level:
- The functions operating on ROCNums (like false_negative etc.) could have methods that also work on Vector{ROCNums}.
- ROCNums could also store the threshold at which it was calculated (along with information about the ord used).
I would recommend discussing what matches your idea of this package. I am also open to implementing some of these additions.
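To make the "% of sample" bullet concrete, here is a minimal sketch of taking an operating point at a fraction of the sample, giving fractional credit when the cut falls between observations; a full solution would average over a whole tie group. The name tp_at_fraction is hypothetical, not MLBase API.

```julia
# Sketch: true-positive count at a fraction of the sample rather than at a
# score threshold, with fractional credit for the partially-included
# observation. A full solution would average over an entire tie group.
function tp_at_fraction(scores, labels, frac)
    n = length(scores)
    cut = frac * n                          # may fall between observations
    order = sortperm(scores; rev = true)    # highest scores predicted positive
    full = floor(Int, cut)
    tp = float(sum(labels[order[1:full]]))  # whole observations above the cut
    if full < n
        tp += (cut - full) * labels[order[full + 1]]   # fractional remainder
    end
    return tp
end

scores = [1, 1, 1, 1, 2, 2, 2, 2]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
tp_at_fraction(scores, labels, 0.10)
```

For the vector of tied scores above, 10% of the sample is 0.8 observations, so the answer is a fractional count rather than an integer.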
Please, consider an abstract type for normalizer (standardizer), so additional implementations ( MinMax, MAD, etc.) would be properly subtyped.
abstract type Standardize end

struct ZScore <: Standardize
    dim::Int
    mean::Vector{Float64}
    scale::Vector{Float64}
end

struct MinMax{T} <: Standardize
    dim::Int
    min::T
    max::T
end
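A minimal sketch of how a MinMax subtype could be fit and applied under the proposed abstract type. The names fit_minmax and transform are hypothetical, and the dim field is dropped here by fixing observations to columns.

```julia
# Sketch of fitting and applying a MinMax standardizer (hypothetical API,
# observations in columns, features in rows).
abstract type Standardize end

struct MinMax <: Standardize
    min::Vector{Float64}
    max::Vector{Float64}
end

# Fit: per-feature minima and maxima.
fit_minmax(X::Matrix{Float64}) =
    MinMax(vec(minimum(X; dims = 2)), vec(maximum(X; dims = 2)))

# Apply: rescale every feature to [0, 1].
transform(s::MinMax, X::Matrix{Float64}) = (X .- s.min) ./ (s.max .- s.min)

X = [0.0 5.0 10.0;
     1.0 2.0  3.0]
s = fit_minmax(X)
Y = transform(s, X)   # every row now spans [0, 1]
```

With an abstract supertype in place, ZScore, MinMax, MAD, etc. can all share a common transform interface.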
I need a working NMF package, which depends on MLBase, and MLBase fails on loading in a manner I do not understand. On line 217 of MLBase/src/perfeval.jl there is a type assertion with a tuple in the argument list that does not work in Julia v0.4 but works in v0.3. Why is this so? If "preds::(PV,SV)" is changed to "preds", the error moves elsewhere, indicating it is the tuple type that is the problem. Why? How should this be fixed? One cannot remove them all, as other functions depend on them.
Regards, Johan
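Julia 0.4 replaced the old (A, B) tuple-type syntax with Tuple{A, B}, which is exactly the kind of signature change perfeval.jl needs. A minimal sketch (the function name and body here are illustrative, not MLBase's code):

```julia
# Old, Julia 0.3 only:
#   roc_counts{PV,SV}(preds::(PV, SV)) = ...
#
# Julia 0.4 onward (shown in the modern `where` syntax):
roc_counts(preds::Tuple{PV, SV}) where {PV, SV} =
    (length(preds[1]), length(preds[2]))   # illustrative body

roc_counts(([1, 2, 3], [0.1, 0.9]))   # (3, 2)
```

The same mechanical rewrite of each tuple-typed argument should let the package load on 0.4 without removing the annotations other functions depend on.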
This issue is used to trigger TagBot; feel free to unsubscribe.
If you haven't already, you should update your TagBot.yml
to include issue comment triggers.
Please see this post on Discourse for instructions and more details.
If you'd like for me to do this for you, comment TagBot fix
on this issue.
I'll open a PR within a few hours; please be patient!
An update might be in order. Tests are passing for 0.7, but the package fails to precompile in 1.0.
Could someone go ahead and tag a new version? Travis keeps failing for me on 0.4 because I have this package in REQUIRE. As far as I can tell locally, the issue is already fixed on master.
julia> (c, v, inds) = cross_validate(
inds -> compute_center(data[:, inds]), # training function
(c, inds) -> compute_rmse(c, data[:, inds]), # evaluation function
n, # total number of samples
Kfold(n, 5), # cross validation plan: 5-fold
Reverse) # smaller score indicates better model
ERROR: `cross_validate` has no method matching cross_validate(::Function, ::Function, ::Int64, ::Kfold, ::ReverseOrdering{ForwardOrdering})
Looks like the 5-argument version was removed at some point.
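The surviving 4-argument form returns the vector of per-fold scores and leaves model selection to the caller. A self-contained sketch of that pattern (cross_validate_sketch and the toy data are illustrative, not MLBase code):

```julia
# Sketch of the 4-argument cross-validation pattern: train on each
# complement, score on each fold, collect the per-fold scores.
function cross_validate_sketch(estfun, evalfun, n, folds)
    scores = Float64[]
    for test_inds in folds
        train_inds = setdiff(1:n, test_inds)
        model = estfun(train_inds)              # "train" on the complement
        push!(scores, evalfun(model, test_inds))
    end
    return scores
end

data  = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
folds = [[1, 2], [3, 4], [5, 6]]                # fixed 3-fold plan
est(inds)      = sum(data[inds]) / length(inds)            # model = train mean
evalf(m, inds) = abs(m - sum(data[inds]) / length(inds))   # |train mean - test mean|
scores = cross_validate_sketch(est, evalf, 6, folds)
```

Picking the best model by the Reverse ordering then becomes a one-liner over the returned scores, e.g. argmin(scores).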
julia> using MLBase
Warning: Method definition takeTuple{Any,Int64} in module Base at iterator.jl:132 overwritten in module Iterators at /users/stadius/vjumutc/.julia/v0.4/Iterators/src/Iterators.jl:50.
WARNING: deprecated syntax "(T=>Int)[]" at /users/stadius/vjumutc/.julia/v0.4/MLBase/src/classification.jl:132.
Use "Dict{T,Int}()" instead.
ERROR: LoadError: LoadError: ArgumentError: invalid type for argument preds in method definition for roc at /users/stadius/vjumutc/.julia/v0.4/MLBase/src/perfeval.jl:217
in include at ./boot.jl:250
in include_from_node1 at ./loading.jl:129
in include at ./boot.jl:250
in include_from_node1 at ./loading.jl:129
in reload_path at ./loading.jl:153
in _require at ./loading.jl:68
in require at ./loading.jl:51
while loading /users/stadius/vjumutc/.julia/v0.4/MLBase/src/perfeval.jl, in expression starting on line 217
while loading /users/stadius/vjumutc/.julia/v0.4/MLBase/src/MLBase.jl, in expression starting on line 76
Reason might be in JuliaLang/julia@ef06211
Among other reasons, it would be useful for the sake of reproducibility if we could control the randomness in Kfold's behaviour. For instance:
Kfold(10,2) # call first time
Kfold([10, 2, 7, 5, 8, 1, 9, 6, 4, 3], 2, 5.0)
Kfold(10,2) # when called again
Kfold([2, 6, 10, 8, 5, 7, 3, 1, 4, 9], 2, 5.0)
Perhaps something along the lines of Kfold(10, 2; seed = 123) would be nice?
Or perhaps there are reasons that speak against controlling the randomness?
Many thanks for this wonderful package.
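One way to get the reproducibility requested above is to accept an explicit RNG rather than a seed keyword. SeededKfold below is a hypothetical sketch, not MLBase's Kfold:

```julia
using Random

# Sketch: a Kfold-like plan that takes an explicit RNG, so the shuffled
# permutation is reproducible across runs.
struct SeededKfold
    perm::Vector{Int}
    k::Int
end
SeededKfold(rng::AbstractRNG, n::Int, k::Int) = SeededKfold(randperm(rng, n), k)

k1 = SeededKfold(MersenneTwister(123), 10, 2)
k2 = SeededKfold(MersenneTwister(123), 10, 2)
k1.perm == k2.perm   # same seed, same split
```

Passing an AbstractRNG is the convention most of the Julia ecosystem has settled on; a seed keyword could be sugar on top of it.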
Hi, I don't think the roc function is working properly. Here's an example.
gt = [1,2,1,2,1,2,1,2,1,2]
pr = [1,1,1,1,1,1,1,1,1,1]
roc(gt, pr)
This produces
p = 10
n = 0
tp = 5
tn = 0
fp = 0
fn = 0
The above doesn't look right to me. Or perhaps I'm using it wrongly, in which case updated documentation would be useful. The current documentation says:
roc(gt::AbstractArray{T,1} where T<:Integer, pr::AbstractArray{T,1} where T<:Integer) in MLBase at /Users/youngjaewoo/.julia/v0.6/MLBase/src/perfeval.jl:162
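Computing the binary confusion counts by hand for the example above, treating label 1 as positive and label 2 as negative (an assumption on my part; MLBase's integer-label roc may use a different convention):

```julia
# Hand-computed confusion counts for the example, assuming class 1 is
# "positive" and class 2 is "negative".
gt = [1, 2, 1, 2, 1, 2, 1, 2, 1, 2]
pr = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
pos = 1
p  = count(==(pos), gt)                                        # 5 positives
n  = count(!=(pos), gt)                                        # 5 negatives
tp = count(i -> gt[i] == pos && pr[i] == pos, eachindex(gt))   # 5
fp = count(i -> gt[i] != pos && pr[i] == pos, eachindex(gt))   # 5
(p, n, tp, fp)
```

Under that convention one would expect p = 5, n = 5, tp = 5, fp = 5, not the p = 10, n = 0 reported above.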
It can be implemented via the sample family of functions from StatsBase. An example implementation with an sklearn-like interface is here. If it's okay I can make a PR; what holds me back is that I'm a newcomer and may have missed an already existing and obvious way to do it.
EDIT: also a nice addition would be to support several arrays simultaneously; I'll work on this if it's accepted as useful.
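For concreteness, here is a hypothetical sklearn-like split sketched with Base only; in the package it could be built on StatsBase's sample instead of randperm. The name train_test_split and the `at` keyword are assumptions, not an existing API:

```julia
using Random

# Sketch: shuffle the indices, then cut at the requested training fraction.
function train_test_split(x::AbstractVector; at::Float64 = 0.75,
                          rng::AbstractRNG = Random.default_rng())
    perm = randperm(rng, length(x))
    ntrain = round(Int, at * length(x))
    return x[perm[1:ntrain]], x[perm[ntrain+1:end]]
end

tr, te = train_test_split(collect(1:10); at = 0.7)
length(tr), length(te)   # (7, 3)
```

Supporting several arrays simultaneously would amount to generating one permutation and indexing every array with it.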
Hi all,
I have found that I need to comment out using ArrayViews in MLBase.jl due to a clash with Julia's own function in v0.5.
Tomas
I am running Julia v0.4.5 under Mac OS X. The crossval.jl example fails with the following error:
ERROR: LoadError: MethodError: `cross_validate` has no method matching cross_validate(::Function, ::Function, ::Int64, ::MLBase.Kfold, ::Base.Order.ReverseOrdering{Base.Order.ForwardOrdering})
Closest candidates are:
cross_validate(::Function, ::Function, ::Int64, ::Any)
cross_validate(::Function, ::Function, ::Integer, ::Any)
in include at ../julia/lib/julia/sys.dylib
in include_from_node1 at ../julia/lib/julia/sys.dylib
in process_options at ../julia/lib/julia/sys.dylib
in _start at ../julia/lib/julia/sys.dylib
while loading ~/.julia/v0.4/MLBase/examples/crossval.jl, in expression starting on line 32
ReadTheDocs is completely broken
Great work with MLBase, proves very helpful!
Are there any thoughts on implementing stratified sampling in the future?
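A minimal sketch of what stratified fold assignment could look like: each class's indices are dealt round-robin across the k folds, preserving class proportions. stratified_folds is a hypothetical name, not an MLBase API:

```julia
# Sketch: stratified k-fold assignment by round-robin within each class.
function stratified_folds(labels::AbstractVector, k::Int)
    folds = [Int[] for _ in 1:k]
    for class in unique(labels)
        for (j, i) in enumerate(findall(==(class), labels))
            push!(folds[mod1(j, k)], i)    # deal this class across folds
        end
    end
    return folds
end

labels = [1, 1, 1, 1, 2, 2, 2, 2]
folds = stratified_folds(labels, 2)   # each fold holds two of each class
```

A production version would also shuffle within each class first so folds are not tied to input order.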
This package on its own is not all that discoverable, and a lot of the methodology is also relevant to "classical" statistics, not just to machine learning (e.g. cross-validation, classification, etc.). Thoughts?
cc @nalimilan
If the classes in gt or pred contain 0, the function confusmat errors out with the following error:
gr=[0,1,0,1]
pr=[0,0,1,1]
confusmat(2, gr, pr)
ERROR: BoundsError: attempt to access 2×2 Array{Int64,2} at index [0,0]
in confusmat(::Int64, ::Array{Int64,1}, ::Array{Int64,1}) at /Users/abisen/.julia/v0.5/MLBase/src/perfeval.jl:17
whereas if the classes do not contain 0, everything works:
gr=[1,2,1,2]
pr=[1,1,2,2]
confusmat(2, gr, pr)
2×2 Array{Int64,2}:
1 1
1 1
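The BoundsError suggests confusmat indexes the k-by-k matrix directly with the labels, so they must lie in 1:k. A hedged workaround sketch is to remap arbitrary labels (including 0) onto 1:k first; remap_labels is a hypothetical helper, not MLBase API:

```julia
# Sketch: remap arbitrary integer labels onto 1:k so confusmat's
# direct-indexing convention is satisfied.
function remap_labels(gt, pr)
    levels = sort(unique(vcat(gt, pr)))
    code = Dict(l => i for (i, l) in enumerate(levels))
    return [code[x] for x in gt], [code[x] for x in pr], length(levels)
end

gr = [0, 1, 0, 1]
pr = [0, 0, 1, 1]
gt2, pr2, k = remap_labels(gr, pr)   # labels now in 1:2
```

After remapping, confusmat(k, gt2, pr2) should work; alternatively, confusmat could accept a label set and do this remapping internally.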
As per the discussion in JuliaML/MLUtils.jl#2, and specifically @nalimilan's comment, we would like to move this package to the JuliaML org and repurpose it for our needs, if that's OK with you all.
I'm confused about the use case for cross_validate. It doesn't seem useful to return the fold corresponding to the maximum or minimum value of evalfun, since this essentially just tells you which fold is easiest. I think it would make more sense to return an array containing the result of applying evalfun to the model returned by estfun for each split. That way the user can do things like average out-of-sample accuracy, compute variance, etc. Thoughts?
Is there any level of cooperation between MLBase and MachineLearning.jl? Perhaps a difference in scope?