cdipaolo / goml
On-line Machine Learning in Go (and so much more)
License: MIT License
Are you planning to implement the silhouette method for validating clustering results? Is this a wanted feature? I could implement it if you like.
It would be appreciated if there were a global flag to disable logging to standard out. When creating models, you don't always want the screen filled with model output.
Alternatively, it'd be nice if, instead of printing by default, you could call methods that return the values, like
model.OptimizationMethod() -> "Batch Gradient Ascent"
model.TrainingExamples() -> 4000
or return a struct with all the information, so that callers can decide what and where they want to print that information.
Hi, this looks like a great library, but I can't see any activity in the past 3 years. What do people use instead for scikit-learn-style capabilities with Go these days?
This line in kmeans triggers the following error while running tests:
fmt.Errorf format %v reads arg #2, but call has 1 arg
A simple fix would be to replace the line in question
errors <- fmt.Errorf("ERROR: point.X must have the same dimensions as clusters (len %v). Point: %v", point)
with this
errors <- fmt.Errorf("ERROR: point.X must have the same dimensions as clusters (len %v). Point: %v", centroids, point)
Follow-up question: is this project in active development?
TFIDF doesn't work unless we actually save the DocsSeen value in the Bayes model.
Currently the struct for Word doesn't do this.
type Word struct {
    Count    []uint64
    Seen     uint64
    DocsSeen uint64 `json:"-"`
}
Should be:
type Word struct {
    Count    []uint64
    Seen     uint64
    DocsSeen uint64
}
I understand why the Naive Bayes "Predict" function uses a math.Log() to avoid an underflow. I don't understand why on lines 288 and 293 the operator is += instead of *=... Could you provide an explanation? Maybe an update to the docs?
In the OnlineLearn method, Words are written to the model's counts of words while the Predict, Probability, and TFIDF.InverseDocumentFrequency methods read from that same map. The only indication that training is done is that the errors channel passed in to OnlineLearn is closed, at which point it's safe to use the model. Otherwise, a runtime error will occur as a result of the concurrent map reads and writes.
Hello,
I have an experimental high performance XGBoost (tree_method=exact only) implementation here:
https://github.com/Statfactory/cortado (python + llvm)
https://github.com/Statfactory/cortado-fs (F#)
https://github.com/Statfactory/JuML.jl (Julia)
I could port it to golang with some help if there is interest:)
Adam
When trying to cast the NaiveBayes model to the TFIDF model, I get a go-vet warning saying "TFIDF copies lock value".
There was another related issue with a fix that allowed access to the concurrent map, however I can't find a way to cast one model to the other without this issue.
I get the same issue when running tfidf_test.go
It would be very useful to compare performance (run time, memory used) with other commonly used machine learning libraries and frameworks, like Weka and Apache Mahout.
I don't know that much at the moment about ML so pardon me if this is ignorant. Is there a reason that the number of classes for text classification is limited to 255 via uint8? Would it be possible to increase this?
How does goml compare to some of the other Go libraries in terms of product vision / roadmap?
There's a decent amount of overlap in terms of the implemented algorithms / models. Is your goal to eventually include all of the other types (neural networks, collaborative filtering, etc)? It seems like the stated goal of being more stream oriented than batch oriented differentiates this library too.
At the end of the day, this seems like the most active repo with an exciting direction. I'm very curious to know where you plan on taking things.
I'd like to learn more about machine learning and this library looks like a good place to start building something with. Are there any examples you could post to demonstrate some simple use cases?
ARIMA could be used for forecasting.
In the Predict function in knn.go you "initialize" the neighbors array with random elements from k.trainingSet and then use insertSorted to insert new data into the neighbors array.
This is a problem because insertSorted requires that the array you are inserting into be sorted; it uses binary search. The random data you initialize the neighbors vector with may not be sorted.
A possible fix is to get rid of the rand package altogether, initialize the neighbors vector with the first k.K elements from k.trainingSet, and sort neighbors before calculating the nearest neighbors.
I can submit a pull request if you like.
Hello!
Great library. I noticed during tests that the code decides to just fmt.Printf. I don't want the ML lib in my app to be outputting to the console without me knowing. Can we disable that? Or provide a way to provide an alternate io.Writer?
Thanks!