Comments (7)
Hi @geoHeil -
We're currently working on adding documentation for MLeap, including how to add custom transformers.
Until then, I would take a look at the String Indexer. At a high level, you'll have to add:
- Your custom model and custom transformer
- Bundle.ml serialization for the mleap transformer
- Bundle.ml serialization for the spark transformer
- An entry for it in the list of available ops
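To make the four pieces above concrete, here is a minimal Python sketch that mirrors their structure using plain, hypothetical classes (`StringIndexerModel`, `StringIndexer`, `StringIndexerOp`, `OPS`, `load_transformer`). None of these are MLeap's actual API. The real String Indexer is written in Scala against Bundle.ml. Because this sketch has only one runtime, the separate MLeap and Spark serializers are collapsed into a single op that stores to and loads from a plain dict.

```python
"""Hypothetical sketch of the pieces a custom MLeap-style transformer needs.

None of these classes are MLeap's real API; they only mirror the structure:
model (pure logic), transformer (column wiring), serialization op, op registry.
"""
from dataclasses import dataclass
from typing import Dict, List


# 1a. The model: pure logic, with no knowledge of columns or frames.
@dataclass
class StringIndexerModel:
    labels: List[str]

    def apply(self, value: str) -> float:
        # Map a label to its position; list.index raises ValueError if unknown.
        return float(self.labels.index(value))


# 1b. The transformer: binds the model to input and output column names.
@dataclass
class StringIndexer:
    uid: str
    input_col: str
    output_col: str
    model: StringIndexerModel

    def transform(self, rows: List[dict]) -> List[dict]:
        # Return new rows with the output column added; inputs are not mutated.
        return [{**row, self.output_col: self.model.apply(row[self.input_col])}
                for row in rows]


# 2/3. Serialization op: converts the transformer to and from a plain dict
# (a stand-in for a Bundle.ml node and model; the real format differs).
class StringIndexerOp:
    op_name = "string_indexer"

    @staticmethod
    def store(t: StringIndexer) -> dict:
        return {
            "op": StringIndexerOp.op_name,
            "uid": t.uid,
            "shape": {"input": t.input_col, "output": t.output_col},
            "model": {"labels": list(t.model.labels)},
        }

    @staticmethod
    def load(node: dict) -> StringIndexer:
        return StringIndexer(
            uid=node["uid"],
            input_col=node["shape"]["input"],
            output_col=node["shape"]["output"],
            model=StringIndexerModel(labels=list(node["model"]["labels"])),
        )


# 4. Registry of available ops, keyed by the op name stored in each node.
OPS: Dict[str, type] = {StringIndexerOp.op_name: StringIndexerOp}


def load_transformer(node: dict) -> StringIndexer:
    # Look up the op by name, then let it deserialize the node.
    return OPS[node["op"]].load(node)


# Usage: serialize, reload through the registry, and transform a row.
indexer = StringIndexer("si_1", "color", "color_index",
                        StringIndexerModel(["red", "green", "blue"]))
restored = load_transformer(StringIndexerOp.store(indexer))
print(restored.transform([{"color": "green"}]))  # [{'color': 'green', 'color_index': 1.0}]
```

Keeping the model free of column names is what lets the same logic be shared by different transformers and serializers, and the registry is what lets a loader pick the right op from the name stored in the serialized node.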
We'll follow up when the documentation is up.
@geoHeil We have added documentation for adding a Spark/MLeap transformer to our wiki. Please take a look and let us know if we should add anything else. We have also stubbed out an article on writing transformers that operate on custom data types, which we will fill out soon.
https://github.com/combust-ml/mleap/wiki/Adding-an-MLeap-Spark-Transformer
https://github.com/combust-ml/mleap/wiki/Custom-Data-Type-with-Transformer:-UdfTransformer
@hollinwilkins Probably the same applies to your recent scikit-learn support with regard to custom transformers.
I wonder if it is a good idea to have two code bases, e.g. a real version and an MLeap version of each classifier. What do you think of the approach of https://github.com/Hydrospheredata/mist, which tries to reuse as much of the original code as possible?
@geoHeil See our full list of support here:
http://mleap-docs.combust.ml/core-concepts/transformers/support.html
Scikit-learn is a new integration, and we are planning to release a stable version of it to PyPI in the next version of MLeap, 0.6.0. That release is a couple of weeks away, and it will definitely not take as long as the gap between 0.4.0 and 0.5.0.
@geoHeil Regarding mist, it looks like a cool project. However, MLeap's goal is not to be exclusive to Spark; instead, we aim to provide a common execution engine across many technologies: Spark, PySpark, Scikit-learn, TensorFlow, MLeap (and XGBoost? :) ). MLeap Bundles are meant to be serializable to and deserializable from any of these technologies. We also want to support third-party integrations, like XGBoost.
MLeap also aims to be as lightweight as possible, and the Spark libraries are anything but. I agree it would be much easier to reuse as much code from Spark as possible, but we are growing beyond its capabilities as we add support for TensorFlow and Scikit-learn.
Another goal is for MLeap to deliver results with the lowest latency possible, and building on top of Spark Streaming will not get us there.
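As an illustration of why a shared, serializable model description helps with both portability and latency, the following Python sketch exports a linear model to JSON and then scores single rows in-process, with no Spark session involved. The JSON layout and the names `export_linear_model` and `LinearRuntime` are hypothetical; this is not the MLeap Bundle format or the MLeap runtime API.

```python
"""Hypothetical sketch: a framework-neutral model file scored without Spark.

The JSON layout is invented for illustration and is NOT the MLeap Bundle
format; it only shows how one serialized description can be produced by any
training library and consumed by a small, in-process runtime.
"""
import json
from typing import Dict, List


def export_linear_model(coefficients: List[float], intercept: float,
                        features: List[str]) -> str:
    # Any training library could emit this same document.
    return json.dumps({
        "op": "linear_regression",
        "features": features,
        "coefficients": coefficients,
        "intercept": intercept,
    })


class LinearRuntime:
    """Minimal scorer: no cluster, no session, just arithmetic on one row."""

    def __init__(self, document: str):
        spec = json.loads(document)
        if spec["op"] != "linear_regression":
            raise ValueError(f"unsupported op: {spec['op']}")
        self.features = spec["features"]
        self.coefficients = spec["coefficients"]
        self.intercept = spec["intercept"]

    def score(self, row: Dict[str, float]) -> float:
        # Dot product of coefficients and feature values, plus the intercept.
        return self.intercept + sum(
            c * row[name] for c, name in zip(self.coefficients, self.features)
        )


document = export_linear_model([2.0, -1.0], 0.5, ["x1", "x2"])
runtime = LinearRuntime(document)
print(runtime.score({"x1": 3.0, "x2": 1.0}))  # 5.5
```

Because scoring a row is a plain function call on already-loaded parameters, per-request latency is bounded by the arithmetic itself rather than by job scheduling in a cluster.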
Sorry for the long response, but I am hoping that I can take this and put it into a formal mission statement somewhere.
Cheers,
Hollin
@hollinwilkins thanks for the clarification. Sounds good.
Hey,
We have added pretty good documentation for adding in custom transformers here:
http://mleap-docs.combust.ml/mleap-runtime/custom-transformer.html
I am going to close this issue for now, but please reopen if there is more we can do to document this process.
Thank you!
Related Issues (20)
- MLeap Transformer issue.
- MathBinary input validation
- Mleap and python 3.8
- Exception in thread "Thread-4" java.lang.NoClassDefFoundError: ml/combust/bundle/HasBundleRegistry
- Need help installing MLeap and XGBoost on databricks
- Using XGBoost with the newest mleap=0.22.0 in Python 3.8
- Error while testing root project
- org.apache.spark.sql.mleap.TypeConverters can not convert 2D tensor to Matrix
- Tensor to Proto Bug with SparseTensor: " java.lang.IllegalArgumentException: size of dimensions must equals size of values"
- org.apache.spark.ml.parity.SparkParityBase.spark should be a method
- Please release version 0.22.0 and 0.23.0 of mleap-spring-boot on dockerhub
- Error building mleap-spring-boot
- Mleap Spring Boot Swagger Documentation Seems Incorrect
- Update springboot to 2.7.10 and snakeyaml to 2.0
- Reporting a security issue in MLeap
- Question about Mleap Spring Boot API
- How to use XGBoost PySpark API with MLeap?
- Need Help on Contributing towards Mleap for 2.13
- Pyspark DecisionTreeRegressionModel bundle does not include all attributes
- Getting Key Not Found Exception while Serializing to a mleap bundle