Comments (7)

seme0021 commented on August 11, 2024

Hi @geoHeil -

We're currently working on adding documentation for MLeap, including how to add custom transformers.

Until then, I would take a look at the String Indexer. At a high level, you'll have to add:

  • Your custom model and custom transformer
  • Bundle.ml serialization for the MLeap transformer
  • Bundle.ml serialization for the Spark transformer
  • An entry for it in the list of available ops
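
To show how the four pieces listed above relate, here is a minimal sketch in plain Python, modelled loosely on the String Indexer. It does not use MLeap: the real implementation is in Scala with Bundle.ml classes, and every name here (`StringIndexerModel`, `StringIndexer`, `StringIndexerOp`, `OPS`, `save`, `load`) and the JSON layout are hypothetical, chosen only to illustrate the idea of a serialized node that records an op name plus its attributes.

```python
import json
from dataclasses import dataclass


# 1. Model: the core logic, with no dependency on any runtime.
@dataclass(frozen=True)
class StringIndexerModel:
    labels: tuple  # label at position i is mapped to float(i)

    def apply(self, value: str) -> float:
        return float(self.labels.index(value))  # ValueError for unseen labels


# 2. Transformer: binds the model to an input and an output column.
@dataclass(frozen=True)
class StringIndexer:
    input_col: str
    output_col: str
    model: StringIndexerModel

    def transform(self, row: dict) -> dict:
        out = dict(row)  # leave the caller's row unchanged
        out[self.output_col] = self.model.apply(row[self.input_col])
        return out


# 3. Serialization op (hypothetical format): converts a transformer to and
# from plain attributes. A Spark-side op would need to write the same op
# name and attributes so that either runtime can read the other's output.
class StringIndexerOp:
    op_name = "string_indexer"

    @staticmethod
    def store(node: StringIndexer) -> dict:
        return {
            "op": StringIndexerOp.op_name,
            "shape": {"input": node.input_col, "output": node.output_col},
            "attributes": {"labels": list(node.model.labels)},
        }

    @staticmethod
    def load(data: dict) -> StringIndexer:
        model = StringIndexerModel(tuple(data["attributes"]["labels"]))
        return StringIndexer(data["shape"]["input"], data["shape"]["output"], model)


# 4. Registry: the list of available ops, looked up by op name on load.
OPS = {StringIndexerOp.op_name: StringIndexerOp}


def save(node, op) -> str:
    return json.dumps(op.store(node))


def load(text: str):
    data = json.loads(text)
    op = OPS.get(data["op"])
    if op is None:
        raise KeyError(f"no op registered for {data['op']!r}")
    return op.load(data)


# Usage: round-trip a transformer through the serialized form.
indexer = StringIndexer("color", "color_index", StringIndexerModel(("red", "green", "blue")))
restored = load(save(indexer, StringIndexerOp))
print(restored.transform({"color": "blue"}))  # {'color': 'blue', 'color_index': 2.0}
```

In this sketch, skipping step 4 means `load` raises `KeyError`, because nothing maps the stored op name back to `StringIndexerOp`.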

We'll follow up when the documentation is up.

hollinwilkins commented on August 11, 2024

@geoHeil We have added documentation to our wiki on adding a Spark/MLeap transformer. Please take a look and let us know if we should add anything else. We have also stubbed out an article, which we will fill in soon, on writing transformers that operate on custom data types.

https://github.com/combust-ml/mleap/wiki/Adding-an-MLeap-Spark-Transformer
https://github.com/combust-ml/mleap/wiki/Custom-Data-Type-with-Transformer:-UdfTransformer

geoHeil commented on August 11, 2024

@hollinwilkins The same probably applies to custom transformers in your recent scikit-learn support.

I wonder if it is a good idea to have two code bases, e.g. a real version and an MLeap version of each classifier. What do you think of the approach of https://github.com/Hydrospheredata/mist, which tries to reuse as much of the original code as possible?

hollinwilkins commented on August 11, 2024

@geoHeil See our full list of support here:
http://mleap-docs.combust.ml/core-concepts/transformers/support.html

Scikit-learn is a new integration, and we are planning to release a stable version of it to PyPI in the next version of MLeap, 0.6.0. That release is a couple of weeks away and will definitely not take as long as the gap between 0.4.0 and 0.5.0.

hollinwilkins commented on August 11, 2024

@geoHeil Regarding mist, it looks like a cool project. MLeap is not meant to be exclusive to Spark, however; instead, our aim is to provide a common execution engine across many technologies: Spark, PySpark, scikit-learn, TensorFlow, MLeap (and xgboost? :) ). MLeap Bundles are meant to be serializable to and deserializable from any of these technologies. We also want to support third-party integrations, like xgboost.

MLeap also aims to be as lightweight as possible, and the Spark libraries are anything but. I agree it would be much easier to reuse as much Spark code as possible, but we are growing beyond Spark's capabilities as we add support for TensorFlow and scikit-learn.

Another goal is for MLeap to deliver results with the lowest possible latency, and building on top of Spark Streaming will not get us there.

Sorry for the long response, but I am hoping that I can take this and put it into a formal mission statement somewhere.

Cheers,
Hollin

geoHeil commented on August 11, 2024

@hollinwilkins thanks for the clarification. Sounds good.

hollinwilkins commented on August 11, 2024

Hey,

We have added fairly thorough documentation on adding custom transformers here:
http://mleap-docs.combust.ml/mleap-runtime/custom-transformer.html

I am going to close this issue for now, but please reopen if there is more we can do to document this process.

Thank you!
