
sklearn-weka-plugin's Introduction

sklearn-weka-plugin

Makes Weka algorithms available in scikit-learn.

Built on top of the python-weka-wrapper3 library, it uses the jpype library under the hood for communicating with Weka objects in the Java Virtual Machine.

Functionality

The following is currently available:

  • Classifiers (classification/regression)
  • Clusterers
  • Filters

Things to be aware of:

  • You need to start the JVM in your Python code before you can use Weka (and stop it again).
  • Unlikely to work in multi-threaded or multi-process environments (such as Flask).
  • Jupyter Notebooks do not play well with jpype: you may have to restart the kernel before you can restart the JVM (e.g., to load additional packages).
  • The conversion to Weka data structures involves guesswork. For example, if targets are to be treated as nominal, you need to convert the numeric values to strings (e.g., using the to_nominal_labels and/or to_nominal_attributes functions from sklweka.dataset, or the MakeNominal transformer from sklweka.preprocessing).
  • Check the list of known problems before reporting one.
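
To illustrate the point about nominal targets: numeric class values only read as categories once they are strings. The following is a minimal sketch of that idea in plain Python. The function as_nominal_strings and its prefix argument are hypothetical stand-ins for illustration; in real code, use the to_nominal_labels function from sklweka.dataset mentioned above.

```python
def as_nominal_strings(values, prefix="_"):
    """Turn numeric class values into string labels.

    Hypothetical helper for illustration only. The prefix (an assumption
    of this sketch) keeps the labels from looking like plain numbers.
    """
    return [prefix + str(v) for v in values]


numeric_targets = [0, 1, 1, 2]
labels = as_nominal_strings(numeric_targets)
print(labels)  # ['_0', '_1', '_1', '_2']
```

After such a conversion every target is a string, so it can be treated as a class label rather than as a regression value.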

Requirements

The library has the following requirements:

  • Python 3 (does not work with Python 2)

    • python-weka-wrapper3 (>=0.3.0, required)
  • OpenJDK 8 or later (11 is recommended)

Installation

  • install the python-weka-wrapper3 library in a virtual environment, see instructions here:

    https://fracpete.github.io/python-weka-wrapper3/install.html

  • install the sklearn-weka-plugin library itself in the same virtual environment

    • latest release from PyPI

      ./venv/bin/pip install sklearn-weka-plugin
      
    • from local source

      ./venv/bin/pip install .   
      
    • from GitHub repository

      ./venv/bin/pip install git+https://github.com/fracpete/sklearn-weka-plugin.git   
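
After installing, it can help to confirm that the packages are visible to the interpreter you actually run. The following is a small sketch using only the standard library; the helper name check_modules is an assumption of this example, and the module names weka and sklweka are the import names used elsewhere in this document.

```python
import importlib.util
import sys


def check_modules(names=("weka", "sklweka")):
    """Return {module_name: bool} telling whether each top-level module can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Show which interpreter is running, then report each module.
print("interpreter:", sys.executable)
for name, found in check_modules().items():
    print(f"{name}: {'found' if found else 'NOT found'}")
```

Run it with the same interpreter as your application, e.g. ./venv/bin/python. If a module is reported as not found, pip most likely installed into a different environment than the one printed on the first line.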
      

Examples

Here is a quick example (adjust the paths to the datasets to match your setup):

import sklweka.jvm as jvm
from sklweka.dataset import load_arff, to_nominal_labels
from sklweka.classifiers import WekaEstimator
from sklweka.clusters import WekaCluster
from sklweka.preprocessing import WekaTransformer
from sklearn.model_selection import cross_val_score
from sklweka.datagenerators import DataGenerator, generate_data

# start JVM with Weka package support
jvm.start(packages=True)

# regression
X, y, meta = load_arff("/some/where/bolts.arff", class_index="last")
lr = WekaEstimator(classname="weka.classifiers.functions.LinearRegression")
scores = cross_val_score(lr, X, y, cv=10, scoring='neg_root_mean_squared_error')
print("Cross-validating LR on bolts (negRMSE)\n", scores)

# classification
X, y, meta = load_arff("/some/where/iris.arff", class_index="last")
y = to_nominal_labels(y)
j48 = WekaEstimator(classname="weka.classifiers.trees.J48", options=["-M", "3"])
j48.fit(X, y)
scores = j48.predict(X)
probas = j48.predict_proba(X)
print("\nJ48 on iris\nactual label -> predicted label, probabilities")
for i in range(len(y)):
    print(y[i], "->", scores[i], probas[i])

# clustering
X, y, meta = load_arff("/some/where/iris.arff", class_index="last")
cl = WekaCluster(classname="weka.clusterers.SimpleKMeans", options=["-N", "3"])
clusters = cl.fit_predict(X)
print("\nSimpleKMeans on iris\nclass label -> cluster")
for i in range(len(y)):
    print(y[i], "->", clusters[i])

# preprocessing
X, y, meta = load_arff("/some/where/bolts.arff", class_index="last")
tr = WekaTransformer(classname="weka.filters.unsupervised.attribute.Standardize", options=["-unset-class-temporarily"])
X_new, y_new = tr.fit(X, y).transform(X, y)
print("\nStandardize filter")
print("\ntransformed X:\n", X_new)
print("\ntransformed y:\n", y_new)

# generate data
gen = DataGenerator(
    classname="weka.datagenerators.classifiers.classification.BayesNet",
    options=["-S", "2", "-n", "10", "-C", "10"])
X, y, X_names, y_name = generate_data(gen, att_names=True)
print("X:", X_names)
print(X)
print("y:", y_name)
print(y)

# stop JVM
jvm.stop()
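
The example above starts and stops the JVM by hand, so an exception in between would skip jvm.stop(). One way to guarantee the stop call is a small context manager. This is a sketch, not part of the library: running_jvm is a hypothetical helper, and RecordingJvm is a stand-in object used only so the snippet runs without Weka installed.

```python
from contextlib import contextmanager


@contextmanager
def running_jvm(jvm_module, **start_kwargs):
    """Call jvm_module.start(...) on entry and always call jvm_module.stop() on exit."""
    jvm_module.start(**start_kwargs)
    try:
        yield jvm_module
    finally:
        jvm_module.stop()


class RecordingJvm:
    """Stand-in for sklweka.jvm that only records the calls it receives."""

    def __init__(self):
        self.calls = []

    def start(self, **kwargs):
        self.calls.append(("start", kwargs))

    def stop(self):
        self.calls.append(("stop", {}))


fake = RecordingJvm()
try:
    with running_jvm(fake, packages=True):
        raise RuntimeError("simulated failure while using Weka")
except RuntimeError:
    pass

# stop() was still called even though the body raised
print(fake.calls)  # [('start', {'packages': True}), ('stop', {})]
```

With the real library, the same helper would be used as "with running_jvm(jvm, packages=True):" after "import sklweka.jvm as jvm". Keep in mind the note above that the JVM may not be restartable within the same process.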

See the example repository for more examples:

https://github.com/fracpete/sklearn-weka-plugin-examples

Documentation

You can find the project documentation here:

https://fracpete.github.io/sklearn-weka-plugin/

sklearn-weka-plugin's People

Contributors

dev-adri, fracpete

Forkers

sylwekczmil

sklearn-weka-plugin's Issues

sklweka import error

Hello there
I have installed the library in my environment using each of the three methods presented in the installation guide, and I receive a "Requirement already satisfied" message when I try to reinstall it.
I have already created an environment for Weka and I can use it; however, importing sklweka does not work and I can't find the cause.

I would much appreciate it if you can help me to solve this problem.

Calibration doesn't work

I tried:
calibrated = CalibratedClassifierCV(model, method='sigmoid', cv=5)
calibrated.fit(train, ytrain)

where model is a WekaEstimator
for example:
WekaEstimator(classname="weka.classifiers.functions.Logistic",options=["-R","1.0E-8","-M","-1","-num-decimal-places","4"])

I get this error:
TypeError: predict_proba() got an unexpected keyword argument 'X'
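
The error message suggests that predict_proba was called with X as a keyword argument while the method names its parameter differently. The following is a minimal sketch of that mismatch and of one possible workaround. DataNamedModel and KeywordAdapter are hypothetical stand-ins, not part of sklweka or scikit-learn, and the adapter is not a complete scikit-learn estimator (it lacks fit, get_params and so on).

```python
class DataNamedModel:
    """Hypothetical model whose predict_proba parameter is called 'data', not 'X'."""

    def predict_proba(self, data):
        # Dummy probabilities: always fully confident in the first class.
        return [[1.0, 0.0] for _ in data]


class KeywordAdapter:
    """Hypothetical wrapper that accepts X by keyword and delegates positionally."""

    def __init__(self, model):
        self.model = model

    def predict_proba(self, X):
        return self.model.predict_proba(X)


rows = [[0.1, 0.2], [0.3, 0.4]]

try:
    DataNamedModel().predict_proba(X=rows)
except TypeError as exc:
    # Same kind of failure as reported above: 'X' is not a known parameter name.
    print("direct call failed:", exc)

print(KeywordAdapter(DataNamedModel()).predict_proba(X=rows))
```

Passing the argument positionally avoids the problem because Python then does not care what the parameter is called.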

roughset classifier problem

I have a problem with the RoughSet Weka classifier:

import sklweka.jvm as jvm
from sklweka.classifiers import WekaEstimator
from sklweka.dataset import load_arff
jvm.start(packages=True)
X, y, meta = load_arff(r"C:\Program Files\Weka-3-9-6\data\iris.arff", class_index="last")  # raw string keeps the backslashes literal
rough = WekaEstimator(classname="weka.classifiers.rules.RoughSet",
                      options=["-D", "5", "-R", "0", "-I", "0", "-X", "3"])
roughfit = rough.fit(X, y)
roughfit.predict(X)
jvm.stop()

I get:

INFO:weka.core.jvm:JVM already running, call jvm.stop() first
Exception in thread "Thread-0" java.lang.IndexOutOfBoundsException: Index -1 out of bounds for length 3
java.base/jdk.internal.util.Preconditions.outOfBounds(Unknown Source)
java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Unknown Source)
java.base/jdk.internal.util.Preconditions.checkIndex(Unknown Source)
java.base/java.util.Objects.checkIndex(Unknown Source)
java.base/java.util.ArrayList.get(Unknown Source)
weka.core.Attribute.value(Attribute.java:778)
weka.core.AbstractInstance.stringValue(AbstractInstance.java:668)
weka.core.AbstractInstance.stringValue(AbstractInstance.java:644)
rseslib.structure.data.formats.ArffDoubleDataInput.convertToDoubleData(ArffDoubleDataInput.java:219)
weka.classifiers.AbstractRseslibClassifierWrapper.classifyInstance(AbstractRseslibClassifierWrapper.java:224)
at java.base/jdk.internal.util.Preconditions.outOfBounds(Unknown Source)
at java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Unknown Source)
at java.base/jdk.internal.util.Preconditions.checkIndex(Unknown Source)
at java.base/java.util.Objects.checkIndex(Unknown Source)
at java.base/java.util.ArrayList.get(Unknown Source)
at weka.core.Attribute.value(Attribute.java:778)
at weka.core.AbstractInstance.stringValue(AbstractInstance.java:668)
at weka.core.AbstractInstance.stringValue(AbstractInstance.java:644)
at rseslib.structure.data.formats.ArffDoubleDataInput.convertToDoubleData(ArffDoubleDataInput.java:219)
at weka.classifiers.AbstractRseslibClassifierWrapper.classifyInstance(AbstractRseslibClassifierWrapper.java:224)
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 3460, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "", line 9, in
roughfit.predict(X)
File "C:\ProgramData\Anaconda3\lib\site-packages\sklweka\classifiers.py", line 135, in predict
result.append(self.header_.class_attribute.value(int(self._classifier.classify_instance(inst))))
File "C:\Users\Massimo\AppData\Roaming\Python\Python310\site-packages\weka\classifiers.py", line 128, in classify_instance
return self._mc_classify(inst.jobject)
File "C:\Users\Massimo\AppData\Roaming\Python\Python310\site-packages\javabridge\jutil.py", line 859, in fn
raise JavaException(x)
javabridge.jutil.JavaException: Index -1 out of bounds for length 3
