eloquentarduino / micromlgen

This project forked from agrimagsrl/micromlgen

Generate C code for microcontrollers from Python's sklearn classifiers

License: MIT License

Python 97.79% Shell 2.21%

micromlgen's Introduction

Introducing MicroML

MicroML is an attempt to bring Machine Learning algorithms to microcontrollers. Please refer to this blog post for an introduction to the topic.

This repository is archived because it does what it was meant to do: generate C++ code for the supported models. I'm focusing on a more comprehensive library (https://github.com/eloquentarduino/tinyml4all-python/), so this will not receive updates.

Install

pip install micromlgen

Supported classifiers

micromlgen can port many types of models to C++:

DecisionTree
RandomForest
XGBoost
GaussianNB
Support Vector Machines (SVC and OneClassSVM)
Relevant Vector Machines (from skbayes.rvm_ard_models package)
SEFR
PCA

from micromlgen import port
from sklearn.svm import SVC
from sklearn.datasets import load_iris


if __name__ == '__main__':
    iris = load_iris()
    X = iris.data
    y = iris.target
    clf = SVC(kernel='linear').fit(X, y)
    print(port(clf))

You may pass a classmap to get readable class names in the ported code:

from micromlgen import port
from sklearn.svm import SVC
from sklearn.datasets import load_iris


if __name__ == '__main__':
    iris = load_iris()
    X = iris.data
    y = iris.target
    clf = SVC(kernel='linear').fit(X, y)
    print(port(clf, classmap={
        0: 'setosa',
        1: 'versicolor',
        2: 'virginica'
    }))
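The classmap keys must follow the dataset's own label order; for iris, index 1 is versicolor and index 2 is virginica. A minimal sketch that builds the classmap from `load_iris().target_names` instead of typing it by hand (the commented `port` call assumes a fitted classifier `clf`, as in the example above):

```python
from sklearn.datasets import load_iris

iris = load_iris()
# target_names[i] is the name of class index i, so enumerate() keeps keys and names aligned
classmap = {idx: str(name) for idx, name in enumerate(iris.target_names)}
print(classmap)  # {0: 'setosa', 1: 'versicolor', 2: 'virginica'}

# with a fitted classifier `clf`:
# print(port(clf, classmap=classmap))
```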

PCA

It can export a PCA transformer.

from sklearn.decomposition import PCA
from sklearn.datasets import load_iris
from micromlgen import port

if __name__ == '__main__':
    X = load_iris().data
    pca = PCA(n_components=2, whiten=False).fit(X)
    
    print(port(pca))
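To check the PCA output on the microcontroller, reference values can be computed in Python. A sketch, assuming `whiten=False` as above: scikit-learn's `PCA.transform` subtracts `mean_` and projects onto `components_`, which is the computation the ported transformer is expected to reproduce (this sketch does not inspect the generated header itself).

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data
pca = PCA(n_components=2, whiten=False).fit(X)

# With whiten=False, transform() centers each feature and projects onto the components
projected = (X - pca.mean_) @ pca.components_.T

# Reference values to compare with the ported transform for the first sample
print(projected[0])
print(np.allclose(projected, pca.transform(X)))  # True
```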

SEFR

Read the post about SEFR.

pip install sefr

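from sefr import SEFR
from sklearn.datasets import load_breast_cancer
from micromlgen import port


# example data (assumption): SEFR was introduced as a binary classifier, so a two-class dataset is used
X, y = load_breast_cancer(return_X_y=True)
clf = SEFR()
clf.fit(X, y)
print(port(clf))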

DecisionTreeRegressor and RandomForestRegressor

pip install "micromlgen>=1.1.26"

from sklearn.datasets import load_diabetes
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from micromlgen import port


if __name__ == '__main__':
    # load_boston was removed in scikit-learn 1.2, so load_diabetes is used instead
    X, y = load_diabetes(return_X_y=True)
    # either regressor can be ported; this example writes the random forest to disk
    tree = DecisionTreeRegressor(max_depth=10, min_samples_leaf=5).fit(X, y)
    regr = RandomForestRegressor(n_estimators=10, max_depth=10, min_samples_leaf=5).fit(X, y)

    with open('RandomForestRegressor.h', 'w') as file:
        file.write(port(regr))

// Arduino sketch
#include "RandomForestRegressor.h"

Eloquent::ML::Port::RandomForestRegressor regressor;
float X[] = {...};


void setup() {
}

void loop() {
    float y_pred = regressor.predict(X);
}
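The `float X[] = {...};` line above has to be filled with a real feature vector before `predict` means anything. A small hypothetical helper (`to_c_array` is not part of micromlgen) that formats one sample as a C array initializer:

```python
def to_c_array(values, name="X", precision=6):
    """Format numbers as a C float array declaration for pasting into a sketch.

    Hypothetical helper, not part of micromlgen.
    """
    body = ", ".join(f"{float(v):.{precision}f}" for v in values)
    return f"float {name}[] = {{{body}}};"


print(to_c_array([1.5, -2, 0.25], precision=2))
# float X[] = {1.50, -2.00, 0.25};
```

With the regressor example, `to_c_array(X[0])` would format the first training sample; the array length must match the number of features the model was trained on.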


micromlgen's Issues

Switch RVM support to EMRVC

Hello, and congratulations on your work. I've used random forest and SVC on complex datasets and they work like a charm. On the other hand, I didn't manage to get either RVC (sklearn_bayes) or port_rvm working properly, so my suggestion is to switch RVM support to EMRVC (sklearn_rvm).

sklearn_bayes fails to install, probably because it targets an older version of sklearn. I managed to install it by modifying the code to support sklearn 0.24 and manually compiling the Cython source files, but even then it does not work properly: the C++ models I managed to produce all seem to have 8 relevance vectors. sklearn_rvm, on the other hand, works really well, and switching to it would make your work less dependent on the sklearn version.

Add support for LinearSVC

ZeroDivisionError: integer division or modulo by zero

Hi,
I'm using XGBClassifier and getting this error:
~/anaconda3/envs/kando/lib/python3.8/site-packages/micromlgen/templates/xgboost/xgboost.jinja in block 'predict'()
4 float votes[{{ n_classes }}] = { 0.0f };
5
----> 6 {% for k, tree in f.enumerate(trees) %}
7 {% with i = 0, class_idx = k % n_classes %}
8 // tree #{{ k + 1 }}

ZeroDivisionError: integer division or modulo by zero
Can you please explain why this is happening?
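The failing template line computes `class_idx = k % n_classes`, so the error means `n_classes` is 0. In the `port_xgboost` source quoted in another issue on this page, `n_classes` is read from the saved model's `learner_model_param.num_class`; our understanding (an assumption, not confirmed by the maintainer) is that XGBoost stores `num_class` as 0 for binary objectives such as `binary:logistic`. The sketch below mirrors the template's tree-to-class mapping with a guard for that case; `tree_class_indices` is a hypothetical name, not part of micromlgen.

```python
def tree_class_indices(n_trees, num_class):
    """Return the class index each boosted tree votes for, like `k % n_classes` in xgboost.jinja.

    Assumption: num_class == 0 means a binary model with a single output,
    so every tree contributes to the same score instead of dividing by zero.
    """
    n_outputs = num_class if num_class > 0 else 1
    return [k % n_outputs for k in range(n_trees)]


print(tree_class_indices(6, 3))  # [0, 1, 2, 0, 1, 2]
print(tree_class_indices(4, 0))  # [0, 0, 0, 0]
```

For a binary model the single summed score would still need the usual sigmoid and threshold, which this sketch does not cover.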

Consider support for scikit's MLPClassifier

AttributeError: 'DecisionTreeRegressor' object has no attribute 'tree_'

I tried to train a DecisionTreeRegressor and port it to C, but I got this error:

Traceback (most recent call last):
  File "C:\Users\---\Desktop\model.py", line 11, in <module>
    out = str(port(model))
  File "C:\Users\---\AppData\Local\Programs\Python\Python38\lib\site-packages\micromlgen\micromlgen.py", line 50, in port
    return port_decisiontree_regressor(**locals(), **kwargs)
  File "C:\Users\---\AppData\Local\Programs\Python\Python38\lib\site-packages\micromlgen\decisiontreeregressor.py", line 21, in port_decisiontree_regressor
    'left': clf.tree_.children_left,
AttributeError: 'DecisionTreeRegressor' object has no attribute 'tree_'

This is my code:

from micromlgen import port
from sklearn.tree import DecisionTreeRegressor
from sklearn.datasets import fetch_openml


if __name__ == '__main__':
    dataset = fetch_openml('housing')
    X = dataset.data
    y = dataset.target
    model = DecisionTreeRegressor(random_state=0)
    out = str(port(model))
    with open('model.h','w') as f :
        f.write(out)
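The traceback points at `clf.tree_`, and the code above never calls `fit`: scikit-learn creates fitted attributes such as `tree_` only during `fit`, so an unfitted estimator cannot be ported. A sketch of the corrected flow, with `load_diabetes` swapped in for `fetch_openml('housing')` so it runs offline (the `port` call is left as a comment):

```python
from sklearn.datasets import load_diabetes
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)
model = DecisionTreeRegressor(random_state=0)
print(hasattr(model, "tree_"))  # False: fitted attributes do not exist yet

model.fit(X, y)
print(hasattr(model, "tree_"))  # True: fit() built the tree structure

# the fitted model can now be ported, e.g.:
# from micromlgen import port
# with open('model.h', 'w') as f:
#     f.write(port(model))
```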

Can we use micromlgen with xgboost itself - without XGBClassifier

micromlgen works well with XGBClassifier, which is imported from xgboost:

from xgboost import XGBClassifier

But in my program I use the native xgboost API instead, without that import, like this:

import xgboost as xgb
model = xgb.cv(param,dtrain,num_boost_round=420,nfold=10,stratified=True,verbose_eval=20)
model = xgb.train(param,dtrain,420,[(dtrain,'train')],verbose_eval=20)

When I call port on model, I get this error:

Traceback (most recent call last):
  File "model_microml_training.py", line 161, in <module>
    predict(sys.argv[1])
  File "model_microml_training.py", line 151, in predict
    print(port(model))
  File "/home/admin/dawid_venv_xgboost_1.1.0/lib/python3.5/site-packages/micromlgen/micromlgen.py", line 45, in port
    raise TypeError('clf MUST be one of %s' % ', '.join(platforms.ALLOWED_CLASSIFIERS))
TypeError: clf MUST be one of SVC, OneClassSVC, RVC, SEFR, DecisionTree, RandomForest, GaussianNB, LogisticRegression, PCA, PrincipalFFT, LinearRegression, XGBClassifier

Decision function computed doesn't match with sklearn.svm.SVC

Hi,

I train my SVC in Python and port it to C++ as follows:

with open(fname, "w") as f: f.write(port(clf))

However, the decision function computed by clf.decision_function(X) in Python and the one computed inside Eloquent::ML::Port::SVM::predict (in the generated header file) do not match, and neither do the predictions.

Is there something I'm missing?
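One possible source of the mismatch (an assumption; the generated SVM header is not shown here): with the default `decision_function_shape='ovr'`, `SVC.decision_function` returns per-class scores derived from the pairwise classifiers, not the raw one-vs-one values. A sketch that fits with `decision_function_shape='ovo'` and rebuilds the one-vs-one vote, which is the quantity worth comparing against a voting implementation on the device:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
# 'ovo' exposes the raw pairwise scores; the default 'ovr' returns a transformed
# (n_samples, n_classes) array built from them
clf = SVC(kernel="linear", decision_function_shape="ovo").fit(X, y)
raw = clf.decision_function(X)  # one column per class pair: (0,1), (0,2), (1,2)

n_classes = len(clf.classes_)
pairs = [(i, j) for i in range(n_classes) for j in range(i + 1, n_classes)]
votes = np.zeros((len(X), n_classes), dtype=int)
for k, (i, j) in enumerate(pairs):
    # a positive pairwise score is a vote for the first class of the pair
    votes[raw[:, k] > 0, i] += 1
    votes[raw[:, k] <= 0, j] += 1

voted = clf.classes_[votes.argmax(axis=1)]
print(raw.shape)  # (150, 3)
print((voted == clf.predict(X)).all())
```

Comparing predicted labels on the same inputs is usually more robust than comparing raw scores, since different implementations may scale or combine the pairwise values differently.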

Suggestion for two bug fixes: adding two needed standard libraries and fixing the gauss function in GNB.

Hello, I'm sorry if this isn't the right way to do this; I am a bit new to GitHub.

I found a few bugs and fixed them in my own copy of micromlgen. I am still debugging a few others.
First, it's useful to include and in _skeleton.jinja, because the GNB, for example, uses math functions.

Second, the gauss function is implemented incorrectly.

This worked, but the gauss value is now negative and needs to be added to the priors in votes.jinja.

Hopefully I've been of some help. Thank you for this library, it helped a lot so far.

Wrong string for OneClassSVM in ALLOWED_CLASSIFIERS

Looking at the supported models in platforms.py, I realized that OneClassSVC is not a valid scikit-learn estimator. It should be OneClassSVM, as correctly checked in is_svm.

Template not found for Random forest

I was trying to export a random forest classifier and print the code,
but it cannot find the template, even though I can see it in the folder "randomforest/randomforest.jinja".

The code I have written:

from sklearn.svm import SVC
from micromlgen import port
from glob import glob
from os.path import basename
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def load_features(folder):
    dataset = None
    classmap = {}
    for class_idx, filename in enumerate(glob('%s/*.csv' % folder)):
        class_name = basename(filename)[:-4]
        classmap[class_idx] = class_name
        samples = np.loadtxt(filename, dtype=float, delimiter=',')
        labels = np.ones((len(samples), 1)) * class_idx
        samples = np.hstack((samples, labels))
        dataset = samples if dataset is None else np.vstack((dataset, samples))

    return dataset, classmap
# put your samples in the dataset folder
# one class per file
# one feature vector per line, in CSV format
features, classmap = load_features('data/')
X, y = features[:, :-1], features[:, -1]
classifier = RandomForestClassifier(random_state=0,max_depth=20).fit(X, y)
c_code = port(classifier, classmap=classmap)
print(c_code)

Given output:

Expected output:

It should have printed the C code.

XGBoost Port Code requires access to Temp Files

As the title says, the XGBoost port code writes a temporary JSON file in APPDATA/LOCAL.
The user is not told about this. In fact, on the 3 systems I tested, the file was not created because the Jupyter Notebook does not have access to the APPDATA/LOCAL folder; even with admin rights or after trusting the notebook, it still cannot create it.

This is the type of error generated:
XGBoostError: [14:36:23] C:\Users\Administrator\workspace\xgboost-win64_release_1.0.0\dmlc-core\src\io\local_filesys.cc:209: Check failed: allow_null: LocalFileSystem::Open "C:\Users\ZW\AppData\Local\Temp\tmp_mu9qwkg": Permission denied

I have checked the xgboost.py file. The original code is:

def port_xgboost(clf, tmp_file=None, **kwargs):
    if tmp_file is None:
        with NamedTemporaryFile('w+', suffix='.json', encoding='utf-8') as tmp:
            clf.save_model(tmp.name)
            tmp.seek(0)
            decoded = json.load(tmp)
    else:
        clf.save_model(tmp_file)

        with open(tmp_file, encoding='utf-8') as file:
            decoded = json.load(file)

    trees = [format_tree(tree) for tree in decoded['learner']['gradient_booster']['model']['trees']]

    return jinja('xgboost/xgboost.jinja', {
        'n_classes': int(decoded['learner']['learner_model_param']['num_class']),
        'trees': trees,
    }, {
        'classname': 'XGBClassifier'
    }, **kwargs)
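A note on the likely cause: the Python documentation for `tempfile.NamedTemporaryFile` states that on Windows the file name cannot be used to open the file a second time while the temporary file is still open, which is what `clf.save_model(tmp.name)` does inside the `with` block. Also, `port` forwards extra keyword arguments to `port_xgboost` (see the `port_xgboost(**locals(), **kwargs)` line in the traceback of the next issue), so passing `tmp_file=...` should already work without changing the default. A sketch of a cross-platform alternative that writes into a temporary directory instead; `dump_model_json` and `fake_save` are hypothetical names, and `fake_save` only stands in for `XGBClassifier.save_model`:

```python
import json
import os
from tempfile import TemporaryDirectory


def dump_model_json(save_model, tmp_root=None):
    """Call save_model(path) on a fresh .json path and return the decoded JSON.

    The file lives inside a temporary directory, so no open handle blocks the
    writer on Windows. Pass tmp_root to use a folder other than the system temp
    directory (e.g. when APPDATA/LOCAL is not writable).
    """
    with TemporaryDirectory(dir=tmp_root) as tmp_dir:
        path = os.path.join(tmp_dir, "model.json")
        save_model(path)
        with open(path, encoding="utf-8") as file:
            return json.load(file)


def fake_save(path):
    """Stand-in for XGBClassifier.save_model: writes a tiny JSON document."""
    with open(path, "w", encoding="utf-8") as file:
        json.dump({"learner": {"learner_model_param": {"num_class": "3"}}}, file)


decoded = dump_model_json(fake_save)
print(decoded["learner"]["learner_model_param"]["num_class"])  # 3
```

With a real model, the call would be `dump_model_json(clf.save_model)`, relying on XGBoost choosing the JSON format from the `.json` extension as the original code does.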

SOLUTION:
By removing the None from:
def port_xgboost(clf, tmp_file=None, **kwargs):

Users can then pass None from their Python script if they prefer (and if it works) a temporary file in APPDATA/LOCAL, or they can specify a path to a file ending in .json:
print(port(xgb, tmp_file = "C:\\Users\\*username*\\Desktop\\test.json"))

They can then use the same pattern shown for DecisionTree/RandomForest to write a .h file:

with open('XGBoostClassifier.h', 'w') as file:
    file.write(port(xgb, tmp_file = "C:\\Users\\*username*\\Desktop\\test.json"))

Please update the library and add the documentation for the temp file/specified location.

Furthermore, please document all the generated classes, so users know exactly which name to use with the namespace.
The example given: Eloquent::ML::Port::RandomForestRegressor regressor;

Correct namespace calls for the other ML types:

Eloquent::ML::Port::SVM name_to_be_used_in_code;
Eloquent::ML::Port::OneClassSVM name_to_be_used_in_code;
Eloquent::ML::Port::SEFR name_to_be_used_in_code;
Eloquent::ML::Port::DecisionTreeClassifier name_to_be_used_in_code;
Eloquent::ML::Port::DecisionTreeRegressor name_to_be_used_in_code;
Eloquent::ML::Port::RandomForestClassifier name_to_be_used_in_code;
Eloquent::ML::Port::GaussianNB name_to_be_used_in_code;
Eloquent::ML::Port::LogisticRegression name_to_be_used_in_code;
Eloquent::ML::Port::PCA name_to_be_used_in_code;
Eloquent::ML::Port::PrincipalFFT name_to_be_used_in_code;
Eloquent::ML::Port::LinearRegression name_to_be_used_in_code;
Eloquent::ML::Port::XGBClassifier name_to_be_used_in_code;

Thank you and take care!

'division or modulo by zero' in micromlgen for xgboost.

File "train_XGDBoost.py", line 415, in exportModel
cppCode = port(clf, classmap=classmap)
File "/home/andreas/.local/lib/python3.6/site-packages/micromlgen/micromlgen.py", line 44, in port
return port_xgboost(**locals(), **kwargs)
File "/home/andreas/.local/lib/python3.6/site-packages/micromlgen/xgboost.py", line 41, in port_xgboost
}, **kwargs)
File "/home/andreas/.local/lib/python3.6/site-packages/micromlgen/utils.py", line 79, in jinja
code = template.render(data)
File "/home/andreas/.local/lib/python3.6/site-packages/jinja2/environment.py", line 1090, in render
self.environment.handle_exception()
File "/home/andreas/.local/lib/python3.6/site-packages/jinja2/environment.py", line 832, in handle_exception
reraise(*rewrite_traceback_stack(source=source))
File "/home/andreas/.local/lib/python3.6/site-packages/jinja2/_compat.py", line 28, in reraise
raise value.with_traceback(tb)
File "/home/andreas/.local/lib/python3.6/site-packages/micromlgen/templates/xgboost/xgboost.jinja", line 1, in top-level template code
{% extends '_skeleton.jinja' %}
File "/home/andreas/.local/lib/python3.6/site-packages/micromlgen/templates/_skeleton.jinja", line 14, in top-level template code
{% block predict %}{% endblock %}
File "/home/andreas/.local/lib/python3.6/site-packages/micromlgen/templates/xgboost/xgboost.jinja", line 6, in block "predict"
{% for k, tree in f.enumerate(trees) %}
ZeroDivisionError: integer division or modulo by zero

versions:

andreas@krwork:/media/sourcen/ml_tools$ pip3 install --upgrade micromlgen
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: micromlgen in /home/andreas/.local/lib/python3.6/site-packages (1.1.20)
Requirement already satisfied: jinja2 in /home/andreas/.local/lib/python3.6/site-packages (from micromlgen) (2.11.2)
Requirement already satisfied: MarkupSafe>=0.23 in /home/andreas/.local/lib/python3.6/site-packages (from jinja2->micromlgen) (1.1.1)

andreas@krwork:/media/sourcen/ml_tools$ pip3 install --upgrade xgboost
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: xgboost in /home/andreas/.local/lib/python3.6/site-packages (1.3.1)
Requirement already satisfied: scipy in /home/andreas/.local/lib/python3.6/site-packages (from xgboost) (1.5.4)
Requirement already satisfied: numpy in /home/andreas/.local/lib/python3.6/site-packages (from xgboost) (1.19.4)

Did you forget the intercept term?

micromlgen/micromlgen/templates/logisticregression/vote.arduino.jinja

Line 2 in 3bf7d57

 votes[{{ i }}] = dot(x, {% for j, wj in f.enumerate(w) %} {% if j > 0 %},{% endif %} {{ f.round(wj) }} {% endfor %}); 
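For reference, scikit-learn's `LogisticRegression.decision_function` is `X @ coef_.T + intercept_`, so a port that uses only the dot product with the weights (as the template line above appears to, though only one line of it is shown) would drop the per-class intercept. A sketch that checks the formula and counts how many predictions change without the intercept:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000).fit(X, y)

# per-class score as computed by scikit-learn: w_c . x + b_c
with_intercept = X @ clf.coef_.T + clf.intercept_
# the same score with the intercept b_c dropped
without_intercept = X @ clf.coef_.T

print(np.allclose(with_intercept, clf.decision_function(X)))  # True
changed = (clf.classes_[without_intercept.argmax(axis=1)] != clf.predict(X)).sum()
print(changed)  # samples whose label changes without the intercept; depends on the data
```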

SVC generated code issue

I'm having trouble getting the code generated from an SVC to work.

The attached example works nicely in Python but not on the MCU: on the MCU it always gives 0.

Attached files are jupyter notebook as well as data source.
Cupcakes vs Muffins.xls
testofcodegenerator.zip

Add support for public votes on vote-based classifiers

One Class SVM just gives the wrong output

Hello,
I trained a One-Class SVM in Python and ported it to C++ as follows:
import micromlgen
from micromlgen import port
classmap = {
-1: 'Fault',
1: 'No fault'
}
c_code_OCSVM= port(one_class_svm)
modelfile= open('OCSVM.h', 'w')
modelfile.write(c_code_OCSVM)
modelfile.close()

I import OCSVM.h into the Arduino IDE as follows (board: ESP32):
#include "OCSVM.h"

Eloquent::ML::Port::OneClassSVM clf;
void setup() {
Serial.begin(115200);
delay(2000);

float data [2] ={0.45,1.0};
Serial.println(clf.predict(data));
}

The problem is that clf.predict(data) gives a different output than prediction = one_class_svm.predict(r) in Python.
Is something missing in my code?
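To narrow down whether the difference comes from the port or from the input, sklearn's decision value can be recomputed by hand and compared with what the board computes. A sketch on synthetic 2-D data (the data, `nu` and `gamma` are assumptions, not the reporter's settings); `gamma` is fixed explicitly so the hand-written RBF kernel uses the same value as the model, and a positive score means inlier (+1):

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))  # synthetic example data

gamma = 0.5  # set explicitly so the manual kernel matches the model
ocsvm = OneClassSVM(kernel="rbf", gamma=gamma, nu=0.1).fit(X)


def decision(x):
    """sum_i alpha_i * exp(-gamma * ||sv_i - x||^2) + intercept, for each row of x."""
    sq_dist = ((x[:, None, :] - ocsvm.support_vectors_[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dist) @ ocsvm.dual_coef_.ravel() + ocsvm.intercept_[0]


scores = decision(X)
labels = np.where(scores > 0, 1, -1)  # +1 inlier, -1 outlier
print(np.allclose(scores, ocsvm.decision_function(X)))  # True
```

If the hand-computed score matches sklearn but not the board, the port is the suspect; if the model was trained on scaled or otherwise preprocessed features, the same preprocessing must also be applied to `data` on the ESP32 before calling `predict`.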