
mlbox's Introduction




MLBox is a powerful Automated Machine Learning Python library. It provides the following features:

  • Fast reading and distributed data preprocessing/cleaning/formatting
  • Highly robust feature selection and leak detection
  • Accurate hyper-parameter optimization in high-dimensional space
  • State-of-the-art predictive models for classification and regression (Deep Learning, Stacking, LightGBM, ...)
  • Prediction with model interpretation

For more details, please refer to the official documentation
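
A minimal end-to-end sketch, following the quickstart style of the official documentation (the file paths and target column name here are placeholders):

from mlbox.preprocessing import *
from mlbox.optimisation import *
from mlbox.prediction import *

paths = ["train.csv", "test.csv"]   # placeholder dataset paths
target_name = "target"              # placeholder target column

# Read, clean and split the data, then remove drifting variables (leak detection).
df = Reader(sep=",").train_test_split(paths, target_name)
df = Drift_thresholder().fit_transform(df)

# Search the hyper-parameter space and fit/predict with the best pipeline found.
opt = Optimiser(scoring="accuracy", n_folds=5)
space = {'est__strategy': {"search": "choice", "space": ["LightGBM"]}}  # tiny example space
best = opt.optimise(space, df, max_evals=10)
Predictor().fit_predict(best, df)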


How to Contribute

MLBox has been developed and used by many active community members. Your help is very valuable to make it better for everyone.

  • Check out call for contributions to see what can be improved, or open an issue if there is something you want.
  • Contribute to the tests to make it more reliable.
  • Contribute to the documents to make it clearer for everyone.
  • Contribute to the examples to share your experience with other users.
  • Open an issue if you run into problems during development.

For more details, please refer to CONTRIBUTING.

mlbox's People

Contributors

axelderomblay · dependabot[bot] · henri-gerard · maskani-moh


mlbox's Issues

Classification example notebook ModuleNotFoundError

Hi. I am using Python 3.6 under Anaconda, running the classification notebook example at mlbox/examples/classification/example.ipynb. I see this error from the top cell:

Code

from mlbox.preprocessing import *
from mlbox.optimisation import *
from mlbox.prediction import *

Error

/home/jlefman/anaconda2/envs/py36/lib/python3.6/site-packages/mlbox/preprocessing/drift/__init__.py:9: UserWarning: ipCluster is starting. Please wait 30 sec and check in terminal that 'the engines appear to have started successfully'.
  warnings.warn("ipCluster is starting. Please wait 30 sec and check in terminal that 'the engines appear to have started successfully'.")
Using Theano backend.

------------------------------------------------------------------------
ModuleNotFoundError                    Traceback (most recent call last)
<ipython-input-1-4e0c3b5e221c> in <module>()
----> 1 from mlbox.preprocessing import *
      2 from mlbox.optimisation import *
      3 from mlbox.prediction import *

~/anaconda2/envs/py36/lib/python3.6/site-packages/mlbox/__init__.py in <module>()
      7 from .preprocessing import *
      8 from .encoding import *
----> 9 from .optimisation import *
     10 from .prediction import *
     11 from .model import *

~/anaconda2/envs/py36/lib/python3.6/site-packages/mlbox/optimisation/__init__.py in <module>()
----> 1 from .optimiser import *

~/anaconda2/envs/py36/lib/python3.6/site-packages/mlbox/optimisation/optimiser.py in <module>()
     16 from ..encoding.na_encoder import NA_encoder
     17 from ..encoding.categorical_encoder import Categorical_encoder
---> 18 from ..model.supervised.classification.feature_selector import Clf_feature_selector
     19 from ..model.supervised.regression.feature_selector import Reg_feature_selector
     20 from ..model.supervised.classification.stacking_classifier import StackingClassifier

~/anaconda2/envs/py36/lib/python3.6/site-packages/mlbox/model/__init__.py in <module>()
----> 1 import supervised
      2 
      3 __all__ = ['supervised']

ModuleNotFoundError: No module named 'supervised'

Is this an installation issue? User error?
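
For what it's worth, the failing line (import supervised in mlbox/model/__init__.py) is a Python-2-style implicit relative import, which Python 3 rejects. A sketch of a Python-3-compatible version of that __init__.py:

# mlbox/model/__init__.py -- sketch: Python 3 needs the explicit relative form
from . import supervised

__all__ = ['supervised']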

ImportError: No module named 'preprocessing'

Hi team,

This might just be an issue with the __init__ files, but I get this when importing mlbox.preprocessing:

      6 
      7 
----> 8 from preprocessing import *
      9 from encoding import *
     10 from optimisation import *

ImportError: No module named 'preprocessing'
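
As in the previous issue, these are Python-2-style implicit relative imports. A sketch of the Python-3-compatible form of the quoted lines:

# mlbox/__init__.py -- sketch with explicit relative imports
from .preprocessing import *
from .encoding import *
from .optimisation import *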

Thanks for helping!

Install fails. Can't tell if it is MLBox or XGBoost that doesn't work

Hi Axel,

We're trying to install MLBox and get the following error:

Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-ob6fq6jv/xgboost/
Error installing the following packages:
['xgboost==0.6a2']
Please install them manually

Now, given that xgboost is installed and works, I suppose two issues could cause the error:

  1. The version of xgboost is too old. We're running 0.6.
  2. MLBox can't find the available xgboost version.

Indeed, MLBox seems to try to install xgboost 0.6a2 even though a version is already installed, which is surprising.

Maybe it is me.
Thank you for your help.

Regarding Text classification

Hello,
I am getting poor results when classifying texts. Is this expected, or should I change the search space?

P.S. I am using the default search space.
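
For reference, a sketch of widening the search space instead of using the default one (the '<step>__<parameter>' keys and the Optimiser API follow the examples elsewhere on this page; the particular values are illustrative, not a recommendation):

from mlbox.optimisation import Optimiser

# Hypothetical, wider space; keys use MLBox's "<step>__<parameter>" convention.
space = {
    'ce__strategy': {"search": "choice",
                     "space": ["label_encoding", "entity_embedding", "random_projection"]},
    'est__strategy': {"search": "choice", "space": ["LightGBM", "RandomForest"]},
    'est__max_depth': {"search": "choice", "space": [4, 6, 8, 10]}
}

opt = Optimiser(scoring="accuracy", n_folds=5)
best = opt.optimise(space, df, max_evals=40)  # df as returned by MLBox's Reader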

Overriding string numerical conversion

Hi,

For the data set I'm using, the target and categorical variables are already label-encoded to integers. MLBox incorrectly identifies the categorical values and the target as continuous, and thus incorrectly converts the classification task into a regression task.

I tried setting the columns as strings, but MLBox still converts them to integers. Can I override this behavior?

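For reference, this is the kind of cast being described (a pandas sketch with hypothetical file and column names); the complaint is that MLBox re-infers the dtypes afterwards:

import pandas as pd

df = pd.read_csv("train.csv")  # placeholder path

# Force the label-encoded categoricals and the target to strings so they
# look categorical rather than continuous (hypothetical column names).
for col in ["target", "cat_a", "cat_b"]:
    df[col] = df[col].astype(str)

df.to_csv("train_str.csv", index=False)  # the file then handed to MLBox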

install fails

Collecting xgboost==0.6a2
Downloading xgboost-0.6a2.tar.gz (1.2MB)
100% |████████████████████████████████| 1.2MB 358kB/s
Complete output from command python setup.py egg_info:
rm -f -rf build build_plugin lib bin *~ */*~ */*/*~ */*/*/*~ */*.o */*/*.o */*/*/*.o xgboost
clang-omp++ -std=c++0x -Wall -O3 -msse2 -Wno-unknown-pragmas -funroll-loops -Iinclude -Idmlc-core/include -Irabit/include -fPIC -fopenmp -MM -MT build/learner.o src/learner.cc >build/learner.d
/bin/sh: clang-omp++: command not found
clang-omp++ -std=c++0x -Wall -O3 -msse2 -Wno-unknown-pragmas -funroll-loops -Iinclude -Idmlc-core/include -Irabit/include -fPIC -fopenmp -MM -MT build/logging.o src/logging.cc >build/logging.d
make: *** [build/learner.o] Error 127
make: *** Waiting for unfinished jobs....
/bin/sh: clang-omp++: command not found
make: *** [build/logging.o] Error 127
-----------------------------
Building multi-thread xgboost failed
Start to build single-thread xgboost
rm -f -rf build build_plugin lib bin *~ */*~ */*/*~ */*/*/*~ */*.o */*/*.o */*/*/*.o xgboost
clang-omp++ -std=c++0x -Wall -O3 -msse2 -Wno-unknown-pragmas -funroll-loops -Iinclude -Idmlc-core/include -Irabit/include -fPIC -fopenmp -MM -MT build/learner.o src/learner.cc >build/learner.d
/bin/sh: clang-omp++: command not found
make: *** [build/learner.o] Error 127
make: *** Waiting for unfinished jobs....
clang-omp++ -std=c++0x -Wall -O3 -msse2 -Wno-unknown-pragmas -funroll-loops -Iinclude -Idmlc-core/include -Irabit/include -fPIC -fopenmp -MM -MT build/logging.o src/logging.cc >build/logging.d
/bin/sh: clang-omp++: command not found
make: *** [build/logging.o] Error 127
Successfully build single-thread xgboost
If you want multi-threaded version
See additional instructions in doc/build.md
Traceback (most recent call last):
File "", line 1, in
File "/private/var/folders/sr/dbsldvwn0fg1p3wp248kjz0w0000gn/T/pip-build-LkVyg6/xgboost/setup.py", line 29, in
LIB_PATH = libpath'find_lib_path'
File "/private/var/folders/sr/dbsldvwn0fg1p3wp248kjz0w0000gn/T/pip-build-LkVyg6/xgboost/xgboost/libpath.py", line 45, in find_lib_path
'List of candidates:\n' + ('\n'.join(dll_path)))
builtin.XGBoostLibraryNotFound: Cannot find XGBoost Libarary in the candicate path, did you install compilers and run build.sh in root path?
List of candidates:
/private/var/folders/sr/dbsldvwn0fg1p3wp248kjz0w0000gn/T/pip-build-LkVyg6/xgboost/xgboost/libxgboost.so
/private/var/folders/sr/dbsldvwn0fg1p3wp248kjz0w0000gn/T/pip-build-LkVyg6/xgboost/xgboost/../../lib/libxgboost.so
/private/var/folders/sr/dbsldvwn0fg1p3wp248kjz0w0000gn/T/pip-build-LkVyg6/xgboost/xgboost/./lib/libxgboost.so

----------------------------------------

Command "python setup.py egg_info" failed with error code 1 in /private/var/folders/sr/dbsldvwn0fg1p3wp248kjz0w0000gn/T/pip-build-LkVyg6/xgboost/


Error installing the following packages:
['xgboost==0.6a2']
Please install them manually

error: Aborting

Please help me solve this problem, thanks.

Hub connection request timed out

I tried to run the code shown in the attached screenshot, but I got the error TimeoutError: Hub connection request timed out.
I'm using Python 2.7 under Ubuntu 16.04.

(screenshot: hub connection time out)
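
One way to check that the cluster actually came up before importing MLBox (a sketch, assuming ipyparallel is installed and an ipcluster has been started):

import ipyparallel as ipp

client = ipp.Client()  # raises TimeoutError if the hub is unreachable
print(client.ids)      # lists the engine ids once the cluster is up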

Thanks for your help

pytest fails with OSError: dlopen: cannot load any more object with static TLS

OS: MacOS 10.14.6
Docker for Mac: 2.1.01
docker base image: continuumio/miniconda3:4.3.27
mlbox: 0.8.0
Python: 3.6

pytest unit tests fail with the error OSError: dlopen: cannot load any more object with static TLS. The full error log is attached: mlbox_pytest_failure_static_tls_py36.txt

While researching this issue, I found a posting where the same dlopen error was solved by changing the order of imports. Based on this, I changed the order of imports in mlbox/model/classification/classifier.py so that lightgbm is imported before the sklearn imports, as shown in the sketch below.
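
A sketch of the reordering (the exact import list in classifier.py may differ; the point is that lightgbm is imported before any sklearn module, so its shared library is loaded while static-TLS slots are still free):

# mlbox/model/classification/classifier.py -- sketch of the import-order fix
import lightgbm as lgb  # moved first: load lightgbm before sklearn

from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              ExtraTreesClassifier, RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier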

With this change, all the unit test cases pass. Here is the pytest run log after changing the import order:
mlbox_pytest_run_afer_import_change.txt

I'll submit a PR to address this issue.

For reference, this is the Dockerfile used for this test.

FROM continuumio/miniconda3:4.3.27

# 
# install additional packages
#
RUN pip install mlflow mlbox

WORKDIR /opt/project
ENV MLFLOW_TRACKING_URI /opt/project/tracking

Trying to install mlbox on Ubuntu 18.04 but getting errors

Hi, thanks for creating such an awesome package. I have heard a lot of good things about mlbox and have always wanted to try it :)

I'm trying to install mlbox on Ubuntu 18.04.2 LTS with Python 3.7.3, but I am getting some errors during installation.

I installed the prerequisites (per the installation guide) before installing mlbox; the details are below:

  1. gcc - Done
     build-essential:
       Installed: 12.4ubuntu1
       Candidate: 12.4ubuntu1
       Version table:
         *** 12.4ubuntu1 500
             500 http://mirror.pregi.net/ubuntu bionic/main amd64 Packages
             100 /var/lib/dpkg/status

  2. cmake - Done
     cmake:
       Installed: (none)
       Candidate: 3.10.2-1ubuntu2
       Version table:
         3.10.2-1ubuntu2 500
           500 http://mirror.pregi.net/ubuntu bionic/main amd64 Packages

  3. xgboost - Done
     Name: xgboost
     Version: 0.82
     Summary: XGBoost Python Package
     Home-page: https://github.com/dmlc/xgboost
     Author: Hyunsu Cho
     Author-email: [email protected]
     License: Apache-2.0
     Location: /home/lenovo/anaconda/anaconda3/lib/python3.7/site-packages
     Requires: scipy, numpy
     Required-by: auto-sklearn

  4. lightGBM - Done
     Name: lightgbm
     Version: 2.2.3
     Summary: LightGBM Python Package
     Home-page: https://github.com/Microsoft/LightGBM
     Author: None
     Author-email: None
     License: The MIT License (Microsoft)
     Location: /home/lenovo/anaconda/anaconda3/lib/python3.7/site-packages
     Requires: scipy, scikit-learn, numpy
     Required-by:

After installing all the prerequisites above, I proceeded to pip install mlbox but got errors (attached is the mlbox installation log file).

Could you assist? Thanks!

Pinned version of dependencies

MLBox currently uses pinned versions of dependencies,

requirements = [
    "numpy==1.13.0",
    "matplotlib==2.0.2",
    "hyperopt==0.1",
    "Keras==2.0.4",
    "pandas==0.20.3",
    "joblib==0.11",
    "scikit-learn==0.19.0",
    "Theano==0.9.0",
    "xgboost==0.6a2",
    "lightgbm==2.0.2",
    "networkx==1.11"
]

which doesn't play very nicely with other possibly installed packages in the environment. Relaxing at least some of these would be helpful.
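
A sketch of relaxed specifiers (lower bounds only; the exact compatible ranges would need testing):

requirements = [
    "numpy>=1.13",
    "matplotlib>=2.0",
    "hyperopt>=0.1",
    "Keras>=2.0",
    "pandas>=0.20",
    "joblib>=0.11",
    "scikit-learn>=0.19",
    "Theano>=0.9",
    "xgboost>=0.6a2",
    "lightgbm>=2.0",
    "networkx>=1.11"
]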

All scores with NaN

Hi,

When running the optimiser with None (the default pipeline), it works and gives scores for roc_auc.
But when specifying a space of parameters, I get this:

NA ENCODER :{'numerical_strategy': {'search': 'choice', 'space': [0, 'mean']}, 'categorical_strategy': ''}

CA ENCODER :{'strategy': {'search': 'choice', 'space': ['label_encoding', 'random_projection']}}

ESTIMATOR :{'num_leaves': 31, 'reg_alpha': 0, 'subsample_for_bin': 50000, 'colsample_bytree': 0.8, 'silent': True, 'learning_rate': 0.05, 'nthread': -1, 'min_child_weight': 5, 'strategy': 'LightGBM', 'n_estimators': 500, 'subsample': 0.9, 'reg_lambda': 0, 'subsample_freq': 1, 'min_child_samples': 10, 'max_bin': 255, 'objective': 'binary', 'min_split_gain': 0, 'seed': 0, 'max_depth': {'search': 'choice', 'space': [5, 6]}, 'boosting_type': 'gbdt'}

MEAN SCORE : accuracy = -inf
VARIANCE : nan (fold 1 = -inf, fold 2 = -inf, fold 3 = -inf, fold 4 = -inf, fold 5 = -inf)
CPU time: 0.561537027359 seconds

The following code was used:

space = {
    'ne__numerical_strategy': {"search": "choice", "space": [0, 'mean']},
    'ce__strategy': {"search": "choice", "space": ["label_encoding", "random_projection"]},
    'est__max_depth': {"search": "choice", "space": [5, 6]}
}

opt = Optimiser(scoring="accuracy",n_folds=5)
best = opt.evaluate(space, data)

I do not understand why; could you help me figure it out?
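
For comparison, evaluating the default pipeline (which the report says works) looks like this (a sketch using the same Optimiser API):

from mlbox.optimisation import Optimiser

opt = Optimiser(scoring="roc_auc", n_folds=5)
score = opt.evaluate(None, data)  # None = default pipeline; 'data' as returned by Reader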

pipeline cannot be fitted

Hello,

I got this error message when I ran my model; what does it actually mean?

{'fs__threshold': 0.23151363142104533, 'ce__strategy': 'label_encoding', 'est__max_depth': 7, 'ne__categorical_strategy': nan, 'est__n_estimators': 500, 'fs__strategy': 'l1', 'ne__numerical_strategy': 'mean'}

fitting the pipeline ...
Traceback (most recent call last):
  File "mlbox_model.py", line 66, in <module>
    sys.exit(main())
  File "mlbox_model.py", line 62, in main
    mlboxfunc()
  File "mlbox_model.py", line 39, in mlboxfunc
    Predictor().fit_predict(best, df)
  File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/mlbox/prediction/predictor.py", line 389, in fit_predict
    raise ValueError("Pipeline cannot be fitted")
ValueError: Pipeline cannot be fitted

module mlbox.preprocessing not found error when my mlbox.py is in the same folder

When I have an mlbox.py file in the same folder,

from mlbox.preprocessing import *
from mlbox.optimisation import *
from mlbox.prediction import *

throw an error saying module mlbox.preprocessing not found. On changing the filename of that Python file to something else, it works fine. However, it was a little confusing that the above lines were executing the code from my mlbox.py file.
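
This is standard Python behaviour: the script's own directory comes first on sys.path, so a local mlbox.py shadows the installed package. A quick way to check which module is actually being imported (a sketch):

import sys
print(sys.path[0])     # the script's directory is searched first

import mlbox
print(mlbox.__file__)  # points at the local mlbox.py when shadowed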

Trying to install and getting xgboost errors

The system is a Kaggle kernel, which is Ubuntu-based and seems to be the desired environment.

I ran this:

!apt-get install build-essential
!pip install cmake
!pip install "xgboost>=0.6a2"
!pip install "lightgbm>=2.0.2"
!pip install mlbox

Resulting in this:

Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-install-xib6_1h7/xgboost/

Can you please help me out? I see your examples are also Kaggle-based, but they don't show the install steps. Do you somehow install packages from setup within the kernel?

Error starting IPython Cluster

Hi @AxeldeRomblay,

After you announced the fix in #16, I updated on my end and tried it again.

I ran into another issue, with the long stack trace below. Could you please advise?

George

C:\ProgramData\Anaconda3\python.exe "E:/gvyshnya/some_path/mlbox_pipeline.py"
"ipcluster" is not recognized as an internal or external command, operable program or batch file.
C:\ProgramData\Anaconda3\lib\site-packages\mlbox\preprocessing\drift\__init__.py:9: UserWarning: ipCluster is starting. Please wait 30 sec and check in terminal that 'the engines appear to have started successfully'.
warnings.warn("ipCluster is starting. "
Using Theano backend.

[The full generated source of lazylinker_ext/mod.cpp (roughly 1,100 numbered lines of C code dumped by Theano) is elided here.]
Problem occurred during compilation with the command line below:
"C:\ProgramData\Anaconda3\Library\mingw-w64\bin\g++.exe" -shared -g -march=haswell -mmmx -mno-3dnow -msse -msse2 -msse3 -mssse3 -mno-sse4a -mcx16 -msahf -mmovbe -maes -mno-sha -mpclmul -mpopcnt -mabm -mno-lwp -mfma -mno-fma4 -mno-xop -mbmi -mbmi2 -mno-tbm -mavx -mavx2 -msse4.2 -msse4.1 -mlzcnt -mno-rtm -mno-hle -mrdrnd -mf16c -mfsgsbase -mno-rdseed -mno-prfchw -mno-adx -mfxsr -mxsave -mxsaveopt -mno-avx512f -mno-avx512er -mno-avx512cd -mno-avx512pf -mno-prefetchwt1 -mno-clflushopt -mno-xsavec -mno-xsaves -mno-avx512dq -mno-avx512bw -mno-avx512vl -mno-avx512ifma -mno-avx512vbmi -mno-clwb -mno-pcommit -mno-mwaitx --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=6144 -mtune=haswell -DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION -m64 -DMS_WIN64 -I"C:\ProgramData\Anaconda3\lib\site-packages\numpy\core\include" -I"C:\ProgramData\Anaconda3\include" -I"C:\ProgramData\Anaconda3\lib\site-packages\theano\gof" -L"C:\ProgramData\Anaconda3\libs" -L"C:\ProgramData\Anaconda3" -o C:\Users\User\AppData\Local\Theano\compiledir_Windows-7-6.1.7601-SP1-Intel64_Family_6_Model_60_Stepping_3_GenuineIntel-3.6.1-64\lazylinker_ext\lazylinker_ext.pyd C:\Users\User\AppData\Local\Theano\compiledir_Windows-7-6.1.7601-SP1-Intel64_Family_6_Model_60_Stepping_3_GenuineIntel-3.6.1-64\lazylinker_ext\mod.cpp -lpython36

C:\Users\User\AppData\Local\Temp\cc3D1RNn.o: In function `_import_array':
C:/ProgramData/Anaconda3/lib/site-packages/numpy/core/include/numpy/__multiarray_api.h:1460: undefined reference to `__imp_PyExc_ImportError'
C:/ProgramData/Anaconda3/lib/site-packages/numpy/core/include/numpy/__multiarray_api.h:1466: undefined reference to `__imp_PyExc_AttributeError'
C:/ProgramData/Anaconda3/lib/site-packages/numpy/core/include/numpy/__multiarray_api.h:1471: undefined reference to `__imp_PyCapsule_Type'
C:/ProgramData/Anaconda3/lib/site-packages/numpy/core/include/numpy/__multiarray_api.h:1472: undefined reference to `__imp_PyExc_RuntimeError'
C:/ProgramData/Anaconda3/lib/site-packages/numpy/core/include/numpy/__multiarray_api.h:1487: undefined reference to `__imp_PyExc_RuntimeError'
C:/ProgramData/Anaconda3/lib/site-packages/numpy/core/include/numpy/__multiarray_api.h:1495: undefined reference to `__imp_PyExc_RuntimeError'
C:/ProgramData/Anaconda3/lib/site-packages/numpy/core/include/numpy/__multiarray_api.h:1501: undefined reference to `__imp_PyExc_RuntimeError'
C:/ProgramData/Anaconda3/lib/site-packages/numpy/core/include/numpy/__multiarray_api.h:1511: undefined reference to `__imp_PyExc_RuntimeError'
C:\Users\User\AppData\Local\Temp\cc3D1RNn.o:C:/ProgramData/Anaconda3/lib/site-packages/numpy/core/include/numpy/__multiarray_api.h:1523: more undefined references to `__imp_PyExc_RuntimeError' follow
C:\Users\User\AppData\Local\Temp\cc3D1RNn.o: In function `NpyCapsule_Check':
C:/ProgramData/Anaconda3/lib/site-packages/numpy/core/include/numpy/npy_3kcompat.h:456: undefined reference to `__imp_PyCapsule_Type'
C:\Users\User\AppData\Local\Temp\cc3D1RNn.o: In function `unpack_list_of_ssize_t':
C:/Users/User/AppData/Local/Theano/compiledir_Windows-7-6.1.7601-SP1-Intel64_Family_6_Model_60_Stepping_3_GenuineIntel-3.6.1-64/lazylinker_ext/mod.cpp:48: undefined reference to `__imp_PyExc_TypeError'
C:/Users/User/AppData/Local/Theano/compiledir_Windows-7-6.1.7601-SP1-Intel64_Family_6_Model_60_Stepping_3_GenuineIntel-3.6.1-64/lazylinker_ext/mod.cpp:58: undefined reference to `__imp_PyExc_IndexError'
C:\Users\User\AppData\Local\Temp\cc3D1RNn.o: In function `CLazyLinker_init':
C:/Users/User/AppData/Local/Theano/compiledir_Windows-7-6.1.7601-SP1-Intel64_Family_6_Model_60_Stepping_3_GenuineIntel-3.6.1-64/lazylinker_ext/mod.cpp:352: undefined reference to `__imp_PyExc_IndexError'
C:/Users/User/AppData/Local/Theano/compiledir_Windows-7-6.1.7601-SP1-Intel64_Family_6_Model_60_Stepping_3_GenuineIntel-3.6.1-64/lazylinker_ext/mod.cpp:370: undefined reference to `__imp_PyExc_IndexError'
C:/Users/User/AppData/Local/Theano/compiledir_Windows-7-6.1.7601-SP1-Intel64_Family_6_Model_60_Stepping_3_GenuineIntel-3.6.1-64/lazylinker_ext/mod.cpp:385: undefined reference to `__imp_PyExc_IndexError'
C:/Users/User/AppData/Local/Theano/compiledir_Windows-7-6.1.7601-SP1-Intel64_Family_6_Model_60_Stepping_3_GenuineIntel-3.6.1-64/lazylinker_ext/mod.cpp:389: undefined reference to `__imp_PyExc_IndexError'
C:\Users\User\AppData\Local\Temp\cc3D1RNn.o:C:/Users/User/AppData/Local/Theano/compiledir_Windows-7-6.1.7601-SP1-Intel64_Family_6_Model_60_Stepping_3_GenuineIntel-3.6.1-64/lazylinker_ext/mod.cpp:393: more undefined references to `__imp_PyExc_IndexError' follow
C:\Users\User\AppData\Local\Temp\cc3D1RNn.o: In function `CLazyLinker_init':
C:/Users/User/AppData/Local/Theano/compiledir_Windows-7-6.1.7601-SP1-Intel64_Family_6_Model_60_Stepping_3_GenuineIntel-3.6.1-64/lazylinker_ext/mod.cpp:405: undefined reference to `__imp_PyExc_TypeError'
C:/Users/User/AppData/Local/Theano/compiledir_Windows-7-6.1.7601-SP1-Intel64_Family_6_Model_60_Stepping_3_GenuineIntel-3.6.1-64/lazylinker_ext/mod.cpp:420: undefined reference to `__imp__Py_NoneStruct'
C:/Users/User/AppData/Local/Theano/compiledir_Windows-7-6.1.7601-SP1-Intel64_Family_6_Model_60_Stepping_3_GenuineIntel-3.6.1-64/lazylinker_ext/mod.cpp:426: undefined reference to `__imp_PyExc_IndexError'
C:/Users/User/AppData/Local/Theano/compiledir_Windows-7-6.1.7601-SP1-Intel64_Family_6_Model_60_Stepping_3_GenuineIntel-3.6.1-64/lazylinker_ext/mod.cpp:440: undefined reference to `__imp_PyExc_TypeError'
C:/Users/User/AppData/Local/Theano/compiledir_Windows-7-6.1.7601-SP1-Intel64_Family_6_Model_60_Stepping_3_GenuineIntel-3.6.1-64/lazylinker_ext/mod.cpp:444: undefined reference to `__imp__Py_NoneStruct'
C:\Users\User\AppData\Local\Temp\cc3D1RNn.o: In function `c_call':
C:/Users/User/AppData/Local/Theano/compiledir_Windows-7-6.1.7601-SP1-Intel64_Family_6_Model_60_Stepping_3_GenuineIntel-3.6.1-64/lazylinker_ext/mod.cpp:545: undefined reference to `__imp__Py_NoneStruct'
C:/Users/User/AppData/Local/Theano/compiledir_Windows-7-6.1.7601-SP1-Intel64_Family_6_Model_60_Stepping_3_GenuineIntel-3.6.1-64/lazylinker_ext/mod.cpp:545: undefined reference to `__imp__Py_NoneStruct'
C:/Users/User/AppData/Local/Theano/compiledir_Windows-7-6.1.7601-SP1-Intel64_Family_6_Model_60_Stepping_3_GenuineIntel-3.6.1-64/lazylinker_ext/mod.cpp:545: undefined reference to `__imp__Py_NoneStruct'
C:/Users/User/AppData/Local/Theano/compiledir_Windows-7-6.1.7601-SP1-Intel64_Family_6_Model_60_Stepping_3_GenuineIntel-3.6.1-64/lazylinker_ext/mod.cpp:546: undefined reference to `__imp__Py_NoneStruct'
C:\Users\User\AppData\Local\Temp\cc3D1RNn.o:C:/Users/User/AppData/Local/Theano/compiledir_Windows-7-6.1.7601-SP1-Intel64_Family_6_Model_60_Stepping_3_GenuineIntel-3.6.1-64/lazylinker_ext/mod.cpp:546: more undefined references to `__imp__Py_NoneStruct' follow
C:\Users\User\AppData\Local\Temp\cc3D1RNn.o: In function `lazy_rec_eval':
C:/Users/User/AppData/Local/Theano/compiledir_Windows-7-6.1.7601-SP1-Intel64_Family_6_Model_60_Stepping_3_GenuineIntel-3.6.1-64/lazylinker_ext/mod.cpp:618: undefined reference to `__imp_PyExc_IndexError'
C:/Users/User/AppData/Local/Theano/compiledir_Windows-7-6.1.7601-SP1-Intel64_Family_6_Model_60_Stepping_3_GenuineIntel-3.6.1-64/lazylinker_ext/mod.cpp:641: undefined reference to `__imp_PyExc_TypeError'
C:/Users/User/AppData/Local/Theano/compiledir_Windows-7-6.1.7601-SP1-Intel64_Family_6_Model_60_Stepping_3_GenuineIntel-3.6.1-64/lazylinker_ext/mod.cpp:649: undefined reference to `__imp_PyExc_ValueError'
C:/Users/User/AppData/Local/Theano/compiledir_Windows-7-6.1.7601-SP1-Intel64_Family_6_Model_60_Stepping_3_GenuineIntel-3.6.1-64/lazylinker_ext/mod.cpp:657: undefined reference to `__imp_PyExc_IndexError'
C:/Users/User/AppData/Local/Theano/compiledir_Windows-7-6.1.7601-SP1-Intel64_Family_6_Model_60_Stepping_3_GenuineIntel-3.6.1-64/lazylinker_ext/mod.cpp:708: undefined reference to `__imp__Py_NoneStruct'
C:/Users/User/AppData/Local/Theano/compiledir_Windows-7-6.1.7601-SP1-Intel64_Family_6_Model_60_Stepping_3_GenuineIntel-3.6.1-64/lazylinker_ext/mod.cpp:715: undefined reference to `__imp_PyExc_TypeError'
C:/Users/User/AppData/Local/Theano/compiledir_Windows-7-6.1.7601-SP1-Intel64_Family_6_Model_60_Stepping_3_GenuineIntel-3.6.1-64/lazylinker_ext/mod.cpp:721: undefined reference to `__imp_PyExc_TypeError'
C:/Users/User/AppData/Local/Theano/compiledir_Windows-7-6.1.7601-SP1-Intel64_Family_6_Model_60_Stepping_3_GenuineIntel-3.6.1-64/lazylinker_ext/mod.cpp:771: undefined reference to `__imp__Py_NoneStruct'
C:/Users/User/AppData/Local/Theano/compiledir_Windows-7-6.1.7601-SP1-Intel64_Family_6_Model_60_Stepping_3_GenuineIntel-3.6.1-64/lazylinker_ext/mod.cpp:771: undefined reference to `__imp__Py_NoneStruct'
C:/Users/User/AppData/Local/Theano/compiledir_Windows-7-6.1.7601-SP1-Intel64_Family_6_Model_60_Stepping_3_GenuineIntel-3.6.1-64/lazylinker_ext/mod.cpp:772: undefined reference to `__imp__Py_NoneStruct'
C:\Users\User\AppData\Local\Temp\cc3D1RNn.o: In function `CLazyLinker_call':
C:/Users/User/AppData/Local/Theano/compiledir_Windows-7-6.1.7601-SP1-Intel64_Family_6_Model_60_Stepping_3_GenuineIntel-3.6.1-64/lazylinker_ext/mod.cpp:814: undefined reference to `__imp_PyExc_RuntimeError'
C:/Users/User/AppData/Local/Theano/compiledir_Windows-7-6.1.7601-SP1-Intel64_Family_6_Model_60_Stepping_3_GenuineIntel-3.6.1-64/lazylinker_ext/mod.cpp:826: undefined reference to `__imp_PyExc_RuntimeError'
C:/Users/User/AppData/Local/Theano/compiledir_Windows-7-6.1.7601-SP1-Intel64_Family_6_Model_60_Stepping_3_GenuineIntel-3.6.1-64/lazylinker_ext/mod.cpp:839: undefined reference to `__imp__Py_NoneStruct'
C:/Users/User/AppData/Local/Theano/compiledir_Windows-7-6.1.7601-SP1-Intel64_Family_6_Model_60_Stepping_3_GenuineIntel-3.6.1-64/lazylinker_ext/mod.cpp:839: undefined reference to __imp__Py_NoneStruct' C:/Users/User/AppData/Local/Theano/compiledir_Windows-7-6.1.7601-SP1-Intel64_Family_6_Model_60_Stepping_3_GenuineIntel-3.6.1-64/lazylinker_ext/mod.cpp:840: undefined reference to __imp__Py_NoneStruct'
C:/Users/User/AppData/Local/Theano/compiledir_Windows-7-6.1.7601-SP1-Intel64_Family_6_Model_60_Stepping_3_GenuineIntel-3.6.1-64/lazylinker_ext/mod.cpp:849: undefined reference to __imp__Py_NoneStruct' C:/Users/User/AppData/Local/Theano/compiledir_Windows-7-6.1.7601-SP1-Intel64_Family_6_Model_60_Stepping_3_GenuineIntel-3.6.1-64/lazylinker_ext/mod.cpp:849: undefined reference to __imp__Py_NoneStruct'
C:\Users\User\AppData\Local\Temp\cc3D1RNn.o:C:/Users/User/AppData/Local/Theano/compiledir_Windows-7-6.1.7601-SP1-Intel64_Family_6_Model_60_Stepping_3_GenuineIntel-3.6.1-64/lazylinker_ext/mod.cpp:850: more undefined references to __imp__Py_NoneStruct' follow C:\Users\User\AppData\Local\Temp\cc3D1RNn.o: In function CLazyLinker_call':
C:/Users/User/AppData/Local/Theano/compiledir_Windows-7-6.1.7601-SP1-Intel64_Family_6_Model_60_Stepping_3_GenuineIntel-3.6.1-64/lazylinker_ext/mod.cpp:894: undefined reference to __imp_PyExc_AssertionError' C:/Users/User/AppData/Local/Theano/compiledir_Windows-7-6.1.7601-SP1-Intel64_Family_6_Model_60_Stepping_3_GenuineIntel-3.6.1-64/lazylinker_ext/mod.cpp:937: undefined reference to __imp__Py_NoneStruct'
C:/Users/User/AppData/Local/Theano/compiledir_Windows-7-6.1.7601-SP1-Intel64_Family_6_Model_60_Stepping_3_GenuineIntel-3.6.1-64/lazylinker_ext/mod.cpp:937: undefined reference to __imp__Py_NoneStruct' C:/Users/User/AppData/Local/Theano/compiledir_Windows-7-6.1.7601-SP1-Intel64_Family_6_Model_60_Stepping_3_GenuineIntel-3.6.1-64/lazylinker_ext/mod.cpp:938: undefined reference to __imp__Py_NoneStruct'
C:\Users\User\AppData\Local\Temp\cc3D1RNn.o: In function CLazyLinker_set_allow_gc': C:/Users/User/AppData/Local/Theano/compiledir_Windows-7-6.1.7601-SP1-Intel64_Family_6_Model_60_Stepping_3_GenuineIntel-3.6.1-64/lazylinker_ext/mod.cpp:973: undefined reference to __imp_PyBool_Type'
C:/Users/User/AppData/Local/Theano/compiledir_Windows-7-6.1.7601-SP1-Intel64_Family_6_Model_60_Stepping_3_GenuineIntel-3.6.1-64/lazylinker_ext/mod.cpp:976: undefined reference to `__imp__Py_TrueStruct'
collect2.exe: error: ld returned 1 exit status

Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\lib\site-packages\theano\gof\lazylinker_c.py", line 75, in <module>
    raise ImportError()
ImportError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\lib\site-packages\theano\gof\lazylinker_c.py", line 92, in <module>
    raise ImportError()
ImportError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "E:/gvyshnya/bp/freelance_sd/projects/dvc/blog posts/3_dvc_vs_other_ml_pipeline/dev/mlbox_pipeline.py", line 10, in <module>
    import mlbox as mlb
  File "C:\ProgramData\Anaconda3\lib\site-packages\mlbox\__init__.py", line 7, in <module>
    from .preprocessing import *
  File "C:\ProgramData\Anaconda3\lib\site-packages\mlbox\preprocessing\__init__.py", line 1, in <module>
    from .drift_thresholder import *
  File "C:\ProgramData\Anaconda3\lib\site-packages\mlbox\preprocessing\drift_thresholder.py", line 9, in <module>
    from ..encoding.na_encoder import NA_encoder
  File "C:\ProgramData\Anaconda3\lib\site-packages\mlbox\encoding\__init__.py", line 2, in <module>
    from .categorical_encoder import *
  File "C:\ProgramData\Anaconda3\lib\site-packages\mlbox\encoding\categorical_encoder.py", line 15, in <module>
    from keras.layers.core import Dense, Reshape, Dropout
  File "C:\ProgramData\Anaconda3\lib\site-packages\keras\__init__.py", line 3, in <module>
    from . import utils
  File "C:\ProgramData\Anaconda3\lib\site-packages\keras\utils\__init__.py", line 6, in <module>
    from . import conv_utils
  File "C:\ProgramData\Anaconda3\lib\site-packages\keras\utils\conv_utils.py", line 3, in <module>
    from .. import backend as K
  File "C:\ProgramData\Anaconda3\lib\site-packages\keras\backend\__init__.py", line 80, in <module>
    from .theano_backend import *
  File "C:\ProgramData\Anaconda3\lib\site-packages\keras\backend\theano_backend.py", line 3, in <module>
    import theano
  File "C:\ProgramData\Anaconda3\lib\site-packages\theano\__init__.py", line 66, in <module>
    from theano.compile import (
  File "C:\ProgramData\Anaconda3\lib\site-packages\theano\compile\__init__.py", line 10, in <module>
    from theano.compile.function_module import *
  File "C:\ProgramData\Anaconda3\lib\site-packages\theano\compile\function_module.py", line 21, in <module>
    import theano.compile.mode
  File "C:\ProgramData\Anaconda3\lib\site-packages\theano\compile\mode.py", line 10, in <module>
    import theano.gof.vm
  File "C:\ProgramData\Anaconda3\lib\site-packages\theano\gof\vm.py", line 662, in <module>
    from . import lazylinker_c
  File "C:\ProgramData\Anaconda3\lib\site-packages\theano\gof\lazylinker_c.py", line 127, in <module>
    preargs=args)
  File "C:\ProgramData\Anaconda3\lib\site-packages\theano\gof\cmodule.py", line 2316, in compile_str
    (status, compile_stderr.replace('\n', '. ')))

Process finished with exit code 1
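For what it's worth, the link errors come from Theano compiling its lazylinker extension against the Anaconda Python on Windows. One hedged workaround, assuming a working TensorFlow install, is to avoid the Theano compilation path entirely by switching the Keras backend before mlbox (and hence keras) is imported:

    # Hedged workaround sketch: KERAS_BACKEND is a standard Keras environment
    # variable; setting it before the first keras import makes keras initialise
    # with TensorFlow instead of Theano, so no C++ compilation is attempted.
    import os
    os.environ["KERAS_BACKEND"] = "tensorflow"

    import mlbox as mlb  # keras is first imported here and picks up the backend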

timeout while calling Reader()

Hi, I installed MLBox and tried to run the example code with the given dataset, but when I call rd = Reader(sep):

(py36) katekong@cufyp:~$ python
Python 3.6.1 |Anaconda custom (64-bit)| (default, May 11 2017, 13:09:58)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> rd = Reader(sep)

the following error occurs:

File "/home/katekong/anaconda3/envs/py27/lib/python2.7/site-packages/mlbox/preprocessing/reader.py", line 145, in init
self.__client = ipp.Client(profile='home')
File "/home/katekong/anaconda3/envs/py27/lib/python2.7/site-packages/ipyparallel/client/client.py", line 495, in init
self._connect(sshserver, ssh_kwargs, timeout)
File "/home/katekong/anaconda3/envs/py27/lib/python2.7/site-packages/ipyparallel/client/client.py", line 615, in _connect
raise error.TimeoutError("Hub connection request timed out")
ipyparallel.error.TimeoutError: Hub connection request timed out
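The traceback shows mlbox's Reader connecting with ipp.Client(profile='home'), so a hedged first check is whether an ipcluster is actually running under that profile (e.g. started with `ipcluster start --profile=home`):

    # Minimal connectivity check, mirroring what reader.py does in the traceback.
    import ipyparallel as ipp

    rc = ipp.Client(profile="home")  # raises TimeoutError if no controller is up
    print(len(rc.ids), "engines connected")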

pip install looks OK, but import mlbox fails

Hi Axel,

I tried to install MLBox on my Ubuntu 16.04 system.

I created a virtual environment with anaconda, then activated the environment and proceeded with the installation:

conda create -n py35mlbox python=3.5 anaconda
source activate py35mlbox
sudo apt-get install build-essential
pip install cmake
pip install mlbox

All of this ran OK. After that, I tried to run a Python script containing the instruction "from mlbox.optimisation import *". The following error message was returned: "ImportError: No module named 'mlbox'."
Then I created an ipython kernel like this:

python -m ipykernel install --user --name py35mlbox --display-name "py35mlbox"

In the ipython console I tried to import mlbox:

In [2]: import mlbox
/home/drussier/anaconda2/envs/py35mlbox/lib/python3.5/site-packages/mlbox/preprocessing/drift/__init__.py:9: UserWarning: ipCluster is starting. Please wait 30 sec and check in terminal that 'the engines appear to have started successfully'.
  warnings.warn("ipCluster is starting. Please wait 30 sec and check in terminal that 'the engines appear to have started successfully'.")
Using Theano backend.
Traceback (most recent call last):
  File "/usr/bin/ipcluster", line 4, in <module>
    from IPython.parallel.apps.ipclusterapp import launch_new_instance
  File "/usr/lib/python2.7/dist-packages/IPython/parallel/__init__.py", line 21, in <module>
    import zmq
ImportError: No module named zmq
  File "/home/drussier/anaconda2/envs/py35mlbox/lib/python3.5/site-packages/mlbox/optimisation/optimiser.py", line 69
    self.to_path = to_path
                 ^
TabError: inconsistent use of tabs and spaces in indentation

In [3]: Traceback (most recent call last):
  File "/usr/bin/ipcluster", line 4, in <module>
    from IPython.parallel.apps.ipclusterapp import launch_new_instance
  File "/usr/lib/python2.7/dist-packages/IPython/parallel/__init__.py", line 21, in <module>
    import zmq
ImportError: No module named zmq

Then nothing happens. It looks like there is a wrong path setting: "/usr/lib/python2.7/dist-packages/IPython/parallel/__init__.py" is not a file from my virtual environment.

Do you have an idea what's going on ?

Thanks
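The traceback shows /usr/bin/ipcluster (the system Python 2.7 one) being launched instead of the virtualenv's own ipcluster, which would explain both the zmq ImportError and the odd path. A hedged sanity check:

    # If this prints /usr/bin/ipcluster rather than something under
    # .../envs/py35mlbox/bin/, the wrong interpreter's ipcluster is on the PATH.
    import shutil
    print(shutil.which("ipcluster"))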

dict target

I have a question regarding the format of the target dataframe passed to fit_predict(): does it contain the labels of the training dataset only, or of the test dataset as well? And how should it be formatted?
It is not clear whether one has to process the data oneself before passing it to the predictor.

Thanks
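For reference, a minimal sketch of the layout the Reader produces, assuming the public train_test_split API: the "target" entry holds the train labels only; the test set is unlabeled and gets predicted later. File and target names here are illustrative.

    from mlbox.preprocessing import Reader
    from mlbox.prediction import Predictor

    df = Reader(sep=",").train_test_split(["train.csv", "test.csv"], "target")
    # df is a dict: df["train"] and df["test"] are DataFrames without the
    # target column, and df["target"] is a pandas Series aligned with
    # df["train"] only.

    # params dict uses the same keys as Optimiser's search space
    Predictor().fit_predict({"est__strategy": "LightGBM"}, df)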

Error while computing the cross validation mean score.

Hi,

I am interested in MLBox and tried for a Kaggle classification project. When processing to the step of optimizing the best hyperparameters, an error message showed as 'An error occurred while computing the cross validation mean score. Check the parameter values and your scoring function.'

Here's the code I used:

import numpy as np
from mlbox.preprocessing import *
from mlbox.optimisation import *

paths = ["train_path", "test_path"]
target_name = "target_name"

rd = Reader(sep = ",")
df = rd.train_test_split(paths, target_name)

dft = Drift_thresholder()
df = dft.fit_transform(df)

space = {'ne__numerical_strategy':{"search":"choice",
                                   "space":['mean','median']},

         'ne__categorical_strategy':{"search":"choice",
                                     "space":[np.NaN]},

         'ce__strategy':{"search":"choice",
                         "space":['label_encoding','entity_embedding','random_projection']},

         'est__strategy':{"search":"choice",
                          "space":["LightGBM"]},
         'est__n_estimators':{"search":"choice",
                              "space":[150]},
         'est__colsample_bytree':{"search":"uniform",
                                  "space":[0.8,0.95]},
         'est__subsample':{"search":"uniform",
                           "space":[0.8,0.95]},
         'est__max_depth':{"search":"choice",
                           "space":[5,6,7,8,9]},
         'est__learning_rate':{"search":"choice",
                               "space":[0.07]}
        }

opt = Optimiser(scoring = "roc_auc", n_folds = 5)
best_params = opt.optimise(space, df, 15)
Can you help me fix it? Thanks!
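One hedged thing to check before anything else: scoring = "roc_auc" assumes a binary target, so a quick look at the target usually narrows this generic message down.

    # If nunique() is not 2 (or the dtype is unexpected), the roc_auc scorer
    # can fail inside cross-validation and surface as this generic error.
    print(df["target"].nunique(), df["target"].dtype)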

matplotlib import error

matplotlib is missing from requirements.txt leading to an error when importing mlbox.

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "mlbox/__init__.py", line 11, in <module>
    from prediction import *
  File "mlbox/prediction/__init__.py", line 1, in <module>
    from predictor import *
  File "mlbox/prediction/predictor.py", line 14, in <module>
    import matplotlib.pyplot as plt
ImportError: No module named matplotlib.pyplot

FYI: ColumnTransformer

We'll have a ColumnTransformer in sklearn pretty soon that will make it easier to treat different columns differently. That should make it much simpler to have different pipelines for categorical and continuous data, which seems to be one of the big issues MLBox addresses.
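For readers landing here, an illustrative sketch of that sklearn API (available from scikit-learn 0.20; the column names are made up):

    from sklearn.compose import ColumnTransformer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import OneHotEncoder, StandardScaler

    pre = ColumnTransformer([
        ("num", StandardScaler(), ["age", "income"]),               # continuous columns
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),  # categorical columns
    ])

    clf = Pipeline([("pre", pre), ("model", LogisticRegression())])
    # clf.fit(X_train, y_train) with X_train a DataFrame holding those columns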

NameError for DriftEstimator

When computing drifts (Jupyter Notebook, MLBox 0.4.4) using the notebook in the classification example

dft = Drift_thresholder()
df = dft.fit_transform(df)

I get a name error:

computing drifts...
[0:apply]:
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<string> in <module>()
c:\users\wettsteinm\appdata\local\continuum\anaconda3\lib\site-packages\ipyparallel\client\remotefunction.py in <lambda>(f, *sequences)
    248         if _mapping:
    249             if sys.version_info[0] >= 3:
--> 250                 f = lambda f, *sequences: list(map(f, *sequences))
    251             else:
    252                 f = map
c:\users\wettsteinm\appdata\local\continuum\anaconda3\lib\site-packages\mlbox\preprocessing\drift\drift_threshold.py in sync_fit(df_train, df_test, estimator, n_folds, stratify, random_state)
     46
     47     # We will compute the indices of the CV in each thread
---> 48     de = DriftEstimator(estimator, n_folds, stratify, random_state)
     49     de.fit(df_train, df_test)
     50
NameError: name 'DriftEstimator' is not defined

Engines [1:apply], [2:apply] and [3:apply] fail with the same NameError.
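Since the failure happens on the ipyparallel engines rather than in the notebook process, a hedged workaround is to push the missing import to the engines before calling fit_transform (the profile name is an assumption, depending on your setup):

    # DirectView.execute runs code on every engine, so DriftEstimator becomes
    # defined in the engines' namespaces before the parallel map is dispatched.
    import ipyparallel as ipp

    rc = ipp.Client()  # or ipp.Client(profile="home")
    rc[:].execute("from mlbox.preprocessing.drift.drift_threshold import DriftEstimator")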

IndexError

When I use

dft = Drift_thresholder()
df = dft.fit_transform(df)

an IndexError is raised: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices



IndexError                                Traceback (most recent call last)
<ipython-input> in <module>()
      1 dft = Drift_thresholder()
----> 2 df = dft.fit_transform(df)

/usr/local/lib/python3.4/site-packages/mlbox/preprocessing/drift_thresholder.py in fit_transform(self, df)
    108             print("computing drifts ...")
    109
--> 110         ds.fit(pp.transform(df['train']), pp.transform(df['test']))
    111
    112         if (self.verbose):

/usr/local/lib/python3.4/site-packages/mlbox/preprocessing/drift/drift_threshold.py in fit(self, df_train, df_test)
    163                                          self.stratify,
    164                                          self.random_state)
--> 165                               for col in df_train.columns)
    166
    167         for i, col in enumerate(df_train.columns):

/usr/local/lib/python3.4/site-packages/joblib/parallel.py in __call__(self, iterable)
    777                 # was dispatched. In particular this covers the edge
    778                 # case of Parallel used with an exhausted iterator.
--> 779                 while self.dispatch_one_batch(iterator):
    780                     self._iterating = True
    781             else:

/usr/local/lib/python3.4/site-packages/joblib/parallel.py in dispatch_one_batch(self, iterator)
    618
    619         with self._lock:
--> 620             tasks = BatchedCalls(itertools.islice(iterator, batch_size))
    621             if len(tasks) == 0:
    622                 # No more tasks available in the iterator: tell caller to stop.

/usr/local/lib/python3.4/site-packages/joblib/parallel.py in __init__(self, iterator_slice)
    125
    126     def __init__(self, iterator_slice):
--> 127         self.items = list(iterator_slice)
    128         self._size = len(self.items)
    129

/usr/local/lib/python3.4/site-packages/mlbox/preprocessing/drift/drift_threshold.py in <genexpr>(.0)
    163                                          self.stratify,
    164                                          self.random_state)
--> 165                               for col in df_train.columns)
    166
    167         for i, col in enumerate(df_train.columns):

/usr/local/lib64/python3.4/site-packages/pandas/core/indexing.py in __getitem__(self, key)
   1284             return self._getitem_tuple(key)
   1285         else:
-> 1286             return self._getitem_axis(key, axis=0)
   1287
   1288     def _getitem_axis(self, key, axis=0):

/usr/local/lib64/python3.4/site-packages/pandas/core/indexing.py in _getitem_axis(self, key, axis)
   1573             self._is_valid_integer(key, axis)
   1574
-> 1575         return self._get_loc(key, axis=axis)
   1576
   1577     def _convert_to_indexer(self, obj, axis=0, is_setter=False):

/usr/local/lib64/python3.4/site-packages/pandas/core/indexing.py in _get_loc(self, key, axis)
     94
     95     def _get_loc(self, key, axis=0):
---> 96         return self.obj._ixs(key, axis=axis)
     97
     98     def _slice(self, obj, axis=0, kind=None):

/usr/local/lib64/python3.4/site-packages/pandas/core/frame.py in _ixs(self, i, axis)
   1904                 return self[i]
   1905             else:
-> 1906                 label = self.index[i]
   1907                 if isinstance(label, Index):
   1908                     # a location index by definition

/usr/local/lib64/python3.4/site-packages/pandas/indexes/base.py in __getitem__(self, key)
   1265
   1266         key = _values_from_object(key)
-> 1267         result = getitem(key)
   1268         if not lib.isscalar(result):
   1269             return promote(result)

IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

How to save a model as .pkl or with joblib?

I ran mlbox on one system, and it automatically generated a "save" folder in the directory. I copied this "save" folder to another PC, tried to run the command below, and got an error.

My biggest question is:

how do I serve my model (e.g. as a .pkl) on another PC and predict a single record?

[screenshot of the command and error]
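mlbox itself writes its fitted transformers and models under the "save" folder, but for shipping a single artefact to another machine a generic, hedged approach is to pickle a fitted estimator yourself with joblib. Names below are illustrative: `fitted_model` stands for any scikit-learn-style estimator trained with the tuned parameters, `single_record_df` for a one-row DataFrame with the same columns as the training data.

    import joblib

    joblib.dump(fitted_model, "mlbox_model.pkl")   # on the training machine

    model = joblib.load("mlbox_model.pkl")         # on the other PC
    print(model.predict(single_record_df))         # predict a single record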

TypeError: 'generator' object is not subscriptable

When running in a Python 3.6 environment in a Jupyter notebook on Ubuntu 14.04, I get the following:

from mlbox.preprocessing import *
from mlbox.optimisation import *
from mlbox.prediction import *

paths = ["train.csv", "test.csv"]
target_name = "target"

data = Reader(sep=",").train_test_split(paths, target_name)  # reading

space = {

    'ne__numerical_strategy' : {"space" : [0, 'mean']},

    'ce__strategy' : {"space" : ["label_encoding", "random_projection", "entity_embedding"]},

    'fs__strategy' : {"space" : ["variance", "rf_feature_importance"]},
    'fs__threshold': {"search" : "choice", "space" : [0.1, 0.2, 0.3]},

    'est__strategy' : {"space" : ["XGBoost"]},
    'est__max_depth' : {"search" : "choice", "space" : [5,6]},
    'est__subsample' : {"search" : "uniform", "space" : [0.6,0.9]}

    }

opt = Optimiser(scoring = 'roc_auc', n_folds = 4)

best = opt.optimise(space, data, max_evals = 5)

TypeError                                 Traceback (most recent call last)
<ipython-input> in <module>()
     16 opt = Optimiser(scoring = 'roc_auc', n_folds = 4)
     17
---> 18 best = opt.optimise(space, data, max_evals = 5)
     19

~/anaconda2/envs/insurance_v2/lib/python3.6/site-packages/mlbox/optimisation/optimiser.py in optimise(self, space, df, max_evals)
    565                                    space=hyper_space,
    566                                    algo=tpe.suggest,
--> 567                                    max_evals=max_evals)
    568
    569                 # Displaying best_params

~/anaconda2/envs/insurance_v2/lib/python3.6/site-packages/hyperopt/fmin.py in fmin(fn, space, algo, max_evals, trials, rstate, allow_trials_fmin, pass_expr_memo_ctrl, catch_eval_exceptions, verbose, return_argmin)
    312
    313     domain = base.Domain(fn, space,
--> 314                          pass_expr_memo_ctrl=pass_expr_memo_ctrl)
    315
    316     rval = FMinIter(algo, domain, trials, max_evals=max_evals,

~/anaconda2/envs/insurance_v2/lib/python3.6/site-packages/hyperopt/base.py in __init__(self, fn, expr, workdir, pass_expr_memo_ctrl, name, loss_target)
    784         before = pyll.dfs(self.expr)
    785         # -- raises exception if expr contains cycles
--> 786         pyll.toposort(self.expr)
    787         vh = self.vh = VectorizeHelper(self.expr, self.s_new_ids)
    788         # -- raises exception if v_expr contains cycles

~/anaconda2/envs/insurance_v2/lib/python3.6/site-packages/hyperopt/pyll/base.py in toposort(expr)
    713         G.add_edges_from([(n_in, node) for n_in in node.inputs()])
    714     order = nx.topological_sort(G)
--> 715     assert order[-1] == expr
    716     return order
    717

TypeError: 'generator' object is not subscriptable
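This looks like the well-known hyperopt 0.1 / networkx 2.0 incompatibility: nx.topological_sort() returned a list in networkx 1.x but returns a generator in 2.x, so `order[-1]` fails. Pinning networkx (e.g. `pip install "networkx<2.0"`) is the commonly reported fix.

    # Quick check: if this prints a 2.x version, the assert in hyperopt's
    # pyll.toposort will hit exactly this TypeError.
    import networkx as nx
    print(nx.__version__)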

How to add custom algo

How can we add a custom algorithm, with the usual interface below, to the MLBox algorithms?

    algo.fit(X, y)
    algo.predict(X)

I am mostly interested in the feature engineering process.
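For context, a hedged sketch of the scikit-learn estimator contract such a custom algorithm would need; whether MLBox's Optimiser accepts arbitrary estimators as an est__strategy is not confirmed here, so treat this as the generic shape only:

    import numpy as np
    from sklearn.base import BaseEstimator, ClassifierMixin

    class MyAlgo(BaseEstimator, ClassifierMixin):
        """Toy majority-class classifier with the fit/predict interface."""

        def fit(self, X, y):
            self.classes_, counts = np.unique(y, return_counts=True)
            self.majority_ = self.classes_[np.argmax(counts)]
            return self

        def predict(self, X):
            return np.full(len(X), self.majority_)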

Get Entity Embeddings

Hi, neat package, just getting my teeth into it.

One thing that stands out is that I cannot extract the entity embeddings. They seem to work really well so naturally, I want to plot them, explore them, tweak them etc... Is it possible to do this? Many thanks!

Memory error when reading in Data

Reading the data results in a MemoryError.

Is there a way to process a DataFrame directly? The CSV file I'm using is about 9 GB. Alternatively, is there a way to parse only a segment of the data using the Reader method in MLBox?
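Until there is native DataFrame support, a hedged workaround is to downsample the CSV with pandas first and hand the smaller file to the Reader (the 10% sampling rate and file names are arbitrary):

    import pandas as pd

    # keep the header (i == 0) and every 10th data row
    sample = pd.read_csv("train.csv", skiprows=lambda i: i > 0 and i % 10 != 0)
    sample.to_csv("train_sample.csv", index=False)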

Setup script exited with usage: setup.py [global_opts] error: no commands supplied

Hi AxeldeRomblay,

System Information
Ubuntu 16.04

I was actually trying to install your MLBox to give it a try. The steps I took were -
1- clone the repository.
2- run the setup.py file.

The build finishes at 100% but then throws a couple of errors. I am posting the actual error below.

[100%] Linking CXX shared library /tmp/easy_install-1gz4zwpu/lightgbm-2.0.2/lightgbm/lib_lightgbm.so
[100%] Built target _lightgbm
Install lib_lightgbm from: ['lightgbm/lib_lightgbm.so']
error: Setup script exited with usage: setup.py [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
or: setup.py --help [cmd1 cmd2 ...]
or: setup.py --help-commands
or: setup.py cmd --help

error: no commands supplied.
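For what it's worth, that usage message is distutils reporting that setup.py was invoked without a command; running `python setup.py install` (or simply `pip install mlbox`) supplies one.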

Drift computation error

Hi Axel,
On a fresh install, I get this error at the drift computation step with the classification example. Any idea?

Thanks

Bruno
OS Ubuntu 14.04

computing drifts ...

IndexError                                Traceback (most recent call last)
<ipython-input> in <module>()
      1 dft = Drift_thresholder()
----> 2 df = dft.fit_transform(df)

/home/bruno/anaconda3/lib/python3.6/site-packages/mlbox/preprocessing/drift_thresholder.py in fit_transform(self, df)
    108             print("computing drifts ...")
    109
--> 110         ds.fit(pp.transform(df['train']), pp.transform(df['test']))
    111
    112         if (self.verbose):

/home/bruno/anaconda3/lib/python3.6/site-packages/mlbox/preprocessing/drift/drift_threshold.py in fit(self, df_train, df_test)
    163                                          self.stratify,
    164                                          self.random_state)
--> 165                               for col in df_train.columns)
    166
    167         for i, col in enumerate(df_train.columns):

/home/bruno/anaconda3/lib/python3.6/site-packages/joblib/parallel.py in __call__(self, iterable)
    777                 # was dispatched. In particular this covers the edge
    778                 # case of Parallel used with an exhausted iterator.
--> 779                 while self.dispatch_one_batch(iterator):
    780                     self._iterating = True
    781             else:

/home/bruno/anaconda3/lib/python3.6/site-packages/joblib/parallel.py in dispatch_one_batch(self, iterator)
    618
    619         with self._lock:
--> 620             tasks = BatchedCalls(itertools.islice(iterator, batch_size))
    621             if len(tasks) == 0:
    622                 # No more tasks available in the iterator: tell caller to stop.

/home/bruno/anaconda3/lib/python3.6/site-packages/joblib/parallel.py in __init__(self, iterator_slice)
    125
    126     def __init__(self, iterator_slice):
--> 127         self.items = list(iterator_slice)
    128         self._size = len(self.items)
    129

/home/bruno/anaconda3/lib/python3.6/site-packages/mlbox/preprocessing/drift/drift_threshold.py in <genexpr>(.0)
    163                                          self.stratify,
    164                                          self.random_state)
--> 165                               for col in df_train.columns)
    166
    167         for i, col in enumerate(df_train.columns):

/home/bruno/anaconda3/lib/python3.6/site-packages/pandas/core/indexing.py in __getitem__(self, key)
   1284             return self._getitem_tuple(key)
   1285         else:
-> 1286             return self._getitem_axis(key, axis=0)
   1287
   1288     def _getitem_axis(self, key, axis=0):

/home/bruno/anaconda3/lib/python3.6/site-packages/pandas/core/indexing.py in _getitem_axis(self, key, axis)
   1573             self._is_valid_integer(key, axis)
   1574
-> 1575         return self._get_loc(key, axis=axis)
   1576
   1577     def _convert_to_indexer(self, obj, axis=0, is_setter=False):

/home/bruno/anaconda3/lib/python3.6/site-packages/pandas/core/indexing.py in _get_loc(self, key, axis)
     94
     95     def _get_loc(self, key, axis=0):
---> 96         return self.obj._ixs(key, axis=axis)
     97
     98     def _slice(self, obj, axis=0, kind=None):

/home/bruno/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py in _ixs(self, i, axis)
   1904                 return self[i]
   1905             else:
-> 1906                 label = self.index[i]
   1907                 if isinstance(label, Index):
   1908                     # a location index by definition

/home/bruno/anaconda3/lib/python3.6/site-packages/pandas/indexes/base.py in __getitem__(self, key)
   1265
   1266         key = _values_from_object(key)
-> 1267         result = getitem(key)
   1268         if not lib.isscalar(result):
   1269             return promote(result)

IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

TypeError: 'generator' object has no attribute '__getitem__'

reproducing the code directly from Kaggle on the Porto Insurance dataset:

from mlbox.preprocessing import *
from mlbox.optimisation import *
from mlbox.prediction import *


# In[ ]:

paths = ["data/train.csv", "data/test.csv"]
target_name = "target"


# # Now let MLBox do the job !

# ## ... to read and clean all the files

# In[ ]:

rd = Reader(sep = ",")
df = rd.train_test_split(paths, target_name)   #reading and preprocessing (dates, ...)


# In[ ]:

dft = Drift_thresholder()
df = dft.fit_transform(df)   #removing non-stable features (like ID,...)


# ## ... to tune all the hyper-parameters

# In[ ]:

def gini(actual, pred, cmpcol = 0, sortcol = 1):
    assert( len(actual) == len(pred) )
    all = np.asarray(np.c_[ actual, pred, np.arange(len(actual)) ], dtype=np.float)
    all = all[ np.lexsort((all[:,2], -1*all[:,1])) ]
    totalLosses = all[:,0].sum()
    giniSum = all[:,0].cumsum().sum() / totalLosses

    giniSum -= (len(actual) + 1) / 2.
    return giniSum / len(actual)

def gini_normalized(a, p):
    return np.abs(gini(a, p) / gini(a, a))


opt = Optimiser(scoring = make_scorer(gini_normalized, greater_is_better=True, needs_proba=True), n_folds=2)


# In[ ]:

space = {

        'est__strategy':{"search":"choice",
                                  "space":["LightGBM"]},
        'est__n_estimators':{"search":"choice",
                                  "space":[700]},
        'est__colsample_bytree':{"search":"uniform",
                                  "space":[0.77,0.82]},
        'est__subsample':{"search":"uniform",
                                  "space":[0.73,0.8]},
        'est__max_depth':{"search":"choice",
                                  "space":[5,6,7]},
        'est__learning_rate':{"search":"uniform",
                                  "space":[0.008, 0.02]}

        }

params = opt.optimise(space, df, 7)

it returns

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-43-801cc68c2cb6> in <module>()
     16         }
     17
---> 18 params = opt.optimise(space, df, 7)

/usr/local/lib/python2.7/site-packages/mlbox/optimisation/optimiser.pyc in optimise(self, space, df, max_evals)
    564                                    space=hyper_space,
    565                                    algo=tpe.suggest,
--> 566                                    max_evals=max_evals)
    567
    568                 # Displaying best_params

/usr/local/lib/python2.7/site-packages/hyperopt/fmin.pyc in fmin(fn, space, algo, max_evals, trials, rstate, allow_trials_fmin, pass_expr_memo_ctrl, catch_eval_exceptions, verbose, return_argmin)
    312
    313     domain = base.Domain(fn, space,
--> 314                          pass_expr_memo_ctrl=pass_expr_memo_ctrl)
    315
    316     rval = FMinIter(algo, domain, trials, max_evals=max_evals,

/usr/local/lib/python2.7/site-packages/hyperopt/base.pyc in __init__(self, fn, expr, workdir, pass_expr_memo_ctrl, name, loss_target)
    784         before = pyll.dfs(self.expr)
    785         # -- raises exception if expr contains cycles
--> 786         pyll.toposort(self.expr)
    787         vh = self.vh = VectorizeHelper(self.expr, self.s_new_ids)
    788         # -- raises exception if v_expr contains cycles

/usr/local/lib/python2.7/site-packages/hyperopt/pyll/base.pyc in toposort(expr)
    713         G.add_edges_from([(n_in, node) for n_in in node.inputs()])
    714     order = nx.topological_sort(G)
--> 715     assert order[-1] == expr
    716     return order
    717

TypeError: 'generator' object has no attribute '__getitem__'

Maybe related to the hyperopt version?

Nice library though 👍

Comparison of AutoML approaches

Hello, we are analyzing AutoML libraries (such as Auto-sklearn, Auto-ml, TPOT and MLBox) and trying to determine the pros and cons of each of them. Could you clarify the distinctive features of your MLBox compared to the others? In particular, we are interested in the problem of regression (time series with additional features).

Cleaning takes too long on a multi-core CPU

Cleaning takes 276 s for the house-price dataset on an Intel E5-2683 v3.
Since the E5-2683 has 14 cores and 28 threads, I guess the problem may be caused by n_jobs=-1 here:

    if (self.verbose):
        print("cleaning data ...")

    df = pd.concat(Parallel(n_jobs=-1)(delayed(convert_list)(df[col]) for col in df.columns),
                   axis=1)

    df = pd.concat(Parallel(n_jobs=-1)(delayed(convert_float_and_dates)(df[col]) for col in df.columns),
                   axis=1)

I don't know how to fix it; maybe add an n_jobs argument to the Reader class, as sketched below?
Looking forward to your response. Thank you.
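A hedged illustration of the suggestion, with the hard-coded -1 replaced by a parameter (the function and its names are made up, mirroring the snippet above):

    import pandas as pd
    from joblib import Parallel, delayed

    def _convert(serie):
        return serie  # stand-in for convert_list / convert_float_and_dates

    def clean(df, n_jobs=4):  # expose n_jobs instead of hard-coding -1
        return pd.concat(
            Parallel(n_jobs=n_jobs)(delayed(_convert)(df[col]) for col in df.columns),
            axis=1)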

Drift meaning

In the documentation, it says that a drift coefficient of 0.5 is very stable and a coefficient of 1 is not. So when we have two values, say 0.5 and 0.57, which one is the more stable? According to the documentation it is 0.5, but when I ran it on my dataset I got the one with 0.57 as the top coefficient.

Could someone please explain the concept to me and how I should interpret the results?

Thanks
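For what it's worth, the drift score is essentially the ROC AUC of a classifier trained to tell train rows from test rows: 0.5 means the feature is indistinguishable between the two sets (stable), and values approaching 1 mean it separates them easily (drifting). So a feature scored 0.57 drifts slightly more than one scored 0.5, and a ranking's top coefficient is the largest, i.e. least stable, one.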

LightGBM Warning when strategy is Linear

I'm getting an error saying a parameter is invalid for a LightGBM classifier when I choose est__strategy as "Linear". However, the log shows that LogisticRegression was picked for the HP search. I'm confused by the log. What am I missing?
Log

##################################################### testing hyper-parameters... #####################################################

>>> NA ENCODER :{'numerical_strategy': 'mean', 'categorical_strategy': 'most_frequent'}

>>> CA ENCODER :{'strategy': 'entity_embedding'}

>>> FEATURE SELECTOR :{'strategy': 'l1', 'threshold': 0.25183075829772344}

>>> ESTIMATOR :{'strategy': 'Linear', 'C': 1.0, 'class_weight': None, 'dual': False, 'fit_intercept': True, 'intercept_scaling': 1, 'max_iter': 100, 'multi_class': 'ovr', 'n_jobs': -1, 'penalty': 'l2', 'random_state': 0, 'solver': 'liblinear', 'tol': 0.0001, 'verbose': 0, 'warm_start': False}

/opt/conda/lib/python3.6/site-packages/mlbox/model/classification/classifier.py:92: UserWarning: Invalid parameter for classifier LightGBM. Parameter IGNORED. Check the list of available parameters with `classifier.get_params().keys()`
  + ". Parameter IGNORED. Check the list of "
/opt/conda/lib/python3.6/site-packages/mlbox/model/classification/classifier.py:92: UserWarning: Invalid parameter for classifier LightGBM. Parameter IGNORED. Check the list of available parameters with `classifier.get_params().keys()`
  + ". Parameter IGNORED. Check the list of "
/opt/conda/lib/python3.6/site-packages/sklearn/linear_model/logistic.py:1228: UserWarning: 'n_jobs' > 1 does not have any effect when 'solver' is set to 'liblinear'. Got 'n_jobs' = -1.
  " = {}.".format(self.n_jobs))
/opt/conda/lib/python3.6/site-packages/sklearn/linear_model/logistic.py:1228: UserWarning: 'n_jobs' > 1 does not have any effect when 'solver' is set to 'liblinear'. Got 'n_jobs' = -1.
  " = {}.".format(self.n_jobs))
/opt/conda/lib/python3.6/site-packages/sklearn/linear_model/logistic.py:1228: UserWarning: 'n_jobs' > 1 does not have any effect when 'solver' is set to 'liblinear'. Got 'n_jobs' = -1.
  " = {}.".format(self.n_jobs))

Code

space = {    
    'ne__numerical_strategy': {
        "search": "choice", 
        "space": ['mean']
    },
    'ne__categorical_strategy': {
        "search": "choice", 
        "space": ['most_frequent']
    },
    
    'ce__strategy': {
        "search": "choice",
        "space": ["label_encoding", "random_projection", "entity_embedding"]
    }, 
    
    # https://mlbox.readthedocs.io/en/latest/features.html#feature-selection
    'fs__strategy': {
        "search": "choice",
        "space": ['l1']
    },
    'fs__threshold':{
        "search":"uniform",
        "space":[0.01, 0.3]
    },  
    
    # https://mlbox.readthedocs.io/en/latest/features.html#id1
    'est__strategy': {
        "search": "choice",
        "space": ["Linear"]
    },
    
    # Params: https://github.com/AxeldeRomblay/MLBox/blob/master/mlbox/model/classification/classifier.py#L103
    # https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
    'est__penalty': {
        "search": "choice",
        "space": ["l2"]
    },
    'est__max_iter': {
        "search": "choice",
        "space": [100, 200, 300, 400, 500, 800]
    }
}

max_trials_from_space = 60 # max number of times samples from above space are taken
best_lin = opt.optimise(space, data, max_trials_from_space)
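A hedged reading of the log: the warnings are emitted while the parameters are being applied to the default (LightGBM) classifier before the "Linear" strategy takes effect, and the ESTIMATOR line plus the subsequent LogisticRegression warnings suggest LogisticRegression is what actually gets fitted. The LightGBM warnings are therefore presumably harmless noise rather than a sign of the wrong model being trained.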

IOError: You have attempted to connect to an IPython Cluster but no Controller could be found

Hi. I tried the 'Getting started: 30 seconds to MLBox', but I get this error when reading and preprocessing the files:

data = Reader(sep=",").train_test_split(paths, target_name)  #reading
data = Drift_thresholder().fit_transform(data)  #deleting non-stable variables

IOError: You have attempted to connect to an IPython Cluster but no Controller could be found.
Please double-check your configuration and ensure that a cluster is running.

Beforehand, I had successfully started a cluster with:

ipcluster start

I also checked the following in the same notebook, without error:

import os
import ipyparallel as ipp
rc = ipp.Client()
ar = rc[:].apply_async(os.getpid)
pid_map = ar.get_dict()
pid_map
{0: 10634,
 1: 10639,
 2: 10644,
 3: 10649,
 4: 10654,
 5: 10674,
 6: 10662,
 7: 10688}

Could you help me? I really want to try your MLBox 😄

Thanks.
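A hedged observation: other tracebacks in this tracker show the Reader connecting with ipp.Client(profile='home'), while `ipcluster start` launches the default profile, which your plain ipp.Client() test then happily finds. Starting the cluster with `ipcluster start --profile=home` would presumably match what mlbox is looking for.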

some questions about multi-class

Hi,
It's such a wonderful tool! However, I wonder how it distinguishes classification from regression. By the train labels? When I do a multi-class task, the model seems to transform all the data into float64 (my train labels' type is int64), so it performs a regression task, which is not correct. Should I set some parameters in the model?
Thank you!
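Assuming mlbox infers the task from the target dtype (a numerical target being treated as regression), a hedged workaround is to cast the integer class labels to strings before writing the training file:

    import pandas as pd

    train = pd.read_csv("train.csv")               # illustrative file name
    train["target"] = train["target"].astype(str)  # labels become categorical
    train.to_csv("train.csv", index=False)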

Installed mlbox but stuck at importing

Hi Team,

I was excited to try out mlbox, but it's stuck at importing.

So I have managed to install the package, but it just hangs at importing.

This line: from mlbox.preprocessing import *

I wasn't able to get the background process error because it didn't come up in my command line.

I was wondering if you guys are aware of this?

Best,

Sugi

how to set KERAS_BACKEND?

I get warnings/errors with the Theano backend, so I want to switch to TensorFlow.
I have a conda virtualenv; in general I would set the keras.json file, but what about here?

Trying

os.environ["KERAS_BACKEND"] = "tensorflow"
from mlbox.preprocessing import *
from mlbox.optimisation import *
from mlbox.prediction import *

seems not to have worked.

Thanks
Bruno
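One thing worth knowing: KERAS_BACKEND only takes effect if it is set before keras is imported for the first time in the process, so the snippet above must be the very first thing that runs (restart the kernel if mlbox or keras was already imported in an earlier cell). Editing ~/.keras/keras.json and setting its "backend" field to "tensorflow" works regardless of import order.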
