
alex-lekov / automl_alex

State-of-the-art Automated Machine Learning Python library for Tabular Data

License: MIT License

Python 99.69% Dockerfile 0.31%
automl auto-ml optimisation ml cross-validation xgboost stacking stacking-ensemble hyperparameter-optimization hyperparameter-tuning sklearn python machine-learning machine-learning-library machine-learning-models data-science data-science-projects model-selection automatic-machine-learning

automl_alex's Introduction

AutoML Alex



State-of-the-art Automated Machine Learning Python library for Tabular Data

Works with Tasks:

  • Binary Classification

  • Regression

  • Multiclass Classification (in progress...)

Benchmark Results

(benchmark results chart)

Higher is better. Results from AutoML-Benchmark.

Scheme

(pipeline scheme diagram)

Features

  • Automated Data Clean (Auto Clean)
  • Automated Feature Engineering (Auto FE)
  • Smart Hyperparameter Optimization (HPO)
  • Feature Generation
  • Feature Selection
  • Model Selection
  • Cross Validation
  • Optimization Time Limit and Early Stopping
  • Save and Load (Predict new data)
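Several of these steps can be illustrated generically. For instance, cross validation splits the data into k folds so every sample is used for validation exactly once. A minimal index-based sketch of the idea (not automl_alex's internal implementation):

```python
def kfold_indices(n_samples, n_folds=5):
    """Yield (train_idx, val_idx) pairs; each sample appears in exactly one
    validation fold. A generic sketch of the cross-validation step."""
    # Distribute samples as evenly as possible across folds
    fold_sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
                  for i in range(n_folds)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    for i, val_idx in enumerate(folds):
        # Train on everything that is not in the current validation fold
        train_idx = [j for k, f in enumerate(folds) if k != i for j in f]
        yield train_idx, val_idx

splits = list(kfold_indices(10, n_folds=5))  # 5 pairs of (8 train, 2 val)
```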

Installation

pip install automl-alex

Docs

DocPage

🚀 Examples

Classifier:

from automl_alex import AutoMLClassifier

model = AutoMLClassifier()
model.fit(X_train, y_train, timeout=600)
predicts = model.predict(X_test)

Regression:

from automl_alex import AutoMLRegressor

model = AutoMLRegressor()
model.fit(X_train, y_train, timeout=600)
predicts = model.predict(X_test)

DataPrepare:

from automl_alex import DataPrepare

de = DataPrepare()
X_train = de.fit_transform(X_train)
X_test = de.transform(X_test)
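Note the fit-on-train, transform-on-test pattern: preprocessing statistics are estimated on the training split only and then reused on the test split, so no test-set information leaks into preprocessing. A minimal standard-scaling sketch of the same pattern (pure Python, not DataPrepare's actual code):

```python
class SimpleStandardizer:
    """Tiny illustration of the fit/transform split: statistics come
    from the training data only."""
    def fit(self, values):
        self.mean_ = sum(values) / len(values)
        var = sum((v - self.mean_) ** 2 for v in values) / len(values)
        self.std_ = var ** 0.5 or 1.0  # guard against zero variance
        return self

    def transform(self, values):
        return [(v - self.mean_) / self.std_ for v in values]

    def fit_transform(self, values):
        return self.fit(values).transform(values)

scaler = SimpleStandardizer()
train_scaled = scaler.fit_transform([1.0, 2.0, 3.0])  # stats from train
test_scaled = scaler.transform([4.0])                 # reuses train stats
```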

Simple Models Wrapper:

from automl_alex import LightGBMClassifier

model = LightGBMClassifier()
model.fit(X_train, y_train)
predicts = model.predict_proba(X_test)

model.opt(X_train, y_train,
    timeout=600,  # optimization time in seconds
    )
predicts = model.predict_proba(X_test)

More examples are in the ./examples folder.

What's inside

It integrates many popular frameworks:

  • scikit-learn
  • XGBoost
  • LightGBM
  • CatBoost
  • Optuna
  • ...

Works with Features

  • Categorical Features

  • Numerical Features

  • Binary Features

  • Text

  • Datetime

  • Timeseries

  • Image

Note

  • Large datasets can require a lot of memory: the library generates many new features. With a large dataset and a large number of features (more than 100), expect high memory usage.
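The memory growth is combinatorial: pairwise interaction features alone scale as n·(n−1)/2 per arithmetic operation. A rough count (an illustration only; the exact feature set automl_alex generates may differ):

```python
from itertools import combinations

def interaction_feature_count(n_features, n_operations=3):
    """Rough count of pairwise interaction features: one new feature per
    unordered pair of columns per operation (e.g. '*', '/', '-')."""
    pairs = len(list(combinations(range(n_features), 2)))  # n*(n-1)/2
    return pairs * n_operations

interaction_feature_count(10)    # 45 pairs * 3 ops = 135 new features
interaction_feature_count(100)   # 4950 pairs * 3 ops = 14850 new features
```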

Realtime Dashboard

Works with optuna-dashboard

(dashboard screenshot)

Run

$ optuna-dashboard sqlite:///db.sqlite3

Road Map

  • Feature Generation

  • Save/Load and Predict on New Samples

  • Advanced Logging

  • Add opt Pruners

  • Docs Site

  • DL Encoders

  • Add More libs (NNs)

  • Multiclass Classification

  • Build pipelines

Contact

Telegram Group

automl_alex's People

Contributors

alex-lekov, deepsource-autofix[bot], deepsourcebot, itlek


automl_alex's Issues

How can I solve the "Columns must be same length as key" error?

What I tried:

de = DataPrepare(
                num_generator_features=True, # Generator interaction Num Features
                # operations_num_generator=['/','*','-',],
                )
clean_X_train = de.fit_transform(train_X_all)
de = DataPrepare(clean_and_encod_data=True,
                # cat_encoder_names=['HelmertEncoder','OneHotEncoder'], # encoders used to generate cat-encoded features
                clean_nan=True, # fillnan
                clean_outliers=True, # method='IQR', threshold=2,
                drop_invariant=True, # drop invariant features (data.nunique < 2)
                num_generator_features=True, # Generator interaction Num Features
                num_denoising_autoencoder=True, # denoising_autoencoder if num features > 2
                normalization=True, # normalization data (StandardScaler)
                cat_features=None, # DataPrepare can auto detect categorical features
                random_state=42,
                verbose=3)

clean_X_train = de.fit_transform(train_X_all)

but I'm getting this error:

/usr/local/lib/python3.7/dist-packages/pandas/core/frame.py in _setitem_array(self, key, value)
   3065             if isinstance(value, DataFrame):
   3066                 if len(value.columns) != len(key):
-> 3067                     raise ValueError("Columns must be same length as key")
   3068                 for k1, k2 in zip(key, value.columns):
   3069                     self[k1] = value[k2]

ValueError: Columns must be same length as key

Running Env : Google Colab

I can't understand why I'm getting "ValueError: Columns must be same length as key".

Is there anything to fix in my code or data?

I'm attaching my train data.

2021_04_22_orgianl_agg_heEncoded_looEncoded_df.zip

Thank you. Otherwise I'm using the library without problems.
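For context, the error itself is generic pandas behavior: it is raised when a DataFrame is assigned to a list of target columns whose lengths differ, which is likely what happens internally when feature generation produces an unexpected number of columns. A minimal standalone reproduction, independent of automl_alex:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
new_cols = pd.DataFrame({"x": [5, 6]})  # one column...

try:
    df[["x", "y"]] = new_cols           # ...assigned to two target keys
except ValueError as err:
    message = str(err)                  # "Columns must be same length as key"
```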

TypeError("unsupported operand type(s) for +: 'int' and 'str'")

Is it intended that this error happens?

  File "/home/../python3.7/site-packages/automl_alex/data_prepare.py", line 1050, in fit_transform
    data = self._clean_outliers_enc.transform(data)
  File "/home/../python3.7/site-packages/automl_alex/data_prepare.py", line 693, in transform
    feature_name = weight_values + "_Is_Outliers_" + self.method
TypeError: unsupported operand type(s) for +: 'int' and 'str'
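The traceback suggests a column with a non-string (integer) name reaching the outlier-naming step: concatenating it with a string literal fails. A minimal reproduction and the usual fix (casting with `str()`), under the assumption that `weight_values` holds a column name:

```python
weight_values = 3       # hypothetical integer column name, as the traceback implies
method = "IQR"

try:
    name = weight_values + "_Is_Outliers_" + method        # int + str -> TypeError
except TypeError:
    name = str(weight_values) + "_Is_Outliers_" + method   # casting fixes it

# name == "3_Is_Outliers_IQR"
```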

Want to use Databunch in Featurewiz

Hi Alex:
Thanks for your excellent AutoML repo! I was looking at your project and found that your DataBunch module can very quickly create hundreds of additional features on train and test. I am likely to re-use that module in the new "featurewiz" library I have created. I will cite your MIT license and credit your GitHub (of course). Just FYI.

Please take a look at the featurewiz library here:
https://github.com/AutoViML/featurewiz

Once again, thank you for sharing your knowledge with everyone.
Ram

Error during training

I'm trying to use automl_alex to make a baseline for this task, but it triggered an error.

My code is:

import pandas as pd
from automl_alex import AutoMLRegressor

X_train = df[df.columns.difference(['y'], sort=False)]
y_train = df.y
X_test = pd.read_csv('./data/test.csv', index_col='ID')

model = AutoMLRegressor(X_train, y_train, X_test, cat_features=X_train.columns, verbose=1)

%%time
predict_test, predict_train = model.fit_predict(verbose=2)

Step 1: Model 0


100%|██████████| 1/1 [00:14<00:00, 14.74s/it]

Model 1
One iteration takes ~ 4.5 sec

Start Auto calibration parameters
[I 2020-11-05 11:35:57,966] A new study created in memory with name: no-name-ecd42a72-44ac-442e-9b71-d621e2a383ea
Start optimization with the parameters:
CV_Folds = 5
Score_CV_Folds = 2
Feature_Selection = True
Opt_lvl = 2
Cold_start = 44.0
Early_stoping = 100
Metric = mean_squared_error
Direction = minimize
##################################################
Default model OptScore = 109.2303
Optimize: : 55it [20:47, 22.68s/it, | Model: ExtraTrees | OptScore: 105.3311 | Best mean_squared_error: 88.854 +- 16.477105]

Predict from Models_1
100%|██████████| 3/3 [01:27<00:00, 29.22s/it]
0%| | 0/1 [00:00<?, ?it/s]

Calc predict policy on Models_1:
| posible_repeats: 0 | stack_top: 1 | n_repeats: 1
100%|██████████| 1/1 [02:28<00:00, 148.44s/it]

Mean Score mean_squared_error on 5 Folds: 76.0092 std: 15.236319

Models_1 Mean mean_squared_error Score Train: 76.0097

Model 2

One iteration takes ~ 10.6 sec

Start Auto calibration parameters
Start optimization with the parameters:
CV_Folds = 5
Score_CV_Folds = 1
Feature_Selection = True
Opt_lvl = 1
Cold_start = 10
Early_stoping = 50
Metric = mean_squared_error
Direction = minimize
##################################################
Default model OptScore = 75.9577
Optimize: : 16it [02:44, 24.17s/it, | Model: MLP | OptScore: 109.1348 | Best mean_squared_error: 109.1348 ]

stack trace is:
/home/user/Projects/p0/venv/lib/python3.7/site-packages/sklearn/utils/extmath.py:153: RuntimeWarning: overflow encountered in matmul
ret = a @ b
/home/user/Projects/p0/venv/lib/python3.7/site-packages/sklearn/utils/extmath.py:153: RuntimeWarning: invalid value encountered in matmul
ret = a @ b
Trial 16 failed because of the following error: ValueError("Input contains NaN, infinity or a value too large for dtype('float64').")
Traceback (most recent call last):
File "/home/user/Projects/p0/venv/lib/python3.7/site-packages/optuna/study.py", line 799, in _run_trial
result = func(trial)
File "/home/user/Projects/p0/venv/lib/python3.7/site-packages/automl_alex/models/base.py", line 420, in objective
**data_kwargs,
File "/home/user/Projects/p0/venv/lib/python3.7/site-packages/automl_alex/models/base.py", line 762, in cross_val_score
res = self.cross_val(predict=False,**kwargs)
File "/home/user/Projects/p0/venv/lib/python3.7/site-packages/automl_alex/models/base.py", line 700, in cross_val
y_test=val_y.reset_index(drop=True),
File "/home/user/Projects/p0/venv/lib/python3.7/site-packages/automl_alex/models/sklearn_models.py", line 88, in _fit
model.model.fit(X_train, y_train,)
File "/home/user/Projects/p0/venv/lib/python3.7/site-packages/sklearn/neural_network/_multilayer_perceptron.py", line 641, in fit
return self._fit(X, y, incremental=False)
File "/home/user/Projects/p0/venv/lib/python3.7/site-packages/sklearn/neural_network/_multilayer_perceptron.py", line 371, in _fit
intercept_grads, layer_units, incremental)
File "/home/user/Projects/p0/venv/lib/python3.7/site-packages/sklearn/neural_network/_multilayer_perceptron.py", line 554, in _fit_stochastic
self._update_no_improvement_count(early_stopping, X_val, y_val)
File "/home/user/Projects/p0/venv/lib/python3.7/site-packages/sklearn/neural_network/_multilayer_perceptron.py", line 597, in _update_no_improvement_count
self.validation_scores_.append(self.score(X_val, y_val))
File "/home/user/Projects/p0/venv/lib/python3.7/site-packages/sklearn/base.py", line 552, in score
return r2_score(y, y_pred, sample_weight=sample_weight)
File "/home/user/Projects/p0/venv/lib/python3.7/site-packages/sklearn/utils/validation.py", line 72, in inner_f
return f(**kwargs)
File "/home/user/Projects/p0/venv/lib/python3.7/site-packages/sklearn/metrics/_regression.py", line 589, in r2_score
y_true, y_pred, multioutput)
File "/home/user/Projects/p0/venv/lib/python3.7/site-packages/sklearn/metrics/_regression.py", line 86, in _check_reg_targets
y_pred = check_array(y_pred, ensure_2d=False, dtype=dtype)
File "/home/user/Projects/p0/venv/lib/python3.7/site-packages/sklearn/utils/validation.py", line 72, in inner_f
return f(**kwargs)
File "/home/user/Projects/p0/venv/lib/python3.7/site-packages/sklearn/utils/validation.py", line 645, in check_array
allow_nan=force_all_finite == 'allow-nan')
File "/home/user/Projects/p0/venv/lib/python3.7/site-packages/sklearn/utils/validation.py", line 99, in _assert_all_finite
msg_dtype if msg_dtype is not None else X.dtype)
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

python: 3.7.4
ubuntu: Ubuntu 18.04.5 LTS
packages installed:
alembic==1.4.3 argon2-cffi==20.1.0 async-generator==1.10 attrs==20.2.0 automl-alex==0.10.7 backcall==0.2.0 bleach==3.2.1 catboost==0.24.2 category-encoders==2.2.2 certifi==2020.6.20 cffi==1.14.3 chardet==3.0.4 cliff==3.4.0 cmaes==0.7.0 cmd2==1.3.11 colorama==0.4.4 colorlog==4.4.0 cycler==0.10.0 decorator==4.4.2 defusedxml==0.6.0 entrypoints==0.3 graphviz==0.14.2 idna==2.10 importlib-metadata==2.0.0 ipykernel==5.3.4 ipython==7.19.0 ipython-genutils==0.2.0 jedi==0.17.2 Jinja2==2.11.2 joblib==0.17.0 json5==0.9.5 jsonschema==3.2.0 jupyter-client==6.1.7 jupyter-core==4.6.3 jupyterlab==2.2.9 jupyterlab-pygments==0.1.2 jupyterlab-server==1.2.0 kiwisolver==1.3.1 lightgbm==3.0.0 Mako==1.1.3 MarkupSafe==1.1.1 matplotlib==3.3.2 mistune==0.8.4 nbclient==0.5.1 nbconvert==6.0.7 nbformat==5.0.8 nest-asyncio==1.4.2 notebook==6.1.4 numpy==1.19.4 optuna==2.2.0 packaging==20.4 pandas==1.1.4 pandocfilters==1.4.3 parso==0.7.1 patsy==0.5.1 pbr==5.5.1 pexpect==4.8.0 pickleshare==0.7.5 Pillow==8.0.1 plotly==4.12.0 prettytable==0.7.2 prometheus-client==0.8.0 prompt-toolkit==3.0.8 ptyprocess==0.6.0 pycparser==2.20 Pygments==2.7.2 pyparsing==2.4.7 pyperclip==1.8.1 pyrsistent==0.17.3 python-dateutil==2.8.1 python-editor==1.0.4 pytz==2020.4 PyYAML==5.3.1 pyzmq==19.0.2 requests==2.24.0 retrying==1.3.3 scikit-learn==0.23.2 scipy==1.5.3 seaborn==0.11.0 Send2Trash==1.5.0 six==1.15.0 SQLAlchemy==1.3.20 statsmodels==0.12.1 stevedore==3.2.2 terminado==0.9.1 testpath==0.4.4 threadpoolctl==2.1.0 tornado==6.1 tqdm==4.51.0 traitlets==5.0.5 urllib3==1.25.11 wcwidth==0.2.5 webencodings==0.5.1 xgboost==1.2.1 zipp==3.4.0
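The RuntimeWarnings above show the MLP's weights overflowing to infinity inside a matmul; sklearn's input validation then rejects the resulting non-finite values. The failure chain can be reproduced in isolation (a sketch, not sklearn's actual code path):

```python
import numpy as np

# Multiplying two huge float64 values overflows to inf, as in the warning
# "overflow encountered in matmul" above.
with np.errstate(over="ignore"):
    ret = np.array([[1e200]]) @ np.array([[1e200]])  # 1e400 -> inf

def check_finite(X):
    """Simplified stand-in for sklearn's finiteness check."""
    if not np.isfinite(X).all():
        raise ValueError(
            "Input contains NaN, infinity or a value too large for dtype('float64')."
        )

try:
    check_finite(ret)
except ValueError as err:
    error_message = str(err)
```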

Does it work on spectral data?

Hello,

I'm wondering if you have tested it on spectral data. I have NIR spectroscopy data with 125 variables and currently use TPOT, which works very well, but looking at your benchmark it seems your approach works best for the kind of data it was developed for. Do you recommend testing it on my spectral data, or is that outside its purpose?

thanks
