sb-ai-lab / lightautoml

Fast and customizable framework for automatic ML model creation (AutoML)

Home Page: https://developers.sber.ru/portal/products/lightautoml

License: Apache License 2.0

Python 97.74% HTML 2.16% Shell 0.10%
automl data-science machine-learning python automated-machine-learning automatic-machine-learning automl-algorithms binary-classification kaggle lama

lightautoml's Introduction

LightAutoML - automatic model creation framework


LightAutoML (LAMA) is an AutoML framework which provides automatic model creation for the following tasks:

  • binary classification
  • multiclass classification
  • regression

The current version of the package handles datasets whose rows are independent samples, i.e. each row is an object with its own features and target. Multitable datasets and sequences are a work in progress :)

Note: we use AutoWoE library to automatically create interpretable models.

Authors: Alexander Ryzhkov, Anton Vakhrushev, Dmitry Simakov, Rinchin Damdinov, Vasilii Bunakov, Alexander Kirilin, Pavel Shvets.

Documentation of LightAutoML is available here; you can also generate it yourself.

(New features) GPU and Spark pipelines

Full GPU and Spark pipelines for LightAutoML are currently available for developer testing (still in progress), along with the corresponding code and tutorials.

Installation

To install the LAMA framework on your machine from PyPI, execute the following commands:

# Install base functionality:

pip install -U lightautoml

# For partial installation use the corresponding option.
# Extra dependencies: [nlp, cv, report]
# Or you can use 'all' to install everything

pip install -U lightautoml[nlp]
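For example, to install everything at once:

pip install -U lightautoml[all]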

Additionally, run the following commands to enable PDF report generation:

# MacOS
brew install cairo pango gdk-pixbuf libffi

# Debian / Ubuntu
sudo apt-get install build-essential libcairo2 libpango-1.0-0 libpangocairo-1.0-0 libgdk-pixbuf2.0-0 libffi-dev shared-mime-info

# Fedora
sudo yum install redhat-rpm-config libffi-devel cairo pango gdk-pixbuf2

# Windows
# follow this tutorial https://weasyprint.readthedocs.io/en/stable/install.html#windows

Back to top

Quick tour

Let's solve the popular Kaggle Titanic competition below. There are two main ways to solve machine learning problems using LightAutoML: use a ready-made preset for tabular data (shown here), or build your own custom pipeline (see the For developers section below).

  • Use the ready-made preset for tabular data:
import pandas as pd
from sklearn.metrics import f1_score

from lightautoml.automl.presets.tabular_presets import TabularAutoML
from lightautoml.tasks import Task

df_train = pd.read_csv('../input/titanic/train.csv')
df_test = pd.read_csv('../input/titanic/test.csv')

automl = TabularAutoML(
    task = Task(
        name = 'binary',
        metric = lambda y_true, y_pred: f1_score(y_true, (y_pred > 0.5)*1))
)
oof_pred = automl.fit_predict(
    df_train,
    roles = {'target': 'Survived', 'drop': ['PassengerId']}
)
test_pred = automl.predict(df_test)

pd.DataFrame({
    'PassengerId':df_test.PassengerId,
    'Survived': (test_pred.data[:, 0] > 0.5)*1
}).to_csv('submit.csv', index = False)

The LightAutoML framework has a lot of ready-to-use parts and extensive customization options; to learn more, check out the Resources section.

Back to top

Resources

Kaggle kernel examples of LightAutoML usage:

Google Colab tutorials and other examples:

Note 1: in production there is no need to use the profiler (it increases run time and memory consumption), so please do not turn it on; it is off by default.

Note 2: to take a look at this report after the run, comment out the last line of the demo, which contains the report deletion command.

Courses, videos and papers

Back to top

Contributing to LightAutoML

If you are interested in contributing to LightAutoML, please read the Contributing Guide to get started.

Back to top

License

This project is licensed under the Apache License, Version 2.0. See LICENSE file for more details.

Back to top

For developers

Build your own custom pipeline:

import pandas as pd

from lightautoml.automl.base import AutoML
from lightautoml.ml_algo.boost_lgbm import BoostLGBM
from lightautoml.ml_algo.tuning.optuna import OptunaTuner
from lightautoml.pipelines.features.lgb_pipeline import LGBSimpleFeatures
from lightautoml.pipelines.ml.base import MLPipeline
from lightautoml.pipelines.selection.importance_based import (
    ImportanceCutoffSelector,
    ModelBasedImportanceEstimator,
)
from lightautoml.reader.base import PandasToPandasReader
from lightautoml.tasks import Task

N_THREADS = 4  # threads used for model training
N_FOLDS = 5  # number of cross-validation folds
RANDOM_STATE = 42

df_train = pd.read_csv('../input/titanic/train.csv')
df_test = pd.read_csv('../input/titanic/test.csv')

# define that the machine learning problem is binary classification
task = Task('binary')

reader = PandasToPandasReader(task, cv=N_FOLDS, random_state=RANDOM_STATE)

# create a feature selector
model0 = BoostLGBM(
    default_params={'learning_rate': 0.05, 'num_leaves': 64,
    'seed': 42, 'num_threads': N_THREADS}
)
pipe0 = LGBSimpleFeatures()
mbie = ModelBasedImportanceEstimator()
selector = ImportanceCutoffSelector(pipe0, model0, mbie, cutoff=0)

# build first level pipeline for AutoML
pipe = LGBSimpleFeatures()
# stop after 20 iterations or after 30 seconds
params_tuner1 = OptunaTuner(n_trials=20, timeout=30)
model1 = BoostLGBM(
    default_params={'learning_rate': 0.05, 'num_leaves': 128,
    'seed': 1, 'num_threads': N_THREADS}
)
model2 = BoostLGBM(
    default_params={'learning_rate': 0.025, 'num_leaves': 64,
    'seed': 2, 'num_threads': N_THREADS}
)
pipeline_lvl1 = MLPipeline([
    (model1, params_tuner1),
    model2
], pre_selection=selector, features_pipeline=pipe, post_selection=None)

# build second level pipeline for AutoML
pipe1 = LGBSimpleFeatures()
model = BoostLGBM(
    default_params={'learning_rate': 0.05, 'num_leaves': 64,
    'max_bin': 1024, 'seed': 3, 'num_threads': N_THREADS},
    freeze_defaults=True
)
pipeline_lvl2 = MLPipeline(
    [model],
    pre_selection=None,
    features_pipeline=pipe1,
    post_selection=None,
)

# build AutoML pipeline
automl = AutoML(reader, [
    [pipeline_lvl1],
    [pipeline_lvl2],
], skip_conn=False)

# train AutoML and get predictions
oof_pred = automl.fit_predict(df_train, roles = {'target': 'Survived', 'drop': ['PassengerId']})
test_pred = automl.predict(df_test)

pd.DataFrame({
    'PassengerId':df_test.PassengerId,
    'Survived': (test_pred.data[:, 0] > 0.5)*1
}).to_csv('submit.csv', index = False)

Back to top

Support and feature requests

Seek prompt advice at Telegram group.

Open bug reports and feature requests on GitHub issues.

lightautoml's People

Contributors

acc-to-learn, alexmryzhkov, belonovskii, cybsloth, d1mk4real, desimakov, dev-rinchin, elineii, ezzbreezn, mikhailkuz, susie-ku, tikhomirovd, vabun


lightautoml's Issues

Permutation importance calculations of multilevel models

๐Ÿ› Bug

Problem

Functions calc_one_feat_imp and calc_feats_permutation_imps in lightautoml/automl/presets/utils.py are unable to work with multilevel models.

To Reproduce

Fit a TabularAutoML with a multiclass Task and call get_feature_scores('accurate', df).

Traceback

KeyError Traceback (most recent call last)
Cell In[63], line 1
----> 1 accurate_fi = automl.get_feature_scores('accurate', test_data, silent=True)
2 accurate_fi.set_index('Feature')['Importance'].plot.bar(figsize = (30, 10), grid = True)

File ~/LightAutoML/lightautoml/automl/presets/tabular_presets.py:837, in TabularAutoML.get_feature_scores(self, calc_method, data, features_names, silent)
835 data, _ = read_data(data, features_names, self.cpu_limit, read_csv_params)
836 used_feats = self.collect_used_feats()
--> 837 fi = calc_feats_permutation_imps(
838 self,
839 used_feats,
840 data,
841 self.reader.target,
842 self.task.get_dataset_metric(),
843 silent=silent,
844 )
845 return fi

File ~/LightAutoML/lightautoml/automl/presets/utils.py:38, in calc_feats_permutation_imps(model, used_feats, data, target, metric, silent)
35 feat_imp = []
36 for it, f in enumerate(used_feats):
37 feat_imp.append(
---> 38 calc_one_feat_imp(
39 (it + 1, n_used_feats),
40 f,
41 model,
42 data,
43 norm_score,
44 target,
45 metric,
46 silent,
47 )
48 )
49 feat_imp = pd.DataFrame(feat_imp, columns=["Feature", "Importance"])
50 feat_imp = feat_imp.sort_values("Importance", ascending=False).reset_index(drop=True)

File ~/LightAutoML/lightautoml/automl/presets/utils.py:14, in calc_one_feat_imp(iters, feat, model, data, norm_score, target, metric, silent)
13 def calc_one_feat_imp(iters, feat, model, data, norm_score, target, metric, silent):
---> 14 initial_col = data[feat].copy()
15 data[feat] = np.random.permutation(data[feat].values)
17 preds = model.predict(data)

File ~/LAMA_venv3_8/lib/python3.8/site-packages/pandas/core/frame.py:3807, in DataFrame.getitem(self, key)
3805 if self.columns.nlevels > 1:
3806 return self._getitem_multilevel(key)
-> 3807 indexer = self.columns.get_loc(key)
3808 if is_integer(indexer):
3809 indexer = [indexer]

File ~/LAMA_venv3_8/lib/python3.8/site-packages/pandas/core/indexes/base.py:3804, in Index.get_loc(self, key, method, tolerance)
3802 return self._engine.get_loc(casted_key)
3803 except KeyError as err:
-> 3804 raise KeyError(key) from err
3805 except TypeError:
3806 # If we have a listlike key, _check_indexing_error will raise
3807 # InvalidIndexError. Otherwise we fall through and re-raise
3808 # the TypeError.
3809 self._check_indexing_error(key)

KeyError: 'Lvl_0_Pipe_0_Mod_0_LinearL2_prediction_0'

Add time series config

  • added the possibility to change additional parameters of the AutoTS class (e.g., the algorithms used)
  • this can be done both by passing parameters at class initialisation and by changing the .yml config file

Add built-in matching for predictions

🚀 Feature Request

Add a matching function that maps AutoML's predictions back to class labels for the multiclass task, and call it before the predict method returns predictions. automl.reader.class_mapping is not convenient.
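Until such a method exists, a minimal workaround sketch (assumptions: automl is a fitted TabularAutoML, and automl.reader.class_mapping maps original class labels to prediction column indices):

import numpy as np

pred = automl.predict(df_test)  # multiclass predictions, shape (n_rows, n_classes)

# invert the {original_label: column_index} mapping to order labels by column
mapping = automl.reader.class_mapping
labels = [label for label, _ in sorted(mapping.items(), key=lambda kv: kv[1])]

# take the most probable column per row and map it back to the original label
predicted_labels = np.array(labels)[pred.data.argmax(axis=1)]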

Optimize python linters

Motivation

For example, black covers import sorting, so we may not need isort.
Also, darglint is slow.

Proposal

Remove redundant python linters

RandomForestRegressor bug

๐Ÿ› Bug

In recent versions of scikit-learn (>= 1.0.0) the criterion parameter of RandomForestRegressor has changed:

python3.10/site-packages/sklearn/utils/_param_validation.py", line 95, in validate_parameter_constraints
raise InvalidParameterError(
sklearn.utils._param_validation.InvalidParameterError: The 'criterion' parameter of RandomForestRegressor must be a str among {'poisson', 'friedman_mse', 'squared_error', 'absolute_error'}. Got 'mse' instead.
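A hedged compatibility sketch for code that constructs the model itself: pick the criterion name based on the installed scikit-learn version, since 'mse' was renamed to 'squared_error' in 1.0:

import sklearn
from packaging.version import Version
from sklearn.ensemble import RandomForestRegressor

# 'mse' was renamed to 'squared_error' in scikit-learn 1.0
criterion = "squared_error" if Version(sklearn.__version__) >= Version("1.0") else "mse"
model = RandomForestRegressor(criterion=criterion)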

Python 3.10 TypeError: cannot set 'get_record_history_wrapper' attribute of immutable type 'object' from log_calls dependency

๐Ÿ› Bug

Error from log_calls package in Python 3.10+

To Reproduce

Install Python 3.10, install lightautoml, and run:

from lightautoml.automl.presets.tabular_presets import TabularAutoML
from lightautoml.tasks import Task
automl = TabularAutoML(task = Task(name = 'reg', metric = 'mse'))

You will get the following:

Traceback (most recent call last):
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 992, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "C:\Python310\lib\site-packages\lightautoml\__init__.py", line 16, in <module>
    from .addons import *
  File "C:\Python310\lib\site-packages\lightautoml\addons\utilization\__init__.py", line 2, in <module>
    from .utilization import TimeUtilization
  File "C:\Python310\lib\site-packages\lightautoml\addons\utilization\utilization.py", line 8, in <module>
    from ...automl.base import AutoML
  File "C:\Python310\lib\site-packages\lightautoml\automl\base.py", line 8, in <module>
    from .blend import Blender, BestModelSelector
  File "C:\Python310\lib\site-packages\lightautoml\automl\blend.py", line 9, in <module>
    from ..dataset.base import LAMLDataset
  File "C:\Python310\lib\site-packages\lightautoml\dataset\base.py", line 8, in <module>
    from .roles import ColumnRole
  File "C:\Python310\lib\site-packages\lightautoml\dataset\roles.py", line 15, in <module>
    class ColumnRole:
  File "C:\Python310\lib\site-packages\log_calls\log_calls.py", line 1691, in __call__
    self._class__call__(klass)      # modifies klass (methods & inner classes) (if not builtin)
  File "C:\Python310\lib\site-packages\log_calls\log_calls.py", line 1482, in _class__call__
    new_class = self.__class__(
  File "C:\Python310\lib\site-packages\log_calls\log_calls.py", line 1692, in __call__
    self._add_class_attrs(klass)    # v0.3.0v20 traps TypeError for builtins
  File "C:\Python310\lib\site-packages\log_calls\log_calls.py", line 2138, in _add_class_attrs
    setattr(
TypeError: cannot set 'get_record_history_wrapper' attribute of immutable type 'object'
python-BaseException

Expected behavior

No exception

Additional context

Tested on Python 3.10.0 on Windows

Checklist

  • [x] bug description
  • [x] steps to reproduce
  • [x] expected behavior
  • [x] code sample / screenshots

Reporting a vulnerability

Hello!

I hope you are doing well!

We are a security research team. Our tool automatically detected a vulnerability in this repository, and we want to disclose it responsibly. GitHub has a feature called Private vulnerability reporting, which enables security researchers to privately disclose a vulnerability. Unfortunately, it is not enabled for this repository.

Can you enable it, so that we can report it?

Thanks in advance!

PS: you can read about how to enable private vulnerability reporting here: https://docs.github.com/en/code-security/security-advisories/repository-security-advisories/configuring-private-vulnerability-reporting-for-a-repository

Unable to Process TabularCVAutoML.fit_predict()

๐Ÿ› Bug

File D:\anaconda3\envs\RecommenderSystems\lib\site-packages\lightautoml\automl\presets\tabular_presets.py:549, in TabularAutoML.fit_predict(self, train_data, roles, train_features, cv_iter, valid_data, valid_features, log_file, verbose)
    546 if valid_data is not None:
    547     data, _ = read_data(valid_data, valid_features, self.cpu_limit, self.read_csv_params)
--> 549 oof_pred = super().fit_predict(train, roles=roles, cv_iter=cv_iter, valid_data=valid_data, verbose=verbose)
    551 return cast(NumpyDataset, oof_pred)

File D:\anaconda3\envs\RecommenderSystems\lib\site-packages\lightautoml\automl\presets\base.py:205, in AutoMLPreset.fit_predict(self, train_data, roles, train_features, cv_iter, valid_data, valid_features, verbose)
    202 logger.info(f"- memory: {self.memory_limit} GB\n")
    204 self.timer.start()
--> 205 result = super().fit_predict(
    206     train_data,
    207     roles,
    208     train_features,
    209     cv_iter,
    210     valid_data,
    211     valid_features,
    212     verbose=verbose,
    213 )
    215 logger.info("\x1b[1mAutoml preset training completed in {:.2f} seconds\x1b[0m\n".format(self.timer.time_spent))
    216 logger.info(f"Model description:\n{self.create_model_str_desc()}\n")

File D:\anaconda3\envs\RecommenderSystems\lib\site-packages\lightautoml\automl\base.py:212, in AutoML.fit_predict(self, train_data, roles, train_features, cv_iter, valid_data, valid_features, verbose)
    206 logger.info(
    207     f"Layer \x1b[1m{leven_number}\x1b[0m train process start. Time left {self.timer.time_left:.2f} secs"
    208 )
    210 for k, ml_pipe in enumerate(level):
--> 212     pipe_pred = ml_pipe.fit_predict(train_valid)
    213     level_predictions.append(pipe_pred)
    214     pipes.append(ml_pipe)

File D:\anaconda3\envs\RecommenderSystems\lib\site-packages\lightautoml\pipelines\ml\base.py:120, in MLPipeline.fit_predict(self, train_valid)
    117 train_valid = train_valid.apply_selector(self.pre_selection)
    119 # apply features pipeline
--> 120 train_valid = train_valid.apply_feature_pipeline(self.features_pipeline)
    122 # train and apply post selection
    123 train_valid = train_valid.apply_selector(self.post_selection)

File D:\anaconda3\envs\RecommenderSystems\lib\site-packages\lightautoml\validation\base.py:79, in TrainValidIterator.apply_feature_pipeline(self, features_pipeline)
     69 """Apply features pipeline on train data.
     70 
     71 Args:
   (...)
     76 
     77 """
     78 train_valid = copy(self)
---> 79 train_valid.train = features_pipeline.fit_transform(train_valid.train)
     80 return train_valid

File D:\anaconda3\envs\RecommenderSystems\lib\site-packages\lightautoml\pipelines\features\base.py:117, in FeaturesPipeline.fit_transform(self, train)
    115 # TODO: Think about input/output features attributes
    116 self._input_features = train.features
--> 117 self._pipeline = self._merge_seq(train) if self.sequential else self._merge(train)
    119 return self._pipeline.fit_transform(train)

File D:\anaconda3\envs\RecommenderSystems\lib\site-packages\lightautoml\pipelines\features\base.py:162, in FeaturesPipeline._merge(self, data)
    160 pipes = []
    161 for pipe in self.pipes:
--> 162     pipes.append(pipe(data))
    164 return UnionTransformer(pipes) if len(pipes) > 1 else pipes[-1]

File D:\anaconda3\envs\RecommenderSystems\lib\site-packages\lightautoml\pipelines\features\image_pipeline.py:103, in ImageAutoFeatures.create_pipeline(self, train)
     98 imgs = get_columns_by_role(train, "Path")
     99 if len(imgs) > 0:
    100     imgs_processing = SequentialTransformer(
    101         [
    102             ColumnsSelector(keys=imgs),
--> 103             AutoCVWrap(
    104                 self.embed_model,
    105                 self.weights_path,
    106                 self.cache_dir,
    107                 self.subs,
    108                 self.device,
    109                 self.n_jobs,
    110                 self.random_state,
    111                 self.is_advprop,
    112                 self.batch_size,
    113                 self.verbose,
    114             ),
    115             SequentialTransformer([FillInf(), FillnaMedian(), StandardScaler()]),
    116         ]
    117     )
    118     transformers_list.append(imgs_processing)
    120 union_all = UnionTransformer(transformers_list)

File D:\anaconda3\envs\RecommenderSystems\lib\site-packages\lightautoml\transformers\image.py:197, in AutoCVWrap.__init__(self, model, weights_path, cache_dir, subs, device, n_jobs, random_state, is_advprop, batch_size, verbose)
    194 self.dicts = {}
    195 self.cache_dir = cache_dir
--> 197 self.transformer = DeepImageEmbedder(
    198     device,
    199     n_jobs,
    200     random_state,
    201     is_advprop,
    202     model,
    203     weights_path,
    204     batch_size,
    205     verbose,
    206 )
    207 self._emb_name = "DI_" + single_text_hash(self.embed_model)
    208 self.emb_size = self.transformer.model.feature_shape

File D:\anaconda3\envs\RecommenderSystems\lib\site-packages\lightautoml\image\image.py:300, in DeepImageEmbedder.__init__(self, device, n_jobs, random_state, is_advprop, model_name, weights_path, batch_size, verbose)
    297 self.verbose = verbose
    298 seed_everything(random_state)
--> 300 self.model = EffNetImageEmbedder(model_name, weights_path, self.is_advprop, self.device)

File D:\anaconda3\envs\RecommenderSystems\lib\site-packages\lightautoml\image\image.py:196, in EffNetImageEmbedder.__init__(self, model_name, weights_path, is_advprop, device)
    193 super(EffNetImageEmbedder, self).__init__()
    194 self.device = device
    195 self.model = (
--> 196     EfficientNet.from_pretrained(
    197         model_name,
    198         weights_path=weights_path,
    199         advprop=is_advprop,
    200         include_top=False,
    201     )
    202     .eval()
    203     .to(self.device)
    204 )
    205 self.feature_shape = self.get_shape()
    206 self.is_advprop = is_advprop

File D:\anaconda3\envs\RecommenderSystems\lib\site-packages\efficientnet_pytorch\model.py:378, in EfficientNet.from_pretrained(cls, model_name, weights_path, advprop, in_channels, num_classes, **override_params)
    351 """Create an efficientnet model according to name.
    352 
    353 Args:
   (...)
    375     A pretrained efficientnet model.
    376 """
    377 model = cls.from_name(model_name, num_classes=num_classes, **override_params)
--> 378 load_pretrained_weights(model, model_name, weights_path=weights_path,
    379                         load_fc=(num_classes == 1000), advprop=advprop)
    380 model._change_in_channels(in_channels)
    381 return model

File D:\anaconda3\envs\RecommenderSystems\lib\site-packages\efficientnet_pytorch\utils.py:613, in load_pretrained_weights(model, model_name, weights_path, load_fc, advprop, verbose)
    610     ret = model.load_state_dict(state_dict, strict=False)
    611     assert set(ret.missing_keys) == set(
    612         ['_fc.weight', '_fc.bias']), 'Missing keys when loading pretrained weights: {}'.format(ret.missing_keys)
--> 613 assert not ret.unexpected_keys, 'Missing keys when loading pretrained weights: {}'.format(ret.unexpected_keys)
    615 if verbose:
    616     print('Loaded pretrained weights for {}'.format(model_name))

AssertionError: Missing keys when loading pretrained weights: ['_fc.weight', '_fc.bias']

Expected behavior

I noticed that if the EffNetImageEmbedder class in lightautoml.image uses

EfficientNet.from_pretrained(
      model_name,
      weights_path=weights_path,
      advprop=is_advprop,
      include_top=True,
 )

then I am able to run the code. Please provide a way to modify include_top.

Error when running code

❓ Questions and Help

Good afternoon! Yesterday everything worked fine, but this morning I got an error of the following kind:
AttributeError: module 'lightautoml.reader' has no attribute 'fit_read'
Please advise how to resolve it.


automl.fit_predict() raises KeyError: 'index' when used with latest version of pandas==2.1.1

Explanation

I am getting a KeyError when executing the automl.fit_predict() function while following this notebook - https://colab.research.google.com/github/AILab-MLTools/LightAutoML/blob/master/examples/tutorials/Tutorial_1_basics.ipynb

error trace: (attached as a screenshot)

version used:

Python 3.10.12
lightautoml==0.3.8b1
pandas==2.1.1
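A possible workaround until the incompatibility is fixed (an assumption: the failure comes from pandas 2.x API changes) is to pin an older pandas alongside lightautoml:

pip install -U lightautoml "pandas<2.0"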


Handle import errors of extras modules

🚀 Feature Request

LightAutoML should inform the user about the additional extras they need to install to avoid the import problem. Similar to:

print("Can't generate PDF report: check manual for installing pdf extras.")

warnings.warn("'nltk' - package isn't installed")

Or automatically install missing extras, as in the following proposal.

Proposal

try:
    from weasyprint import HTML
except ImportError:
    import pip
    pip.main(['install', '--user', 'lightautoml[pdf]'])
    from weasyprint import HTML
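An alternative sketch that fails fast with an actionable message instead of installing packages behind the user's back (the extra name [report] comes from the installation section above):

try:
    from weasyprint import HTML
except ImportError as err:
    # point the user at the missing extra instead of silently installing it
    raise ImportError(
        "PDF report generation requires the 'report' extra: "
        "pip install -U lightautoml[report]"
    ) from err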

UserWarning for CatBoost and TabNet

For multiclass classification from TabularAutoML: UserWarning: The y_pred values do not sum to one. Starting from 1.5 this will result in an error.
Please suppress the warning or provide an implementation that does not rely on sklearn.
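A hedged workaround sketch on the caller's side: renormalize each row of the predicted probabilities before passing them to sklearn metrics, so the warning does not fire (this assumes the row sums are merely off by floating-point error):

import numpy as np

def normalize_rows(y_pred: np.ndarray) -> np.ndarray:
    # rescale each row so the class probabilities sum to exactly 1
    return y_pred / y_pred.sum(axis=1, keepdims=True)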

Bug on sequential data

The conditions contain wrong logic; it should be corrected for both numeric and categorical features.
cat: https://github.com/sb-ai-lab/LightAutoML/blob/0bcbb8523b499e27eea493eca1d44b247b42e4b0/lightautoml/ml_algo/dl_model.py#L316C22-L316C22
cont:

"cont_embedder_": cont_embedder_by_name.get(params["cont_embedder"], LinearEmbedding)

To reproduce: for AutoInt and FTTransformer, set use_cat to False, advanced_roles to False, and all features to numeric.

Add ranking task

🚀 Feature Request

Add ranking task, metrics

Motivation

LightAutoML has binary, multiclass, and regression tasks; it would be great to use LightAutoML to solve ranking problems as well.

Input contains NaN error when doing linear_l2 model

๐Ÿ› Bug

On some multiclass tasks the linear model throws the following error:

ValueError: Input contains NaN, infinity or a value too large for dtype('float32').

called from site-packages/sklearn/utils/validation.py.

To Reproduce

Steps to reproduce the behavior:

  1. Unarchive the issue.zip folder;
  2. Place it in the LightAutoML directory and cd to issue folder;
  3. run python ./lama_cpu.py -p ./data/ -k sf-crime -f 2 -n 4 -s 42 -c ./lama_cpu.yml -t 7200;
  4. during fold 2 calculation an error should appear;
  5. if you run python ./lama_cpu.py -p ./data/ -k otto -f 2 -n 4 -s 42 -c ./lama_cpu.yml -t 7200 you should see a normal program termination on a different dataset.

Expected behavior

I expect the sf-crime dataset to finish successfully just like otto.

Additional context

You can make the error disappear if you change the learning rate from 0.1 to 0.05. But is that a good solution?

Checklist

  • bug description
  • steps to reproduce
  • expected behavior
  • code sample / screenshots

issue.zip

Serialization-related exceptions when using cpu_limit > 1

๐Ÿ› Bug

Hi! I'm trying to experiment with LAMA using the simplest notebook, Tutorial_1_basics.ipynb, in Google Colab. I'm using an ordinary Colab environment without any pre-configuration.

The execution fails when I set cpu_limit to something greater than 1:

TabularAutoML(
    cpu_limit = 2,
)

with the following exception:

---------------------------------------------------------------------------
_RemoteTraceback                          Traceback (most recent call last)
_RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/joblib/externals/loky/process_executor.py", line 407, in _process_worker
    call_item = call_queue.get(block=True, timeout=timeout)
  File "/usr/lib/python3.7/multiprocessing/queues.py", line 113, in get
    return _ForkingPickler.loads(res)
AttributeError: Can't get attribute 'new_block' on <module 'pandas.core.internals.blocks' from '/usr/local/lib/python3.7/dist-packages/pandas/core/internals/blocks.py'>
"""

The above exception was the direct cause of the following exception:

BrokenProcessPool                         Traceback (most recent call last)
<timed exec> in <module>

/usr/local/lib/python3.7/dist-packages/lightautoml/automl/presets/tabular_presets.py in fit_predict(self, train_data, roles, train_features, cv_iter, valid_data, valid_features, log_file, verbose)
    547             data, _ = read_data(valid_data, valid_features, self.cpu_limit, self.read_csv_params)
    548 
--> 549         oof_pred = super().fit_predict(train, roles=roles, cv_iter=cv_iter, valid_data=valid_data, verbose=verbose)
    550 
    551         return cast(NumpyDataset, oof_pred)

10 frames
/usr/lib/python3.7/concurrent/futures/_base.py in __get_result(self)
    382     def __get_result(self):
    383         if self._exception:
--> 384             raise self._exception
    385         else:
    386             return self._result

BrokenProcessPool: A task has failed to un-serialize. Please ensure that the arguments of the function are all picklable.

If I set cpu_limit=1, this exception is not thrown and the execution completes successfully.

To Reproduce

Steps to reproduce the behavior:

  1. Open Tutorial_1_basics.ipynb
  2. Run all of the cells needed to configure the task
  3. Run the cell with code oof_pred = automl.fit_predict(tr_data, roles = roles, verbose = 1)

Expected behavior

We expect that the execution completes without an error.

Error during lib import in Kaggle and Colab

TypeError: cannot set 'get_record_history_wrapper' attribute of immutable type 'object'

Full Error

TypeError Traceback (most recent call last)
Cell In[4], line 4
1 import pandas as pd
2 from sklearn.metrics import f1_score
----> 4 from lightautoml.automl.presets.tabular_presets import TabularAutoML
5 from lightautoml.tasks import Task

File /opt/conda/lib/python3.10/site-packages/lightautoml/__init__.py:16
     12 import importlib_metadata
     14 __version__ = importlib_metadata.version(__name__)
---> 16 from .addons import *
     17 from .addons.utilization import *
     18 from .automl import *

File /opt/conda/lib/python3.10/site-packages/lightautoml/addons/utilization/__init__.py:2
      1 """Tools to configure resources utilization."""
----> 2 from .utilization import TimeUtilization
      4 __all__ = ['TimeUtilization']

File /opt/conda/lib/python3.10/site-packages/lightautoml/addons/utilization/utilization.py:8
      4 from typing import Optional, Any, Sequence, Type, Union, Iterable
      6 from log_calls import record_history
----> 8 from ...automl.base import AutoML
      9 from ...automl.blend import Blender, BestModelSelector
     10 from ...automl.presets.base import AutoMLPreset

File /opt/conda/lib/python3.10/site-packages/lightautoml/automl/base.py:8
      4 from typing import Sequence, Any, Optional, Iterable, Dict, List
      6 from log_calls import record_history
----> 8 from .blend import Blender, BestModelSelector
      9 from ..dataset.base import LAMLDataset
     10 from ..dataset.utils import concatenate

File /opt/conda/lib/python3.10/site-packages/lightautoml/automl/blend.py:9
      6 from log_calls import record_history
      7 from scipy.optimize import minimize_scalar
----> 9 from ..dataset.base import LAMLDataset
     10 from ..dataset.np_pd_dataset import NumpyDataset
     11 from ..dataset.roles import NumericRole

File /opt/conda/lib/python3.10/site-packages/lightautoml/dataset/base.py:8
      4 from typing import Any, Optional, Dict, List, Tuple, Sequence, Union, TypeVar
      6 from log_calls import record_history
----> 8 from .roles import ColumnRole
      9 from ..tasks.base import Task
     11 valid_array_attributes = ('target', 'group', 'folds', 'weights')

File /opt/conda/lib/python3.10/site-packages/lightautoml/dataset/roles.py:15
      9 Dtype = Union[Callable, type, str]
     12 # valid_features_str_names = []
     14 @record_history(enabled=False)
---> 15 class ColumnRole:
     16     """Abstract class for column role.
     17
     18     Role type defines column dtype,
    (...)
     22
     23     """
     24     dtype = object

File /opt/conda/lib/python3.10/site-packages/log_calls/log_calls.py:1691, in _deco_base.__call__(self, f_or_klass)
   1684 self.cls = klass
   1686 if klass:
   1687     #++++++++++++++++++++++++++++++++
   1688     # 0.3.0 -- case "f_or_klass is a class" -- namely, klass
   1689     #++++++++++++++++++++++++++++++++
-> 1691 self._class__call__(klass)      # modifies klass (methods & inner classes) (if not builtin)
   1692 self._add_class_attrs(klass)    # v0.3.0v20 traps TypeError for builtins
   1693 return klass

File /opt/conda/lib/python3.10/site-packages/log_calls/log_calls.py:1482, in _deco_base._class__call__(self, klass)
   1479 new_only = deco_obj._only or self._only
   1480 new_omit += deco_obj._omit
-> 1482 new_class = self.__class__(
   1483     settings=new_settings,
   1484     only=new_only,
   1485     omit=new_omit
   1486 )(item)
   1487 # and replace in class dict
   1488 setattr(klass, name, new_class)

File /opt/conda/lib/python3.10/site-packages/log_calls/log_calls.py:1692, in _deco_base.__call__(self, f_or_klass)
   1686 if klass:
   1691 self._class__call__(klass)      # modifies klass (methods & inner classes) (if not builtin)
-> 1692 self._add_class_attrs(klass)    # v0.3.0v20 traps TypeError for builtins
   1693 return klass
   1695 elif not f:
    (...)
   1701 # Callable builtins e.g. len are not functions in the isfunction sense,
   1702 # can't deco anyway. Just give up (quietly):

File /opt/conda/lib/python3.10/site-packages/log_calls/log_calls.py:2138, in _deco_base._add_class_attrs(self, klass)
   2136 this_deco_class = self.__class__
   2137 this_deco_class_name = this_deco_class.__name__
-> 2138 setattr(
   2139     klass,
   2140     'get_' + this_deco_class_name + '_wrapper',
   2141     classmethod(partial(_get_deco_wrapper, this_deco_class))
   2142 )
   2143 # Make it even easier for methods to find their own log_calls wrappers,
   2144 # via get_own_log_calls_wrapper(fname)
   2145 # or get_own_record_history_wrapper(fname)
   2146 # This can be called on a deco'd class or on an instance thereof.
   2147 this_deco_class = self.__class__

TypeError: cannot set 'get_record_history_wrapper' attribute of immutable type 'object'

Remove tuner wrapper

🚀 Feature Request

Remove the wrapper for the tuner module.

Motivation

We need only Optuna as a tuner.

Unable to import TabularAutoML

๐Ÿ› Bug

Hello,
I am trying to use LightAutoML locally, in Google Colab, and on Kaggle with the latest version.

when I am importing

from lightautoml.automl.presets.tabular_presets import TabularAutoML
from lightautoml.tasks import Task

I get this error


TypeError                                 Traceback (most recent call last)
<ipython-input-8-e1576580262b> in <cell line: 17>()
     15 
     16 # LightAutoML presets, task and report generation
---> 17 from lightautoml.automl.presets.tabular_presets import TabularAutoML
     18 from lightautoml.tasks import Task

10 frames
/usr/local/lib/python3.10/dist-packages/log_calls/log_calls.py in _add_class_attrs(self, klass)
   2136         this_deco_class = self.__class__
   2137         this_deco_class_name = this_deco_class.__name__
-> 2138         setattr(
   2139             klass,
   2140             'get_' + this_deco_class_name + '_wrapper',

TypeError: cannot set 'get_record_history_wrapper' attribute of immutable type 'object'

To Reproduce

Steps to reproduce the behavior:

  1. https://colab.research.google.com/github/AILab-MLTools/LightAutoML/blob/master/examples/tutorials/Tutorial_9_neural_networks.ipynb
  2. Install the Library
  3. Try to Import -> from lightautoml.automl.presets.tabular_presets import TabularAutoML

Attached is the screenshot of the error.


TypeError: cannot set 'get_record_history_wrapper' attribute of immutable type 'object'

Hi,
I am trying to set up LAMA with Python 3.10.
The system is Windows 10; I created a separate .venv for this.

when I am trying to execute:
from lightautoml.automl.presets.tabular_presets import TabularAutoML, TabularUtilizedAutoML
it fails with the following traceback:

---> 13 from lightautoml.automl.presets.tabular_presets import TabularAutoML, TabularUtilizedAutoML
     14 from lightautoml.tasks import Task
     15 from lightautoml.report.report_deco import ReportDeco

File c:\Users\Sergey\repos\air-quality\.venv\lib\site-packages\lightautoml\__init__.py:16, in <module>
     12     import importlib_metadata
     14 __version__ = importlib_metadata.version(__name__)
---> 16 from .addons import *
     17 from .addons.utilization import *
     18 from .automl import *

File c:\Users\Sergey\repos\air-quality\.venv\lib\site-packages\lightautoml\addons\utilization\__init__.py:2, in <module>
      1 """Tools to configure resources utilization."""
----> 2 from .utilization import TimeUtilization
      4 __all__ = ['TimeUtilization']

File c:\Users\Sergey\repos\air-quality\.venv\lib\site-packages\lightautoml\addons\utilization\utilization.py:8, in <module>
      4 from typing import Optional, Any, Sequence, Type, Union, Iterable
      6 from log_calls import record_history
----> 8 from ...automl.base import AutoML
...
   2145 # or `get_own_record_history_wrapper(fname)`
   2146 # This can be called on a deco'd class or on an instance thereof.
   2147 this_deco_class = self.__class__

TypeError: cannot set 'get_record_history_wrapper' attribute of immutable type 'object'

Any hints / suggestions on that error?
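The traceback points at the log_calls dependency, which tries to patch attributes onto built-in types and fails on Python 3.10+. Two hedged workarounds, assuming newer LightAutoML releases no longer depend on log_calls:

# upgrade to a release that no longer pulls in log_calls
pip install -U lightautoml

# or recreate the virtual environment with Python 3.9 or earlier,
# where log_calls still works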

Support for Langevin Parameter in CatBoost Tuning

🚀 Feature Request

Motivation

I would like to introduce support for the langevin=True parameter in LightAutoML. This parameter enables the Stochastic Gradient Langevin Boosting (SGLB) method, a powerful and efficient machine learning framework capable of handling a wide range of loss functions and providing provable generalization guarantees. The method is based on a special form of the Langevin diffusion equation specifically designed for gradient boosting. This allows global convergence to be guaranteed theoretically even for multimodal loss functions, while standard gradient boosting algorithms can only guarantee a local optimum (see the paper).

Proposal

I propose that LightAutoML support the langevin=True parameter during hyperparameter tuning of CatBoost models. This would allow users to leverage the benefits of SGLB when tuning CatBoost models using LightAutoML.

Alternatives

As an alternative, users could manually set the langevin parameter when creating a CatBoost instance. However, this could be less convenient and efficient than having LightAutoML automatically tune the parameters.

Additional context

I have successfully used the langevin=True parameter during a Kaggle competition. This experience has shown me the potential benefits of this parameter, and I believe it would be beneficial to have this feature in LightAutoML.

More details: #5 Solution

Error at LAMA import

I have tried one of your Colab tutorials:

Tutorial_1_basics.ipynb - get started with LightAutoML on tabular data.

It fails with "TypeError: cannot set 'get_record_history_wrapper' attribute of immutable type 'object'" when importing
"from lightautoml.automl.presets.tabular_presets import TabularAutoML, TabularUtilizedAutoML".

Use of deprecated pandas methods

๐Ÿ› Bug

To Reproduce

Steps to reproduce the behavior:
Run some examples with categorical data.

Expected behavior

No warnings

Additional context

OrdinalEncoder uses pandas.Series.append(), which is deprecated. For now this results in a bunch of warnings; in the future it will fail.
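A hedged replacement sketch: pandas.Series.append() was deprecated in pandas 1.4, and the same concatenation can be expressed with pandas.concat():

import pandas as pd

s1 = pd.Series([1, 2])
s2 = pd.Series([3, 4])

# deprecated: s1.append(s2)
combined = pd.concat([s1, s2], ignore_index=True)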

Checklist

  • bug description
  • steps to reproduce
  • expected behavior
  • code sample / screenshots

Fix features forcing inside feature selection

๐Ÿ› Bug

Feature forcing is not working.

return dataset[:, self.selected_features]

Additional context

AutoML reuses the feature selector from the 1st level in the 2nd level if level 2 does not use its own feature selector. So if feature forcing does not work, predictions from the 1st level are not passed to the next levels.

The place where AutoML reuses the feature selector is linked in the original issue.
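A hypothetical sketch of the intended behavior (the function and parameter names are illustrative, not the library's API): forced features should survive selection even when their importance falls below the cutoff:

def apply_selection(dataset, selected_features, forced_features):
    # keep forced features even if their importance is below the cutoff
    keep = sorted(set(selected_features) | set(forced_features))
    return dataset[:, keep]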


TypeError: cannot set 'get_record_history_wrapper' attribute of immutable type 'object'

In Google Colab, when trying to execute the following code:
!pip install lightautoml

from lightautoml.automl.presets.tabular_presets import TabularAutoML
from lightautoml.tasks import Task

The following error is displayed:


TypeError Traceback (most recent call last)
in <cell line: 1>()
----> 1 from lightautoml.automl.presets.tabular_presets import TabularAutoML
2 from lightautoml.tasks import Task

10 frames
/usr/local/lib/python3.10/dist-packages/lightautoml/__init__.py in <module>
     14 __version__ = importlib_metadata.version(__name__)
     15
---> 16 from .addons import *
     17 from .addons.utilization import *
     18 from .automl import *

/usr/local/lib/python3.10/dist-packages/lightautoml/addons/utilization/__init__.py in <module>
      1 """Tools to configure resources utilization."""
----> 2 from .utilization import TimeUtilization
      3
      4 __all__ = ['TimeUtilization']

/usr/local/lib/python3.10/dist-packages/lightautoml/addons/utilization/utilization.py in <module>
      6 from log_calls import record_history
      7
----> 8 from ...automl.base import AutoML
      9 from ...automl.blend import Blender, BestModelSelector
     10 from ...automl.presets.base import AutoMLPreset

/usr/local/lib/python3.10/dist-packages/lightautoml/automl/base.py in <module>
      6 from log_calls import record_history
      7
----> 8 from .blend import Blender, BestModelSelector
      9 from ..dataset.base import LAMLDataset
     10 from ..dataset.utils import concatenate

/usr/local/lib/python3.10/dist-packages/lightautoml/automl/blend.py in <module>
      7 from scipy.optimize import minimize_scalar
      8
----> 9 from ..dataset.base import LAMLDataset
     10 from ..dataset.np_pd_dataset import NumpyDataset
     11 from ..dataset.roles import NumericRole

/usr/local/lib/python3.10/dist-packages/lightautoml/dataset/base.py in <module>
      6 from log_calls import record_history
      7
----> 8 from .roles import ColumnRole
      9 from ..tasks.base import Task
     10

/usr/local/lib/python3.10/dist-packages/lightautoml/dataset/roles.py in <module>
     13
     14 @record_history(enabled=False)
---> 15 class ColumnRole:
     16     """Abstract class for column role.
     17

/usr/local/lib/python3.10/dist-packages/log_calls/log_calls.py in __call__(self, f_or_klass)
   1689     #++++++++++++++++++++++++++++++++
   1690
-> 1691     self._class__call__(klass)   # modifies klass (methods & inner classes) (if not builtin)
   1692     self._add_class_attrs(klass) # v0.3.0v20 traps TypeError for builtins
   1693     return klass

/usr/local/lib/python3.10/dist-packages/log_calls/log_calls.py in _class__call__(self, klass)
   1480     new_omit += deco_obj._omit
   1481
-> 1482     new_class = self.__class__(
   1483         settings=new_settings,
   1484         only=new_only,

/usr/local/lib/python3.10/dist-packages/log_calls/log_calls.py in __call__(self, f_or_klass)
   1690
   1691     self._class__call__(klass)   # modifies klass (methods & inner classes) (if not builtin)
-> 1692     self._add_class_attrs(klass) # v0.3.0v20 traps TypeError for builtins
   1693     return klass
   1694

/usr/local/lib/python3.10/dist-packages/log_calls/log_calls.py in _add_class_attrs(self, klass)
   2136     this_deco_class = self.__class__
   2137     this_deco_class_name = this_deco_class.__name__
-> 2138     setattr(
   2139         klass,
   2140         'get_' + this_deco_class_name + '_wrapper',

TypeError: cannot set 'get_record_history_wrapper' attribute of immutable type 'object'

Issue with AutoTS, no parameter "transformers_params"

๐Ÿ› Bug

When installing in a fresh venv, the version from pip does not support the transformers_params argument of lightautoml.addons.autots.base.AutoTS.

To Reproduce

Steps to reproduce the behavior:

  1. get Python 3.8.10 (v3.8.10:3d8993a744, May 3 2021, 08:55:58)
  2. install lightautoml==0.3.7.3:
pip install lightautoml
  3. get the code from master/examples/demo13.py
  4. try to run:
import numpy as np
import pandas as pd

from sklearn.metrics import mean_absolute_error

from lightautoml.addons.autots.base import AutoTS
from lightautoml.tasks import Task


np.random.seed(42)

data = pd.read_csv("data/ai92_value_77.csv")
horizon = 30

train = data[:-horizon]
test = data[-horizon:]

roles = {"target": "value", "datetime": "date"}

seq_params = {
    "seq0": {
        "case": "next_values",
        "params": {"n_target": horizon, "history": np.maximum(7, horizon), "step": 1, "test_last": True},
    },
}

# True (then set default values) / False; int, list or np.array
# default: lag_features=30, diff_features=7
transformers_params = {
    "lag_features": [0, 1, 2, 3, 5, 10],
    "lag_time_features": [0, 1, 2],
    "diff_features": [0, 1, 3, 4],
}

task = Task("multi:reg", greater_is_better=False, metric="mae", loss="mae")

automl = AutoTS(
    task,
    seq_params=seq_params,
    trend_params={
        "trend": False,
    },
    transformers_params=transformers_params,
)
train_pred, _ = automl.fit_predict(train, roles, verbose=4)
forecast, _ = automl.predict(train)

print("Check scores...")
print("TEST score: {}".format(mean_absolute_error(test[roles["target"]].values, forecast.data)))
  5. get the error:
multi:reg isn`t supported in lgb
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[4], line 20
     12 transformers_params = {
     13     "lag_features": [0, 1, 2, 3, 5, 10],
     14     "lag_time_features": [0, 1, 2],
     15     "diff_features": [0, 1, 3, 4],
     16 }
     18 task = Task("multi:reg", greater_is_better=False, metric="mae", loss="mae")
---> 20 automl = AutoTS(
     21     task,
     22     seq_params=seq_params,
     23     trend_params={
     24         "trend": False,
     25     },
     26     transformers_params=transformers_params,
     27 )
     28 train_pred, _ = automl.fit_predict(train, roles, verbose=4)
     29 forecast, _ = automl.predict(train)

TypeError: __init__() got an unexpected keyword argument 'transformers_params'

Expected behavior

  • code runs
  • get TEST score print at the end

Additional context

  • Running with jupyter notebook
  • OS: macOS Ventura 13.2
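A hedged workaround until a release whose AutoTS accepts transformers_params is available: upgrade the package, or drop the argument and rely on the defaults mentioned in the example (lag_features=30, diff_features=7):

pip install -U lightautoml

# or, on lightautoml==0.3.7.3, construct AutoTS without the unsupported argument:
automl = AutoTS(
    task,
    seq_params=seq_params,
    trend_params={"trend": False},
)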

Add time series tutorial

  • added a tutorial showing the possibilities of using lightautoml for time series prediction tasks

Installation Error

Hello, I kept getting this error when trying to install the package with !pip install -U lightautoml[all]. Not sure how to address this; any help would be greatly appreciated.


TypeError Traceback (most recent call last)
in <cell line: 17>()
15
16 # LightAutoML presets, task and report generation
---> 17 from lightautoml.automl.presets.tabular_presets import TabularAutoML
18 from lightautoml.tasks import Task

10 frames
/usr/local/lib/python3.10/dist-packages/log_calls/log_calls.py in _add_class_attrs(self, klass)
   2136 this_deco_class = self.__class__
   2137 this_deco_class_name = this_deco_class.__name__
-> 2138 setattr(
   2139     klass,
   2140     'get_' + this_deco_class_name + '_wrapper',

TypeError: cannot set 'get_record_history_wrapper' attribute of immutable type 'object'

Make torch as necessary part only for cv or nlp tasks

What's now: any LAMA installation also installs torch as a dependency, which takes more than 800 MB of additional space.

Trouble example: I use LAMA in a project with tabular data, and whenever the project image is created it takes up twice as much space (1.6 GB) for features I never need in this project.

Idea: make an additional extra for installation ([no-torch] or something like that), or remove torch from the [report] installation.

Support python 3.10

🚀 Feature Request

This package has become unusable on Kaggle and Colab


Motivation

Related to this: #89

Proposal

Just update the config to support Python 3.10 and release an update.

Alternatives

  • Downgrading Python in every environment may be an option, but new users will not know how to do this.

Checklist

  • [x] feature proposal description
  • [x] motivation
  • [x] additional context / proposal alternatives review

Add forced features as a basis in an iterative selection strategy

🚀 Feature Request

Features marked force_input must be in the output feature set, but features that are correlated with forced features are not needed in the final feature set. I'd like to add forced features as a basis in the iterative selection strategy, so correlated features will not be selected.

Motivation

Reducing the size of the dataset without reducing the quality.

5-fold CV causes different number of classes in train/test

๐Ÿ› Bug

If the training set contains fewer than 5 instances of any class, one or more folds fail with a "y_true and y_pred contain different number of classes" error.

To Reproduce

Run default AutoML on wine-quality-white task:

y_true and y_pred contain different number of classes 6, 7. Please provide the true labels explicitly through the labels argument. Classes found in y_true: [0 1 2 3 4 5]
Traceback (most recent call last):
File "/root/.clearml/venvs-builds/3.7/task_repository/LightAutoML.git/lightautoml/ml_algo/utils.py", line 66, in tune_and_fit_predict
preds = ml_algo.fit_predict(train_valid)
File "/root/.clearml/venvs-builds/3.7/task_repository/LightAutoML.git/lightautoml/ml_algo/base.py", line 273, in fit_predict
model, pred = self.fit_predict_single_fold(train, valid)
File "/root/.clearml/venvs-builds/3.7/task_repository/LightAutoML.git/lightautoml/ml_algo/linear_sklearn.py", line 140, in fit_predict_single_fold
valid.weights,
File "/root/.clearml/venvs-builds/3.7/task_repository/LightAutoML.git/lightautoml/ml_algo/torch_based/linear_model.py", line 406, in fit
score = self.metric(y_val, val_pred, weights_val)
File "/root/.clearml/venvs-builds/3.7/task_repository/LightAutoML.git/lightautoml/tasks/losses/base.py", line 42, in call
val = self.metric_func(y_true, y_pred, sample_weight=sample_weight)
File "/usr/local/lib/python3.7/site-packages/sklearn/metrics/classification.py", line 2430, in log_loss
transformed_labels.shape[1], y_pred.shape[1], lb.classes_

ValueError: y_true and y_pred contain different number of classes 6, 7. Please provide the true labels explicitly through the labels argument. Classes found in y_true: [0 1 2 3 4 5]
Traceback (most recent call last):
File "experiments/run_tabular.py", line 75, in
main(dataset_name=args.dataset, cpu_limit=args.cpu_limit, memory_limit=args.memory_limit)
File "experiments/run_tabular.py", line 38, in main
oof_predictions = automl.fit_predict(train, roles={"target": "class"}, verbose=10)
File "/root/.clearml/venvs-builds/3.7/task_repository/LightAutoML.git/lightautoml/automl/presets/tabular_presets.py", line 549, in fit_predict
oof_pred = super().fit_predict(train, roles=roles, cv_iter=cv_iter, valid_data=valid_data, verbose=verbose)
File "/root/.clearml/venvs-builds/3.7/task_repository/LightAutoML.git/lightautoml/automl/presets/base.py", line 212, in fit_predict
verbose=verbose,
File "/root/.clearml/venvs-builds/3.7/task_repository/LightAutoML.git/lightautoml/automl/base.py", line 212, in fit_predict
pipe_pred = ml_pipe.fit_predict(train_valid)
File "/root/.clearml/venvs-builds/3.7/task_repository/LightAutoML.git/lightautoml/pipelines/ml/base.py", line 136, in fit_predict
), "Pipeline finished with 0 models for some reason.\nProbably one or more models failed"
AssertionError: Pipeline finished with 0 models for some reason.
Probably one or more models failed
Process failed, exit code 1
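A minimal sketch of the fix sklearn itself suggests in the message above: pass the full label set explicitly, so a fold that is missing a rare class can still be scored (the wrapper name is illustrative):

import numpy as np
from sklearn.metrics import log_loss

def safe_log_loss(y_true, y_pred, sample_weight=None):
    # pass all class labels so folds missing a rare class can still be scored
    n_classes = y_pred.shape[1]
    return log_loss(y_true, y_pred, sample_weight=sample_weight,
                    labels=np.arange(n_classes))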

Blending uses weights from last iteration, not best

๐Ÿ› Bug

Log from tutorial 1:

[10:39:41] Layer 1 training completed.

[10:39:41] Blending: optimization starts with equal weights and score 0.7529907972036357
[10:39:41] Blending: iteration 0: score = 0.7552430767490724, weights = [0.17297602 0.45194352 0.12763917 0. 0.24744129]
[10:39:41] Blending: iteration 1: score = 0.7555030859886485, weights = [0.22631824 0.37608108 0.19821008 0. 0.1993906 ]
[10:39:41] Blending: iteration 2: score = 0.7554997906957511, weights = [0.23017529 0.38376114 0.18327494 0. 0.20278871]
[10:39:41] Blending: iteration 3: score = 0.755520519151073, weights = [0.23089828 0.38205293 0.18385063 0. 0.20319813]
[10:39:41] Blending: iteration 4: score = 0.7555106332723811, weights = [0.22740835 0.3837866 0.18468489 0. 0.20412019]
[10:39:41] Automl preset training completed in 234.02 seconds

[10:39:41] Model description:
Final prediction for new objects (level 0) =
0.22741 * (5 averaged models Lvl_0_Pipe_0_Mod_0_LinearL2) +
0.38379 * (5 averaged models Lvl_0_Pipe_1_Mod_0_LightGBM) +
0.18468 * (5 averaged models Lvl_0_Pipe_1_Mod_1_Tuned_LightGBM) +
0.20412 * (5 averaged models Lvl_0_Pipe_1_Mod_3_Tuned_CatBoost)

CPU times: user 18min 28s, sys: 1min 26s, total: 19min 55s
Wall time: 3min 54s
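A hypothetical sketch of the requested fix (all names are illustrative, not the library's code): the blender should keep the best-scoring weights seen during optimization instead of returning the last iteration's weights:

def blend_with_best_weights(update_step, score_fn, init_weights, max_iters=5):
    # track the best-scoring weights across iterations instead of
    # silently keeping whatever the last iteration produced
    best_weights, best_score = init_weights, score_fn(init_weights)
    weights = init_weights
    for _ in range(max_iters):
        weights = update_step(weights)
        score = score_fn(weights)
        if score > best_score:
            best_weights, best_score = weights, score
    return best_weights, best_score

In the log above, iteration 1 scored best (0.7555030859886485), but the final model description uses the iteration 4 weights.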
