aimclub / bamt Goto Github PK

View Code? Open in Web Editor NEW

117.0 9.0 17.0 108.75 MB

Repository of a data modeling and analysis tool based on Bayesian networks

Home Page: https://bamt.readthedocs.io

License: BSD 3-Clause "New" or "Revised" License

Python 100.00%

bayesian-networks synthetic-data mixed-data structure-learning parameters-learning

bamt's Introduction

BAMT - Bayesian Analytical and Modelling Toolkit

Repository of a data modeling and analysis tool based on Bayesian networks.

Badges

team
package
tests
docs
license
stats
style

Introduction

BAMT - Bayesian Analytical and Modelling Toolkit. This repository contains a data modeling and analysis tool based on Bayesian networks. It can be divided into two main parts - algorithms for constructing and training Bayesian networks on data and algorithms for applying Bayesian networks for filling gaps, generating synthetic data, assessing edge strength, etc.

Installation

BAMT package is available via PyPi:

pip install bamt

BAMT Features

The following algorithms for Bayesian Networks learning are implemented:

Building the structure of a Bayesian network based on expert knowledge by directly specifying the structure of the network.
Building the structure of a Bayesian network on data using three algorithms - Hill Climbing, evolutionary, and PC (PC is currently under development). For Hill Climbing, the following score functions are implemented - MI, K2, BIC, AIC. The algorithms work on both discrete and mixed data.
Learning the parameters of distributions in the nodes of the network based on Gaussian distribution and Mixture Gaussian distribution with automatic selection of the number of components.
Non-parametric learning of distributions at nodes using classification and regression models.
BigBraveBN - algorithm for structural learning of Bayesian networks with a large number of nodes. Tested on networks with up to 500 nodes.

Difference from existing implementations:

Algorithms work on mixed data.
Structural learning implements score-functions for mixed data.
Parametric learning implements the use of a mixture of Gaussian distributions to approximate continuous distributions.
Non-parametric learning of distributions with various user-specified regression and classification models.
The algorithm for structural training of large Bayesian networks (> 10 nodes) is based on local training of small networks with their subsequent algorithmic connection.

For example, in terms of data analysis and modeling using Bayesian networks, a pipeline has been implemented to generate synthetic data by sampling from Bayesian networks.

How to use

Then the necessary classes are imported from the library:

from bamt.networks.hybrid_bn import HybridBN

Next, a network instance is created and training (structure and parameters) is performed:

bn = HybridBN(has_logit=False, use_mixture=True)
bn.add_edges(preprocessed_data)
bn.fit_parameters(data)

Examples & Tutorials

More examples can be found in Documentation.

Publications about BAMT

We have published several articles about BAMT:

Project structure

The latest stable version of the library is available in the master branch.

It includes the following modules and directories:

bamt - directory with the framework code:
- Preprocessing - module for data preprocessing
- Networks - module for building and training Bayesian networks
- Nodes - module for nodes support of Bayesian networks
- Utilities - module for mathematical and graph utilities
data - directory with data for experiments and tests
tests - directory with unit and integration tests
tutorials - directory with tutorials
docs - directory with RTD documentation

Preprocessing

Preprocessor module allows users to transform data according to the pipeline (similar to the pipeline in scikit-learn).

Networks

Three types of networks are implemented:

HybridBN - Bayesian network with mixed data
DiscreteBN - Bayesian network with discrete data
ContinuousBN - Bayesian network with continuous data

They are inherited from the abstract class BaseNetwork.

Nodes

Contains classes for nodes of Bayesian networks.

Utilities

Utilities module contains mathematical and graph utilities to support the main functionality of the library.

Web-BAMT

A web interface for BAMT is currently under development. The repository is available at web-BAMT.

Contacts

If you have questions or suggestions, you can contact us at the following address: [email protected] (Irina Deeva)

Our resources:

Citation

@misc{BAMT,
  author={BAMT},
  title = {Repository experiments and data},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/ITMO-NSS-team/BAMT.git}},
  url = {https://github.com/ITMO-NSS-team/BAMT.git}
}

@article{deeva2023advanced,
  title={Advanced Approach for Distributions Parameters Learning in Bayesian Networks with Gaussian Mixture Models and Discriminative Models},
  author={Deeva, Irina and Bubnova, Anna and Kalyuzhnaya, Anna V},
  journal={Mathematics},
  volume={11},
  number={2},
  pages={343},
  year={2023},
}

bamt's People

Contributors

Stargazers

Watchers

Forkers

yaxuzhu chihuanbin seanahmad nmramorov karinakarina6 jrzkaminski xiemeigongzi roman223 lechatb dmitry-grigorev evilfreelancer matveyshishov yakonick pablokarpacho licongren sandy4321 anton-golubkov

bamt's Issues

Не обновляются типы узлов при передаче в set_structure пустого списка рёбер

Проблема в том, что когда создаёшь сеть со смесями например bn = ContinuousBN(use_mixture=True) и пытаешься инициализировать в БС пустую структуру (bn.set_structure(edge=[])), не обновляются типы узлов, они остаюстя по умолчанию Gaussian, хотя должны быть MixtureGaussian, если передать хоть какую-то структуру, хотя бы из одного ребра, то обновление типов узлов происходит.

Failed to build pomegranate

In discrete_node.py you import "DiscreteDistribution" and "ConditionalProbabilityTable" from pomegranate. But last pomegranate version has not these. Maybe it's a reason for my failure: "Failed to build pomegranate" during install your library. What do you think?

requirements are too strict

requirements.txt and pyproject.toml only contain strict requirements like numpy==1.24.0, it makes BAMT incompatible with other frameworks and tools like GOLEM and FEDOT.

Create RTD documentation

Сейчас документация размещена в wiki (что не слишком удобно в плане версионирования и авто-сборки) и написана на русском.

Хорошо бы её перевести и пересоздать в формате read-the-docs. Параллельно можно будет актуализировать содержание.

Возможно, вот эту часть стоит переработать в виде Factory.

          Возможно, вот эту часть стоит переработать в виде Factory.

Originally posted by @nicl-nno in #50 (comment)

Remove unused functions and variables

Некоторые объекты вообще не используются.

Примеры:

функция get_nodes_sign в discretization.py
переменная est там же.
classifier_body в nodes.py
куча закомментированного кода (много где):

Error when sampling data

Hello,

I try to replicate the following code (https://bamt.readthedocs.io/en/latest/examples/learn_sampling_predict.html) but when i learn the parameters i get the following error : AttributeError: 'NoneType' object has no attribute 'split'. I give you the details of the errors : error.pdf
And I have the impression that learning the structure isn't complete: the progress bar is at 0%.
When I set the use_mixture parameter to False, I can generate data, but the number of samples generated does not correspond to the number given at the start (there are fewer).

Thanks,

Nikita

Get rid of useless tests

Некоторые тесты по факту ничего не тестируют (например, https://github.com/ITMO-NSS-team/BAMT/blob/master/tests/sendingClassifiersLogit.py).

Тут нет ни assertoв, ни какого-то понятного сценария тестирования. Нужно пройтись по всем и удалить/переработать такие тесты.

How to use PC algorithm with continuous and discrete data?

Add PEP8 bot

The is a lot of non-PEP8-compatible parts in the code. The automated validation using bot should be implemented.

Get rid of fedot sources in framework

Looks like BAMP uses FEDOT features, but it is not imported via requirements.
I suggest getting rid of fedot directory.
In order to import fedot correctly, just add requirements.txt in root and add this line:
fedot==0.3.1

If a specific version of the framework is needed, it can be linked to a branch:
pip install git+https://github.com/nccr-itmo/FEDOT@master
Here, the latest version from the master branch is imported.

Replace progress bars with rich

Replace outdated tqdm progress bars with rich progress bars
Add new progress bar for fit_parameters

Display func get_info(as_df=False) should by dynamic in order to create pretty tables

Problem
If names of nodes are too broad, borders of table's cells are shifted.

Integration with GOLEM

Нужно сделать пример структурного обучения на основе https://github.com/aimclub/GOLEM.
В идеале не-эволюционные подходы к обучению тоже бы хорошо реализовать в golem-compatible формате.

ModuleNotFoundError: No module named 'bamt.Networks'

Hi,

Firstly, I would like to thank you and grateful for this awesome package. I'm encounter this "ModuleNotFoundError: No module named 'bamt.Networks'" when I try to compile and run example_socio.ipynb using Spyder(Python 3.9) in Anaconda. I install bamt 1.1.31 using "conda skeleton pypi bamt" and its successfully installed, however, it still not available when I import it.

I really appreciate your help, thanks.
NazimR

Learning network structure and probability distribution for prediction output.

Hi,

I have two questions?

How to learn network structure without discretization since the tutorials are using discretized data for structural learning? (sorry if I failed to notice if there any tutorials for structural learning for mixed data without discretization)
How to get the output of prediction in term of probability?
For example, the target class of IRIS data are IrisSetosa, IrisVersicolor and IrisVirginica.
The output of "pred = bn.predict(iris)" will be the target class in term of IrisSetosa, IrisVersicolor or IrisVirginica. So, how to get the probability as the output like:
IrisSetosa = 0.2
IrisVersicolor = 0.2
IrisVirginica = 0.6

I really appreciate your help. Thank you :)

BAMT 2.0.0 - new features, refactoring, architecture refreshment

Current BAMT architecture has a number of disadvantages, some clunky code and other limitations. Thus, it was decided to make a full refactoring. This refreshment will not only include new refactored code and API but also new features (like vectorized sampling and other operations, new algorithms for structure learning, score-functions etc.)
For now, here is a checklist of modules that should be implemented in 2.0.0 architecture:

The development of BAMT 2.0.0 is held in 2.0.0 branch of the repository. If you, the reader of the issue, have decided to implement some module or submodule, please reply to this message, create a separate issue and add it to milestone and project.

The goal of these changes is also to make a sklearn-like interface, so the usual pipeline looks like that:

# read data
data = pd.read_csv("data.csv")

# define optimizers and score functions
dag_score_function = DAGScoreFunction(**parameters)
dag_optimizer = DAG_optimizer(**parameters)


# get a structure, maybe in networkx format?
G = dag_optimizer.optimize(data, ** parameters)

# define parameters estimator and BN
parameters_estimator = ParametersEstimator(**parameters)
bn = ContinuousBayesianNetwork(**parameters)

# fit the bn
bn.fit(data, ParametersEstimator, **parameters)
bn.sample(1000)
bn.predict(data.drop[["col1", "col2"]])

BigBraveBN update api

In the algorithm for large BigBraveBN networks, a threshold is now set at the network initialization stage, which is then used to select insignificant edges based on the Brave matrix of coefficients. However, if I want to change the threshold, I need to reinitialize the network and restart the process of calculating the Brave matrix, although this is not necessary if it has already been calculated and I just need to change the threshold value. It is necessary to move the threshold value to another level.

set_structure doesn't work with ContinuousBN

When I create a continuous network with the use_mixture=true flag and want to set a structure to it with set_structure(), after that the node types don't change, and the parents don't update either. I am attaching code and dataset.
Code.txt
synth_data.csv

Add LSevoBN algorithim

Add the LSevoBN algorithm as soon as the evolutionary algorithm is updated in BAMT.

Refactoring of string-based identifiers

Сейчас большая часть логики завязана на строковые идентификаторы:

Но это порождает многочисленные ситуации, когда есть различия в регистре и возникают .lower() и прочие неэстетичные штуки.

Я бы предложил два варианта:

ту часть "категориальных" идентификаторов, которая не видна конечному пользователю, заменить на Enum-ы;
или хотя бы заменить все эти строковые идентификатора на константы.

Training time

Add experiments to estimate training time with subnets and on the entire dataset

Refactor evolutionary structure learning algorithm

The algorithm should be refactored and unified for basic nets and composite nets.

Display module update

It seems display method should be refactored into a module.
Name of nodes should be in English.

Non-PEP8 naming

Много где нотация названий объектов не по PEP8. Сходу, например:

После #29 нужно пройтись с IDE и всё почистить.

Integration of Information Metrics

Some metrics are another files (e.g. mi_entropy_gauss, redef_HC), this is not compatible with entire BAMT logic.
Therefore, these files must be decomposed and integrated as Builders in BAMT Logic.

These files must be investigated and removed, their essence must be consolidated by BAMT.

Trouble with dependencies

Trouble with Python versions

Made a copy of the framework repo. Create venv with with the specified requirements.txt
When launch experiments/fedot_bn_example.py got the following error

ImportError: cannot import name 'TypedDict' from 'typing' (C:\Users\558\AppData\Local\Programs\Python\Python37\lib\typing.py)

According to this issue the problem is probably with the version of Python I am using. Anyway, when I used venv on python 3.9 instead of 3.7, it worked.

Dependencies may be incomplete

You use FEDOT (I understand that it is now used in experimental mode) in your experiments, as well as packages such as pgmpy, gmr, pyvis. However, they are not specified in requirements.txt. It may be worth expanding the list of libraries

Please check my worries

Fit_parameters method decomposition

Fit paramenets method is too big, it should be decomposed.

Для большей читаемости лучше gru.nodes_types(discretized_data).values() вынести в переменную.

          Для большей читаемости лучше gru.nodes_types(discretized_data).values() вынести в переменную.

И переработать в all([node_type in discrete_types for node_type in nodes_types])

Можно в целом вынести такую проверку в функцию вроде is_discrete - явно где-то ещё может понадобится.

Originally posted by @nicl-nno in #50 (comment)

Update pgmpy

Update pgmpy package and allow Python 3.11 for BAMT

Install error

Please help with the errors:
ERROR: Could not find a version that satisfies the requirement bamt (from versions: none)
ERROR: No matching distribution found for bamt

I used: "pip install bamt"

Roman

Привет, предлагаю исправить в файле selbst.ini

C:\Users\Roman\BAMT\Nodes_data

на

%%LOCALAPPDATA%%\aimclub\BAMT\Nodes_data

И оставить

log_conf_loc = logging.conf

В файле bamt\networks\base.py придётся добавить expandvars:

STORAGE = os.path.expandvars(config.get(
    "NODES", "models_storage", fallback="models_storage is not defined"
))

Или вообще использовать platformdirs и захардкодить user_data_dir(appname, appauthor).

Также мне потребовалось обновить

pip install threadpoolctl -U

иначе не читался конфиг blas и fit_parameters падал. Предлагаю threadpoolctl добавить в зависимости, или хотя бы прописать этот кейс в readme.

Отзыв по использованию примера example_socio.ipynb для знакомства в Bamt

Здравствуйте, моя попытка потестировать Bamt провалилась. Возможно, мой отзыв окажется полезен. В руководстве (https://github.com/aimclub/BAMT/blob/master/tutorials/example_socio.ipynb) приведён пример предсказания байесовской сетью наличия питомца по профилю vk юзера. Получили точность 94% и на том посчитали пример завершённым. При более детальном рассмотрении оказалось:

питомец был всего у 5% пользователей, и bamt ВСЕГДА предсказывала класс 0 (те ничему не училась)
из 80+ фичей авторы взяли лишь 8. казалось бы, почему? Ответ: библиотека работает ОЧЕНЬ МЕДЛЕННО. 30k примеров, 8 фичей обучается примерно секунд 50. с мультипроцессингом.
в примере авторы обучались на train set и потом предсказывали на ... sample(100) от того же train set.
катбуст и особенно xgboost обучались за доли секунды. но даже они не смогли ничему научиться, что было видно на CV, потому что.. отобранные авторами 8 фичей были нерелевантными.
предсказывать наличие питомца всё же было можно, и с хорошей точностью, надо было просто взять все фичи. доказано бустингами. но bamt над всеми фичами сожрал всю память на ноуте. а на компе питон младше 3.9, bamt не поставился.
гауссовы смеси в bamt постоянно вылетали с fitting the mixture model failed because some components have ill-defined empirical covariance (for instance caused by singleton or collapsed samples). Try to decrease the number of components, or increase reg_covar. В MathUtils.py к вызовам GaussianMixture(n_components=i, random_state=0) пришлось добавить с потолка reg_covar=1e-1, чтобы оно просто запускалось. Хотя как это влияет на качество решения - вопрос.

Я начал переделку блокнота, но так и не смог обучить Bamt, даже на половине признаков - не хватило 16gb ноутбучной памяти. Возможно, в будущем у уважаемых авторов получится провести ревизию данного примера в свете выявленных недостатков.

Установка BAMT падает на pyitlib==0.2.2

Добрый день! Пытаюсь установить BAMT через Jupyter.
Установка останавливается и выдает ошибку на пакете pyitlib

Collecting bamt
Obtaining dependency information for bamt from https://files.pythonhosted.org/packages/20/cb/92e16c14f0cebff0206cadd10dc5d4a4fe65be83eb3c55923ba0f9f2c427/bamt-1.1.44-py3-none-any.whl.metadata
Using cached bamt-1.1.44-py3-none-any.whl.metadata (9.7 kB)
Collecting gmr==1.6.2 (from bamt)
Using cached gmr-1.6.2.tar.gz (249 kB)
Preparing metadata (setup.py): started
Preparing metadata (setup.py): finished with status 'done'
Collecting matplotlib==3.6.2 (from bamt)
Using cached matplotlib-3.6.2-cp310-cp310-win_amd64.whl (7.2 MB)
Collecting missingno<0.6.0,>=0.5.1 (from bamt)
Using cached missingno-0.5.2-py3-none-any.whl (8.7 kB)
Collecting numpy>=1.24.2 (from bamt)
Obtaining dependency information for numpy>=1.24.2 from https://files.pythonhosted.org/packages/b7/db/4d37359e2c9cf8bf071c08b8a6f7374648a5ab2e76e2e22e3b808f81d507/numpy-1.25.2-cp310-cp310-win_amd64.whl.metadata
Using cached numpy-1.25.2-cp310-cp310-win_amd64.whl.metadata (5.7 kB)
Collecting pandas==1.5.2 (from bamt)
Using cached pandas-1.5.2-cp310-cp310-win_amd64.whl (10.4 MB)
Collecting pgmpy==0.1.20 (from bamt)
Using cached pgmpy-0.1.20-py3-none-any.whl (1.9 MB)
Collecting pyitlib==0.2.2 (from bamt)
Using cached pyitlib-0.2.2.tar.gz (27 kB)
Preparing metadata (setup.py): started
Preparing metadata (setup.py): finished with status 'error'
error: subprocess-exited-with-error

python setup.py egg_info did not run successfully.
exit code: 1

[18 lines of output]
D:\Users\Koolkool\anaconda3\envs\Kool\lib\site-packages\setuptools\dist.py:745: SetuptoolsDeprecationWarning: Invalid dash-separated options
!!

      ********************************************************************************
      Usage of dash-separated 'description-file' will not be supported in future
      versions. Please use the underscore name 'description_file' instead.

      By 2023-Sep-26, you need to update your project and remove deprecated calls
      or your builds will no longer be supported.

      See https://setuptools.pypa.io/en/latest/userguide/declarative_config.html for details.
      ********************************************************************************

!!
opt = self.warn_dash_deprecation(opt, section)
error in pyitlib setup command: 'install_requires' must be a string or list of strings containing valid project/version requirement specifiers; Expected end or semicolon (after version specifier)
pandas>=0.20.2numpy>=1.9.2
~~~~~~~~^
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

Encountered error while generating package metadata.

See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

Improve README

Текущее README не особо понятно новичку.

Project structure написано очень скупо, Oil and Gas Reservoirs Parameters Analysis выглядит вообще непонятно (это кейс, или функциональность, или что-то ещё?). Очень не хватает примеров использования.

У нас ещё нет готового образа README, но можно посмотреть на черновик тут (https://github.com/ITMO-NSS-team/open-source-ops/pull/12/files) или опереться на FEDOT-овский.

Implement vectorized sampling

Right now sampling is implemented row-by-row. Vectorized version should be much faster.

Unified score-functions

Score-functions should be unified and allow custom user score-functions

Decomoposing the modules

Сейчас основные модули (например, networks.py) очень велики. В них куча классов, сотни строк кода - это тяжело и читается, и поддерживается.

Предложил бы переструктурировать их, создав иерархию папок-модулей и разложив в них файлы по принципу "1 класс - 1 файл" (от него можно отступать, но если это обосновано).

Replace unefficient datatypes with more appropriate ones

For example, a list of edges is now stored in a list() datatype, but a set() seems to be more appropriate and memory-efficient, since we do not need to store an order of the elements.

Link to documentation

The link to the documentation in the ReadMe file does not work

Regressors serialization

Просматривая файл nodes.py, я заметил, что при обучении параметров модель регрессора сериализуется (причем используется достаточно длинная конструкция if/else). Далее упакованная модель помещается в словарь distributions и для дальнейшей работы требуется распаковать её.

Почему бы не оставить сериализацию модели только в том месте, где мы хотим сохранить нашу модель?
В этом случае можно помещать в словарь distribution регрессор в качестве объекта и использовать его напрямую, без распаковки.

Bayesian Network Structure

Hi again,

I have two questions?

How to draw network structure manually for HBN?

encoder = preprocessing.LabelEncoder()
discretizer = preprocessing.KBinsDiscretizer(n_bins=5, encode='ordinal', strategy='quantile')
p = pp.Preprocessor([('discretizer', discretizer)])
discretized_data, est = p.apply(TrainingSet)

edges = [('sepalL', 'target'), ('sepalW', 'target'), ('petalL', 'target'), ('petalW', 'target')]
bn = Nets.HybridBN(has_logit=True, use_mixture=True)
info = p.info
TrainingSet.info()
bn.add_nodes(info)
bn.set_structure(edges)
bn.get_info()

I try write the code but doesn't work. All the variables do not link accordingly to what I want.

bn.plot('Simple.html') not working, the output return "Local cdn resources have problems on chrome/safari when used in jupyter-notebook."

I really appreciate your help. Thank you :)

Generated Data Strings

There is a filter in the file (bamt\networks\base.py, line 578) "seq_df[(seq_df[positive_columns] >= 0).all(axis=1)]", which does not allow to get the desired number of rows. Please add an additional parameter allow the user to disable this check.

Test of compatibility of different models for nodes

Not all possible convinient models (sklearn) are covered in tests.

[BUG] Fail to install package in Google Colab

Bug Report

Installation process in Google Colab fails on building wheels for pomegranate.

How To Reproduce

Run installation in Google Colab

pip install bamt

See the error traceback

Collecting bamt
  Using cached bamt-1.1.41-py3-none-any.whl (60 kB)
Collecting gmr==1.6.2 (from bamt)
  Using cached gmr-1.6.2.tar.gz (249 kB)
  Preparing metadata (setup.py) ... done
Collecting matplotlib==3.6.2 (from bamt)
  Using cached matplotlib-3.6.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (11.8 MB)
Requirement already satisfied: missingno<0.6.0,>=0.5.1 in /usr/local/lib/python3.10/dist-packages (from bamt) (0.5.2)
Collecting numpy>=1.24.2 (from bamt)
  Using cached numpy-1.25.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.6 MB)
Collecting pandas==1.5.2 (from bamt)
  Using cached pandas-1.5.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (12.1 MB)
Collecting pgmpy==0.1.20 (from bamt)
  Using cached pgmpy-0.1.20-py3-none-any.whl (1.9 MB)
Collecting pomegranate==0.14.8 (from bamt)
  Using cached pomegranate-0.14.8.tar.gz (4.3 MB)
  Installing build dependencies ... done
  error: subprocess-exited-with-error
  
  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> See above for output.
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  Getting requirements to build wheel ... error
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

Изменение изначального датафрейма методом fit_parameters при работе с CompositeBN

При обучении параметров композитной БС энкодер изменяет строки изначального датасета
Происходит этот тут:
https://github.com/aimclub/BAMT/blob/393275ef49d87165155e19b9330e68aefb313344/bamt/networks/base.py#L508C1-L508C1

Это создаёт проблему с сэмплированием при наличии какого - либо evidence. Для того чтобы сэмплировать приходится указывать закодированный evidence, а не оригинальный.

Изменение датасета также усложняет дальнейшую работу с ним. Если пользователь хочет что-то сделать с оригинальными данными после обучения БС, ему придется их перечитывать.

Predict after saving the model

There is a problem with save() method. After method call, regressors in distributions field changes to their pickle serialization.

bn = ContinuousBN(use_mixture=False, has_logit=False)
bn.add_nodes(DESCRIPTOR)
bn.add_edges(DISCRETIZED_DATA)
bn.fit_parameters(DATA)

bn.save(BN_NAME, MODELS_DIR)

bn.predict(TEST_DATA)

After calling predict() method we've got an error:

...
        if pvals is not None:
            for el in pvals:
                if str(el) == "nan":
                    return np.nan
            model = node_info["regressor_obj"]
>           pred = model.predict(pvals)
E           AttributeError: 'str' object has no attribute 'predict'

Because now instead of CatBoostRegressor object (for example) in node_info["regressor_obj"] we have something like this: "\x80\x04\x95w\x00\x00\x00\x00\x00\x00\x00\x8c\rcatbo..."