
BartPy


Introduction

BartPy is a pure Python implementation of the Bayesian Additive Regression Trees (BART) model of Chipman et al. [1].

Reasons to use BART

  • Requires much less hyperparameter tuning than gradient boosted trees (GBT)
  • Provides confidence intervals in addition to point estimates
  • Extremely flexible through use of priors and embedding in bigger models

Reasons to use the library:

  • Can be plugged into existing sklearn workflows
  • Everything is done in pure python, allowing for easy inspection of model runs
  • Designed to be extremely easy to modify and extend

Trade-offs:

  • Speed - BartPy is significantly slower than other BART libraries
  • Memory - BartPy uses a lot of caching compared to other approaches
  • Instability - the library is still under construction

How to use:

There are two main APIs for BartPy:

  1. High level sklearn API
  2. Low level access for implementing custom conditions

Where possible, it is recommended to use the sklearn API until you need something it can't express. It is simpler, consistent with other models in the ecosystem, and makes it easy to swap in other models.

Sklearn API

The high-level API works as you would expect:

from bartpy.sklearnmodel import SklearnModel
model = SklearnModel() # Use default parameters
model.fit(X, y) # Fit the model
predictions = model.predict() # Make predictions on the train set
out_of_sample_predictions = model.predict(X_test) # Make predictions on new data
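
The snippet above assumes existing training data X, y and held-out X_test. For a self-contained run, here is a sketch with illustrative toy data (the data generation is not part of the original example):

import numpy as np
from bartpy.sklearnmodel import SklearnModel

X = np.random.uniform(0, 1, size=(100, 3))      # 100 rows, 3 covariates
y = X[:, 0] + np.sin(3 * X[:, 1])               # toy target
X_test = np.random.uniform(0, 1, size=(20, 3))  # unseen rows

model = SklearnModel()
model.fit(X, y)
in_sample = model.predict()            # train-set predictions
out_of_sample = model.predict(X_test)  # new-data predictions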

The model object can be used with all of the standard sklearn tools, e.g. cross-validation and grid search:

from sklearn.model_selection import cross_validate
from bartpy.sklearnmodel import SklearnModel

model = SklearnModel()  # Use default parameters
scores = cross_validate(model, X, y)
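
Grid search works the same way. A minimal sketch, assuming SklearnModel exposes an n_trees keyword (it appears in the library's constructor, but verify against your installed version); an explicit scoring string is passed so the search only relies on predict:

from sklearn.model_selection import GridSearchCV

grid = GridSearchCV(SklearnModel(),
                    param_grid={"n_trees": [20, 50, 200]},
                    scoring="neg_mean_squared_error",
                    cv=3)
grid.fit(X, y)
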
Extensions

BartPy offers a number of convenience extensions to base BART. The most prominent is using BART to predict the residuals of a base model. A linear model is the most natural base, but any sklearn-compatible model can be used:

from sklearn.linear_model import LinearRegression  # any sklearn-compatible regressor works
from bartpy.extensions.baseestimator import ResidualBART

model = ResidualBART(base_estimator=LinearRegression())
model.fit(X, y)

A nice feature of this approach is that it combines the interpretability of a linear model with the power of a tree ensemble.

Lower level API

BartPy is designed to expose all of its internals so that it can be extended and modified. In particular, using the lower-level API it is possible to:

  • Customize the set of possible tree operations (prune and grow by default)
  • Control the order of sampling steps within a single Gibbs update
  • Extend the model to include additional sampling steps

Some care is recommended when making these kinds of changes. Over time the process will become easier, but today it is somewhat complex.
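
As a starting point, the sampler machinery lives under bartpy.samplers. The imports below are taken from bartpy's own code (they appear in the tracebacks later on this page); anything beyond the imports is version-dependent, so treat this as a sketch:

from bartpy.samplers.leafnode import LeafNodeSampler
from bartpy.samplers.modelsampler import ModelSampler
from bartpy.samplers.schedule import SampleSchedule

# A SampleSchedule controls the order of sampling steps within a single
# Gibbs update; a ModelSampler runs that schedule over the whole model.
# Check bartpy/samplers/schedule.py for the exact constructor arguments
# before wiring in custom samplers.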

If all you want to customize is things like the priors or the number of trees, it is much easier to use the sklearn API.
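
For example, a sketch of setting common parameters through the sklearn API; n_trees, alpha, beta and n_jobs all appear in SklearnModel's constructor, though the values here are purely illustrative:

from bartpy.sklearnmodel import SklearnModel

model = SklearnModel(
    n_trees=50,  # size of the ensemble
    alpha=0.95,  # tree-depth prior: P(split at depth d) = alpha * (1 + d) ** -beta
    beta=2.0,
    n_jobs=4,    # number of chains to run in parallel
)
model.fit(X, y)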

Alternative libraries

References

[1] https://arxiv.org/abs/0806.3286
[2] http://www.gatsby.ucl.ac.uk/~balaji/pgbart_aistats15.pdf
[3] https://arxiv.org/ftp/arxiv/papers/1309/1309.1906.pdf
[4] https://cran.r-project.org/web/packages/BART/vignettes/computing.pdf


bartpy's Issues

TypeError trying to run test code from README.md

I'm using the master branch, and getting the following error when trying to run the basic code from README.md on my own data:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<timed exec> in <module>

~\AppData\Local\Continuum\anaconda3\envs\pymc3\lib\site-packages\bartpy\sklearnmodel.py in fit(self, X, y)
    131             self with trained parameter values
    132         """
--> 133         self.model = self._construct_model(X, y)
    134         self.extract = Parallel(n_jobs=self.n_jobs)(self.f_delayed_chains(X, y))
    135         self.combined_chains = self._combine_chains(self.extract)

~\AppData\Local\Continuum\anaconda3\envs\pymc3\lib\site-packages\bartpy\sklearnmodel.py in _construct_model(self, X, y)
    157         if len(X) == 0 or X.shape[1] == 0:
    158             raise ValueError("Empty covariate matrix passed")
--> 159         self.data = self._convert_covariates_to_data(X, y)
    160         self.sigma = Sigma(self.sigma_a, self.sigma_b, self.data.normalizing_scale)
    161         self.model = Model(self.data, self.sigma, n_trees=self.n_trees, alpha=self.alpha, beta=self.beta)

~\AppData\Local\Continuum\anaconda3\envs\pymc3\lib\site-packages\bartpy\sklearnmodel.py in _convert_covariates_to_data(X, y)
    152             X: pd.DataFrame = X
    153             X = X.values
--> 154         return Data(deepcopy(X), deepcopy(y), mask=np.zeros_like(X).astype(bool), normalize=True)
    155 
    156     def _construct_model(self, X: np.ndarray, y: np.ndarray) -> Model:

TypeError: __init__() got an unexpected keyword argument 'mask'

Removing mask=np.zeros_like(X).astype(bool) from line 154 of sklearnmodel.py eliminates the error.

Unable to access bartpy.samplers

I get a missing-module error when I try to import bartpy.sklearnmodel. That file imports bartpy.samplers, which cannot be found.

This is the error I am getting:

[/usr/local/lib/python3.10/dist-packages/bartpy/sklearnmodel.py](https://localhost:8080/#) in <module>
      9 from bartpy.data import Data
     10 from bartpy.model import Model
---> 11 from bartpy.samplers.leafnode import LeafNodeSampler
     12 from bartpy.samplers.modelsampler import ModelSampler, Chain
     13 from bartpy.samplers.schedule import SampleSchedule

ModuleNotFoundError: No module named 'bartpy.samplers'

This is unusual because I checked the library and the module is present

Tuple index out of range

I am using BartPy on Ubuntu with Python 3.7.3. I tried the following code snippet:

model = SklearnModel()
model.fit(np.array([1.0, 0.2, 3.0, 4.0]), np.array([1, 0, 0, 1]))

I got a "tuple index out of range" error.

Does anyone know how to fix this error?
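
A likely cause, offered here as an assumption rather than a confirmed answer: sklearn-style models expect a 2-D covariate matrix, and the X above is 1-D. Reshaping it into a single-column matrix is worth trying:

import numpy as np
from bartpy.sklearnmodel import SklearnModel

X = np.array([1.0, 0.2, 3.0, 4.0]).reshape(-1, 1)  # 4 samples, 1 feature
y = np.array([1, 0, 0, 1])

model = SklearnModel()
model.fit(X, y)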

predicting confidence intervals

Hi! One of the advantages of BART is that it provides confidence intervals (in addition to point estimates). Is there a way to obtain confidence intervals of model predictions using the bartpy library?
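
No confirmed answer appears on this page, but conceptually a BART credible interval comes from the spread of the posterior predictive samples. A hypothetical sketch: posterior_preds below stands in for an array of per-posterior-sample predictions (bartpy's exact attribute for this is not shown here):

import numpy as np

# Stand-in for per-sample posterior predictions, shape (n_samples, n_rows);
# with bartpy you would extract this from the fitted model's stored chains.
posterior_preds = np.random.normal(size=(1000, 50))

lower = np.percentile(posterior_preds, 2.5, axis=0)   # 95% credible interval,
upper = np.percentile(posterior_preds, 97.5, axis=0)  # row by row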

Questions about the calculation of likelihood ratio

Hello,
Thank you for your contributions, which have been a great help to me in understanding the BART model. I found something questionable about the calculation of the likelihood ratio when sampling tree-structure modifications. Please take a look at the function log_grow_ratio in the file bartpy/samplers/unconstrainedtree/likelihoodratio.py, line 20:

first_term = (var * (var + n * sigma_mu)) / ((var + n_l * var_mu) * (var + n_r * var_mu))

I think something is wrong with n * sigma_mu. It should be corrected to:

first_term = (var * (var + n * var_mu)) / ((var + n_l * var_mu) * (var + n_r * var_mu))

The same issue appears in the corresponding grow-ratio function in the file bartpy/samplers/oblivioustrees/likelihoodratio.py.

My reasoning is based on the paper "bartMachine: Machine Learning with Bayesian Additive Regression Trees", section A.1. I would greatly appreciate it if you could point out any mistake in my reasoning; I hope these thoughts are helpful.
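
For reference, the first (pre-log) term of the grow likelihood ratio, reconstructed here from the standard conjugate-normal marginal likelihood (so verify against bartMachine's appendix A.1), with \(\sigma^2\) the noise variance, \(\sigma_\mu^2\) the leaf prior variance, and \(n = n_\ell + n_r\):

\[
\sqrt{\frac{\sigma^2\left(\sigma^2 + n\,\sigma_\mu^2\right)}{\left(\sigma^2 + n_\ell\,\sigma_\mu^2\right)\left(\sigma^2 + n_r\,\sigma_\mu^2\right)}}
\]

Reading var as \(\sigma^2\) and var_mu as \(\sigma_\mu^2\), the same \(\sigma_\mu^2\) appears in all three factors, which agrees with the proposed correction.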

Difference with the alternative bartMachine

Hello,
It is great to have an implementation of BART in Python. However, when I compared this implementation with its R alternative bartMachine, the alternative gave more promising results.
If you have had time to explore it, could you explain the differences between your implementation and bartMachine's?

Thank you in advance :)

Validation of assumptions

BART makes assumptions about the residual term, which BartPy should be able to validate.

In particular:

  1. errors are normally distributed
  2. errors are homoskedastic

Given that the validity of these two assumptions doesn't depend on the details of the regressors (and hence the trees), this can likely be handed off to existing packages. With some interface layer, statsmodels could probably be used for this.
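
A minimal sketch of what such an interface layer might check, applying scipy and statsmodels directly to residuals (computing residuals as y - model.predict() is an assumption; the data below is a stand-in):

import numpy as np
from scipy import stats
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 3))    # stand-in covariates
residuals = rng.normal(size=200)  # stand-in for y - model.predict()

# 1. normality of errors: Shapiro-Wilk test
stat, p_normal = stats.shapiro(residuals)

# 2. homoskedasticity: Breusch-Pagan test (exog needs a constant column)
exog = np.column_stack([np.ones(len(X)), X])
lm_stat, p_hetero, f_stat, f_p = het_breuschpagan(residuals, exog)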

Cannot import module after successfully installing with pip

I'm not an expert, but I think that the package is not configured correctly.

To reproduce:

pip install bartpy
python
>>> import bartpy
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'bartpy'

Note that the following works (hence my suspicion that the package configuration is the problem):

git clone https://github.com/JakeColtman/bartpy ~/bin/bartpy
python
>>> import sys
>>> sys.path.append("<home>/bin/bartpy")
>>> import bartpy

Cannot import bartpy.samplers

from bartpy.sklearnmodel import SklearnModel
from bartpy.featureselection import SelectNullDistributionThreshold, SelectSplitProportionThreshold
from bartpy.diagnostics.features import *

ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input> in <module>
----> 1 from bartpy.sklearnmodel import SklearnModel
      2 from bartpy.featureselection import SelectNullDistributionThreshold, SelectSplitProportionThreshold
      3 from bartpy.diagnostics.features import *

~/miniconda3/envs/viz/lib/python3.7/site-packages/bartpy/sklearnmodel.py in <module>
      9 from bartpy.data import Data
     10 from bartpy.model import Model
---> 11 from bartpy.samplers.leafnode import LeafNodeSampler
     12 from bartpy.samplers.modelsampler import ModelSampler, Chain
     13 from bartpy.samplers.schedule import SampleSchedule

ModuleNotFoundError: No module named 'bartpy.samplers'

from bartpy.samplers import *

ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input> in <module>
----> 1 from bartpy.samplers import *

ModuleNotFoundError: No module named 'bartpy.samplers'

Softbart

Has there been any consideration given to implementing softbart?

Model predicting NaN

Hi,

Thank you for bart-py!

My BART model is predicting NaN for some cases. Does anyone know why this happens? or how I can prevent this?

My data has missing data but to my knowledge, BART can handle this. My data are finite.

Thank you!

Code:
(Sorry for the lengthy data generation)

import pandas as pd
import numpy as np
import random
import bartpy
from bartpy.sklearnmodel import SklearnModel
import sklearn
from sklearn.model_selection import train_test_split
from sklearn.model_selection import KFold

# simulate df with 46 features and 9000 rows
# create binary vars and make a df
label=np.random.randint(2, size=9000)
df = pd.DataFrame({'label':label})
df['a']=np.random.randint(2, size=9000)

# create integers
df['b'] = np.random.randint(low=4, high=97, size=9000)
df['c'] = np.random.randint(low=0, high=1759.22, size=9000)
df['d'] = np.random.randint(low=0, high=5702.2, size=9000)
df['e'] = np.random.randint(low=0, high=7172.31, size=9000)

# create numerics
df['f'] = np.random.uniform(0, 908.56, 9000)
df['g'] = np.random.uniform(0,2508.78, 9000)
df['h'] = np.random.uniform(0,3757.56, 9000)
df['i'] = np.random.uniform(0,560.18, 9000)
df['j'] = np.random.uniform(0,1362.71, 9000)
df['k'] = np.random.uniform(0,2578.26, 9000)
df['l'] = np.random.uniform(175.07,997, 9000)
df['m'] = np.random.uniform(992.39,3972.81, 9000)
df['n'] = np.random.uniform(1787.24,5823.21, 9000)
df['o'] = np.random.uniform(-56,53, 9000)
df['p'] = np.random.uniform(-47,46, 9000)
df['q'] = np.random.uniform(-1089.03,1546.87, 9000)
df['r'] = np.random.uniform(-1599.14,898.79, 9000)
df['s'] = np.random.uniform(-2871.02,5329, 9000)
df['t'] = np.random.uniform(-4231.44,2481.55, 9000)
df['u'] = np.random.uniform(-3435.9,5824.22, 9000)
df['v'] = np.random.uniform(-5086.6,4548.43, 9000)
df['w'] = np.random.uniform(-406.57,907.91, 9000)
df['x'] = np.random.uniform(-834.82,840.27, 9000)
df['y'] = np.random.uniform(-549.2,2506.29, 9000)
df['z'] = np.random.uniform(-1547.2,2434.18, 9000)
df['aa'] = np.random.uniform(-426.6,3636.17, 9000)
df['bb'] = np.random.uniform(-2819.8,3390, 9000)
df['cc'] = np.random.uniform(-266.75,527.81, 9000)
df['dd'] = np.random.uniform(-778.64,527.81, 9000)
df['ee'] = np.random.uniform(-476.09,1358.32, 9000)
df['ff'] = np.random.uniform(-1890.91,919.3, 9000)
df['gg'] = np.random.uniform(-1633.23,2577.01, 9000)
df['hh'] = np.random.uniform(-2427.93,2078.78, 9000)
df['ii'] = np.random.uniform(-339.67,518.32, 9000)
df['jj'] = np.random.uniform(-528.07,412, 9000)
df['kk'] = np.random.uniform(-1460.23,1610.58, 9000)
df['ll'] = np.random.uniform(-1984.08,1127.82, 9000)
df['mm'] = np.random.uniform(-2153.38,2402.24, 9000)
df['nn'] = np.random.uniform(-2311.27,1809.37, 9000)
df['oo'] = np.random.uniform(16,92, 9000)
df['pp'] = np.random.uniform(4,24, 9000)
df['qq'] = np.random.uniform(4,80, 9000)
df['rr'] = np.random.uniform(0,1, 9000)

# add missings to floats
# select only numeric columns to apply the missingness to
cols_list = df.select_dtypes('float64').columns.tolist()
        
# randomly remove cases from the dataframe
for col in df[cols_list]:
    df.loc[df.sample(frac=0.02).index, col] = np.nan

# 70/30 train/test split (train_size=0.7)
X_train, X_test, y_train, y_test = train_test_split(df.drop(['label'],axis=1), df['label'], train_size=0.7, random_state = 99)

# Modelling
model = SklearnModel(n_jobs = 30) 
model.fit(X_train, y_train) 

# Predictions
y_predictions = model.predict(X_test)
np.isnan(y_predictions).sum() 

What should be the shape of inputs, X and Y?

What should be the shapes of the inputs X and y?
When X has shape (4802, 79), i.e. 4802 samples and 79 covariates per sample, and y has shape (4802,), I get the error:

IndexError: too many indices for array

when I use SklearnModel.

Looking for answers.

Accessing the Ensemble

Hi. I am new to BART. I am not sure whether this implementation stores the trees of the ensemble, but if it does, can someone tell me how to access them?

Thanks

UnboundLocalError: local variable 'mutation' referenced before assignment

Hi Jake,
When running the example code below:
import numpy as np
import pandas as pd

from matplotlib import pyplot as plt

from bartpy.sklearnmodel import SklearnModel

x = np.random.normal(0, 0.5, size=1000)
X = pd.DataFrame({"x": x})
y = 2 * x + 4 * np.power(x, 3)
plt.scatter(x, y)

model = SklearnModel(n_jobs=1)

model.fit(X, y)
plt.scatter(x, model.predict())
plt.scatter(x, y)

I got the following error:
0%| | 0/200 [00:00<?, ?it/s]Starting burn
Traceback (most recent call last):

File "", line 15, in
model.fit(X, y)

File ".../BART/bartpy-master/bartpy-master/bartpy/sklearnmodel.py", line 134, in fit
self.extract = Parallel(n_jobs=self.n_jobs)(self.f_delayed_chains(X, y))

File ".../anaconda3/lib/python3.6/site-packages/joblib/parallel.py", line 921, in call
if self.dispatch_one_batch(iterator):

File ".../anaconda3/lib/python3.6/site-packages/joblib/parallel.py", line 759, in dispatch_one_batch
self._dispatch(tasks)

File ".../anaconda3/lib/python3.6/site-packages/joblib/parallel.py", line 716, in _dispatch
job = self._backend.apply_async(batch, callback=cb)

File ".../anaconda3/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 182, in apply_async
result = ImmediateResult(func)

File ".../anaconda3/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 549, in init
self.results = batch()

File ".../anaconda3/lib/python3.6/site-packages/joblib/parallel.py", line 225, in call
for func, args, kwargs in self.items]

File ".../anaconda3/lib/python3.6/site-packages/joblib/parallel.py", line 225, in
for func, args, kwargs in self.items]

File ".../BART/bartpy-master/bartpy-master/bartpy/sklearnmodel.py", line 31, in run_chain
model.store_acceptance_trace)

File ".../BART/bartpy-master/bartpy-master/bartpy/samplers/modelsampler.py", line 43, in samples
self.step(model, trace_logger)

File ".../BART/bartpy-master/bartpy-master/bartpy/samplers/modelsampler.py", line 26, in step
result = step()

File ".../BART/bartpy-master/bartpy-master/bartpy/samplers/schedule.py", line 48, in
yield "Tree", lambda: self.tree_sampler.step(model, tree)

File ".../BART/bartpy-master/bartpy-master/bartpy/samplers/unconstrainedtree/treemutation.py", line 47, in step
mutation = self.sample(model, tree)

File ".../BART/bartpy-master/bartpy-master/bartpy/samplers/unconstrainedtree/treemutation.py", line 40, in sample
ratio = self.likihood_ratio.log_probability_ratio(model, tree, proposal)

File ".../BART/bartpy-master/bartpy-master/bartpy/samplers/treemutation.py", line 80, in log_probability_ratio
return self.log_transition_ratio(tree, mutation) + self.log_likihood_ratio(model, tree, mutation) + self.log_tree_ratio(model, tree, mutation)

File ".../BART/bartpy-master/bartpy-master/bartpy/samplers/unconstrainedtree/likihoodratio.py", line 69, in log_likihood_ratio
#mutation: GrowMutation = mutation

UnboundLocalError: local variable 'mutation' referenced before assignment

Do you know where the problem is? Thank you.
xiangwei

New PyPI release?

Thank you so much for this cool package.

Would you consider creating a new PyPI release to incorporate recent commits?

Thanks again!

ModuleNotFoundError: No module named 'bartpy.samplers'

On Python 3.6 I run into the following:

Traceback (most recent call last):
    import bartpy.sklearnmodel
    from bartpy.samplers.leafnode import LeafNodeSampler
ModuleNotFoundError: No module named 'bartpy.samplers'

This is for the PyPI version (v0.0.2). I also ran into the same issue when installing directly from GitHub.

UnboundLocalError: local variable 'mutation' referenced before assignment

Hi,

I just installed bartpy locally on my machine and am trying to run the example code ols.py. I am getting the following error; please advise.

0%| | 0/50 [00:00<?, ?it/s]2020-01-27 13:38:26.026932
Starting burn

Traceback (most recent call last):

File "", line 1, in
runfile('/Users/anita/Bayesian/bartpy-master/examples/ols.py', wdir='/Users/anita/Bayesian/bartpy-master/examples')

File "/Users/anita/opt/anaconda3/lib/python3.7/site-packages/spyder_kernels/customize/spydercustomize.py", line 827, in runfile
execfile(filename, namespace)

File "/Users/anita/opt/anaconda3/lib/python3.7/site-packages/spyder_kernels/customize/spydercustomize.py", line 110, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)

File "/Users/anita/Bayesian/bartpy-master/examples/ols.py", line 26, in
model, x, y = run(0.95, 2., 20, 5)

File "/Users/anita/Bayesian/bartpy-master/examples/ols.py", line 15, in run
model.fit(X, y)

File "/Users/anita/opt/anaconda3/lib/python3.7/site-packages/bartpy-0.0.2-py3.7.egg/bartpy/sklearnmodel.py", line 134, in fit
self.extract = Parallel(n_jobs=self.n_jobs)(self.f_delayed_chains(X, y))

File "/Users/anita/opt/anaconda3/lib/python3.7/site-packages/joblib/parallel.py", line 921, in call
if self.dispatch_one_batch(iterator):

File "/Users/anita/opt/anaconda3/lib/python3.7/site-packages/joblib/parallel.py", line 759, in dispatch_one_batch
self._dispatch(tasks)

File "/Users/anita/opt/anaconda3/lib/python3.7/site-packages/joblib/parallel.py", line 716, in _dispatch
job = self._backend.apply_async(batch, callback=cb)

File "/Users/anita/opt/anaconda3/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 182, in apply_async
result = ImmediateResult(func)

File "/Users/anita/opt/anaconda3/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 549, in init
self.results = batch()

File "/Users/anita/opt/anaconda3/lib/python3.7/site-packages/joblib/parallel.py", line 225, in call
for func, args, kwargs in self.items]

File "/Users/anita/opt/anaconda3/lib/python3.7/site-packages/joblib/parallel.py", line 225, in
for func, args, kwargs in self.items]

File "/Users/anita/opt/anaconda3/lib/python3.7/site-packages/bartpy-0.0.2-py3.7.egg/bartpy/sklearnmodel.py", line 31, in run_chain
model.store_acceptance_trace)

File "/Users/anita/opt/anaconda3/lib/python3.7/site-packages/bartpy-0.0.2-py3.7.egg/bartpy/samplers/modelsampler.py", line 43, in samples
self.step(model, trace_logger)

File "/Users/anita/opt/anaconda3/lib/python3.7/site-packages/bartpy-0.0.2-py3.7.egg/bartpy/samplers/modelsampler.py", line 26, in step
result = step()

File "/Users/anita/opt/anaconda3/lib/python3.7/site-packages/bartpy-0.0.2-py3.7.egg/bartpy/samplers/schedule.py", line 48, in
yield "Tree", lambda: self.tree_sampler.step(model, tree)

File "/Users/anita/opt/anaconda3/lib/python3.7/site-packages/bartpy-0.0.2-py3.7.egg/bartpy/samplers/unconstrainedtree/treemutation.py", line 47, in step
mutation = self.sample(model, tree)

File "/Users/anita/opt/anaconda3/lib/python3.7/site-packages/bartpy-0.0.2-py3.7.egg/bartpy/samplers/unconstrainedtree/treemutation.py", line 40, in sample
ratio = self.likihood_ratio.log_probability_ratio(model, tree, proposal)

File "/Users/anita/opt/anaconda3/lib/python3.7/site-packages/bartpy-0.0.2-py3.7.egg/bartpy/samplers/treemutation.py", line 80, in log_probability_ratio
return self.log_transition_ratio(tree, mutation) + self.log_likihood_ratio(model, tree, mutation) + self.log_tree_ratio(model, tree, mutation)

File "/Users/anita/opt/anaconda3/lib/python3.7/site-packages/bartpy-0.0.2-py3.7.egg/bartpy/samplers/unconstrainedtree/likihoodratio.py", line 69, in log_likihood_ratio
mutation: GrowMutation = mutation

UnboundLocalError: local variable 'mutation' referenced before assignment

No Module named bartpy.samplers

Open to "import SklearnModel"

import bartpy
from bartpy.sklearnmodel import SklearnModel
ModuleNotFoundError                       Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_1232/2717476998.py in <module>
----> 1 from bartpy.sklearnmodel import SklearnModel

~\Anaconda3\envs\env\lib\site-packages\bartpy\sklearnmodel.py in <module>
      9 from bartpy.data import Data
     10 from bartpy.model import Model
---> 11 from bartpy.samplers.leafnode import LeafNodeSampler
     12 from bartpy.samplers.modelsampler import ModelSampler, Chain
     13 from bartpy.samplers.schedule import SampleSchedule

ModuleNotFoundError: No module named 'bartpy.samplers'

Easy way to access PPD from Sklearn API?

Thanks for this implementation - huge time-saver. I'm trying to obtain the structure of the resulting posterior predictive distribution. I've been unable to find an attribute call in SklearnModel() for doing so and am wondering if I'm missing it/looking in the wrong place. TIA

Multiple Parallel chains

To assess convergence to the true distribution, it is useful to run several chains from different starting points. At the moment, this requires manually running the model multiple times in parallel.

There are two related parts of this issue:

  1. conceptually support multiple chains in the API
  2. use multiprocessing or the like to parallelize
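
A hypothetical sketch of the manual workaround in the meantime, fitting independent chains with joblib (per-process seeding via np.random.seed is a crude assumption; bartpy may not expose a seed parameter):

import numpy as np
from joblib import Parallel, delayed
from bartpy.sklearnmodel import SklearnModel

def fit_one_chain(X, y, seed):
    np.random.seed(seed)            # crude per-process seeding
    model = SklearnModel(n_jobs=1)  # one chain per process
    model.fit(X, y)
    return model

X = np.random.uniform(size=(100, 3))
y = X[:, 0] + np.random.normal(scale=0.1, size=100)
chains = Parallel(n_jobs=4)(delayed(fit_one_chain)(X, y, s) for s in range(4))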

'numpy.ndarray' object has no attribute 'normalizing_scale'

If I use:

model = SklearnModel()
model.fit(X_train, y_train.to_numpy())

I get the error message 'numpy.ndarray' object has no attribute 'normalizing_scale'.

If I use:

model = SklearnModel
model.fit(X_train, y_train.to_numpy())

I get the error message fit() missing 1 required positional argument: 'y' (expected here, since this version never instantiates the class, so fit is called on the class itself).

X_train is a pandas DataFrame.

about an exception

Hi, Jake @JakeColtman,

After pulling bartpy and running it in PyCharm, I got the following errors, which prevent me from fitting a forecasting (regression) model where X is 50 by 10 and y is 50 by 1.
The error is:

Starting burn
0%| | 0/200 [00:00<?, ?it/s]
Starting burn
0%| | 0/200 [00:00<?, ?it/s]
Starting burn
0%| | 0/200 [00:00<?, ?it/s]
joblib.externals.loky.process_executor._RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/tairen/.local/lib/python3.6/site-packages/joblib/externals/loky/process_executor.py", line 418, in _process_worker
r = call_item()
File "/home/tairen/.local/lib/python3.6/site-packages/joblib/externals/loky/process_executor.py", line 272, in call
return self.fn(*self.args, **self.kwargs)
File "/home/tairen/.local/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 567, in call
return self.func(*args, **kwargs)
File "/home/tairen/.local/lib/python3.6/site-packages/joblib/parallel.py", line 225, in call
for func, args, kwargs in self.items]
File "/home/tairen/.local/lib/python3.6/site-packages/joblib/parallel.py", line 225, in
for func, args, kwargs in self.items]
File "/home/tairen/PycharmProjects/bart/samplers/modelsampler.py", line 24, in samples
self.step(model)
File "/home/tairen/PycharmProjects/bart/samplers/modelsampler.py", line 19, in step
step()
File "/home/tairen/PycharmProjects/bart/samplers/schedule.py", line 47, in
yield lambda: self.tree_sampler.step(model, tree)
File "/home/tairen/PycharmProjects/bart/samplers/treemutation/treemutation.py", line 42, in step
mutation = self.sample(model, tree)
File "/home/tairen/PycharmProjects/bart/samplers/treemutation/treemutation.py", line 36, in sample
if np.log(np.random.uniform(0, 1)) < ratio:
File "/home/tairen/anaconda3/lib/python3.6/site-packages/pandas/core/generic.py", line 1573, in nonzero
.format(self.class.name))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

""The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/tairen/PycharmProjects/bart/testmain.py", line 13, in
model.fit(X, y) # Fit the model
File "/home/tairen/PycharmProjects/bart/sklearnmodel.py", line 110, in fit
self.extract = Parallel(n_jobs=self.n_jobs)(self.delayed_chains(X, y))
File "/home/tairen/.local/lib/python3.6/site-packages/joblib/parallel.py", line 934, in call
self.retrieve()
File "/home/tairen/.local/lib/python3.6/site-packages/joblib/parallel.py", line 833, in retrieve
self._output.extend(job.get(timeout=self.timeout))
File "/home/tairen/.local/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 521, in wrap_future_result
return future.result(timeout=timeout)
File "/home/tairen/anaconda3/lib/python3.6/concurrent/futures/_base.py", line 432, in result
return self.__get_result()
File "/home/tairen/anaconda3/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Can you tell me some hints?
Thank you!

Classification

At the moment, BartPy only supports regression. Chipman et al. [1] detail how to extend the model to classification.

Support feature importance / variable selection

In many real-world use cases, it's important to be able to identify the truly important features.

Implementing some of the approaches of https://repository.upenn.edu/cgi/viewcontent.cgi?article=1555&context=statistics_papers seems like a good start.

A side constraint is that the solution should scale to large datasets, which might pose a problem for the permutation approach. It would possibly be useful to have two modes: a fully principled one and a rough-and-ready one for large datasets.
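
As a rough-and-ready baseline, sklearn's model-agnostic permutation importance should work with any sklearn-compatible regressor; a sketch, assuming the fitted SklearnModel plays well with sklearn >= 0.22 (an explicit scoring string avoids relying on a score method):

import numpy as np
from sklearn.inspection import permutation_importance
from bartpy.sklearnmodel import SklearnModel

X = np.random.uniform(size=(200, 5))
y = 3 * X[:, 0] + np.random.normal(scale=0.1, size=200)  # only feature 0 matters

model = SklearnModel()
model.fit(X, y)
result = permutation_importance(model, X, y, n_repeats=5,
                                scoring="neg_mean_squared_error")
print(result.importances_mean)  # expected to be largest for feature 0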

Feature Request: Converting to Cython

Since the code is written in pure python, a simple step at the end to improve performance is to compile it all using Cython.
It's relatively simple to set up and the speedups are great for looped code, so it might be worth looking into.
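
A minimal sketch of what that could look like using Cython's pure-Python mode (the glob and directives are illustrative; real speedups would also need profiling and type annotations in the hot loops):

# setup.py -- compile bartpy's pure-Python modules with Cython (sketch)
from setuptools import setup
from Cython.Build import cythonize

setup(
    name="bartpy",
    ext_modules=cythonize(
        ["bartpy/**/*.py"],  # glob over the package's modules
        compiler_directives={"language_level": "3"},
    ),
)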

sklearn is deprecated

Collecting sklearn
Using cached sklearn-0.0.post12.tar.gz (2.6 kB)
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [15 lines of output]
The 'sklearn' PyPI package is deprecated, use 'scikit-learn'
rather than 'sklearn' for pip commands.

  Here is how to fix this error in the main use cases:
  - use 'pip install scikit-learn' rather than 'pip install sklearn'
  - replace 'sklearn' by 'scikit-learn' in your pip requirements files
    (requirements.txt, setup.py, setup.cfg, Pipfile, etc ...)
  - if the 'sklearn' package is used by one of your dependencies,
    it would be great if you take some time to track which package uses
    'sklearn' instead of 'scikit-learn' and report it to their issue tracker
  - as a last resort, set the environment variable
    SKLEARN_ALLOW_DEPRECATED_SKLEARN_PACKAGE_INSTALL=True to avoid this error

  More information is available at
  https://github.com/scikit-learn/sklearn-pypi-package
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
