
Hundred Hammers

"At least one of them is bound to do the trick."

Hundred Hammers is a Python package that helps you batch-test ML models in a dataset. It can be used out-of-the-box to run most popular ML models and metrics, or it can be easily extended to include your own.

  • Supports both classification and regression.
  • Comes with most scikit-learn models out of the box.
  • Includes several plots to visualize the results.
  • Easy to integrate with hyperparameter tuning via GridSearchCV.
  • Reports average metrics for the train, test, validation (train), and validation (test) sets.
  • Lets you define how many random seeds to consider, so you can increase the significance of your results.
  • Produces a Pandas DataFrame with the results (which can be exported to CSV and analyzed elsewhere).

Installation

The recommended way to install the library is through pip install hundred_hammers. However, if you want to fiddle around with the repo yourself, you can clone this repository and run pip install -e hundred_hammers/.

Documentation

The documentation can be found on ReadTheDocs. Code is formatted using Black with a line length of 150.

Examples

Full examples can be found in the examples directory. As an appetizer, here is a simple example of using Hundred Hammers to run a batch classification on the Iris data:

from hundred_hammers.classifier import HundredHammersClassifier
from hundred_hammers.plots import plot_batch_results
from sklearn.datasets import load_iris

data = load_iris()
X, y = data.data, data.target

hh = HundredHammersClassifier()
df_results = hh.evaluate(X, y)

plot_batch_results(df_results, metric_name="Accuracy", title="Iris Dataset")

This already gives us a DataFrame with the results from several different models, plus a plot summarizing them.
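The result is a plain pandas DataFrame, so it can be inspected and exported directly; the file name below is just an example:

print(df_results.head())

# Save the results for later analysis (e.g. in a spreadsheet or another notebook).
df_results.to_csv("iris_batch_results.csv", index=False)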

Other plots

We can also use Hundred Hammers to produce confusion matrix plots and regression prediction plots:

from hundred_hammers.plots import plot_confusion_matrix
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

data = load_iris()
X, y = data.data, data.target
plot_confusion_matrix(X, y, class_dict={0: "Setosa", 1: "Versicolor", 2: "Virginica"},
                      model=DecisionTreeClassifier(), title="Iris Dataset")

from hundred_hammers.plots import plot_regression_pred
from sklearn.datasets import load_diabetes
from sklearn.metrics import mean_squared_error
from sklearn.dummy import DummyRegressor
from sklearn.tree import DecisionTreeRegressor

data = load_diabetes()
X, y = data.data, data.target

# Stand-in for the best model found in a previous Hundred Hammers run.
best_model = DecisionTreeRegressor(max_depth=3)

plot_regression_pred(X, y, models=[DummyRegressor(strategy='median'), best_model], metric=mean_squared_error,
                     title="Diabetes", y_label="Diabetes (Value)")

Finally, it is also possible to compare results across different datasets (in the plot below, each dot is a model).

import pandas as pd

from hundred_hammers.classifier import HundredHammersClassifier
from hundred_hammers.plots import plot_multiple_datasets
from sklearn.datasets import load_iris

data = load_iris()
X, y = data.data, data.target

hh = HundredHammersClassifier()

df = []
for i, feature_name in enumerate(data.feature_names):
    X_i = X[:, [j for j in range(X.shape[1]) if j != i]]

    for degree in range(8):
        df_i = hh.evaluate(X_i ** degree, y, optim_hyper=False)
        df_i["Dataset"] = f"$X^{degree}$, w/out $x_{i}$"
        df.append(df_i)

df_results = pd.concat(df, ignore_index=True)
plot_multiple_datasets(df_results, metric_name="Avg ACC (Validation Test)", id_col="Dataset", title="Iris Dataset", display=True)

How is the data used?

By default, Hundred Hammers will split the data into train and test. If the user defines a normalization procedure (through the input_transform parameter), then normalization will be fitted to the training data and applied to both partitions. Next, if the user enabled hyperparameter optimization, the training data is used to fit the hyperparameters of each model, through a Grid Search with n_folds_tune folds. The model is then trained on the training data and evaluated on both partitions to produce the train and test results.
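A sketch of this flow, assuming input_transform and n_folds_tune are constructor arguments (optim_hyper and n_grid_points appear as evaluate() arguments elsewhere on this page):

from sklearn.preprocessing import StandardScaler
from hundred_hammers.classifier import HundredHammersClassifier

# Assumed constructor arguments, based on the parameter names described above;
# the scaler is fitted on the training data and applied to both partitions.
hh = HundredHammersClassifier(input_transform=StandardScaler(), n_folds_tune=5)
df_results = hh.evaluate(X, y, optim_hyper=True, n_grid_points=10)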

As is standard in ML, the training data is also used in a cross-validation fashion, according to the cross-validator passed by the user (through the cross_validator parameter). The user-defined metrics are then averaged over the cross-validation folds to produce both the validation (train) and validation (test) results.

Two DataFrames are provided to the user: a full report (hh._full_report) with the results for each model, seed, and cross-validation fold; and a summary report (hh._report) with the average results for each model.
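After a call to evaluate, accessing both reports looks like this (the attribute names are the ones given above):

df_full = hh._full_report    # one row per model, seed, and cross-validation fold
df_summary = hh._report      # one row per model, with averaged metrics
print(len(df_full), len(df_summary))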

Furthermore, with flexibility in mind, Hundred Hammers also allows the user to define how many seeds will be tested and averaged for both training and validation splitting. This is done through the n_train_evals and n_val_evals parameters, which are both 1 by default (i.e. a single train/test split is done, and inside the training data, a single cross-validation scheme is run).
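For example, a more thorough run could look like the sketch below; treating n_train_evals, n_val_evals, and cross_validator as constructor arguments is an assumption based on the parameter names used in this section:

from sklearn.model_selection import KFold
from hundred_hammers.classifier import HundredHammersClassifier

# Assumed constructor arguments (see the description above).
hh = HundredHammersClassifier(
    cross_validator=KFold(n_splits=5),  # cross-validation scheme run on the training data
    n_train_evals=5,                    # average over 5 different train/test splits
    n_val_evals=3,                      # 3 cross-validation runs inside each training set
)
df_results = hh.evaluate(X, y)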

Since the usage of data is key, we provide below an image to illustrate how the data is used:

Contributors

eugeniolr, vgarciasc


Issues

Data snooping on hyperparameter optimization

When doing hyperparameter optimization, every point in the dataset is used; however, some of those points are later used to compute the test performance.

def evaluate(self, X, y, optim_hyper, n_grid_points):
    if optim_hyper:
        new_models = self.tune_models(X, y, n_grid_points)
    else:
        new_models = deepcopy(self.models)

    report, trained_models = self._evaluate_models(X, y, new_models)
    ...

I hadn't thought about this until now, but I think it qualifies as data snooping. I think the correct way would be to tune a model only on the training data, and then evaluate it on the training and test data. This would involve some non-trivial rewriting, but it seems to be the only correct way to proceed, since the current results with optim_hyper=True may not be reproducible for true out-of-sample data. What do you think?
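For reference, the split-before-tuning pattern the fix would follow, shown here with plain scikit-learn rather than HH's internals:

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Hold out the test partition before tuning, so the grid search never sees it.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

search = GridSearchCV(DecisionTreeClassifier(), {"max_depth": [2, 3, 5, None]}, cv=5)
search.fit(X_train, y_train)                 # tuned on training data only
print(search.best_params_)
print(search.score(X_test, y_test))          # test data only touched at evaluation time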

Add normalization options

It is very common to normalize data before training any models. However, it is necessary to do this carefully, using only the training set to fit the normalization that is then applied to the test set. Since HH abstracts away the whole training/testing separation process, it is currently not possible for the user to normalize data correctly. It would be interesting to allow the user to specify something similar to a scikit-learn Pipeline.
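For reference, the scikit-learn Pipeline pattern alluded to here fits the scaler on the training data only and reuses those statistics on the test data:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# fit() learns the scaling statistics from the training data only;
# score() applies those same statistics to the test data.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
print(model.score(X_test, y_test))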

Using sphinx to generate the documentation as an HTML page

Sphinx is a very popular library used to gather all the docstrings of each function and class into a single document/web page.

When this is done, we can also consider uploading the documentation to the internet, probably through "readthedocs".

Provide an optional progress bar

It would be useful to provide an optional progress bar when calling evaluate. Not for aesthetic pleasure, but mainly to help users ascertain the ETA for the training to conclude. Libraries like rich and tqdm make this very easy.
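A minimal tqdm sketch of the kind of loop involved; the model list and sleep are stand-ins for HH's internal evaluation loop:

from time import sleep
from tqdm import tqdm

models = ["KNN", "Decision Tree", "SVC", "Random Forest"]  # stand-in for HH's model list
for name in tqdm(models, desc="Evaluating models"):
    sleep(0.5)  # placeholder for training and evaluating one model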

bug: float object is not subscriptable

Hello one more time!

I was trying the notebooks from examples, and I love them. But I have an error with one of the plots (the other ones work perfectly).

It was the plot_batch_results call in the example_regression.ipynb notebook.

This is the trace of the error. I'm using a specific conda environment for hundred-hammers 1.0.5, so if you need it, I can provide information on installed packages, the Python version, or other things like that. Maybe I'm the problem and I'm missing some kind of dependency, I don't know.

TypeError                                 Traceback (most recent call last)
Cell In[18], line 1
----> 1 hh.plot_batch_results(df_results, metric_name="MSE", title="Iris Dataset", display=False)

File ~/anaconda3/envs/HH/lib/python3.10/site-packages/hundred_hammers/plots.py:165, in plot_batch_results(df, metric_name, title, filepath, display)
    163     text = plt.text(row[x_label] + 0.005, row[y_label] + 0.001, row["Model"], fontsize=12)
    164     texts.append(text)
--> 165 adjust_text(texts, arrowprops=dict(arrowstyle="-", color='k', lw=0.5), force_text=1.0, force_points=1.0)
    167 plt.title(title, fontweight='bold')
    168 plt.grid()

File ~/anaconda3/envs/HH/lib/python3.10/site-packages/adjustText/__init__.py:536, in adjust_text(texts, x, y, objects, avoid_self, force_text, force_static, force_pull, force_explode, expand, explode_radius, ensure_inside_axes, expand_axes, only_move, ax, min_arrow_len, time_lim, iter_lim, *args, **kwargs)
    533 i = 0
    534 while error > 0:
    535     # expand = expands[min(i, expand_steps-1)]
--> 536     coords, error = iterate(
    537         coords,
    538         orig_xy_disp_coord,
    539         static_coords,
    540         force_text=force_text,
    541         force_static=force_static,
    542         force_pull=force_pull,
    543         expand=expand,
    544         bbox_to_contain=ax_bbox,
...
--> 279 text_shifts_x *= force_text[0]
    280 text_shifts_y *= force_text[1]
    281 static_shifts_x *= force_static[0]

TypeError: 'float' object is not subscriptable

Rewrite examples as Jupyter Notebooks

The current examples are great: simple, clear, and each shows off one feature of the project. However, I think they would benefit from being Jupyter Notebooks, since this would allow people to take a look at the results without running the code themselves.

Add project to PyPI

Eventually, the project should be added to PyPI to facilitate installation.

Examples in Documentation

The notebooks from examples are very good.

I think it would be clearer and more user-friendly if the ReadTheDocs documentation had the .ipynb examples as web pages. What I mean is to use the Jupyter Notebook integration to make them look something like this.

Using black to automatically format the code

I recently found the black Python package, which automatically formats Python code to follow the PEP 8 guidelines. It also makes the linter complain less, which is good too. It also works for Jupyter notebooks, which is really cool.

I also find the default maximum line length a bit short; I personally like to use lines of 150 characters. This is specified using the --line-length argument on the command line.

I think we can use this from now on if you want. But it's better if we start using it once all the pull requests are closed, since it changes a lot of code and otherwise it will be hard to see which changes were just for formatting and which ones were for the actual pull request.

Progress bars are still broken on jupyter notebooks

I was using the library in a notebook executed in the jupyter-lab app, and the progress bar that is shown to the user is broken and unreadable.


The solution might be, once again, to switch to another library for displaying progress bars. I'll see if I can figure something out.

Update README.md with optim_hyper

While I don't think it is necessary to show off every feature in the README, this one is important enough to merit a mention, so the user can see how easy it is to optimize hyperparameters (and the impact this has).

Add more types of hyperparameter optimization

As commented in #1, defining the type and bounds of the hyperparameters allows us to use different hyperparameter optimization techniques.

To start, we could implement random search and a simple hill-climbing algorithm. The implementation should make it easy to add new methods later.
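As a reference point for the first option, scikit-learn already ships random search over a parameter grid via RandomizedSearchCV; an HH implementation could expose a similar interface (the grid below is illustrative):

from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Sample 8 random configurations instead of exhausting the whole grid.
search = RandomizedSearchCV(
    DecisionTreeClassifier(),
    param_distributions={"max_depth": [2, 3, 5, 10, None], "min_samples_split": [2, 5, 10]},
    n_iter=8,
    cv=5,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_)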
