
Hundred Hammers

"At least one of them is bound to do the trick."

Hundred Hammers is a Python package that helps you batch-test ML models in a dataset. It can be used out-of-the-box to run most popular ML models and metrics, or it can be easily extended to include your own.

  • Supports both classification and regression.
  • Comes with most scikit-learn models out of the box.
  • Includes several plots to visualize the results.
  • Easy to integrate with hyperparameter tuning via GridSearchCV.
  • Reports average metrics for the train, test, validation (train), and validation (test) sets.
  • Lets you define how many random seeds to consider, so you can increase the significance of your results.
  • Produces a Pandas DataFrame with the results (which can be exported to CSV and analyzed elsewhere).

Installation

The recommended way to install the library is through pip install hundred_hammers. However, if you want to fiddle around with the repo yourself, you can clone this repository and run pip install -e hundred_hammers/.

Documentation

The documentation can be found on ReadTheDocs. Code is formatted using Black with a line length of 150.

Examples

Full examples can be found in the examples directory. As an appetizer, here is a simple example of using Hundred Hammers to run a batch classification on the Iris data:

from hundred_hammers.classifier import HundredHammersClassifier
from hundred_hammers.plots import plot_batch_results
from sklearn.datasets import load_iris

data = load_iris()
X, y = data.data, data.target

hh = HundredHammersClassifier()
df_results = hh.evaluate(X, y)

plot_batch_results(df_results, metric_name="Accuracy", title="Iris Dataset")

This already gives us a DataFrame with the results from several different models, plus a plot summarizing them.
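The result is a plain pandas DataFrame, so it can be inspected and exported directly; the file name below is just an example:

print(df_results.head())

# Save the results for later analysis (e.g. in a spreadsheet or another notebook).
df_results.to_csv("iris_batch_results.csv", index=False)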

Other plots

We can also use Hundred Hammers to produce confusion matrix plots and regression prediction plots:

from hundred_hammers.plots import plot_confusion_matrix
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

data = load_iris()
X, y = data.data, data.target
plot_confusion_matrix(X, y, class_dict={0: "Setosa", 1: "Versicolor", 2: "Virginica"},
                      model=DecisionTreeClassifier(), title="Iris Dataset")

from hundred_hammers.plots import plot_regression_pred
from sklearn.datasets import load_diabetes
from sklearn.metrics import mean_squared_error
from sklearn.dummy import DummyRegressor
from sklearn.tree import DecisionTreeRegressor

data = load_diabetes()
X, y = data.data, data.target

# Stand-in for the best model found in a previous Hundred Hammers run.
best_model = DecisionTreeRegressor(max_depth=3)

plot_regression_pred(X, y, models=[DummyRegressor(strategy='median'), best_model], metric=mean_squared_error,
                     title="Diabetes", y_label="Diabetes (Value)")

Finally, it is also possible to compare results across different datasets (in the plot below, each dot is a model).

import pandas as pd

from hundred_hammers.classifier import HundredHammersClassifier
from hundred_hammers.plots import plot_multiple_datasets
from sklearn.datasets import load_iris

data = load_iris()
X, y = data.data, data.target

hh = HundredHammersClassifier()

df = []
for i, feature_name in enumerate(data.feature_names):
    X_i = X[:, [j for j in range(X.shape[1]) if j != i]]

    for degree in range(8):
        df_i = hh.evaluate(X_i ** degree, y, optim_hyper=False)
        df_i["Dataset"] = f"$X^{degree}$, w/out $x_{i}$"
        df.append(df_i)

df_results = pd.concat(df, ignore_index=True)
plot_multiple_datasets(df_results, metric_name="Avg ACC (Validation Test)", id_col="Dataset", title="Iris Dataset", display=True)

How is the data used?

By default, Hundred Hammers will split the data into train and test. If the user defines a normalization procedure (through the input_transform parameter), then normalization will be fitted to the training data and applied to both partitions. Next, if the user enabled hyperparameter optimization, the training data is used to fit the hyperparameters of each model, through a Grid Search with n_folds_tune folds. The model is then trained on the training data and evaluated on both partitions to produce the train and test results.
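A sketch of this flow, assuming input_transform and n_folds_tune are constructor arguments (optim_hyper and n_grid_points appear as evaluate() arguments elsewhere on this page):

from sklearn.preprocessing import StandardScaler
from hundred_hammers.classifier import HundredHammersClassifier

# Assumed constructor arguments, based on the parameter names described above;
# the scaler is fitted on the training data and applied to both partitions.
hh = HundredHammersClassifier(input_transform=StandardScaler(), n_folds_tune=5)
df_results = hh.evaluate(X, y, optim_hyper=True, n_grid_points=10)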

As is standard in ML, the training data is also used in a cross-validation fashion, according to the cross-validator passed by the user (through the cross_validator parameter). The user-defined metrics are then averaged over the cross-validation folds to produce both the validation (train) and validation (test) results.

Two DataFrames are provided to the user: a full report (hh._full_report) with the results for each model, seed, and cross-validation fold; and a summary report (hh._report) with the average results for each model.
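After a call to evaluate, accessing both reports looks like this (the attribute names are the ones given above):

df_full = hh._full_report    # one row per model, seed, and cross-validation fold
df_summary = hh._report      # one row per model, with averaged metrics
print(len(df_full), len(df_summary))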

Furthermore, with flexibility in mind, Hundred Hammers also allows the user to define how many seeds will be tested and averaged for both training and validation splitting. This is done through the n_train_evals and n_val_evals parameters, which are both 1 by default (i.e. a single train/test split is done, and inside the training data, a single cross-validation scheme is run).
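For example, a more thorough run could look like the sketch below; treating n_train_evals, n_val_evals, and cross_validator as constructor arguments is an assumption based on the parameter names used in this section:

from sklearn.model_selection import KFold
from hundred_hammers.classifier import HundredHammersClassifier

# Assumed constructor arguments (see the description above).
hh = HundredHammersClassifier(
    cross_validator=KFold(n_splits=5),  # cross-validation scheme run on the training data
    n_train_evals=5,                    # average over 5 different train/test splits
    n_val_evals=3,                      # 3 cross-validation runs inside each training set
)
df_results = hh.evaluate(X, y)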

Since the usage of data is key, we provide below an image to illustrate how the data is used:

Contributors

eugeniolr, vgarciasc


Issues

Data snooping on hyperparameter optimization

When doing hyperparameter optimization, every point in the dataset is used; however, some of those points are later used to compute the test performance.

def evaluate(self, X, y, optim_hyper, n_grid_points):
    if optim_hyper:
        new_models = self.tune_models(X, y, n_grid_points)
    else:
        new_models = deepcopy(self.models)

    report, trained_models = self._evaluate_models(X, y, new_models)
    ...

I hadn't thought about this until now, but I think it qualifies as data snooping. I think the correct way would be to tune a model only on the training data, and then evaluate it on the training and test data. This would involve some non-trivial rewriting, but it seems to be the only correct way to proceed, since the current results with optim_hyper=True may not be reproducible for true out-of-sample data. What do you think?
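For reference, the split-before-tuning pattern the fix would follow, shown here with plain scikit-learn rather than HH's internals:

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Hold out the test partition before tuning, so the grid search never sees it.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

search = GridSearchCV(DecisionTreeClassifier(), {"max_depth": [2, 3, 5, None]}, cv=5)
search.fit(X_train, y_train)                 # tuned on training data only
print(search.best_params_)
print(search.score(X_test, y_test))          # test data only touched at evaluation time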

Add normalization options

It is very common to normalize data before training any models. However, it is necessary to do this carefully, using only the training set to fit the normalization that is then applied to the test set. Since HH abstracts away the whole training/testing separation process, it is currently not possible for the user to normalize data correctly. It would be interesting to allow the user to specify something similar to a scikit-learn Pipeline.
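For reference, the scikit-learn Pipeline pattern alluded to here fits the scaler on the training data only and reuses those statistics on the test data:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# fit() learns the scaling statistics from the training data only;
# score() applies those same statistics to the test data.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
print(model.score(X_test, y_test))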

Using sphinx to generate the documentation as an HTML page

Sphinx is a very popular library used to gather all the docstrings of each function and class into a single document/web page.

When this is done, we can also consider uploading the documentation to the internet, probably through "readthedocs".

Provide an optional progress bar

It would be useful to provide an optional progress bar when calling evaluate. Not for aesthetic pleasure, but mainly to help users ascertain the ETA for the training to conclude. Libraries like rich and tqdm make this very easy.
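A minimal tqdm sketch of the kind of loop involved; the model list and sleep are stand-ins for HH's internal evaluation loop:

from time import sleep
from tqdm import tqdm

models = ["KNN", "Decision Tree", "SVC", "Random Forest"]  # stand-in for HH's model list
for name in tqdm(models, desc="Evaluating models"):
    sleep(0.5)  # placeholder for training and evaluating one model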

bug: float object is not subscriptable

Hello one more time!

I was trying the notebooks from examples, and I love them. But I have an error with one of the plots (the other ones work perfectly).

It was the plot_batch_results call in the example_regression.ipynb notebook.

This is the trace of the error. I'm using a specific conda environment for hundred-hammers 1.0.5, so if you need it, I can provide information on installed packages, the Python version, or other things like that. Maybe I'm the problem and I'm missing some kind of dependency, I don't know.

TypeError                                 Traceback (most recent call last)
Cell In[18], line 1
----> 1 hh.plot_batch_results(df_results, metric_name="MSE", title="Iris Dataset", display=False)

File ~/anaconda3/envs/HH/lib/python3.10/site-packages/hundred_hammers/plots.py:165, in plot_batch_results(df, metric_name, title, filepath, display)
    163     text = plt.text(row[x_label] + 0.005, row[y_label] + 0.001, row["Model"], fontsize=12)
    164     texts.append(text)
--> 165 adjust_text(texts, arrowprops=dict(arrowstyle="-", color='k', lw=0.5), force_text=1.0, force_points=1.0)
    167 plt.title(title, fontweight='bold')
    168 plt.grid()

File ~/anaconda3/envs/HH/lib/python3.10/site-packages/adjustText/__init__.py:536, in adjust_text(texts, x, y, objects, avoid_self, force_text, force_static, force_pull, force_explode, expand, explode_radius, ensure_inside_axes, expand_axes, only_move, ax, min_arrow_len, time_lim, iter_lim, *args, **kwargs)
    533 i = 0
    534 while error > 0:
    535     # expand = expands[min(i, expand_steps-1)]
--> 536     coords, error = iterate(
    537         coords,
    538         orig_xy_disp_coord,
    539         static_coords,
    540         force_text=force_text,
    541         force_static=force_static,
    542         force_pull=force_pull,
    543         expand=expand,
    544         bbox_to_contain=ax_bbox,
...
--> 279 text_shifts_x *= force_text[0]
    280 text_shifts_y *= force_text[1]
    281 static_shifts_x *= force_static[0]

TypeError: 'float' object is not subscriptable

Rewrite examples as Jupyter Notebooks

The current examples are great: simple, clear, and each shows off one feature of the project. However, I think they would benefit from being Jupyter Notebooks, since this would allow people to take a look at the results without running the code themselves.

Add project to PyPI

Eventually, the project should be added to PyPI to facilitate installation.

Examples in Documentation

The notebooks from examples are very good.

I think it would be clearer and more user-friendly if the ReadTheDocs documentation had the .ipynb examples as web pages. What I mean is to use the Jupyter Notebook integration to make them look something like this.

Using black to automatically format the code

I recently found the black Python package, which automatically formats Python code to follow the PEP 8 guidelines. It also makes the linter complain less, which is good too. It also works for Jupyter notebooks, which is really cool.

I also find the default maximum line length a bit short; I personally like to use lines of 150 characters. This is specified using the --line-length argument on the command line.

I think we can use this from now on if you want. But it's better if we start using it once all the pull requests are closed, since it changes a lot of code and otherwise it will be hard to see which changes were just for formatting and which ones were for the actual pull request.

Progress bars are still broken on jupyter notebooks

I was using the library in a notebook executed in the jupyter-lab app, and the progress bar that is shown to the user is broken and unreadable.


The solution might be, once again, to switch to another library for displaying progress bars. I'll see if I can figure something out.

Update README.md with optim_hyper

While I don't think it is necessary to show off every feature in the README, this one is important enough to merit a mention, so the user can see how easy it is to optimize hyperparameters (and the impact this has).

Add more types of hyperparameter optimization

As commented in #1, defining the type and bounds of the hyperparameters allows us to use different hyperparameter optimization techniques.

To start, we could implement random search and a simple hill-climbing algorithm. The implementation should make it easy to add new methods later.
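As a reference point for the first option, scikit-learn already ships random search over a parameter grid via RandomizedSearchCV; an HH implementation could expose a similar interface (the grid below is illustrative):

from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Sample 8 random configurations instead of exhausting the whole grid.
search = RandomizedSearchCV(
    DecisionTreeClassifier(),
    param_distributions={"max_depth": [2, 3, 5, 10, None], "min_samples_split": [2, 5, 10]},
    n_iter=8,
    cv=5,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_)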
