REMAYN: REsults MAde easY in pythoN

remayn is an open-source Python toolkit for managing the results of machine learning experiments. It provides the functionality needed to save the complete results of an experiment, load them, and generate reports.

Getting started

⚙️ Installation

remayn supports Python >=3.8.

The easiest way to install remayn is via pip:

pip install remayn

💾 Saving the results of an experiment

A new Result object can be created using the make_result function. Then, the Result can be saved to disk by simply calling the save() method.

import numpy as np
from remayn.result import make_result

targets = np.array([1, 2, 3])
predictions = np.array([1.1, 2.2, 3.3])
config = {"model": "linear_regression", "dataset": "iris", "learning_rate": 1e-3}

result = make_result("./results",
                    config=config,
                    targets=targets,
                    predictions=predictions
                    )
result.save()

This generates a unique identifier for the Result and saves it in a subdirectory of the ./results directory.

⌛ Loading a set of results

After saving the results of all the experiments, the set of results can be loaded using the ResultFolder class, as shown in the following snippet:

from remayn.result_set import ResultFolder

rs = ResultFolder('./results')

Note that the same path used to save the results is employed here to load the ResultFolder. The ResultFolder object is a special type of ResultSet and represents a set of results which have been loaded from disk.

📝 Creating a pandas DataFrame that contains all the results

After loading the results, the create_dataframe method of the ResultSet class can be used to generate a pandas.DataFrame containing all the results. This method receives a callable that computes the metrics from the targets and predictions stored in each Result. We therefore start by defining a function that computes the metrics:

def mse(y_true, y_pred):
    return ((y_true - y_pred)**2).mean()

def _compute_metrics(targets, predictions):
    return {
        "mse": mse(targets, predictions),
    }
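
A metrics function can return more than one entry. The sketch below is a hypothetical variant (the name regression_metrics is not part of remayn) that returns two metrics in a single dictionary. It uses only the standard library, so it accepts any one-dimensional sequence, including NumPy arrays. Each key presumably ends up as its own column in the resulting DataFrame.

```python
from statistics import fmean


def regression_metrics(targets, predictions):
    """Return several regression metrics as a flat dict of floats."""
    # Element-wise errors; float() also accepts NumPy scalars.
    errors = [float(t) - float(p) for t, p in zip(targets, predictions)]
    return {
        "mse": fmean(e * e for e in errors),   # mean squared error
        "mae": fmean(abs(e) for e in errors),  # mean absolute error
    }


# Same toy data as in the saving example above.
metrics = regression_metrics([1, 2, 3], [1.1, 2.2, 3.3])
print(metrics)
```

A function like this has the same (targets, predictions) signature as _compute_metrics, so it could be passed as metrics_fn in the same way.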

Then, the create_dataframe method of the ResultSet is used:

from remayn.result_set import ResultFolder

rs = ResultFolder('./results')
df = rs.create_dataframe(
    config_columns=[
        "model",
        "dataset",
        "learning_rate",
    ],
    metrics_fn=_compute_metrics,
)

Finally, the DataFrame can be saved to a file by using the existing pandas methods:

df.to_excel('results.xlsx', index=False)

This generates an Excel file that contains the columns given in the config_columns parameter along with the columns for the metrics computed by the provided function.
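
To make the column layout concrete, here is a conceptual sketch of how one row per result could be assembled from the selected configuration keys and the metrics dictionary. This is not remayn's implementation: build_rows, mse_only, and the inline data are hypothetical names and values used only for illustration.

```python
def build_rows(results, config_columns, metrics_fn):
    """Build one flat dict per result: selected config keys, then metrics."""
    rows = []
    for config, targets, predictions in results:
        # Keep only the requested configuration keys, in the given order.
        row = {name: config.get(name) for name in config_columns}
        # Append one entry per metric returned by the callable.
        row.update(metrics_fn(targets, predictions))
        rows.append(row)
    return rows


def mse_only(targets, predictions):
    squared = [(t - p) ** 2 for t, p in zip(targets, predictions)]
    return {"mse": sum(squared) / len(squared)}


# Hypothetical in-memory stand-ins for saved results.
results = [
    ({"model": "linear_regression", "dataset": "iris",
      "learning_rate": 1e-3, "seed": 0}, [1, 2, 3], [1, 2, 5]),
    ({"model": "linear_regression", "dataset": "iris",
      "learning_rate": 1e-2, "seed": 0}, [1, 2, 3], [1, 2, 3]),
]

rows = build_rows(results, ["model", "dataset", "learning_rate"], mse_only)
for row in rows:
    print(row)
```

Configuration keys that are not listed (seed in this sketch) are left out of the rows. A list of dictionaries like rows can be passed directly to pandas.DataFrame to obtain a table with the same columns.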

Collaborating

Code contributions to the remayn project are welcome via pull requests. Please contact the maintainers (for example, by opening an issue) before starting any work, to make sure that your contribution aligns with the project.

Guidelines for code contributions

  • You can clone the repository and then install the library from the local repository folder:
    git clone git@github.com:ayrna/remayn.git
    pip install ./remayn
  • To set up the environment for development, install the project in editable mode and include the optional dev requirements:
    pip install -e '.[dev]'
  • Install the pre-commit hooks before making any modifications:
    pre-commit install
  • Write code that is compatible with all supported versions of Python listed in the pyproject.toml file.
  • Create tests that cover the common cases and the corner cases of the code.
  • Preserve backwards-compatibility whenever possible, and make clear if something must change.
  • Document any portions of the code that might be less clear to others, especially to new developers.
  • Write API documentation as docstrings.

remayn's Issues

[BUG] Running `ResultSet.create_dataframe` with `include_train` or `include_val` causes an error

Describe the bug

Running ResultSet.create_dataframe with include_train=True or include_val=True causes an error when there are results in the ResultSet that do not contain train/validation targets or predictions.

Steps/Code to reproduce the bug

import numpy as np
from remayn.result_set import ResultFolder

def compute_metrics(targets, predictions):
    from sklearn.metrics import accuracy_score
    targets = np.array(targets)
    predictions = np.array(predictions)
    if len(predictions.shape) > 1:
        predictions = np.argmax(predictions, axis=1)
    metrics = {
        "CCR": accuracy_score(targets, predictions),
    }
    return metrics


# ./results has some results that do not contain train or validation results
results = ResultFolder("./results")

df = results.create_dataframe(
    config_columns=[
        "estimator_name",
        "dataset",
        "rs",
        "n_folds",
        "estimator_config",
    ],
    metrics_fn=compute_metrics,
    include_train=True,
    include_val=True,
    config_columns_prefix="",
    n_jobs=1,
)

Expected results

The results dataframe should be created and the cells for the missing results should be empty.

Actual results

An error is thrown when trying to compute the metrics with a np.array(None) object.
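
Until the library handles this case, a possible user-side workaround is to guard the metrics function against missing inputs. The sketch below assumes that missing train or validation data reaches metrics_fn as None, which the np.array(None) error suggests but does not confirm. It also assumes that create_dataframe tolerates an empty dictionary for such rows. The decorator name skip_missing is hypothetical.

```python
import functools


def skip_missing(metrics_fn):
    """Wrap metrics_fn so it returns {} when targets or predictions are None."""
    @functools.wraps(metrics_fn)
    def wrapper(targets, predictions):
        # Assumption: a missing split is passed as None.
        if targets is None or predictions is None:
            return {}
        return metrics_fn(targets, predictions)
    return wrapper


@skip_missing
def accuracy(targets, predictions):
    # Fraction of exact matches, reported under the "CCR" key used above.
    hits = sum(t == p for t, p in zip(targets, predictions))
    return {"CCR": hits / len(targets)}


print(accuracy([0, 1, 1], [0, 1, 0]))  # two of three labels match
print(accuracy(None, None))            # missing split: no metrics computed
```

The wrapped function keeps the (targets, predictions) signature, so it could be passed as metrics_fn without other changes.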
