
entity_embeddings_categorical's Introduction


Overview

This project is a utility tool for preprocessing, training, and extracting entity embeddings with neural networks, using the Keras framework. It is still under construction, so please use it carefully.
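To make the idea of an entity embedding concrete, here is a minimal, library-independent sketch: each distinct category is mapped to an integer index, and that index selects a row in a table of small vectors. The function names (`build_vocabulary`, `init_embedding_table`, `embed`) and the sample column are illustrative assumptions, not part of this project's API; in a real network the table rows would be learned during training rather than left random.

```python
import random


def build_vocabulary(values):
    """Map each distinct category to an integer index (label encoding)."""
    return {category: index for index, category in enumerate(sorted(set(values)))}


def init_embedding_table(num_categories, dim, seed=0):
    """Create one small random vector per category; training would tune these."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.05, 0.05) for _ in range(dim)]
            for _ in range(num_categories)]


def embed(values, vocabulary, table):
    """Replace every category by the vector stored at its index."""
    return [table[vocabulary[value]] for value in values]


# Hypothetical categorical column, used only for illustration.
store_types = ["a", "c", "a", "b", "c"]
vocabulary = build_vocabulary(store_types)
table = init_embedding_table(len(vocabulary), dim=3)
vectors = embed(store_types, vocabulary, table)
```

Rows that share a category share the same vector, which is what lets the learned weights be reused or visualized later.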

Installation

Installation is simple if you already have virtualenv installed on your machine. If you don't, please refer to the official virtualenv documentation.

pip install entity-embeddings-categorical

Documentation

Besides the docstrings, more detailed documentation can be found here.

Testing

This project is intended to suit most existing needs, so testability is a major concern. Most of the code is heavily tested, and Travis is used as the Continuous Integration tool to run all the unit tests on every new commit.

Usage

This utility library can be used in two modes: default and custom. The default configuration supports the following tasks: Regression, Binary Classification and Multiclass Classification.

If your use case differs from all of these, feel free to use the custom mode, where you can define most of the configuration related to target processing and the output of the neural network.

Default mode

The default mode is straightforward to use: you only need to provide a few parameters to the Config object.

So, to create a simple embedding network that reads from the file sales_last_semester.csv, where the target name is total_sales, the desired output is a binary classification and the training ratio is 0.9, our Python script would look like this:

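    from entity_embeddings import Config, Embedder, TargetType

    # Describe the data source, the target column and the kind of task
    config = Config.make_default_config(csv_path='sales_last_semester.csv',
                                        target_name='total_sales',
                                        target_type=TargetType.BINARY_CLASSIFICATION,
                                        train_ratio=0.9)

    # Run the embedding process with that configuration
    embedder = Embedder(config)
    embedder.perform_embedding()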

Pretty simple, huh?

A working example of default mode can be found here as a Python script.
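To illustrate what a training ratio of 0.9 means, here is a small standalone sketch that keeps the first 90% of the rows for training and the rest for evaluation. The helper `split_rows` is hypothetical; how the library itself splits the data (for example, whether it shuffles first) is not stated here, so treat this only as an illustration of the ratio.

```python
def split_rows(rows, train_ratio):
    """Split rows into (train, test) according to train_ratio.

    Illustrative only: no shuffling is done, and this is not the
    library's own splitting code.
    """
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be strictly between 0 and 1")
    cut = int(round(len(rows) * train_ratio))
    return rows[:cut], rows[cut:]


rows = list(range(1000))  # stand-in for 1000 rows read from a CSV file
train_rows, test_rows = split_rows(rows, train_ratio=0.9)
```

With 1000 rows and a ratio of 0.9, 900 rows go to training and 100 are held out.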

Custom mode

If you intend to customize the output of the neural network, or the way the target variables are processed, you need to specify this when creating the configuration object. This can be done by creating classes that extend TargetProcessor and ModelAssembler.

A working example of custom configuration mode can be found here.
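The general shape of such a customization might look like the sketch below. It is deliberately self-contained: the two base classes here are stand-ins defined on the spot, and the method names `process_target` and `describe_output_layer` are assumptions made for illustration, not the library's actual interface. Check the real TargetProcessor and ModelAssembler classes for the methods you need to override.

```python
import math
from abc import ABC, abstractmethod


class TargetProcessor(ABC):
    """Stand-in base class (hypothetical interface, not the library's)."""

    @abstractmethod
    def process_target(self, targets):
        """Transform raw target values before training."""


class ModelAssembler(ABC):
    """Stand-in base class (hypothetical interface, not the library's)."""

    @abstractmethod
    def describe_output_layer(self):
        """Return the settings of the network's final layer."""


class LogTargetProcessor(TargetProcessor):
    """Example customization: train on log(1 + y) instead of y."""

    def process_target(self, targets):
        return [math.log1p(value) for value in targets]


class SingleUnitAssembler(ModelAssembler):
    """Example customization: one linear output unit for regression."""

    def describe_output_layer(self):
        return {"units": 1, "activation": "linear", "loss": "mean_squared_error"}


processed = LogTargetProcessor().process_target([0.0, math.e - 1.0])
output_layer = SingleUnitAssembler().describe_output_layer()
```

Keeping target processing and model assembly in separate classes lets you change one without touching the other.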

Visualization

Once you are done training your model, you can use the visualization_utils module to create visualizations of the generated weights as well as of the accuracy of your model.

Below are some examples created for the Rossmann dataset:

Weights for store id embedding

Troubleshooting

In case of any issue with the project, or for further questions, do not hesitate to open an issue here on GitHub.

Contributions

Contributions are really welcome, so feel free to open a pull request :-)

TODO

  • Allow using a Pandas DataFrame instead of the CSV file path.
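Until DataFrame input is supported, one possible workaround is to write the DataFrame to a temporary CSV file and pass that file's path as csv_path. The helper below, `dataframe_to_temp_csv`, is a hypothetical sketch rather than part of this project; it relies only on the object having a pandas-style `to_csv(path, index=False)` method.

```python
import os
import tempfile


def dataframe_to_temp_csv(df, directory=None):
    """Write df to a temporary CSV file and return the file's path.

    Hypothetical helper: `df` only needs a pandas-style
    `to_csv(path, index=False)` method. The caller is responsible
    for deleting the file once it is no longer needed.
    """
    handle, path = tempfile.mkstemp(suffix=".csv", dir=directory)
    os.close(handle)  # to_csv reopens the file by its path
    df.to_csv(path, index=False)
    return path
```

The returned path could then be given to Config.make_default_config as csv_path, and removed with os.remove once the embedding run has finished.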

entity_embeddings_categorical's People

Contributors

rodrigobressan


entity_embeddings_categorical's Issues

How should I proceed when my data is in a DataFrame rather than a CSV file to be loaded?

For example:

    import pandas as pd

    from entity_embeddings import Config, Embedder, TargetType
    from entity_embeddings.util import visualization_utils

    df = pd.read_csv('test_file.csv')

    config = Config.make_default_config(df,
                                        target_name='churn',
                                        target_type=TargetType.BINARY_CLASSIFICATION,
                                        train_ratio=0.8,
                                        epochs=10,
                                        verbose=True,
                                        artifacts_path='artifacts')

    embedder = Embedder(config)
    embedder.perform_embedding()

Invalid argument error.

I got this error while running the line (on a GPU):

    embeddings = ce.get_embeddings(X_train, y_train,
                                   categorical_embedding_info=embedding_info,
                                   is_classification=True,
                                   epochs=100, batch_size=256)

I can't tell from this error what the problem is. Is the issue caused by embedding #71 (indexed from 0), and does that correspond to the 71st LabelEncoder or to the 71st column in the original X_train dataframe?

    2 root error(s) found.
      (0) Invalid argument: indices[5,0] = 2 is not in [0, 2)
         [[node model/embedding_71/embedding_lookup (defined at /miniconda3/lib/python3.8/site-packages/categorical_embedder/__init__.py:175) ]]
      (1) Invalid argument: indices[5,0] = 2 is not in [0, 2)
         [[node model/embedding_71/embedding_lookup (defined at /miniconda3/lib/python3.8/site-packages/categorical_embedder/__init__.py:175) ]]
         [[ArithmeticOptimizer/ReorderCastLikeAndValuePreserving_int32_ExpandDims_23/_537]]
    0 successful operations.
    0 derived errors ignored. [Op:__inference_train_function_11843]

    Errors may have originated from an input operation.
    Input Source operations connected to node model/embedding_71/embedding_lookup:
     model/embedding_71/embedding_lookup/7853 (defined at /miniconda3/lib/python3.8/contextlib.py:113)

    Input Source operations connected to node model/embedding_71/embedding_lookup:
     model/embedding_71/embedding_lookup/7853 (defined at /miniconda3/lib/python3.8/contextlib.py:113)

    Function call stack:
    train_function -> train_function
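The message `indices[5,0] = 2 is not in [0, 2)` says that the value 2 was looked up in an embedding table with only 2 rows, so only indices 0 and 1 are valid. The error alone does not say which input column feeds embedding_71. One way to narrow it down, sketched below, is to compare the largest encoded value of each categorical column with the size of the embedding table declared for it. The function `find_out_of_range_columns` and the column names are hypothetical and independent of any particular library.

```python
def find_out_of_range_columns(encoded_columns, table_sizes):
    """Return {column: (min_index, max_index, table_size)} for every column
    whose encoded values would fall outside its embedding table."""
    problems = {}
    for name, values in encoded_columns.items():
        size = table_sizes[name]
        lowest, highest = min(values), max(values)
        if lowest < 0 or highest >= size:
            problems[name] = (lowest, highest, size)
    return problems


# Hypothetical label-encoded columns and declared embedding table sizes.
encoded_columns = {
    "has_promo": [0, 1, 2, 1],   # contains 2, but the table has only 2 rows
    "store_type": [0, 1, 2, 3],
}
table_sizes = {"has_promo": 2, "store_type": 4}
problems = find_out_of_range_columns(encoded_columns, table_sizes)
```

A column reported here has more distinct encoded values than its embedding table can hold, which typically happens when the table size was computed from a different subset of the data than the one being fed to the model.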
