Relational Machine Learning Library (RMLLib)

The Relational Machine Learning Library (rmllib) aims to provide scalable relational machine learning solutions in Python.

Features

  • Collective inference over relational data
  • Semi-supervised learning that uses label estimates from previous rounds
  • Scalable solutions for single-box machines
  • Additional implementations of state-of-the-art generative graph models for synthetic experimentation

Getting started

RMLLib uses APIs inspired by sklearn and relies heavily on numpy, scipy and pandas for data wrangling and optimizations. However, sklearn learners are generally not compatible with RMLLib, largely because of the interconnectedness between labeled and unlabeled data. The RMLLib data format largely hides this problem from the user by providing and using masking functions in the dataset, which ensure that the labels of unlabeled (test) instances remain unobserved during training.

For a simple example of building data and running methods, please see the provided notebook.

Learning and Inference

The crux of RMLLib is a relational dependency network (RDN) representation, in which a set of conditional distributions (e.g., Relational Naive Bayes) of a label given its neighbors is laced together by a collective inference algorithm (e.g., Variational Inference). On top of this, RMLLib provides semi-supervised learning and inference methods that perform well in sparsely labeled data scenarios.
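As a rough illustration of what such a conditional distribution can look like, here is one common textbook form of a relational naive Bayes conditional, together with the pseudolikelihood that an RDN optimizes. The notation is an assumption for this sketch and is not taken from the library: $y_i$ is the label of node $i$, $x_{if}$ its $f$-th attribute, $\mathcal{N}(i)$ its neighbors, and $\theta$ the parameters. The exact factorization used by RMLLib may differ.

```latex
% Relational naive Bayes conditional (one common form; notation is illustrative)
P(y_i \mid \mathbf{x}_i, \mathbf{y}_{\mathcal{N}(i)}; \theta)
  \;\propto\; P(y_i) \prod_{f} P(x_{if} \mid y_i) \prod_{j \in \mathcal{N}(i)} P(y_j \mid y_i)

% Pseudolikelihood: product of the per-node conditionals over all nodes
\mathrm{PL}(\theta) \;=\; \prod_{i} P(y_i \mid \mathbf{x}_i, \mathbf{y}_{\mathcal{N}(i)}; \theta)
```

Because each factor depends only on a node and its neighbors, the objective avoids computing a global normalizing constant over the whole graph.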

For the optimization step, RMLLib follows RDNs by maximizing the pseudolikelihood, which allows faster optimization over the parameter space. For collective inference, RMLLib diverges slightly from most implementations: it performs inference largely through a single (potentially sparse) matrix multiplication, rather than updating each instance one at a time. Because this can use existing BLAS (or alternative) implementations, inference is considerably faster than previously reported.
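The idea of updating every node with one matrix multiplication can be sketched as follows. This is a minimal illustration using numpy and scipy, not RMLLib's actual code: the function names, the naive-Bayes-style scoring rule, and the toy path graph are all assumptions made for the example.

```python
import numpy as np
import scipy.sparse as sp


def collective_inference_step(adjacency, beliefs, log_prior, log_cond):
    """One synchronous update of every node's class belief (illustrative).

    adjacency : (n, n) sparse matrix of edge weights
    beliefs   : (n, k) current class probabilities per node
    log_prior : (k,) log class prior
    log_cond  : (k, k), log_cond[c, d] = log P(neighbor class d | own class c)
    """
    # Expected weighted count of neighbors in each class, for all nodes at once.
    neighbor_counts = adjacency @ beliefs                  # (n, k)
    # Naive-Bayes-style score: prior plus expected neighbor log-likelihoods.
    scores = log_prior + neighbor_counts @ log_cond.T      # (n, k)
    # Normalize each row with a numerically stable softmax.
    scores -= scores.max(axis=1, keepdims=True)
    updated = np.exp(scores)
    return updated / updated.sum(axis=1, keepdims=True)


def run_inference(adjacency, beliefs, labeled, log_prior, log_cond, rounds=10):
    """Repeat the update, clamping nodes whose labels are observed."""
    observed = beliefs[labeled].copy()
    for _ in range(rounds):
        beliefs = collective_inference_step(adjacency, beliefs, log_prior, log_cond)
        beliefs[labeled] = observed
    return beliefs


# Toy path graph 0 - 1 - 2 - 3; nodes 0 and 3 are labeled, 1 and 2 are not.
upper = sp.coo_matrix((np.ones(3), ([0, 1, 2], [1, 2, 3])), shape=(4, 4))
adjacency = (upper + upper.T).tocsr()

initial = np.array([[1.0, 0.0], [0.5, 0.5], [0.5, 0.5], [0.0, 1.0]])
labeled = np.array([True, False, False, True])
log_prior = np.log(np.array([0.5, 0.5]))
log_cond = np.log(np.array([[0.8, 0.2], [0.2, 0.8]]))  # neighbors tend to agree

final = run_inference(adjacency, initial, labeled, log_prior, log_cond)
```

The per-round cost is dominated by `adjacency @ beliefs`, which scipy dispatches to compiled sparse routines, so no Python-level loop over nodes is needed.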

RMLLib also aims to provide alternative learning/inference algorithms to RDNs, although this is still to be done.

Data Format

RMLLib is intended to run from the ground up on large, potentially multi-class datasets. To facilitate this, the generic dataset class wraps the following basic data structures:

  • labels: a pandas DataFrame with one row per sample and MultiIndex columns, where level 0 is the "Y" label and level 1 holds the class values
  • features: either a pandas DataFrame or SparseDataFrame with MultiIndex columns, where level 0 is the feature name and level 1 holds the feature values. Categorical features are assumed to have a one-hot-encoding representation allowing for simple slicing and sparse matrix multiplication (see Boston Medians for a simple example).
  • edges: either a dense or sparse matrix containing the edge weights between nodes.
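To make the layout above concrete, the following sketch builds the three structures by hand with pandas and scipy. It does not use RMLLib's dataset class; the node names, feature names, and values are invented for illustration, and a plain dense DataFrame stands in for the sparse variant.

```python
import numpy as np
import pandas as pd
import scipy.sparse as sp

nodes = ["a", "b", "c", "d"]  # hypothetical node ids

# labels: one row per node; columns are ("Y", class value) pairs.
label_columns = pd.MultiIndex.from_product([["Y"], [0, 1]])
labels = pd.DataFrame(
    [[1, 0], [0, 1], [1, 0], [0, 1]], index=nodes, columns=label_columns
)

# features: one-hot columns keyed by (feature name, feature value).
feature_columns = pd.MultiIndex.from_tuples(
    [("color", "red"), ("color", "blue"), ("size", "small"), ("size", "large")]
)
features = pd.DataFrame(
    [[1, 0, 1, 0], [0, 1, 1, 0], [1, 0, 0, 1], [0, 1, 0, 1]],
    index=nodes,
    columns=feature_columns,
)

# edges: symmetric sparse matrix of edge weights for the path a - b - c - d.
upper = sp.coo_matrix((np.ones(3), ([0, 1, 2], [1, 2, 3])), shape=(4, 4))
edges = (upper + upper.T).tocsr()

# Selecting on level 0 returns every one-hot column of a single feature.
color_block = features["color"]

# One sparse multiplication sums each node's neighbors' feature indicators.
neighbor_feature_counts = edges @ features.to_numpy()
```

The one-hot layout is what makes the last line work: aggregating categorical features over neighbors reduces to a single sparse-dense product.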

In addition, the dataset module provides helpers such as masks for defining a training/test split, and helpers for creating training sets that obscure unlabeled parts of the graph.
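A mask of this kind can be pictured as a boolean vector over nodes. The sketch below is a plain pandas illustration of the concept, not the dataset module's own helpers: the variable names, the choice of three labeled nodes, and the use of NaN to mark hidden labels are assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
nodes = [f"n{i}" for i in range(10)]  # hypothetical node ids

# Full ground-truth labels in the ("Y", class value) column layout.
label_columns = pd.MultiIndex.from_product([["Y"], [0, 1]])
classes = rng.integers(0, 2, size=len(nodes))
labels = pd.DataFrame(
    np.eye(2, dtype=int)[classes], index=nodes, columns=label_columns
)

# Mask: True for nodes whose labels may be seen during training.
labeled = pd.Series(False, index=nodes)
labeled.iloc[rng.choice(len(nodes), size=3, replace=False)] = True

# Training view: labels of unlabeled nodes are hidden (set to NaN),
# while the original labels frame is left untouched for evaluation.
training_labels = labels.astype(float)
training_labels.loc[~labeled] = np.nan
```

Keeping the graph intact and hiding only the labels lets unlabeled nodes still contribute their features and edges during learning.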

Installation

Currently, installation is only from source, i.e.:

git clone https://github.com/jpfeiffe/rmllib
cd rmllib
pip install .

Blame

Currently the project is maintained by me, Joel Pfeiffer. I'm always looking for help with new methods.

If you find the library useful for your work, please consider citing:

@misc{rmllib,
title = {Relational Machine Learning Library (RMLLib)},
author = {Joseph J. {Pfeiffer III}},
howpublished = {\url{https://github.com/jpfeiffe/rmllib}},
note = {Accessed: 2010-09-30}
}

Additionally, please be sure to cite the relevant articles for the corresponding methods, algorithms and/or datasets.
