LASSO

Introduction

LASSO is a parallel machine learning system that learns a regression model from large data. It works in either of two modes:

  1. IPM-mode. In this mode, you start multiple training processes, each running the mrml-lasso/train program, on one or more computers. Each process learns a model from its local part of the data. After all processes have finished, these models are aggregated into one using the Iterative Parameter Mixtures (IPM) technique.

  2. MPI-mode. In this mode, you start a process running the mrml-lasso/mrml-lasso program, which starts more processes using MPI. After every iteration, these processes exchange information and update the model. Since MPI-mode involves more data exchange than IPM-mode, it is less scalable.
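The parameter-mixing idea behind IPM-mode can be illustrated with a small, self-contained sketch. It is not the project's code: the function names (train_local, mix, iterative_parameter_mixing, make_shards, accuracy), the synthetic data, and the local learner (plain batch gradient descent on the logistic loss, without regularization) are all assumptions made for illustration. Each shard trains its own model, and the models are then averaged element-wise.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_local(shard, init_weights, lr=0.5, steps=50):
    """Hypothetical local learner: batch gradient descent on the logistic loss."""
    w = list(init_weights)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in shard:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj
        w = [wi - lr * g / len(shard) for wi, g in zip(w, grad)]
    return w


def mix(models):
    """Uniform parameter mixture: element-wise average of the local models."""
    n = len(models)
    return [sum(ws) / n for ws in zip(*models)]


def iterative_parameter_mixing(shards, dim, rounds=5):
    w = [0.0] * dim
    for _ in range(rounds):
        # Every shard starts from the current mixed model. Here the shards are
        # processed in a loop; a real deployment would use separate processes.
        local_models = [train_local(shard, w) for shard in shards]
        w = mix(local_models)
    return w


def make_shards(num_shards=4, per_shard=50, seed=0):
    """Synthetic data (assumption): label is 1 when x1 > x2; x[0] is a bias term."""
    rng = random.Random(seed)
    shards = []
    for _ in range(num_shards):
        shard = []
        for _ in range(per_shard):
            x1, x2 = rng.uniform(-1, 1), rng.uniform(-1, 1)
            shard.append(([1.0, x1, x2], 1 if x1 > x2 else 0))
        shards.append(shard)
    return shards


def accuracy(w, shards):
    data = [example for shard in shards for example in shard]
    hits = sum(
        (sigmoid(sum(wi * xi for wi, xi in zip(w, x))) >= 0.5) == (y == 1)
        for x, y in data
    )
    return hits / len(data)


if __name__ == "__main__":
    shards = make_shards()
    model = iterative_parameter_mixing(shards, dim=3)
    print("mixed model:", model)
    print("training accuracy:", accuracy(model, shards))
```

With rounds=1 the sketch reduces to the flow described for IPM-mode above: train once on every shard, then aggregate. Larger values of rounds repeat the train-and-mix cycle, restarting every shard from the mixed model.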

In either mode, LASSO learns a logistic regression model with L1-regularization using the OWL-QN training algorithm. For more details about this algorithm, please refer to:

Motivation

This project serves as a baseline training system for the grand challenge in IEEE ICME 2014. To win this challenge, you need to be able to handle a large training corpus generated from real Internet services. You can develop your own system, or try this one.

Installation

LASSO was developed and tested on MacOS X and Linux. It should be able to run on FreeBSD.

Dependencies

LASSO depends on the following third-party libraries:

  1. protobuf
  2. boost
  3. gflags
  4. openssl
  5. libssh2
  6. mpich2

On MacOS X, it is recommended to install these packages using Homebrew. Homebrew makes sure that all header files go to /usr/local/include and all libraries go to /usr/local/lib.

On Linux, you can install these packages using your distribution's package manager or build your own copy from source code. If you build from source, you might need to edit the CMakeLists.txt file to tell cmake where these packages are installed. Please refer to the comments in CMakeLists.txt for guidance on how to edit it.

To make it easy to deploy LASSO on many computers, we prefer to link statically against the libraries above and the GCC runtime library when building. This can be controlled by adding the following line to the CMakeLists.txt file:

set(CMAKE_EXE_LINKER_FLAGS "-static -static-libgcc")

With -static-libgcc, you should not need to ensure that all computers in your cluster run the same version of the GCC runtime.

Note that the linker flags above are not supported on MacOS X. This is reasonable, as MacOS X is a desktop system, where it is efficient for applications to share common components as shared libraries.

Checkout and Build

With the dependencies above installed, you can simply check out the code and build it using cmake.

cd ~
git clone https://github.com/wangkuiyi/lasso
cd /tmp
cmake ~/lasso
make
make install

The make install command copies the built software to a directory specified in CMakeLists.txt by the directive:

set(CMAKE_INSTALL_PREFIX "/home/public/paralgo")
