Giter VIP home page Giter VIP logo

hadoopbnem's Introduction

[email protected]

Info

This code amends the libDAI library to provide:

  • A new parameter learning implementation called Age-Layered Expectation Maximization (ALEM), inspired by the ALPS genetic algorithm work of Greg Hornby of NASA Ames.
  • A distributed parameter learning implementation using MapReduce and a population structure, with either ALEM or a large random restart-equivalent which I call Multiple Expectation Maximzation (MEM).

This code can be used to recreate the results shown in my paper:

Reed, E. and Mengshoel, O. “Scaling Bayesian Network Parameter Learning with Expectation Maximization using MapReduce”. In BigLearning Workshop of Neural Information Processing Systems (NIPS 2012).

Additionally, this work can be used (with more emphasis on high performance computing) to recreate the algorithms suggested in the following two papers:

Mengshoel, O. J., Saluja, A., & Sundararajan, P. (2012). Age-Layered Expectation Maximization for Parameter Learning in Bayesian Networks. In Proc. of the Fifteenth International Conference on Artificial Intelligence and Statistics (pp. 984-992).

A. Basak, I. Brinster, X. Ma, and O.J. Mengshoel. Accelerating Bayesian network parameter learning using Hadoop and MapReduce. In Proceedings of the 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications, BigMine '12, pages 101–108, Beijing, China, 2012. ACM.

How to test using non-Hadoop local MapReduce

# working directory: ./mapreduce
make
./main.sh

How to run using Hadoop locally

This is useful for testing purposes before deploying on Amazon EC2 or another Hadoop cluster.

  1. Install Hadoop (tested up to v1.1.1)

  2. Set up environment variables (e.g.) export HADOOP_PREFIX=/home/erik/hadoop/hadoop-1.1.1

    Also set up Hadoop for pseudo-distributed operation: http://hadoop.apache.org/docs/r1.1.0/single_node_setup.html

  3. Run the following (I have hadoop in my path):

# working directory: .../libdai/mapreduce
make clobber # WARNING: this clears any existing HDFS data
             # which seems to cause a bug sometimes when
             # accessing the HDFS
hadoop namenode -format
start-all
  1. Initialize a BN and some data
./scripts/init dat/bnets/asia.net 100 4
  1. Start Hadoop streaming
make
./scripts/streaming dat/in
  1. Gander at results
ls out
  1. Stop Hadoop
stop-all

How to run using Amazon EC2

  1. $ make

  2. Launch an EC2 instance and send the mapreduce folder to it.

scp -rCi ~/Downloads/dai-mapreduce.pem mapreduce [email protected]:
  1. ssh into the instance and launch a cluster.
ssh -Ci ~/Downloads/dai-mapreduce.pem [email protected]

# Assumes hadoop-ec2 has been configured.
# Launch a cluster of 10 (small) instances
hadoop-ec2 launch-cluster dai-mapreduce 10
# ... wait a really long time for the cluster to initialize
  1. Push the mapreduce folder to the cluster master.
hadoop-ec2 push dai-mapreduce mapreduce
  1. Login to the cluster master.
hadoop-ec2 login dai-mapreduce
  1. Initialize a big bnet and start Hadoop streaming
./scripts/init dat/bnets/water.net 1000 10
# Do standard EM, pop-size=10, mappers=10
./scripts/streaming dat/in -u 10 10
# ... wait a few hours
  1. Collect data!
ls out

hadoopbnem's People

Contributors

erikreed avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.