# Apache Hadoop 2.7.1 Docker image

Hadoop in Docker with Networking and Persistent Storage

The Portworx docker-hadoop project is a collection of Docker images for running a multi-node Hadoop cluster. Whereas other Hadoop Docker images are meant to run in a single container or on a single host, these images can be used in a multi-host environment with Docker 1.9 networking.

Currently the only supported Hadoop version is 2.7.1.

The images also support the Portworx persistent storage layer for Docker. With persistent storage volumes, the Namenode and all Datanodes can be stopped, restarted, migrated, and cloned while retaining all data.

Volumes are currently configured for /hdfs/volume1.
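
As a rough illustration of how a volume might be attached at that path, the sketch below builds a docker run command that mounts a named Docker volume at /hdfs/volume1. The helper namenode_run_cmd and the volume name namenode-data are hypothetical and not part of the images; the network and container names follow the cluster commands later in this README. Selecting the Portworx volume driver is not shown here. The function only prints the command so it can be reviewed before it is run.

```shell
#!/bin/sh
# Print (do not run) a docker run command that mounts a named volume
# at /hdfs/volume1. The volume name is a hypothetical example.
namenode_run_cmd() {
    volume="${1:-namenode-data}"   # assumed volume name, not defined by the images
    echo "docker run -itd -p 50070:50070 --net=hadoopnet --name namenode -v ${volume}:/hdfs/volume1 portworx/namenode"
}

namenode_run_cmd
```

Printing the command first keeps the sketch safe to try; piping the output to sh would execute it.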

The Images

  • portworx/namenode

    The namenode and secondary-namenode services run on this image. At startup it will configure itself with the hostname of the instance.

  • portworx/yarn

    The Resource Manager and Job History Server run on this image. At startup, the HADOOP_HOST_NAMENODE environment variable should be set to the hostname of the Namenode instance. If it is not set, the Namenode hostname defaults to the short hostname "namenode".

  • portworx/datanode

    The Datanode and NodeManager services run on this image. At startup two environment variables are required: 1) HADOOP_HOST_NAMENODE, set to the hostname of the Namenode; 2) HADOOP_HOST_YARN, set to the hostname of the Yarn (Resource Manager) instance.
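
The environment variable handling described above can be pictured with a small shell sketch. This is not the images' actual startup code, which this README does not show; the function names resolve_namenode_host and check_datanode_env are hypothetical and only illustrate the fallback to "namenode" and the two variables a Datanode needs.

```shell
#!/bin/sh
# Illustration only: fall back to the short hostname "namenode"
# when HADOOP_HOST_NAMENODE is unset or empty.
resolve_namenode_host() {
    echo "${HADOOP_HOST_NAMENODE:-namenode}"
}

# A Datanode needs both variables; report which ones are missing.
check_datanode_env() {
    missing=""
    [ -n "${HADOOP_HOST_NAMENODE:-}" ] || missing="$missing HADOOP_HOST_NAMENODE"
    [ -n "${HADOOP_HOST_YARN:-}" ] || missing="$missing HADOOP_HOST_YARN"
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    echo "ok"
}
```

A check like check_datanode_env could be run on the host before starting a Datanode container, to catch a forgotten -e flag early.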

Pull the image

These images are all available from the Docker Hub automated build repository.

docker pull portworx/namenode:2.7.1
docker pull portworx/yarn:2.7.1
docker pull portworx/datanode:2.7.1

Start a cluster

Make sure you have pulled all three images. Also make sure that SELinux is disabled on the host.

All of the images come bundled with the same set of SSH keys for user equivalence. Note that this setup isn't meant to be run in secure environments.

To create a cluster with a Namenode, a Yarn server, and three Datanodes, issue the following docker run commands.

docker network create hadoopnet
docker run -itd -p 50070:50070 --net=hadoopnet --name namenode portworx/namenode
docker run -itd -p 8088:8088 -p 19888:19888 --net=hadoopnet --name hadoop-yarn -e HADOOP_HOST_NAMENODE=[namenode] portworx/yarn
docker run -itd --net=hadoopnet --name hadoop-node1 -e HADOOP_HOST_NAMENODE=[namenode] -e HADOOP_HOST_YARN=[yarn] portworx/datanode
docker run -itd --net=hadoopnet --name hadoop-node2 -e HADOOP_HOST_NAMENODE=[namenode] -e HADOOP_HOST_YARN=[yarn] portworx/datanode
docker run -itd --net=hadoopnet --name hadoop-node3 -e HADOOP_HOST_NAMENODE=[namenode] -e HADOOP_HOST_YARN=[yarn] portworx/datanode
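
The three Datanode commands differ only in the container name, so they can be generated in a loop. The sketch below is one possible way to do that. The helper names datanode_cmd and datanode_cmds are hypothetical, and it assumes the [namenode] and [yarn] placeholders can be filled with the container names namenode and hadoop-yarn, which relies on containers on the hadoopnet network resolving each other by name. The functions print the commands rather than running them.

```shell
#!/bin/sh
# Print the docker run command for Datanode number $1.
# $2 and $3 are the Namenode and Yarn hostnames.
datanode_cmd() {
    echo "docker run -itd --net=hadoopnet --name hadoop-node$1 -e HADOOP_HOST_NAMENODE=$2 -e HADOOP_HOST_YARN=$3 portworx/datanode"
}

# Print commands for Datanodes 1..N (N defaults to 3).
# The hostnames below are assumptions: the container names used above.
datanode_cmds() {
    count="${1:-3}"
    i=1
    while [ "$i" -le "$count" ]; do
        datanode_cmd "$i" namenode hadoop-yarn
        i=$((i + 1))
    done
}

datanode_cmds 3
```

After reviewing the printed commands, `datanode_cmds 3 | sh` would execute them; a larger count adds more Datanodes with the same naming pattern.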

Viewing Web Interfaces

Namenode - http://[namenode]:50070
Resource Manager - http://[yarn]:8088
Job History - http://[yarn]:19888

# Running Hadoop examples

docker exec -it [yarn instance] /bin/bash

Inside the container, set the Hadoop environment variables:

export HADOOP_PREFIX=/usr/local/hadoop
export PATH=$PATH:$HADOOP_PREFIX/bin

cd $HADOOP_PREFIX
hadoop jar $HADOOP_PREFIX/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar teragen 10000 /teragen

hadoop jar $HADOOP_PREFIX/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar randomwriter randomout
# Check output
bin/hdfs dfs -cat randomout/*
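
For a quick sanity check of the teragen run, the expected data size can be worked out by hand: TeraGen writes fixed-size 100-byte rows, so the 10000 rows requested above should amount to roughly 1,000,000 bytes of data. The helper teragen_bytes below is a hypothetical name used only for this arithmetic.

```shell
#!/bin/sh
# TeraGen writes fixed-size 100-byte rows, so the expected
# data size is simply rows * 100.
teragen_bytes() {
    echo $(( $1 * 100 ))
}

teragen_bytes 10000   # row count used in the teragen example
```

The result can be compared against the size reported by `bin/hdfs dfs -du -s /teragen`; small differences from marker files such as _SUCCESS would not change the total, since those files are empty.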

Acknowledgements & Support

Much of this work is based on the sequenceiq/hadoop-docker image.

If you have questions or would like to contribute to this project, send an email to [email protected]

Contributors

garryknox
