This project implements an application scenario that deploys a cluster of virtual machines (containers) in a distributed setting with several laptops. The Docker infrastructure hosts a Spark infrastructure, including HDFS, a master, and several distributed slaves.

License: MIT License

Java 4.14% Dockerfile 17.71% Shell 78.15%


PSMSC-DockerSparkHDFS

The objective of this project is to implement an application scenario which illustrates the use of the following techniques:

  • Docker: it can be used to deploy a cluster of (virtual) machines on your laptop, but can also be used in a distributed setting with several laptops.
  • Spark: a Spark infrastructure, including HDFS, a master, and several slaves, must be deployed in the Docker infrastructure.

The project belongs to the track Performance in Software, Media, and Scientific Computing of the MSc course Cloud Computing and Big Data given at Toulouse INP-E.N.S.E.E.I.H.T. Eng. School and Paul Sabatier Faculty of Science and Engineering.

Getting Started

First clone this Git repository, then change into its main folder.

Prerequisites

You first need to install Docker. On Ubuntu:

wget -qO- https://get.docker.com/ | sh
sudo apt-get install ufw
sudo usermod -aG docker $USER
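Before continuing, it can help to confirm that the installation took effect. The following is a minimal sketch, not part of the repository: the function names have_cmd, in_group, and check_prerequisites are hypothetical helpers introduced here for illustration.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with this project).

# Succeed if the given command is available on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Succeed if the given user belongs to the given group.
in_group() {
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qxF "$2"
}

# Print one status line for the docker binary and one for the docker group.
check_prerequisites() {
    user=$(id -un)
    if have_cmd docker; then
        echo "docker: found"
    else
        echo "docker: missing"
    fi
    if in_group "$user" docker; then
        echo "docker group: $user is a member"
    else
        echo "docker group: $user is not a member"
    fi
}

check_prerequisites
```

Group membership added with usermod only applies to new login sessions, so a "not a member" result right after installation usually means you need to log out and back in.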

Running

Locally

With Docker installed, run the following commands:

cd local/
./reset-ccbd.sh -i
./start-ccbd.sh <n>

where n is the total number of containers to start (1 master + (n-1) slaves).
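As a quick illustration of the arithmetic, the sketch below derives the number of slaves from n. The function slave_count is a hypothetical helper, not a script from this repository.

```shell
#!/bin/sh
# Hypothetical helper: print the number of slave containers for a
# cluster of n containers (1 master + (n-1) slaves).
slave_count() {
    n=$1
    # Reject empty or non-numeric input.
    case $n in
        ''|*[!0-9]*)
            echo "n must be a positive integer" >&2
            return 1
            ;;
    esac
    # At least the master must exist.
    if [ "$n" -lt 1 ]; then
        echo "n must be at least 1" >&2
        return 1
    fi
    echo $((n - 1))
}

slave_count 4   # prints 3: one master and three slaves
```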

This builds all the Docker images, starts all the containers, opens a shell in the Hadoop master container as root, and starts Hadoop.

To run the Word Count example, execute the following commands:

cd examples/
./start-wordcount.sh

The time your configuration took to count the words in file-wordcount.txt is displayed at the end of the execution.
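To sanity-check the result on a small input, the same counting can be done locally with standard tools. This is only a sketch for cross-checking: it does not use Spark, count_words is a hypothetical helper, and the output format of start-wordcount.sh may differ.

```shell
#!/bin/sh
# Hypothetical helper: print "word count" pairs for a text file,
# most frequent word first, ties broken alphabetically.
count_words() {
    tr -s '[:space:]' '\n' < "$1" |   # one word per line
        grep -v '^$' |                # drop empty lines
        sort |
        uniq -c |                     # prefix each word with its count
        sort -k1,1nr -k2 |
        awk '{print $2, $1}'
}

# Example (assumes file-wordcount.txt is in the current directory):
# count_words file-wordcount.txt | head
```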

Remotely

Install Docker on every guest host that will be used.

Then give each host a static IP address (make sure it keeps internet access).

Each worker host sends its RSA public key to the manager, which saves it in its authorized_keys file.
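On the manager side, saving a received key can be sketched as follows. This is an illustration rather than a script from the repository: add_authorized_key and the file names in the usage comment are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: append a public key file to an authorized_keys
# file unless the same key line is already present.
add_authorized_key() {
    pubkey_file=$1
    auth_file=$2
    mkdir -p "$(dirname "$auth_file")"
    touch "$auth_file"
    chmod 600 "$auth_file"
    key=$(cat "$pubkey_file")
    if grep -qxF -- "$key" "$auth_file"; then
        echo "key already present"
    else
        printf '%s\n' "$key" >> "$auth_file"
        echo "key added"
    fi
}

# Example on the manager (file names are assumptions):
# add_authorized_key worker1_id_rsa.pub "$HOME/.ssh/authorized_keys"
```

If the worker can already log in to the manager with a password, OpenSSH's ssh-copy-id performs the same append from the worker side.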

Modify set-configuration.sh on the manager as follows:

  • Set the manager's IP address

Modify set-configuration.sh on the workers as follows:

  • Set the manager's host name
  • Set the IP address of the manager
  • Set the IP address of the worker
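A mistyped address in set-configuration.sh is easy to miss, so it can be worth validating each value before editing the file. The sketch below only checks the values; it does not touch set-configuration.sh, whose variable names are not shown here, and is_ipv4 is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the argument is a dotted-quad IPv4
# address with four octets in the range 0-255.
is_ipv4() {
    ip=$1
    # Only digits and dots, no leading, trailing, or doubled dots.
    case $ip in
        ''|*[!0-9.]*|.*|*.|*..*) return 1 ;;
    esac
    old_ifs=$IFS
    IFS=.
    set -- $ip          # split on dots into positional parameters
    IFS=$old_ifs
    [ $# -eq 4 ] || return 1
    for octet in "$@"; do
        [ ${#octet} -le 3 ] && [ "$octet" -le 255 ] || return 1
    done
    return 0
}

# Example (the address is an assumption):
# is_ipv4 192.168.1.10 && echo "looks valid"
```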

On every guest host, run the following commands:

cd remote/
sudo ./set-ports.sh

(This script only works on Linux distributions.)

On the manager guest host, run the following commands:

cd manager/
./reset-ccbd.sh -i
./start-ccbd.sh <n> <m>

where n is the total number of containers on the manager host (master + slaves, default: 3) and m is the total number of remote slaves in the cluster (default: 2). This builds all the Docker images locally, starts all the containers, and opens a shell in the Hadoop master container as root.

When the previous script has finished, run the following commands on each worker:

cd worker/
./reset-ccbd.sh -i
./start-ccbd.sh <i> <n>

where i is the initial index (number of slaves already launched + 1, default: 3) and n is the number of slaves on this host (default: 2). If you launch only one worker, the default values are all correct. Nothing more needs to be done on the worker guest hosts; just make sure they stay powered on.

On the manager host, run the following commands:
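With several workers, the initial index of each one depends on how many slaves were started before it. The sketch below prints the start-ccbd.sh invocation for each worker, assuming that indices continue consecutively from one worker to the next (this follows from "number of slaves already launched + 1" but is not confirmed by the scripts themselves). The function worker_args is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper.
# Usage: worker_args <n_on_manager> <slaves_on_worker_1> [<slaves_on_worker_2> ...]
worker_args() {
    manager_total=$1
    shift
    # Slaves already running on the manager host (the master is excluded).
    launched=$((manager_total - 1))
    for slaves in "$@"; do
        echo "./start-ccbd.sh $((launched + 1)) $slaves"
        launched=$((launched + slaves))
    done
}

# Manager started with n=3, then two workers with 2 slaves each.
worker_args 3 2 2
# prints:
# ./start-ccbd.sh 3 2
# ./start-ccbd.sh 5 2
```

In this example the manager would be started with m=4, the total number of remote slaves.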

start-hadoop.sh
cd examples/
./start-wordcount.sh

The time Spark took to count the words in file-wordcount.txt is displayed and saved to /tmp/time-wordcount.log.
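Displaying a duration and saving it to a log file can be done in a few lines of shell. The sketch below shows one generic way to do it; it is not the code used by start-wordcount.sh, and both time_to_log and its output format are assumptions.

```shell
#!/bin/sh
# Hypothetical helper: run a command, then print its wall-clock
# duration in whole seconds and save the same line to a log file.
# Relies on `date +%s`, which GNU and BSD date both support.
time_to_log() {
    log_file=$1
    shift
    start=$(date +%s)
    "$@"
    status=$?
    end=$(date +%s)
    echo "elapsed: $((end - start)) s" | tee "$log_file"
    return $status
}

# Example (the log path is an assumption):
# time_to_log /tmp/my-timing.log ./start-wordcount.sh
```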

Built With

  • Docker - A computer program that performs operating-system-level virtualization.
  • Spark - A unified analytics engine for large-scale data processing.
  • Hadoop - Open-source software for reliable, scalable, distributed computing.

Authors

  • Guillaume Hugonnard - MSc-PSMSC, INPT-ENSEEIHT Eng. School and Paul Sabatier Faculty of Science and Engineering - GuillaumeHugonnard
  • Tom Ragonneau - MSc-PSMSC, INPT-ENSEEIHT Eng. School and Paul Sabatier Faculty of Science and Engineering - TomRagonneau

See also the list of contributors who participated in this project.

License

This project is licensed under the MIT License; see the LICENSE file for details.

Acknowledgments

  • Daniel Hagimont - MSc-PSMSC speaker, INPT-ENSEEIHT Eng. School and Paul Sabatier Faculty of Science and Engineering
