
hadoop-vm's Introduction

Hadoop in Virtualbox (without SPARK)

  • Tue Nov 15 09:47:47 PM CST 2022

Prerequisites

  1. VirtualBox
  2. vagrant


  3. Access to the Internet
    • The base OS chosen for the Hadoop cluster is Ubuntu 16.04, so you need to be able to fetch "ubuntu/xenial64" from [vagrant cloud](https://app.vagrantup.com/boxes/search).
    • The first step inside each VM is to update the package index and install the necessary applications, such as `ssh`, `rsync` and `vim`.
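Before going further, it can help to confirm that the host-side tools are actually on your PATH. The following is a minimal sketch, not part of this repository: `missing_tools` is a hypothetical helper name, and it assumes VirtualBox's command-line front end is installed under its usual name `VBoxManage`.

```shell
# Print the name of every listed command that is not found on PATH.
# missing_tools is a hypothetical helper, not something shipped with this repo.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Example (run on the host): prints nothing when both prerequisites exist.
# missing_tools VBoxManage vagrant
```

The same helper could be reused inside a VM, for example `missing_tools ssh rsync vim`, to see which packages still need installing.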

Installation

1. Ingredients checkup

  • Make sure the directory has the following structure:
.
├── cache                               -- Files to replace in VM
│   ├── core-site.xml                   -- replace /usr/local/hadoop/etc/hadoop/core-site.xml in VM
│   ├── hadoop-2.9.0.tar.gz             -- hadoop 
│   ├── hdfs-site.xml                   -- replace /usr/local/hadoop/etc/hadoop/hdfs-site.xml in VM 
│   ├── hosts                           -- replace /etc/hosts in VM
│   ├── jdk-8u161-linux-x64.tar.gz      -- jdk package
│   ├── mapred-site.xml                 -- replace /usr/local/hadoop/etc/hadoop/mapred-site.xml
│   ├── scala-2.11.8.tgz                -- scala package
│   ├── sources.list                    -- replace /etc/apt/sources.list -> from https://mirrors.tuna.tsinghua.edu.cn/help/ubuntu/
│   ├── spark-2.3.0-bin-hadoop2.7.tgz   -- spark package
│   └── yarn-site.xml                   -- replace /usr/local/hadoop/etc/hadoop/yarn-site.xml
├── hadoop+spark集群平台搭建.pptx       -- ppt instruction for Hadoop + spark on VMware
├── Hadoop集群安装手册.pdf              -- PDF instruction for Hadoop on VMware based on CentOS
├── hadoop集群搭建.pptx                 -- ppt instruction for Hadoop on VMware based on Ubuntu -- Instruction for this Virtualbox Version
├── img                                 -- IMGs in this readme file
├── init.sh                             -- Scripts to execute when VM first starts
├── README.md                           -- This file
└── Vagrantfile                         -- VM Configurations

2 directories, 26 files
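The structure check can be scripted. This is a sketch under assumptions: `check_ingredients` is a hypothetical helper, and the list of paths is only the Hadoop-related subset of the tree above (the Scala and Spark archives are left out because this setup runs without Spark). Adjust the list if your init.sh expects more.

```shell
# Print every expected path that is missing under the given project root.
# The path list is an assumption derived from the tree listing above.
check_ingredients() {
    root=${1:-.}
    for path in \
        Vagrantfile init.sh \
        cache/core-site.xml cache/hdfs-site.xml cache/mapred-site.xml \
        cache/yarn-site.xml cache/hosts cache/sources.list \
        cache/hadoop-2.9.0.tar.gz cache/jdk-8u161-linux-x64.tar.gz
    do
        [ -e "$root/$path" ] || printf 'missing: %s\n' "$path"
    done
}

# Example: run from the project directory; no output means nothing is missing.
# check_ingredients .
```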

2. Check init.sh and Vagrantfile for certain configurations

  1. The default configuration for the virtual machines is written in Vagrantfile. Edit that file directly to change it.
  2. See init.sh for the remaining setup steps.

3. Execute Vagrant Up

  • In the directory shown above, run vagrant up to set up and boot your VMs.
vagrant up


  • This takes a while, so have a cup of tea. When everything has settled, check your virtual machines with either vagrant status or the VirtualBox user interface.
vagrant status
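For a scripted check, vagrant status can be combined with Vagrant's global --machine-readable flag, which prints comma-separated records. The sketch below assumes the documented `timestamp,target,type,data` layout and filters on the `state` record type; `not_running` is a hypothetical helper name.

```shell
# Read `vagrant status --machine-readable` output on stdin and print the
# name of every machine whose state is not "running".
# Assumes records of the form: timestamp,target,type,data
not_running() {
    awk -F, '$3 == "state" && $4 != "running" { print $2 }'
}

# Example: no output means every VM (master and the slaves) is up.
# vagrant status --machine-readable | not_running
```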


vagrant status


virtualbox user interface

4. ssh configurations

  1. Use vagrant ssh master to enter the master virtual machine.
vagrant ssh master

  2. Append the public keys to authorized_keys with cat /vagrant/cache/authorized_keys >> ~/.ssh/authorized_keys
cat /vagrant/cache/authorized_keys >> ~/.ssh/authorized_keys

Note that all three public keys have already been generated and collected in /vagrant/cache/authorized_keys by the commands in init.sh and Vagrantfile.

  3. Repeat the ssh configuration on both slaves.

  4. Verify the ssh configuration by running ssh slave1 in the master virtual machine. You should log into slave1 without entering a password.
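Running the cat command for the authorized_keys step more than once leaves duplicate entries in the file. If you expect to repeat it, an idempotent variant may be preferable. This is a sketch only: `append_keys` is a hypothetical helper, and the final `chmod 600` reflects the usual OpenSSH expectation that the file is not writable by others.

```shell
# Append each non-empty line of SRC to DEST unless DEST already contains
# exactly that line, so repeated runs do not create duplicates.
append_keys() {
    src=$1
    dest=$2
    touch "$dest"
    while IFS= read -r key || [ -n "$key" ]; do
        [ -n "$key" ] || continue
        grep -qxF -e "$key" "$dest" || printf '%s\n' "$key" >> "$dest"
    done < "$src"
    chmod 600 "$dest"
}

# Example (run inside master and inside each slave):
# append_keys /vagrant/cache/authorized_keys ~/.ssh/authorized_keys
```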


log into slave1 without password after ssh configuration

Deploy and Verify Hadoop

1. Deploy Hadoop

  • Run hadoop namenode -format in the master virtual machine to format the HDFS NameNode.
hadoop namenode -format
  • Run start-all.sh to start the Hadoop daemons. Always remember to run stop-all.sh before shutting down the virtual machines.
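To make the stop-before-shutdown rule harder to forget, the two steps can be wrapped in one host-side command. This is a sketch under assumptions: `safe_halt` is a hypothetical helper, and it assumes stop-all.sh lives in the standard sbin directory of the /usr/local/hadoop install and that the environment it needs (such as JAVA_HOME) is available in a non-interactive ssh session.

```shell
# Stop the Hadoop daemons on master, then power off every VM.
# Run on the host, from the directory that holds the Vagrantfile.
# vagrant halt is only reached if stop-all.sh exits successfully.
safe_halt() {
    vagrant ssh master -c '/usr/local/hadoop/sbin/stop-all.sh' && vagrant halt
}

# Example:
# safe_halt
```

Chaining with `&&` is a deliberate choice here: if the daemons fail to stop, the VMs stay up so you can investigate.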

2. Verification

  • Run hadoop jar hadoop-mapreduce-examples-2.9.0.jar pi 5 5 in the directory /usr/local/hadoop/share/hadoop/mapreduce to verify the deployment.
cd /usr/local/hadoop/share/hadoop/mapreduce
hadoop jar hadoop-mapreduce-examples-2.9.0.jar pi 5 5

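The job prints an estimate of pi. The two arguments are the number of map tasks and the number of samples per map, so pi 5 5 uses only 25 sample points and gives a rough value. For a more accurate result, try pi 10 10000, which takes longer to run.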
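To see why more samples give a better value, the idea behind the job can be imitated locally. The sketch below is not Hadoop's implementation: the bundled example uses a quasi-Monte Carlo method, while this version uses plain pseudo-random sampling in awk. `estimate_pi` is a hypothetical helper whose two arguments mirror the job's maps and samples-per-map arguments.

```shell
# Estimate pi from MAPS x SAMPLES pseudo-random points in the unit square,
# counting those that fall inside the quarter circle of radius 1.
# Plain Monte Carlo, used here only to illustrate the sample-count effect.
estimate_pi() {
    awk -v maps="$1" -v samples="$2" 'BEGIN {
        srand()
        total = maps * samples
        inside = 0
        for (i = 0; i < total; i++) {
            x = rand(); y = rand()
            if (x * x + y * y <= 1) inside++
        }
        printf "%.6f\n", 4 * inside / total
    }'
}

# Example: compare a coarse and a finer estimate.
# estimate_pi 5 5
# estimate_pi 10 10000
```

The output varies from run to run; the second call should typically land much closer to 3.14159 than the first.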

3. User Interface

  • Hadoop's web interfaces are available at IP:50070 (NameNode) and at IP:8088 (YARN ResourceManager), where IP is the static IP of the master. The 8088 port can be changed in the .xml configuration files.
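A quick way to confirm that both interfaces respond, without opening a browser, is to probe them with curl. This is a minimal sketch: `check_ui` is a hypothetical helper, the ports are the two named above, and the IP passed in must be whatever static address your Vagrantfile assigns to the master.

```shell
# Print the HTTP status code returned by each Hadoop web UI on the master.
# curl reports 000 when the connection itself fails.
check_ui() {
    ip=$1
    for port in 50070 8088; do
        code=$(curl -s -o /dev/null -w '%{http_code}' "http://$ip:$port/")
        printf '%s %s\n' "$port" "$code"
    done
}

# Example: replace MASTER_IP with the static IP of the master.
# check_ui MASTER_IP
```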




hadoop-vm's People

Contributors

chrisvicky
