- Tue Nov 15 09:47:47 PM CST 2022
- VirtualBox
- vagrant
- Make sure the directory has the following structure:
.
├── cache -- Files to replace in VM
│ ├── core-site.xml -- replace /usr/local/hadoop/etc/hadoop/core-site.xml in VM
│ ├── hadoop-2.9.0.tar.gz -- hadoop
│ ├── hdfs-site.xml -- replace /usr/local/hadoop/etc/hadoop/hdfs-site.xml in VM
│ ├── hosts -- replace /etc/hosts in VM
│ ├── jdk-8u161-linux-x64.tar.gz -- jdk package
│ ├── mapred-site.xml -- replace /usr/local/hadoop/etc/hadoop/mapred-site.xml
│ ├── scala-2.11.8.tgz -- scala package
│ ├── sources.list -- replace /etc/apt/sources.list -> from https://mirrors.tuna.tsinghua.edu.cn/help/ubuntu/
│ ├── spark-2.3.0-bin-hadoop2.7.tgz -- spark package
│ └── yarn-site.xml -- replace /usr/local/hadoop/etc/hadoop/yarn-site.xml
├── hadoop+spark集群平台搭建.pptx -- slides: Hadoop + Spark setup on VMware
├── Hadoop集群安装手册.pdf -- PDF guide: Hadoop on VMware with CentOS
├── hadoop集群搭建.pptx -- slides: Hadoop on VMware with Ubuntu; the basis for this VirtualBox version
├── img -- images used in this README
├── init.sh -- script executed when a VM first starts
├── README.md -- This file
└── Vagrantfile -- VM Configurations
2 directories, 26 files
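Before booting anything, it can help to confirm that the cache directory is complete. The following is a minimal sketch, not part of the repository: the file names are copied from the listing above, while the function name `missing_cache_files` and the default directory `cache` are illustrative assumptions.

```shell
# Sketch: print every expected cache file that is missing.
# File names come from the directory listing in this README;
# the function name and default directory are assumptions.
missing_cache_files() {
    dir="${1:-cache}"
    for f in core-site.xml hadoop-2.9.0.tar.gz hdfs-site.xml hosts \
             jdk-8u161-linux-x64.tar.gz mapred-site.xml scala-2.11.8.tgz \
             sources.list spark-2.3.0-bin-hadoop2.7.tgz yarn-site.xml; do
        # Print the name of each expected file that does not exist.
        [ -f "$dir/$f" ] || echo "$f"
    done
}
```

Running `missing_cache_files cache` from the project root prints nothing when all ten files are present.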
- The default configuration for the virtual machines is written in `Vagrantfile`. Modify it by editing the file directly.
- Check `init.sh` for additional setup steps.
- In the directory shown above, execute `vagrant up` to set up and boot your VMs.
vagrant up
- It should take a while, so have a cup of tea. When everything is settled, check your virtual machines with either `vagrant status` or the VirtualBox user interface.
vagrant status
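If you would rather script this check, `vagrant status --machine-readable` prints comma-separated lines of the form `timestamp,target,type,data`. The sketch below filters that output for machines that are not yet running; the function name `not_running` is an assumption made for illustration.

```shell
# Sketch: list machines whose state is not "running".
# Reads `vagrant status --machine-readable` output on stdin, where
# each line has the form: timestamp,target,type,data
not_running() {
    awk -F, '$3 == "state" && $4 != "running" { print $2 }'
}
```

Usage: `vagrant status --machine-readable | not_running`. Empty output means every machine reports the `running` state.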
- Use `vagrant ssh master` to enter the master virtual machine.
vagrant ssh master
- Append the public keys to `authorized_keys` with `cat /vagrant/cache/authorized_keys >> ~/.ssh/authorized_keys`.
cat /vagrant/cache/authorized_keys >> ~/.ssh/authorized_keys
Note that all three public keys have already been generated and written to `/vagrant/cache/authorized_keys` by the commands in `init.sh` and `Vagrantfile`.
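Running the `cat ... >>` command more than once adds duplicate entries to `authorized_keys`. As a hedged alternative, the sketch below appends a key only when the exact line is not already present. The function name `append_keys_once` is hypothetical and is not something `init.sh` provides.

```shell
# Sketch: append each key from SRC to DEST only if DEST does not
# already contain that exact line, so the step can be repeated safely.
append_keys_once() {
    src="$1"
    dest="$2"
    touch "$dest"
    while IFS= read -r key || [ -n "$key" ]; do
        [ -n "$key" ] || continue    # skip blank lines
        # -x matches whole lines, -F treats the key as a fixed string.
        grep -qxF -- "$key" "$dest" || printf '%s\n' "$key" >> "$dest"
    done < "$src"
}
```

Usage: `append_keys_once /vagrant/cache/authorized_keys ~/.ssh/authorized_keys`.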
- Repeat the SSH configuration on both slaves as well.
- Verify the SSH configuration by executing `ssh slave1` in the `master` virtual machine. You should log into slave1 without entering a password.
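The manual check can be repeated for every node in one go. This is a sketch under assumptions: the host name `slave2` is inferred from "both slaves" and may differ in your `hosts` file, and `SSH_CMD` is a hypothetical override that exists only so the function can be exercised without a real cluster.

```shell
# Sketch: check key-based SSH login to each host given as an argument.
# BatchMode=yes makes ssh fail instead of prompting for a password.
# SSH_CMD is a hypothetical override for testing; it defaults to ssh.
check_passwordless_ssh() {
    failed=0
    for host in "$@"; do
        if "${SSH_CMD:-ssh}" -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
            echo "$host: ok"
        else
            echo "$host: FAILED"
            failed=1
        fi
    done
    # Non-zero exit status if any host could not be reached without a password.
    return "$failed"
}
```

Usage from the master: `check_passwordless_ssh slave1 slave2`.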
- Run `hadoop namenode -format` in the `master` virtual machine to format the HDFS NameNode.
hadoop namenode -format
- Run `start-all.sh` to start the cluster. Always remember to run `stop-all.sh` before shutting down the virtual machines.
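After `start-all.sh`, the JDK tool `jps` lists the running Java processes as `<pid> <name>` lines, which gives a quick way to see whether the daemons came up. The sketch below reports expected daemons that are absent. The function name `missing_daemons` is an assumption, and which daemons run on which node depends on your configuration; the usage shown assumes a typical layout where the master does not also act as a DataNode.

```shell
# Sketch: given `jps` output on stdin, print each expected daemon
# (passed as arguments) that does not appear in it.
missing_daemons() {
    running=$(awk '{ print $2 }')    # jps prints "<pid> <name>"
    for d in "$@"; do
        # -x matches whole lines, so NameNode does not match SecondaryNameNode.
        printf '%s\n' "$running" | grep -qxF -- "$d" || echo "$d"
    done
}
```

Usage on the master: `jps | missing_daemons NameNode SecondaryNameNode ResourceManager`. On a slave: `jps | missing_daemons DataNode NodeManager`.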
- Run `hadoop jar hadoop-mapreduce-examples-2.9.0.jar pi 5 5` in the directory `/usr/local/hadoop/share/hadoop/mapreduce` to verify the installation.
cd /usr/local/hadoop/share/hadoop/mapreduce
hadoop jar hadoop-mapreduce-examples-2.9.0.jar pi 5 5
A number relatively close to pi is then printed. For a more accurate result, try running `pi 10 10000`, which takes longer.
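The two arguments to `pi` are the number of map tasks and the number of samples per map, so `pi 5 5` uses 25 sample points in total and `pi 10 10000` uses 100000. To illustrate why more samples give a closer estimate, here is a local sketch of the same idea in awk. It uses plain pseudo-random sampling rather than the quasi-Monte Carlo method of the Hadoop example, and the function name `estimate_pi` is an assumption.

```shell
# Sketch: estimate pi by sampling N random points in the unit square
# and counting how many fall inside the quarter circle of radius 1.
estimate_pi() {
    awk -v n="$1" 'BEGIN {
        srand()
        inside = 0
        for (i = 0; i < n; i++) {
            x = rand(); y = rand()
            if (x * x + y * y <= 1) inside++
        }
        # The quarter circle covers pi/4 of the square.
        printf "%.4f\n", 4 * inside / n
    }'
}
```

Comparing `estimate_pi 25` with `estimate_pi 100000` shows how the estimate tightens as the sample count grows.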
- The Hadoop environment can be inspected in a browser at `IP:50070` and at `IP:8088`, where IP is the static IP of the master. Port `8088` can be changed in the `.xml` configuration files.
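The two pages can also be probed from the command line. In Hadoop 2.x, port 50070 is the default for the HDFS NameNode web UI and port 8088 for the YARN ResourceManager web UI. The sketch below assumes `curl` is installed; the function names are hypothetical, and the IP you pass must be the static IP set in your `Vagrantfile`.

```shell
# Sketch: print the two web UI addresses for a given master IP.
# The ports are the ones named in this README.
ui_urls() {
    echo "http://$1:50070"    # HDFS NameNode UI
    echo "http://$1:8088"     # YARN ResourceManager UI
}

# Probe each URL and print its HTTP status code.
# curl reports 000 when no connection could be made.
check_uis() {
    for url in $(ui_urls "$1"); do
        code=$(curl -s -o /dev/null -m 5 -w '%{http_code}' "$url") || true
        echo "$url -> $code"
    done
}
```

Usage: `check_uis <master IP>`.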