This is my personal Hadoop 3.3.4 Docker image based on ubuntu:latest (latest = 22.04 jammy when this image was built).

The aim of this image is to provide a base for building a Hadoop cluster. After start-up, an SSH daemon is launched automatically. For all the information about the experimental Hadoop cluster environment, please refer to this Hadoop tutorial in Chinese.

Please check the Dockerfile for details.

All the source code can be found at github.com/zhaolj/hadoop_docker.
- start a single node

```shell
docker run -d -p10122:22 -p19870:9870 --privileged --name hadoop zhaolj/hadoop:latest
```
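Before going on, you can confirm from the host that the container is up:

```shell
docker ps --filter name=hadoop
```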
- log in via ssh (password: hadoop)

```shell
ssh hadoop@localhost -p10122
```
- check the container's IP

```shell
ip addr | grep 172
```

In the example above, the IP is 172.17.0.2.
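Alternatively, you can read the address from the host without logging in; Docker can report it via a Go template:

```shell
# print the container's IP address from the host
docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' hadoop
```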
- config

Go to the Hadoop configuration files directory (/opt/hadoop/etc/hadoop):

```shell
cd $HADOOP_HOME/etc/hadoop
```

core-site.xml:

```xml
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://<your ip>:9000</value>
</property>
```
hdfs-site.xml (the replication factor is 1 because this single node is the only DataNode):

```xml
<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>
```
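If you would rather script the `<your ip>` substitution than edit core-site.xml by hand, here is a rough sketch. It assumes GNU grep and sed (as on Ubuntu) and that the container has exactly one 172.x address:

```shell
# grab the container's 172.x address and patch core-site.xml in place
IP=$(ip addr | grep -oP 'inet \K172[0-9.]*' | head -n1)
sed -i "s|<your ip>|$IP|" "$HADOOP_HOME/etc/hadoop/core-site.xml"
```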
- format HDFS

```shell
hdfs namenode -format
```
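A successful format should end with a log line along the lines of `Storage directory ... has been successfully formatted.`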
- start HDFS

```shell
start-dfs.sh
```
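If HDFS came up cleanly, `jps` inside the container should show the three HDFS daemons (PIDs will differ):

```shell
jps
# expected output, roughly:
#   321 NameNode
#   452 DataNode
#   618 SecondaryNameNode
#   809 Jps
```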
- open the HDFS web dashboard: open http://localhost:19870/ in your host browser.
We are going to build a Hadoop cluster with 1 NameNode and 2 DataNodes. We can use another pre-configured image, zhaolj/hadoop-cluster, for a quick start.
- create a network for the cluster

```shell
docker network create --subnet=172.20.0.0/16 hnet
```
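You can optionally verify the subnet before creating the containers; `docker network inspect` accepts a Go template:

```shell
docker network inspect hnet -f '{{(index .IPAM.Config 0).Subnet}}'
# expect: 172.20.0.0/16
```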
- create the containers (3 nodes)

```shell
docker run -d -p10122:22 -p19870:9870 --name=nn --hostname=nn --network=hnet --ip=172.20.1.1 --add-host=dn1:172.20.1.2 --add-host=dn2:172.20.1.3 --privileged zhaolj/hadoop-cluster:latest

docker run -d --name=dn1 --hostname=dn1 --network=hnet --ip=172.20.1.2 --add-host=nn:172.20.1.1 --add-host=dn2:172.20.1.3 --privileged zhaolj/hadoop-cluster:latest

docker run -d --name=dn2 --hostname=dn2 --network=hnet --ip=172.20.1.3 --add-host=nn:172.20.1.1 --add-host=dn1:172.20.1.2 --privileged zhaolj/hadoop-cluster:latest
```
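A quick glance from the host should now show all three containers up:

```shell
docker ps --format 'table {{.Names}}\t{{.Status}}'
# expect nn, dn1 and dn2, each with status "Up ..."
```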
- log in to the NameNode via ssh (password: hadoop)

```shell
ssh hadoop@localhost -p10122
```
- format HDFS

```shell
hdfs namenode -format
```
- start HDFS

```shell
start-dfs.sh
```
- check processes via jps on each node

```shell
jps
```
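Roughly what to expect (with the default configuration the SecondaryNameNode also lands on nn; PIDs will differ):

```shell
# on nn:
#   321 NameNode
#   455 SecondaryNameNode
#   602 Jps
#
# on dn1 and dn2:
#   298 DataNode
#   401 Jps
```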
- open the HDFS web dashboard: open http://localhost:19870/ in your host browser.
Differences between zhaolj/hadoop and zhaolj/hadoop-cluster
zhaolj/hadoop-cluster includes three pre-configured files for the example cluster network.
- workers ($HADOOP_HOME/etc/hadoop/workers)

One worker host per line; start-dfs.sh launches a DataNode on each host listed here:

```
dn1
dn2
```
- core-site.xml ($HADOOP_HOME/etc/hadoop/core-site.xml)

Add the following content to the default configuration (inside <configuration>...</configuration>):

```xml
<!-- specify HDFS host & port -->
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://nn:9000</value>
</property>

<!-- specify Hadoop temporary directory -->
<property>
    <name>hadoop.tmp.dir</name>
    <value>file:///home/hadoop/tmp</value>
</property>
```
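To confirm the setting is actually picked up, `hdfs getconf` prints values from the effective configuration:

```shell
hdfs getconf -confKey fs.defaultFS
# expect: hdfs://nn:9000
```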
- hdfs-site.xml ($HADOOP_HOME/etc/hadoop/hdfs-site.xml)

Add the following content to the default configuration (inside <configuration>...</configuration>):

```xml
<property>
    <name>dfs.replication</name>
    <value>2</value>
</property>

<property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///home/hadoop/hdfs/name</value>
</property>
```
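Once the cluster has been formatted and started, `hdfs dfsadmin -report` is a handy sanity check; with this setup it should report both DataNodes as live:

```shell
hdfs dfsadmin -report | grep -A1 'Live datanodes'
# look for a line like: Live datanodes (2):
```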