
hadoop_docker

README

Overview (2022-12-24)

This is my personal Hadoop 3.3.4 Docker image, based on ubuntu:latest (latest = 22.04 "jammy" when this image was built).

The aim of this image is to provide a base for building a Hadoop cluster. After start-up, an SSH daemon is launched automatically. For all information about the experimental Hadoop cluster environment, please refer to this Hadoop tutorial in Chinese.

Please check the Dockerfile for details.

All the source code can be found at github.com/zhaolj/hadoop_docker.

Use as a single node

  1. start a single node

    docker run -d -p10122:22 -p19870:9870 --privileged --name hadoop zhaolj/hadoop:latest
  2. login via ssh (password: hadoop)

    ssh hadoop@localhost -p10122
  3. check the IP

    ip addr | grep 172

    In the example above, the IP is 172.17.0.2.
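Alternatively, the container's IP can be read from the host without logging in. This is a sketch using docker inspect's Go-template output, assuming the container name hadoop from step 1:

```shell
# Print the IP address of the container named "hadoop" (name assumed from step 1).
docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' hadoop
```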

  4. configure. Go to the Hadoop configuration directory: /opt/hadoop/etc/hadoop.

    cd $HADOOP_HOME/etc/hadoop
    • core-site.xml

      <property>
          <name>fs.defaultFS</name>
          <value>hdfs://<your ip>:9000</value>
      </property>
    • hdfs-site.xml

      <property>
          <name>dfs.replication</name>
          <value>1</value>
      </property>
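After editing, the effective values can be double-checked from the shell. This is a minimal sketch using the standard hdfs getconf subcommand; the expected values assume the configuration above:

```shell
# Print the effective values of the keys configured above.
hdfs getconf -confKey fs.defaultFS      # should print hdfs://<your ip>:9000
hdfs getconf -confKey dfs.replication   # should print 1
```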
  5. format hdfs

    hdfs namenode -format
  6. start HDFS

    start-dfs.sh
  7. check the running processes via jps

  8. open the HDFS web UI: open http://localhost:19870/ in your host browser.
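Once HDFS is up, a quick smoke test confirms that reads and writes work. This sketch uses standard hdfs dfs commands; the file name is arbitrary:

```shell
# Create a home directory, upload a file, and read it back.
hdfs dfs -mkdir -p /user/hadoop
echo "hello hdfs" > /tmp/hello.txt
hdfs dfs -put /tmp/hello.txt /user/hadoop/
hdfs dfs -cat /user/hadoop/hello.txt
```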

Use as a cluster

We are going to build a Hadoop cluster with 1 NameNode and 2 DataNodes.


We can use another pre-configured image, zhaolj/hadoop-cluster, for a quick start.

  1. create a network for the cluster

    docker network create --subnet=172.20.0.0/16 hnet
  2. create the containers (3 nodes)

    docker run -d -p10122:22 -p19870:9870 --name=nn --hostname=nn --network=hnet --ip=172.20.1.1 --add-host=dn1:172.20.1.2 --add-host=dn2:172.20.1.3 --privileged zhaolj/hadoop-cluster:latest
    docker run -d --name=dn1 --hostname=dn1 --network=hnet --ip=172.20.1.2 --add-host=nn:172.20.1.1 --add-host=dn2:172.20.1.3 --privileged zhaolj/hadoop-cluster:latest
    docker run -d --name=dn2 --hostname=dn2 --network=hnet --ip=172.20.1.3 --add-host=nn:172.20.1.1 --add-host=dn1:172.20.1.2 --privileged zhaolj/hadoop-cluster:latest
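The three docker run commands above differ only in the name, IP, published ports, and --add-host entries, so they can be generated from a single host list. This POSIX-sh sketch echoes the commands rather than running them; drop the echo to execute:

```shell
#!/bin/sh
# Host:IP pairs for the example cluster (taken from the commands above).
NODES="nn:172.20.1.1 dn1:172.20.1.2 dn2:172.20.1.3"

for node in $NODES; do
  name=${node%%:*}
  ip=${node#*:}
  # Every other node becomes an --add-host entry.
  hosts=""
  for peer in $NODES; do
    [ "$peer" = "$node" ] || hosts="$hosts --add-host=${peer%%:*}:${peer#*:}"
  done
  # Only the NameNode publishes SSH and the HDFS web UI.
  ports=""
  if [ "$name" = "nn" ]; then ports="-p10122:22 -p19870:9870"; fi
  echo docker run -d $ports --name="$name" --hostname="$name" \
    --network=hnet --ip="$ip" $hosts --privileged zhaolj/hadoop-cluster:latest
done
```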
  3. go to the NameNode via ssh (password: hadoop)

    ssh hadoop@localhost -p10122
  4. format HDFS

    hdfs namenode -format
  5. start HDFS

    start-dfs.sh


  6. check processes via jps on the NameNode

  7. check processes via jps on the DataNodes

    • connect to dn1 via ssh from nn

      ssh dn1
      jps
      exit


    • connect to dn2 via ssh from nn

      ssh dn2
      jps
      exit


  8. open the HDFS web UI: open http://localhost:19870/ in your host browser.
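To verify that both DataNodes registered with the NameNode, the standard dfsadmin report can be checked from inside the nn container. A sketch; the expected count assumes the 1+2 cluster above:

```shell
# Summarise the cluster state from the NameNode.
hdfs dfsadmin -report | grep "Live datanodes"   # should show (2)
# Check overall filesystem health and block replication.
hdfs fsck /
```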

Differences between zhaolj/hadoop and zhaolj/hadoop-cluster

zhaolj/hadoop-cluster includes three pre-configured files for the example cluster network.

  1. workers ($HADOOP_HOME/etc/hadoop/workers)

    dn1
    dn2
  2. core-site.xml ($HADOOP_HOME/etc/hadoop/core-site.xml)

    Add the following content to the default configuration (enclosed by <configuration>...</configuration>).

    <!-- specify HDFS host & port -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://nn:9000</value>
    </property>
    <!-- specify Hadoop temporary directory -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:///home/hadoop/tmp</value>
    </property>
  3. hdfs-site.xml ($HADOOP_HOME/etc/hadoop/hdfs-site.xml)

    Add the following content to the default configuration (enclosed by <configuration>...</configuration>).

    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///home/hadoop/hdfs/name</value>
    </property>
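The pre-configured values can be confirmed inside a running zhaolj/hadoop-cluster container with hdfs getconf. A small sketch; the expected values come from the files above:

```shell
hdfs getconf -confKey fs.defaultFS            # should print hdfs://nn:9000
hdfs getconf -confKey dfs.replication         # should print 2
hdfs getconf -confKey dfs.namenode.name.dir   # should print file:///home/hadoop/hdfs/name
```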
