
hadoop-setup

As root, allow your system user (e.g. my_user_id) to use sudo without a password

sudo visudo

add:

my_user_id  ALL=(ALL) NOPASSWD:ALL

Save and exit
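To confirm the entry took effect, a quick check sketch; the -n flag makes sudo fail instead of prompting:

```shell
# Prints "passwordless sudo OK" only if no password would be required.
if sudo -n true 2>/dev/null; then
    echo "passwordless sudo OK"
else
    echo "sudo still requires a password" >&2
fi
```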

As root, download and set up openjdk in /opt/jdk/current/

sudo su
cd
wget https://download.java.net/java/GA/jdk11/13/GPL/openjdk-11.0.1_linux-x64_bin.tar.gz
tar xzvf openjdk-11.0.1_linux-x64_bin.tar.gz
rm openjdk-11.0.1_linux-x64_bin.tar.gz
mkdir /opt/jdk
mv jdk-11.0.1 /opt/jdk/
cd /opt/jdk/
ln -s jdk-11.0.1/ current
ls -la
exit
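Before moving on, it's worth verifying that the symlink resolves and the JDK runs:

```shell
# The symlink should resolve to the real JDK directory
readlink /opt/jdk/current          # expect: jdk-11.0.1/
# And java should report version 11.0.1
/opt/jdk/current/bin/java -version
```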

As your user, add your java home:

nano ~/.bashrc

At the bottom of the file, add

export JAVA_HOME=/opt/jdk/current
export PATH=/opt/jdk/current/bin:$PATH

Reload it

source ~/.bashrc
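A quick check that the new environment is in place:

```shell
echo "$JAVA_HOME"     # expect: /opt/jdk/current
command -v java       # expect: /opt/jdk/current/bin/java
java -version
```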

Download hadoop

wget http://apache.claz.org/hadoop/common/hadoop-2.8.5/hadoop-2.8.5.tar.gz
tar xzvf hadoop-2.8.5.tar.gz
rm hadoop-2.8.5.tar.gz
sudo mv hadoop-2.8.5/ /opt/
cd /opt/hadoop-2.8.5/
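A quick smoke test that the unpacked distribution runs under the JDK installed above:

```shell
cd /opt/hadoop-2.8.5/
bin/hadoop version     # the first line should read: Hadoop 2.8.5
```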

Copy example files

cd /opt/hadoop-2.8.5/
mkdir input
cp etc/hadoop/* input/


Test that Hadoop works in standalone mode by running the bundled grep example

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.5.jar grep input output 'dfs[a-z.]+'
ls -l output/


As your user, create a key so that you can ssh in locally without a password (press Enter at the passphrase prompts to leave it empty)

ssh-keygen -t rsa -b 4096 -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

 
Test that you can connect without a password; the first time, answer yes to the host-key prompt, then exit

ssh localhost
exit
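If ssh still asks for a password, file permissions are the usual cause; sshd refuses keys whose files are too permissive. A sketch of the standard fix:

```shell
# sshd ignores authorized_keys if ~/.ssh or the file itself is group/world writable
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
```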


Edit the hadoop environment to set up the java home

nano /opt/hadoop-2.8.5/etc/hadoop/hadoop-env.sh

and change 

export JAVA_HOME=${JAVA_HOME}

to

export JAVA_HOME=/opt/jdk/current
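A grep confirms the edit took:

```shell
grep '^export JAVA_HOME' /opt/hadoop-2.8.5/etc/hadoop/hadoop-env.sh
# expect: export JAVA_HOME=/opt/jdk/current
```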


Edit the hadoop core site and set the default file system

nano /opt/hadoop-2.8.5/etc/hadoop/core-site.xml

change the empty configuration block to:

<configuration>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://localhost:9000</value>
        </property>
</configuration>

Edit the hdfs config

nano /opt/hadoop-2.8.5/etc/hadoop/hdfs-site.xml

set the replication factor to 1, since a single node cannot replicate blocks further:

<configuration>
        <property>
                <name>dfs.replication</name>
                <value>1</value>
        </property>
</configuration>
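Once both files are saved, Hadoop's getconf tool can confirm the values are being picked up (run from /opt/hadoop-2.8.5/):

```shell
cd /opt/hadoop-2.8.5/
bin/hdfs getconf -confKey fs.defaultFS      # expect: hdfs://localhost:9000
bin/hdfs getconf -confKey dfs.replication   # expect: 1
```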

Edit the MapReduce framework config; the file may not exist yet, so touch it first.

touch /opt/hadoop-2.8.5/etc/hadoop/mapred-site.xml
nano /opt/hadoop-2.8.5/etc/hadoop/mapred-site.xml

Set the framework to yarn, for resource negotiation, instead of local or classic:

<configuration>
        <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
        </property>
</configuration>

Edit the yarn config for the mapreduce framework:

nano /opt/hadoop-2.8.5/etc/hadoop/yarn-site.xml

contents:

<configuration>
        <!-- Site specific YARN configuration properties -->
        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
        </property>
</configuration>

Format the name node:

cd /opt/hadoop-2.8.5/
bin/hdfs namenode -format

Start the master and slave nodes of the distributed file system

sbin/start-dfs.sh

When prompted whether you want to continue connecting, answer yes so that the key of host 0.0.0.0 is added:

yes

The deprecation and native-library warnings can be ignored.

Check the installation: if you're in VirtualBox, add a port-forwarding rule mapping host port 50070 to guest port 50070, then open

http://localhost:50070

Create a home directory for your Linux username under /user in HDFS

bin/hdfs dfs -mkdir /user
bin/hdfs dfs -mkdir /user/${USER}

Check that it was created

bin/hdfs dfs -ls /user/

Confirm the HDFS daemons are running with jps; you should see NameNode, DataNode, and SecondaryNameNode:

jps

Start yarn, the resource negotiator

sbin/start-yarn.sh

Run jps again to see two new processes running, the NodeManager and the ResourceManager

jps

If you're in VirtualBox, add a port-forwarding rule mapping host port 8088 to guest port 8088, then open the cluster view

http://localhost:8088

Create an input directory in HDFS (relative paths resolve under /user/${USER})

bin/hdfs dfs -mkdir input

Copy the hadoop config files into it

bin/hdfs dfs -put etc/hadoop/* input/

List the files to confirm they're there

bin/hdfs dfs -ls input/

Run mapreduce

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.5.jar grep input output 'dfs[a-z.]+'

Watch the progress of the job here:

http://localhost:8088/cluster/

When it's done, check the output:

bin/hdfs dfs -cat output/*
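The output can also be copied out of HDFS for local inspection, and the single-node cluster stopped cleanly when you're done (run from /opt/hadoop-2.8.5/):

```shell
cd /opt/hadoop-2.8.5/
# pull the MapReduce output down to the local filesystem
bin/hdfs dfs -get output output-local
cat output-local/*

# shut down YARN and HDFS
sbin/stop-yarn.sh
sbin/stop-dfs.sh
```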
