
hadoop-examples's Introduction

Hadoop Examples

Some simple, kinda introductory projects based on Apache Hadoop, meant to be used as guides that make the MapReduce model look less weird or boring.

Preparations & Prerequisites

  • The latest stable version of Hadoop, or at least the one used here (3.3.0).
  • A single-node setup is enough. You can also run the applications on a local cluster or a cloud service, with the needed changes to the map splits and the number of reducers, of course.
  • Of course, having a (somewhat recent) version of Java installed. I have OpenJDK 11.0.5 on a 32-bit Ubuntu 16.04 system, and if I can do it, so can you.

Projects

Each project comes with its very own:

  • input data (.csv, .tsv, or plain text files in a folder, ready to be copied to HDFS).
  • an execution guide (found in the source code of each project; it depends heavily on your Java setup and environment variables, so if the guide doesn't work, you can always google/yahoo/bing/altavista your way to a working run).

The projects featured in this repo are:

Calculating the average price of houses for sale by zipcode.
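The core of that job can be sketched in plain Java, stripped of Hadoop boilerplate; the `zipcode,price` record format here is an assumption for illustration, not taken from the project's dataset.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class AverageByZip {
    // Group records by zipcode and average the prices, mirroring what the
    // reducer does with the values collected for each zipcode key.
    public static Map<String, Double> averageByZip(List<String> records) {
        return records.stream()
                .map(r -> r.split(","))
                .collect(Collectors.groupingBy(
                        fields -> fields[0],
                        Collectors.averagingDouble(fields -> Double.parseDouble(fields[1]))));
    }

    public static void main(String[] args) {
        Map<String, Double> avg = averageByZip(List.of(
                "10001,300000", "10001,500000", "94110,900000"));
        System.out.println(avg.get("10001")); // (300000 + 500000) / 2 = 400000.0
    }
}
```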

A typical "sum-it-up" example where, for each bank, we calculate the number and the total amount of its transfers.
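Both aggregates can be carried in a single pass, the way a combiner/reducer pair would accumulate them; the `bank,amount` record format is assumed here for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BankTransfers {
    // For each bank, accumulate {count, sum} of its transfers in one pass.
    public static Map<String, double[]> countAndSum(List<String> records) {
        Map<String, double[]> out = new HashMap<>(); // bank -> {count, sum}
        for (String r : records) {
            String[] fields = r.split(",");
            double[] acc = out.computeIfAbsent(fields[0], k -> new double[2]);
            acc[0] += 1;                               // number of transfers
            acc[1] += Double.parseDouble(fields[1]);   // total amount
        }
        return out;
    }

    public static void main(String[] args) {
        double[] acc = countAndSum(List.of("alpha,100", "alpha,250", "beta,40")).get("alpha");
        System.out.println(acc[0] + " transfers, total " + acc[1]); // 2.0 transfers, total 350.0
    }
}
```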

A typical case of finding the max recorded temperature for every city.
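The reducer logic is a max over each city's readings, which in plain Java collapses to a merge function; the tab-separated `city<TAB>temperature` format is an assumption, not the project's actual schema.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class MaxTemperature {
    // Keep the maximum temperature seen for each city key.
    public static Map<String, Integer> maxPerCity(List<String> lines) {
        return lines.stream()
                .map(l -> l.split("\t"))
                .collect(Collectors.toMap(
                        fields -> fields[0],
                        fields -> Integer.parseInt(fields[1]),
                        Math::max)); // merge duplicate keys by taking the max
    }

    public static void main(String[] args) {
        System.out.println(maxPerCity(List.of("Athens\t41", "Athens\t38", "Oslo\t25")).get("Athens")); // 41
    }
}
```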

An interesting application working on Olympic Games stats in order to compute each athlete's total gold, silver, and bronze medal wins.

Just a plain old normalization example for a bunch of students and their grades.
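Min-max scaling to [0, 1] is one common reading of "plain old normalization"; the project may use a different scheme, so treat this as an illustrative sketch only.

```java
import java.util.Arrays;

public class NormalizeGrades {
    // Min-max normalization: map each grade to (g - min) / (max - min).
    public static double[] normalize(double[] grades) {
        double min = Double.MAX_VALUE, max = -Double.MAX_VALUE;
        for (double g : grades) {
            min = Math.min(min, g);
            max = Math.max(max, g);
        }
        double[] out = new double[grades.length];
        for (int i = 0; i < grades.length; i++)
            out[i] = (grades[i] - min) / (max - min);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(normalize(new double[]{5, 7.5, 10}))); // [0.0, 0.5, 1.0]
    }
}
```

Note that in MapReduce this takes two passes (or a shared min/max), since no single reducer call sees all grades unless they share a key.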

Finding the oldest tree per city district. Child's play.

A bit more challenging than the rest. Every key character (A-E) has 3 numbers as values: two negative and one positive. We calculate the score for every character based on the following expression: character_score = pos / (-1 * (neg_1 + neg_2)).
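The expression translates directly to Java; this is just the per-key arithmetic the reducer would apply once it has gathered all three values.

```java
public class CharacterScore {
    // Direct translation of: character_score = pos / (-1 * (neg_1 + neg_2))
    public static double score(double pos, double neg1, double neg2) {
        return pos / (-1 * (neg1 + neg2));
    }

    public static void main(String[] args) {
        // neg_1 + neg_2 = -5, so the denominator is -1 * -5 = 5.
        System.out.println(score(10, -2, -3)); // 10 / 5 = 2.0
    }
}
```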

A simple way to calculate the symmetric difference between the records of two files, based on each record's ID.
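In set terms the job computes (A ∪ B) \ (A ∩ B) over the two files' record IDs, i.e. the IDs that appear in exactly one file; a minimal sketch over in-memory sets:

```java
import java.util.HashSet;
import java.util.Set;

public class SymmetricDifference {
    // IDs present in exactly one of the two record sets.
    public static Set<String> symmetricDiff(Set<String> a, Set<String> b) {
        Set<String> result = new HashSet<>(a);
        result.addAll(b);                  // union of both ID sets
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);               // intersection
        result.removeAll(common);          // union minus intersection
        return result;
    }

    public static void main(String[] args) {
        System.out.println(symmetricDiff(Set.of("1", "2", "3"), Set.of("2", "3", "4")));
    }
}
```

In the MapReduce version the same effect falls out of the shuffle: each ID becomes a key, and the reducer emits only keys that arrived with a single value.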

Filtering out patient records where the PatientCycleNum column is equal to 1 and the Counseling column is equal to No.
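The filter predicate looks like this in plain Java; the column positions (PatientCycleNum and Counseling as the second and third CSV fields) are assumed for illustration and will differ from the project's real schema.

```java
import java.util.List;
import java.util.stream.Collectors;

public class FilterPatients {
    // Drop rows where PatientCycleNum == 1 AND Counseling == "No";
    // keep everything else. A map-only job needs no reducer for this.
    public static List<String> keep(List<String> rows) {
        return rows.stream()
                .filter(r -> {
                    String[] fields = r.split(",");
                    return !(fields[1].equals("1") && fields[2].equals("No"));
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(keep(List.of("p1,1,No", "p2,1,Yes", "p3,2,No"))); // [p2,1,Yes, p3,2,No]
    }
}
```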

Reading a number of multi-line files and converting them into key-value pairs, with each file's name as the key and that file's content as the value.
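The effect is that of a whole-file record reader; as a plain-Java equivalent (with no Hadoop input format involved), mapping each file name in a directory to its full content looks like this:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;

public class WholeFileKV {
    // Map each file name in a directory to the file's entire content.
    public static Map<String, String> filesAsPairs(Path dir) {
        Map<String, String> pairs = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path f : files)
                pairs.put(f.getFileName().toString(), Files.readString(f));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return pairs;
    }

    // Self-contained demo over a temporary directory.
    public static Map<String, String> demo() {
        try {
            Path dir = Files.createTempDirectory("kv");
            Files.writeString(dir.resolve("a.txt"), "first file");
            Files.writeString(dir.resolve("b.txt"), "second file");
            return filesAsPairs(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // {a.txt=first file, b.txt=second file}
    }
}
```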

The most challenging yet. Term frequency is calculated over 5 input documents. The goal is to find the document with the max TF for each word, along with how many documents contain that word.
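The max-TF half of that computation can be sketched as follows; raw word counts stand in for TF here, and the project may normalize or tokenize differently.

```java
import java.util.HashMap;
import java.util.Map;

public class MaxTf {
    // For each word, find the document where it has the highest raw
    // term frequency (occurrence count).
    public static Map<String, String> maxTfDoc(Map<String, String> docs) {
        Map<String, String> best = new HashMap<>();       // word -> doc with max TF
        Map<String, Integer> bestCount = new HashMap<>(); // word -> that max TF
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            Map<String, Integer> tf = new HashMap<>();
            for (String w : doc.getValue().split("\\s+"))
                tf.merge(w, 1, Integer::sum);             // per-document counts
            for (Map.Entry<String, Integer> e : tf.entrySet())
                if (e.getValue() > bestCount.getOrDefault(e.getKey(), 0)) {
                    bestCount.put(e.getKey(), e.getValue());
                    best.put(e.getKey(), doc.getKey());
                }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, String> docs = Map.of("d1", "a a b", "d2", "a b b b");
        System.out.println(maxTfDoc(docs).get("b")); // d2 (three occurrences vs. one)
    }
}
```

Document frequency (how many documents contain each word) falls out of the same loop by incrementing a counter once per word per document.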

A simple merge of WordCount and TopN examples to find the 10 most used words in 5 input documents.
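The same merge, as in-memory Java: count words, then keep the n most frequent. In the MapReduce version the TopN step runs over the WordCount output rather than a single string.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class TopWords {
    // WordCount followed by TopN: count words, sort by count descending,
    // keep the n most frequent.
    public static List<String> topN(String text, int n) {
        Map<String, Long> counts = Arrays.stream(text.toLowerCase().split("\\s+"))
                .collect(Collectors.groupingBy(w -> w, Collectors.counting()));
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(topN("to be or not to be that is the question to", 2)); // [to, be]
    }
}
```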


Check out the equivalent Spark Examples here.

hadoop-examples's People

Contributors

coursal


hadoop-examples's Issues

Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist:

Hi, I am a new Hadoop learner. When I use your code to run Hadoop, I get this problem. I worked through the Word Count example and it ran normally (https://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html#Partitioner). But your code is not working when I run the command hadoop jar Bank_Transfers.jar Bank_Transfers. Please help me understand. Thanks, have a good day.

 2023-03-18 00:45:31,378 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at /127.0.0.1:8032

2023-03-18 00:45:31,626 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.

2023-03-18 00:45:31,671 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/bigdata/.staging/job_1679072275142_0004

2023-03-18 00:45:31,946 INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/bigdata/.staging/job_1679072275142_0004

Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://localhost:9000/user/bigdata/bank_dataset

	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:340)

	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:279)

	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:404)

	at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:310)

	at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:327)

	at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:200)

	at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1571)

	at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1568)

	at java.base/java.security.AccessController.doPrivileged(Native Method)

	at java.base/javax.security.auth.Subject.doAs(Subject.java:423)

	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)

	at org.apache.hadoop.mapreduce.Job.submit(Job.java:1568)

	at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1589)

	at Bank_Transfers.main(Bank_Transfers.java:113)

	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

	at java.base/java.lang.reflect.Method.invoke(Method.java:566)

	at org.apache.hadoop.util.RunJar.run(RunJar.java:323)

	at org.apache.hadoop.util.RunJar.main(RunJar.java:236)

Caused by: java.io.IOException: Input path does not exist: hdfs://localhost:9000/user/bigdata/bank_dataset

	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:313)

	... 19 more

