This repo contains a set of Hadoop tutorials designed to work inside or alongside the Oracle Big Data Lite VM. Big Data Lite is a single-node Hadoop cluster that runs as a VirtualBox virtual machine.
Current tutorials are as follows:
###Big Data Lite Tutorials (Tested with v4.6)
- 1 - MaxTemperature - A simple Hadoop MapReduce application that uses NOAA weather data to find the maximum temperature by year. Can be imported into the Eclipse IDE as a Java application.
- 1 - cluster-mapred - Demonstrates a MapReduce pipeline that uses a clustering algorithm to identify duplicate addresses for a given entity name. Can be imported into the Eclipse IDE as a Maven project.
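
To give a feel for what the MaxTemperature tutorial computes, here is a minimal plain-Java sketch of the map and reduce steps. It is not the tutorial's code and does not use the Hadoop API. The class name `MaxTemperatureSketch`, its methods, and the `year,temperature` record layout are assumptions made for illustration; real NOAA records use a different format.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MaxTemperatureSketch {

    // "Map" step: turn one input record into a (year, temperature) pair.
    // The "year,temperature" layout is an assumption, not the NOAA format.
    static Map.Entry<String, Integer> map(String line) {
        String[] fields = line.split(",");
        return Map.entry(fields[0].trim(), Integer.parseInt(fields[1].trim()));
    }

    // "Reduce" step: for each year, keep only the largest temperature seen.
    static Map<String, Integer> maxByYear(List<String> lines) {
        Map<String, Integer> result = new TreeMap<>();
        for (String line : lines) {
            Map.Entry<String, Integer> pair = map(line);
            result.merge(pair.getKey(), pair.getValue(), Integer::max);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample records, not real weather data.
        List<String> records = List.of("1949,111", "1949,78", "1950,0", "1950,22", "1950,-11");
        maxByYear(records).forEach((year, max) -> System.out.println(year + "\t" + max));
    }
}
```

In a real Hadoop job the framework groups the mapper output by key across the cluster before the reducer runs; the in-memory `TreeMap` here only stands in for that grouping on a single machine.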