- Apache Hadoop
- Apache Spark
- HiveQL (HQL)
- MapReduce
- Java 1.8
- Python 3
Word Count in Java
- Design and implement efficient Java code to determine the 100 most frequent words in a given dataset. The objective is to obtain the result with the least possible execution time.
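As a minimal single-machine sketch of this word count (class and method names are my own, not from the original code; a production version would stream the file and use a bounded min-heap for the top-k selection):

```java
import java.util.*;
import java.util.stream.*;

public class TopWords {
    // Returns the k most frequent words in the given text,
    // most frequent first. Tokenization here is a simple
    // non-word split; the real dataset may need a richer tokenizer.
    static List<String> topK(String text, int k) {
        Map<String, Long> counts = Arrays.stream(text.toLowerCase().split("\\W+"))
                .filter(w -> !w.isEmpty())
                .collect(Collectors.groupingBy(w -> w, Collectors.counting()));
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(k)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // "the" appears 3 times, "and" twice, everything else once.
        System.out.println(topK("the cat and the dog and the bird", 2));
        // prints [the, and]
    }
}
```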
Top-K Words using Apache Hadoop and MapReduce
- Determine the 100 most frequent words in the given dataset, considering only words with more than 3 characters, using both MapReduce and Hive.
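The MapReduce variant can be pictured as a map phase emitting `(word, 1)` pairs for words longer than 3 characters, and a reduce phase summing counts and keeping the top k. Here is a stdlib-only Java sketch of that logic (names are illustrative, not the actual Hadoop job, which would extend `Mapper`/`Reducer`):

```java
import java.util.*;
import java.util.stream.*;

public class TopKFiltered {
    // "Map" phase: emit (word, 1) for each word longer than 3 characters.
    static List<Map.Entry<String, Integer>> map(String line) {
        return Arrays.stream(line.toLowerCase().split("\\W+"))
                .filter(w -> w.length() > 3)
                .map(w -> (Map.Entry<String, Integer>) new AbstractMap.SimpleEntry<>(w, 1))
                .collect(Collectors.toList());
    }

    // "Reduce" phase: sum counts per word, then keep the k most frequent.
    static List<String> reduce(List<Map.Entry<String, Integer>> pairs, int k) {
        Map<String, Integer> counts = new HashMap<>();
        for (Map.Entry<String, Integer> p : pairs)
            counts.merge(p.getKey(), p.getValue(), Integer::sum);
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .limit(k)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // "the" (3 chars) is filtered out; hive=3, hadoop=2, spark=1.
        System.out.println(reduce(map("hadoop hadoop hive hive hive spark the"), 2));
        // prints [hive, hadoop]
    }
}
```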
Text Mining using Apache Spark.
- Process a large text dataset using Apache Spark and derive insights such as: the total number of minor revisions; the page title and page ID of all pages that have at most five URL links in their text field; and all contributors with more than one contribution, along with their revision IDs, sorted in descending order of timestamp.
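In Spark this is a filter over the parsed pages; as a stdlib-only illustration of one of the insights (pages with at most five URL links), here is a sketch in plain Java. The `Page` record and the "count `http(s)://` occurrences" definition of a URL link are my assumptions, not the original schema:

```java
import java.util.*;
import java.util.regex.*;
import java.util.stream.*;

public class PageInsights {
    // Hypothetical minimal page record: id, title, text.
    static class Page {
        final int id; final String title; final String text;
        Page(int id, String title, String text) {
            this.id = id; this.title = title; this.text = text;
        }
    }

    // Counts URL links in a text field, taken here to mean
    // occurrences of "http://" or "https://".
    static int countLinks(String text) {
        Matcher m = Pattern.compile("https?://").matcher(text);
        int n = 0;
        while (m.find()) n++;
        return n;
    }

    // Titles of pages with at most maxLinks URL links --
    // the analogue of a Spark filter().map() over the page RDD.
    static List<String> pagesWithFewLinks(List<Page> pages, int maxLinks) {
        return pages.stream()
                .filter(p -> countLinks(p.text) <= maxLinks)
                .map(p -> p.title)
                .collect(Collectors.toList());
    }
}
```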
Processing streaming data using Apache Spark.
- Analyse data streams over the network in real time and derive valuable insights, such as the moving average of the "dOctets" field over a one-minute time window, and all "srcaddr" and "dstaddr" values that appear twice within a two-minute sliding window.
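Spark Streaming provides this windowing natively; to illustrate the underlying mechanics, here is a stdlib-only Java sketch of a time-windowed moving average over "dOctets" values (class and method names are hypothetical). Observations older than the window are evicted from the front of a deque as new ones arrive:

```java
import java.util.*;

public class MovingAverage {
    private final long windowMillis;
    // Each entry: {timestampMillis, dOctetsValue}, oldest first.
    private final Deque<long[]> window = new ArrayDeque<>();
    private long sum = 0;

    MovingAverage(long windowMillis) { this.windowMillis = windowMillis; }

    // Record one dOctets observation and return the average
    // over the trailing windowMillis (assumes non-decreasing timestamps).
    double add(long timestampMillis, long value) {
        window.addLast(new long[]{timestampMillis, value});
        sum += value;
        // Evict observations that have aged out of the window.
        while (!window.isEmpty()
                && window.peekFirst()[0] <= timestampMillis - windowMillis) {
            sum -= window.removeFirst()[1];
        }
        return (double) sum / window.size();
    }
}
```

The same eviction pattern, keyed on `(srcaddr, dstaddr)` with a count map, covers the "appears twice in a two-minute sliding window" insight.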