Notes from my recent study of Apache Spark and streaming analysis.
Reference: https://github.com/mahmoudparsian/pyspark-tutorial
Introduction: https://github.com/mahmoudparsian/pyspark-tutorial/blob/master/tutorial/combine-by-key/distributed_computing_with_spark_by_Javier_Santos_Paniego.pdf
- DNA Sequence: DNA base counting
- Word Count: classic word count
- Bigrams: find frequency of bigrams
- Join: join of two relations R(K, V1), S(K, V2)
- Map: basic mapping of RDD elements
- Sum: how to add all RDD elements together
- Multiply: how to multiply all RDD elements together
- Combine By Key: find average by using combineByKey()
- Filter: how to filter RDD elements
- Average: how to find average
- Cartesian Product: rdd1.cartesian(rdd2)
- Sort By Key: sort by key ascending/descending
- Add Indices: how to add indices
- Map Partitions: mapPartitions() by examples
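
As a rough illustration of the combineByKey() average entry in the list above, here is a minimal sketch. It is not taken from the referenced tutorial. The helper names (`create_combiner`, `merge_value`, `merge_combiners`, `average_by_key`, `run_example`) and the sample data are assumptions made for this example, and `run_example()` assumes a local PySpark installation.

```python
# Sketch: per-key average with combineByKey().
# The accumulator ("combiner") for each key is a (sum, count) tuple.
# All function names here are hypothetical, chosen for illustration.


def create_combiner(value):
    """Start a (sum, count) accumulator from the first value seen for a key."""
    return (value, 1)


def merge_value(acc, value):
    """Fold one more value from the same partition into the accumulator."""
    return (acc[0] + value, acc[1] + 1)


def merge_combiners(left, right):
    """Merge two accumulators that were built on different partitions."""
    return (left[0] + right[0], left[1] + right[1])


def average_by_key(rdd):
    """Turn an RDD of (key, number) pairs into an RDD of (key, average)."""
    sums_counts = rdd.combineByKey(create_combiner, merge_value, merge_combiners)
    return sums_counts.mapValues(lambda pair: pair[0] / pair[1])


def run_example():
    """Hypothetical driver; requires PySpark to be installed locally."""
    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setMaster("local[*]").setAppName("combine-by-key-average")
    sc = SparkContext.getOrCreate(conf)
    pairs = sc.parallelize([("a", 1), ("a", 3), ("b", 10)])
    return sorted(average_by_key(pairs).collect())
```

Calling `run_example()` should return the per-key averages of the sample pairs as a sorted list. Carrying a (sum, count) pair rather than a running average keeps the merge step correct when partitions hold different numbers of values per key.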