BigDataSpark

Big Data Using Apache Spark

### COURSE CONTENT

Week 1: Big Data and Data Science

    Introduction to Big Data and Data Science - learn about big data and see examples of how data science can leverage big data
    Performing Data Science and Preparing Data - explore data science definitions and topics, and the process of preparing data
    Setting up the Course Software Environment - download and install the course software, run your first Apache Spark notebook, and submit your first assignment

Week 2: Introduction to Apache Spark

    Big Data, Hardware Trends, and the History of Apache Spark - discuss big data and hardware trends, and learn about the history of Apache Spark
    Spark Essentials - learn about Spark's Resilient Distributed Datasets (RDDs), transformations, and actions
    Lab 1: Learning Apache Spark - perform your first course lab, where you will learn about the Spark data model, transformations, and actions, and write a word-counting program to count the words in all of Shakespeare's plays
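Lab 1's word count follows the transformation/action pattern from Spark Essentials. In PySpark it is typically written as a chain such as `sc.textFile(path).flatMap(lambda line: line.split()).map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)`. The sketch below mirrors those three steps in plain Python so it runs without a Spark installation; the function name `word_count` and the lowercasing rule are illustrative assumptions, not the lab's own code.

```python
from itertools import chain


def word_count(lines):
    """Plain-Python mirror of a flatMap -> map -> reduceByKey word count (illustrative)."""
    # flatMap: split every line into words (lowercased here as an assumption)
    words = chain.from_iterable(line.lower().split() for line in lines)
    # map: pair each word with a count of 1
    pairs = ((word, 1) for word in words)
    # reduceByKey: sum the counts for each distinct word
    counts = {}
    for word, n in pairs:
        counts[word] = counts.get(word, 0) + n
    return counts
```

For example, `word_count(["to be or not to be"])` returns `{'to': 2, 'be': 2, 'or': 1, 'not': 1}`. In Spark, the same per-key summation runs in parallel across partitions, and nothing is computed until an action such as `collect()` is called.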

Week 3: Data Management

    Semi-Structured Data - explore the concept of semi-structured data and how tabular data is handled in Spark
    Structured Data - learn about structured data, the relational data model, SQL, and joins in SQL and Spark
    Lab 2: Web Server Log Analysis with Apache Spark - use Spark to explore a NASA Apache web server log in the second course lab
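Apache web server logs are commonly written in the Common Log Format: host, identity, user, a bracketed timestamp, the quoted request line, the HTTP status, and the response size. The following is a minimal parsing sketch, assuming that format; the regular expression and the `parse_log_line` helper are hypothetical rather than taken from the lab, and in Spark one would apply such a function to every line with `rdd.map(parse_log_line)`.

```python
import re

# Hypothetical pattern for Common Log Format lines:
# host ident user [timestamp] "method path protocol" status size
LOG_PATTERN = re.compile(
    r'^(\S+) (\S+) (\S+) \[([^\]]+)\] "(\S+) (\S+)\s*(\S*)" (\d{3}) (\S+)$'
)


def parse_log_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None  # malformed line; a real pipeline might count these separately
    host, _ident, _user, timestamp, method, path, protocol, status, size = match.groups()
    return {
        "host": host,
        "timestamp": timestamp,
        "method": method,
        "path": path,
        "protocol": protocol,
        "status": int(status),
        # the log format uses "-" when no size was recorded; treat it as 0 bytes
        "bytes": 0 if size == "-" else int(size),
    }
```

Returning `None` for unparseable lines makes it easy to filter them out, or count them, before aggregating by host, path, or status code.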

Week 4: Data Quality, Exploratory Data Analysis, and Machine Learning

    Data Quality - learn about the challenges of data quality and cleaning
    Exploratory Data Analysis - understand the statistics of Exploratory Data Analysis and data distributions
    Machine Learning - learn about Spark's machine learning library, MLlib
    Lab 3: Text Analysis and Entity Resolution - perform text analysis and entity resolution on Google and Amazon product listings using Spark in the third course lab
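Entity resolution decides whether two records, such as a Google and an Amazon product listing, describe the same item. One common building block is the cosine similarity between token-count vectors of the two descriptions. The sketch below uses raw term counts and a simple alphanumeric tokenizer, both assumptions chosen for brevity; a fuller pipeline would typically add stopword removal and TF-IDF weighting.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep runs of letters/digits as tokens (a simplifying assumption)
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity of the raw token-count vectors of two strings."""
    a, b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    # dot product over the tokens the two records share
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty record cannot match anything
    return dot / (norm_a * norm_b)
```

For example, `cosine_similarity("apple ipod nano", "apple ipod touch")` is about 0.667, since the two made-up listings share two of their three tokens. Pairs scoring above a chosen threshold would be treated as candidate matches.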

Week 5: Data Management

    Lab 4: Introduction to Machine Learning with Apache Spark - use Spark's MLlib machine learning library to perform collaborative filtering on a movie dataset in the fourth course lab
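Collaborative filtering predicts a user's rating for a movie from the ratings of other users; MLlib provides an alternating least squares (ALS) implementation for this. Whatever model is used, predictions are commonly evaluated with the root mean squared error (RMSE) against held-out ratings. The following is a minimal sketch, assuming ratings are stored in plain dictionaries keyed by hypothetical `(user_id, movie_id)` pairs rather than in Spark RDDs.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error over the (user_id, movie_id) pairs present in both dicts."""
    # Only pairs that appear in both dicts are compared; all others are ignored
    common = predicted.keys() & actual.keys()
    if not common:
        raise ValueError("no overlapping (user, movie) pairs to compare")
    squared_error = sum((predicted[k] - actual[k]) ** 2 for k in common)
    return math.sqrt(squared_error / len(common))
```

A lower RMSE means the predicted ratings are closer to the held-out ones, which makes it a convenient single number for comparing model settings.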

Useful Links:

The US National Institute of Standards and Technology (NIST) has an excellent primer on Exploratory Data Analysis.


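The five-number summary is a [descriptive statistic](https://en.wikipedia.org/wiki/Descriptive_statistics) that provides information about a set of observations. It consists of the five most important sample percentiles:

    the sample minimum (smallest observation)
    the lower quartile, or first quartile
    the median (middle value)
    the upper quartile, or third quartile
    the sample maximum (largest observation)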

You can compare the five-number summaries of multiple sets of observations using a box plot:

Box Plot
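The five-number summary can be computed with Python's standard `statistics` module (Python 3.8+). Quartile definitions vary between textbooks and tools. This sketch assumes the `inclusive` interpolation method of `statistics.quantiles`, so other software may report slightly different quartiles for the same data. The function name `five_number_summary` is illustrative.

```python
import statistics


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for a sequence of numbers."""
    values = sorted(data)
    # quantiles(n=4) returns the three quartile cut points; "inclusive" treats
    # the data as the whole population and interpolates between sorted values
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return values[0], q1, median, q3, values[-1]
```

For example, `five_number_summary([1, 2, 3, 4])` returns `(1, 1.75, 2.5, 3.25, 4)`. These are the values a basic box plot draws: the box spans Q1 to Q3, a line marks the median, and in the simplest variant the whiskers extend to the minimum and maximum.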

Contributors: mshayeb
