More information about this program is available through this link.
Learners are expected to have some familiarity with relational databases and Structured Query Language (SQL).
In this three-course specialisation, we are introduced to querying big data using modern distributed SQL engines. Upon completion of this program, we can query huge datasets in clusters and cloud storage using a newer breed of technologies such as Hive, Impala, Presto and Drill.
This program is offered in collaboration with the following industry partner:
Estimated Duration: 4 Months
Program Structure: Self-Paced (Approx. 3 Hrs/Week)
In this introductory lesson, we get an overview of database systems and a common query language. We learn to distinguish between operational and analytical databases and understand key design principles before working with our data. Later on, we learn the features and benefits of different SQL dialects and explore databases in a big data system using a preconfigured virtual environment.
In this lesson, we focus on big data SQL engines such as Apache Hive and Apache Impala. We learn how to explore and navigate databases using different tools and identify ways to group and aggregate data to answer analytic questions. Finally, we learn how to combine data from multiple tables and recognise the key differences between relational database management systems (RDBMSs) and modern query engines.
In this lesson, we discover how to manage and load big datasets in distributed clusters and storage. We learn how to choose among different data types and file formats, and how to account for performance issues while working with our data in big data systems. We end this course by learning how to optimise our queries and workloads in Apache Hive and Apache Impala.
- Glynn Durham - Senior Instructor | Cloudera
- Ian Cook - Curriculum Developer | Cloudera
- SQL
- Hive
- Impala