Gain insights into Spark, its architecture, application lifecycle, and APIs. Delve into data frames, datasets, and Spark SQL to effectively manage and query big data.

Since its release, Spark has quickly come to dominate the big data processing space and now serves as the de facto unified big data processing engine in the industry.

In this course, you will get a complete introduction to the basics of Spark. You will start by learning about its architecture, the application lifecycle, and its APIs.

From there, you will dive into the DataFrame data structure and its API, as well as the strongly typed Dataset API. Lastly, you’ll explore the Spark SQL engine, which allows you to issue queries on structured data that has a schema.

By the end of this course, you will have the confidence to use Spark in any of your big data projects.

An Introduction to Spark

The Big Data ecosystem is still young, but Spark stands out for how rapidly enterprises have adopted it in place of the MapReduce paradigm of the traditional Hadoop stack. If you have worked with MapReduce before, you'll appreciate Spark more and better understand the pain points of Hadoop's MapReduce model that it addresses. Spark is versatile and works with a variety of old and new Big Data technologies. For example, it can run on YARN and use HDFS for storage, both of which come from the original Hadoop stack. As an in-memory distributed processing engine, Spark enhances, rather than replaces, the capabilities of the modern Big Data technology stack.

The course is compact but comprehensive, and it intentionally avoids deep discussions of Spark's internals and technical workings. It covers the fundamentals of Spark thoroughly, aiming to give readers enough context and knowledge to go on to learn independently and to work with Spark's more complex tasks and capabilities.

Readers are welcome and encouraged to share any feedback on this course. Thank you for your support.

Spark Overview

DataFrames

Datasets

Spark SQL

Summary

Closing Remarks