Gain insights into Spark: its architecture, application lifecycle, and APIs. Delve into DataFrames, Datasets, and Spark SQL to manage and query big data effectively.

Spark has come to dominate the big data processing space in the short time since its release and now serves as the de facto unified big data processing engine in the industry.

In this course, you will get a complete introduction to the basics of Spark. You will start by learning about Spark's architecture, application lifecycle, and APIs.

From there, you will dive into the DataFrame data structure and its API, as well as the strongly typed Dataset API. Lastly, you'll get into the Spark SQL engine, which lets you issue queries on structured data with a schema.

By the end of this course, you will have the confidence to use Spark in any of your big data projects.

An Introduction to Spark

## Overview
Spark SQL allows developers to programmatically issue ANSI SQL:2003–compatible queries on structured data with a schema. Spark SQL first shipped as an alpha component in Spark 1.0 and graduated from alpha in version 1.3. Since then, several higher-level functionalities have been built upon it. Among other things, the Spark SQL engine:

- Generates optimized query plans and compact JVM code for final execution.

- Serves as a bridge to external tools using database ODBC/JDBC connectors.

- Adds the ability to read and write structured files in various formats like JSON, CSV, or Avro and convert them into temporary tables.

- Connects to the Apache Hive metastore and tables.

- Introduces an interactive Spark SQL shell for quick, ad hoc data exploration.

- Unifies the various components of Spark and allows for creating DataFrame/Dataset abstractions in the languages supported by Spark (Java, Scala, Python, and R).






Get an introduction to the Spark SQL engine and its two sub-components, Project Tungsten and the Catalyst optimizer.


Spark SQL Engine


Spark Overview

DataFrames

Datasets

Spark SQL

Summary
