Spark SQL Engine
Get an introduction to the Spark SQL engine and its two sub-components, Project Tungsten and the Catalyst optimizer.
Overview
Spark SQL allows developers to programmatically issue ANSI SQL:2003-compatible queries on structured data with a schema. Spark SQL first appeared as an alpha component in Spark 1.0 and graduated from alpha in version 1.3. Since then, it has grown into the engine on which many higher-level structured functionalities are built. Beyond running SQL queries, the Spark SQL engine:
- Generates optimized query plans and compact JVM code for final execution.
- Serves as a bridge to external tools through database JDBC/ODBC connectors.
- Reads and writes structured files in formats such as JSON, CSV, and Avro, and converts them into temporary tables.
- Connects to the Apache Hive metastore and tables.
- Provides an interactive Spark SQL shell for quick, ad hoc data exploration.
- Unifies Spark's components and allows DataFrame/Dataset abstractions to be created in the languages Spark supports (Java, Scala, Python, and R).