Spark SQL Engine
Get an introduction to the Spark SQL engine and its two sub-components, Project Tungsten and the Catalyst optimizer.
Overview
Spark SQL lets developers programmatically issue ANSI SQL:2003–compatible queries against structured data that has a schema. It first shipped as an alpha component in Spark 1.0 and graduated from alpha in version 1.3. Since then, many of Spark's higher-level structured features have been built on top of it. Beyond running SQL queries, the engine:
- Generates optimized query plans and compact JVM code for final execution.
- Serves as a bridge to external tools through ODBC/JDBC database connectors.
- Reads and writes structured files in formats such as JSON, CSV, and Avro, and converts them into temporary tables.
- Connects to the Apache Hive metastore and tables. ...