Spark SQL Goodness
Get introduced to Spark SQL, the Spark functionality that lets developers query DataFrames the way they would query a relational database.
SQL in SparkSQL
Structured Query Language (SQL) has been the gold standard for manipulating data for many years. It is so powerful and widely used that state-of-the-art cloud services (Amazon S3, among others) provide SQL-like functionality to inspect and retrieve data. It also has a human-readable syntax, and its learning curve is not too steep.
These strengths made embedding SQL in the SparkSQL module a reasonable choice, all the more so because the data Spark works with is often structured or semi-structured.
This lesson focuses on teaching how to use SparkSQL to make our lives easier as Big Data apprentices.
If your SQL knowledge is a bit rusty, check out some of the great courses here on educative.io.
A practical introduction to SparkSQL
Just as a relational database can expose a view of a table to any application that needs to interact with the data source, SparkSQL follows a similar approach, but it requires a view as the main entry point for querying a DataFrame.
The diagram below shows how a view acts as an abstraction on top of the DataFrame when we interact with it through SparkSQL: