

Spark SQL Data Source

Learn about the various sources and formats of data that can be read and written using Spark SQL.

Reading data into DataFrames

Once data has been ingested, processed, and loaded into Spark SQL databases and tables, it can be read back as DataFrames. The example below reads CSV data, saves it as a table, and then queries that table:

scala> val movies = spark.read.format("csv")
                              .option("header", "true")
                              .option("samplingRatio", 0.001)
                              .option("inferSchema", "true")
                              .load("/path/to/movies.csv") // placeholder path (assumption): point this at your own CSV file

scala> movies.write.saveAsTable("movieData")

scala> val movieTitles = spark.sql("SELECT title FROM movieData")

scala> movieTitles.show(3, false)
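+---------------------------------+
|title                            |
+---------------------------------+
|Albela                           |
|Lagaan: Once Upon a Time in India|
|Meri Biwi Ka Jawab Nahin         |
+---------------------------------+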
only showing top 3 rows

In the above example, we create the Spark SQL table movieData and then execute a Spark SQL query to return only the titles of the movies as a DataFrame. ...
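
A saved table can also be read without writing SQL. The following is a minimal sketch, not the lesson's own code: it assumes a local SparkSession and uses a small in-memory stand-in for the movie data (the id column and the appName value are made up for illustration). It compares spark.sql with spark.table, both of which return a DataFrame.

```scala
import org.apache.spark.sql.SparkSession

// In spark-shell, getOrCreate() simply returns the session that already exists.
val spark = SparkSession.builder()
  .appName("ReadTableSketch") // hypothetical application name
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Stand-in rows (assumption): the real movieData table would come from the CSV file.
val movies = Seq(
  (1, "Albela"),
  (2, "Lagaan: Once Upon a Time in India"),
  (3, "Meri Biwi Ka Jawab Nahin")
).toDF("id", "title")

// Overwrite so the sketch can be re-run without a "table already exists" error.
movies.write.mode("overwrite").saveAsTable("movieData")

// Option 1: a SQL query against the table.
val viaSql = spark.sql("SELECT title FROM movieData")

// Option 2: load the table as a DataFrame, then project with the DataFrame API.
val viaTable = spark.table("movieData").select("title")

viaSql.show(false)
viaTable.show(false)
```

Both approaches go through the same catalog entry, so the choice is mostly a matter of style: SQL strings are convenient for ad hoc queries, while spark.table keeps the whole pipeline in the DataFrame API.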