Spark SQL Data Source
Learn about the various sources and formats of data that can be read and written using Spark SQL.
Reading data into DataFrames
Once data has been ingested, processed, and loaded into Spark SQL databases and tables, it can be read as DataFrames. An example is shown below:
scala> val movies = spark.read.format("csv")
         .option("header", "true")
         .option("samplingRatio", 0.001)
         .option("inferSchema", "true")
         .load("/data/BollywoodMovieDetail.csv")
scala> movies.write.saveAsTable("movieData")
scala> val movieTitles = spark.sql("SELECT title FROM movieData")
scala> movieTitles.show(3, false)
+---------------------------------+
|title                            |
+---------------------------------+
|Albela                           |
|Lagaan: Once Upon a Time in India|
|Meri Biwi Ka Jawab Nahin         |
+---------------------------------+
only showing top 3 rows
In the above example, we save the DataFrame read from the CSV file as the Spark SQL table movieData and then execute a Spark SQL query that returns only the movie titles as a DataFrame. ...
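As a rough sketch of the same round trip without the CSV file, the snippet below builds a tiny DataFrame in memory, saves it as a table, and reads the titles back both with a SQL query and with the DataFrame API. The table name movieDataSketch, the made-up id column, the session variable, and the local master setting are assumptions for illustration and are not part of the lesson's dataset.

```scala
import org.apache.spark.sql.SparkSession

// In spark-shell, getOrCreate() returns the session that already exists;
// in a standalone run it starts a local one (assumed setup).
val session = SparkSession.builder()
  .appName("movie-table-sketch")
  .master("local[*]")
  .getOrCreate()

// Stand-in rows: the titles come from the output above, the "id" column is made up.
val sample = session.createDataFrame(Seq(
  (1, "Albela"),
  (2, "Lagaan: Once Upon a Time in India"),
  (3, "Meri Biwi Ka Jawab Nahin")
)).toDF("id", "title")

// "overwrite" replaces the table if it already exists, so the snippet can be re-run.
sample.write.mode("overwrite").saveAsTable("movieDataSketch")

// Two ways to read the table back as a DataFrame holding only the titles.
val viaSql = session.sql("SELECT title FROM movieDataSketch")
val viaApi = session.table("movieDataSketch").select("title")

viaApi.show(3, false)
```

Both viaSql and viaApi are ordinary DataFrames, so either can be filtered, joined, or written out like any other DataFrame. The order of rows printed by show is not guaranteed once the data has been written to and read back from a table.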