Working with DataFrames

Explore how to create and manipulate Spark DataFrames by reading data from various formats and sources. Understand schema inference and how to define schemas explicitly using both programmatic and DDL methods. Learn how to write DataFrames to external storage in different formats, focusing on structured data analysis and preparation.

Reading and writing data in Spark is convenient thanks to the high-level abstractions it provides for connecting to a variety of external data sources, such as Kafka, RDBMSs, or NoSQL stores. Spark offers the DataFrameReader interface for reading data into a DataFrame from various sources in a number of formats, such as JSON, CSV, Parquet, or plain text, and a corresponding DataFrameWriter interface for writing a DataFrame back out.
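
As a quick illustration of this read/write symmetry, here is a minimal sketch; the paths and the events name are hypothetical:

// Read JSON into a DataFrame, then write it back out as Parquet.
val events = spark.read.format("json").load("/data/events.json")

events.write.format("parquet")
  .mode("overwrite")           // replace any existing output at this path
  .save("/data/events_parquet")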

For any meaningful data analysis, we'll be creating DataFrames from data files. When doing so, we can either let Spark infer the schema from the data itself or specify the schema explicitly. Let's look at examples of both below:

Creating DataFrames

The following snippet reads the data file from /data/BollywoodMovieDetail.csv and asks Spark to infer its schema:

val movies = spark.read.format("csv")
  .option("header", "true")        // the first line holds the column names
  .option("inferSchema", "true")   // sample the data to infer column types
  .load("/data/BollywoodMovieDetail.csv")

The Spark-inferred schema can be examined ...
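
Calling printSchema on the DataFrame prints the inferred schema to the console. Inference requires an extra pass over the data, so for large files it is often better to supply the schema explicitly, either programmatically with StructType or as a DDL string. The sketch below assumes hypothetical column names and types (imdbId, title, releaseYear, hitFlop), which may not match the actual file:

import org.apache.spark.sql.types._

// Inspect the schema Spark inferred from the CSV file.
movies.printSchema()

// Programmatic schema definition with StructType/StructField.
val schemaProgrammatic = StructType(Array(
  StructField("imdbId", StringType, nullable = true),
  StructField("title", StringType, nullable = true),
  StructField("releaseYear", IntegerType, nullable = true),
  StructField("hitFlop", IntegerType, nullable = true)
))

// The same schema expressed as a DDL string.
val schemaDDL = "imdbId STRING, title STRING, releaseYear INT, hitFlop INT"

// Reading with an explicit schema; note there is no inferSchema option here.
val moviesExplicit = spark.read.format("csv")
  .option("header", "true")
  .schema(schemaProgrammatic)   // or .schema(schemaDDL)
  .load("/data/BollywoodMovieDetail.csv")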