
Spark Application - An Example

Work through an example Spark application and explore the web UIs of Spark and Spark History Server.

We'll cover the following...

Spark web UI
Spark web history server

The quintessential big data example has always been the word count application, which counts how many times each word appears in a large text file. We’ll deviate from that example and instead use movie data to count the movies that were sequels. For easy reference, the columns of the data file that we’ll read and process are listed below:

imdbId	title	releaseYear	releaseDate	genre	writers	actors	directors	sequel	hitFlop

If a movie is a sequel to an earlier movie, its sequel column is set to 1. Our task is to read the data file and then run a query that counts the rows whose sequel column is set to 1.
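To make the task concrete, here is a minimal sketch of the read-then-count query using the DataFrame API. It is not the lesson's own program. The object name SequelCount, the helper names readMovies and countSequels, the default file name movies.csv, the comma delimiter, and the local master setting are all assumptions made for illustration. Only the column name sequel comes from the listing above.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical object and helper names, used only for illustration.
object SequelCount {

  // Reads a delimited file whose first row holds the column names.
  // Assumption: the file is comma-separated; pass a different `sep` if not.
  // No schema is inferred, so every column is read as a string.
  def readMovies(spark: SparkSession, path: String, sep: String = ","): DataFrame =
    spark.read
      .option("header", "true")
      .option("sep", sep)
      .csv(path)

  // Counts the rows whose sequel column holds the string "1".
  def countSequels(movies: DataFrame): Long =
    movies.filter(col("sequel") === "1").count()

  def main(args: Array[String]): Unit = {
    // Assumption: run locally on all cores; on a cluster the master is
    // normally supplied by spark-submit instead of being hard-coded.
    val spark = SparkSession.builder()
      .appName("SequelCount")
      .master("local[*]")
      .getOrCreate()

    try {
      // Hypothetical default path; pass the real file as the first argument.
      val path = args.headOption.getOrElse("movies.csv")
      val movies = readMovies(spark, path)
      println(s"Number of sequels: ${countSequels(movies)}")
    } finally {
      spark.stop()
    }
  }
}
```

Keeping the count in its own function means the query can be tried on a small in-memory DataFrame before pointing it at the full data file.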

Our Scala program below shows how to compute the number of rows in a data set that have the sequel column ...
