Spark Application - An Example

Explore running a Spark application that processes movie data to count sequels. Understand how to load data, apply DataFrame transformations, execute actions, and use the spark-shell and spark-submit tools. Gain insight into viewing job execution through the Spark web UI and the history server.

The quintessential example for big data has always been the word count application, which computes the number of times each word appears in a large text file. We’ll deviate from that example and instead use movie data to compute the number of movies that were sequels. Before we move forward, the columns of the data file that we’ll read in and process are listed below for easy reference:

imdbId title releaseYear releaseDate genre writers actors directors sequel hitFlop

If a movie is a sequel to an earlier movie, the value of its sequel column is set to 1. Our task is to read the data file and then run a query that counts the rows whose sequel column is set to 1.
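Before bringing Spark into the picture, it may help to see the counting logic on its own. The following is a minimal plain-Scala sketch, not the Spark program itself: the `Movie` case class, the `SequelCountSketch` object, and the four sample rows are all hypothetical and made up for illustration, and only three of the ten columns are modeled.

```scala
// Hypothetical row type: models only the columns this sketch needs.
final case class Movie(title: String, releaseYear: Int, sequel: Int)

object SequelCountSketch {
  // Made-up sample rows for illustration; not taken from the real data file.
  val movies: Seq[Movie] = Seq(
    Movie("Movie A", 2001, 0),
    Movie("Movie A 2", 2003, 1),
    Movie("Movie B", 2005, 0),
    Movie("Movie B Returns", 2008, 1)
  )

  // Count the rows whose sequel flag is set to 1.
  def countSequels(rows: Seq[Movie]): Int = rows.count(_.sequel == 1)

  def main(args: Array[String]): Unit =
    println(countSequels(movies)) // prints 2
}
```

The same idea, keeping only the rows where sequel equals 1 and then counting them, is what the Spark query expresses over a DataFrame, with the difference that Spark distributes the work across the cluster.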

Our Scala program below shows how to compute the number of rows in a data set that ...