Actions (I): Count, Take, and Collect

Let's walk through brief examples of basic actions.

In our previous examples and code snippets, the line that kicked off processing and produced output was usually the call to the show(...) method. This method, like many others, belongs to a group of operations called actions.

An action triggers a cascade of previously applied transformations and returns a final result back to the driver program.

In other words, an action forces evaluation of these transformations and initiates computation on the cluster. At the end of this process, one of two things happens:

  • Results (the processed DataFrame’s rows) can be collected in the driver program, usually into a Java structure such as a List.

  • Results can be written to external storage, be that a file, a DB table, or any other DataSource.
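The idea that transformations only run once an action forces them can be mimicked with plain Java streams. This is an analogy rather than Spark code: a stream's intermediate operation (filter) plays the role of a transformation, and its terminal operation (collect) plays the role of an action. The class name LazyEvaluationAnalogy and the inspected list are hypothetical, introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LazyEvaluationAnalogy {

    // Records every element the filter actually inspects, so we can see when work happens.
    static final List<Integer> inspected = new ArrayList<>();

    // Builds a pipeline without running it: filter(...) is an intermediate (lazy) operation.
    static Stream<Integer> buildPipeline(List<Integer> source) {
        return source.stream().filter(n -> {
            inspected.add(n);
            return n % 2 == 0;
        });
    }

    public static void main(String[] args) {
        Stream<Integer> pipeline = buildPipeline(Arrays.asList(1, 2, 3, 4));
        // Nothing has been filtered yet, so the log is still empty here.
        System.out.println("Inspected before terminal operation: " + inspected.size());

        // collect(...) is the terminal operation; only now does the filter run.
        List<Integer> evens = pipeline.collect(Collectors.toList());
        System.out.println("Inspected after terminal operation: " + inspected.size());
        System.out.println("Result: " + evens);
    }
}
```

In Spark the same pattern holds at cluster scale: the pipeline is only a plan until an action asks for a result.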

At the code level, actions are just method invocations on the DataFrame. Let’s take a look at some of them.

Count

count() returns the number of rows in a DataFrame as a long. It is also available on other structures, such as RDDs and JavaRDDs, where it counts elements.
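A minimal sketch of count() in action follows. It assumes Spark SQL is on the classpath and runs in local mode. The class name CountExample, the helper countLongNames, the column name "name", and the sample data are all hypothetical and do not come from the course's own project.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class CountExample {

    // Counts the rows that remain after a filter; count() is the action that triggers the job.
    static long countLongNames(SparkSession spark, List<String> names) {
        // toDF("name") turns the Dataset<String> into a DataFrame with a single column.
        Dataset<Row> df = spark.createDataset(names, Encoders.STRING()).toDF("name");

        // filter(...) is a transformation: nothing is computed yet.
        Dataset<Row> longNames = df.filter("length(name) > 3");

        // count() is an action: Spark runs the filter and returns a long to the driver.
        return longNames.count();
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("CountExample") // hypothetical application name
                .master("local[*]")      // assumption: local mode for experimentation
                .getOrCreate();

        long n = countLongNames(spark, Arrays.asList("Ana", "Boris", "Chen", "Dee"));
        System.out.println("Rows with names longer than 3 characters: " + n);

        spark.stop();
    }
}
```

Because count() returns a single number rather than the rows themselves, it is a cheap way to check the size of a result before collecting it into the driver.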
