Two types of operations

Having worked through the previous example projects, we’re better positioned to understand two crucial concepts involved in any Spark application, along with their related operations: transformations and actions.

These two concepts are exposed programmatically through the methods of Spark API abstractions such as DataFrames, RDDs, and JavaRDDs.

Transformations

Transformations are operations that can change both the structure of a DataFrame and its contents. We’ve applied both kinds of change in previous examples while:

Renaming, dropping, and creating columns of a DataFrame or a Dataset (withColumn(), drop() methods, etc.).
Doing calculations on each row of the DataFrame, whether to add a new column or create a Dataset of POJOs (when we introduced the map() method and related interface MapFunction).
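The transformations listed above can be sketched as follows. This is a minimal sketch, assuming Spark 3.x is on the classpath and a local master is acceptable; the class name TransformationsSketch, the column names, and the 1.2 tax factor are hypothetical and chosen only for illustration.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

import static org.apache.spark.sql.functions.col;

// Hypothetical example class; not part of the course projects.
class TransformationsSketch {

    // Structural transformations: create, rename, and drop columns.
    // Each call returns a new Dataset<Row>.
    static Dataset<Row> reshape(Dataset<Row> products) {
        return products
                .withColumn("price_with_tax", col("price").multiply(1.2)) // assumed tax factor
                .withColumnRenamed("name", "product_name")
                .drop("price");
    }

    // Row-level transformation: map() with a MapFunction and an Encoder.
    static Dataset<String> labels(Dataset<Row> products) {
        return products.map(
                (MapFunction<Row, String>) row ->
                        row.getString(0) + ": " + row.getDouble(1),
                Encoders.STRING());
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("transformations-sketch")
                .master("local[*]") // assumption: run locally
                .getOrCreate();

        // Hypothetical two-column input: product name and price.
        StructType schema = new StructType()
                .add("name", DataTypes.StringType)
                .add("price", DataTypes.DoubleType);
        List<Row> rows = Arrays.asList(
                RowFactory.create("keyboard", 50.0),
                RowFactory.create("monitor", 200.0));
        Dataset<Row> products = spark.createDataFrame(rows, schema);

        // show() prints the results of the transformed Datasets.
        reshape(products).show();
        labels(products).show();

        spark.stop();
    }
}
```

Note that reshape() and labels() both start from the same products variable: neither call alters it, so it can be reused as input for further transformations.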

Every time we applied a transformation, we also got a new DataFrame as a result. This is due to a fundamental property of the abstraction:

DataFrames are immutable structures. In practical terms, they are objects that can be read or created but not updated. To obtain a modified version of a DataFrame, a new one is created based on the existing DataFrame’s information after a

...
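The immutability property described above can be modelled in plain Java, without any Spark dependency. The sketch below is an analogy only: MiniFrame is a hypothetical class that tracks nothing but column names, and its withColumn() and drop() methods merely borrow the names of the DataFrame methods to show the "read or create, never update" contract.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a DataFrame; it only tracks column names.
final class MiniFrame {
    private final List<String> columns;

    MiniFrame(List<String> columns) {
        // Defensive, unmodifiable copy: state is fixed at construction time.
        this.columns = Collections.unmodifiableList(new ArrayList<>(columns));
    }

    List<String> columns() {
        return columns;
    }

    // "Transformation": returns a new MiniFrame and leaves this one untouched.
    MiniFrame withColumn(String name) {
        List<String> next = new ArrayList<>(columns);
        next.add(name);
        return new MiniFrame(next);
    }

    // "Transformation": returns a new MiniFrame without the given column.
    MiniFrame drop(String name) {
        List<String> next = new ArrayList<>(columns);
        next.remove(name);
        return new MiniFrame(next);
    }

    public static void main(String[] args) {
        MiniFrame original = new MiniFrame(Arrays.asList("name", "price"));
        MiniFrame modified = original.withColumn("price_with_tax").drop("price");

        System.out.println(original.columns()); // the original keeps its two columns
        System.out.println(modified.columns()); // the new object carries the changes
    }
}
```

Because no method mutates the existing object, any reference to the original stays valid and unchanged, no matter how many modified versions are derived from it.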
