Transformations and Actions

Learn the two cornerstone concepts of transformations and actions in Spark.

Two types of operations

Having worked through the previous example projects, we’re now better positioned to understand two crucial concepts, and their related operations, that appear in any Spark application: transformations and actions.

The Spark API exposes both concepts programmatically as methods on abstractions such as DataFrames, RDDs, and JavaRDDs.

Transformations

Transformations are operations that can change the structure of a DataFrame, its contents, or both. We’ve applied both kinds of change in previous examples while:

  1. Renaming, dropping, and creating columns of a DataFrame or a Dataset (withColumn(), drop() methods, etc.).

  2. Doing calculations on each row of the DataFrame, whether to add a new column or create a Dataset of POJOs (when we introduced the map() method and related interface MapFunction).
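As a reminder of what these two kinds of transformation look like in code, here is a minimal sketch using the Java API, where a DataFrame is a Dataset<Row>. The class name, the column names ("id", "itemId", "price"), and the local master setting are assumptions made for illustration and do not come from the earlier projects. The row-level step maps each row to a String rather than a POJO to keep the sketch short.

```java
import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

import static org.apache.spark.sql.functions.col;

// Hypothetical class for illustration only.
public class TransformationsSketch {

    // Structural transformations: rename, create, and drop columns.
    public static Dataset<Row> reshape(Dataset<Row> df) {
        return df
                .withColumnRenamed("id", "itemId")
                .withColumn("price", col("itemId").multiply(10))
                .drop("itemId");
    }

    // Row-level transformation: compute a new value from each row.
    public static Dataset<String> describe(Dataset<Row> df) {
        return df.map(
                (MapFunction<Row, String>) row -> "price=" + row.getLong(0),
                Encoders.STRING());
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("TransformationsSketch")
                .master("local[*]") // assumption: run locally
                .getOrCreate();

        // A one-column DataFrame with id = 0, 1, 2.
        Dataset<Row> items = spark.range(3).toDF("id");

        Dataset<Row> prices = reshape(items);
        Dataset<String> labels = describe(prices);

        prices.printSchema();
        labels.show(); // show() is an action, so it triggers the computation

        spark.stop();
    }
}
```

Note that each step returns a new Dataset that the next step builds on; none of the calls modifies the Dataset it is called on.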

Every time we applied a transformation, we got a new DataFrame as a result. This is due to a fundamental property of the abstraction:

  • DataFrames are immutable structures.
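The following sketch, again using the Java API, illustrates this property: applying withColumn() returns a new DataFrame and leaves the one it was called on unchanged. The class name, the method name addSourceColumn, the column names, and the local master setting are hypothetical and chosen only for this example.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

import static org.apache.spark.sql.functions.lit;

// Hypothetical class for illustration only.
public class ImmutabilitySketch {

    // Returns a NEW DataFrame with an extra constant column; df itself is not modified.
    public static Dataset<Row> addSourceColumn(Dataset<Row> df) {
        return df.withColumn("source", lit("demo"));
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("ImmutabilitySketch")
                .master("local[*]") // assumption: run locally
                .getOrCreate();

        Dataset<Row> original = spark.range(3).toDF("id");
        Dataset<Row> extended = addSourceColumn(original);

        // The original still has only its initial column.
        System.out.println(String.join(",", original.columns())); // id
        // The transformation's result carries the new column.
        System.out.println(String.join(",", extended.columns())); // id,source

        spark.stop();
    }
}
```

Because the original reference is never altered, it can safely be reused as the starting point for other, independent transformations.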
...