Transformations and Actions
Learn the two cornerstone concepts of transformations and actions in Spark.
Two types of operations
After having worked on the previous example projects, we’re better positioned to understand two crucial concepts, and their related operations, involved in any Spark application: transformations and actions.
Both concepts are exposed programmatically through the methods of Spark API abstractions such as DataFrames, RDDs, and JavaRDDs.
Transformations
Transformations are operations that can change both the structure of a DataFrame and its contents. We’ve applied both kinds of transformation in previous examples while:
- Renaming, dropping, and creating columns of a DataFrame or a Dataset (the withColumn() and drop() methods, etc.).
- Doing calculations on each row of the DataFrame, whether to add a new column or create a Dataset of POJOs (when we introduced the map() method and the related MapFunction interface).
Every time we applied a transformation, we also got a new DataFrame as a result. This is due to a fundamental property of the abstraction:
- DataFrames are immutable structures.
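The idea of immutability can be illustrated without a Spark installation. The following is a minimal plain-Java analogy, not Spark API code: an unmodifiable List stands in for a DataFrame, and the hypothetical addBonus() helper plays the role of a transformation. The class name, the helper, and the sample values are all assumptions made for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

public class ImmutableTransformSketch {

    // Hypothetical stand-in for a transformation: it builds a NEW list
    // and never modifies the input, much like a DataFrame transformation
    // returns a new DataFrame.
    public static List<Integer> addBonus(List<Integer> salaries, int bonus) {
        return salaries.stream()
                .map(salary -> salary + bonus)
                .collect(Collectors.toUnmodifiableList());
    }

    public static void main(String[] args) {
        // List.of() returns an unmodifiable list (our "immutable" source).
        List<Integer> salaries = List.of(1000, 2000, 3000);

        // Applying the "transformation" yields a separate structure.
        List<Integer> raised = addBonus(salaries, 500);

        System.out.println("original: " + salaries); // original: [1000, 2000, 3000]
        System.out.println("raised:   " + raised);   // raised:   [1500, 2500, 3500]
    }
}
```

The original list is unchanged after the call, which mirrors why methods such as withColumn() or drop() hand back a new DataFrame rather than altering the one they were called on.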