Transformations (I): Map and Filter
Let’s examine the first two fundamental transformations, map and filter, through code examples and visual representations.
This lesson follows the project embedded in the widget below.
It’s best to follow the explanations in tandem with the code and to run the project to see the results. It also helps to change parts of the code, detouring a bit from the code base to experiment and see the results live.
Let’s run the project for the first time with:
mvn install exec:exec
This prints the DataFrame’s first five rows, containing data about foods and their related attributes:
+--------------+--------------------+----------------+--------------------+
| FOOD NAME| SCIENTIFIC NAME| GROUP| SUB GROUP|
+--------------+--------------------+----------------+--------------------+
| Angelica| Angelica keiskei|Herbs and Spices| Herbs|
| Savoy cabbage|Brassica oleracea...| Vegetables| Cabbages|
| Silver linden| Tilia argentea|Herbs and Spices| Herbs|
| Kiwi| Actinidia chinensis| Fruits| Tropical fruits|
|Allium (Onion)| Allium| Vegetables|Onion-family vege...|
+--------------+--------------------+----------------+--------------------+
only showing top 5 rows
Map
In plain terms, the map transformation provides the functionality to apply a function to all the elements of a DataFrame (or other Spark abstractions).
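Before looking at Spark’s signature, the idea can be illustrated with plain Java streams, whose `map` works the same way conceptually: every element is passed through a function, producing a new collection. This is a minimal stand-alone sketch, not Spark code; the food names are taken from the table above.

```java
import java.util.List;
import java.util.stream.Collectors;

public class MapAnalogy {
    public static void main(String[] args) {
        // Conceptually like Spark's map: apply a function to every element.
        List<String> foods = List.of("Angelica", "Kiwi", "Silver linden");

        List<Integer> lengths = foods.stream()
                .map(String::length)          // the function applied per element
                .collect(Collectors.toList());

        System.out.println(lengths);          // [8, 4, 13]
    }
}
```

Spark’s `map` does the same thing per row of a DataFrame or Dataset, distributed across the cluster rather than in a local stream.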
As developers, it is always a good exercise to read a method’s signature, because it reveals intent and the contract we have to abide by. The map method is defined as:
map(MapFunction<T,U> func, Encoder<U> encoder)
- func: Of the two arguments map takes, the first is a MapFunction named func. We’ve used it in a previous lesson, but to reiterate: it defines an interface containing a sole method, call, with an input of type T and a return type of U. In Java jargon, this means ...
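The single-method contract described above can be sketched in plain Java. Note this is a simplified stand-in for Spark’s actual `org.apache.spark.api.java.function.MapFunction` (the interface and helper below are defined here for illustration), showing why a lambda satisfies it:

```java
// Simplified stand-in for Spark's MapFunction<T, U>:
// one abstract method, call, taking a T and returning a U.
@FunctionalInterface
interface MapFunction<T, U> {
    U call(T value) throws Exception;
}

public class MapFunctionSketch {
    // Mirrors what Dataset.map does for each individual row.
    static <T, U> U applyOnce(MapFunction<T, U> func, T value) throws Exception {
        return func.call(value);
    }

    public static void main(String[] args) throws Exception {
        // A lambda fulfills the contract: here T = String, U = Integer.
        MapFunction<String, Integer> toLength = s -> s.length();
        System.out.println(applyOnce(toLength, "Kiwi")); // 4
    }
}
```

Because the interface has exactly one abstract method, any lambda or method reference with a matching shape can be passed where a MapFunction is expected.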