Transformations (I): Map and Filter
Let’s examine the first two fundamental transformations, map and filter, through code examples and visual representations.
This lesson follows the project embedded in the widget below.
It’s best to follow the explanations in tandem with the code and to run the project to see the results. It also helps to change parts of the code, detouring a bit from the code base to experiment and see the results live.
Let’s run the project for the first time with:
mvn install exec:exec
This prints the DataFrame’s first five rows, containing data about foods and their related attributes:
+--------------+--------------------+----------------+--------------------+
| FOOD NAME| SCIENTIFIC NAME| GROUP| SUB GROUP|
+--------------+--------------------+----------------+--------------------+
| Angelica| Angelica keiskei|Herbs and Spices| Herbs|
| Savoy cabbage|Brassica oleracea...| Vegetables| Cabbages|
| Silver linden| Tilia argentea|Herbs and Spices| Herbs|
| Kiwi| Actinidia chinensis| Fruits| Tropical fruits|
|Allium (Onion)| Allium| Vegetables|Onion-family vege...|
+--------------+--------------------+----------------+--------------------+
only showing top 5 rows
Map
In plain terms, the map transformation provides the functionality to apply a function to all the elements of a DataFrame (or other Spark abstractions).
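Before looking at Spark’s signature, the idea can be illustrated with plain Java streams, whose `map` works the same way conceptually: every element is passed through a function, producing a new collection. This is a minimal stand-alone sketch, not Spark code; the food names are taken from the table above.

```java
import java.util.List;
import java.util.stream.Collectors;

public class MapAnalogy {
    public static void main(String[] args) {
        // Conceptually like Spark's map: apply a function to every element.
        List<String> foods = List.of("Angelica", "Kiwi", "Silver linden");

        List<Integer> lengths = foods.stream()
                .map(String::length)          // the function applied per element
                .collect(Collectors.toList());

        System.out.println(lengths);          // [8, 4, 13]
    }
}
```

Spark’s `map` does the same thing per row of a DataFrame or Dataset, distributed across the cluster rather than in a local stream.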
As developers, it is always a good exercise to read a method’s signature, because it reveals intent and the contract we have to abide by. The map method is defined as:
map(MapFunction<T,U> func, Encoder<U> encoder)
- func: Of the two arguments map takes, the first is a MapFunction named func. We’ve used it in a previous lesson, but to reiterate: it defines an interface containing a sole method, call, with an input of type T and a return type of U. In Java jargon, this means ...
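The single-method contract described above can be sketched in plain Java. Note this is a simplified stand-in for Spark’s actual `org.apache.spark.api.java.function.MapFunction` (the interface and helper below are defined here for illustration), showing why a lambda satisfies it:

```java
// Simplified stand-in for Spark's MapFunction<T, U>:
// one abstract method, call, taking a T and returning a U.
@FunctionalInterface
interface MapFunction<T, U> {
    U call(T value) throws Exception;
}

public class MapFunctionSketch {
    // Mirrors what Dataset.map does for each individual row.
    static <T, U> U applyOnce(MapFunction<T, U> func, T value) throws Exception {
        return func.call(value);
    }

    public static void main(String[] args) throws Exception {
        // A lambda fulfills the contract: here T = String, U = Integer.
        MapFunction<String, Integer> toLength = s -> s.length();
        System.out.println(applyOnce(toLength, "Kiwi")); // 4
    }
}
```

Because the interface has exactly one abstract method, any lambda or method reference with a matching shape can be passed where a MapFunction is expected.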