Higher-Order Functions

Work through examples of more complex functions in Spark SQL, such as transform, map, and explode.

So far, we have seen simple examples of user-defined and built-in functions, but Spark also offers functions that work on more complex types. Let’s work through an example that requires some of the higher-order functions. Note that, given the structure of our data, we can’t answer questions by filtering on an actor’s name directly, because the actors in a movie appear as a pipe-delimited string in the actors column. Instead, we first need to split the string on the delimiter to get an array of string tokens, where each token is the name of an actor starring in the movie. We can use the split() function as follows:

spark-sql> SELECT title, releaseYear, hitFlop, split(actors,"[|]") AS actors FROM movies;
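The delimiter is written as "[|]" rather than "|" because the second argument of split() is treated as a regular expression, where a bare pipe means alternation. The following is a minimal sketch of that behavior using Python's re module rather than Spark itself. The sample string and the stars() helper are hypothetical and only serve as an illustration, and Python's regex engine is assumed to be close enough to Spark's for this particular pattern.

```python
import re

# Hypothetical sample value; the actual rows of the movies table are not shown here.
actors = "Actor A|Actor B|Actor C"

# "[|]" places the pipe inside a character class, so it matches a literal "|".
tokens = re.split("[|]", actors)

# A bare "|" is an alternation between two empty patterns. It matches the
# empty string, so it does not split cleanly on the pipe characters.
naive = re.split("|", actors)

def stars(actor_name, actors_column):
    """Hypothetical helper: True if actor_name is one of the pipe-delimited tokens."""
    return actor_name in re.split("[|]", actors_column)

print(tokens)
print(stars("Actor B", actors))
```

Once the column is an array of tokens, membership checks compare whole names, so a partial name such as "Actor" does not match any element.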

To demonstrate the various tools that come with Spark, we use the spark-sql CLI utility instead of spark-shell for this example. A few rows from the above query are shown below:
