Spark User Defined Functions
Learn how to create user defined functions in Spark and use Spark's built-in functions.
We have previously seen and worked with Spark’s built-in functions, but Spark also allows users to define their own functionality, wrapped inside user defined functions (UDFs) that can be invoked in Spark SQL. The major benefit of UDFs is reusability. UDFs exist per session and don’t persist in the underlying metastore. Let’s consider a simple function that returns the last two digits of the releaseYear value; for example, if the function is passed 2021, it’ll return 21. The function definition and its use are presented below:
val movies = spark.read.format("csv")
  .option("header", "true")
  .option("samplingRatio", 0.001)
  .option("inferSchema", "true")
...
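Since the listing above is elided, here is a minimal, self-contained sketch of the UDF described in the text. It is not the lesson's original code. The object name `LastTwoDigitsUdf`, the function name `lastTwoDigits`, the `title` column, and the two in-memory sample rows are all hypothetical stand-ins. Only the `releaseYear` column and the 2021-to-21 behavior come from the text. The sketch assumes `releaseYear` is an integer column, which schema inference would typically produce for year values.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

object LastTwoDigitsUdf {
  // Plain Scala function: keeps the last two digits of a year (2021 -> 21).
  val lastTwoDigits: Int => Int = year => year % 100

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("udf-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample rows standing in for the movies data set.
    val movies = Seq(("Movie A", 2021), ("Movie B", 1999))
      .toDF("title", "releaseYear")

    // Register the function under a name so Spark SQL can call it.
    // The registration lasts only for this SparkSession.
    spark.udf.register("lastTwoDigits", lastTwoDigits)

    movies.createOrReplaceTempView("movies")
    spark.sql(
      "SELECT title, lastTwoDigits(releaseYear) AS shortYear FROM movies"
    ).show()

    // The same function wrapped for use with the DataFrame API.
    val lastTwoDigitsUdf = udf(lastTwoDigits)
    movies
      .select(col("title"), lastTwoDigitsUdf(col("releaseYear")).as("shortYear"))
      .show()

    spark.stop()
  }
}
```

Keeping the logic in an ordinary Scala function and wrapping it separately makes it possible to unit test the function without starting a Spark session.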