Writing UDFs
Let's create a UDF in PySpark.
Create a UDF
We need to provide a return type annotation when defining a UDF because it tells PySpark how to translate values between Python and Scala at runtime. Now, let's write some UDFs. There are several ways to convert a Python function into a UDF; we'll use the simple two-step approach below:
- Define a plain Python function.
- Wrap the function with `fn.udf`, specifying the proper return type.
We prefer this approach over the decorator because it lets us unit test the pure Python function separately; testing PySpark UDFs directly is more cumbersome.
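To illustrate the testability benefit: because the underlying function is ordinary Python, it can be exercised with plain assertions and no SparkSession. The function below is a hypothetical example of the kind of logic a UDF might wrap:

```python
# Hypothetical pure function -- the sort of logic we would later wrap in a UDF.
def extract_domain(email):
    """Return the domain part of an email address, or '' if malformed."""
    return email.split("@")[-1] if email and "@" in email else ""

# Ordinary unit-test-style assertions: fast, no Spark cluster required.
assert extract_domain("dev@example.com") == "example.com"
assert extract_domain("not-an-email") == ""
```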