UDF in Action

Learn how to apply user-defined functions (UDFs) to a DataFrame.

Create an in-memory DataFrame

To illustrate a more complex application of UDFs, let’s first create an in-memory DataFrame as follows:

wget http://deepyeti.ucsd.edu/jianmo/amazon/sample/sample_Home_and_Kitchen_5.json
wget http://deepyeti.ucsd.edu/jianmo/amazon/sample/sample_meta_Home_and_Kitchen.json
wget http://deepyeti.ucsd.edu/jianmo/amazon/categoryFilesSmall/Toys_and_Games_5.json.gz

Example of user-defined functions with an in-memory DataFrame

After successful code execution, we’ll see the message “Code Executed Successfully” in the terminal.
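The following is a minimal sketch of what creating an in-memory DataFrame might look like. It assumes PySpark, which the lesson text does not name explicitly. The sample rows, the `Name` column, and the helper names `build_dataframe` and `demo` are hypothetical and exist only for illustration. Only the `Address` column comes from the lesson.

```python
# Hypothetical sample data: every name and address below is invented.
ROWS = [
    ("Alice", "12 Oak Street, Springfield 62704"),
    ("Bob", "48 Market Road, Leeds"),
    ("Carol", "7 Mill Lane, Riverton 84065"),
]
COLUMNS = ["Name", "Address"]


def build_dataframe(spark):
    """Build a small DataFrame from local Python data.

    Assumes `spark` is a PySpark SparkSession.
    """
    return spark.createDataFrame(ROWS, schema=COLUMNS)


def demo():
    """Start a local session, build the DataFrame, and print it."""
    # Imported lazily so the data definitions above load without PySpark.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[*]")
        .appName("udf-in-action")
        .getOrCreate()
    )
    try:
        build_dataframe(spark).show(truncate=False)
    finally:
        spark.stop()
```

Calling `demo()` in an environment with PySpark installed should print the three rows. Keeping the data in plain Python lists makes it easy to swap in other sample rows.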

Adding a new column to a DataFrame

Let’s say we want to add a new column that summarizes the Address column of the DataFrame. The new summary column should have three fields: the length of the full address, a boolean indicating whether the address contains a postcode, and the postcode itself. We can write the UDF as follows:


Apply a UDF to a DataFrame
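The three-field summary described above could be sketched as follows. This is an illustration under stated assumptions, not the lesson's own code. It assumes PySpark, and it assumes a postcode is a standalone five-digit number, because the lesson does not specify a postcode format. The names `summarize_address`, `add_summary_column`, and `Summary` are hypothetical.

```python
import re

# Assumption: a postcode is a standalone 5-digit number.
# The lesson does not define the format, so adjust the pattern as needed.
POSTCODE_RE = re.compile(r"\b\d{5}\b")


def summarize_address(address):
    """Return (length, has_postcode, postcode) for one address string."""
    if address is None:
        return None
    match = POSTCODE_RE.search(address)
    postcode = match.group(0) if match else None
    return (len(address), postcode is not None, postcode)


def add_summary_column(df):
    """Add a struct column named 'Summary' derived from 'Address'.

    Assumes `df` is a PySpark DataFrame with a string column 'Address'.
    """
    # Imported lazily so summarize_address can be used without PySpark.
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import (
        BooleanType,
        IntegerType,
        StringType,
        StructField,
        StructType,
    )

    # The return type mirrors the three fields of the tuple above.
    summary_type = StructType([
        StructField("length", IntegerType()),
        StructField("has_postcode", BooleanType()),
        StructField("postcode", StringType()),
    ])
    summarize_udf = udf(summarize_address, summary_type)
    return df.withColumn("Summary", summarize_udf(col("Address")))
```

Keeping the logic in a plain Python function means it can be checked without a Spark session. The `udf` wrapper only declares the struct return type so that the three fields become addressable as `Summary.length`, `Summary.has_postcode`, and `Summary.postcode`.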

After successful code execution, we’ll see the message “Code Executed ...