UDF in Action
Learn to play around with user-defined functions.
Create an in-memory DataFrame
To illustrate a more complex application of UDFs, let’s first create an in-memory DataFrame as follows:
wget http://deepyeti.ucsd.edu/jianmo/amazon/sample/sample_Home_and_Kitchen_5.json
wget http://deepyeti.ucsd.edu/jianmo/amazon/sample/sample_meta_Home_and_Kitchen.json
wget http://deepyeti.ucsd.edu/jianmo/amazon/categoryFilesSmall/Toys_and_Games_5.json.gz
Example of user-defined functions with an in-memory DataFrame
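A minimal sketch of building a small in-memory DataFrame, assuming PySpark. The sample rows, the name column, and the helper names build_address_df and main are hypothetical and are not taken from the original lesson. The schema is passed as a DDL-formatted string so the column types are stated explicitly rather than inferred from the data.

```python
# Minimal sketch (assumes PySpark): build a small DataFrame from local Python data.
# The rows below are hypothetical sample data, not the lesson's original dataset.
SAMPLE_ROWS = [
    ("Alice", "12 Oak Street, Springfield, IL 62704"),
    ("Bob", "221B Baker Street, London"),
    ("Carol", None),  # a missing address becomes a SQL NULL in the DataFrame
]

# DDL-formatted schema string; SparkSession.createDataFrame accepts this form.
SAMPLE_SCHEMA = "name STRING, Address STRING"


def build_address_df(spark):
    """Create an in-memory DataFrame from SAMPLE_ROWS using the given SparkSession."""
    return spark.createDataFrame(SAMPLE_ROWS, schema=SAMPLE_SCHEMA)


def main():
    # Imported here so this module can be loaded without PySpark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[*]").appName("udf-in-action").getOrCreate()
    df = build_address_df(spark)
    df.show(truncate=False)  # print the rows without truncating long addresses
    spark.stop()
```

Calling main() in an environment with PySpark (and Java) installed displays the three sample rows. Taking the SparkSession as a parameter keeps build_address_df reusable from a notebook or a test.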
Adding a new column to a DataFrame
Let’s say we want to add a new column that summarizes the Address column of the DataFrame. The new summary column should have three fields: the length of the full address, a boolean that indicates whether the address contains a postcode, and the postcode itself. We can write the UDF as follows:
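A minimal sketch of such a UDF, assuming PySpark. The original postcode rule is not shown, so this sketch assumes a US ZIP code (five digits with an optional four-digit suffix); the field names length, has_postcode and postcode, and the helpers summarize_address and make_summary_udf, are hypothetical. The Python function returns a tuple whose order matches the struct fields declared in the DDL return type.

```python
import re

# Assumption: a "postcode" is a US ZIP or ZIP+4 code; adjust the pattern for other formats.
POSTCODE_RE = re.compile(r"\b\d{5}(?:-\d{4})?\b")

# Return type as a DDL string: a struct holding the three summary fields.
SUMMARY_TYPE = "struct<length:int,has_postcode:boolean,postcode:string>"


def summarize_address(address):
    """Plain-Python logic for the summary column: (length, has_postcode, postcode)."""
    if address is None:
        return None  # Spark passes SQL NULLs to Python UDFs as None; return a NULL struct
    match = POSTCODE_RE.search(address)
    postcode = match.group(0) if match else None
    # Tuple order must match the field order in SUMMARY_TYPE.
    return (len(address), match is not None, postcode)


def make_summary_udf():
    """Wrap summarize_address as a PySpark UDF (PySpark is needed only at call time)."""
    from pyspark.sql.functions import udf

    return udf(summarize_address, SUMMARY_TYPE)
```

Keeping the row logic in an ordinary function means it can be checked without starting Spark; make_summary_udf only attaches the declared struct return type.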
Apply a UDF to a DataFrame
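A minimal sketch of applying a UDF that returns the three-field summary described above, assuming PySpark. The helpers add_summary_column and flatten_summary, the output column name summary, and the struct field names are hypothetical; summary_udf stands for any UDF whose return type is that struct.

```python
# Minimal sketch (assumes PySpark): apply a struct-returning UDF to a DataFrame.
# Field names are assumptions and must match the UDF's declared struct type.
SUMMARY_FIELDS = ("length", "has_postcode", "postcode")


def add_summary_column(df, summary_udf, address_col="Address", out_col="summary"):
    """Return df with a new struct column computed row by row by summary_udf."""
    # Calling a UDF on a Column builds a Column expression; Spark evaluates it lazily.
    return df.withColumn(out_col, summary_udf(df[address_col]))


def flatten_summary(df, address_col="Address", out_col="summary"):
    """Select the address plus each struct field as its own top-level column."""
    nested = [f"{out_col}.{field}" for field in SUMMARY_FIELDS]
    return df.select(address_col, *nested)
```

With a real DataFrame, flatten_summary(add_summary_column(df, summary_udf)).show() runs the UDF once per row. Keeping the struct column is convenient when the summary travels as one value; the dot-notation select is useful when downstream code expects flat columns.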