Challenge: User-defined Functions

Let's solve a programming challenge on user-defined functions (UDFs) in PySpark.

Task

Calculate the median review for a selected year, for example 2016, and compare it with the median review of the top product for that year. A top product is one whose total number of reviews for that year lies above the 75th percentile of the per-product review counts.

Steps

  1. Calculate the total number of reviews per product for 2016.

  2. Calculate the median review for a grouped DataFrame.
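
The two steps above could be sketched roughly as follows. This is only one possible approach, not the lesson's reference solution. The column names `product_id`, `rating`, and `year`, and the helper names `median_rating` and `compare_medians`, are assumptions made for illustration; the real dataset may use different names. The median itself is a plain Python function that is then wrapped as a UDF.

```python
from statistics import median


def median_rating(ratings):
    """Plain-Python body of the UDF: median of a list of ratings."""
    return float(median(ratings)) if ratings else None


def compare_medians(reviews, year=2016):
    """Return (overall median, DataFrame of medians per top product).

    `reviews` is assumed to be a DataFrame with the hypothetical
    columns product_id, rating, and year.
    """
    # Imported here so median_rating can be tried without a Spark install.
    from pyspark.sql import functions as F
    from pyspark.sql.types import DoubleType

    median_udf = F.udf(median_rating, DoubleType())
    in_year = reviews.filter(F.col("year") == year)

    # Step 1: total number of reviews per product for the selected year,
    # keeping each product's ratings in a list for the UDF.
    per_product = in_year.groupBy("product_id").agg(
        F.count("rating").alias("total_reviews"),
        F.collect_list("rating").alias("ratings"),
    )

    # Keep products whose review count lies above the 75th percentile.
    cutoff = per_product.approxQuantile("total_reviews", [0.75], 0.0)[0]
    top = per_product.filter(F.col("total_reviews") > cutoff)

    # Step 2: median review per top product, computed by the UDF.
    top_medians = top.select(
        "product_id", median_udf("ratings").alias("median_rating")
    )

    # Median review over all products for the same year.
    overall = (
        in_year.agg(F.collect_list("rating").alias("ratings"))
        .select(median_udf("ratings").alias("median_rating"))
        .first()["median_rating"]
    )
    return overall, top_medians
```

Calling `compare_medians(reviews, 2016)` on such a DataFrame would return the overall median for the year together with a DataFrame holding one median per top product, which can then be compared. Collecting every rating into a list keeps the sketch short but may not scale to very large groups.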
