Setup
In this lesson, we explore the initial setup for data transformation: loading the original datasets with pandas and the snapshot data with PySpark. We cover importing the essential libraries, creating utility functions for data loading and column selection, and initializing a PySpark session to read the parquet snapshot efficiently.
We'll cover the following...
Overview of the setup
First, we need to load the data. With pandas, we read the original datasets directly; with PySpark, we read the parquet snapshot instead.
Here’s the list of imports we’ll need to work with pandas and PySpark: ...
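The exact import list is elided above. As a rough sketch of what such a setup might look like, the block below combines typical imports with small utility functions for loading and column selection; the helper names (`load_original`, `select_columns`, `get_spark`) and the default app name are placeholders, not names from this course.

```python
# Hypothetical setup for a pandas + PySpark transformation pipeline.
import pandas as pd

try:
    from pyspark.sql import SparkSession
except ImportError:
    # PySpark is optional here: the pandas path works without it.
    SparkSession = None


def load_original(path: str) -> pd.DataFrame:
    """Load one of the original CSV datasets with pandas."""
    return pd.read_csv(path)


def select_columns(df: pd.DataFrame, columns) -> pd.DataFrame:
    """Keep only the listed columns, in the given order."""
    return df[list(columns)]


def get_spark(app_name: str = "transform"):
    """Create (or reuse) a SparkSession for reading the parquet snapshot."""
    if SparkSession is None:
        raise RuntimeError("PySpark is not installed")
    return SparkSession.builder.appName(app_name).getOrCreate()
```

With a session in hand, the snapshot would be read with something like `get_spark().read.parquet(snapshot_path)`, while the original data goes through `load_original`.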