From Pandas to PySpark DataFrame
UDF: Save a snapshot
Learn to create a dataset checkpoint.
We'll cover the following
Saving the updated dataset checkpoint