ydata-profiling

Discover how to easily generate detailed reports for exploratory data analysis (EDA) with ydata-profiling.

Extended pandas libraries

We’ll wrap up the course by looking at several pandas extension libraries that many data practitioners have found valuable in real-world work. Although they aren’t part of pandas itself, they are designed to make working with pandas more effective and efficient.

Let’s start with the first of these extended libraries: ydata-profiling (formerly known as pandas-profiling).

Introduction to ydata-profiling

One of the key early steps when embarking on a data science project is performing exploratory data analysis (EDA) to gain a strong understanding of the datasets available.

EDA is the systematic process of examining, cleaning, transforming, and modeling data to discover useful information, form hypotheses, and support decision-making.
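To make this concrete, here is a minimal sketch of what a manual first pass at EDA might look like with plain pandas. The `customers` DataFrame and its column names are hypothetical stand-ins invented for illustration; they are not the lesson's actual retail dataset.

```python
import pandas as pd

# Hypothetical stand-in data (assumption): not the lesson's actual dataset.
customers = pd.DataFrame(
    {
        "customer_id": [1, 2, 3, 4, 5],
        "age": [34, 45, None, 23, 51],
        "gender": ["F", "M", "F", "M", "F"],
        "annual_spend": [1200.0, 560.5, 980.0, None, 2300.0],
    }
)

# Each line below is a separate manual step in a typical first look at the data.
summary_stats = customers.describe()                # count, mean, std, quartiles of numeric columns
missing_counts = customers.isna().sum()             # number of missing values per column
gender_counts = customers["gender"].value_counts()  # frequency of each category
correlations = customers[["age", "annual_spend"]].corr()  # pairwise correlation of two numeric columns

print(summary_stats)
print(missing_counts)
print(gender_counts)
print(correlations)
```

Every new dataset calls for roughly the same sequence of checks, which is why this kind of work quickly becomes repetitive.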

Because the EDA process tends to be manual and repetitive, the open-source community came together to build ydata-profiling, a popular library that generates an interactive and comprehensive report on the main characteristics of a dataset.

The library aims to produce interactive, detailed EDA reports for pandas and Spark DataFrames with just one line of code, which speeds up how quickly we can gain a firm understanding of our data.
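As a rough sketch of that one-line workflow, the snippet below wraps the report creation in a small helper. It assumes ydata-profiling is installed (`pip install ydata-profiling`). The `customers` DataFrame, the `build_profile_report` helper, the report title, and the output file name are all hypothetical choices made for illustration; only `ProfileReport` and its `to_file` method come from the library.

```python
import pandas as pd

# Hypothetical stand-in data (assumption): not the lesson's actual dataset.
customers = pd.DataFrame(
    {
        "customer_id": [1, 2, 3, 4, 5],
        "age": [34, 45, None, 23, 51],
        "gender": ["F", "M", "F", "M", "F"],
        "annual_spend": [1200.0, 560.5, 980.0, None, 2300.0],
    }
)


def build_profile_report(df: pd.DataFrame, output_path: str = "customer_report.html"):
    """Generate an HTML profiling report for df and write it to output_path.

    The helper name and default path are illustrative, not part of the library.
    """
    # Imported inside the function so the rest of the script still runs
    # when the optional ydata-profiling dependency is not installed.
    from ydata_profiling import ProfileReport

    # The "one line": build the report object from the DataFrame.
    profile = ProfileReport(df, title="Customer Profiling Report")

    # Save the interactive report as a standalone HTML file.
    profile.to_file(output_path)
    return profile
```

Calling `build_profile_report(customers)` should write an HTML file that can be opened in a browser. For a DataFrame as small as this one the report will be sparse; it becomes more informative on a real dataset.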

We’ll be working with a retail shop customer dataset for this lesson, as shown below.
