Exploratory Data Analysis

Get familiar with exploratory data analysis (EDA), statistical properties, and the visualization of underlying data relationships.

An EDA exercise typically includes data cleaning, visualization, descriptive statistics, and hypothesis testing. With EDA, we analyze and summarize the main characteristics of a dataset. The goal is to gain insight into the relationships underlying the data, which is a crucial step before building predictive models.
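To make these steps concrete, here is a minimal sketch of a first EDA pass with pandas. It assumes a small, made-up DataFrame named toy; the price column name and all values are hypothetical and are not taken from the lesson's dataset. It covers only cleaning, descriptive statistics, and a simple grouped summary, not visualization or hypothesis testing.

```python
import pandas as pd

# Hypothetical toy records; names and values are illustrative only,
# not taken from the lesson's airline fares dataset.
toy = pd.DataFrame(
    {
        "airline": ["A", "A", "B", "B", "B"],
        "days_left": [1, 30, 2, 2, 45],
        "price": [7000.0, 2500.0, 6500.0, 6500.0, None],
    }
)

# Data cleaning: drop exact duplicate rows, then count missing values per column.
cleaned = toy.drop_duplicates()
missing_per_column = cleaned.isna().sum()

# Descriptive statistics for the numerical columns (count, mean, std, quartiles).
summary = cleaned.describe()

# A first look at a relationship: mean price per airline (missing prices are skipped).
mean_price_by_airline = cleaned.groupby("airline")["price"].mean()

print(missing_per_column)
print(summary)
print(mean_price_by_airline)
```

On a real dataset, the same three calls (drop_duplicates, isna().sum(), and describe) are usually a reasonable starting point before any plotting.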

In this lesson, we’ll perform EDA on the airline fares dataset, which covers flights operating between various Indian cities and their fares. A flight from Kolkata to Bangalore may sometimes serve Hyderabad as well. Even on the same route, the fare can fluctuate with the booking date, so the dataset contains multiple records for such flights, as we can see below for flight 6E-148:

flight_number = "6E-148"
# Select every record for this flight number (data is the DataFrame loaded earlier)
data[data["flight"] == flight_number]

From the above output, we can see that:

  • Fares for the same flight vary greatly (for example, 2,482 and 7,420) depending only on the number of days before departure.

  • The dataset consists of a mix of numerical features (duration, days_left) and categorical features (airline, flight, source_city, departure_time, stops, arrival_time, destination_city, ...