Overview of Data Visualization, Variables, and their Types

Why do we need data visualization?

Data is an essential component of any organization—it’s all around us. For example, when we go grocery shopping, the store records certain information about our purchases, such as the items we purchase, their quantity, and so on. This information can assist companies in making critical decisions, such as which products are most popular with customers and which products earn the most profit.

The retailer can then analyze the data using their databases, going row by row in the records and understanding the relationships between different products. It may be feasible for a small local store, but what about large e-commerce retailers like Amazon, eBay, and others where millions of customer transactions exist?

In this case, it would be practically impossible to draw any meaningful conclusion by going through each record of data because the dataset is so large. Let’s consider that we have a restaurant dataset and are required to determine if customers with higher bills tend to give more tips.

What would be our approach to handling this situation? First, we would need to review each row and observe the relationship between the total bill and the tips given. We can then draw a conclusion based on our findings. It would be a bit time-consuming, but we’d manage. We can find and represent the relationship between the total bill and tips as a graph, as shown below:

The graph above illustrates the relationship between the total bill and tips. If we closely observe the graph, we can conclude that customers with higher bills tend to give higher tips.

What if we wanted to interpret how many customers are smokers in the dataset given above? We would need to review the data again. What if the number of records scales from 20 into the hundreds? It would become impractical to observe data only by following the rows, and the problem would scale as the size of the data increases.

In these kinds of scenarios, data visualizations come in handy. We can create several complex visualizations using Python’s seaborn library. Compared with raw data, a visualization conveys information more effectively. As the saying goes, “A picture is worth a thousand words.” Visualizations also capture the audience’s attention.

The data visualization above shows the relationship between the total bill and tips given by the customers. The visualization also categorizes the customers as smokers and nonsmokers. If we observe the above visualization closely, we can conclude that most customers are nonsmokers.
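A plot of this kind could be produced roughly as follows. This is only a sketch: it assumes seaborn, pandas, and matplotlib are installed, and it uses a handful of made-up rows (the `bills` DataFrame and the output filename are hypothetical) rather than the restaurant dataset used in this lesson.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script also runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up rows for illustration only; not the lesson's restaurant dataset.
bills = pd.DataFrame({
    "total_bill": [10.5, 16.0, 21.0, 24.5, 30.0, 38.0],
    "tip": [1.5, 2.0, 3.0, 3.5, 4.5, 6.0],
    "smoker": ["No", "No", "Yes", "No", "No", "Yes"],
})

# One point per customer; hue colors each point by smoking status.
ax = sns.scatterplot(data=bills, x="total_bill", y="tip", hue="smoker")
ax.set_title("Tip vs. total bill")

# Write the figure to an image file (hypothetical filename).
plt.savefig("tip_vs_total_bill.png")
```

The `hue` argument is what splits the points into smokers and nonsmokers, so a single plot answers both questions raised earlier.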

In this course, we’ll learn to construct various visualizations and discuss important concepts (such as variables, types of statistical analyses, and so on) in order to understand and use Python’s seaborn library for data visualization and analysis.

What is a variable?

In statistics, a variable is any characteristic or quantity that can take different values, whether across individuals or over time. For example, the temperature varies during the day; it may be cold in the morning and warm in the afternoon, indicating that the temperature is changeable.

Likewise, a student’s grades can change throughout semesters, reflecting exceptional performance in some semesters and average performance in others. Other variables can be a person’s weight, age, height, salary, and expenses, since they can change over time.

Variables in Python

In Python, the idea is similar: a variable is a name that refers to a stored value, and that value can change as the program runs. The type of the value determines what kind of data the variable currently holds.

student_age = 22 # create the variable by assigning a value to it
student_name = "Sara" # the type of a variable is determined by the value assigned
print("Name of student:", student_name)
print("Age of student:", student_age)

As shown in the code snippet above, student_age is assigned an integer value, so its data type is an integer. Similarly, the variable student_name is assigned a string value, so its data type is a string.

We can verify the statements above using Python’s built-in type() function, which returns the class (type) of the object passed to it as an argument.

student_age = 22 # create the variable by assigning a value to it
student_name = "Sara" # the type of a variable is determined by the value assigned
print("Data type of variable student_age:", type(student_age))
print("Data type of variable student_name:", type(student_name))
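Because the type follows the value, rebinding the same name to a different kind of value changes what type() reports. The short sketch below illustrates this with a hypothetical variable named `reading`, which is not part of the lesson's examples.

```python
# A hypothetical variable that is rebound to values of different types.
reading = 22
first_type = type(reading)    # the value is an integer

reading = 22.5
second_type = type(reading)   # the value is now a float

reading = "twenty-two"
third_type = type(reading)    # the value is now a string

print(first_type.__name__, second_type.__name__, third_type.__name__)

# isinstance() checks whether a value belongs to a given type.
print(isinstance(reading, str))
```

In practice, isinstance() is usually preferred over comparing type() results directly when code needs to branch on a value's type.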

Primitive variable types in Python

The primitive variable types supported by the Python language are as follows:

  • Strings
  • Floats
  • Integers
  • Booleans

student_name = "John" # a string variable
student_weight = 50.20 # a float variable representing weight in kilograms (kg)
student_age = 25 # an integer variable
interview_passed = True # a boolean variable
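To check the four types listed above in one pass, the values can be collected and inspected with type(). This is a minimal sketch; the `values` and `type_names` dictionaries are introduced here only for illustration.

```python
student_name = "John"
student_weight = 50.20
student_age = 25
interview_passed = True

# Map each variable name to its value so the types can be inspected in a loop.
values = {
    "student_name": student_name,
    "student_weight": student_weight,
    "student_age": student_age,
    "interview_passed": interview_passed,
}

# type(value).__name__ gives the short name of the type, such as "int".
type_names = {name: type(value).__name__ for name, value in values.items()}

for name, type_name in type_names.items():
    print(f"{name}: {type_name}")
```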

Types of variables in statistical analysis

The major types of variables used in statistical analysis are as follows (continuous and discrete variables are both kinds of numerical variables):

  • Categorical variables
  • Numerical variables
  • Continuous variables
  • Discrete variables
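As a rough illustration of how these categories relate to Python values, the sketch below labels a column by the types of its values. The helper `guess_variable_type` and the sample columns are hypothetical, and the rule is only a heuristic: real data does not always fit it (for instance, postal codes are stored as integers but behave as categories).

```python
def guess_variable_type(values):
    """Heuristic label for a column, based only on the Python types of its values."""
    # bool is a subclass of int in Python, so exclude it from the numeric checks.
    def is_int(v):
        return isinstance(v, int) and not isinstance(v, bool)

    def is_number(v):
        return isinstance(v, (int, float)) and not isinstance(v, bool)

    if values and all(is_int(v) for v in values):
        return "numerical (discrete)"
    if values and all(is_number(v) for v in values):
        return "numerical (continuous)"
    # Strings, booleans, mixed values, and empty columns fall through to here.
    return "categorical"


# Made-up columns for illustration only.
columns = {
    "smoker": ["Yes", "No", "No"],
    "party_size": [2, 4, 3],
    "total_bill": [16.99, 10.34, 21.01],
}

labels = {name: guess_variable_type(vals) for name, vals in columns.items()}
print(labels)
```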