Features

Learn about the different feature types that can be part of a dataset.

Chapter Goals:

  • Understand the difference between quantitative and categorical features
  • Learn methods to manipulate features and add them to a DataFrame
  • Write code to add MLB statistics to a DataFrame

A. Quantitative vs. categorical

We often refer to the columns of a DataFrame as the features of the dataset that it represents. These features can be quantitative or categorical.

A quantitative feature, e.g. height or weight, is a feature that can be measured numerically. These are features for which we could calculate the sum, mean, or other numerical metrics.

A categorical feature, e.g. gender or birthplace, is one where the values are categories that could be used to group the dataset. These are the features we would use with the groupby function from the previous chapter.
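As a minimal sketch of the distinction, the example below builds a small DataFrame and treats each kind of feature accordingly. The data and the column names (height, weight, birthplace) are made up for illustration and do not come from the MLB dataset.

```python
import pandas as pd

# Hypothetical player data, invented for this illustration
df = pd.DataFrame({
    'height': [72, 75, 70, 74],                   # quantitative (inches)
    'weight': [190, 220, 180, 210],               # quantitative (pounds)
    'birthplace': ['USA', 'DR', 'USA', 'Japan'],  # categorical
})

# Quantitative features support numerical metrics
total_weight = df['weight'].sum()
mean_height = df['height'].mean()
print(total_weight, mean_height)

# Categorical features define groups of rows;
# size() counts how many rows fall into each group
group_sizes = df.groupby('birthplace').size()
print(group_sizes)
```

Here sum and mean only make sense for the numeric columns, while birthplace is useful as a grouping key.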

Some features can be either quantitative or categorical, depending on the context in which they are used. For example, we could use year of birth as a quantitative feature if we were trying to find statistics such as the average birth year for a particular dataset. On the other hand, we could also use it as a categorical feature and group the data by the different years of birth.
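The year-of-birth example can be sketched as follows. The DataFrame and its column names (name, birth_year, weight) are hypothetical and chosen only to show the same column playing both roles.

```python
import pandas as pd

# Hypothetical data, invented for this illustration
df = pd.DataFrame({
    'name': ['A', 'B', 'C', 'D'],
    'birth_year': [1990, 1992, 1990, 1992],
    'weight': [200, 180, 210, 190],
})

# Quantitative use: compute a statistic of the birth years themselves
avg_birth_year = df['birth_year'].mean()
print(avg_birth_year)

# Categorical use: group rows by birth year, then
# compute the mean weight within each group
mean_weight_by_year = df.groupby('birth_year')['weight'].mean()
print(mean_weight_by_year)
```

In the first case birth_year is the thing being measured; in the second it only decides which rows belong together.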

B. Quantitative features

In the previous chapter, we focused on grouping a dataset by its categorical features. We'll now describe methods for dealing with quantitative features.

Two of the most important functions to use with quantitative features are sum and mean. The previous chapter also introduced sum and mean functions, which were used to aggregate quantitative features for each group.
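As a brief reminder of that grouped usage, here is a small sketch. The column names (teamID, HR, H) and the values are assumptions made for illustration, not the chapter's actual dataset.

```python
import pandas as pd

# Hypothetical batting data, invented for this illustration
df = pd.DataFrame({
    'teamID': ['BOS', 'BOS', 'NYA', 'NYA'],
    'HR': [30, 10, 25, 20],
    'H': [150, 100, 160, 120],
})

# Group by the categorical feature, then aggregate
# the quantitative features within each group
grouped = df.groupby('teamID')
hr_totals = grouped['HR'].sum()   # total home runs per team
h_means = grouped['H'].mean()     # mean hits per team
print(hr_totals)
print(h_means)
```

Each result has one value per team, because the aggregation runs separately inside every group.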

However, while the functions from the previous chapter were applied to the output of groupby, the ones we use in this chapter are applied to individual ...