Features
Learn about the different feature types that can be part of a dataset.
Chapter Goals:
- Understand the difference between quantitative and categorical features
- Learn methods to manipulate features and add them to a DataFrame
- Write code to add MLB statistics to a DataFrame
A. Quantitative vs. categorical
We often refer to the columns of a DataFrame as the features of the dataset that it represents. These features can be quantitative or categorical.
A quantitative feature, e.g. height or weight, is a feature that can be measured numerically. These are features for which we could calculate the sum, mean, or other numerical metrics.
A categorical feature, e.g. gender or birthplace, is one where the values are categories that could be used to group the dataset. These are the features we would use with the groupby function from the previous chapter.
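As a rough illustration of the distinction, the sketch below builds a small DataFrame and treats its columns differently depending on their type. The column names and values are hypothetical, made up for this example rather than taken from the course's MLB dataset.

```python
import pandas as pd

# Hypothetical data, invented for illustration only
df = pd.DataFrame({
    'name': ['player_a', 'player_b', 'player_c', 'player_d'],
    'height': [72, 75, 70, 73],        # quantitative
    'weight': [180, 215, 175, 190],    # quantitative
    'birthplace': ['USA', 'CAN', 'USA', 'CAN'],  # categorical
})

# Quantitative features: numerical metrics are meaningful
total_weight = df['weight'].sum()
mean_height = df['height'].mean()

# Categorical feature: the values serve as group labels
group_sizes = df.groupby('birthplace').size()

print(total_weight)
print(mean_height)
print(group_sizes)
```

Taking the mean of birthplace would not make sense, while grouping by it splits the rows into one group per distinct value.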
Some features can be either quantitative or categorical, depending on the context in which they are used. For example, we could use year of birth as a quantitative feature if we were trying to find statistics such as the average birth year for a particular dataset. On the other hand, we could also use it as a categorical feature and group the data by the different years of birth.
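The birth year example might look like the following minimal sketch. The birth_year and height columns, along with their values, are assumptions made for illustration and are not part of the original dataset.

```python
import pandas as pd

# Hypothetical data, invented for illustration only
df = pd.DataFrame({
    'player': ['p1', 'p2', 'p3', 'p4'],
    'birth_year': [1985, 1990, 1985, 1992],
    'height': [72, 74, 70, 76],
})

# Birth year as a quantitative feature: compute a statistic on it
avg_birth_year = df['birth_year'].mean()

# Birth year as a categorical feature: use it as the grouping key
mean_height_by_year = df.groupby('birth_year')['height'].mean()

print(avg_birth_year)
print(mean_height_by_year)
```

The same column is used in both cases. What changes is whether we compute a metric on its values or use its values to partition the rows.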
B. Quantitative features
In the previous chapter, we focused on grouping a dataset by its categorical features. We'll now describe methods for dealing with quantitative features.
Two of the most important functions to use with quantitative features are sum and mean. In the previous chapter we also introduced sum and mean functions, which were used to aggregate quantitative features for each group.
However, while the functions from the previous chapter were applied to the output of groupby, the ones we use in this chapter are applied to individual ...