Encoding
Learn how to encode variables.
Encoding refers to the process of converting categorical features into numerical features so that ML algorithms can use them. Categorical features take on a limited number of values, often with no inherent order, and most ML algorithms expect numeric input, so they can't work with these features directly.
It’s common in ML to have categorical features—such as “Sex,” “Zip code,” and “Profession”—that need to be transformed before they can be ingested by an ML algorithm. The table below features this type of categorical data:
Name | Sex | Zip code | Profession |
John Smith | Male | 12345 | Engineer |
Amy Johnson | Female | 67890 | Teacher |
Michael Davis | Male | 54321 | Doctor |
Sarah Miller | Female | 98765 | Accountant |
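To make the idea of encoding concrete, here is a minimal hand-rolled sketch in plain Python that turns the “Profession” column of the table above into integers. The helper name `build_mapping` and the choice to number the categories in alphabetical order from zero are assumptions made for illustration, not something the lesson prescribes.

```python
# Rows from the table above, as plain Python dictionaries.
rows = [
    {"Name": "John Smith", "Sex": "Male", "Zip code": "12345", "Profession": "Engineer"},
    {"Name": "Amy Johnson", "Sex": "Female", "Zip code": "67890", "Profession": "Teacher"},
    {"Name": "Michael Davis", "Sex": "Male", "Zip code": "54321", "Profession": "Doctor"},
    {"Name": "Sarah Miller", "Sex": "Female", "Zip code": "98765", "Profession": "Accountant"},
]


def build_mapping(values):
    """Hypothetical helper: map each distinct category to an integer.

    Categories are numbered in alphabetical order, starting at zero
    (an illustrative choice).
    """
    return {category: index for index, category in enumerate(sorted(set(values)))}


profession_to_int = build_mapping(row["Profession"] for row in rows)

# Replace every profession string with its integer code.
encoded_professions = [profession_to_int[row["Profession"]] for row in rows]

print(profession_to_int)
print(encoded_professions)
```

The integers produced this way are only identifiers. Their numeric order carries no meaning for an unordered feature such as “Profession.”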
The scikit-learn library provides several tools for encoding features, including LabelEncoder, OneHotEncoder, and OrdinalEncoder.
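As a rough sketch of how these three tools are typically called, the example below applies each of them to the “Sex” and “Profession” columns of the table above. The array name `X` and the choice of columns are assumptions made for illustration, and all encoders are used with their default settings.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, OrdinalEncoder

# "Sex" and "Profession" columns from the table above (illustrative subset).
X = np.array([
    ["Male", "Engineer"],
    ["Female", "Teacher"],
    ["Male", "Doctor"],
    ["Female", "Accountant"],
])

# LabelEncoder works on a single 1-D array of values.
label_encoder = LabelEncoder()
sex_codes = label_encoder.fit_transform(X[:, 0])

# OrdinalEncoder works on a 2-D array and encodes each column separately.
ordinal_encoder = OrdinalEncoder()
ordinal_codes = ordinal_encoder.fit_transform(X)

# OneHotEncoder creates one binary column per category. It returns a
# sparse matrix by default, so convert it to a dense array for display.
one_hot_encoder = OneHotEncoder()
one_hot = one_hot_encoder.fit_transform(X).toarray()

print(sex_codes)
print(ordinal_codes)
print(one_hot.shape)
```

With two categories for “Sex” and four for “Profession,” the one-hot result has six columns, while the ordinal result keeps one column per original feature.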
The LabelEncoder class
The LabelEncoder class assigns integer values to each category, starting from