Encoding

Learn how to encode variables.

Encoding is the process of converting categorical features into numerical features so that ML algorithms can use them. A categorical feature takes one of a limited set of values, and those values often have no natural order, so most algorithms cannot work with them directly. Encoding gives each such feature a numerical representation that the algorithm can consume.

It’s common in ML to have categorical features—such as “Sex,” “Zip code,” and “Profession”—that need to be transformed before they can be ingested by an ML algorithm. The table below features this type of categorical data:

Name           Sex     Zip code  Profession
John Smith     Male    12345     Engineer
Amy Johnson    Female  67890     Teacher
Michael Davis  Male    54321     Doctor
Sarah Miller   Female  98765     Accountant
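
To make the idea concrete, here is a minimal sketch in plain Python of what encoding a single column amounts to. It uses the rows from the table above. The helper name build_mapping and the choice to number categories in sorted order are assumptions made for illustration, not something the lesson prescribes.

```python
# Rows from the table above: (name, sex, zip code, profession).
rows = [
    ("John Smith", "Male", "12345", "Engineer"),
    ("Amy Johnson", "Female", "67890", "Teacher"),
    ("Michael Davis", "Male", "54321", "Doctor"),
    ("Sarah Miller", "Female", "98765", "Accountant"),
]


def build_mapping(values):
    """Map each distinct category to an integer, numbering them in sorted order.

    Hypothetical helper for illustration only.
    """
    return {category: code for code, category in enumerate(sorted(set(values)))}


# Encode the "Profession" column (index 3 in each row).
professions = [row[3] for row in rows]
profession_codes = build_mapping(professions)
encoded_professions = [profession_codes[p] for p in professions]

print(profession_codes)
# {'Accountant': 0, 'Doctor': 1, 'Engineer': 2, 'Teacher': 3}
print(encoded_professions)
# [2, 3, 1, 0]
```

The integers here are only identifiers. The fact that "Teacher" maps to 3 and "Accountant" to 0 does not mean one profession is larger than the other, which is worth keeping in mind when choosing an encoder.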

The scikit-learn library provides several tools for encoding features, including LabelEncoder, OneHotEncoder, and OrdinalEncoder.

The LabelEncoder method

LabelEncoder assigns an integer value to each category, starting from 0. It numbers the categories in sorted order, so for the “Sex” column it would convert “Female” to 0 and “Male” to 1 ...
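
As a minimal sketch of this behavior, the snippet below fits scikit-learn's LabelEncoder on the “Sex” values from the table. The variable names are illustrative. Note that LabelEncoder numbers the classes in sorted order, so in this data “Female” comes before “Male”.

```python
from sklearn.preprocessing import LabelEncoder

# The "Sex" column from the table, in row order.
sex = ["Male", "Female", "Male", "Female"]

encoder = LabelEncoder()
encoded = encoder.fit_transform(sex)  # learn the classes, then encode

# classes_ holds the learned categories; each one's index is its integer code.
print(encoder.classes_.tolist())  # ['Female', 'Male']
print(encoded.tolist())           # [1, 0, 1, 0]

# inverse_transform maps integer codes back to the original labels.
print(encoder.inverse_transform([0, 1]).tolist())  # ['Female', 'Male']
```

Inspecting classes_ after fitting is a quick way to confirm which integer was assigned to which category before relying on the encoded values.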