One-hot encoding in Python

Most machine learning algorithms cannot operate directly on categorical data. The categorical data must first be converted to numerical data. One-hot encoding is one of the techniques used to perform this conversion. It is commonly used when deep learning techniques are applied to sequential classification problems.

One-hot encoding represents categorical variables as binary vectors. Each categorical value is first mapped to an integer. Each integer is then represented as a binary vector that is all 0s except at the index of that integer, which is set to 1. For example, with three categories, the integer 1 becomes [0, 1, 0].
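
The two steps described above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the `categories` list and the names `index_of` and `one_hot` are hypothetical and chosen only for this example.

```python
# Hypothetical category vocabulary, chosen only for illustration
categories = ["small", "medium", "large"]

# Step 1: map each category to an integer index
index_of = {category: i for i, category in enumerate(categories)}

def one_hot(category):
    """Return a binary vector with a single 1 at the category's index."""
    vector = [0] * len(categories)
    vector[index_of[category]] = 1
    return vector

# Step 2: represent the integer as a binary vector
print(index_of["medium"])  # 1
print(one_hot("medium"))   # [0, 1, 0]
```

The length of every vector equals the number of possible categories, so the vocabulary has to be fixed before encoding starts.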

Manual one-hot encoding

Have a look at the example below, which manually converts a categorical list of colors to a numerical list using one-hot encoding:

import numpy as np

# Categorical data to be converted to numeric data
colors = ["red", "green", "yellow", "red", "blue"]

# Universal list of colors (the full set of possible categories)
total_colors = ["red", "green", "blue", "black", "yellow"]

# Map each color to an integer index
mapping = {}
for x in range(len(total_colors)):
    mapping[total_colors[x]] = x

# Build one binary vector per color: all 0s except a 1 at the color's index
one_hot_encode = []
for c in colors:
    # .tolist() yields plain Python ints, which keeps the printed result readable
    arr = np.zeros(len(total_colors), dtype=int).tolist()
    arr[mapping[c]] = 1
    one_hot_encode.append(arr)

print(one_hot_encode)
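
As a variation on the manual approach, the loop can be replaced by indexing into an identity matrix, and the encoding can be reversed with `argmax`. This is a sketch that reuses the same `colors` and `total_colors` lists; the names `indices` and `decoded` are introduced here for illustration only.

```python
import numpy as np

colors = ["red", "green", "yellow", "red", "blue"]
total_colors = ["red", "green", "blue", "black", "yellow"]

# Map each color to its integer index in the universal list
mapping = {color: index for index, color in enumerate(total_colors)}
indices = np.array([mapping[c] for c in colors])

# Row i of the identity matrix is the one-hot vector for integer i,
# so indexing with the integer codes selects one row per input color
one_hot_encode = np.eye(len(total_colors), dtype=int)[indices]
print(one_hot_encode)

# Decoding: the position of the 1 in each row is the integer code
decoded = [total_colors[i] for i in one_hot_encode.argmax(axis=1)]
print(decoded)  # ['red', 'green', 'yellow', 'red', 'blue']
```

The result here is a 2D NumPy array rather than a list of lists, which is usually the more convenient form for passing to a model.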

One-hot encoding using scikit-learn

Take a look at the example below. It uses the scikit-learn library to perform one-hot encoding:

from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OneHotEncoder

# Categorical data to be converted to numeric data
colors = ["red", "green", "yellow", "red", "blue"]

# Integer mapping using LabelEncoder (classes are sorted alphabetically)
label_encoder = LabelEncoder()
integer_encoded = label_encoder.fit_transform(colors)
print(integer_encoded)

# OneHotEncoder expects a 2D array of shape (n_samples, n_features)
integer_encoded = integer_encoded.reshape(len(integer_encoded), 1)

# One-hot encoding; sparse_output=False returns a dense NumPy array.
# Requires scikit-learn 1.2 or later; older releases used sparse=False.
onehot_encoder = OneHotEncoder(sparse_output=False)
onehot_encoded = onehot_encoder.fit_transform(integer_encoded)
print(onehot_encoded)
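
OneHotEncoder can also encode string categories directly, without a separate LabelEncoder step. The sketch below is an alternative to the example above, not a restatement of it. It relies on the encoder's default sparse output and converts it with `.toarray()`, which sidesteps the parameter rename between scikit-learn versions. The names `encoder`, `encoded`, and `decoded` are illustrative.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# One column of string categories, shaped (n_samples, 1)
colors = np.array(["red", "green", "yellow", "red", "blue"]).reshape(-1, 1)

# By default the encoder returns a sparse matrix; .toarray() makes it dense
encoder = OneHotEncoder()
encoded = encoder.fit_transform(colors).toarray()

# Learned categories are sorted, which fixes the column order
print(encoder.categories_[0])  # ['blue' 'green' 'red' 'yellow']
print(encoded)

# inverse_transform maps the binary vectors back to the original labels
decoded = encoder.inverse_transform(encoded).ravel()
print(decoded)
```

Unlike the manual example, the encoder learns its categories from the data, so a color that never appears (such as "black") gets no column unless it is supplied through the `categories` parameter.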