Feature Space

Learn feature engineering techniques by implementing feature space exploration, subspace analysis, and feature transformation.

In both supervised learning and clustering, the term “data point” has been used. Specifically, each data point $\bold x$ within the training dataset is represented as a $d$-dimensional vector, such that $\bold x \in \R^d$. The elements of $\bold x$ are referred to as features. Therefore, $\bold x$ is also referred to as a feature vector.
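As a minimal sketch, a single data point can be stored as a NumPy array whose length equals the number of features $d$ (the specific features and values below are assumptions chosen for illustration):

import numpy as np

# A single data point x with d = 4 numeric features
# (illustrative features: height, weight, age, income)
x = np.array([1.75, 68.0, 29, 42000.0])

d = x.shape[0]  # number of features, i.e., the dimension of the feature vector
print("Feature vector x:", x)
print("x lives in R^d with d =", d)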

Feature space

A feature space is a mathematical space that represents the features or attributes of a given dataset. Each observation in the dataset is represented by a vector in the feature space, where each dimension of the vector corresponds to a specific feature or attribute of the observation.

For example, let’s say we have a dataset of cars, and each car is described by its make, model, year, horsepower, and fuel efficiency. The feature space for this dataset would be a five-dimensional space, where each dimension corresponds to one of these features. By analyzing the patterns and relationships among the feature vectors in the feature space, we can gain insights into the underlying structure and characteristics of the dataset.

The feature space is the vector space that contains the feature vectors of a dataset. The dataset is a sample collected from this feature space, where $\bold x \in \R^d$. In most cases, the feature space is a subspace of $\R^d$ that represents the underlying structure of the dataset.

Subspace example

Consider a dataset of cars $D = \{(\bold x_1, y_1), (\bold x_2, y_2), \dots, (\bold x_n, y_n)\}$, where each $\bold x_i \in \R^d$ is a vector of features that describe a car, such as its make, model, year, horsepower, and fuel efficiency. Let's say we want to define a subspace that only includes cars with horsepower greater than 250. We can build this subspace, which contains only the vectors whose horsepower feature exceeds 250, with the following code:

import numpy as np

# Example feature vectors for cars dataset
# [Make, Model, Year, Horsepower, Fuel efficiency (mpg)]
car1 = np.array(['Toyota', 'Corolla', 2015, 132, 29])
car2 = np.array(['Honda', 'Accord', 2020, 252, 33])
car3 = np.array(['Tesla', 'Model S', 2018, 518, 98])
car4 = np.array(['Ford', 'Mustang', 2010, 315, 22])
car5 = np.array(['Chevrolet', 'Impala', 2012, 300, 23])

# Create array of feature vectors (one row per car)
cars = np.array([car1, car2, car3, car4, car5])

# Define subspace of cars with horsepower greater than 250
horsepower_subspace = cars[cars[:, 3].astype(int) > 250, :]

# Print subspace
print("Subspace of cars with horsepower greater than 250:\n")
print(horsepower_subspace)

Here is the explanation for the code above:

  • Lines 5–9: We define example feature vectors for five cars.
  • Line 12: We stack the feature vectors into a 2D array, where each row represents a car’s features.
  • Line 15: We define a subspace of cars with horsepower greater than 250 by filtering the rows of the array where the horsepower feature (the fourth column) is greater than 250.
  • Lines 18–19: We print the subspace.

Note: In real-world datasets, the feature space is usually a subspace of $\R^d$, but its dimension refers to the number of components in the feature vector, say $d$. Therefore, we can think of it as a $d$-dimensional vector space that contains the feature vectors.
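To make this concrete, here is a small illustrative check with NumPy: the dimension of the feature space equals the number of components in each feature vector, which we can read off the shape of the data array (the numeric columns below are assumptions for illustration):

import numpy as np

# Numeric feature vectors for three cars: [Year, Horsepower, Fuel efficiency (mpg)]
cars_numeric = np.array([
    [2015, 132, 29],
    [2020, 252, 33],
    [2018, 518, 98],
])

n, d = cars_numeric.shape
print("Number of data points n =", n)           # 3
print("Dimension of the feature space d =", d)  # 3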

Feature transformations

Consider the regression dataset with $d = 1$ ...
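As a sketch of what a feature transformation can look like, a one-dimensional input $x$ can be lifted into a higher-dimensional feature space, for example with the polynomial map $\phi(x) = [x, x^2, x^3]$. The specific map and data below are assumptions chosen for illustration, not necessarily the transformation developed next:

import numpy as np

# One-dimensional regression inputs (d = 1); values are made up for illustration
x = np.array([0.5, 1.0, 1.5, 2.0])

# Assumed polynomial feature transformation phi(x) = [x, x^2, x^3],
# mapping each data point from R^1 into R^3
X_transformed = np.column_stack([x, x**2, x**3])

print("Original shape:", x.reshape(-1, 1).shape)  # (4, 1)
print("Transformed shape:", X_transformed.shape)  # (4, 3)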