PCA Implementation Steps: 1 to 3
This lesson introduces the pre-model algorithm and walks you through the first three implementation steps (1-3) of principal component analysis (PCA).
Quick overview of the pre-model algorithm
As an extension of the data scrubbing process, unsupervised learning algorithms are sometimes run ahead of a supervised learning algorithm to prepare the data for prediction modeling. In this role, unsupervised algorithms clean or reshape the data rather than derive actionable insight.
Examples of pre-model algorithms include dimension reduction techniques, as introduced in the previous chapter, and k-means clustering. Both of these algorithms are examined in this chapter.
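To make the pre-model idea concrete, here is a minimal sketch, assuming scikit-learn is available; the dataset and pipeline step names are illustrative, not part of the lesson. An unsupervised step (PCA) reshapes the data before a supervised model makes predictions:

```python
# A hypothetical pre-model pipeline: PCA (unsupervised) prepares the data,
# then logistic regression (supervised) performs the prediction modeling.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

X, y = load_iris(return_X_y=True)

pipeline = Pipeline([
    ("reduce", PCA(n_components=2)),              # pre-model step: reshape the data
    ("model", LogisticRegression(max_iter=200)),  # supervised prediction step
])
pipeline.fit(X, y)
print(pipeline.score(X, y))  # training accuracy on the reduced data
```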
PCA
One of the most popular dimension reduction techniques is principal component analysis (PCA).
The practical goal of PCA is to find a low-dimensional representation of the dataset that preserves as much of the information in the original variables as possible. Rather than removing individual features from the dataset, PCA recreates the dimensions as linear combinations of features called components. It then ranks the components by how much they contribute to patterns in the data, allowing you to drop the components that have the least impact on data variability.
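The following is a minimal sketch of this ranking in practice, assuming scikit-learn and NumPy; the synthetic dataset is illustrative:

```python
# Fit PCA and rank components by the share of data variability each explains.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))                             # 200 samples, 4 features
X[:, 1] = 2 * X[:, 0] + rng.normal(scale=0.1, size=200)   # add a correlated feature

pca = PCA()
pca.fit(X)

# Each component is a linear combination of the original features:
print(pca.components_)                # one row of feature weights per component
# Components are ranked by the variance they capture, so the trailing,
# low-variance components are the candidates to drop:
print(pca.explained_variance_ratio_)
```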
In practice, the initial aim of PCA is to place the first axis in the direction of the greatest variance of the data points, maximizing the variance captured along that axis. A second axis is then placed perpendicular (at a 90-degree angle) to the first, forming an orthogonal pair of axes that defines the first two components.
In a two-dimensional setting, the location of the second axis is fixed according to the position of the first axis. In a three-dimensional space, where there are more options to place the second axis perpendicular to the first axis, the aim is to position it in a way that maximizes the variance on its axis. An example of PCA in a two-dimensional space is demonstrated in the following figure.
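Alongside the figure, here is a minimal NumPy sketch of axis placement in two dimensions, using the standard eigen-decomposition of the covariance matrix (one common way to compute the principal axes); the data is synthetic:

```python
# Find the two principal axes of 2D data: the first points along the
# direction of greatest variance, and the second is fixed perpendicular to it.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=300)
X = np.column_stack([x, 0.5 * x + rng.normal(scale=0.3, size=300)])

X_centered = X - X.mean(axis=0)
cov = np.cov(X_centered, rowvar=False)   # 2x2 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvectors are the principal axes

order = np.argsort(eigvals)[::-1]        # sort axes by variance explained
axis1 = eigvecs[:, order[0]]             # direction of greatest variance
axis2 = eigvecs[:, order[1]]             # perpendicular to the first axis
print(axis1 @ axis2)                     # ~0: the two axes are orthogonal
```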