Dimensionality Reduction with PCA

Learn how to use principal component analysis.

Principal component analysis (PCA)

If a dataset has highly correlated features, the redundant information they carry can skew the model outcome. This is known as the multicollinearity problem. Using principal component analysis (PCA), we can reduce the number of attributes while retaining most of the original information.

PCA is a data transformation technique that linearly combines existing features into new components, chosen so that each one captures as much of the remaining data variance as possible. PCA also makes these components uncorrelated with each other and ranks them by the share of the total variance each one explains. Later, we can select a subset of the transformed features (components) that represents most of the data variance.
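The idea can be sketched with plain NumPy. This is a minimal illustration, not a production implementation: the synthetic dataset, the variable names, and the 95% variance threshold are all assumptions made for the example, and the components are computed from an eigendecomposition of the covariance matrix rather than with a dedicated PCA class.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumed for illustration): three features, where the
# second is almost a scaled copy of the first (highly correlated).
n = 500
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

# PCA works on centered data.
X_centered = X - X.mean(axis=0)

# Eigenvectors of the covariance matrix are the component directions;
# eigenvalues are the variances captured along those directions.
cov = np.cov(X_centered, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# eigh returns eigenvalues in ascending order, so re-rank them from
# largest to smallest variance.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Share of the total variance explained by each component.
explained_variance_ratio = eigenvalues / eigenvalues.sum()

# Project the data onto the components (the transformed features).
components = X_centered @ eigenvectors

# The transformed features should be uncorrelated with each other.
component_corr = np.corrcoef(components, rowvar=False)

# Keep the smallest number of components that together explain at
# least 95% of the variance (the threshold is an arbitrary choice).
cumulative = np.cumsum(explained_variance_ratio)
k = int(np.searchsorted(cumulative, 0.95) + 1)
X_reduced = components[:, :k]

print("Explained variance ratio:", explained_variance_ratio.round(3))
print("Components kept:", k)
```

Because the second feature in this synthetic data adds almost nothing beyond the first, most of the variance should concentrate in fewer components than there are original features, which is what allows the reduced dataset to drop a column.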

Let’s assume we have a dataset with two features (feature 1 and feature 2). PCA fits a line through these two features: the first component is the direction along which the projected data has maximum variance, which is also the direction that minimizes the sum of squared perpendicular distances from the data points to the line. To ...