Effect of Regularization

Learn about the effects of regularization on standardized data.

Let's learn the effect of regularization using a wine quality dataset. An important characteristic of this data, and a reason to work with it, is the presence of multicollinearity among its features.

Overview

The wine quality dataset is a multiclass classification problem, since the target column contains classes of wine represented by a class number. However, we'll treat it as a linear regression problem for learning purposes. Multicollinearity can lead to a variety of problems, including:

  • The estimated effect of a predictor variable (feature) depends on which other variables are included in the model.

  • Estimated effects are unstable across observations: small changes in the data can produce very different coefficient estimates.

  • With very high multicollinearity, the matrix that must be inverted to solve for the coefficients is nearly singular, so the computer-calculated inverse may not be accurate (the sketch after this list demonstrates this).

  • We can no longer interpret a variable's coefficient as the effect on the target of a one-unit increase in that variable, holding the other variables constant. When predictors are strongly correlated, one variable cannot change without a corresponding change in another.
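
To make the instability concrete, here is a minimal sketch (not the lesson's code) using synthetic data: two nearly identical features produce an ill-conditioned matrix, and a tiny perturbation of the target changes the fitted coefficients dramatically.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)  # nearly identical to x1 -> strong collinearity
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(scale=0.1, size=n)

# Condition number of X^T X: a huge value signals a nearly singular matrix,
# so its computed inverse (and hence the OLS solution) is unreliable.
print(np.linalg.cond(X.T @ X))

# A tiny perturbation of y yields wildly different estimated coefficients.
beta_a = np.linalg.lstsq(X, y, rcond=None)[0]
beta_b = np.linalg.lstsq(X, y + rng.normal(scale=0.01, size=n), rcond=None)[0]
print(beta_a)
print(beta_b)
```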

Ridge regression is best suited to dealing with multicollinearity. Lasso also handles multicollinearity between variables, but in a more drastic way: it zeroes out the coefficient of the less effective variable. The lasso is particularly useful when there are redundant or unimportant features in the data. Suppose we have 1,000 features in a dataset; lasso can perform feature selection automatically by forcing the coefficients of the least important features to zero.
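
The following hedged sketch contrasts the two behaviors on synthetic data; the features and alpha values are illustrative assumptions, not the lesson's actual setup. Ridge shrinks and spreads the effect across the correlated pair, while lasso tends to zero out the redundant and irrelevant features.

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(1)
n = 300
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)   # redundant copy of x1
x3 = rng.normal(size=n)                    # irrelevant feature
X = np.column_stack([x1, x2, x3])
y = 4 * x1 + rng.normal(scale=0.5, size=n)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)

# Ridge keeps both correlated features with shrunken coefficients;
# lasso typically drives the redundant and irrelevant ones to zero.
print("ridge:", ridge.coef_)
print("lasso:", lasso.coef_)
```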

Read data

Let's read the data file into a data frame.
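
A minimal sketch of this step is below; the filename and separator are assumptions based on the UCI wine quality files, not necessarily the course's exact setup.

```python
import pandas as pd

# The UCI wine quality CSV uses ';' as its separator (filename is an assumption).
df = pd.read_csv("winequality-red.csv", sep=";")
print(df.shape)
df.head()
```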
