Gradient descent works well on objective functions that are convex and differentiable at all points. But how does it behave when the objective function is not differentiable at one or more points? For example, the absolute value function $f(x) = |x|$ is convex but not differentiable at $x = 0$. Let’s understand this with the example of lasso regression.
Lasso regression
Lasso regression, or L1-regularized linear regression, is a variant of linear regression in which the model coefficients are shrunk toward zero by an L1 penalty, driving some of them exactly to zero. Consider the scenario where we want to predict the price of a house based on its size, location, number of rooms, and other features. Using lasso regression, we can identify the most important features affecting the price and discard the irrelevant ones.
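For intuition, here is a minimal sketch of this feature-selection behavior using scikit-learn's Lasso estimator on synthetic housing data; the feature names, data-generating process, and penalty strength are illustrative assumptions, not taken from the text:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
N = 200

# Synthetic housing data: two informative features, three irrelevant ones.
size = rng.uniform(50, 250, N)                # house size in square meters
rooms = rng.integers(1, 8, N).astype(float)   # number of rooms
noise = rng.normal(size=(N, 3))               # irrelevant features

X = np.column_stack([size, rooms, noise])
# Price (in thousands) depends only on size and rooms, plus noise.
y = 3.0 * size + 15.0 * rooms + rng.normal(0.0, 10.0, N)

X_std = StandardScaler().fit_transform(X)
model = Lasso(alpha=1.0)   # alpha controls the strength of the L1 penalty
model.fit(X_std, y)

for name, coef in zip(["size", "rooms", "noise1", "noise2", "noise3"],
                      model.coef_):
    print(f"{name:>6}: {coef:8.3f}")
# Typical output: large coefficients for size and rooms, while the
# coefficients on the noise features are driven to (or very near) zero.
```

Note that the L1 penalty term $|w|$ is exactly the kind of non-differentiable function raised above: it has a kink at $w = 0$, which is why plain gradient descent needs care here.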
Assume $X \in \mathbb{R}^{N \times d}$ denotes the matrix of $d$-dimensional input features (size, location, number of rooms, etc.) for $N$ houses, and $Y \in \mathbb{R}^N$ denotes their corresponding true labels (prices). The objective of lasso regression is then given as follows: