Nonparametric Regression
Learn about two nonparametric regression techniques: nearest-neighbor regression and kernel regression.
We worked on parametric approaches in the previous lesson. This lesson is dedicated to the nonparametric approach for regression.
By choosing appropriate parameter values, we can fit the data. But what if a single model cannot fit the data well?
See the example below.
We can think of fitting a simple regression line on this data. It will look similar to this.
However, this is not ideal. Maybe we can try higher-order polynomials. See the example fit below.
This is better than the previous solution, but it still does not capture the best model. What if we fit the data locally, over small neighborhoods of points, instead of fitting one global model?
This is a good and reasonable fit. The only limitation is that we need a large amount of data to fit these local models.
Nearest-neighbor regression
In this technique, we predict the value of a new point based on its nearest training data. We find the K training points most similar to the new point and predict its target as a simple or weighted average of their observed values. Consider the example below.
Assume that all the blue points are training data and we have to predict the value at the green point. We can use nearest-neighbor regression to predict it. Consider K=1: we predict the target value of the single nearest neighbor.
This is called 1-nearest neighbor regression.
Instead of a single nearest neighbor, K neighbors are considered. This gives a more generalized prediction and robustness to outliers. We can also assign distance-based weights to the neighbors, giving weighted nearest-neighbor regression.
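The idea above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation; the 1-D training data values are made up for the example.

```python
import numpy as np

# Hypothetical 1-D training data (blue points in the figure above).
X_train = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y_train = np.array([1.2, 1.9, 3.2, 3.8, 5.1, 6.0])

def knn_predict(x_new, X, y, k=3):
    """Predict y at x_new as the mean target of the k nearest training points."""
    distances = np.abs(X - x_new)        # Euclidean distance in one dimension
    nearest = np.argsort(distances)[:k]  # indices of the k closest points
    return y[nearest].mean()             # simple (unweighted) average

print(knn_predict(3.4, X_train, y_train, k=1))  # 1-nearest neighbor → 3.2
print(knn_predict(3.4, X_train, y_train, k=3))  # average of the 3 nearest targets
```

For a weighted variant, we would replace the plain mean with an average weighted by inverse distance, so closer neighbors influence the prediction more.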
Distance metrics
We choose the neighbors based on the minimum distance from the new point, and different metrics can be used to measure this distance. A few examples are correlation-based distance, rank-based distance, cosine similarity, and Euclidean distance.
Quiz: Value of K in KNN
What are the advantages of K-nearest neighbors (K>1) compared to 1-nearest neighbor?
Suitable for noisy data
No discontinuity
Faster prediction
Quiz: Complexity of KNN
If we keep increasing data, what would the model complexity of the K-nearest neighbor be?
Increases with data increment
Decreases with data increment
Remains constant with data increment
Interview question:
What distance metrics can be used in K-nearest-neighbor regression?
Kernel regression
While we weight only the K nearest neighbors in K-nearest-neighbor regression, in kernel regression we weight all the data points in the dataset.
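A common form of this idea is the Nadaraya-Watson estimator, shown below as a minimal sketch with a Gaussian kernel. The training data values and the bandwidth are hypothetical choices for illustration.

```python
import numpy as np

# Hypothetical 1-D training data.
X_train = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_train = np.array([1.1, 2.3, 2.9, 4.2, 4.8])

def kernel_regression(x_new, X, y, bandwidth=1.0):
    """Nadaraya-Watson estimator: a weighted average over ALL training
    points, with weights from a Gaussian kernel of the distance."""
    weights = np.exp(-((X - x_new) ** 2) / (2 * bandwidth ** 2))
    return np.sum(weights * y) / np.sum(weights)

# Nearby points dominate; distant points contribute almost nothing.
print(kernel_regression(3.0, X_train, y_train, bandwidth=0.5))
```

The bandwidth plays the role that K plays in nearest-neighbor regression: a small bandwidth gives a very local (wiggly) fit, while a large bandwidth averages over more of the data and smooths the prediction.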
...