Nonparametric Regression
Learn about two nonparametric regression techniques: nearest-neighbor regression and kernel regression.
We worked on parametric approaches in the previous lesson. This lesson is dedicated to the nonparametric approach for regression.
By choosing appropriate parameter values, we can fit the data. But what if a single model cannot fit the data well?
See the example below.
We can think of fitting a simple regression line on this data. It will look similar to this.
However, this is not ideal. Maybe we can try higher-order polynomials. See the example fit below.
This is better than the previous solution, but it still does not capture the best model. What if we fit the data locally, over small neighborhoods of points, instead of fitting one global model?
This is a good and reasonable fit. The only limitation is that we need a large amount of data to fit these local models.
Nearest-neighbor regression
In this technique, we predict the value of a new point based on its nearest training data. We find the K training points most similar to the new point and predict its target as a simple or weighted average of their observed values. Consider the example below.
Assume that all the blue points are training data and we have to predict the value at the green point. We can use nearest-neighbor regression to predict it. Consider K=1: we predict the target value of the single nearest neighbor.
This is called 1-nearest neighbor regression.
Instead of a single nearest neighbor, K neighbors are considered. This gives a more generalized prediction and robustness to outliers. We can also assign distance-based weights to the neighbors, giving weighted nearest-neighbor regression.
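The idea above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation; the 1-D training data values are made up for the example.

```python
import numpy as np

# Hypothetical 1-D training data (blue points in the figure above).
X_train = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y_train = np.array([1.2, 1.9, 3.2, 3.8, 5.1, 6.0])

def knn_predict(x_new, X, y, k=3):
    """Predict y at x_new as the mean target of the k nearest training points."""
    distances = np.abs(X - x_new)        # Euclidean distance in one dimension
    nearest = np.argsort(distances)[:k]  # indices of the k closest points
    return y[nearest].mean()             # simple (unweighted) average

print(knn_predict(3.4, X_train, y_train, k=1))  # 1-nearest neighbor → 3.2
print(knn_predict(3.4, X_train, y_train, k=3))  # average of the 3 nearest targets
```

For a weighted variant, we would replace the plain mean with an average weighted by inverse distance, so closer neighbors influence the prediction more.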
Distance metrics
We choose the neighbors based on the minimum distance from the new point, and different metrics can be used to measure this distance. A few examples are correlation-based distance, rank-based distance, cosine similarity, and Euclidean distance.
Quiz: Value of K in KNN
What are the advantages of K-nearest neighbors (K>1) compared to 1-nearest neighbor?
Suitable for noisy data
No discontinuity
Faster prediction
Quiz: Complexity of KNN
If we keep increasing data, what would the model complexity of the K-nearest neighbor be?
Increases with data increment
Decreases with data increment
Remains constant with data increment
Interview question:
What distance metrics can be used in K-nearest-neighbor regression?
Kernel regression
While we weight only the K nearest neighbors in K-nearest-neighbor regression, in kernel regression we weight all the data points in the dataset.
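A common form of this idea is the Nadaraya-Watson estimator, shown below as a minimal sketch with a Gaussian kernel. The training data values and the bandwidth are hypothetical choices for illustration.

```python
import numpy as np

# Hypothetical 1-D training data.
X_train = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_train = np.array([1.1, 2.3, 2.9, 4.2, 4.8])

def kernel_regression(x_new, X, y, bandwidth=1.0):
    """Nadaraya-Watson estimator: a weighted average over ALL training
    points, with weights from a Gaussian kernel of the distance."""
    weights = np.exp(-((X - x_new) ** 2) / (2 * bandwidth ** 2))
    return np.sum(weights * y) / np.sum(weights)

# Nearby points dominate; distant points contribute almost nothing.
print(kernel_regression(3.0, X_train, y_train, bandwidth=0.5))
```

The bandwidth plays the role that K plays in nearest-neighbor regression: a small bandwidth gives a very local (wiggly) fit, while a large bandwidth averages over more of the data and smooths the prediction.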
...