Model selection is a crucial stage in machine learning that focuses on choosing the model and algorithm best suited to a given task. The goal is to obtain a model that produces accurate results, performs well, and fits our requirements, so that we get the expected outputs and use the dataset for its real purpose.
Multiple techniques are used during the model selection process to make the best possible decision at each step and arrive at the best-suited model with a low chance of inaccuracy. These techniques can be grouped by the phase of the model selection process in which they are applied.
Let’s discuss some of the techniques crucial to different process steps.
It is an exhaustive search technique performed over a set of the model's parameter values for hyperparameter tuning. A grid of hyperparameter values is defined, and all the possible combinations of those values are searched for the ones that prove beneficial. The search is independent of past evaluations and depends solely on the combinations that appear in the grid for the given parameters.
Note: The model on which the grid search is applied is also known as the estimator.
Once these combinations are identified, a model is trained and then tested for each combination. The results of all the combinations are compared based on their performance to select the hyperparameter settings that optimize the model's performance.
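As a quick illustration, here is a minimal grid search sketch using scikit-learn's GridSearchCV; the estimator (an SVC), the parameter grid, and the toy dataset below are assumptions chosen only for the example.

# Minimal grid search sketch (the estimator, grid, and data are illustrative assumptions)
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Grid of candidate hyperparameter values; every combination is tried
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}

search = GridSearchCV(SVC(), param_grid, cv=5)  # SVC() is the estimator here
search.fit(X, y)

print(search.best_params_, search.best_score_)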
It is a hyperparameter tuning technique that samples a fixed number of parameter settings from the specified distributions. This sampling has two types:
With replacement, if any parameter is given as a distribution
Without replacement, if all parameters are presented as lists
It is important to understand the trade-off in this technique: the fewer the randomly sampled parameter settings, the more efficient the process but the less accurate the optimization results.
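For illustration, here is a minimal random search sketch using scikit-learn's RandomizedSearchCV; the estimator, the sampling distributions, and n_iter below are assumptions for the example.

# Minimal random search sketch (estimator, distributions, and n_iter are illustrative assumptions)
from scipy.stats import uniform
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

X, y = load_iris(return_X_y=True)

# "C" is drawn from a continuous distribution (sampled with replacement),
# while "penalty" is given as a plain list
param_distributions = {"C": uniform(0.1, 10), "penalty": ["l2"]}

search = RandomizedSearchCV(
    LogisticRegression(max_iter=1000),
    param_distributions,
    n_iter=10,       # fixed number of sampled parameter settings
    random_state=1,
)
search.fit(X, y)

print(search.best_params_, search.best_score_)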
It is a technique used for hyperparameter tuning that utilizes past evaluations to improve the search speed. It builds a probability model of the objective function and uses it to select the best-suited hyperparameters to evaluate in the true objective function.
Start with 8 sample data points from the true objective function and present them on a graph plot.
Build a surrogate model (the probability representation of the objective function) to get an estimated idea of what the true objective function might look like, and mark the deviations.
Build an acquisition function to decide on the 9th point: identify where the acquisition function is maximized and use that point to mark the 9th parameter in the surrogate model.
Keep repeating the steps until the true objective function is obtained.
How will we know we have the true objective function?
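In practice, we usually stop after a fixed budget of evaluations rather than recovering the true function exactly. To make the steps concrete, here is a minimal sketch of the loop, assuming a Gaussian process surrogate, an upper-confidence-bound acquisition function, and a one-dimensional toy objective; all of these choices are illustrative assumptions rather than a fixed recipe.

# Minimal Bayesian optimization sketch (surrogate, acquisition, and objective are assumptions)
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def objective(x):                    # toy "true" objective (normally unknown and expensive)
    return np.sin(3 * x) + 0.5 * x

candidates = np.linspace(0, 5, 200).reshape(-1, 1)
X = np.random.RandomState(1).uniform(0, 5, 8).reshape(-1, 1)   # 8 initial sample points
y = objective(X).ravel()

for step in range(10):
    # Surrogate model: a probability representation of the objective function
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5)).fit(X, y)
    mean, std = gp.predict(candidates, return_std=True)

    # Acquisition function (upper confidence bound): evaluate where it is maximized
    acquisition = mean + 2.0 * std
    x_next = candidates[np.argmax(acquisition)].reshape(1, -1)

    # Evaluate the true objective at the chosen point and repeat
    X = np.vstack([X, x_next])
    y = np.append(y, objective(x_next).ravel())

print("Best point found:", X[np.argmax(y)], "value:", y.max())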
It is a resampling technique that creates dataset partitions for test and training data and makes predictions. One of the commonly used cross-validation techniques is k-fold, which divides the dataset into various small groups referred to as folds. Some of these folds are used as the training dataset, and the rest are reserved as the test dataset. The model is then trained and evaluated separately for each fold.
Note: The number of folds is specified beforehand, e.g., in this case k = 5, where k represents the number of folds.
This lowers the variance in the evaluation, which helps to achieve more accuracy when analyzing the model's performance. Consequently, it makes the most of the available data and yields test results that are more trustworthy than those of a single split. Cross-validation can be computationally intensive because we train and test repeatedly on several subsets, but it helps to reduce the risks of overfitting and underfitting.
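Here is a minimal k-fold cross-validation sketch using scikit-learn; the estimator, the dataset, and k = 5 below are assumptions for the example.

# Minimal k-fold cross-validation sketch (estimator, data, and k are illustrative assumptions)
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = load_iris(return_X_y=True)

kfold = KFold(n_splits=5, shuffle=True, random_state=1)   # k = 5 folds
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=kfold)

print(scores)          # one score per fold
print(scores.mean())   # averaged estimate of the model's performance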
In this technique, we split the dataset into two sub-datasets, i.e., train and test. The purpose of splitting the dataset is to check whether the model performs well on data it was not trained on.
Train set: To train the model to make predictions based on its observations.
Test set: To test the model once it is trained and validated.
Validation set: To measure the performance of each model to select the best one based on the accuracy of results on this set.
Let's take a quick look at the procedure chronologically to understand what happens.
We import train_test_split from the model_selection module of the sklearn library to split the dataset. Then we call train_test_split, passing the columns of the created 2D arrays, the percentage ratios of the training and test sets, and the random seed value as parameters.
import numpy as np
from sklearn.model_selection import train_test_split

# x and y are the feature and label sequences prepared earlier
X = np.array(x).reshape(-1, 1)
Y = np.array(y).reshape(-1, 1)

X_train, X_val, Y_train, Y_val = train_test_split(X, Y, train_size=0.6, test_size=0.2, random_state=1)
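The split above keeps 60% of the data for training and 20% for validation. If a separate test set is also needed, one option is to carve it out of the held-out portion with a second call to train_test_split; the ratios below are assumptions for illustration only.

# One possible way to obtain train, validation, and test sets (ratios are assumptions)
X_train, X_rest, Y_train, Y_rest = train_test_split(X, Y, train_size=0.6, random_state=1)
X_val, X_test, Y_val, Y_test = train_test_split(X_rest, Y_rest, test_size=0.5, random_state=1)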
In this technique, we compare the overall performance of the candidate models based on the parameters involved. Let's briefly discuss a few development-based parameters that are compared to select a model that runs efficiently in production and has a longer lifetime.
It can be defined as the number of test cases that are correctly classified divided by the total test cases. It can be applied to generic problems that have a balanced dataset.
However, if the dataset is imbalanced, for example, if the ratio of fault occurrences to no-fault occurrences is 1:99, then accuracy becomes misleading: a model that always predicts no fault scores 99% while never detecting the 1% of faults.
It is a measure of the correctness of the classified dataset. Considering the positive cases, we can say that it is the ratio of the correctly classified positive cases to the total classified positive cases.
The greater the fraction, the higher the precision and, consequently, the higher the probability of correct classification. A model with a good probability of correctly classifying the positive cases is considered a good model.
It can be defined as the harmonic mean of precision and recall, and it is used to balance the strengths of the two in cases where both precision and recall are needed to draw conclusions.
For example, in repairing crucial medical equipment, precision helps to save on the company's cost by identifying the exact repair points, and recall helps to ensure that the machinery is stable and not a threat to human lives.
It can be defined as the rate of correctly classified positive cases (the true positive rate) plotted against the rate of negative cases incorrectly classified as positive (the false positive rate). We plot a ROC curve to present this relation, and the area under the obtained curve (AUC) can determine the model's performance.
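As a quick illustration, here is a minimal sketch that computes these metrics with scikit-learn; the labels, predictions, and scores below are made-up assumptions for the example.

# Minimal metrics sketch (the labels, predictions, and scores are made-up assumptions)
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score, roc_auc_score

y_true  = [0, 0, 1, 1, 1, 0, 1, 0]                     # actual classes
y_pred  = [0, 1, 1, 1, 0, 0, 1, 0]                     # predicted classes
y_score = [0.2, 0.6, 0.9, 0.8, 0.4, 0.1, 0.7, 0.3]     # predicted probabilities for ROC-AUC

print("Accuracy :", accuracy_score(y_true, y_pred))    # correct cases / total cases
print("Precision:", precision_score(y_true, y_pred))   # TP / (TP + FP)
print("Recall   :", recall_score(y_true, y_pred))      # TP / (TP + FN)
print("F1 score :", f1_score(y_true, y_pred))          # harmonic mean of precision and recall
print("ROC-AUC  :", roc_auc_score(y_true, y_score))    # area under the ROC curve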
Various techniques help select a model, such as grid search, random search, Bayesian optimization, cross-validation, the train-test split, and model performance comparison. Using these techniques effectively makes it possible to examine a model thoroughly, tune its hyperparameters, and compare the results of different candidate models to find the best fit.