Introduction to Hyperparameters
Explore the concept of hyperparameters in machine learning and how they differ from trainable parameters. Understand how various hyperparameters such as learning rate, number of trees, and regularization affect model performance. Gain insight into specific hyperparameters for algorithms like random forest and linear regression, and learn why tuning them is essential for optimizing model accuracy.
Introduction to hyperparameters
An ML model has two types of parameters: trainable parameters, which are learned during training, and hyperparameters, which are set before training begins.
Parameters are values that are learned by the ML model during training. Examples of parameters include the coefficients in a linear regression model or the decision tree split points in a decision tree. During the training process, these parameters are adjusted iteratively until the ML model’s performance is optimized and the error between predicted output and actual output is minimal.
Hyperparameters are different parameter values that are set before starting the training process for an ML model. The main function of the hyperparameters is to control the learning process. They have a significant effect on the performance of ML models.
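To make the distinction concrete, here is a minimal sketch in plain Python (no libraries; the data and values are illustrative) that fits a line y = mx + b with gradient descent. The learning rate and number of epochs are hyperparameters chosen before training; the slope m and intercept b are trainable parameters learned from the data.

```python
# Hyperparameters: chosen BEFORE training starts.
learning_rate = 0.05
epochs = 1000

# Toy data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

# Trainable parameters: learned DURING training.
m, b = 0.0, 0.0

n = len(xs)
for _ in range(epochs):
    # Gradients of the mean squared error with respect to m and b.
    grad_m = (2 / n) * sum((m * x + b - y) * x for x, y in zip(xs, ys))
    grad_b = (2 / n) * sum((m * x + b - y) for x, y in zip(xs, ys))
    # The learning rate controls the step size of each parameter update.
    m -= learning_rate * grad_m
    b -= learning_rate * grad_b

print(f"learned m={m:.2f}, b={b:.2f}")  # converges close to m=2, b=1
```

Changing the learning rate or epoch count changes how (and whether) m and b converge, without changing what the model ultimately learns from: that is exactly the "controls the learning process" role of a hyperparameter.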
Examples of hyperparameters
Some examples of hyperparameters in ML algorithms include:
Regularization strength: This controls the amount of regularization applied to the ML model, which helps prevent overfitting.
Number of trees in a random forest: A larger number of trees can lead to better ML model performance, but it also increases training time and, when combined with deep trees, the risk of overfitting.
Number of layers and units in a neural network: These control the complexity of the model and can impact the ability of the model to fit the data.
Learning rate: This controls the step size at which the optimizer can make updates to the model parameters during training. A smaller learning rate might lead to more accurate models, but it will take a longer time to train.
Loss: This is the loss function used in the boosting process, for example, in the histogram-based gradient boosting classification tree.
The number of clusters: This includes the number of clusters to form, as well as the number of centroids required to generate k-means clustering.
Minimum samples: This is the number of samples (or total weight) in a neighborhood for a point to be considered a core point in DBSCAN clustering. Changing this value can noticeably improve or degrade the ML model's performance.
These are just a few examples of hyperparameters. The particular hyperparameters that are used during training will vary according to the type of model that is being used.
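As a sketch of the clustering hyperparameters above, the snippet below (assuming scikit-learn is installed; the toy data and values are illustrative) sets n_clusters for k-means and eps/min_samples for DBSCAN before fitting:

```python
from sklearn.cluster import KMeans, DBSCAN

# Two well-separated groups of points.
X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]

# n_clusters is fixed before training: how many clusters (centroids) to form.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
kmeans_labels = kmeans.fit_predict(X)

# min_samples: how many points a neighborhood (within eps) needs
# for its center to count as a core point.
dbscan = DBSCAN(eps=2.0, min_samples=2)
dbscan_labels = dbscan.fit_predict(X)

print(sorted(set(kmeans_labels)))  # two k-means cluster labels
print(sorted(set(dbscan_labels)))  # DBSCAN clusters (-1 would mark noise)
```

With these settings both algorithms find the two obvious groups; shrinking eps or raising min_samples would instead mark points as noise, illustrating how the hyperparameter choice shifts the result.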
Hyperparameters for the random forest algorithm
Random forest is an ensemble learning algorithm that combines multiple decision trees to form a forest of trees. Random forest has a number of hyperparameters that control the training process, as illustrated in the image below. It's recommended to use the random forest implementation from the scikit-learn Python library.
The following is a list of popular hyperparameters that are used to control the learning process for the random forest algorithm for classification problems:
n_estimators: This is the number of trees in the forest, for example, 50 trees.
criterion: This is the function to measure the quality of a split.
max_depth: This is the maximum depth of the tree.
min_samples_split: This is the minimum number of samples required to split an internal node.
min_samples_leaf: This is the minimum number of samples required to be at a leaf node.
max_features: This is the number of features to consider when looking for the best split.
bootstrap: This determines whether bootstrap samples are used when building trees.
class_weight: These are the weights associated with classes in the form {class_label: weight}.
max_samples: This is the number of samples to draw from X to train each base estimator.
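The hyperparameters above map directly onto the constructor of scikit-learn's RandomForestClassifier. A minimal sketch, using illustrative values and the built-in iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# Each keyword argument is a hyperparameter fixed before training begins.
clf = RandomForestClassifier(
    n_estimators=50,      # 50 trees in the forest
    criterion="gini",     # function that measures split quality
    max_depth=5,          # cap on the depth of each tree
    min_samples_split=2,  # samples needed to split an internal node
    min_samples_leaf=1,   # samples required at a leaf node
    max_features="sqrt",  # features considered when looking for a split
    bootstrap=True,       # draw bootstrap samples when building trees
    random_state=42,
)
clf.fit(X, y)
print(f"training accuracy: {clf.score(X, y):.3f}")
```

Once fit is called, the split points inside each tree are the trainable parameters; the constructor arguments above never change during training.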
Hyperparameters for the linear regression algorithm
Linear regression is a statistical method that is used to model the relationship between a dependent variable and one or more independent variables. It is based on the linear equation:

y = mx + b

where:
y is the dependent variable.
m is the slope of the line.
x is the independent variable.
b is the y-intercept.
The goal of linear regression is to find the best line of fit that minimizes the differences between the observed values and the predicted values, as shown in the image below.
The following is a list of popular hyperparameters that are used to control the learning process for the linear regression algorithm using the LinearRegression() function from the scikit-learn library.
fit_intercept: This determines whether to calculate the intercept for this model.
copy_X: If this is True, then the features (X) will be copied.
n_jobs: This denotes the number of jobs to use for the computation.
positive: If this is set to True, it forces the coefficients to be positive.
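These settings are passed to the LinearRegression() constructor before calling fit. A minimal sketch with toy data (the values shown are the scikit-learn defaults):

```python
from sklearn.linear_model import LinearRegression

# Toy data that follows y = 2x + 1 exactly.
X = [[0], [1], [2], [3], [4]]
y = [1, 3, 5, 7, 9]

# Hyperparameters are fixed at construction time, before training.
model = LinearRegression(fit_intercept=True, copy_X=True, n_jobs=None, positive=False)
model.fit(X, y)

# coef_ (the slope m) and intercept_ (b) are the learned parameters.
print(model.coef_[0], model.intercept_)
```

Setting fit_intercept=False instead would force the fitted line through the origin, changing the learned slope: a simple illustration of a hyperparameter altering what the model can learn.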
Most ML algorithms use default hyperparameter values to control the learning process when training on a dataset. When those values are modified, the performance of the ML model can increase or decrease depending on the combination of hyperparameters selected.
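One common way to search over such combinations is an exhaustive grid search with cross-validation. The sketch below (the grid values are illustrative) uses scikit-learn's GridSearchCV:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

# Candidate hyperparameter combinations to try.
param_grid = {
    "n_estimators": [10, 50],
    "max_depth": [2, None],
}

# Every combination is trained and scored with 3-fold cross-validation.
grid = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
grid.fit(X, y)

print("best hyperparameters:", grid.best_params_)
print(f"best cross-validated accuracy: {grid.best_score_:.3f}")
```

The best-scoring combination is rarely the default, which is why tuning hyperparameters matters for optimizing model accuracy.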