...


Options for Logistic Regression in scikit-learn


Learn about the hyperparameters of the logistic regression function in scikit-learn and their possible values.

Hyperparameters of logistic regression

We have used and discussed most of the options you can supply to scikit-learn when instantiating a LogisticRegression model or tuning its hyperparameters. Below, we list them all and provide some general advice on their usage.
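As a quick reminder of where these options are supplied, here is a minimal sketch of constructing the model with a few of them set explicitly. The values shown are illustrative assumptions, not recommendations:

```python
from sklearn.linear_model import LogisticRegression

# Illustrative values only -- see the table below for guidance on each option
lr = LogisticRegression(
    penalty='l2',         # regularization type
    C=1.0,                # regularization parameter (smaller = stronger regularization)
    solver='liblinear',   # optimization algorithm
    fit_intercept=True,   # estimate an intercept term
    class_weight=None,    # treat all classes as equally important
    max_iter=100,         # cap on solver iterations
    random_state=42,      # seed for solvers that use randomness
)
print(lr.get_params())    # inspect all hyperparameters and their current values
```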

A Complete List of Options for the Logistic Regression Model in scikit-learn

| Parameters | Possible values | Notes and advice for choosing |
| --- | --- | --- |
| penalty | string: 'l1', 'l2', 'elasticnet', 'none' | L1 (lasso) or L2 (ridge) regularization of the coefficients. L1 performs feature selection, while L2 does not; elastic-net is a blend of the two. Assess which option gives the best overall model performance by trying them all. |
| dual | bool: True, False | Relates to the optimization algorithm used to find the coefficients. The documentation notes it is "only implemented for L2 penalty with liblinear solver. Prefer dual=False when n_samples > n_features." |
| tol | float (decimal number) | The size of the change in values at which the optimization algorithm stops. This is one way to control how long the optimization runs and how close the solution is to the ideal value. |
| C | float | The regularization parameter for the L1 or L2 penalties; smaller values mean stronger regularization. It should be chosen using a validation set or cross-validation. |
| fit_intercept | bool | Whether or not an intercept term should be estimated. Unless you are sure you don't need an intercept, it's probably best to have one. |
| intercept_scaling | float | Can be used to avoid regularizing the intercept, an undesirable practice, when using the liblinear solver. |
| class_weight | dictionary specifying a weight for each class, the string 'balanced', or None | Whether to weight different classes during model training; otherwise, all samples are considered equally important when fitting the model. Can be useful for imbalanced datasets: try 'balanced' in that case. |
| random_state | int | Seed for the random number generator used by certain solver algorithms. |
| solver | string: 'newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga' | Selects the optimization algorithm used to estimate the model parameters. See the earlier discussion in this section, or the documentation, for the relative strengths and weaknesses of the different solvers. |
| max_iter | int | The maximum number of iterations for the solution algorithm, which controls how close to the ideal parameters the solution is. If you get a warning that the solver did not converge, try increasing this. |
| multi_class | string: 'ovr', 'multinomial', 'auto' | Various strategies for multiclass classification, beyond the scope of this course. |
| verbose | int | Controls how much output is printed to the terminal during the optimization procedure. |
| warm_start | bool | When reusing the same model object for multiple training runs, whether to use the previous solution as the starting point for the next optimization. |
| n_jobs | int or None | Number of processors to use for parallel processing in the case of 'ovr' multiclass classification. |
| l1_ratio | float | Controls the relative contributions of L1 and L2 regularization when using the elastic-net penalty. |
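The advice above on C (and on trying each penalty) is easiest to follow with cross-validation. The sketch below is illustrative only: it assumes a synthetic dataset generated with make_classification, and the grid values are arbitrary examples rather than recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic data purely for illustration
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

param_grid = {
    'penalty': ['l1', 'l2'],             # try both regularization types
    'C': [0.01, 0.1, 1.0, 10.0, 100.0],  # smaller C = stronger regularization
}
search = GridSearchCV(
    LogisticRegression(solver='liblinear', max_iter=1000),  # liblinear supports L1 and L2
    param_grid,
    cv=5,
    scoring='roc_auc',
)
search.fit(X, y)
print(search.best_params_, search.best_score_)  # best hyperparameters and CV score
```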

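The class_weight advice for imbalanced datasets can also be checked empirically. The following sketch assumes synthetic, imbalanced data; the 9:1 class split and the F1 metric are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic data with roughly a 9:1 class imbalance (assumed for illustration)
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)

unweighted = LogisticRegression(max_iter=1000)
balanced = LogisticRegression(class_weight='balanced', max_iter=1000)

# Compare cross-validated F1 scores with and without class reweighting
for name, model in [('unweighted', unweighted), ('balanced', balanced)]:
    scores = cross_val_score(model, X, y, cv=5, scoring='f1')
    print(name, scores.mean())
```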
Default values of the parameters

...