How to implement logistic regression using scikit-learn

Logistic regression is a supervised classification algorithm. What logistic regression is has been discussed separately; here we implement it using the scikit-learn toolkit.

We’ll use the wine dataset to train the logistic regression model from scikit-learn. We split the data into training and test sets (an 80-20 split) so that we can check how well the classifier generalizes to unseen data.

Importing the necessary libraries

We load the dataset from sklearn’s built-in datasets module. We use sklearn’s train_test_split function to split the data into training and test samples. For evaluation, we use sklearn’s confusion_matrix and accuracy_score functions. Finally, we import the LogisticRegression class from sklearn.linear_model, as shown below:

import numpy as np
from sklearn.datasets import load_wine 
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression 
from sklearn.metrics import confusion_matrix, accuracy_score 

Loading the dataset

We load the dataset into a local variable, and we call it dataset:

dataset = load_wine()
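Before splitting, it can help to look at what load_wine returns. The following is a minimal sketch that only reads attributes of the returned object (data, target, feature_names, target_names); the names n_samples, n_features, and class_counts are introduced here for illustration and are not part of the original code.

```python
import numpy as np
from sklearn.datasets import load_wine

dataset = load_wine()

# data is a 2-D array of features; target holds one integer class label per row
n_samples, n_features = dataset.data.shape
print("Samples:", n_samples, "Features:", n_features)

# Names of the input features and of the classes
print("Feature names:", list(dataset.feature_names))
print("Class names:", list(dataset.target_names))

# Number of samples in each class (illustrative variable name)
class_counts = np.bincount(dataset.target)
print("Samples per class:", class_counts)
```

The wine dataset is small (178 samples, 13 features, 3 classes), which is why the test set in the next step contains only a few dozen samples.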

Train test split

We split the data into training and test sets using the train_test_split function imported above. We use an 80-20 split: 80% of the data is used for training and 20% for testing. x_train and y_train contain the training data and labels respectively, while x_test and y_test contain the test data and labels. Setting random_state fixes the shuffle so that the split is reproducible.

x_train, x_test, y_train, y_test = train_test_split(dataset.data, dataset.target, test_size=0.20, random_state=15)
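To confirm the split did what we expect, we can print the shapes of the resulting arrays. The sketch below repeats the call from the text and then shows an optional variant using the stratify argument, which is an assumption added here and is not used in the original code; the xs_ and ys_ variable names are illustrative.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

dataset = load_wine()

# Same split as in the text: 80% train, 20% test, fixed random_state
x_train, x_test, y_train, y_test = train_test_split(
    dataset.data, dataset.target, test_size=0.20, random_state=15
)
print("Train:", x_train.shape, "Test:", x_test.shape)

# Optional variant (an assumption, not used in the text): stratify keeps the
# class proportions roughly equal in the training and test sets
xs_train, xs_test, ys_train, ys_test = train_test_split(
    dataset.data, dataset.target, test_size=0.20, random_state=15,
    stratify=dataset.target,
)
print("Stratified test set size:", xs_test.shape[0])
```

Stratifying can be worth considering on small datasets like this one, where a purely random split might leave one class under-represented in the test set.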

Making the logistic regression model

We make a logistic regression model and call it logistic_model, as shown below:

logistic_model = LogisticRegression()

Training the model

We train the model on the training data and the training labels.

logistic_model.fit(x_train, y_train)
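The wine features are on very different scales, and on unscaled data the default solver may stop at its iteration limit and emit a ConvergenceWarning. One common remedy, sketched below as an alternative rather than as the method used in this article, is to standardize the features in a pipeline before fitting. The names scaled_model and scaled_score, and the max_iter=1000 setting, are assumptions made for this example.

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

dataset = load_wine()
x_train, x_test, y_train, y_test = train_test_split(
    dataset.data, dataset.target, test_size=0.20, random_state=15
)

# Standardize each feature (statistics are learned from the training data only),
# then fit the classifier on the standardized features
scaled_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scaled_model.fit(x_train, y_train)

# score returns the mean accuracy as a fraction between 0 and 1
scaled_score = scaled_model.score(x_test, y_test)
print("Test accuracy with scaled features:", scaled_score * 100)
```

Because the scaler sits inside the pipeline, the same transformation is applied automatically when predicting on the test data.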

Predicting the labels of the test data

The model uses the parameters learned from the training data to predict the labels of the test data.

y_pred = logistic_model.predict(x_test)
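Besides hard labels, a fitted LogisticRegression can also report class probabilities through predict_proba. The sketch below is illustrative only: it raises max_iter to 10000 (an assumption, to reduce the chance of hitting the solver's iteration limit on unscaled data), and the names proba and most_likely are introduced here.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

dataset = load_wine()
x_train, x_test, y_train, y_test = train_test_split(
    dataset.data, dataset.target, test_size=0.20, random_state=15
)

# max_iter is raised from its default here (an assumption for this sketch)
logistic_model = LogisticRegression(max_iter=10000)
logistic_model.fit(x_train, y_train)

y_pred = logistic_model.predict(x_test)

# One row per test sample, one column per class; each row sums to 1
proba = logistic_model.predict_proba(x_test)
print("Probabilities for the first test sample:", np.round(proba[0], 3))

# The predicted label is the class with the highest probability
most_likely = logistic_model.classes_[np.argmax(proba, axis=1)]
print("Matches predict():", np.array_equal(most_likely, y_pred))
```

The probabilities are useful when we care about how confident the model is, not just which class it picks.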

Evaluating scores

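We use accuracy_score with the true and predicted test labels to find the accuracy of the model. The function returns a fraction between 0 and 1, so we multiply by 100 to express it as a percentage.

Similarly, we use the true and predicted labels to compute the confusion matrix, in which rows correspond to true classes and columns to predicted classes.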

accuracy = accuracy_score(y_test, y_pred) * 100

confusion_mat = confusion_matrix(y_test, y_pred)
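The two metrics are closely related: correct predictions sit on the diagonal of the confusion matrix, so accuracy is the diagonal sum divided by the total number of test samples. The sketch below checks this relationship; manual_accuracy is an illustrative name, and max_iter=10000 is an assumption made to reduce the chance of hitting the solver's iteration limit.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

dataset = load_wine()
x_train, x_test, y_train, y_test = train_test_split(
    dataset.data, dataset.target, test_size=0.20, random_state=15
)

# max_iter is raised from its default here (an assumption for this sketch)
logistic_model = LogisticRegression(max_iter=10000)
logistic_model.fit(x_train, y_train)
y_pred = logistic_model.predict(x_test)

accuracy = accuracy_score(y_test, y_pred) * 100
confusion_mat = confusion_matrix(y_test, y_pred)

# Diagonal entries are correct predictions; the sum of all entries is the
# number of test samples
manual_accuracy = np.trace(confusion_mat) / confusion_mat.sum() * 100
print("accuracy_score:", accuracy)
print("From confusion matrix:", manual_accuracy)
```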

Printing the results

print("Accuracy is", accuracy)
print("Confusion Matrix")
print(confusion_mat)

Complete code

# Importing the necessary libraries
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, accuracy_score
# Importing the dataset from the sklearn library into a local variable called dataset
dataset = load_wine()
# Splitting the dataset into train (80%) and test (20%) sets.
# x_train, y_train are training data and labels respectively
# x_test, y_test are testing data and labels respectively
x_train, x_test, y_train, y_test = train_test_split(dataset.data, dataset.target, test_size=0.20, random_state=15)
# Making the logistic regression model
logistic_model = LogisticRegression()
# Training the model on the training data and labels
logistic_model.fit(x_train, y_train)
# Using the model to predict the labels of the test data
y_pred = logistic_model.predict(x_test)
# Evaluating the accuracy of the model using the sklearn functions
accuracy = accuracy_score(y_test, y_pred) * 100
confusion_mat = confusion_matrix(y_test, y_pred)
# Printing the results
print("Accuracy is", accuracy)
print("Confusion Matrix")
print(confusion_mat)

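With this split, the logistic regression model defined above reaches about 94% accuracy on the held-out test portion of the wine dataset. In the confusion matrix, the diagonal entries count correctly classified test samples and the off-diagonal entries count misclassifications, so a matrix dominated by its diagonal confirms that the model is performing well.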
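A single accuracy figure can hide differences between classes. As an optional extra that is not part of the original walkthrough, sklearn's classification_report summarizes precision, recall, and F1 score for each class. The sketch below assumes max_iter=10000 to reduce the chance of hitting the solver's iteration limit, and the name report is illustrative.

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

dataset = load_wine()
x_train, x_test, y_train, y_test = train_test_split(
    dataset.data, dataset.target, test_size=0.20, random_state=15
)

# max_iter is raised from its default here (an assumption for this sketch)
logistic_model = LogisticRegression(max_iter=10000)
logistic_model.fit(x_train, y_train)
y_pred = logistic_model.predict(x_test)

# Per-class precision, recall, and F1 score, labelled with the class names
report = classification_report(
    y_test, y_pred, labels=[0, 1, 2], target_names=list(dataset.target_names)
)
print(report)
```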
