Logistic regression is a supervised classification algorithm. We’ve discussed what logistic regression is here. Now we will implement it using the scikit-learn library.
We’ll use the wine dataset to train scikit-learn’s logistic regression model. We split the data into training and test sets (an 80-20 split) so that we can check how well the classifier generalizes to unseen data.
We import the dataset loader from sklearn’s bundled datasets. We will use sklearn’s train_test_split function to split the data into train and test samples. For evaluation, we use sklearn’s confusion_matrix and accuracy_score functions. Finally, we import the LogisticRegression class from the sklearn library, as shown below:
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, accuracy_score
We load the dataset into a local variable called dataset:
dataset = load_wine()
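Before splitting, it can help to look at what was loaded. The following is a minimal sketch that only inspects the object returned by load_wine; it is not part of the original walkthrough.

```python
from sklearn.datasets import load_wine

dataset = load_wine()

# Feature matrix: one row per wine sample, one column per measurement
print("data shape:", dataset.data.shape)

# Integer class labels, one per sample
print("target shape:", dataset.target.shape)

# Human-readable names of the features and of the classes
print("features:", dataset.feature_names)
print("classes:", list(dataset.target_names))
```

The wine dataset has 178 samples, 13 numeric features, and 3 classes, so this is a multi-class classification problem.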
We split the data into train and test sets using the train_test_split function imported above. We use an 80-20 split, where 80% of the data is used for training and 20% for testing. x_train and y_train contain the training data and labels respectively, while x_test and y_test contain the testing data and labels. Fixing random_state makes the split reproducible.
x_train, x_test, y_train, y_test = train_test_split(dataset.data, dataset.target, test_size=0.20, random_state=15)
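To confirm that the split has the intended proportions, one can print the sizes of the resulting arrays. This is an illustrative check, assuming the same arguments as the call above.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

dataset = load_wine()
x_train, x_test, y_train, y_test = train_test_split(
    dataset.data, dataset.target, test_size=0.20, random_state=15
)

n_total = dataset.data.shape[0]

# Number of samples that ended up in each part of the split
print("train samples:", x_train.shape[0])
print("test samples:", x_test.shape[0])

# Fraction of the data held out for testing (close to 0.20)
print("test fraction:", x_test.shape[0] / n_total)
```

Because 20% of 178 is not a whole number, the test set is rounded up to 36 samples, which leaves 142 for training.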
We create a logistic regression model and call it logistic_model, as shown below:
logistic_model = LogisticRegression()
We train the model on the training data and the training labels.
logistic_model.fit(x_train, y_train)
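The wine features are on very different scales (proline values are in the hundreds, while hue is close to 1), and with default settings the solver may emit a convergence warning on unscaled data. One common option, which the code above does not use, is to standardize the features before fitting. The sketch below shows this variant; scaled_model is a hypothetical name introduced only for illustration.

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

dataset = load_wine()
x_train, x_test, y_train, y_test = train_test_split(
    dataset.data, dataset.target, test_size=0.20, random_state=15
)

# Assumption: standardizing features first; this is NOT what the tutorial's model does.
# The pipeline learns the scaling from the training data only, then fits the classifier.
scaled_model = make_pipeline(StandardScaler(), LogisticRegression())
scaled_model.fit(x_train, y_train)

# score() returns the mean accuracy on the given data as a fraction in [0, 1]
scaled_accuracy = scaled_model.score(x_test, y_test)
print("Test accuracy with scaling:", scaled_accuracy)
```

Wrapping the scaler and the classifier in one pipeline keeps the test data out of the scaling statistics, so the evaluation stays a fair estimate of performance on unseen data.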
The model uses the parameters learned from the training data to predict the labels of the test data.
y_pred = logistic_model.predict(x_test)
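Besides hard labels, a fitted LogisticRegression can also report a probability for each class through predict_proba. The sketch below repeats the steps above and compares the two outputs; proba and labels_from_proba are names introduced here for illustration.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

dataset = load_wine()
x_train, x_test, y_train, y_test = train_test_split(
    dataset.data, dataset.target, test_size=0.20, random_state=15
)

logistic_model = LogisticRegression()
logistic_model.fit(x_train, y_train)

# Hard class labels, one per test sample
y_pred = logistic_model.predict(x_test)

# Class probabilities: one row per test sample, one column per class
proba = logistic_model.predict_proba(x_test)
print("probabilities shape:", proba.shape)
print(proba[:3].round(3))

# Picking the most probable class for each row reproduces the hard labels
labels_from_proba = logistic_model.classes_[proba.argmax(axis=1)]
print("matches predict():", np.array_equal(labels_from_proba, y_pred))
```

Each row of proba sums to 1, and the predicted label is the class with the largest probability in that row.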
We use the accuracy_score function and the predicted labels to find the accuracy of the model. We multiply by 100 to express the accuracy as a percentage. Similarly, we use the predicted labels to compute the confusion matrix.
accuracy = accuracy_score(y_test,y_pred)*100
confusion_mat = confusion_matrix(y_test,y_pred)
print("Accuracy is",accuracy)
print("Confusion Matrix")
print(confusion_mat)
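The two metrics are closely related: in the confusion matrix, rows are true classes and columns are predicted classes, so the diagonal counts the correct predictions. As a sketch of that relationship, the accuracy can be recomputed from the matrix by hand; manual_accuracy is a name introduced here for illustration.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

dataset = load_wine()
x_train, x_test, y_train, y_test = train_test_split(
    dataset.data, dataset.target, test_size=0.20, random_state=15
)

logistic_model = LogisticRegression()
logistic_model.fit(x_train, y_train)
y_pred = logistic_model.predict(x_test)

confusion_mat = confusion_matrix(y_test, y_pred)

# Correct predictions sit on the diagonal; the whole matrix sums to the test set size
manual_accuracy = np.trace(confusion_mat) / confusion_mat.sum() * 100

print("Accuracy from confusion matrix:", manual_accuracy)
print("Accuracy from accuracy_score:", accuracy_score(y_test, y_pred) * 100)
```

Any off-diagonal entry shows which true class was mistaken for which predicted class, which is information a single accuracy number cannot give.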
# Importing the necessary libraries
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, accuracy_score

# Importing the dataset from the sklearn library into a local variable called dataset
dataset = load_wine()

# Splitting the data into train (80%) and test (20%).
# x_train, y_train are training data and labels respectively
# x_test, y_test are testing data and labels respectively
x_train, x_test, y_train, y_test = train_test_split(dataset.data, dataset.target, test_size=0.20, random_state=15)

# Making the logistic regression model
logistic_model = LogisticRegression()

# Training the model on the training data and labels
logistic_model.fit(x_train, y_train)

# Using the model to predict the labels of the test data
y_pred = logistic_model.predict(x_test)

# Evaluating the accuracy of the model using the sklearn functions
accuracy = accuracy_score(y_test, y_pred) * 100
confusion_mat = confusion_matrix(y_test, y_pred)

# Printing the results
print("Accuracy is", accuracy)
print("Confusion Matrix")
print(confusion_mat)
The logistic regression model defined above gives 94% accuracy on the test portion of the wine dataset. The confusion matrix supports this: nearly all test samples fall on its diagonal, meaning they are assigned to their correct class.