How to implement a LightGBM classifier in Python
We can install the lightgbm module using the pip command:
pip install lightgbm
We can also use the conda command to install the lightgbm module in Python:
conda install -c conda-forge lightgbm
Implementing a LightGBM classifier
Here are the steps to implement a LightGBM classifier.
Import the libraries
The first step is to import the libraries we need: sklearn for the dataset, the train-test split, and the evaluation metrics, and lightgbm for the classifier.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.metrics import accuracy_score, classification_report
import lightgbm as lgb
Load the dataset
The next step is to load the dataset. We’ll use the breast cancer dataset provided by the sklearn library.
data = load_breast_cancer()
X = data.data
y = data.target
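Before training, it can help to look at what was loaded. The sketch below prints the shape of the feature matrix, the class names, and the number of samples per class. The use of numpy's bincount() for the class counts is our own choice for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
X = data.data
y = data.target

# Rows are samples, columns are features
print("Feature matrix shape:", X.shape)

# Label 0 maps to the first class name, label 1 to the second
print("Class names:", list(data.target_names))

# Count how many samples carry each label
print("Samples per class:", np.bincount(y))
```

The class counts show whether the dataset is balanced, which affects how much weight to give plain accuracy later on.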
Understand the parameters
The LGBMClassifier class constructor takes several parameters, all of which are optional and have default values. The four listed below are the ones we'll focus on; the remaining parameters can be used for further customization.
LGBMClassifier Parameters
Argument | Description |
n_estimators | Specifies the number of boosting rounds or iterations |
num_leaves | Sets the maximum number of leaves in one tree (important for controlling the complexity of the model and avoiding overfitting) |
learning_rate | Defines the step size applied at each iteration as the model converges towards minimizing the loss function |
objective | Specifies the learning task and the corresponding objective function |
Train the model
Now, we’ll use LGBMClassifier to train a model on the dataset. We first split the dataset (X and y) into training and test sets, with 20% of the data held out for testing. Then, we initialize an LGBMClassifier model with 100 estimators, a maximum depth of 6, a learning rate of 0.1, and a binary classification objective. Finally, we train the model on the training data (X_train, y_train).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)
model = lgb.LGBMClassifier(n_estimators=100, max_depth=6, learning_rate=0.1, objective='binary')
model.fit(X_train, y_train)
Make a prediction
Now, we’ll use our trained classifier to predict the class labels for X_test.
y_pred = model.predict(X_test)
Evaluate the model
Finally, let’s evaluate the performance of our classifier.
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy: {:.2f}%".format(accuracy * 100))
report = classification_report(y_test, y_pred)
print("Classification Report:\n", report)
Example
The following code shows how we can use a LightGBM classifier in Python:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.metrics import accuracy_score, classification_report
import lightgbm as lgb

# Load the breast cancer dataset
data = load_breast_cancer()

# Extract the features (X) and target (y)
X = data.data
y = data.target

# Splitting the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)

# Training the model
model = lgb.LGBMClassifier(n_estimators=100, max_depth=6, learning_rate=0.1, objective='binary', verbosity=-1)

# Fit the model on the training data
model.fit(X_train, y_train)

# Make predictions on the test data
y_pred = model.predict(X_test)

# Calculate and print the accuracy of the model
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy: {:.2f}%".format(accuracy * 100))

# Print the classification report of the model
report = classification_report(y_test, y_pred)
print("Classification Report:\n", report)
Code explanation:
Line 8: We load the breast cancer dataset from sklearn and store it in the data variable.
Lines 11–12: We extract the feature matrix X and the target vector y from the loaded dataset. X contains the input data, and y contains the binary classification labels.
Line 15: We split the dataset into training (X_train and y_train) and testing (X_test and y_test) sets using the train_test_split() function. Here, 20% of the data is reserved for testing, and 80% is used for training.
Line 18: We create an instance of the LGBMClassifier class with specified parameters.
Line 21: We train the model on the training data using the fit() method.
Line 24: The trained model is used to make predictions on the test data.
Lines 27–28: We calculate the accuracy of the model’s predictions by comparing them to the true labels in the test set. The accuracy is printed as a percentage.
Lines 31–32: We generate and print the classification report for the model.