How to use CatBoostClassifier in Python

Key takeaways:
CatBoost is a gradient boosting library that handles categorical data natively and uses ordered boosting to reduce overfitting.
It supports both regression and classification tasks efficiently.
Installation is simple using pip or conda commands.
The process involves importing necessary libraries, loading a dataset, and understanding key parameters like iterations, depth, and learning_rate.
Categorical features are handled through the cat_features parameter.
Train the model using CatBoostClassifier.fit() and make predictions with predict().
Model evaluation is done by calculating accuracy and generating a classification report.
CatBoost provides high performance and flexibility for working with diverse datasets.

Example

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.metrics import accuracy_score, classification_report
from catboost import CatBoostClassifier

# Load the breast cancer dataset
data = load_breast_cancer()

# Extract the features (X) and target (y)
X = data.data
y = data.target

# Splitting the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)

# Training the model
model = CatBoostClassifier(iterations=100, depth=6, learning_rate=0.1, loss_function='Logloss', verbose=False)

# Fit the model on the training data
model.fit(X_train, y_train)

# Make predictions on the test data
y_pred = model.predict(X_test)

# Calculate and print the accuracy of the model
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy: {:.2f}%".format(accuracy * 100))

# Print the classification report of the model
report = classification_report(y_test, y_pred)
print("Classification Report:\n", report)

Explanation

The code above is explained in detail below:

Lines 1–5: We import the required libraries.
Line 8: We load the breast cancer dataset from sklearn and store it in the data variable.
Lines 11–12: We extract the feature matrix X and the target vector y from the loaded dataset. X contains the input data, and y contains the binary classification labels.
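Line 15: The dataset is split into training (X_train and y_train) and testing (X_test and y_test) sets using the train_test_split function. Here, 20% of the data is reserved for testing, and 80% is used for training. Because no random_state is passed, the split, and therefore the reported accuracy, varies from run to run.
Line 18: We create a CatBoostClassifier instance with 100 boosting iterations, a tree depth of 6, a learning rate of 0.1, and Logloss as the loss function. Setting verbose=False suppresses the training log.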
Line 21: The model is trained on the training data using the fit method.
Line 24: The trained model is used to make predictions on the test data.
Lines 27–28: We calculate the accuracy of the model’s predictions by comparing them with the true labels in the test set. The accuracy is printed as a percentage.
Lines 31–32: We generate and print the classification report for the model.
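
As a rough illustration of what the accuracy score and the per-class figures in the classification report measure, the sketch below recomputes them by hand in plain Python. The two label lists are invented for illustration and are not output from the model above. Only the figures for class 1 are shown; the real report also lists class 0, the support counts, and averages.

```python
# Toy labels, invented for illustration only (1 = positive class, 0 = negative class).
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 1]

# Count the four possible outcomes of a binary prediction.
pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

# Accuracy: share of all predictions that match the true label.
accuracy = (tp + tn) / len(y_true)

# Precision and recall for class 1, as reported per class in the report.
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print("Accuracy: {:.2f}%".format(accuracy * 100))
print("Precision: {:.2f}, Recall: {:.2f}, F1: {:.2f}".format(precision, recall, f1))
```

Accuracy alone can hide an imbalance between the two kinds of error, which is why the classification report is printed alongside it.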

Conclusion

To sum up, CatBoost is a powerful gradient boosting library. Its ordered boosting scheme and built-in handling of categorical features let it work with categorical data without manual encoding. It performs well on both regression and classification tasks across diverse datasets, which makes it a useful tool to have.

Frequently asked questions

How do you use the CatBoost classifier?

Install CatBoost: Use pip install catboost or conda install catboost.
Import libraries: Import CatBoostClassifier from catboost and other required libraries like sklearn.
Load dataset: Use any dataset (e.g., breast cancer dataset from sklearn).
Split data: Divide data into training and testing sets using train_test_split.
Create model: Initialize CatBoostClassifier with parameters like iterations, depth, and learning_rate.
Train model: Fit the model using fit() on the training dataset.
Make predictions: Use predict() on the test data.
Evaluate: Calculate accuracy and generate a classification report.

How do you import CatBoost in Python?

from catboost import CatBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

Understand the parameters

Argument	Description
`iterations`	The number of boosting trees to build.
`depth`	The maximum depth of the trees in the ensemble.
`learning_rate`	The step size that scales each tree's contribution during training. Smaller values usually need more iterations.
`loss_function`	The loss function to optimize during training, such as Logloss for binary classification.
`cat_features`	A list of indices or names of the categorical features in the dataset. It is needed only when the dataset contains categorical features.
