Explanation
Line 1–2: First, we import the necessary modules and functions: the xgboost module (imported as xgb) and the load_iris function from scikit-learn’s datasets module, which loads the famous Iris dataset.
Line 3–4: Next, we import the train_test_split function from scikit-learn’s model_selection module to split the dataset into training and test sets, and the accuracy_score and classification_report functions from scikit-learn’s metrics module to evaluate the model’s performance.
Line 7: Now, we load the Iris dataset using load_iris() and store it in the data variable.
Line 8: We separate the features X and target labels y from the loaded dataset in this line.
Line 11: Here, we split the data into training and test sets using train_test_split. It takes the features X and target labels y as input and splits them. The test size is 0.2, so 20% of the dataset is held out for testing, and the random state is fixed at 42 for reproducibility.
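The loading and splitting steps described so far can be sketched as follows; the variable names (data, X, y, X_train, etc.) follow the text, and the exact code may differ from the original listing:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load the Iris dataset (150 samples, 4 features, 3 classes)
data = load_iris()

# Separate the feature matrix X and the target labels y
X, y = data.data, data.target

# Hold out 20% of the samples for testing; fix the seed for consistent splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```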
Line 14: We create an XGBoost classifier using the XGBClassifier class with default hyperparameters.
Line 17: We train the XGBoost classifier on the training data X_train, y_train using the fit method.
Line 20: Next, we predict target labels on the test set X_test using our trained model and the predict method.
Line 23: Moving on, we calculate the model’s accuracy with accuracy_score by comparing the predicted target labels predictions with the true target labels from the test set y_test.
Line 25–27: Finally, we print the model’s accuracy on the test set and the classification report, which contains precision, recall, F1-score, and support for each class in the Iris dataset. Instead of numeric label indices, the target names are passed so that the report shows the class labels as species names.
Output
Upon execution, the code will show the model’s accuracy on the test set and the detailed classification report with precision, recall, F1-score, and support for each class.
The output shows that the model achieved an accuracy of 100%, meaning it correctly classified all samples. The precision, recall, and F1-score are also perfect, i.e., 1.00 for each class, indicating that the model predicted each class without any mistakes. This result shows that the model performed exceptionally well on this dataset.
Conclusion
To conclude, XGBoost is a powerful library for machine learning tasks, especially classification. Its high performance and built-in regularization strategies make it suitable for a wide range of applications. Using XGBoost, we obtained 100% (1.0) accuracy in classifying Iris flowers into their respective species. Its versatility and efficiency make it a potent tool for many real-world classification problems.
If you’re curious to learn more about how XGBoost is used in machine learning, check out these helpful resources: