Missing Data
Explore how XGBoost handles missing data by learning, at each split, which branch samples with missing values should follow, so those samples don't have to be removed. Also learn how to save and load Python variables with the pickle package, which is useful for persisting models and managing workflows.
XGBoost’s approach to handling missing data
As a final note on the use of both XGBoost and SHAP, one valuable trait of both packages is their ability to handle missing values. Recall that in the chapter “Data Exploration and Cleaning,” we found that some samples in the case study data had missing values for the PAY_1 feature. So far, our approach has been to simply remove these samples from the dataset when building models. This is because, without specifically addressing the missing values in some way, most of the machine learning models implemented by scikit-learn cannot work with the ...
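The routing idea can be sketched in plain Python. This is a simplified illustration of the sparsity-aware split finding described for XGBoost, not XGBoost's actual implementation. For one feature and one candidate threshold, it sums the gradients and hessians of non-missing samples on each side of the split, then tries sending all missing samples left and then right, and keeps whichever direction gives the higher gain. That direction becomes the learned "default" branch for missing values at this node. The function names, the toy data, and the choice of logistic loss with an initial prediction of 0.5 are assumptions made for illustration; `lam` and `gamma` play roughly the roles of XGBoost's `reg_lambda` and `gamma` hyperparameters.

```python
import math


def leaf_score(g_sum, h_sum, lam):
    # Structure-score term G^2 / (H + lambda) used in XGBoost's split gain
    return g_sum ** 2 / (h_sum + lam)


def best_default_direction(x, g, h, threshold, lam=1.0, gamma=0.0):
    """Illustrative sketch (hypothetical helper): evaluate one split with missing values.

    Non-missing samples go left if x < threshold, otherwise right.
    Missing samples (NaN) are tried on each side; the side with the
    higher gain is returned as the default direction for missing values.
    """
    g_left = h_left = g_right = h_right = g_miss = h_miss = 0.0
    for xi, gi, hi in zip(x, g, h):
        if math.isnan(xi):
            g_miss += gi
            h_miss += hi
        elif xi < threshold:
            g_left += gi
            h_left += hi
        else:
            g_right += gi
            h_right += hi

    # Score of the unsplit parent node (all samples, including missing)
    parent = leaf_score(g_left + g_right + g_miss, h_left + h_right + h_miss, lam)

    def gain(gl, hl, gr, hr):
        return 0.5 * (leaf_score(gl, hl, lam) + leaf_score(gr, hr, lam) - parent) - gamma

    gain_missing_left = gain(g_left + g_miss, h_left + h_miss, g_right, h_right)
    gain_missing_right = gain(g_left, h_left, g_right + g_miss, h_right + h_miss)
    if gain_missing_left >= gain_missing_right:
        return "left", gain_missing_left
    return "right", gain_missing_right


# Toy data (hypothetical): the missing samples share the label of the high-x samples
nan = float("nan")
x = [1.0, 2.0, nan, 8.0, 9.0, nan]
y = [0, 0, 1, 1, 1, 1]
p = 0.5                          # assumed initial predicted probability
g = [p - yi for yi in y]         # logistic-loss gradients
h = [p * (1 - p) for _ in y]     # logistic-loss hessians

direction, best_gain = best_default_direction(x, g, h, threshold=5.0)
print(direction, round(best_gain, 4))  # prints: right 1.1333
```

Because the missing samples in this toy example have the same label as the samples with large `x`, sending them right produces purer child nodes and a larger gain. No samples are dropped; they are simply assigned to the branch that helps the loss most.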