Gini Impurity
Learn the math used by CART classification trees to define purity vs. impurity.
Impurity intuition
Like all machine learning algorithms, CART classification trees use math to learn from data. Before looking at the calculations used by CART classification trees, it’s helpful to understand the mathematics intuitively.
To keep things simple, consider the Adult Census Income dataset. This dataset poses a classification problem with two possible label values: <=50K and >50K. A problem with exactly two label values is known as binary classification.
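CART classification trees attempt to split the data into groups whose labels are as pure as possible. Purity and impurity form a spectrum: a group containing only one label value (for example, all <=50K) is perfectly pure, while a group split evenly between <=50K and >50K is as impure as a binary group can be.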
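To make the spectrum concrete, here is a minimal Python sketch that scores three groups of labels. It assumes the standard definition of Gini impurity, one minus the sum of the squared label proportions. The function name `gini_impurity` and the three ten-label groups are hypothetical, chosen only for illustration, and are not taken from the actual dataset.

```python
from collections import Counter


def gini_impurity(labels):
    """Return the Gini impurity, 1 - sum(p_k ** 2), of a group of labels."""
    n = len(labels)
    if n == 0:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    # p_k is the fraction of the group that carries label k.
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


# Hypothetical groups of ten labels each (not drawn from the real dataset).
pure = ["<=50K"] * 10
mostly_pure = ["<=50K"] * 9 + [">50K"] * 1
evenly_mixed = ["<=50K"] * 5 + [">50K"] * 5

for name, group in [
    ("pure", pure),
    ("mostly pure", mostly_pure),
    ("evenly mixed", evenly_mixed),
]:
    print(f"{name}: {gini_impurity(group):.2f}")
# Prints:
# pure: 0.00
# mostly pure: 0.18
# evenly mixed: 0.50
```

Under this definition, a score of 0 marks a perfectly pure group, and 0.5 is the highest score a group with two label values can reach.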