Labeled Data

Separate the PCA components of a dataset by class.

Chapter Goals:

  • Learn about labeled datasets
  • Separate principal component data by class label

A. Class labels

A big part of data science is classifying observations in a dataset into separate categories, or classes. The category assigned to each observation is its class label. A popular use case is separating a dataset into "good" and "bad" categories. For example, we can classify each tumor in a breast tumor dataset as either malignant or benign.

The code below separates a breast cancer dataset into malignant and benign categories. The load_breast_cancer function is part of the scikit-learn library, and its ...
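
As a minimal sketch of this kind of separation, the snippet below loads the dataset with load_breast_cancer and splits its rows by class label using NumPy boolean masks. The variable names (bc, data, labels, malignant, benign) are illustrative choices rather than names taken from the lesson, and it assumes scikit-learn is installed.

```python
from sklearn.datasets import load_breast_cancer

# Load the dataset: bc.data holds the feature rows, bc.target the class labels.
bc = load_breast_cancer()
data, labels = bc.data, bc.target

# In this dataset, label 0 means malignant and label 1 means benign;
# bc.target_names lists the names in that order.
malignant = data[labels == 0]
benign = data[labels == 1]

print(bc.target_names)
print(malignant.shape, benign.shape)
```

Each mask, such as labels == 0, is a boolean array with one entry per observation, so indexing data with it keeps only the rows that belong to that class.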