What is sklearn.datasets.load_breast_cancer in Python?

scikit-learn is a widely used Python library for machine learning. It ships with several small datasets that are very easy to access. One of them is the Breast Cancer Wisconsin (Diagnostic) dataset, which is loaded with the load_breast_cancer function.

Uses

This dataset is used to train and evaluate machine learning algorithms that classify cancer scans as benign (non-cancerous) or malignant (cancerous).
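As a rough sketch of this use, the example below trains a simple classifier on the dataset. The choice of logistic regression with feature scaling, the 25% test split, and the random_state value are illustrative assumptions rather than anything the dataset prescribes.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load the features (X) and the benign/malignant labels (y).
data = load_breast_cancer()
X, y = data.data, data.target

# Hold out part of the data for evaluation (split size is an arbitrary choice).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scale the features, then fit a logistic regression classifier.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Fraction of held-out samples classified correctly.
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```

Any other scikit-learn classifier could be swapped in at the make_pipeline step.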

Parameters

return_X_y (boolean): The default value for this parameter is False. If it is set to True, the function returns a (data, target) tuple instead of a Bunch object.
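The effect of this parameter can be seen in a minimal sketch like the one below. The variable names bunch, X, and y are chosen here for illustration only.

```python
from sklearn.datasets import load_breast_cancer

# Default (return_X_y=False): a dictionary-like Bunch object is returned.
bunch = load_breast_cancer()
print(type(bunch).__name__)

# return_X_y=True: the features and the target are returned as a tuple.
X, y = load_breast_cancer(return_X_y=True)
print(X.shape, y.shape)
```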

Syntax for loading dataset

from sklearn.datasets import load_breast_cancer

Features

This is a binary classification dataset.

It has no missing attribute values or null values.

The class distribution is as follows.

  • 212: malignant
  • 357: benign
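These counts can be checked directly from the target array. The sketch below assumes NumPy is available (it is installed alongside scikit-learn) and uses np.bincount to count each class label.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()

# Count how many samples carry each class label (0 and 1).
counts = np.bincount(data.target)

# Pair each count with its class name.
for name, count in zip(data.target_names, counts):
    print(f"{name}: {count}")
```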

This is a commonly used dataset. Machine learning papers often use it as a benchmark for binary classification problems.

All 30 features are numerical (real-valued).
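A quick way to confirm the numerical data type and the absence of missing values is sketched below. Checking for NaN with NumPy is one possible approach and is an assumption of this example.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)

# The feature matrix is stored as floating-point numbers.
print(X.dtype)

# True if any feature value is NaN (missing), False otherwise.
has_missing = bool(np.isnan(X).any())
print(has_missing)
```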

Code

Load the dataset:

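from sklearn.datasets import load_breast_cancer
# Load the dataset as a dictionary-like Bunch object
data = load_breast_cancer()
# Print the full contents of the dataset
print(data)
# Print only the names of the available keys
print(data.keys())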

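After we execute the code, we get a dictionary-like object that contains the following keys. Depending on the scikit-learn version, additional keys such as frame may also be present.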

  1. data: The feature values in the dataset that help classify a scan as benign or malignant. It can also be called feature data.

  2. target: The class label of each sample, which is what we want to predict. The label is 0 for a malignant scan and 1 for a benign scan.

  3. target_names: The names of the target classes (malignant and benign).

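  4. feature_names: The names of all the features available in this dataset: radius, texture, perimeter, area, smoothness, compactness, concavity, concave points, etc. Each measurement appears as a mean, an error, and a worst value (for example, mean radius, radius error, and worst radius).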

  5. DESCR: Data description.

  6. filename: The name or path of the file the data is loaded from. The data is in CSV format.
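The keys listed above can be accessed as attributes of the returned object. The following is a minimal sketch of how each one might be inspected. Slicing DESCR to its first 200 characters is only done here to keep the output short.

```python
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()

# Feature data: one row per sample, one column per feature.
print(data.data.shape)

# Target labels for the first five samples (0 = malignant, 1 = benign).
print(data.target[:5])

# Names of the target classes and the number of features.
print(list(data.target_names))
print(len(data.feature_names))

# Beginning of the dataset description.
print(data.DESCR[:200])
```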
