The Dataset and Exploratory Data Analysis
Learn how to do an exploratory data analysis with the breast cancer dataset.
So far, we have learned two classification models: logistic regression and KNN. According to the no free lunch theorem, no single model performs best on every dataset, so we must determine empirically which model works best for our data.
The breast cancer dataset
Most of the time, benign tumors are not dangerous because they can’t spread throughout the body (benign brain tumors, however, can be life-threatening). They don’t invade neighboring tissue and can be removed with a low risk of growing back. However, benign tumors can still have adverse health effects, and through the process of tumor progression, many types can turn malignant (cancerous).
Breast cancer is one of the most common cancers in women. The original breast cancer dataset has 569 observations and 30 features (all numeric). The target has two classes, M (malignant) and B (benign): 212 observations are malignant (encoded as 0) and 357 are benign (encoded as 1).
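A quick way to read the class distribution is to convert the counts into proportions. The sketch below uses only the counts stated above. The dictionary labels and the majority-class baseline are illustrative additions, not part of the lesson's own code. The baseline is the accuracy a model would get by always predicting the more frequent class, which gives a rough floor for judging any classifier we train later.

```python
# Class counts as stated in the text (0 = malignant, 1 = benign).
# The label strings are illustrative names chosen for this sketch.
class_counts = {"malignant (0)": 212, "benign (1)": 357}

total = sum(class_counts.values())
proportions = {label: count / total for label, count in class_counts.items()}

# Accuracy of a trivial baseline that always predicts the majority class.
majority_label = max(class_counts, key=class_counts.get)
baseline_accuracy = class_counts[majority_label] / total

for label, share in proportions.items():
    print(f"{label}: {class_counts[label]} observations ({share:.1%})")
print(f"Majority-class baseline accuracy: {baseline_accuracy:.1%}")
```

The classes are moderately imbalanced, at roughly 37% malignant and 63% benign, so accuracy alone can look better than a model really is.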
In the dataset given below, there are 10 real-valued features that are computed for each cell nucleus:
Radius: Mean of distances from the center to points on the perimeter.
Texture: Standard deviation of grayscale values.
Perimeter: Total length of a shape’s boundary.
Area: ...