
The Dataset and Exploratory Data Analysis

Proceed with the Cleveland database and perform exploratory data analysis.

We'll cover the following...

The dataset
Exploratory data analysis

Let's move on to another famous dataset: the heart disease data from Cleveland. The full original dataset is part of the UCI Machine Learning Repository and contains four databases: Cleveland, Hungary, Switzerland, and the VA Long Beach. It was donated to the public in 1988. The original database contains 76 attributes, but all published experiments by machine learning researchers use a subset of 14 of them.

The dataset

In particular, the Cleveland database is the only one widely used by machine learning researchers. In the original database, the goal field indicates whether heart disease is present in the patient. It's an integer value from 0 (no presence) to 4. Experiments with the Cleveland database have concentrated on simply distinguishing presence (values 1, 2, 3, and 4) from absence (value 0). Information on the 14 attributes that we're going to use is provided below:

age: Years
sex ...
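
The presence-versus-absence labeling described above can be sketched in a few lines of Python. This is a minimal illustration, not the course's own code: the function name `binarize_goal` is hypothetical, and the goal values in the example are made up rather than taken from the Cleveland data.

```python
from collections import Counter


def binarize_goal(goal_values):
    """Map the original goal field (0 to 4) to 0 = absence, 1 = presence."""
    binary = []
    for value in goal_values:
        # The goal field is documented as an integer from 0 to 4.
        if value not in (0, 1, 2, 3, 4):
            raise ValueError(f"unexpected goal value: {value!r}")
        # Value 0 means no disease; values 1 to 4 all count as presence.
        binary.append(0 if value == 0 else 1)
    return binary


# Made-up goal values for illustration only; not rows from the dataset.
example_goal = [0, 2, 1, 0, 4, 3]
example_binary = binarize_goal(example_goal)

# Counting the two classes is a quick first check for class imbalance.
class_counts = Counter(example_binary)
print(example_binary)
print(class_counts)
```

If the data is loaded into a pandas DataFrame, the same mapping can be written as `(df["target"] > 0).astype(int)`, where the column name `target` is an assumption about how the goal field is labeled.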
