What is sklearn.datasets.load_boston(*[, return_X_y])?

Scikit-learn contains small datasets that are very easy to access. The Boston house-prices dataset is one of these.

Features

  1. The dataset contains 506 rows (samples) and 13 feature columns. The target, the median home value (MEDV), is stored separately.

  2. The dataset has no null or missing values.

  3. Many machine learning papers use this dataset to address regression problems.

The dataset contains the following 13 characteristics:

Characteristic  Description
CRIM            Per capita crime rate by town.
ZN              Proportion of residential land zoned for lots over 25,000 square feet.
INDUS           Proportion of non-retail business acres per town.
CHAS            Charles River dummy variable: 1 if the tract bounds the river, 0 otherwise.
NOX             Nitric oxides concentration (parts per 10 million).
RM              Average number of rooms per dwelling.
AGE             Proportion of owner-occupied units built before 1940.
DIS             Weighted distances to five Boston employment centres.
RAD             Index of accessibility to radial highways.
TAX             Full-value property-tax rate per $10,000.
PTRATIO         Pupil-teacher ratio by town.
B               1000(Bk - 0.63)^2, where Bk is the proportion of Black residents by town.
LSTAT           Percentage of the population classed as lower status.
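
The B column is the only derived feature in the table, so its formula is worth checking by hand. The snippet below is a minimal sketch of that arithmetic. The function name b_feature is hypothetical and is not part of scikit-learn.

```python
def b_feature(bk: float) -> float:
    """Apply the transform 1000 * (Bk - 0.63)^2 described for the B column.

    `bk` is a proportion between 0 and 1. The name `b_feature` is an
    illustrative assumption, not a scikit-learn function.
    """
    return 1000 * (bk - 0.63) ** 2


# The transform is zero when the proportion equals 0.63
print(b_feature(0.63))  # 0.0
# At a proportion of 0 the value is roughly 396.9
print(round(b_feature(0.0), 1))
```

Because the expression is a square, two different proportions on either side of 0.63 can map to the same B value, so the original proportion cannot be recovered uniquely from B.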

Parameters

return_X_y: If True, the function returns the tuple (data, target) instead of a Bunch object. The default is False.
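
To make the effect of this flag concrete, here is a small stand-in that mimics the same switch using only the standard library. Everything in it is an assumption for illustration: load_toy is a hypothetical function, the rows are made-up values, and SimpleNamespace only approximates the attribute access that a Bunch offers.

```python
from types import SimpleNamespace


def load_toy(*, return_X_y=False):
    """Hypothetical loader that mimics the return_X_y switch (keyword-only)."""
    data = [[0.1, 2.0], [0.3, 4.0], [0.5, 6.0]]  # made-up feature rows
    target = [10.0, 20.0, 30.0]                  # made-up target values
    if return_X_y:
        # Tuple form: convenient for direct unpacking into X and y
        return data, target
    # Container form: the arrays plus metadata, reachable as attributes
    return SimpleNamespace(data=data, target=target, feature_names=["f1", "f2"])


bunch = load_toy()
X, y = load_toy(return_X_y=True)

print(bunch.feature_names)  # metadata is only available on the container
print(len(X), len(y))       # the tuple form carries just the two arrays
```

The tuple form is handy when only the arrays are needed for model fitting, while the default container keeps metadata such as feature names alongside the data.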

Code

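from sklearn.datasets import load_boston  # requires scikit-learn < 1.2
# Return the features and target as a (data, target) tuple of arrays
X, y = load_boston(return_X_y=True)
# Without the flag, a Bunch (dictionary-like object) is returned
data = load_boston()
print('The dataset object is:\n', data)
print('The shape of the feature matrix is: ', X.shape)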

Explanation

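To use the Boston house-prices dataset, we import its loader from scikit-learn, as done in line 1 of the code. Note that load_boston was deprecated in scikit-learn 1.0 and removed in 1.2, so the import only works on older releases.

The data object is a Bunch, a dictionary-like container that holds the feature matrix, the target prices, the feature names, and a description of the dataset. The X.shape attribute gives the dimensions of the feature matrix, i.e., 506 rows and 13 columns.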