Scikit-learn contains small datasets that are very easy to access. The Boston house-prices dataset is one of these.
The dataset contains 506 rows and 13 feature columns; the target is the median home value (MEDV).
The dataset has no null or missing values.
Many machine learning papers use this dataset to address regression problems.
The dataset contains the following 13 characteristics:
Characteristic | Description |
---|---|
CRIM | Per capita crime rate by town. |
ZN | Proportion of residential land zoned for lots over 25,000 square feet. |
INDUS | Proportion of non-retail business acres per town. |
CHAS | Charles River dummy variable: 1 if the tract bounds the river, 0 otherwise. |
NOX | Nitric oxides concentration (parts per 10 million). |
RM | Average number of rooms per dwelling. |
AGE | Proportion of owner-occupied units built before 1940. |
DIS | Weighted distances to five Boston employment centres. |
RAD | Index of accessibility to radial highways. |
TAX | Full-value property tax rate per $10,000. |
PTRATIO | Pupil-teacher ratio by town. |
B | Computed as 1000(Bk - 0.63)^2, where Bk is the proportion of Black residents by town. |
LSTAT | Percentage of the population classed as lower status. |
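
Of the characteristics above, B is the only one defined by a formula, so a short worked example may help. The sketch below simply evaluates 1000(Bk - 0.63)^2 for a few proportions. The function name `b_feature` and the sample values of Bk are assumptions made for illustration; they are not part of scikit-learn or of the dataset.

```python
def b_feature(bk: float) -> float:
    """Return B = 1000 * (Bk - 0.63)^2 for a town proportion bk in [0, 1]."""
    if not 0.0 <= bk <= 1.0:
        raise ValueError("bk must be a proportion between 0 and 1")
    return 1000.0 * (bk - 0.63) ** 2


# Hypothetical proportions, chosen only to show the shape of the curve.
for bk in (0.0, 0.63, 1.0):
    print(f"Bk = {bk:.2f} -> B = {b_feature(bk):.2f}")
# Bk = 0.00 -> B = 396.90
# Bk = 0.63 -> B = 0.00
# Bk = 1.00 -> B = 136.90
```

Because the formula is a parabola with its minimum at Bk = 0.63, two different proportions can produce the same B, so Bk cannot be recovered uniquely from B alone.
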
`return_X_y`: If set to `True`, `load_boston()` returns a `(data, target)` tuple instead of a Bunch object. The default value is `False`.
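    from sklearn.datasets import load_boston  # removed in scikit-learn 1.2; requires an older release

    # Load the feature matrix X and the target vector y as separate arrays
    X, y = load_boston(return_X_y=True)

    # Load the full dataset as a Bunch object (features, target, feature names, description)
    data = load_boston()

    print('The dataset is:\n', data)
    print('The shape of the feature matrix is: ', X.shape)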
To use the Boston house-prices dataset, we import its loader, `load_boston`, from `sklearn.datasets`, as done in line 1 of the code.
The `data` object holds the full dataset, including the feature values and the house prices (the target). The `X.shape` attribute gives the dimensions of the feature matrix, i.e., 506 rows and 13 columns.
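
The two claims made about the dataset, its shape and the absence of missing values, can be checked with a few lines of code. The following is a minimal sketch that uses only the standard library. The helper `summarize` and the tiny `toy_X` table are hypothetical, and the numbers in `toy_X` are made up for the demonstration rather than taken from the Boston data.

```python
import math


def summarize(X):
    """Return (n_rows, n_cols, n_missing) for a 2-D table of numbers.

    X may be a list of rows or a NumPy array; None and NaN both count
    as missing.
    """
    rows = [list(row) for row in X]
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    n_missing = sum(
        1
        for row in rows
        for value in row
        if value is None or math.isnan(value)
    )
    return n_rows, n_cols, n_missing


# Made-up stand-in values, NOT taken from the Boston data.
toy_X = [
    [0.1, 18.0, 2.3],
    [0.2, 0.0, float("nan")],
]
print(summarize(toy_X))  # (2, 3, 1)
```

Applied to the `X` returned by `load_boston(return_X_y=True)`, the same helper would be expected to report 506 rows, 13 columns, and 0 missing values, matching the figures stated above.
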