Case Study: Seattle House Prices-I
Work through a ModernDive case study on Seattle house prices, starting with exploratory data analysis (EDA).
Kaggle is a machine-learning and predictive-modeling competition website that hosts datasets uploaded by companies, governmental organizations, and other individuals. One of their datasets is “House Sales in King County, USA.” It consists of the sale prices of homes sold between May 2014 and May 2015 in King County, Washington, USA, which includes the greater Seattle metropolitan area. This dataset is in the house_prices data frame included in the moderndive package.
The dataset consists of 21,613 houses and 21 variables describing these houses (for a full list and description of these variables, see the help file by running ?house_prices in the console). In this case study, we’ll create a multiple regression model where:
The outcome variable is the sale price of houses.
There are two explanatory variables:
A numerical explanatory variable: House size, sqft_living, is measured in square feet of living space. Note that 1 square foot is about 0.09 square meters.
A categorical explanatory variable: House condition is a categorical variable with five levels, where 1 indicates poor and 5 indicates excellent.
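Before exploring the data, it can help to pull out just the three variables used in the model. The following is a minimal sketch, assuming the moderndive and dplyr packages are installed. The column names price and condition are taken from the house_prices help file rather than from the text above, and house_prices_subset is a name introduced here purely for illustration.

```r
# Assumption: the moderndive and dplyr packages are installed.
library(moderndive)
library(dplyr)

# Keep only the outcome variable and the two explanatory variables.
# house_prices_subset is an illustrative name, not part of the package.
house_prices_subset <- house_prices %>%
  select(price, sqft_living, condition)

# Quick look at the variable types and the first few values
glimpse(house_prices_subset)

# List the levels of the categorical variable (five levels are expected)
levels(house_prices_subset$condition)
```

Working with a three-column subset keeps later summaries and plots focused on the variables that enter the regression model, while the original house_prices data frame is left untouched.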
Exploratory data analysis
As we’ve said numerous ...