Linear Regression on Real Dataset

Learn to implement linear regression on a real data set with several features.

Linear regression with several unknowns

So far, we’ve worked with examples of linear regression consisting of only one unknown. However, systems from real-life situations are typically complex, with several unknowns. In this lesson, we’ll work with a more involved example of a real data set (A. Tsanas, A. Xifara: ‘Accurate quantitative estimation of energy performance of residential buildings using statistical machine learning tools’, Energy and Buildings, Vol. 49, pp. 560-567, 2012).

Context

The task at hand is to find a relationship that determines the amount of heating or cooling required by a building, given different features of the building like roof area, wall area, and glazing.

The data set consists of 8 different features, namely X1 to X8, of 768 different buildings, along with their heating and cooling loads, Y1 and Y2, respectively. Below is a representation of the first five data records and the header information.

Linear system modeling

For this specific lesson, we are interested in how the target Y1 (heating load) depends on the 8 feature attributes (X1, X2, …, X8). In particular, we want a prediction of the target given the features. If we assume a linear relationship between the target and the features, we can model our data as a linear system.

Let $\mathbf{1}$ represent a column vector of all ones and $\mathbf{x_i}$ represent a column vector of the $i^{th}$ feature of all the records. The data matrix will then be $A=\begin{bmatrix}\mathbf{1}&\mathbf{x_1}&\mathbf{x_2}&\dots&\mathbf{x_8}\end{bmatrix}$ ...
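To make the shape of the data matrix concrete, here is a minimal sketch in NumPy. It does not load the real data set: the features and the target are random placeholders with the same dimensions (768 records, 8 features), and `true_coeffs` is a hypothetical coefficient vector invented for the illustration. Solving the system with `np.linalg.lstsq` (ordinary least squares) is an assumption here, chosen as one common way to fit such a linear model.

```python
import numpy as np

rng = np.random.default_rng(0)

n_records, n_features = 768, 8

# Placeholder features standing in for X1..X8 (random values, NOT the real data).
X = rng.normal(size=(n_records, n_features))

# Hypothetical coefficients: one intercept followed by 8 feature weights.
true_coeffs = np.arange(1.0, n_features + 2.0)  # 9 values: 1.0, 2.0, ..., 9.0

# Data matrix A = [1 x1 x2 ... x8]: a column of ones, then one column per feature.
A = np.column_stack([np.ones(n_records), X])

# Synthetic target standing in for Y1, with a small amount of noise added.
y = A @ true_coeffs + 0.01 * rng.normal(size=n_records)

# Least-squares solution of A @ coeffs ≈ y.
coeffs, residuals, rank, _ = np.linalg.lstsq(A, y, rcond=None)

print("Shape of A:", A.shape)
print("Estimated coefficients:", np.round(coeffs, 2))
```

The column of ones is what lets the model learn an intercept: the first entry of `coeffs` multiplies that column, and the remaining eight entries are the weights of the features. With the real data, `X` and `y` would be replaced by the feature columns X1 to X8 and the heating load Y1.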