What is Regression ?
You'll learn about Regression in this lesson. It is a concept that has been borrowed from Inferential Statistics and involves predicting a real-valued output.
What is Regression?
Regression comes under supervised learning and it involves predicting a real-valued output. Classification predicts a discrete-valued output.
Key terms
Input column or Independent features
The columns that are used to predict the output column are called the input columns or independent features. These are denoted as , , , … where is the first feature and so on. Note that denotes the total number of features or dimensions.
Output column or dependent feature
The column that is to be predicted is called the output column or dependent feature. It is denoted as .
Total instances or samples
-
Total number of rows or instances or samples are denoted as .
-
Samples, rows, or instances are denoted as , , …, and .
-
The first feature value for the first sample is denoted as . The second feature value for the first sample is denoted as . The feature value for the first sample is denoted as .
-
If we have = 2 features and = 3 instances, then we will have the following representation of individual values:
Row = , ,
Row = , ,
Row = , ,
Row = , ,
Training dataset
The dataset taken out of the original preprocessed labeled dataset on which the Machine Learning model is trained is called the training dataset.
Test dataset
The unseen dataset on which the model is evaluated is called the test dataset.
Validation dataset
We use the dataset, along with the training data set, to evaluate the performance of the trained model. Depending on the model’s performance, we either train the model again or finalize it.
Numerical and categorical features
-
Numerical columns or features have real number values present in them, like height, price of a house, etc.
-
Categorical features have values with which category can be associated. For example, sex can be male or female.
-
Categorical features can be ordinal and nominal.
-
Ordinal categorical variables have an order associated with them such as lower class, middle class, and upper class.
-
Nominal categorical variables don’t follow any order like sex.
-
In Python Numerical features have the type
float
orint
. Categorical features are usually represented asobject
type when they are stored in Pandas Dataframe.
Get hands-on with 1400+ tech skills courses.