Numerical Variables Transformation
Explore techniques to transform numerical variables to meet Gaussian distribution assumptions for models such as linear and logistic regression. Understand different transformations like logarithm, reciprocal, power, and Box-Cox to improve model performance by making features more Gaussian-like.
In the lesson on probability distributions, while discussing Gaussian distributions, we noted that Machine Learning algorithms such as Linear Regression and Logistic Regression assume that the features' underlying distribution is Gaussian. If it is not, the model might perform poorly. When the distribution is not Gaussian, we can apply transformations to make it more Gaussian-like. Machine Learning models that assume the underlying distribution of the variables to be Gaussian include:
- Linear Regression
- Logistic Regression
- Linear Discriminant Analysis
- Naive Bayes
We can apply the following transformations to a dataset's individual features after analyzing them. These transformations can help us achieve better results by making the underlying features more Gaussian-like.
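Before choosing a transformation, it helps to quantify how far a feature deviates from a Gaussian shape. A minimal sketch of such an analysis, using a synthetic right-skewed feature (the data and parameters here are assumptions for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Simulated right-skewed feature (e.g. income-like values)
feature = rng.lognormal(mean=2.0, sigma=0.8, size=1000)

# Skewness near 0 suggests a roughly symmetric, Gaussian-like shape;
# a large positive value indicates a long right tail.
skewness = stats.skew(feature)
print(f"skewness before transformation: {skewness:.2f}")
```

A strongly skewed feature like this one is a typical candidate for the transformations listed below.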
- Logarithm Transformation: applied to features that take only positive values. The logarithm used is the natural logarithm.
- Reciprocal Transformation (1/x, where x ≠ 0) ...
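The transformations discussed here can be sketched in a few lines. The snippet below applies the logarithm, reciprocal, and Box-Cox transformations to an assumed synthetic right-skewed feature and compares skewness before and after; `scipy.stats.boxcox` estimates the Box-Cox lambda by maximum likelihood and, like the logarithm, requires strictly positive values:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Strictly positive, right-skewed feature for illustration
x = rng.lognormal(mean=2.0, sigma=0.8, size=1000)

log_x = np.log(x)                 # natural logarithm; requires x > 0
recip_x = 1.0 / x                 # reciprocal; requires x != 0
boxcox_x, lam = stats.boxcox(x)   # Box-Cox; requires x > 0

for name, values in [("original", x), ("log", log_x),
                     ("reciprocal", recip_x), ("box-cox", boxcox_x)]:
    print(f"{name:>10}: skewness = {stats.skew(values):.2f}")
```

On a log-normal feature like this, the logarithm and Box-Cox transformations pull the skewness close to zero, which is exactly the Gaussian-like behavior these models prefer.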