Numerical Variables Transformation
Numerical Variables Transformation refers to applying mathematical operations to numerical columns so that Machine Learning models perform better. This lesson introduces the most common transformations.
In the lesson on Probability Distributions, while discussing Gaussian Distributions, we noted that Machine Learning algorithms such as Linear Regression and Logistic Regression assume the features' underlying distribution to be Gaussian. When it is not, the model might perform badly, and we can apply transformations to make the distribution more Gaussian-like. Machine Learning models that assume the underlying distribution of the variables to be Gaussian are:
- Linear Regression
- Logistic Regression
- Linear Discriminant Analysis
- Naive Bayes
After analyzing the individual features of the dataset, we can apply the following transformations to them. Transformations can help us achieve good results by making the underlying features more Gaussian-like.
- Logarithm Transformation: This transformation is used on features that have strictly positive values. The logarithm used is the natural logarithm.
- Reciprocal Transformation ($1/x$, where $x$ is one of the values of the feature): This transformation can be applied to negative values, but it cannot be applied to the value $0$.
- Square Root or Cube Root Transformation: This transformation falls under the category of Power Transformations and involves raising each value to the power $1/2$ or $1/3$ (i.e. $x^{1/2}$ or $x^{1/3}$), where $x$ is an individual value of a feature.
- Exponential or Power Transformations: These involve raising each individual value of a feature to a power (i.e. $x^{n}$), where $n$ is any number. The goal is to try different values of $n$ and see which works best for the case at hand.
- Box-Cox Transform: The Box-Cox Transform performs different transformations depending on the value of its parameter $\lambda$. The boxcox() SciPy function implements the Box-Cox transformation. It takes an argument, named lmbda because lambda is a reserved word in Python, that controls the type of transform to perform.
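For reference, the one-parameter Box-Cox transform of a strictly positive value $x$ is usually written as shown below. This is the standard textbook definition, not something stated in this lesson, and it is included only to make the role of $\lambda$ concrete.

$$
x^{(\lambda)} =
\begin{cases}
\dfrac{x^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\
\ln x, & \lambda = 0
\end{cases}
$$

Because of the subtraction and division, each choice of $\lambda$ matches a simple power transform only up to a constant shift and scale. For example, $\lambda = 0.5$ gives $2(\sqrt{x} - 1)$, which has the same shape as a plain square root.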
Below are some common values for $\lambda$:
- $\lambda = -1$ is a reciprocal transform.
- $\lambda = -0.5$ is a reciprocal square root transform.
- $\lambda = 0.0$ is a log transform.
- $\lambda = 0.5$ is a square root transform.
- $\lambda = 1.0$ is no transform.
- If $\lambda$ is not specified, the function chooses the value that maximizes the log-likelihood of the data and returns it along with the transformed values.
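The transformations described in this lesson can be sketched with NumPy and SciPy as follows. This is a minimal illustration, not a recommended pipeline: the sample data is synthetic (a log-normal draw chosen only because it is positive and right-skewed), the variable names are hypothetical, and the exponent 0.3 is an arbitrary example value.

```python
import numpy as np
from scipy import stats

# Hypothetical, strictly positive, right-skewed feature (synthetic data).
rng = np.random.default_rng(seed=0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=1000)

# Simple transformations.
log_x = np.log(x)            # natural logarithm; requires x > 0
reciprocal_x = 1.0 / x       # undefined at x == 0
sqrt_x = np.sqrt(x)          # power 1/2; requires x >= 0
cbrt_x = np.cbrt(x)          # power 1/3; also defined for negative x
power_x = np.power(x, 0.3)   # arbitrary exponent, picked only as an example

# Box-Cox with a fixed lambda returns just the transformed array.
# Note the keyword is `lmbda`, and the input must be strictly positive.
bc_log = stats.boxcox(x, lmbda=0.0)
bc_sqrt = stats.boxcox(x, lmbda=0.5)

# With lmbda left unset, SciPy also returns the lambda it selected.
bc_auto, best_lambda = stats.boxcox(x)

# Skewness close to 0 suggests a more symmetric, Gaussian-like shape.
print("skew before:        ", stats.skew(x))
print("skew after log:     ", stats.skew(log_x))
print("skew after Box-Cox: ", stats.skew(bc_auto), "lambda:", best_lambda)
```

Comparing the skewness before and after each transform is one simple way to judge which transformation brings a feature closest to a Gaussian shape. A histogram or Q-Q plot would serve the same purpose.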