Skewness and Kurtosis

In this lesson, you will learn about skewness and kurtosis, two important properties of distributions.

Skewness

Skewness measures the asymmetry of a distribution around its mean, that is, how far its shape departs from a symmetric bell curve such as the Normal Distribution. A Normal Distribution has a skewness of zero. A distribution can be right (positively) skewed, with a longer tail on the right, or left (negatively) skewed, with a longer tail on the left.

Consequences of Skewness

  • Skewness helps us locate outliers (data points that behave very differently from the rest of the data). For example, if the transactions on a credit card abruptly jump to a much higher amount than the usual transactions, those transactions are outliers.

  • The mean of a positively skewed distribution is typically greater than the median, while the mean of a negatively skewed distribution is typically less than the median.

  • The mean of a positively skewed distribution is typically greater than the mode, while the mean of a negatively skewed distribution is typically less than the mode.

  • Regression models in Machine Learning are affected by the presence of outliers, which a skewed distribution can indicate. So, it is sometimes necessary to reduce or remove the skewness.
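The ordering of mean, median, and mode described above can be checked on a small sample. The sketch below uses made-up transaction amounts (the values and the helper name `summarize` are illustrative assumptions, not data from this lesson) and only Python's standard `statistics` module.

```python
import statistics

# Hypothetical card-transaction amounts: mostly small values, with a few
# large ones that stretch the right tail (positive skew).
right_skewed = [12, 15, 15, 18, 20, 22, 25, 30, 120, 400]

# Mirroring the data around zero gives a left (negatively) skewed sample.
left_skewed = [-x for x in right_skewed]


def summarize(values):
    """Return (mean, median, mode) for a list of numbers."""
    return (
        statistics.mean(values),
        statistics.median(values),
        statistics.mode(values),
    )


right_mean, right_median, right_mode = summarize(right_skewed)
left_mean, left_median, left_mode = summarize(left_skewed)

print("right-skewed:", right_mean, right_median, right_mode)
print("left-skewed: ", left_mean, left_median, left_mode)
```

In the right-skewed sample the two large amounts pull the mean above the median and the mode, and the mirrored sample shows the opposite ordering.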

How to check Skewness?

Naive approach

  • The naive approach is to plot a histogram or density curve of a column of the dataset and check whether the curve looks Gaussian-like or skewed.

  • Beyond visual inspection, there are some mathematical measures to check the skewness of a column.
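The visual check can be sketched without any plotting library. In practice a histogram would usually be drawn with a plotting package; the version below is a dependency-free approximation that prints a text histogram. The column values, the bin width, and the helper name `text_histogram` are assumptions made for illustration.

```python
from collections import Counter


def text_histogram(values, bin_width):
    """Count values per bin of width `bin_width`; return {bin_start: count}."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


# Hypothetical column values with a long right tail.
column = [1, 2, 2, 3, 3, 3, 4, 4, 5, 6, 9, 14, 23]

width = 5
hist = text_histogram(column, bin_width=width)

# One row per non-empty bin; the bar length is the number of values in the bin.
for bin_start, count in hist.items():
    print(f"{bin_start:>3} to {bin_start + width - 1:<3} {'#' * count}")
```

Most of the bars sit in the lowest bins and a thin tail trails off to the right, which is the shape to look for in a positively skewed column.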

Method 1

Skewness = $\frac{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^3}{\left[\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2\right]^{\frac{3}{2}}}$ ...
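As a minimal sketch, the part of the Method 1 formula shown above can be computed directly in plain Python. The function name `skewness` and the sample lists are illustrative assumptions. Library routines may use a different normalization (for example, dividing by n in the denominator as well), so their results can differ slightly from this one.

```python
def skewness(values):
    """Skewness as written in Method 1: the mean cubed deviation divided by
    the sample variance (n - 1 in the denominator) raised to the power 3/2."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    # Numerator: (1/n) * sum of cubed deviations from the mean.
    third_moment = sum((x - mean) ** 3 for x in values) / n
    # Denominator base: (1/(n-1)) * sum of squared deviations from the mean.
    sample_variance = sum((x - mean) ** 2 for x in values) / (n - 1)
    if sample_variance == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return third_moment / sample_variance ** 1.5


symmetric = [1, 2, 3, 4, 5]
right_tail = [1, 2, 2, 3, 3, 3, 20]
left_tail = [-x for x in right_tail]

print(skewness(symmetric))   # symmetric sample, so the value is zero
print(skewness(right_tail))  # positive for a long right tail
print(skewness(left_tail))   # negative for a long left tail
```

The sign of the result gives the direction of the skew, and the magnitude grows as the tail gets longer.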