Measures of Variability

Measures of Variability or Spread

Measures of variability, also known as measures of spread, describe the dispersion in a dataset: how widely the data are distributed around the center (the measure of location) of the dataset. The most commonly used measures of variability are discussed below.

Variance

The variance is the average of the squared differences between the data values and their mean. For a sample, the sum of squared differences is divided by n - 1 rather than n. The variance tells us how close to or far from the mean the values in a dataset are, in squared units.

Formula

$$s^2=\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2$$

  • $s^2$ is the sample variance.

  • $n$ is the total number of values in the dataset.

  • $\sum_{i=1}^{n}$ denotes summation over the index $i$ from 1 to $n$.

  • $(x_i-\bar{x})^2$ is the squared difference between the $i$-th value in the dataset and the mean $\bar{x}$.

  • $\sum_{i=1}^{n}(x_i-\bar{x})^2$ is the sum of the squared differences of all values from the mean.

Example

  • Let's say we have the list of numbers 34, 56, 190, 10000, and 45.

  • Here n = 5 (Number of Values)

  • The mean of the above list of numbers is calculated as

$$\bar{x}=\frac{34 + 56 + 190 + 10000 + 45}{5}=\frac{10325}{5}=2065$$

  • The calculations are done below.
x     | x - x̄ | (x - x̄)²
------|-------|---------
34    | -2031 | 4124961
56    | -2009 | 4036081
190   | -1875 | 3515625
10000 | 7935  | 62964225
45    | -2020 | 4080400

Σ(x - x̄)² = 78721292

$$s^2=\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2=\frac{78721292}{5-1}=19680323$$

The variance is expressed in squared units: if the data values carry a unit, the variance is in that unit squared.
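The calculation above can be reproduced in a few lines of Python. This is a minimal sketch: the helper name `sample_variance` is illustrative, and the result is cross-checked against `statistics.variance` from the standard library, which also uses the n - 1 denominator.

```python
from statistics import mean, variance

values = [34, 56, 190, 10000, 45]


def sample_variance(data):
    """Sample variance with the n - 1 denominator (illustrative helper)."""
    n = len(data)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    x_bar = sum(data) / n                              # mean of the data
    squared_deviations = [(x - x_bar) ** 2 for x in data]
    return sum(squared_deviations) / (n - 1)


s2 = sample_variance(values)
print(mean(values))      # 2065
print(s2)                # 19680323.0
print(variance(values))  # standard-library cross-check, same value
```

The single large value (10000) dominates the sum of squared deviations, which is why the variance is so large relative to the other four values.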

Standard Deviation

The standard deviation is the square root of the variance. It measures the same spread, but in the original units of the data rather than squared units, which makes it easier to interpret.

Formula

$$s=\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2}$$

  • $s$ is the sample standard deviation.

  • $n$ is the total number of values in the dataset.

  • $\sum_{i=1}^{n}$ denotes summation over the index $i$ from 1 to $n$.

  • $(x_i-\bar{x})^2$ is the squared difference between the $i$-th value in the dataset and the mean $\bar{x}$.

  • $\sum_{i=1}^{n}(x_i-\bar{x})^2$ is the sum of the squared differences of all values from the mean.

Example

  • Let's say we have the list of numbers 34, 56, 190, 10000, and 45.

  • Here n = 5 (Number of Values)

  • The mean of the above list of numbers is calculated as

$$\bar{x}=\frac{34 + 56 + 190 + 10000 + 45}{5}=\frac{10325}{5}=2065$$

  • The variance of this list was calculated in the variance example, i.e. $s^2=19680323$.

  • So, $s=\sqrt{19680323}\approx 4436.25$
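As a quick check of the steps above, the following Python sketch computes the sample variance and then takes its square root. The variable names are illustrative, and `statistics.stdev` from the standard library is used only as a cross-check.

```python
import math
from statistics import stdev

values = [34, 56, 190, 10000, 45]

n = len(values)
x_bar = sum(values) / n                               # mean: 2065.0
s2 = sum((x - x_bar) ** 2 for x in values) / (n - 1)  # sample variance
s = math.sqrt(s2)                                     # standard deviation

print(round(s, 2))              # 4436.25
print(round(stdev(values), 2))  # standard-library cross-check
```

Unlike the variance, this value is in the same units as the data, so it can be compared directly with the values themselves.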

Range

The range is defined as the difference between the largest and the smallest values in the dataset. It shows the width of the interval that contains all the data values.

Formula

$$\text{Range} = x_L - x_S$$

Where:

  • $x_L$ is the largest value in the list of numbers.

  • $x_S$ is the smallest value in the list of numbers.

Example

  • Let's say we have the list of numbers 34, 56, 190, 10000, and 45, so n = 5.

  • Here $x_L = 10000$ and $x_S = 34$.

  • Range $= x_L - x_S = 10000 - 34 = 9966$
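The same calculation in Python is a one-liner with the built-in `max` and `min` functions. The variable names below are illustrative.

```python
values = [34, 56, 190, 10000, 45]

x_l = max(values)        # largest value: 10000
x_s = min(values)        # smallest value: 34
data_range = x_l - x_s   # difference between the two extremes

print(data_range)  # 9966
```

Because the range depends only on the two extreme values, a single outlier such as 10000 determines it almost entirely.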

Quartiles

Quartiles are the values that divide the data into quarters, breaking the dataset into four segments. As with the median, the data must be sorted before the quartiles can be determined.
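As a rough illustration, the sketch below sorts the example list used in the earlier sections and asks Python's `statistics.quantiles` (available from Python 3.8) for the three cut points. This assumes the function's default "exclusive" method; other quartile conventions can give different values for the first and third quartiles, so no specific values are claimed for those here.

```python
from statistics import median, quantiles

values = [34, 56, 190, 10000, 45]

# Quartiles are defined on ordered data. quantiles() sorts internally,
# but sorting explicitly makes the precondition visible.
sorted_values = sorted(values)
print(sorted_values)  # [34, 45, 56, 190, 10000]

# n=4 requests the three cut points that split the data into quarters.
q1, q2, q3 = quantiles(sorted_values, n=4)

print(q1, q2, q3)
print(q2 == median(values))  # the second quartile is the median
```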