In this lesson, we’ll apply different statistical measures using arrays.

Statistics is the science of collecting, organizing, analyzing, and presenting data. It lets us understand and describe our data. For example, in a group of a hundred people of different ages and backgrounds, we might want to know the average age of the group or their average height. Or, we might want to know the most common ethnicity among the group.

Only after we understand and can describe the data can we make inferences about it. Statistical measures such as the mean, median, mode, variance, and standard deviation help us analyze the data.

Instruction: Use the playground below for all the upcoming tasks.

Statistical measures

Use this playground to practice what you learn:

#include <iostream>
#include <cmath>
using namespace std;
// You may assume these two functions are available
void bubbleSort(int A[], int n);
void printSpecial(const char Msg[], int D[], int size);
// Prototypes: you need to write these functions; scroll down to add your code and test it
float mean(int D[], int n);
float median(int D[], int n);
int frequency(int D[], int n, int t);
int mode(int D[], int n, int &mf);
float variance(int D[], int n);
float standardDeviation(int D[], int n);
int main()
{
    const int capacity = 100;
    int D[] = {1,2,3,4,5,6,1,1,2,9,9,2,7,7,8,4,3,1,2,3,
               4,5,6,9,1,2,3,4,5,6,1,1,2,9,9,2,7,7,8,4,
               3,1,2,3,4,5,6,9,1,2,3,4,5,6,1,1,2,9,9,2,
               7,7,8,4,3,1,2,3,4,5,6,9,1,2,3,4,5,6,1,1,
               2,9,9,2,7,7,8,4,3,1,2,3,4,5,6,9};
    int n = sizeof(D)/sizeof(int);
    cout << "The data:\n";
    printSpecial("D", D, n); cout << endl<<endl;
    cout << "Mean: " <<mean(D, n)<<endl;
    cout << "Median: " <<median(D, n)<<endl;
    int mod, modeFreq=0;
    mod = mode(D, n, modeFreq);
    cout << "Mode: " <<mod<<"\t Mode-Frequency: "<<modeFreq<<endl;
    cout << "Variance: "<<variance(D, n)<<endl;
    cout << "Standard Deviation: "<<standardDeviation(D, n)<<endl;
    return 0;
}
float mean(int D[], int n)
{
    // Write your code here...
    return 0.0;
}
float median(int D[], int n)
{
    // Write your code here...
    return 0.0;
}
int frequency(int D[], int n, int t)
{
    // Write your code here...
    return 0;
}
int mode(int D[], int n, int &mf)
{
    // Write your code here...
    return 0;
}
float variance(int D[], int n)
{
    // Write your code here...
    return 0.0;
}
float standardDeviation(int D[], int n)
{
    // Write your code here...
    return 0.0;
}

1. Calculating the mean

The mean is the average of the values. We calculate the arithmetic mean by taking the sum of the values and dividing the sum by the total number of values.

// where n is the size of the array D[]
float mean(int D[], int n)
{
    float result = 0;
    // calculating the sum of the values of the array
    for(int di=0; di<n; di++)
        result+=D[di];
    // dividing the sum by the size of the array
    return result/n;
}

Instruction: Write the function in the above playground and test it.
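
The accumulator in mean() is a float for a reason: if the sum were kept in an int, the final division would be integer division and the fractional part would be discarded. The sketch below contrasts the two. The names meanTruncated and meanExact are hypothetical and used only for this illustration.

```cpp
// Hypothetical helper (illustration only): the sum is an int, so
// sum / n is integer division and the fractional part is lost.
float meanTruncated(const int D[], int n)
{
    int sum = 0;
    for (int di = 0; di < n; di++)
        sum += D[di];
    return sum / n;   // int / int, e.g. 7 / 2 evaluates to 3
}

// Same loop as the lesson's mean(): the sum is a float, so the
// division is carried out in floating point.
float meanExact(const int D[], int n)
{
    float sum = 0;
    for (int di = 0; di < n; di++)
        sum += D[di];
    return sum / n;   // float / int, e.g. 7.0f / 2 evaluates to 3.5
}
```

For the two values {3, 4}, meanTruncated returns 3 while meanExact returns 3.5.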

Exercise: Calculating the grade

Write a function that calculates a student’s overall grade percentage. You are given, for each assessment instrument, its weight (as a percentage of the total) and the marks achieved in it (out of 100).

float theGrader(int Marks[ ], int Percentage [ ], int k)

Here is an example with 5 instruments, showing the weight of each instrument and the marks earned in it:

Instrument        Percentage    Grade (out of 100)
Assignment # 1    10%           80
Assignment # 2    20%           90
Midterm           20%           85
Project           25%           95
Final exam        25%           75

The achieved percentage is as follows:

$$\frac{10\times80+20\times90+20\times85+25\times95+25\times75}{100} = 85.5$$

The general formula for the achieved percentage is as follows:

$$\frac{\sum_{i=0}^{k-1} (p_i \times m_i)}{100}$$

where $m_i$ represents the marks achieved in the $i$-th instrument and $p_i$ is the percentage of that instrument.

Instruction: Write the code in the following widget.

#include <iostream>
using namespace std;
// where k is the total number of instruments
float theGrader(int Marks[ ], int Percentage [ ], int k)
{
    // write code here...
    return 0.0;
}
int main() {
    int Marks[] = {80, 90, 85, 95, 75}, n = 5,
        Percentage[] = {10, 20, 20, 25, 25};
    cout << "The achieved grade is: "<<theGrader(Marks, Percentage, n);
    return 0;
}
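
After attempting the exercise, you can compare your answer with the sketch below. It is one possible solution that applies the general formula directly, not necessarily the course's reference answer. It assumes the percentages add up to 100 and does not validate its inputs.

```cpp
// One possible solution sketch: weighted sum of the marks, divided by 100.
// Marks[i] is the grade (out of 100) and Percentage[i] the weight of
// instrument i; k is the total number of instruments.
float theGrader(int Marks[], int Percentage[], int k)
{
    float total = 0;
    for (int i = 0; i < k; i++)
        total += Percentage[i] * Marks[i];   // p_i * m_i
    return total / 100;
}
```

With the sample data from the table (marks 80, 90, 85, 95, 75 and weights 10, 20, 20, 25, 25), this returns 85.5.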

2. Calculating the median

The median is the middle value. To calculate the median, we first need to sort the values in ascending or descending order.

If the total number of values is odd, the median is simply the middle value. If the total number of values is even, the median is the average of the two middle values.

Below, we first sort the values with bubble sort and then compute the median. Note that median() sorts the array D[] that is passed to it, so the caller’s data is reordered as a side effect.

Instruction: Write the function in the statistical measures playground above and test it.

// Assume: bubbleSort() is already defined elsewhere
void bubbleSort(int A[], int n);
// finding the median
float median(int D[], int n)
{
    bubbleSort(D, n);
    // if the total number of values is even, take the average
    // of the middle two values
    if(n%2==0)
        return float(D[n/2-1]+D[n/2])/2;
    // if the number of values is odd, return the middle value
    else
        return D[n/2];
}
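
The lesson only declares bubbleSort(), so running median() on its own requires a definition. Below is a minimal sketch of a bubble sort with an early exit, written under the assumption that the helper sorts in ascending order and in place, followed by the lesson's median() so the two can be compiled together.

```cpp
#include <utility>   // std::swap

// Sketch of the bubbleSort() the lesson assumes is available.
// The body is an assumption: ascending order, sorted in place.
void bubbleSort(int A[], int n)
{
    for (int pass = 0; pass < n - 1; pass++)
    {
        bool swapped = false;
        // after each pass the largest remaining value sits at the end
        for (int i = 0; i + 1 < n - pass; i++)
        {
            if (A[i] > A[i + 1])
            {
                std::swap(A[i], A[i + 1]);
                swapped = true;
            }
        }
        if (!swapped)
            break;   // no swaps in this pass: the array is already sorted
    }
}

// The lesson's median(), with the same behavior.
float median(int D[], int n)
{
    bubbleSort(D, n);
    if (n % 2 == 0)
        return float(D[n / 2 - 1] + D[n / 2]) / 2;
    return D[n / 2];
}
```

For {5, 1, 4, 2} this gives 3 (the average of 2 and 4) and leaves the array as {1, 2, 4, 5}. For {3, 1, 2} it gives 2.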

3. Calculating the mode

Another way to summarize data is the mode, which is the value with the highest frequency in the data.

To calculate the mode, we need to know the frequencies of the elements. We’ll write the following helper function:

int frequency(int D[], int n, int t)

Here, D[] is the data of size n, and t is the target value. The function returns how many times t appears in D[].

Idea:

The mode() function starts by assuming that the first value is the mode value mv. It finds the frequency of that first value and stores it in mf.

Inside the for loop, in the first iteration, we find the frequency f of the second value of the array and compare it with mf. If f is greater than mf, the highest frequency found so far, we update the mode value and its frequency (mv and mf respectively). These steps are carried out for each remaining element of the array. Because the comparison is strict, when several values share the highest frequency, the one that appears first in the array is kept.

Lastly, after the entire array has been traversed, we return the mode value mv, which holds the value with the highest frequency. Since frequency() scans the whole array on every call, mode() does work proportional to n² for an array of size n.

int frequency(int D[], int n, int t)
{
    int f=0;
    for(int di=0; di<n; di++)
        if(D[di]==t)
            f++;
    return f;
}
int mode(int D[], int n, int &mf)
{
    int mv = D[0]; // treat the first value as the mode value (mv), just like finding a max
    mf = frequency(D, n, D[0]); // mv is ModeValue, mf is ModeFrequency
    for(int di=1; di<n; di++)
    {
        int nv = D[di]; // taking next value
        int f = frequency(D, n, nv); // next value's frequency
        // if this value's frequency is greater than the highest found so far
        if(f > mf)
        {
            mf = f; // update the mode frequency
            mv = nv; // and the mode value
        }
    }
    return mv;
}
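
Because mode() calls frequency() for every element, its running time grows with n². If reordering the data is acceptable, a different technique is to sort first and then count runs of equal neighbors. The following is a sketch of that alternative, not the lesson's method: modeSorted is a hypothetical name, std::sort is used instead of bubble sort, and the function assumes n is at least 1.

```cpp
#include <algorithm>   // std::sort

// Hypothetical alternative to the lesson's mode().
// Sorts D in place, then scans once, counting runs of equal values.
// Assumes n >= 1. On ties, the smallest value wins.
int modeSorted(int D[], int n, int &mf)
{
    std::sort(D, D + n);
    int mv = D[0];   // best value so far
    mf = 1;          // its frequency
    int run = 1;     // length of the current run of equal values
    for (int di = 1; di < n; di++)
    {
        run = (D[di] == D[di - 1]) ? run + 1 : 1;
        if (run > mf)
        {
            mf = run;
            mv = D[di];
        }
    }
    return mv;
}
```

For {2, 7, 2, 9, 7, 2} it returns 2 with mf set to 3. When several values tie, this version returns the smallest of them, whereas the lesson's mode() returns the one that appears first in the array.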

4. Calculating the variance and standard deviation

As the name suggests, the variance measures how much the values vary, that is, how far they spread out from the mean.

  1. We take the difference of each value from the mean and then square the differences (so that positive and negative differences do not cancel out).
  2. We then divide the sum of the squared differences by the total number of values.

Variance:

$$\sigma^2 = \frac{\sum_{i=0}^{n-1}(D_i - \mu)^2}{n}$$

where $\mu$ is the mean of the data D[].

The standard deviation is the square root of the variance and is usually represented by $\sigma$.
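
As a small worked example (the data set here is made up for illustration), take the eight values 2, 4, 4, 4, 5, 5, 7, 9. The mean is:

$$\mu = \frac{2+4+4+4+5+5+7+9}{8} = \frac{40}{8} = 5$$

The variance is the sum of the squared differences from the mean, divided by the number of values:

$$\sigma^2 = \frac{(2-5)^2 + 3(4-5)^2 + 2(5-5)^2 + (7-5)^2 + (9-5)^2}{8} = \frac{9+3+0+4+16}{8} = \frac{32}{8} = 4$$

The standard deviation is its square root:

$$\sigma = \sqrt{4} = 2$$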

float variance(int D[ ], int n)
{
    float m = mean(D, n);
    float var_value=0;
    // summing the squared differences from the mean
    for(int di=0; di<n; di++)
    {
        var_value += (D[di]-m)*(D[di]-m);
    }
    // dividing the sum by the size of the array
    return var_value/n;
}
float standardDeviation(int D[ ], int n)
{
    float var = variance(D, n);
    return sqrt(var);
}