Statistical Measures Using Arrays
Learn to apply statistical measures using arrays.
In this lesson, we’ll apply different statistical measures using arrays.
Statistics is the science of collecting, organizing, analyzing, and presenting data. It helps us understand and describe our data. For example, in a group of a hundred people of different ages and backgrounds, we might want to know the average age of the group or their average height. Or, we might want to know the most common ethnicity among the group.
Only after we understand the data and can describe it can we make inferences about it. Statistical measures such as the mean, median, mode, variance, and standard deviation help us analyze the data.
Instruction: Use the playground below for all the upcoming tasks.
Statistical measures
Use this playground to practice your learning:
#include <iostream>
#include <cmath>
using namespace std;

// You may assume these two functions are available
void bubbleSort(int A[], int n);
void printSpecial(const char Msg[], int D[], int size);

// Prototypes: you need to write these functions; scroll down to add the code and test it
float mean(int D[], int n);
float median(int D[], int n);
int frequency(int D[], int n, int t);
int mode(int D[], int n, int &mf);
float variance(int D[], int n);
float standardDeviation(int D[], int n);

int main()
{
    const int capacity = 100;
    int D[] = {1,2,3,4,5,6,1,1,2,9,9,2,7,7,8,4,3,1,2,3,4,5,6,9,
               1,2,3,4,5,6,1,1,2,9,9,2,7,7,8,4,3,1,2,3,4,5,6,9,
               1,2,3,4,5,6,1,1,2,9,9,2,7,7,8,4,3,1,2,3,4,5,6,9,
               1,2,3,4,5,6,1,1,2,9,9,2,7,7,8,4,3,1,2,3,4,5,6,9};
    int n = sizeof(D)/sizeof(int);

    cout << "The data:\n";
    printSpecial("D", D, n);
    cout << endl << endl;

    cout << "Mean: " << mean(D, n) << endl;
    cout << "Median: " << median(D, n) << endl;

    int mod, modeFreq = 0;
    mod = mode(D, n, modeFreq);
    cout << "Mode: " << mod << "\t Mode-Frequency: " << modeFreq << endl;

    cout << "Variance: " << variance(D, n) << endl;
    cout << "Standard Deviation: " << standardDeviation(D, n) << endl;
    return 0;
}

float mean(int D[], int n)
{
    // Write your code here...
    return 0.0;
}

float median(int D[], int n)
{
    // Write your code here...
    return 0.0;
}

int frequency(int D[], int n, int t)
{
    // Write your code here...
    return 0;
}

int mode(int D[], int n, int &mf)
{
    // Write your code here...
    return 0;
}

float variance(int D[], int n)
{
    // Write your code here...
    return 0.0;
}

float standardDeviation(int D[], int n)
{
    // Write your code here...
    return 0.0;
}
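The playground assumes that bubbleSort() and printSpecial() already exist. To compile the starter code outside the playground, minimal stand-ins could look like the sketch below. The bodies are assumptions rather than the lesson's own helpers; in particular, the output format of printSpecial() is a guess.

```cpp
#include <cassert>
#include <iostream>
using namespace std;

// Hypothetical stand-in: sorts A[0..n-1] in ascending order using bubble sort.
void bubbleSort(int A[], int n)
{
    assert(n >= 0);
    for (int pass = 0; pass < n - 1; pass++)
    {
        bool swapped = false;
        // After each pass, the largest remaining value settles at the end.
        for (int i = 0; i < n - 1 - pass; i++)
        {
            if (A[i] > A[i + 1])
            {
                int tmp = A[i];
                A[i] = A[i + 1];
                A[i + 1] = tmp;
                swapped = true;
            }
        }
        if (!swapped)
            break; // no swaps in this pass, so the array is already sorted
    }
}

// Hypothetical stand-in: prints the array as  Msg = {v0, v1, ...}
// (the real helper's output format is not shown in the lesson).
void printSpecial(const char Msg[], int D[], int size)
{
    cout << Msg << " = {";
    for (int i = 0; i < size; i++)
    {
        cout << D[i];
        if (i + 1 < size)
            cout << ", ";
    }
    cout << "}";
}
```

Note that bubbleSort() reorders the caller's array in place, so any function that calls it changes the order of the original data.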
1. Calculating the mean
The mean is the average of the values. We calculate the arithmetic mean by taking the sum of the values and dividing the sum by the total number of values.
// where n is the size of the array D[]
float mean(int D[], int n)
{
    float result = 0;
    // calculating the sum of the values of the array
    for (int di = 0; di < n; di++)
        result += D[di];
    // dividing the sum by the size of the array
    return result / n;
}
Instruction: Write the function in the above playground and test it.
Exercise: Calculating the grade
Write a function that calculates the student's overall grade percentage. You are given the weightage (percentage) of each instrument and the grade achieved in it (each grade is out of 100).
float theGrader(int Marks[ ], int Percentage [ ], int k)
Here is an example (having 5 instruments) with the weightage and earned marks percentage in each instrument:
Instrument# | Percentage | Grade (out of 100) |
---|---|---|
Assignment # 1 | 10% | 80 |
Assignment # 2 | 20% | 90 |
Midterm | 20% | 85 |
Project | 25% | 95 |
Final exam | 25% | 75 |
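The achieved percentage is as follows:
$$\frac{10 \times 80 + 20 \times 90 + 20 \times 85 + 25 \times 95 + 25 \times 75}{100} = \frac{8550}{100} = 85.5\%$$
The general formula for the achieved percentage is as follows:
$$\text{Achieved percentage} = \frac{1}{100}\sum_{i=1}^{k} M_i \times P_i$$
where $M_i$ represents the marks achieved in the $i$'th instrument, $P_i$ is the percentage of that instrument, and $k$ is the number of instruments.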
Instruction: Write the code in the following widget.
#include <iostream>
using namespace std;

// where k is the total number of instruments
float theGrader(int Marks[], int Percentage[], int k)
{
    // write code here...
    return 0.0;
}

int main()
{
    int Marks[] = {80, 90, 85, 95, 75};
    int Percentage[] = {10, 20, 20, 25, 25};
    int n = 5;
    cout << "The achieved grade is: " << theGrader(Marks, Percentage, n);
    return 0;
}
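After attempting the exercise, you can compare your answer with the sketch below. It is one possible solution, not the only one: it multiplies each grade by its weightage, sums the products, and divides by 100. Dividing by 100 assumes that the weightages add up to 100, as they do in the example table.

```cpp
#include <cassert>

// One possible solution sketch; assumes the values in Percentage[] sum to 100.
float theGrader(int Marks[], int Percentage[], int k)
{
    assert(k > 0);
    float total = 0;
    // accumulate grade * weightage for every instrument
    for (int i = 0; i < k; i++)
        total += Marks[i] * Percentage[i];
    // the weightages are percentages, so scale the weighted sum back down
    return total / 100;
}
```

For the table above, the weighted sum is 8550, so the function returns 85.5.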
2. Calculating the median
The median is the middle value. To calculate the median, we first need to sort the values in ascending or descending order.
If the total number of values is odd, we can easily find the middle value. However, if the total number of values is even, we take the average of the middle two values and that is the median.
Below, we’ve used bubble sort to sort the values first and then calculate the median.
Instruction: Write the code in the above exercise playground.
// Assume: sorting with bubble sort is already there
void bubbleSort(int A[], int n);

// finding the median
float median(int D[], int n)
{
    bubbleSort(D, n);
    // if even number of total values then take the average
    // of the middle two values
    if (n % 2 == 0)
        return float(D[n/2 - 1] + D[n/2]) / 2;
    // if odd number of values, then return the middle value
    else
        return D[n/2];
}
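To check the odd and even cases by hand, a self-contained variant can help. The sketch below uses std::sort from the standard library as a stand-in for bubbleSort(), and the name medianOf is hypothetical; the median logic itself mirrors the function above.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical self-contained variant of median(); std::sort replaces bubbleSort().
float medianOf(int D[], int n)
{
    assert(n > 0);
    std::sort(D, D + n); // sorts the caller's array in place, ascending
    if (n % 2 == 0)
        // even count: average of the two middle values
        return (D[n / 2 - 1] + D[n / 2]) / 2.0f;
    // odd count: the single middle value
    return D[n / 2];
}
```

With {5, 1, 3}, the sorted order is {1, 3, 5} and the median is 3. With {4, 1, 3, 2}, the sorted order is {1, 2, 3, 4} and the median is (2 + 3) / 2 = 2.5. In both versions the input array is left sorted after the call.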
3. Calculating the mode
Another way to describe data is through the mode, which is the value with the highest frequency in the data.
To calculate the mode, we need to know the frequencies of all elements. We will make the following function:
int frequency(int D[], int n, int t)
Here, D[] is the data of size n, and t is the value whose occurrences in D[] we count.
Idea:
The mode() function assumes the first value to be the mode value mv. We find the frequency of that first value and store it in mf.
Inside the for loop, we take each remaining value in turn, starting with the second, and compute its frequency f. If f is greater than mf, the highest frequency found so far, we update the mode value and its frequency (mv and mf, respectively). Because the comparison is strict, a value that only ties the current highest frequency does not replace it.
Lastly, after the entire array has been traversed, we return mv, which holds the value with the highest frequency.
int frequency(int D[], int n, int t)
{
    int f = 0;
    for (int di = 0; di < n; di++)
        if (D[di] == t)
            f++;
    return f;
}

// mv is the mode value, mf is the mode frequency
int mode(int D[], int n, int &mf)
{
    int mv = D[0];               // treating the first value as the mode, just like finding a max
    mf = frequency(D, n, D[0]);  // frequency of the first value
    for (int di = 1; di < n; di++)
    {
        int nv = D[di];              // taking the next value
        int f = frequency(D, n, nv); // next value's frequency
        // if the new value's frequency is bigger than the highest one so far
        if (f > mf)
            mf = f, mv = nv; // update it
    }
    return mv;
}
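The approach above calls frequency() once per element, so it scans the array n times. When the values are known to fall in a small range, a counting table is a common alternative. The sketch below is not the lesson's method: the name modeByCounting and the constant MAX_VALUE are hypothetical, and it assumes every value lies between 0 and 9, as in the playground data.

```cpp
#include <cassert>

const int MAX_VALUE = 9; // assumption: every value in D[] lies in 0..MAX_VALUE

// Alternative sketch: count each value once, then pick the largest count.
int modeByCounting(int D[], int n, int &mf)
{
    assert(n > 0);
    int counts[MAX_VALUE + 1] = {0};
    for (int di = 0; di < n; di++)
    {
        assert(D[di] >= 0 && D[di] <= MAX_VALUE);
        counts[D[di]]++; // counts[v] holds how many times v appears
    }
    int mv = 0;
    mf = counts[0];
    // scan the table for the value with the highest count
    for (int v = 1; v <= MAX_VALUE; v++)
        if (counts[v] > mf)
            mf = counts[v], mv = v;
    return mv;
}
```

The two versions can disagree on ties. For {7, 2, 7, 2}, this sketch reports the smallest tied value (2), while mode() above keeps the first one it meets (7); both report a frequency of 2.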
4. Calculating the variance and standard deviation
As the name suggests, the variance measures how much the values vary from the mean.
- We take the difference of each value from the mean and then square the differences (so that negative and positive differences do not cancel out).
- We then divide the sum of the squared differences by the total number of values.
Variance:
$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (D_i - \mu)^2$$
where $\mu$ is the mean of the data D[] and $n$ is its size.
The standard deviation is the square root of the variance, usually represented by $\sigma$.
float variance(int D[], int n)
{
    float m = mean(D, n);
    float var_value = 0;
    for (int di = 0; di < n; di++)
    {
        var_value += (D[di] - m) * (D[di] - m);
    }
    return var_value / n;
}

float standardDeviation(int D[], int n)
{
    float var = variance(D, n);
    return sqrt(var);
}
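The functions above divide by n, which gives the population variance. The same quantity can be computed in a single loop using the identity "mean of the squares minus the square of the mean". The sketch below illustrates that identity; the names varianceOnePass and standardDeviationOnePass are hypothetical, and double is used here to limit rounding error.

```cpp
#include <cassert>
#include <cmath>

// Alternative sketch: population variance as (mean of squares) - (mean)^2.
double varianceOnePass(int D[], int n)
{
    assert(n > 0);
    double sum = 0, sumSq = 0;
    for (int di = 0; di < n; di++)
    {
        sum += D[di];                   // running sum of the values
        sumSq += double(D[di]) * D[di]; // running sum of the squared values
    }
    double m = sum / n;
    return sumSq / n - m * m;
}

// The standard deviation is the square root of the variance.
double standardDeviationOnePass(int D[], int n)
{
    return std::sqrt(varianceOnePass(D, n));
}
```

As a hand check, take {2, 4, 4, 4, 5, 5, 7, 9}: the sum is 40, so the mean is 5; the sum of squares is 232, so the mean of squares is 29. The variance is 29 - 25 = 4 and the standard deviation is 2, which matches what the two-pass variance() gives for the same data.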