In this lesson, we’ll learn about two special kinds of values, unique and distinct, and how to compute them for given data. A unique value appears in the data exactly once. A distinct value is any different value that occurs in the data; it is counted once, no matter how many times it repeats.

The terms “unique” and “distinct” are used to describe the elements of a set or collection.

A “unique” element is an element that appears only once in a set or collection. If an element appears more than once in a set, it is not considered to be unique.

A “distinct” element is any different value that occurs in a set or collection. It doesn’t matter how many times the value appears: however often it repeats, it counts as one distinct element.

In other words, all unique elements are distinct, but not all distinct elements are unique.

For example, consider the set [1, 2, 3, 4]. Elements 1, 2, 3, and 4 are all unique and distinct elements of the set. If we add element 4 again, the set becomes [1, 2, 3, 4, 4]. In this set, the elements 1, 2, 3, and 4 are still distinct, but 4 is no longer unique because it appears twice in the set.

Finding unique/distinct elements in data

In this lesson, we're given some data and need to compute all of its unique and distinct elements. Along with the distinct elements, we also compute each element's frequency and display a histogram in which each element's frequency is shown as a row of asterisks.

Sample program

The data:
D:	 = { 
		1 2 3 4 3 5 6 1 2 2 
		9 9 2 7 7 8 2 3 1 2 
		3 1 5 6 9 1 2 3 1 5 
		6 1 1 2 9 9 2 7 7 8 
		2 3 1 2 3 5 5 6 9 1 
		2 3 8 5 6 1 1 2 9 9 
		2 7 7 8 3 3 1 2 3 6 
		5 6 9 1 2 3 2 5 6 1 
		1 2 9 9 2 7 7 8 3 3 
		1 2 3 1 5 6 9 
	 }


Unique:
Us:	 = { 
		4 
	 }
Distinct:
Ds:	 = { 
		1 2 3 4 5 6 9 7 8 
	 }
Frequencies:
Fs:	 = { 
		18 20 15 1 9 9 12 8 5 
	 }
__________________________________________

The plot:

1	******************	18	
2	********************	20	
3	***************	15	
4	*	1	
5	*********	9	
6	*********	9	
9	************	12	
7	********	8	
8	*****	5	

Instruction: Write the code in the following playground. The main() function for testing is already provided.

#include <iostream>
#include <math.h>
using namespace std;

void printSpecial(const char Msg[ ], int D[ ], int size);
void findAllUniques(int D[ ], int N, int Us[ ], int &USize);
void findAllDistincts(int D[ ], int N, int Ds[ ], int &DSize);
void calculateDistinctsFrequency(int D[ ], int N, int Ds[ ], int &DSize, int Fs[ ]);
void displayDistinctsFrequencyGraph(int Ds[ ], int &DSize, int Fs[ ]);

int main()
{
    const int capacity = 100;
    int D[ ] = { 1,2,3,4,3,5,6,1,2,2,
                 9,9,2,7,7,8,2,3,1,2,
                 3,1,5,6,9,1,2,3,1,5,
                 6,1,1,2,9,9,2,7,7,8,
                 2,3,1,2,3,5,5,6,9,1,
                 2,3,8,5,6,1,1,2,9,9,
                 2,7,7,8,3,3,1,2,3,6,
                 5,6,9,1,2,3,2,5,6,1,
                 1,2,9,9,2,7,7,8,3,3,
                 1,2,3,1,5,6,9},
        Us[capacity], USize = 0, Fs[capacity], Ds[capacity], DSize = 0;
    // The sizes start at 0 so the program is safe to run before the functions are written.
    int N = sizeof(D)/sizeof(int); // compute the number of elements instead of counting by hand

    cout << "The data:\n";
    printSpecial("D:\t", D, N); cout << endl << endl;

    findAllUniques(D, N, Us, USize);
    cout << "Unique:\n";
    printSpecial("Us:\t", Us, USize);

    findAllDistincts(D, N, Ds, DSize);
    cout << "Distinct:\n";
    printSpecial("Ds:\t", Ds, DSize);

    calculateDistinctsFrequency(D, N, Ds, DSize, Fs);
    cout << "Frequencies:\n";
    printSpecial("Fs:\t", Fs, DSize);

    // Separator first, then the plot heading, as in the sample output above.
    cout << "__________________________________________\n\n";
    cout << "The plot:\n\n";
    displayDistinctsFrequencyGraph(Ds, DSize, Fs);
    return 0;
}

// Returns how many times T occurs in D[0..N-1].
int frequency(int D[ ], int N, int T)
{
    int f=0;
    for(int di=0; di<N; di++)
        if(D[di]==T)
            f++;
    return f;
}

void findAllUniques(int D[ ], int N, int Us[ ], int &USize)
{
    // Write code here.
}

void findAllDistincts(int D[ ], int N, int Ds[ ], int &DSize)
{
    // Write code here.
}

void printSpecial(const char Msg[ ], int D[ ], int size)
{
    cout << Msg << " = { ";
    for(int i=0; i<size; i++)
    {
        cout << D[i] << " ";
    }
    cout << " }" << endl;
}

// Prints the character sym k times on the current line.
void printASymbolKTimes(char sym, int k)
{
    for(int i=1; i<=k; i++)
        cout << sym;
}

void calculateDistinctsFrequency(int D[ ], int N, int Ds[ ], int &DSize, int Fs[ ])
{
    for(int di=0; di<DSize; di++)
    {
        Fs[di] = frequency(D, N, Ds[di]);
    }
}

void displayDistinctsFrequencyGraph(int Ds[ ], int &DSize, int Fs[ ])
{
    for(int di=0; di<DSize; di++)
    {
        cout << Ds[di] << "\t";
        printASymbolKTimes('*', Fs[di]);
        cout << "\t" << Fs[di] << "\t" << endl;
    }
}

Finding the unique values in an array

In the given data, the elements whose frequency is exactly one are the unique values.

To find all the unique values, let's write the following function:

void findAllUniques(int D[ ], int N, int Us[ ], int &USize)

This function iterates through the array D[] of size N, saves every unique value in the Us[] array, and stores in USize the number of values saved in Us[].

Implementation for finding uniques in the data

The idea is that for each element D[di] in the data, we compute its frequency in the entire data (by calling the frequency() function). If it occurs only once, we save it in the array of unique values, Us[].

Here’s its implementation:

void findAllUniques(int D[ ], int N, int Us[ ], int &USize)
{
    USize = 0;
    for(int di=0; di<N; di++)
    {
        if(frequency(D, N, D[di])==1) // D[di] is unique
        {
            Us[USize] = D[di]; // store it in the next free slot of Us[]
            USize++;
            // The above two instructions can be shortened to Us[USize++] = D[di];
        }
    }
}

Finding the distinct values in an array

The distinct values are the different values that occur in the data, with all repetitions removed. So, in the result, each value appears exactly once.

Implementation of finding all distinct values in the data

To compute the distinct elements, we build an array of distinct elements, Ds[], which starts out empty (DSize is 0). For every element D[di] of the data, we first check whether it is already present in Ds[] by checking that its frequency inside Ds[] is zero. If it is not present yet, we append D[di] at position Ds[DSize] and increment DSize.

Here’s the implementation:

void findAllDistincts(int D[ ], int N, int Ds[ ], int &DSize)
{
    DSize = 0;
    for(int di=0; di<N; di++)
    {
        // If D[di] is appearing for the first time in Ds array
        if(frequency(Ds, DSize, D[di])==0)
        {
            // add in distincts array Ds
            Ds[DSize] = D[di], DSize++;
        }
    }
}

Exercise: Finding the frequencies of distinct values

To find the frequency of each distinct element in the data, we write the following function:

void calculateDistinctsFrequency(int D[], int N, int Ds[ ], int &DSize, int Fs[ ]);
  • int D[] is the data in which we have to search the frequency of each element.
  • int N is the size of the data.
  • int Ds[] is the array of the distinct values already computed by findAllDistincts().
  • int &DSize is the size of Ds[], that is, the number of distinct values.
  • int Fs[] is the output array we need to compute (also of size DSize) such that each Fs[i] holds the frequency of Ds[i] in the data D[].

To compute Fs[i], the frequency of the element Ds[i] in the data, we call the frequency() function.

Instruction: Write the implementation of this function in the above playground.

Exercise: Displaying distinct elements/frequency graph

Now, to display the frequency of each distinct element as a row of asterisks, we write the following function:

void displayDistinctsFrequencyGraph(int Ds[ ], int &DSize, int Fs[ ]);
  • int Ds[] holds the distinct values; it is passed because each row of the graph starts with the distinct element it describes.
  • int &DSize is the number of distinct elements in the data, which is the size of both Ds[] and Fs[].
  • int Fs[] holds the frequency of each distinct element in the data. Ds[i] and Fs[i] form a pair (a distinct element and its frequency) that is stored in two separate arrays at the same index.