Introduction to Parallel Algorithms and Distributed Computing

Problem statement

How would we count the number of students present in the class?

Solution 1

A naive method to tackle this task would be to assign one student to oversee the counting process and count every student one by one. This procedure will take NN steps, which is the number of students in the class.

Upgraded problem statement

Suppose we are in a hall (which has a uniform structure containing some fixed rows and each row has the same fixed chairs) full of an audience, and we want to calculate the number of people. Can we solve this problem more efficiently?

Solution 2

Now as we know that the audience is well seated in a uniform arrangement of seats – each row has an equal number of people in it. We can count in a much more efficient way as compared to the Solution 1. For that, we will count the number of people in one row. Let’s say there are CC people in one row, and we have RR many rows then the total number of people will be rows count multiplied by the number of people in one row C×RC \times R.

Is Solution 2 better than Solution 1?. The cost of our current algorithm is the total number of rows which in this case is RR added to the CC because we invested our resources in counting RR and CC. Whereas the solution 1 costs NN.

Note: R+CR+C is way smaller than NN (which is equal to R×CR\times C). Therefore, our current solution is quite efficient.

To get an idea of the comparison of both algorithms, check out their comparison in the following graph.

Press + to interact
Comparing the counting algorithms
Comparing the counting algorithms

In the graph above, just for convenience, we’ve taken squared NN (number of students). That is why we see a slight non-linear curve of Solution-1 above. More importantly, it can be clearly seen that Solution 1: (R×CR \times C steps) is way too costly than Solution 2: (R+CR+C steps), which is the curve of N\sqrt{N}.

Final problem statement

Consider a big stadium full of an audience. Our job is to count all the people.

Solution 1 will be quite difficult to implement. Further, that will take a lot of time.
As far as Solution 2 is concerned, we cannot implement that here as in stadium, the number of seats in each row is not the same.

How can we count all the people present there in an efficient way?

Solution 3

This time, we involve all people in the audience at once to count them all. First of all, we’ll ask everybody to bring out a piece of paper and write two pieces of information:

  1. Turn number
  2. People count

At the start of the process, everyone’s turn number is set to 00 and people count is set to 11, as each individual has only counted themselves. Participants are then instructed to form groups of two among them and select one representative (they can choose either member of the group to be the representative) from each group. The non-representative members of the groups will sit down after handing over their papers to their elected representatives. The representatives will then increase the turn number by 11 and write the sum of people count of both members of the group on their paper. As a result, each representative will now have updated values of turn number as 11 and people count as 22 on their papers.

Note: With just one step, half the population is not eligible to be counted.

Continuing the process, all representatives will form pairs again and pass on their papers to the newly nominated representatives. The updated values on each representative’s paper for turn number and people count will now be 22 and 44 respectively. The turn number has been incremented by 11, but each representative now represents 2+2=42+2=4 people, as the people count written on both group members’ papers was 22.

This process will be repeated until there is only one person standing. If the turn number is tt for this person, the people count will be 2t2^t. This is because the people count has been updated in the following fashion: 1,2,4,8,16,32,...1, 2, 4, 8, 16, 32, ... which is 20,21,22,23,24,25,.... 2^0, 2^1, 2^2, 2^3, 2^4, 2^5, ....\space.

If we look at this idea carefully, it performs parallel counting of the crowd size in the stadium. Instead of one person for each row, everybody contributed to the task of counting. The people standing will be the active representatives that hold the summation of the population count to whom they represent. The count will look like this:
Initially, NN people will be standing. After the first step, N/2N/2 people will sit down, and N/2N/2 will become representatives. After the second step, N/4N/4 people will be standing. After the third step, N/8N/8 people will be standing, and so on.

Parallelism in distributed computing

The above example can be viewed as a helpful analogy to parallelism in distributed computinga model in which components of a software system are shared among multiple computers or nodes, but are run as one system in order to improve efficiency and performance. Here, we’re solving a single task with the help of all spectators present in the stadium. The task of counting the number of spectators has been distributed to count the people in the stadium.

In distributed computing, we use a number of networked computing devices to distribute a task we want to solve. It is different from parallel computing in terms that in parallel computing, the multiple processors use shared memory to communicate with each other, unlike distributed computing systems.

The shrinking sequence of the representatives will be:

N,N2,N4,N8,N16, ... ,1N, \frac{N}{2}, \frac{N}{4}, \frac{N}{8}, \frac{N}{16}, \space ... \space , 1

It will take log2N\log_2N steps to count all the audience. This is an extremely powerful way of counting. Here’s a more admirable analogy.

If we look at the population right now, which is roughly 7.57.5 billion to be approximately 7,524,594,2817,524,594,281, this technique will take log2(7524594281)\log_2 (7524594281) steps, which are approximately a total of 3333 iterations.

That is the power of parallelism.


Exercise: Counting the number of atoms in the universe

Imagine a hypothetical scenario where we want to count the number of total atoms in the universe. We have the leverage of our artistic imagination, such that each atom is a small computer that can perform addition.

We know that the number of atoms in the universe are 1082\approx 10^{82}. Using the distribution strategies discussed above, how many steps will it take to calculate the exact count of the total atoms?