Beginner to Advanced Computing and Logic Building/

...

Introduction to Parallel Algorithms and Distributed Computing

Learn about parallel computing with the help of a real-life example.

We'll cover the following...

Problem statement
- Solution 1
Upgraded problem statement
- Solution 2
Final problem statement
Solution 3
- Parallelism in distributed computing
Exercise: Counting the number of atoms in the universe

Problem statement

How would we count the number of students present in the class?

Solution 1

A naive method to tackle this task would be to assign one student to oversee the counting process and count every student one by one. This procedure will take $N$ steps, which is the number of students in the class.

Upgraded problem statement

Suppose we are in a hall (which has a uniform structure containing some fixed rows and each row has the same fixed chairs) full of an audience, and we want to calculate the number of people. Can we solve this problem more efficiently?

Solution 2

Now as we know that the audience is well seated in a uniform arrangement of seats – each row has an equal number of people in it. We can count in a much more efficient way as compared to the Solution 1. For that, we will count the number of people in one row. Let’s say there are $C$ people in one row, and we have $R$ many rows then the total number of people will be rows count multiplied by the number of people in one row $C \times R$ .

Is Solution 2 better than Solution 1?. The cost of our current algorithm is the total number of rows which in this case is $R$ added to the $C$ because we invested our resources in counting $R$ and $C$ . Whereas the solution 1 costs $N$ .

Note: $R+C$ is way smaller than $N$ (which is equal to $R\times C$ ). Therefore, our current solution is quite efficient.

To get an idea of the comparison of both algorithms, check out their comparison in the following graph.

Press + to interact

In the graph above, just for convenience, we’ve taken squared $N$ (number of students). That is why we see a slight non-linear curve of Solution-1 above. More importantly, it can be clearly seen that Solution 1: ( $R \times C$ steps) is way too costly than Solution 2: ( $R+C$ steps), which is the curve of $\sqrt{N}$ .

Final problem statement

Consider a big stadium full of an audience. Our job is to count all the people.

Solution 1 will be quite difficult to implement. Further, that will take a lot of time.
As far as Solution 2 is concerned, we cannot implement that here as in stadium, the number of seats in each row is not the same.

How can we count all the people present there in an efficient way?

Solution 3

This time, we involve all people in the audience at once to count them all. First of all, we’ll ask everybody to bring out a piece of paper and write two pieces of information:

Turn number
People count

At the start of the process, everyone’s turn number is set to $0$ and people count is set to $1$ , as each individual has only counted themselves. Participants are then instructed to form groups of two among them and select one representative (they can choose either member of the group to be the representative) from each group. The non-representative members of the groups will sit down after handing over their papers to their elected representatives. The representatives will then increase the turn number by $1$ and write the sum of people count of both members of the group on their paper. As a result, each representative will now have updated values of turn number as $1$ and people count as $2$ on their papers.

Note: With just one step, half the population is not eligible to be counted.

Continuing the process, all representatives will form pairs again and pass on their papers to the newly nominated representatives. The updated values on each representative’s paper for turn number and people count will now be $2$ and $4$ respectively. The turn number has been incremented by $1$ , but each representative now represents $2+2=4$ people, as the people count written on both group members’ papers was $2$ .

This process will be repeated until there is only one person standing. If the turn number is $t$ for this person, the people count will be $2^t$ . This is because the people count has been updated in the following fashion: $1, 2, 4, 8, 16, 32, ...$ which is $2^0, 2^1, 2^2, 2^3, 2^4, 2^5, ....\space$ .

If we look at this idea carefully, it performs parallel counting of the crowd size in the stadium. Instead of one person for each row, everybody contributed to the task of counting. The people standing will be the active representatives that hold the summation of the population count to whom they represent. The count will look like this:
Initially, $N$ people will be standing. After the first step, $N/2$ people will sit down, and $N/2$ will become representatives. After the second step, $N/4$ people will be standing. After the third step, $N/8$ people will be standing, and so on.

Parallelism in distributed computing

The above example can be viewed as a helpful analogy to parallelism in distributed computinga model in which components of a software system are shared among multiple computers or nodes, but are run as one system in order to improve efficiency and performance. Here, we’re solving a single task with the help of all spectators present in the stadium. The task of counting the number of spectators has been distributed to count the people in the stadium.

In distributed computing, we use a number of networked computing devices to distribute a task we want to solve. It is different from parallel computing in terms that in parallel computing, the multiple processors use shared memory to communicate with each other, unlike distributed computing systems.

The shrinking sequence of the representatives will be:

N, \frac{N}{2}, \frac{N}{4}, \frac{N}{8}, \frac{N}{16}, \space ... \space , 1

It will take $\log_2N$ steps to count all the audience. This is an extremely powerful way of counting. Here’s a more admirable analogy.

If we look at the population right now, which is roughly $7.5$ billion to be approximately $7,524,594,281$ , this technique will take $\log_2 (7524594281)$ steps, which are approximately a total of $33$ iterations.

That is the power of parallelism.

Exercise: Counting the number of atoms in the universe

Imagine a hypothetical scenario where we want to count the number of total atoms in the universe. We have the leverage of our artistic imagination, such that each atom is a small computer that can perform addition.

We know that the number of atoms in the universe are $\approx 10^{82}$ . Using the distribution strategies discussed above, how many steps will it take to calculate the exact count of the total atoms?

Getting Started

Computing in the Real World

Introduction to C++ (Table Printing)

GNU Debugger Tutorial

Memory, Operations and Control Structure

Divide, Conquer, and Rule

Iteration, Reusability and Extensibility

Introduction to Arrays

Problem Solving With Arrays

Multi-dimensional Arrays

Sliding Window

Conclusion