Algorithmic Efficiency: Guessing Game

Description

Let us play a guessing game where one player thinks of a random number within a range, and the rest of the players try to guess that number.

We will try to simulate this game but with a slightly modified objective which is to develop an algorithmA set of rules that must be followed when solving a particular problem (ref. Oxford Dictionary) that reduces the number of guesses to a minimum number of tries.

Algorithm development

Imagine a classroom where a teacher asks the students to guess a number within the range of 11 to 10001000. The students will bombard the teacher with random tries. Someone may ask, Is it 1? Or 21? Or 65? Or 330? Or 500? Or maybe 999? Or even 1000? Whatever the number really is, it will be extremely hard to guess right. In the worst case, students may have to make 10001000 tries to guess that number.

Now, the question is, can we guess this number in fewer tries? What if the teacher allows True/False questions too?

If someone asks, “Is it a prime number?” and the teacher replies that it is not a prime number. Then this attempt will reduce the number of possibilities to 832832, as the number of primes within the range of 11 to 10001000 is 168168.

What if someone asks “Is the number even or odd?” and the teacher replies that it is an even number. Now this question cuts down 500500 tries (odd numbers) at once and leaves only 500500 even numbers to guess from.

Someone may think of dividing the problem space into intervals of say 100s. For example, a student asks, “Is the number less than a 100?” If the teacher replies yes, we can guess the number in only a hundred tries (asking about each number from 1–99). But in case the teacher replies no, a student will probably ask if the number is in the next interval, such as less than 200 but greater than 100. Now, if the teacher replies yes, then again we’ll be able to guess the number in 101101 tries: 2 for determining the interval and the rest for trying every possibility within that interval.

If the number is between 900900 to 10001000, this strategy of searching by interval will take 109109 tries to guess the number which is much better than our previous naive strategies.

Now that we understand the questions, let’s jump into the classroom.

The classroom

So far, we have found that the most efficient way is to identify the interval first, and then within the range of 100100s, we try every possibility. There are two points to consider:

  1. Trying every possibility leads to too many tries. Can we shrink the interval size further and repeat the same procedure?
  2. In our interval search strategy, our question answers (“yes/no”) strategy was creating unequal distributions. Like in our first question: Does the number lie within the range 11001-100, “no” was yielding a remaining range of 900900 tries, and “yes” was yielding 100 tries.

In a worst-case scenario, it is possible that the desired number always lies in the bigger interval range consuming more tries.

Can we do better?

Yes, we can. What if we first divide the number (in general it should be the range size1000 in our case. So it would be 1000/2 = 500) by 22 and then ask if the number is in the first half or in the second half (by asking something like this: Is the number less than or equal to the middle number?). After that we can again divide it into ranges of 100100 numbers at a time and, in the worst case, we can solve it in 105105 steps. Now, we have a systematic way to solve a problem: divide the number in different parts and then solve it.

Can we still do better?

Yes, we can repeat the step of dividing the range into two parts and select the one for which the teacher nods yes. For example, we first take the middle of the range and ask the teacher if the number is in the first or second half. Whatever the teacher answers, select that range. Let’s suppose that the number is in the second half; now take the middle of the 2nd half and ask the teacher again by taking the middle and repeating the same procedure until we find the number (keep shrinking the range until the range becomes 11 number and that is our target).

Example

Let’s consider that the target number is 571571 and the range is again 110001-1000. First, we divide 10001000 by 22, which is 500500, and ask if this number is greater than 500500 or not, or if it is in the first half or second half? As 571571 is in the second half we’ll find the middle of the second half ((by adding the two extremes and dividing by 2: 501+10002=750\frac{501+1000}{2}=750 )). Then, the question is whether it is greater than 750750 or not? As it is not, we’ll divide this half again by 22 and again ask if this number is greater than 625625 or not and so on. Here is the table of the questions and the halving range with respect to that the guessed range is shrinking slowly.

Step Range Guess Answer Shrunk Range
1 1–1000 <501 No 501-1000
2 501–1000 <751 Yes 501-750
3 501–750 <626 Yes 501-625
4 501–625 <562 No 562-625
5 562–625 <594 Yes 562-593
6 562–593 <578 Yes 562-577
7 562–577 <569 No 569-577
8 569–577 >573 No 569-573
9 569–573 >570 Yes 571-572
10 571–572 >571 No 571 (answer)

In computer science, we call this halving strategy of finding a number within a range as binary searching.

Let us write down the step-by-step algorithm, in pseudocodea technique used to describe the steps of an algorithm or a coding logic in plain English.

Algorithm

if lowlow is the starting element, and highhigh is the ending element of that range, then midmid will be low+high2\bm{\frac{low+high}{2}}.

Step 1: Initialize low=1,high=maximum_size,mid=(low+high)/2low = 1, high = maximum\_size, mid = (low + high) / 2

Step 2: Repeat the whole process until mid=targetmid = target
    \space \space \space \space Step 2.1: If mid<targetmid < target, change highhigh to mid1mid-1
    \space \space \space \space Step 2.2: If mid>targetmid > target, change lowlow to mid+1mid+1
    \space \space \space \space Step 2.3: Recompute mid=(low+high)/2mid = (low + high) / 2

Step-3: mid holds the target value

Let’s take a look at the following interactive animation for better understanding.

Binary searching
Question 1

If finding 571571 in the range of 110001-1000 took 10 tries through binary search, then how many tries will it take to find a number 571571 for the range 120001-2000.

Show Answer
Question 2

How many steps will it take to find a number 571571 for the range 40004000.

Show Answer

How efficient is the binary search!

Assuming that 1N1-N was the original range, of size NN, in which we needed to find a target, binary search will shrink the range size by following the sequence:

N, N2, N4, N8, ... , 8, 4, 2, 1N, \space \frac{N}{2}, \space \frac{N}{4}, \space \frac{N}{8}, \space ...\space , \space 8, \space 4, \space 2, \space 1

Hence our binary search algorithm will take log2Nlog_2 N steps. For example: In our case where NN is 10001000, binary search will take only log2100010log_2 1000 \approx 10.


Exponentiation vs. logarithm

Exponentiation is a mathematical operation, written as bnb^n, involving two numbers: the base “b” and the exponent “n”. Where n is a positive integer, exponentiation corresponds to repeated multiplication of the base b. That is, bnb^n is the product of multiplying b with itself n times. Similarly, if we start from a number N and we keep dividing that by b until it becomes 11, that quantity is called logbN\log_b N, e.g. log21024\log_2 1024 is 1010 as 2102^{10} = 1024. In the same vein, log28\log_2 8 is 33. Similarly, log327\log_3 27 is 33, if the number N is not a perfect base of b. Then, the result will be an approximation. For example, log328\log_3 28 is 3.0331....3.0331....
Both logarithm and exponentiation are related. A beautiful analogy to understand the two quantities is the as follows. If we start from a quantity 11, and keep multiplying by a constant, (let us say 10) then the quantity will increase in the following fashion

1,10,100,1000,10000,100000,1000000,1, 10, 100, 1000, 10000, 100000, 1000000, …

1,10,102,103,104,105,106,107,1, 10, 10^2, 10^3, 10^4, 10^5, 10^6, 10^7, …

The number of steps, it will take to reach from 11 to 108210^{82} will be log101082=>82log_{10} 10^{82} => 82. If we say something is growing exponentially, it means it is growing very fast.
Just a little fact to remember; the total number of atoms in the known universe is around 108210^{82}. Now, if you start from a quantity of 11 and keep multiplying by 1010 then in only 8282 steps we’ll reach as big a number as the number of atoms in the universe.


Naive approach vs. binary search

To realize how efficient binary searching is, consider that we have a supercomputer that has a processor with a clock speed of 10610^6 GHz (Which means it generate 106×109=101510^{6} \times 10^{9} = 10^{15} clock ticks in one second). Assume it can compute execute 101510^{15} instructions in 1 second (for simplicity assume that in every clock tick it executes one instruction, though in reality a computer takes several thousand clock ticks to execute one instruction).

Now, imagine that our teacher has labeled the entire atoms in the universe with a unique number, hence a range between 1, ... , 10821, \space ... \space,\space 10^{82}, and they have picked a random atom, and we are challenged with finding the label of that atom. Now, this problem is the same as finding a number from a range of 1...10821 ... 10^{82}.

If the computer finds that number using iterative approachchecking each number one by one (by exhaustively trying every possibility, which was our first approach) then it will take 10821015.24.60.601.2×1062days3.2×1059\frac{10^{82}}{10^{15}.24.60.60}\approx1.2 \times 10^{62}\text{days} \approx 3.2\times 10^{59} years. In other words, it will take millions of millions of centuries for our supercomputer to find that number.

On the other hand, if we use the binary search algorithm, only 82×log21027382 \times \log_2 10 \approx 273 steps are required. Hence, our supercomputer will solve it in, 2731015\frac{273}{10^{15}} seconds (less than one-millionth part of a second) by that same computer.

This is the difference in efficiency we talked about.


Why search for an efficient strategy?

A computer scientist’s job is not just to come up with a solution, rather the real challenge is to come up with an Algorithm that is most efficient in terms of space and time. That is the standard every computer scientist aspires to achieve.

We’ll discover in this journey several computational problem-solving algorithms and learn strategies through writing codes in C++.


Exercise (Covid-101)

Consider the following scenario: we have an infinite population of a unicellular organism called Dinos (N Dinos) living in a single-dimensional world in a single row.

Accidently, the first dino has been infected by a deadly virus named Covid-101. This virus spreads from one dino to the other in a linear order (an un-infected dino can get an infection from the infected neighbor only). It took a while to become aware of this viral infection, and it has already infected a huge number of dinos.

Fortunately, Dr. Hakeem Luqman has made the antidote known as phakkee for that virus. There are some important considerations to make:

  1. The phakkee is very difficult and time consuming to make. Hence, making an infinite number of them and giving it to everyone will not be plausible.

  2. During the time it takes to test one dino, it can infect its neighbor.

  3. The infected dino can not be saved.

  4. If a dino is vaccinated the virus cannot infect it. Hence, infection will not proceed through a vaccinated dino (who is yet to be attacked by the virus). Therefore, a vaccinated dino is a guarantee to safeguard the rest of the population to its right.

We need to save the maximum number of dinos from Covid-101. How do we design an efficient algorithm that can save as many dinos as possible?