What is branch prediction?

Branch prediction is a technique to predict the outcome of a conditional operation. It's an essential part of modern CPUs. They contain a dedicated hardware unit called a branch prediction unit (BPU). BPU can predict which branch will be chosen with high confidence given the current encoded instruction.

Why is branch prediction needed?

Modern CPUs implement pipelining to ensure that no cycle goes wasted. Pipeline ensures that no portion of the CPU is sitting idle waiting for the information to be fetched or decoded, so the next instruction is fetched. In contrast, the current instruction is being decoded.

However, when a branch instruction comes to the CPU, it's not sure which instruction to fetch next as that depends on the current instruction’s result. This leads to cycles being wasted as the CPU waits for the next instruction to be fetched once the current instruction finishes its execution. To counter this problem, BPU predicts the branch, and the CPU fetches it while the current instruction is being decoded. This can lead to one of the two situations as follows:

Correct prediction: This will result in no wastage of cycles resulting in faster execution.
Incorrect prediction: This will cause the CPU to fetch the correct instruction and discard the already-fetched instruction, leading to the wastage of cycles.

Accurate prediction techniques need to be used to increase the efficiency of the CPUs. Branch prediction techniques can be divided into two main categories given as follows:

Static branch prediction
Dynamic branch prediction

Static branch prediction

The static branch prediction is the most straightforward branch prediction technique as it always predicts the same whenever it sees a specific branch. The possible decisions of a static branch predictor are as follows:

A branch is taken.
A branch is not taken.
The forward branch (if-else) is not taken, and the backward branch (loop) is taken.

These decisions remain the same and do not rely on the branch’s history.

Example

To understand it further, we consider a simple example where we populate an array of size $10$ with values between $0$ and $99$ inclusive. Then we sort the array and then traverse through the array to see if an index contains a value greater than equal to $50$ . The working of static branch prediction is shown below:

Note: On each iteration of the for loop, the branch prediction also takes place, but for simplicity, we ignore that in the example above.

We can see that predictions by the static branch prediction technique resulted in the correct branch half the time and wrong in the other half, making the accuracy $50\%$ , which is decent. Still, modern CPUs can not rely upon static branch prediction.

Dynamic branch prediction

Dynamic branch predictors rely on the past results of the conditional operation to predict whether to take the branch or not. There are many dynamic branch predicting techniques. However, the most popular ones are as follows:

1-bit branch prediction
2-bit branch prediction

The 1-bit prediction

The 1-bit dynamic branch prediction technique relies on the most recent result of the conditional operator to predict the next branch. To understand this further, we consider the following example:

We can see that the 1-bit dynamic branch predictor predicts the correct branch $8$ out of $10$ times which is an accuracy of $80\%$ . This represents a very good improvement over the static predictor, which predicts with a $50\%$ accuracy.

However, the data in this example suits the 1-bit dynamic predictor, so this accuracy can decrease or increase depending upon the data. In most cases, a 1-bit dynamic predictor will still provide better accuracy than the static predictor.

The 2-bit prediction

The 2-bit predictor is another type of dynamic branch predictor. This considers the last two outputs of the conditional operator. It changes its output when its prediction is false twice. The 2-bit predictor has a 4-state Finite Automata (FA) compared to the 2-state FA of the 1-bit dynamic branch predictor.

The table above shows the states in $2$ bits. These bits help us predict the branch. The most significant bit (left bit) is used to predict whether the branch will be taken or not. If the bit is $1$ , it indicates that the branch will be taken, else not.

The 2-bit predictor achieves better accuracy with nested loops than the 1-bit predictor. We consider another example to understand the working of a 2-bit dynamic branch predictor. In this example, we run the outer loop $100$ times, and the inner loop runs till the a is less than $10$ and is reset to $0$ in the outer loop.

#include <algorithm>
#include <ctime>
#include <iostream>
void SortData(int data[], int size)
{
    for(int i = 0; i < size;i++)
    {
        for(int j = 0; j < size;j++)
        {
            if(data[j] > data[i])
            {
                int temp = data[j];
                data[j] = data[i];
                data[i] = temp;
            }
        }
    }
}
int main()
{
    int size = 10;
    int data[size]; // Declare an array of size 10
    for (int i = 0; i < size; i++)
    {
        data[i] = std::rand() % 100;
    }
    clock_t start = clock();
    // Sort the data
    SortData(data, size); // Feel free to comment it
    for(int i = 0; i < 100000000 ; i++)
    {
        for (int j = 0; j < size; j++)
        {   
            if (data[j] >= 50){}
        }
    }
    double elapsedTime = static_cast<double>(clock()-start) / CLOCKS_PER_SEC;
    std::cout << elapsedTime << '\n';
}

Explanation

The explanation of the code is as follows:

Line 28: We add random data between and including $0$ and $99$ to the data.
Line 33: The SortData() is called to sort the data in ascending order.
Line 39: The branching condition is implemented.

Time complexity

The time complexity is $O(n^2)$ because of the sort function. However, if we do not sort the data, the time complexity reduces $O(n)$ . This would suggest that the code will execute faster without sorting, but in reality, the opposite happens. The code with higher time complexity executes faster. The next question that comes to mind is how and why. Well, this is because of branch prediction.

Analysis

To understand the discrepancy in the run time, we must understand the problem more clearly.

Free Resources

Learn in-demand tech skills in half the time

PRODUCTS

Mock Interview

New

Courses

Skill Paths

Projects

Assessments

State	Value
Strongly likely taken	11
Likely taken	10
Likely not taken	01
Strongly not likely taken	01

a	1^st iteration	2^nd iteration	3^rd iteration	100^th iteration
0	false	true	true	true
1	false	true	true	true
2	true	true	true	true
3	true	true	true	true
4	true	true	true	true
5	true	true	true	true
6	true	true	true	true
7	true	true	true	true
8	true	true	true	true
9	true	true	true	true
10	false	false	false	false