Branch prediction is a technique to predict the outcome of a conditional operation. It's an essential part of modern CPUs. They contain a dedicated hardware unit called a branch prediction unit (BPU). BPU can predict which branch will be chosen with high confidence given the current encoded instruction.
Modern CPUs implement pipelining to ensure that no cycle goes wasted. Pipeline ensures that no portion of the CPU is sitting idle waiting for the information to be fetched or decoded, so the next instruction is fetched. In contrast, the current instruction is being decoded.
However, when a branch instruction comes to the CPU, it's not sure which instruction to fetch next as that depends on the current instruction’s result. This leads to cycles being wasted as the CPU waits for the next instruction to be fetched once the current instruction finishes its execution. To counter this problem, BPU predicts the branch, and the CPU fetches it while the current instruction is being decoded. This can lead to one of the two situations as follows:
Correct prediction: This will result in no wastage of cycles resulting in faster execution.
Incorrect prediction: This will cause the CPU to fetch the correct instruction and discard the already-fetched instruction, leading to the wastage of cycles.
Accurate prediction techniques need to be used to increase the efficiency of the CPUs. Branch prediction techniques can be divided into two main categories given as follows:
Static branch prediction
Dynamic branch prediction
The static branch prediction is the most straightforward branch prediction technique as it always predicts the same whenever it sees a specific branch. The possible decisions of a static branch predictor are as follows:
A branch is taken.
A branch is not taken.
The forward branch (if-else) is not taken, and the backward branch (loop) is taken.
These decisions remain the same and do not rely on the branch’s history.
To understand it further, we consider a simple example where we populate an array of size
Note: On each iteration of the for loop, the branch prediction also takes place, but for simplicity, we ignore that in the example above.
We can see that predictions by the static branch prediction technique resulted in the correct branch half the time and wrong in the other half, making the accuracy
Dynamic branch predictors rely on the past results of the conditional operation to predict whether to take the branch or not. There are many dynamic branch predicting techniques. However, the most popular ones are as follows:
1-bit branch prediction
2-bit branch prediction
The 1-bit dynamic branch prediction technique relies on the most recent result of the conditional operator to predict the next branch. To understand this further, we consider the following example:
We can see that the 1-bit dynamic branch predictor predicts the correct branch
However, the data in this example suits the 1-bit dynamic predictor, so this accuracy can decrease or increase depending upon the data. In most cases, a 1-bit dynamic predictor will still provide better accuracy than the static predictor.
The 2-bit predictor is another type of dynamic branch predictor. This considers the last two outputs of the conditional operator. It changes its output when its prediction is false twice. The 2-bit predictor has a 4-state Finite Automata (FA) compared to the 2-state FA of the 1-bit dynamic branch predictor.
This FA is complicated compared to the one we saw in the example above. Although the basics remain the same, if the conditional operator runs for the first time, the initial prediction is randomly generated. After that, the states are changed according to the latest output of the operator.
The question remains how will the predictions be made from the states of the FA shown above? Well, to understand, we have to look at the table below:
State | Value |
Strongly likely taken | 11 |
Likely taken | 10 |
Likely not taken | 01 |
Strongly not likely taken | 01 |
The table above shows the states in
The 2-bit predictor achieves better accuracy with nested loops than the 1-bit predictor. We consider another example to understand the working of a 2-bit dynamic branch predictor. In this example, we run the outer loop
The above example shows that in the outer loop's first iteration, the 2-bit predictor predicted false
a | 1st iteration | 2nd iteration | 3rd iteration | 100th iteration |
0 | false | true | true | true |
1 | false | true | true | true |
2 | true | true | true | true |
3 | true | true | true | true |
4 | true | true | true | true |
5 | true | true | true | true |
6 | true | true | true | true |
7 | true | true | true | true |
8 | true | true | true | true |
9 | true | true | true | true |
10 | false | false | false | false |
We can see a trend in the above table. Using this, we can calculate the accuracy of the 2-bit predictor as shown below:
This shows a significant improvement over the 1-bit predictor with an accuracy of
In this example, we'll write the code of the first example.
Note: The loop on line 35 increases the execution time to be easily readable.
#include <algorithm>#include <ctime>#include <iostream>void SortData(int data[], int size){for(int i = 0; i < size;i++){for(int j = 0; j < size;j++){if(data[j] > data[i]){int temp = data[j];data[j] = data[i];data[i] = temp;}}}}int main(){int size = 10;int data[size]; // Declare an array of size 10for (int i = 0; i < size; i++){data[i] = std::rand() % 100;}clock_t start = clock();// Sort the dataSortData(data, size); // Feel free to comment itfor(int i = 0; i < 100000000 ; i++){for (int j = 0; j < size; j++){if (data[j] >= 50){}}}double elapsedTime = static_cast<double>(clock()-start) / CLOCKS_PER_SEC;std::cout << elapsedTime << '\n';}
The explanation of the code is as follows:
Line 28: We add random data between and including data
.
Line 33: The SortData()
is called to sort the data in ascending order.
Line 39: The branching condition is implemented.
The time complexity is
To understand the discrepancy in the run time, we must understand the problem more clearly.
The sorted array increases the accuracy of the branch predictor to
Note: Run the code after commenting the
SortData
function call to see the execution time of unsorted code and then compare.
Free Resources