What is auto-vectorization?

Auto-vectorization is a compiler optimization that automatically transforms certain kinds of code, typically loops, so that they use the Single Instruction, Multiple Data (SIMD) unit found in modern processors.

Note: The SIMD unit is a component of the processor (CPU) specifically designed to perform the same operation on multiple data elements simultaneously using a single instruction.

By using the SIMD unit, a program can run faster and more efficiently, which matters for computationally expensive applications that process large amounts of data, such as video games or scientific simulations.

How auto-vectorization works

When we write software programs, we often need to perform the same operation on a large set of data. For example, we might have two lists of numbers and need to add each number in the first list to the corresponding number in the second. Without vectorization, the processor performs this operation one pair of numbers at a time, which can be slow if we have a lot of numbers to process.

One real-world example would be in image processing where merging two images together pixel by pixel could take a long time if done sequentially. We can achieve merging via the addition of image pixel values. By vectorizing the addition operation, multiple pixel values can be added together simultaneously, significantly reducing the processing time.
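The pixel-merging idea can be sketched in C as a plain loop. The function name `blend_add`, the use of 8-bit grayscale pixels, and the choice to clamp results at 255 are assumptions made for illustration and do not come from any particular image library. A loop of this shape is a typical candidate for auto-vectorization, although whether it is actually vectorized depends on the compiler and the flags used.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: adds two 8-bit grayscale images pixel by pixel,
// clamping each result to 255 so bright pixels do not wrap around.
void blend_add(const uint8_t *img1, const uint8_t *img2, uint8_t *out, size_t n)
{
  for (size_t i = 0; i < n; i++)
  {
    // widen to unsigned int so the addition itself cannot overflow
    unsigned int s = (unsigned int)img1[i] + img2[i];
    out[i] = (uint8_t)(s > 255 ? 255 : s);
  }
}
```

Each iteration touches only element `i` of each array, so several pixels could in principle be processed by one SIMD instruction.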

Auto-vectorization modifies the code to perform the same operation on multiple data elements simultaneously. It is performed by the compiler, the program that translates human-readable code into machine code. During compilation, the compiler analyzes the code and considers the target hardware and the compilation settings to determine whether the code can be vectorized and how best to do so, so that the SIMD unit is used as effectively as possible.
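To make the compiler's analysis a little more concrete, the sketch below contrasts two loops. The function names `scale` and `prefix_sum` are made up for this illustration. In the first loop every iteration is independent, so elements can be processed several at a time. In the second, each iteration needs the result of the previous one (a loop-carried dependency), which generally prevents straightforward vectorization.

```c
#include <stddef.h>

// Independent iterations: out[i] depends only on in[i].
// Compilers can usually vectorize a loop like this.
void scale(const float *in, float *out, size_t n, float factor)
{
  for (size_t i = 0; i < n; i++)
  {
    out[i] = in[i] * factor;
  }
}

// Loop-carried dependency: data[i] needs the already updated data[i - 1],
// so the iterations cannot simply run side by side.
void prefix_sum(float *data, size_t n)
{
  for (size_t i = 1; i < n; i++)
  {
    data[i] = data[i] + data[i - 1];
  }
}
```

Whether a specific compiler vectorizes either loop still depends on its version, the optimization level, and the target CPU.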

If the code is deemed suitable for vectorization, the compiler generates machine code that uses SIMD instructions to perform the computation. Because auto-vectorization happens at compile time, there is nothing extra to observe while the program runs: the executable simply contains the vectorized instructions. However, compiler optimization reports and some profiling tools can show whether particular parts of the code were vectorized, which helps developers understand the impact of vectorization.
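As a rough illustration of checking what the compiler did, the sketch below shows a loop written to be easy to vectorize, with example report flags in the comments. The file name `add.c` and the function name `add_arrays_n` are hypothetical, and the flags are the ones commonly documented for GCC and Clang; the exact report wording and the optimization level at which vectorization is enabled vary by compiler version.

```c
#include <stddef.h>

// Example commands that ask the compiler to report vectorized loops
// (assumed file name add.c; consult your compiler's manual):
//   gcc   -O3 -fopt-info-vec-optimized -c add.c
//   clang -O3 -Rpass=loop-vectorize    -c add.c
//
// 'restrict' promises that the three arrays do not overlap, which
// removes one common obstacle to vectorization.
void add_arrays_n(const float *restrict arr1, const float *restrict arr2,
                  float *restrict sum, size_t n)
{
  for (size_t i = 0; i < n; i++)
  {
    sum[i] = arr1[i] + arr2[i];
  }
}
```

The length is passed as a parameter here rather than fixed at 4, since with a very small constant trip count the compiler may choose a different strategy, such as fully unrolling the loop.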

Example

Let's look at an example in C. The two listings below give a simplified view of vectorization: the first is the original scalar code, and the second uses SSE intrinsics to show by hand roughly what the compiler does automatically.

Original code

#include <stdio.h>
// define N as the number of elements in array
#define N 4
// add_arrays function: adds corresponding elements of arrays arr1 and arr2 and stores the result in sum
// the loop inside add_arrays is the part that can be vectorized
void add_arrays(float *arr1, float *arr2, float *sum)
{
  // set loop to add elements until N
  for (int i = 0; i < N; i++)
  {
    sum[i] = arr1[i] + arr2[i];
  }  
}
int main()
{
  // initialize arrays arr1 and arr2 with some values
  float arr1[N] = {1.0, 3.0, 3.0, 4.0};
  float arr2[N] = {5.0, 4.0, 5.0, 4.0};
  float sum[N];
  //call the add function
  add_arrays(arr1, arr2, sum);
  
  //print the values in array sum
  for (int i = 0; i < N; i++)
  {
    printf("%.2f + %.2f = %.2f\n", arr1[i], arr2[i], sum[i]);
  }
  return (0);
}

Vectorized code

#include <stdio.h>
#include <xmmintrin.h> // declares the SSE intrinsics
#define N 4
void add_arrays(float* arr1, float* arr2, float* sum) 
{
  for (int i = 0; i < N; i+=4) 
  {
    // declare 128-bit (16-byte) vector variables, each holding four floats
    __m128 a, b, c;
    // load four consecutive floats from each array (unaligned load)
    a = _mm_loadu_ps(&arr1[i]);
    b = _mm_loadu_ps(&arr2[i]);
    // add the four pairs of elements with a single instruction
    c = _mm_add_ps(a, b);
    // store the four results into the sum array
    _mm_storeu_ps(&sum[i], c);
  }
}
int main()
{
    // initialize arrays arr1 and arr2 with some values
    float arr1[N] = {1.0, 3.0, 3.0, 4.0};
    float arr2[N] = {5.0, 4.0, 5.0, 4.0};
    float sum[N];
    //call the add function
    add_arrays(arr1, arr2, sum);
    //set loop to print each element from arrays until N
    for (int i = 0; i < N; i++)
    {
        printf("%.2f + %.2f = %.2f\n", arr1[i], arr2[i], sum[i]);
    }
    return 0;
}
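
The vectorized listing processes four floats per iteration, so it relies on the element count being a multiple of 4. One way to handle arbitrary lengths, sketched below under that observation, is to process full groups of four with SSE and then finish any leftover elements with an ordinary scalar loop. The function name `add_arrays_any` and the `n` parameter are illustrative additions, not part of the original example, and the code assumes an x86 target with SSE support.

```c
#include <stddef.h>
#include <xmmintrin.h> // declares the SSE intrinsics

// Hypothetical variant of add_arrays that accepts any length n.
void add_arrays_any(const float *arr1, const float *arr2, float *sum, size_t n)
{
  size_t i = 0;
  // main loop: four floats per iteration while a full group remains
  for (; i + 4 <= n; i += 4)
  {
    __m128 a = _mm_loadu_ps(&arr1[i]);
    __m128 b = _mm_loadu_ps(&arr2[i]);
    _mm_storeu_ps(&sum[i], _mm_add_ps(a, b));
  }
  // tail loop: the remaining 0 to 3 elements, one at a time
  for (; i < n; i++)
  {
    sum[i] = arr1[i] + arr2[i];
  }
}
```

Compilers that auto-vectorize a loop typically emit a similar main-loop-plus-remainder structure on their own.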

License: Creative Commons-Attribution NonCommercial-ShareAlike 4.0 (CC-BY-NC-SA 4.0)
