Calculate Reciprocal of Array Elements using C++ SIMD

October 26, 2024

In applications that apply mathematical operations to large datasets, performance is often critical. Computing the reciprocal (1/x) of every element in an array is one such operation. A plain scalar loop is fine for small arrays, but SIMD (Single Instruction, Multiple Data) instructions can speed it up considerably by operating on several elements with one instruction: a 256-bit AVX register, for example, holds 8 floats.

The scalar implementation is straightforward:

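#include <cstddef>
#include <iostream>
#include <vector>

// Replace every element of data[0..n) with its reciprocal, one element at a time.
void reciprocal(float *data, const size_t n) {
    for (size_t i = 0; i < n; ++i) {
        data[i] = 1.0f / data[i];
    }
}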

int main() {
    std::vector<float> a = {
        1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
    };

    reciprocal(a.data(), a.size());
    for (auto value: a) {
        std::cout << value << " ";
    }

    return 0;
}

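This version computes each element's reciprocal one at a time, performing one division per loop iteration. It is simple, but on large arrays that division can become the bottleneck (an optimizing compiler may auto-vectorize a loop like this on its own, so it is worth checking the generated code). Part of the output: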

1 0.5 0.333333 ... 0.0625 0.0588235 0.0555556

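Here's how to perform the same operation with 256-bit AVX intrinsics. Every intrinsic used below belongs to the original AVX instruction set, so the code runs on any AVX or AVX2 CPU; with GCC or Clang, enable it with -mavx (or -mavx2):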

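#include <immintrin.h>
#include <cstddef>

void reciprocal(float *data, const size_t n) {
    // All 8 lanes hold 1.0f; this vector is the dividend for every element.
    __m256 one = _mm256_set1_ps(1.0f);

    size_t i = 0;
    // Main loop: 8 floats per iteration.
    for (; i + 8 <= n; i += 8) {
        __m256 vdata = _mm256_loadu_ps(&data[i]);    // unaligned load of 8 floats
        __m256 vresult = _mm256_div_ps(one, vdata);  // 1.0f / x in each lane
        _mm256_storeu_ps(&data[i], vresult);         // unaligned store back
    }

    // Scalar tail: the remaining n % 8 elements.
    for (; i < n; ++i) {
        data[i] = 1.0f / data[i];
    }
}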
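To check the AVX version against the scalar one, both can be run on the same input and compared. The sketch below assumes GCC or Clang on x86: `__attribute__((target("avx")))` enables AVX for a single function so the file builds without -mavx, and `__builtin_cpu_supports("avx")` selects the AVX path only on CPUs that report AVX at run time. On other targets it simply falls back to the scalar loop. The names reciprocal_scalar, reciprocal_avx, reciprocal_dispatch and matches_scalar are hypothetical, introduced only for this example.

```cpp
#include <cstddef>
#include <vector>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define RECIP_HAVE_X86 1
#endif

// Scalar reference: the article's first version under a different name.
void reciprocal_scalar(float *data, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        data[i] = 1.0f / data[i];
    }
}

#ifdef RECIP_HAVE_X86
// The article's AVX loop. target("avx") (GCC/Clang) enables AVX for this
// function only, so the rest of the file does not require -mavx.
__attribute__((target("avx")))
void reciprocal_avx(float *data, std::size_t n) {
    const __m256 one = _mm256_set1_ps(1.0f);
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 v = _mm256_loadu_ps(&data[i]);
        _mm256_storeu_ps(&data[i], _mm256_div_ps(one, v));
    }
    for (; i < n; ++i) {
        data[i] = 1.0f / data[i];
    }
}
#endif

// Uses the AVX version only if the CPU reports AVX support at run time.
void reciprocal_dispatch(float *data, std::size_t n) {
#ifdef RECIP_HAVE_X86
    if (__builtin_cpu_supports("avx")) {
        reciprocal_avx(data, n);
        return;
    }
#endif
    reciprocal_scalar(data, n);
}

// True if the dispatched version produces exactly the scalar results.
bool matches_scalar(const std::vector<float> &input) {
    std::vector<float> expected = input;
    std::vector<float> actual = input;
    reciprocal_scalar(expected.data(), expected.size());
    reciprocal_dispatch(actual.data(), actual.size());
    return expected == actual;
}
```

Both `_mm256_div_ps` and scalar float division return the correctly rounded IEEE 754 quotient, so the two versions are expected to agree bit for bit; that is why matches_scalar compares with `==` instead of a tolerance.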

Explanation of the AVX code:

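_mm256_set1_ps broadcasts 1.0f into all 8 lanes of a vector; this vector is the dividend for every element of the input.
_mm256_loadu_ps loads 8 consecutive float values from the array; the "u" means the address does not need to be 32-byte aligned.
_mm256_div_ps divides 1.0 by each of the 8 loaded values, producing 8 reciprocals with one instruction.
_mm256_storeu_ps writes the 8 results back into the array, again without an alignment requirement.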
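_mm256_div_ps returns exactly rounded quotients. If a small loss of accuracy is acceptable, AVX also offers _mm256_rcp_ps, an approximate reciprocal that Intel documents with a maximum relative error below 1.5 * 2^-12. One Newton-Raphson step, x1 = x0 * (2 - a * x0), roughly squares that error. The sketch below swaps this technique in for the division; it is an alternative, not the method used above, and whether it is faster than _mm256_div_ps depends on the CPU and should be measured. It assumes GCC or Clang on x86 (the `target("avx")` attribute and `__builtin_cpu_supports`) and falls back to exact scalar division elsewhere. The function names are hypothetical. For an input of 0 the refinement computes 0 * inf, so it returns NaN where exact division would return inf.

```cpp
#include <cstddef>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define RECIP_APPROX_X86 1

// Approximate 1/a for 8 floats at a time: _mm256_rcp_ps gives an estimate
// with about 12 bits of precision, refined by one Newton-Raphson step.
__attribute__((target("avx")))
static void reciprocal_approx_avx(float *data, std::size_t n) {
    const __m256 two = _mm256_set1_ps(2.0f);
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 a = _mm256_loadu_ps(&data[i]);
        __m256 x0 = _mm256_rcp_ps(a);                            // initial estimate
        __m256 ax0 = _mm256_mul_ps(a, x0);                       // close to 1.0
        __m256 x1 = _mm256_mul_ps(x0, _mm256_sub_ps(two, ax0));  // x0 * (2 - a*x0)
        _mm256_storeu_ps(&data[i], x1);
    }
    for (; i < n; ++i) {  // scalar tail uses exact division
        data[i] = 1.0f / data[i];
    }
}
#endif

// Uses the approximate AVX path when available, otherwise exact division.
void reciprocal_approx(float *data, std::size_t n) {
#ifdef RECIP_APPROX_X86
    if (__builtin_cpu_supports("avx")) {
        reciprocal_approx_avx(data, n);
        return;
    }
#endif
    for (std::size_t i = 0; i < n; ++i) {
        data[i] = 1.0f / data[i];
    }
}
```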

When the array size isn't a multiple of 8, the second, scalar loop processes the remaining n % 8 elements (at most 7).
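For the 18-element array used earlier, the split works out as follows: the AVX loop runs for i = 0 and i = 8 (2 iterations, 16 elements), stops at i = 16 because 16 + 8 > 18, and the scalar loop handles the last 2 elements. The hypothetical helper split_work below replays the same loop bounds without any SIMD, so the arithmetic can be checked for any n; the result always equals n / 8 iterations and n % 8 tail elements.

```cpp
#include <cstddef>
#include <utility>

// Hypothetical helper: replays the loop bounds of the AVX reciprocal() and
// returns {number of 8-wide iterations, number of scalar tail elements}.
std::pair<std::size_t, std::size_t> split_work(std::size_t n) {
    std::size_t i = 0;
    std::size_t vector_iterations = 0;
    for (; i + 8 <= n; i += 8) {  // same condition as the AVX loop
        ++vector_iterations;
    }
    // Whatever the vector loop did not cover goes to the scalar loop.
    return {vector_iterations, n - i};
}
```

The condition is written as i + 8 <= n rather than i <= n - 8 on purpose: n is an unsigned size_t, so for n < 8 the expression n - 8 would wrap around to a huge value and the vector loop would read past the end of the array.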
