Calculate Absolute Value of Array Elements using C++ SIMD

Calculating the absolute value of array elements is a common operation, especially in applications involving signal processing, scientific computing, or machine learning. While you can compute the absolute value of each element sequentially using standard library functions, utilizing SIMD optimizations can significantly improve performance by processing multiple elements in parallel.

Here's the basic implementation:

#include <iostream>
#include <vector>
#include <cmath>

void abs(float *data, const size_t n) {
    for (size_t i = 0; i < n; ++i) {
        data[i] = std::fabs(data[i]);
    }
}

int main() {
    std::vector<float> a = {
        -1.5, 0, 1, -1.5, 2, -2.5, 3, -3.5, 4, -5, 6, -7, 8, -9, 10, -11, 12, -13,
    };

    abs(a.data(), a.size());
    for (auto value: a) {
        std::cout << value << " ";
    }

    return 0;
}

This approach works by iterating over each element in the array and applying std::fabs to compute the absolute value. Output:

1.5 0 1 1.5 2 2.5 3 3.5 4 5 6 7 8 9 10 11 12 13

However, this loop handles only one element per iteration, so on larger arrays it can leave most of the CPU's vector hardware unused unless the compiler auto-vectorizes it.
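One way to check whether this matters on a given machine is to time the scalar loop on a large array. The following is only a rough sketch: abs_scalar and time_abs_scalar are hypothetical names introduced here, and the measured time depends heavily on compiler flags (an optimizing compiler may already vectorize this loop on its own).

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical name for the scalar loop shown in the listing above.
void abs_scalar(float *data, const std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        data[i] = std::fabs(data[i]);
    }
}

// Hypothetical helper: runs abs_scalar over n negative floats and
// returns the elapsed wall-clock time in microseconds.
long long time_abs_scalar(const std::size_t n) {
    std::vector<float> v(n, -1.0f);

    const auto start = std::chrono::steady_clock::now();
    abs_scalar(v.data(), v.size());
    const auto stop = std::chrono::steady_clock::now();

    // Reading the result keeps the work observable and checks correctness.
    assert(n == 0 || v.back() == 1.0f);

    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```

Calling time_abs_scalar with a few different sizes, and repeating each measurement several times, gives a baseline to compare a SIMD version against.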

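Here's the optimized implementation using 256-bit AVX intrinsics (the floating-point operations used here require only AVX, so the code also runs on any AVX2-capable CPU):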

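// Build with AVX enabled, e.g. -mavx on GCC/Clang.
#include <immintrin.h>
#include <cmath>    // std::fabs for the scalar tail
#include <cstddef>  // size_t

void abs(float *data, const size_t n) {
    // -0.0f has only the sign bit set (0x80000000) in every lane.
    __m256 signMask = _mm256_set1_ps(-0.0f);

    size_t i = 0;
    // Process 8 floats per iteration while a full vector remains.
    for (; i + 8 <= n; i += 8) {
        __m256 vdata = _mm256_loadu_ps(&data[i]);
        vdata = _mm256_andnot_ps(signMask, vdata);  // (~signMask) & vdata
        _mm256_storeu_ps(&data[i], vdata);
    }

    // Scalar tail for the last n % 8 elements.
    for (; i < n; ++i) {
        data[i] = std::fabs(data[i]);
    }
}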

Here's a breakdown of the SIMD version:

  • _mm256_set1_ps(-0.0f) creates a sign mask. In each 32-bit lane only the sign bit is set (0x80000000); all other bits are 0.
  • _mm256_loadu_ps loads 8 floating-point numbers from the array; the address does not need to be aligned.
  • _mm256_andnot_ps inverts the mask and ANDs it with the data (~signMask & vdata). This clears the sign bit and leaves the other bits unchanged, effectively computing the absolute value of the loaded elements.
  • _mm256_storeu_ps stores the results back to the array.
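To see why the mask works, the same trick can be reproduced on a single float with plain integer operations. This is a minimal sketch, assuming 32-bit IEEE 754 floats; float_bits and abs_bits are hypothetical helper names that are not part of the original code.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: returns the raw bit pattern of a float.
std::uint32_t float_bits(float x) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof(bits));
    return bits;
}

// Hypothetical scalar counterpart of _mm256_andnot_ps(signMask, vdata):
// computes (~signMask) & value on one element.
float abs_bits(float x) {
    const std::uint32_t signMask = float_bits(-0.0f);
    assert(signMask == 0x80000000u);  // only the sign bit is set (IEEE 754 assumption)

    const std::uint32_t cleared = ~signMask & float_bits(x);

    float result;
    std::memcpy(&result, &cleared, sizeof(result));
    return result;
}
```

The SIMD version performs this same AND-NOT on 8 lanes with a single instruction instead of once per element.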

The remaining elements are handled by the scalar std::fabs function in the final loop.
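As a rough illustration of how the two loops divide the work, the sketch below computes how many elements each loop handles for a given n. Split and split_work are hypothetical names used only for this example; the width of 8 matches the number of floats in a __m256 register.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical result type: element counts for each loop.
struct Split {
    std::size_t vectorized;  // handled by the 8-wide SIMD loop
    std::size_t scalar;      // handled by the std::fabs tail loop
};

// Hypothetical helper mirroring the loop condition `i + 8 <= n`.
Split split_work(const std::size_t n) {
    constexpr std::size_t width = 8;  // floats per __m256 register

    // The SIMD loop stops at the largest multiple of 8 not exceeding n.
    const std::size_t vectorized = (n / width) * width;
    const Split result{vectorized, n - vectorized};

    assert(result.vectorized + result.scalar == n);
    return result;
}
```

For the 18-element array used earlier, split_work(18) gives 16 elements for the vector loop (two iterations) and 2 for the scalar loop. Arrays shorter than 8 elements are handled entirely by the scalar loop.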
