Negate Array Elements using C++ SIMD

October 30, 2024
C++

Negating every element of an array is a common operation in signal processing, image manipulation, and scientific computing. For small datasets, a simple scalar loop is adequate. That loop handles one element per iteration, though, so its running time grows linearly with the array size. SIMD instructions speed up the negation by processing several elements with a single instruction.

Here's a basic scalar implementation:

#include <cstddef>
#include <iostream>
#include <vector>

// Negate n floats in place, one element per iteration.
void negate(float *data, const std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        data[i] = -data[i];
    }
}

int main() {
    std::vector<float> a = {
        -1.5, 0, 1, -1.5, 2, -2.5, 3, -3.5, 4, -5, 6, -7, 8, -9, 10, -11, 12, -13,
    };

    negate(a.data(), a.size());
    for (auto value: a) {
        std::cout << value << " ";
    }

    return 0;
}

This code negates each element of the array in place. Zero becomes negative zero, which prints as -0. Output:

1.5 -0 -1 1.5 -2 2.5 -3 3.5 -4 5 -6 7 -8 9 -10 11 -12 13

This implementation works well for small arrays, but it negates only one element per loop iteration, which becomes a bottleneck for larger ones.

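Here's the optimized implementation using 256-bit AVX intrinsics. The float intrinsics used here require only AVX, so the code also runs on any AVX2-capable CPU: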

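#include <cstddef>
#include <immintrin.h>

void negate(float *data, const size_t n) {
    // -0.0f has only the sign bit set; broadcast it to all 8 lanes.
    const __m256 signMask = _mm256_set1_ps(-0.0f);

    // Main loop: negate 8 floats per iteration.
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 vdata = _mm256_loadu_ps(&data[i]);      // unaligned load
        __m256 vneg = _mm256_xor_ps(signMask, vdata);  // flip the sign bits
        _mm256_storeu_ps(&data[i], vneg);              // unaligned store
    }

    // Tail loop: handle the remaining n % 8 elements.
    for (; i < n; ++i) {
        data[i] = -data[i];
    }
}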

In this implementation:

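_mm256_set1_ps fills all 8 lanes of the sign mask with -0.0f, whose only set bit is the sign bit.
_mm256_loadu_ps loads 8 single-precision floats from the array; the u suffix means the address does not need to be 32-byte aligned.
_mm256_xor_ps negates each element by XOR-ing it with the sign mask, which flips only the sign bit.
_mm256_storeu_ps stores the negated results back into the array.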

Any remaining elements (fewer than 8, when n is not a multiple of 8) are processed one at a time in the final scalar loop.
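
One way to gain confidence in the tail handling is to compare a vectorized function against the scalar loop for sizes that are and are not multiples of 8. The sketch below is a hypothetical test harness, not part of the original code: negate_scalar, negate_simd, and matches_reference are illustrative names, and the #if defined(__AVX__) guard assumes a compiler that defines __AVX__ when AVX code generation is enabled (for example -mavx on GCC and Clang, or /arch:AVX on MSVC). Without that flag, negate_simd simply falls back to the scalar loop.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>
#if defined(__AVX__)
#include <immintrin.h>
#endif

// Scalar reference used to validate the vectorized version.
void negate_scalar(float *data, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) data[i] = -data[i];
}

// Same approach as the AVX code above; the vector loop is compiled
// only when the compiler reports AVX support via __AVX__.
void negate_simd(float *data, std::size_t n) {
    std::size_t i = 0;
#if defined(__AVX__)
    const __m256 signMask = _mm256_set1_ps(-0.0f);
    for (; i + 8 <= n; i += 8) {
        __m256 v = _mm256_loadu_ps(data + i);
        _mm256_storeu_ps(data + i, _mm256_xor_ps(signMask, v));
    }
#endif
    // Tail elements, or every element when AVX is not enabled.
    for (; i < n; ++i) data[i] = -data[i];
}

// Returns true when both versions give bit-identical results for n elements.
bool matches_reference(std::size_t n) {
    std::vector<float> a(n), b(n);
    for (std::size_t i = 0; i < n; ++i) {
        // Alternating signs: 0, -0.5, 1, -1.5, ...
        a[i] = b[i] = (i % 2 ? -1.0f : 1.0f) * static_cast<float>(i) * 0.5f;
    }
    negate_scalar(a.data(), n);
    negate_simd(b.data(), n);
    return n == 0 || std::memcmp(a.data(), b.data(), n * sizeof(float)) == 0;
}
```

Calling matches_reference with sizes such as 7, 8, 9, and 18 exercises the tail-only case, the exact-multiple case, and the mixed cases.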
