Apply Sign Function to Array Elements using C++ SIMD

In many applications, such as signal processing, machine learning, and data analysis, it's common to apply a sign function to array elements. The function maps each element to 1 if it is positive, -1 if it is negative, and 0 if it is zero. A scalar loop is fine for small arrays, but on large ones SIMD can be considerably faster because a single instruction processes several values at once (eight floats per 256-bit register in the version shown here).

Scalar implementation:

#include <cstddef>
#include <iostream>
#include <vector>

void sign(float *data, const size_t n) {
    for (size_t i = 0; i < n; ++i) {
        if (data[i] > 0.0f) {
            data[i] = 1.0f;
        } else if (data[i] < 0.0f) {
            data[i] = -1.0f;
        } else {
            data[i] = 0.0f;
        }
    }
}

int main() {
    std::vector<float> a = {
        -2.1, -3.5, 4.7, 9.8, -7.2, 0, 3.3, -1.9, 2.1,
        15, 1.4, 8.2, -8.3, -5.5, -4.2, 6.1, 9.9, -2.8,
    };

    sign(a.data(), a.size());
    for (auto value: a) {
        std::cout << value << " ";
    }

    return 0;
}

This code loops through the array, compares each element with zero, and overwrites it with 1, -1, or 0 accordingly. Output:

-1 -1 1 1 -1 0 1 -1 1 1 1 1 -1 -1 -1 1 1 -1

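This approach is straightforward and works well for smaller arrays. As written, though, it handles one element per iteration and branches on each value, which can become a bottleneck on large datasets unless the compiler manages to vectorize the loop on its own.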
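As a stepping stone, the same comparison logic can be written without branches. The sketch below is only an illustration (the name sign_branchless is not part of the original code): each comparison yields 0 or 1, so their difference is already the sign. A loop in this form may be easier for a compiler to vectorize automatically, though that depends on the compiler and its flags.

```cpp
#include <cstddef>

// Branchless scalar sign (hypothetical helper name).
// (x > 0) and (x < 0) each evaluate to 0 or 1, so their difference
// is 1, -1, or 0. NaN compares false both ways and therefore maps to 0,
// the same result the if/else chain produces.
void sign_branchless(float *data, const std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        const float x = data[i];
        data[i] = static_cast<float>((x > 0.0f) - (x < 0.0f));
    }
}
```

It is called the same way as the scalar version, for example sign_branchless(a.data(), a.size()).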

Code optimization with AVX2:

#include <cstddef>
#include <immintrin.h>

void sign(float *data, const size_t n) {
    __m256 zero = _mm256_set1_ps(0.0f);
    __m256 one = _mm256_set1_ps(1.0f);
    __m256 negOne = _mm256_set1_ps(-1.0f);

    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 vdata = _mm256_loadu_ps(&data[i]);
        __m256 posMask = _mm256_cmp_ps(vdata, zero, _CMP_GT_OQ);
        __m256 negMask = _mm256_cmp_ps(vdata, zero, _CMP_LT_OQ);
        __m256 pos = _mm256_and_ps(posMask, one);
        __m256 neg = _mm256_and_ps(negMask, negOne);
        __m256 vresult = _mm256_or_ps(pos, neg);
        _mm256_storeu_ps(&data[i], vresult);
    }

    for (; i < n; ++i) {
        if (data[i] > 0.0f) {
            data[i] = 1.0f;
        } else if (data[i] < 0.0f) {
            data[i] = -1.0f;
        } else {
            data[i] = 0.0f;
        }
    }
}

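Explanation of the key intrinsics (all of them are 256-bit float operations introduced with AVX, so the function does not strictly require AVX2; with GCC or Clang it can be built with -mavx or -mavx2):

  • _mm256_loadu_ps loads 8 consecutive float values from the array. The "u" variant does not require the address to be 32-byte aligned.
  • _mm256_cmp_ps compares each element to 0 and produces a mask: a lane has all bits set where the comparison holds and all bits clear elsewhere. _CMP_GT_OQ builds the positive mask and _CMP_LT_OQ the negative one. Both are ordered comparisons, so a NaN matches neither.
  • _mm256_and_ps bitwise-ANDs a mask with a constant, which leaves 1 in the lanes where the positive mask is set, -1 in the lanes where the negative mask is set, and 0 everywhere else.
  • _mm256_or_ps combines the positive and negative results. A lane can match at most one of the two masks, so lanes that match neither (zeros) stay 0.
  • _mm256_storeu_ps stores the 8 results back to the array.

The second loop handles the remaining elements (fewer than 8) with the scalar code when the array length is not a multiple of 8.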
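To make the mask arithmetic concrete, here is a minimal scalar model of what happens in a single 32-bit lane. It is a sketch for illustration, not part of the AVX code: the helper names (to_bits, from_bits, sign_one_lane) are hypothetical, and it assumes 32-bit IEEE 754 floats, where an all-zero bit pattern is 0.0f.

```cpp
#include <cstdint>
#include <cstring>

// Reinterpret a float's bits as a 32-bit integer (memcpy avoids aliasing problems).
static std::uint32_t to_bits(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

// Reinterpret a 32-bit integer's bits as a float.
static float from_bits(std::uint32_t u) {
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// Hypothetical helper: models one lane of the compare / AND / OR sequence.
float sign_one_lane(float x) {
    // Comparison step: all bits set when true, all bits clear when false.
    const std::uint32_t posMask = (x > 0.0f) ? 0xFFFFFFFFu : 0u;
    const std::uint32_t negMask = (x < 0.0f) ? 0xFFFFFFFFu : 0u;

    // AND step: keeps the constant's bits only where the mask is set.
    const std::uint32_t pos = posMask & to_bits(1.0f);
    const std::uint32_t neg = negMask & to_bits(-1.0f);

    // OR step: at most one of pos/neg is non-zero, so OR selects it;
    // if neither mask matched, the bits stay zero, which is 0.0f.
    return from_bits(pos | neg);
}
```

For instance, sign_one_lane(4.7f) yields 1.0f, sign_one_lane(-7.2f) yields -1.0f, and sign_one_lane(0.0f) yields 0.0f. The AVX version performs the same three steps on eight lanes at once.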
