Apply Sign Function to Array Elements using C++ SIMD

In many applications, such as signal processing, machine learning, and data analysis, it's common to apply a sign function to array elements. This function evaluates each element, returning 1 if it is positive, -1 if it is negative, and 0 if it is zero. While a scalar approach works for smaller datasets, SIMD lets us perform this operation on larger arrays faster by processing multiple values with a single instruction (eight floats per 256-bit register in the code below).

Scalar implementation:

#include <cstddef>
#include <iostream>
#include <vector>

// Replace each element in place with its sign: 1, -1, or 0.
void sign(float *data, const std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        if (data[i] > 0.0f) {
            data[i] = 1.0f;
        } else if (data[i] < 0.0f) {
            data[i] = -1.0f;
        } else {
            data[i] = 0.0f;
        }
    }
}

int main() {
    std::vector<float> a = {
        -2.1f, -3.5f, 4.7f, 9.8f, -7.2f, 0.0f, 3.3f, -1.9f, 2.1f,
        15.0f, 1.4f, 8.2f, -8.3f, -5.5f, -4.2f, 6.1f, 9.9f, -2.8f,
    };

    sign(a.data(), a.size());

    for (auto value : a) {
        std::cout << value << " ";
    }

    return 0;
}

This code loops through each array element, compares its value, and sets it to one, negative one, or zero accordingly. Output:

-1 -1 1 1 -1 0 1 -1 1 1 1 1 -1 -1 -1 1 1 -1

This approach is straightforward and works well for small arrays, but it can become slow for large datasets because it processes one element at a time.

Code optimization with AVX2:

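#include <cstddef>
#include <immintrin.h>

// Build with AVX enabled, e.g. -mavx (GCC/Clang) or /arch:AVX (MSVC).
void sign(float *data, const std::size_t n) {
    // Constants broadcast to all 8 lanes.
    __m256 zero = _mm256_set1_ps(0.0f);
    __m256 one = _mm256_set1_ps(1.0f);
    __m256 negOne = _mm256_set1_ps(-1.0f);

    std::size_t i = 0;

    // Main loop: process 8 floats per iteration.
    for (; i + 8 <= n; i += 8) {
        __m256 vdata = _mm256_loadu_ps(&data[i]);
        // Masks: all bits set in lanes where the comparison is true.
        __m256 posMask = _mm256_cmp_ps(vdata, zero, _CMP_GT_OQ);
        __m256 negMask = _mm256_cmp_ps(vdata, zero, _CMP_LT_OQ);
        // Keep 1.0f in positive lanes and -1.0f in negative lanes.
        __m256 pos = _mm256_and_ps(posMask, one);
        __m256 neg = _mm256_and_ps(negMask, negOne);
        // A lane is set in at most one of the two, so OR merges them.
        __m256 vresult = _mm256_or_ps(pos, neg);
        _mm256_storeu_ps(&data[i], vresult);
    }

    // Scalar tail: handle the remaining (n % 8) elements.
    for (; i < n; ++i) {
        if (data[i] > 0.0f) {
            data[i] = 1.0f;
        } else if (data[i] < 0.0f) {
            data[i] = -1.0f;
        } else {
            data[i] = 0.0f;
        }
    }
}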

Explanation of the key intrinsics (all of them need only AVX, which every AVX2-capable CPU also supports):

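  • _mm256_loadu_ps loads 8 float values from the array; the address does not have to be 32-byte aligned.
  • _mm256_cmp_ps compares each element to 0 and produces a mask with all bits set in lanes where the comparison is true and all bits clear elsewhere. It is called twice: with _CMP_GT_OQ for positive values and with _CMP_LT_OQ for negative values.
  • _mm256_and_ps performs a bitwise AND of a mask with a constant, which leaves 1 in lanes where the positive mask is set and -1 in lanes where the negative mask is set; all other lanes become 0.
  • _mm256_or_ps combines the positive and negative results. A lane can be non-zero in at most one of them, so elements equal to zero stay 0.
  • _mm256_storeu_ps stores the 8 results back to the array.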
