Count Negative Elements in Array using C++ SIMD

Count Negative Elements in Array using C++ SIMD

Counting negative elements in an array is a common task that can be optimized with SIMD operations. Using SIMD allows us to process multiple elements simultaneously, enhancing performance, especially for large datasets.

The scalar implementation:

#include <iostream> #include <vector> size_t countNegative(const float *data, const size_t n) { size_t count = 0; for (size_t i = 0; i < n; ++i) { if (data[i] < 0.0f) { ++count; } } return count; } int main() { std::vector<float> a = { -2.1, 0, 4.7, 9.8, -7.2, 0, 3.3, 0, 2.1, 15, 0, 8.2, -8.3, 0, -4.2, 6.1, 9.9, 0, }; auto value = countNegative(a.data(), a.size()); std::cout << value; return 0; }

In the code, we iterate through each element in the array, checking if it is negative and incrementing a counter if true. The code outputs 4.

This approach works well for small datasets, but may become inefficient for larger arrays due to its sequential nature.

The AVX2 implementation:

#include <immintrin.h> size_t countNegative(const float *data, const size_t n) { size_t count = 0; __m256 zero = _mm256_setzero_ps(); size_t i = 0; for (; i + 8 <= n; i += 8) { __m256 vdata = _mm256_loadu_ps(&data[i]); __m256 cmp = _mm256_cmp_ps(vdata, zero, _CMP_LT_OQ); size_t mask = _mm256_movemask_ps(cmp); count += _mm_popcnt_u32(mask); } for (; i < n; ++i) { if (data[i] < 0.0f) { ++count; } } return count; }

Explanation of AVX2 instructions:

  • _mm256_setzero_ps creates a vector of eight zero values which will be used for comparison.
  • _mm256_loadu_ps loads eight float values from the array.
  • _mm256_cmp_ps compares each element with zero, returning a result to reflect negative elements.
  • _mm256_movemask_ps converts the comparison result to a bit mask, where each bit indicates whether the corresponding element is negative.
  • _mm_popcnt_u32 counts the number of set bits in the mask, which tells us how many of the eight elements are negative.

Any remaining elements are handled in a final loop.

Leave a Comment

Cancel reply

Your email address will not be published.