Count Positive Elements in Array using C++ SIMD

Count Positive Elements in Array using C++ SIMD

Counting positive elements in an array is a common task, especially in fields like signal processing and data analysis, where you need to filter or transform data based on conditions. A basic scalar approach iterates through each element in the array, checking if it's greater than zero. However, SIMD can process multiple elements simultaneously, resulting in a faster implementation for large arrays.

The scalar version:

#include <iostream>
#include <vector>

size_t countPositive(const float *data, const size_t n) {
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        if (data[i] > 0.0f) {
            ++count;
        }
    }

    return count;
}

int main() {
    std::vector<float> a = {
        -2.1, 0, 4.7, 9.8, -7.2, 0, 3.3, 0, 2.1,
        15, 0, 8.2, -8.3, 0, -4.2, 6.1, 9.9, 0,
    };

    auto value = countPositive(a.data(), a.size());
    std::cout << value;

    return 0;
}

In this code, each element of the array is checked in a loop to see if it's greater than zero. The counter is incremented if the condition is met. The code outputs 8.

Here's the optimized implementation using AVX2:

#include <immintrin.h>

size_t countPositive(const float *data, const size_t n) {
    size_t count = 0;
    __m256 zero = _mm256_setzero_ps();

    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 vdata = _mm256_loadu_ps(&data[i]);
        __m256 cmp = _mm256_cmp_ps(vdata, zero, _CMP_GT_OQ);
        size_t mask = _mm256_movemask_ps(cmp);
        count += _mm_popcnt_u32(mask);
    }

    for (; i < n; ++i) {
        if (data[i] > 0.0f) {
            ++count;
        }
    }

    return count;
}

Explanation of AVX2 instructions:

  • _mm256_setzero_ps creates a vector of eight zeros, which we use to compare against array elements.
  • _mm256_loadu_ps loads eight floating-point elements from the array.
  • _mm256_cmp_ps compares each element in the loaded vector with zero, returning a result to reflect positive elements.
  • _mm256_movemask_ps converts the comparison result into a bit mask, where each bit indicates whether the corresponding element is positive.
  • _mm_popcnt_u32 counts the number of set bits in the mask, which tells us how many of the eight elements are positive.

Any remaining elements are processed in a final fallback loop.

Leave a Comment

Cancel reply

Your email address will not be published.