Count Negative Elements in Array using C++ SIMD

November 5, 2024
C++

Counting the negative elements in an array is a common task that can be sped up with SIMD instructions. SIMD processes several elements with a single instruction, which improves throughput, especially on large arrays.

The scalar implementation:

#include <cstddef>
#include <iostream>
#include <vector>

size_t countNegative(const float *data, const size_t n) {
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        if (data[i] < 0.0f) {
            ++count;
        }
    }

    return count;
}

int main() {
    std::vector<float> a = {
        -2.1, 0, 4.7, 9.8, -7.2, 0, 3.3, 0, 2.1,
        15, 0, 8.2, -8.3, 0, -4.2, 6.1, 9.9, 0,
    };

    auto value = countNegative(a.data(), a.size());
    std::cout << value;

    return 0;
}

The code iterates over the array, tests whether each element is negative, and increments a counter when it is. For the sample data, it outputs 4.

This approach works well for small datasets, but it handles one element per iteration, which can become a bottleneck on large arrays unless the compiler manages to vectorize the loop on its own.

The AVX2 implementation:

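#include <cstddef>
#include <immintrin.h>

// Build with AVX and POPCNT enabled, e.g. -mavx -mpopcnt on GCC/Clang.
size_t countNegative(const float *data, const size_t n) {
    size_t count = 0;
    __m256 zero = _mm256_setzero_ps();

    // Main loop: eight floats per iteration.
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 vdata = _mm256_loadu_ps(&data[i]);
        // Lanes where data < 0 become all ones, other lanes all zeros.
        __m256 cmp = _mm256_cmp_ps(vdata, zero, _CMP_LT_OQ);
        // One bit per lane, stored in the low eight bits of the result.
        int mask = _mm256_movemask_ps(cmp);
        count += _mm_popcnt_u32(static_cast<unsigned int>(mask));
    }

    // Scalar tail: the last n % 8 elements.
    for (; i < n; ++i) {
        if (data[i] < 0.0f) {
            ++count;
        }
    }

    return count;
}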

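Explanation of the intrinsics used. The 256-bit float intrinsics below were introduced with AVX, so they are available on every AVX2-capable CPU; _mm_popcnt_u32 additionally requires POPCNT support: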

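_mm256_setzero_ps creates a vector of eight zero values, which serves as the comparison operand.
_mm256_loadu_ps loads eight float values from the array; the address does not need to be 32-byte aligned.
_mm256_cmp_ps with the _CMP_LT_OQ predicate performs an ordered, non-signaling less-than comparison of each element with zero. Each lane of the result is all ones if the element is negative and all zeros otherwise. NaN and -0.0f are not counted, which matches the scalar version.
_mm256_movemask_ps packs the sign bit of each lane of the comparison result into the low eight bits of an integer, so each bit indicates whether the corresponding element is negative.
_mm_popcnt_u32 counts the number of set bits in the mask, which tells us how many of the eight elements are negative.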
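
The mask-and-count step can be illustrated without any SIMD hardware. The following is a minimal sketch in plain C++ that emulates one eight-element step. The names laneMask and countBits are hypothetical helpers, not part of the code above or of any library, and std::bitset stands in for _mm_popcnt_u32.

```cpp
#include <bitset>
#include <cstddef>

// Hypothetical helper: emulates, for one block of eight floats, the mask
// that the compare + movemask pair produces. Bit k is set when block[k] < 0.
unsigned laneMask(const float *block) {
    unsigned mask = 0;
    for (int k = 0; k < 8; ++k) {
        if (block[k] < 0.0f) {
            mask |= 1u << k;
        }
    }
    return mask;
}

// Hypothetical helper: counts the set bits of an eight-bit mask,
// playing the role of _mm_popcnt_u32 in this emulation.
std::size_t countBits(unsigned mask) {
    return std::bitset<8>(mask).count();
}
```

For the first eight values of the sample array (-2.1, 0, 4.7, 9.8, -7.2, 0, 3.3, 0), lanes 0 and 4 are negative, so laneMask returns 0x11 and countBits returns 2.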

When the array length is not a multiple of eight, the remaining elements (at most seven) are handled by the final scalar loop.
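
Running the vector loop on a CPU without AVX or POPCNT support ends in an illegal-instruction fault, so a caller may want to pick the implementation at run time. The sketch below is one way to do that, assuming GCC or Clang on x86. The names countNegativeScalar, countNegativeAvx, and countNegativeDispatch are illustrative, and the target attribute and __builtin_cpu_supports are compiler extensions rather than standard C++. On other compilers or architectures the sketch falls back to the scalar loop.

```cpp
#include <cstddef>

// Assumption: GCC or Clang targeting x86. Elsewhere only the scalar path is built.
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define COUNT_NEG_HAVE_AVX_PATH 1
#endif

// Scalar reference, same logic as the scalar version shown earlier.
std::size_t countNegativeScalar(const float *data, std::size_t n) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (data[i] < 0.0f) {
            ++count;
        }
    }
    return count;
}

#ifdef COUNT_NEG_HAVE_AVX_PATH
// The target attribute lets this one function use AVX and POPCNT even when
// the rest of the file is compiled without -mavx/-mpopcnt.
__attribute__((target("avx,popcnt")))
std::size_t countNegativeAvx(const float *data, std::size_t n) {
    std::size_t count = 0;
    const __m256 zero = _mm256_setzero_ps();

    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        const __m256 vdata = _mm256_loadu_ps(&data[i]);
        const __m256 cmp = _mm256_cmp_ps(vdata, zero, _CMP_LT_OQ);
        const int mask = _mm256_movemask_ps(cmp);
        count += static_cast<std::size_t>(
            _mm_popcnt_u32(static_cast<unsigned int>(mask)));
    }

    // Scalar tail for the last n % 8 elements.
    for (; i < n; ++i) {
        if (data[i] < 0.0f) {
            ++count;
        }
    }
    return count;
}
#endif

// Uses the vector path only when the running CPU reports both features.
std::size_t countNegativeDispatch(const float *data, std::size_t n) {
#ifdef COUNT_NEG_HAVE_AVX_PATH
    if (__builtin_cpu_supports("avx") && __builtin_cpu_supports("popcnt")) {
        return countNegativeAvx(data, n);
    }
#endif
    return countNegativeScalar(data, n);
}
```

Calling countNegativeDispatch(a.data(), a.size()) on the sample array from the scalar example returns 4 on either path, which also serves as a quick check that the two implementations agree.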
