Set Positive Array Elements to Zero using C++ SIMD

In performance-critical applications such as game engines, real-time simulations, or high-throughput data processing, handling large arrays efficiently is crucial. One common operation is clamping every positive value in an array to zero. A naive loop works fine for small inputs, but as written it handles one element per iteration and leaves much of a modern CPU's throughput unused. SIMD (Single Instruction, Multiple Data) lets a single instruction operate on several data elements in parallel, which can speed up such operations significantly.

Here's the basic implementation:

#include <iostream>
#include <vector>

void clampToZero(float *data, const size_t n) {
    for (size_t i = 0; i < n; ++i) {
        data[i] = data[i] > 0 ? 0 : data[i];
    }
}

int main() {
    std::vector<float> a = {
        -1.5, 0, 1, -1.5, 2, -2.5, 3, -3.5, 4, -5, 6, -7, 8, -9, 10, -11, 12, -13,
    };

    clampToZero(a.data(), a.size());
    for (auto value: a) {
        std::cout << value << " ";
    }

    return 0;
}

The scalar approach iterates over each element in the array, checks whether it is positive and, if so, replaces it with zero. Output:

-1.5 0 0 -1.5 0 -2.5 0 -3.5 0 -5 0 -7 0 -9 0 -11 0 -13

This method performs well on small arrays, but because it processes one element per iteration, it can become a bottleneck on larger datasets unless the compiler manages to auto-vectorize the loop.
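How much the scalar loop costs depends on the compiler, the optimization flags, and the hardware, so it is worth measuring rather than assuming. The sketch below is one minimal way to time a single call with std::chrono. The helper name timeClampMs is hypothetical and not part of the original code, and a single run is only a rough indication, not a rigorous benchmark.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Scalar reference: same logic as the loop shown above.
void clampToZero(float *data, const std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        data[i] = data[i] > 0 ? 0 : data[i];
    }
}

// Hypothetical helper (not from the article): clamps `buf` in place and
// returns the elapsed wall-clock time of that single call in milliseconds.
// The caller keeps the buffer, so the work cannot be optimized away as
// long as the result is inspected afterwards.
double timeClampMs(std::vector<float> &buf) {
    const auto start = std::chrono::steady_clock::now();
    clampToZero(buf.data(), buf.size());
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling timeClampMs on buffers of increasing size (for example a few thousand and then a few million floats) and repeating each measurement several times gives a rough picture of how the cost grows on a given machine.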

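Here's the optimized implementation using AVX2. The 256-bit float intrinsics it uses only require AVX, which every AVX2-capable CPU also supports. Compile with the instruction set enabled, for example -mavx2 on GCC/Clang or /arch:AVX2 on MSVC: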

#include <immintrin.h>

void clampToZero(float *data, const size_t n) {
    __m256 zero = _mm256_setzero_ps();

    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 vdata = _mm256_loadu_ps(&data[i]);
        __m256 vdataZero = _mm256_min_ps(vdata, zero);
        _mm256_storeu_ps(&data[i], vdataZero);
    }

    for (; i < n; ++i) {
        data[i] = data[i] > 0 ? 0 : data[i];
    }
}

Here's how the AVX2 version operates:

  • _mm256_setzero_ps initializes a vector filled with zeros.
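  • _mm256_loadu_ps reads 8 single-precision floating-point values at once from the input array; the "u" variant does not require the address to be 32-byte aligned.
  • _mm256_min_ps performs an element-wise comparison between the loaded values and zero, selecting the smaller of the two. This effectively sets all positive values to zero, while keeping negative values unchanged. One difference from the scalar loop: when an element is NaN, the instruction returns its second operand, so with this operand order a NaN becomes 0, whereas the scalar comparison leaves it as NaN.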
  • _mm256_storeu_ps writes the modified vector back into the original array.
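To make the four steps concrete, here is a small sketch that pushes exactly eight floats through one 256-bit register. The helper name clampEight is hypothetical, and the scalar fallback under #else is an assumption added only so that the snippet still builds when AVX is not enabled at compile time.

```cpp
#include <array>
#include <cstddef>
#if defined(__AVX__)
#include <immintrin.h>
#endif

// Hypothetical helper (not from the article): clamps exactly eight floats.
std::array<float, 8> clampEight(const std::array<float, 8> &in) {
    std::array<float, 8> out{};
#if defined(__AVX__)
    const __m256 zero = _mm256_setzero_ps();          // eight lanes of 0.0f
    const __m256 v = _mm256_loadu_ps(in.data());      // load 8 floats, no alignment needed
    const __m256 clamped = _mm256_min_ps(v, zero);    // per lane: min(value, 0)
    _mm256_storeu_ps(out.data(), clamped);            // write 8 floats back
#else
    // Assumed fallback when the build does not enable AVX.
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = in[i] > 0 ? 0 : in[i];
    }
#endif
    return out;
}
```

Feeding it the first eight values of the sample array, {-1.5, 0, 1, -1.5, 2, -2.5, 3, -3.5}, yields {-1.5, 0, 0, -1.5, 0, -2.5, 0, -3.5}, matching the first eight numbers of the scalar output.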

Any leftover elements that don't fit into a full SIMD register are handled with the regular scalar loop.
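Because the vector loop and the scalar tail split the work at a multiple of 8, lengths such as 7, 8, 9, or 17 are the ones most likely to expose an off-by-one mistake. The following sketch compares the two versions for every length up to a limit. The names clampToZeroScalar, clampToZeroSimd, and versionsAgree are hypothetical, the test data deliberately avoids NaN, and the #if guard is an assumption that lets the code fall back to the scalar path when AVX is not enabled at compile time.

```cpp
#include <cstddef>
#include <vector>
#if defined(__AVX__)
#include <immintrin.h>
#endif

// Scalar reference version.
void clampToZeroScalar(float *data, const std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        data[i] = data[i] > 0 ? 0 : data[i];
    }
}

// Vector version: full blocks of 8 floats, then a scalar tail.
void clampToZeroSimd(float *data, const std::size_t n) {
    std::size_t i = 0;
#if defined(__AVX__)
    const __m256 zero = _mm256_setzero_ps();
    for (; i + 8 <= n; i += 8) {
        const __m256 v = _mm256_loadu_ps(&data[i]);
        _mm256_storeu_ps(&data[i], _mm256_min_ps(v, zero));
    }
#endif
    // Leftover elements (or everything, if AVX is not enabled).
    for (; i < n; ++i) {
        data[i] = data[i] > 0 ? 0 : data[i];
    }
}

// Hypothetical check: true if both versions produce identical results
// for every array length from 0 to maxLen inclusive.
bool versionsAgree(const std::size_t maxLen) {
    for (std::size_t n = 0; n <= maxLen; ++n) {
        std::vector<float> a(n);
        for (std::size_t i = 0; i < n; ++i) {
            // Alternate positive and negative values; no NaN in the data.
            a[i] = (i % 2 == 0) ? static_cast<float>(i) + 0.5f
                                : -static_cast<float>(i);
        }
        std::vector<float> b = a;
        clampToZeroScalar(a.data(), n);
        clampToZeroSimd(b.data(), n);
        if (a != b) {
            return false;
        }
    }
    return true;
}
```

A call such as versionsAgree(40) covers empty input, inputs shorter than one register, exact multiples of 8, and several tail lengths in one pass.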
