Set Negative Array Elements to Zero using C++ SIMD

Setting the negative elements of an array to zero is a common step in data processing, machine learning (it is exactly what the ReLU activation function does), and image processing. A plain loop does the job, but SIMD instructions can speed it up considerably by processing several elements per instruction.

Here's the basic implementation:

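#include <cstddef>
#include <iostream>
#include <vector>

// Scalar version: replace every negative element with zero, in place.
void clampToZero(float *data, const std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        data[i] = data[i] < 0 ? 0 : data[i];
    }
}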

int main() {
    std::vector<float> a = {
        -1.5, 0, 1, -1.5, 2, -2.5, 3, -3.5, 4, -5, 6, -7, 8, -9, 10, -11, 12, -13,
    };

    clampToZero(a.data(), a.size());
    for (auto value: a) {
        std::cout << value << " ";
    }

    return 0;
}

This scalar implementation visits each element in turn and replaces it with zero if it is negative. Output:

0 0 1 0 2 0 3 0 4 0 6 0 8 0 10 0 12 0

This works fine for small arrays, but on larger datasets handling one element per iteration leaves the CPU's vector units unused, unless the compiler manages to auto-vectorize the loop.

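Here's the optimized implementation using 256-bit AVX intrinsics (the float operations used here require only AVX, so the code also runs on every AVX2-capable CPU):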

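#include <cstddef>
#include <immintrin.h>

void clampToZero(float *data, const std::size_t n) {
    // Vector with all eight lanes set to 0.0f.
    __m256 zero = _mm256_setzero_ps();

    // Main loop: process 8 floats per iteration.
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 vdata = _mm256_loadu_ps(&data[i]);
        __m256 vdataZero = _mm256_max_ps(vdata, zero);
        _mm256_storeu_ps(&data[i], vdataZero);
    }

    // Tail loop: handle the last n % 8 elements one by one.
    for (; i < n; ++i) {
        data[i] = data[i] < 0 ? 0 : data[i];
    }
}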

Here's how the vectorized version works:

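  • _mm256_setzero_ps creates a 256-bit vector whose eight float lanes are all zero.
  • _mm256_loadu_ps loads 8 consecutive floats from the input array. It is the unaligned variant, so the array needs no particular alignment.
  • _mm256_max_ps takes the lane-wise maximum of the loaded vector and the zero vector, so every negative value becomes zero and every non-negative value is kept. One difference from the scalar loop: if an element is NaN, _mm256_max_ps returns its second operand (zero here), while the scalar comparison leaves the NaN in place.
  • _mm256_storeu_ps writes the 8 results back to the same position in the original array.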
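
To make the four steps concrete, here is a minimal sketch that applies them to a single group of 8 floats. The helper name clampOneVector and the DEMO_TARGET_AVX macro are hypothetical and not part of the original code. The sketch assumes GCC or Clang on x86, where the target attribute allows AVX intrinsics in one function without compiling the whole file with -mavx.

```cpp
#include <array>
#include <immintrin.h>

// Assumption: GCC or Clang on x86. The target attribute enables AVX for the
// annotated function only; other compilers get an empty macro.
#if defined(__GNUC__) || defined(__clang__)
#define DEMO_TARGET_AVX __attribute__((target("avx")))
#else
#define DEMO_TARGET_AVX
#endif

// Hypothetical helper: clamps exactly one group of 8 floats.
DEMO_TARGET_AVX
std::array<float, 8> clampOneVector(const std::array<float, 8> &in) {
    std::array<float, 8> out{};
    const __m256 zero = _mm256_setzero_ps();        // all eight lanes are 0.0f
    const __m256 v = _mm256_loadu_ps(in.data());    // unaligned load of 8 floats
    const __m256 clamped = _mm256_max_ps(v, zero);  // lane-wise max(v[k], 0)
    _mm256_storeu_ps(out.data(), clamped);          // unaligned store of 8 floats
    return out;
}
```

Called with the first eight values of the sample array (-1.5, 0, 1, -1.5, 2, -2.5, 3, -3.5), this returns 0, 0, 1, 0, 2, 0, 3, 0, matching the start of the scalar output.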

Any elements left over when n is not a multiple of 8 (at most 7 of them) are handled one by one by the scalar loop at the end.
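
One way to gain confidence in the main loop and the leftover handling is to compare the vectorized function against the scalar one for lengths on both sides of a multiple of 8. The sketch below does that. The names clampToZeroScalar, clampToZeroSimd, sameResultForLength, and DEMO_TARGET_AVX are hypothetical, and the target attribute again assumes GCC or Clang on an x86 CPU with AVX.

```cpp
#include <cstddef>
#include <immintrin.h>
#include <vector>

// Assumption: GCC or Clang on x86; enables AVX for the annotated function only.
#if defined(__GNUC__) || defined(__clang__)
#define DEMO_TARGET_AVX __attribute__((target("avx")))
#else
#define DEMO_TARGET_AVX
#endif

// Scalar reference, same logic as the first listing.
void clampToZeroScalar(float *data, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        data[i] = data[i] < 0 ? 0 : data[i];
    }
}

// Vectorized version, same logic as the second listing.
DEMO_TARGET_AVX
void clampToZeroSimd(float *data, std::size_t n) {
    const __m256 zero = _mm256_setzero_ps();
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {  // full groups of 8
        const __m256 v = _mm256_loadu_ps(&data[i]);
        _mm256_storeu_ps(&data[i], _mm256_max_ps(v, zero));
    }
    for (; i < n; ++i) {          // leftover n % 8 elements
        data[i] = data[i] < 0 ? 0 : data[i];
    }
}

// Hypothetical check: true when both versions agree on n test values
// with alternating signs (0, -1, 2, -3, ...).
bool sameResultForLength(std::size_t n) {
    std::vector<float> a(n);
    for (std::size_t i = 0; i < n; ++i) {
        const float x = static_cast<float>(i);
        a[i] = (i % 2 == 0) ? x : -x;
    }
    std::vector<float> b = a;
    clampToZeroScalar(a.data(), n);
    clampToZeroSimd(b.data(), n);
    return a == b;
}
```

For the 18-element sample array, the main loop covers elements 0 to 15 in two iterations and the scalar tail covers the last two. Checking lengths such as 0, 7, 8, 9, and 18 exercises the empty case, a tail-only case, an exact multiple, and mixed cases. The test data deliberately contains no NaN values, since the two versions treat NaN differently.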
