Apply Binary Threshold for Array Elements using C++ SIMD

Applying a binary threshold to an array sets each element to 1 if it exceeds the threshold and to 0 otherwise. It is a common operation in image processing, computer vision, and data filtering. A scalar loop handles the task one element at a time, which can become a bottleneck on large datasets. SIMD instructions can improve throughput by processing several elements per instruction.

Here's the basic implementation:

#include <cstddef>
#include <iostream>
#include <vector>

void binaryThreshold(float *data, const size_t n, const float thresh) {
    for (size_t i = 0; i < n; ++i) {
        data[i] = data[i] > thresh ? 1.0f : 0.0f;
    }
}

int main() {
    std::vector<float> a = {
        2, 4, 18, 3, 17, 7, 14, 11, 8, 6, 15, 12, 9, 10, 16, 5, 1, 13,
    };

    binaryThreshold(a.data(), a.size(), 9);
    for (auto value: a) {
        std::cout << value << " ";
    }

    return 0;
}

This approach iterates through each element in the array, checks if it exceeds the threshold, and updates the value accordingly. Output:

0 0 1 0 1 0 1 1 0 0 1 1 0 1 1 0 0 1

However, this loop handles one element per iteration (unless the compiler auto-vectorizes it), so for large arrays explicit SIMD can offer an advantage.

Here's the optimized implementation using 256-bit AVX intrinsics, which are also available on any CPU that supports AVX2:

#include <immintrin.h>

void binaryThreshold(float *data, const size_t n, const float thresh) {
    __m256 vthresh = _mm256_set1_ps(thresh);
    __m256 one = _mm256_set1_ps(1.0f);
    __m256 zero = _mm256_set1_ps(0.0f);

    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 vdata = _mm256_loadu_ps(&data[i]);
        __m256 mask = _mm256_cmp_ps(vdata, vthresh, _CMP_GT_OQ);
        __m256 vresult = _mm256_blendv_ps(zero, one, mask);
        _mm256_storeu_ps(&data[i], vresult);
    }

    for (; i < n; ++i) {
        data[i] = data[i] > thresh ? 1.0f : 0.0f;
    }
}

Here's a breakdown of each intrinsic used:

  • _mm256_set1_ps fills a 256-bit register with 8 copies of the same float value.
  • _mm256_loadu_ps loads 8 floating-point values from the array; the address does not need to be aligned.
  • _mm256_cmp_ps compares each of the 8 elements with the threshold. With the _CMP_GT_OQ predicate, a lane of the resulting mask has all bits set where the element is greater than the threshold and all bits clear otherwise, including when the element is NaN.
  • _mm256_blendv_ps selects, lane by lane, between two vectors (zero and one) based on the sign bit of the mask: elements that exceed the threshold become 1, the rest become 0.
  • _mm256_storeu_ps stores the 8 results back into the array, again without requiring alignment.
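
The compare and blend steps can be pictured one lane at a time. The following is a minimal scalar sketch of that per-lane behavior, not how the hardware is implemented; the helper names laneMask, laneBlend, and thresholdLane are hypothetical and exist only for illustration.

```cpp
#include <cstdint>

// Hypothetical helper: models what _mm256_cmp_ps with _CMP_GT_OQ
// produces for ONE 32-bit lane. All bits are set when x > thresh;
// all bits are clear otherwise (a NaN input also gives all zeros).
std::uint32_t laneMask(float x, float thresh) {
    return x > thresh ? 0xFFFFFFFFu : 0u;
}

// Hypothetical helper: models what _mm256_blendv_ps(a, b, mask)
// produces for ONE lane. Only the top (sign) bit of the mask lane
// matters: when it is set the result is b, otherwise it is a.
float laneBlend(float a, float b, std::uint32_t mask) {
    return (mask & 0x80000000u) ? b : a;
}

// One lane of the thresholding kernel: blend between 0 and 1
// using the comparison mask, as the vector loop does for 8 lanes.
float thresholdLane(float x, float thresh) {
    return laneBlend(0.0f, 1.0f, laneMask(x, thresh));
}
```

The vector loop performs this same computation on 8 lanes at once, which is why the argument order of _mm256_blendv_ps(zero, one, mask) matters: the second operand is the one chosen where the mask is set.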

When n is not a multiple of 8, the remaining elements (at most 7) are handled by the scalar loop at the end.
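
One way to gain confidence in the tail handling is to compare the vectorized function against the scalar one for every array length in a small range. The sketch below assumes the two versions are given distinct, hypothetical names (thresholdScalar and thresholdAvx) and adds a hypothetical checker, agreeUpTo. The #if guard is also an assumption made here so that the file still builds when AVX is not enabled at compile time; in that case only the scalar tail loop runs.

```cpp
#include <cstddef>
#include <limits>
#include <vector>
#if defined(__AVX__)
#include <immintrin.h>
#endif

// Scalar reference (hypothetical name), same logic as the basic version.
void thresholdScalar(float *data, std::size_t n, float thresh) {
    for (std::size_t i = 0; i < n; ++i) {
        data[i] = data[i] > thresh ? 1.0f : 0.0f;
    }
}

// Vectorized version (hypothetical name). The AVX path is compiled
// only when the compiler has AVX enabled; otherwise the scalar loop
// below processes the whole array.
void thresholdAvx(float *data, std::size_t n, float thresh) {
    std::size_t i = 0;
#if defined(__AVX__)
    const __m256 vthresh = _mm256_set1_ps(thresh);
    const __m256 one = _mm256_set1_ps(1.0f);
    const __m256 zero = _mm256_set1_ps(0.0f);
    for (; i + 8 <= n; i += 8) {
        __m256 vdata = _mm256_loadu_ps(data + i);
        __m256 mask = _mm256_cmp_ps(vdata, vthresh, _CMP_GT_OQ);
        _mm256_storeu_ps(data + i, _mm256_blendv_ps(zero, one, mask));
    }
#endif
    for (; i < n; ++i) {
        data[i] = data[i] > thresh ? 1.0f : 0.0f;
    }
}

// Hypothetical checker: returns true when both versions produce the
// same output for every length from 0 to maxLen. One element is set
// to NaN to confirm that both versions map it to 0.
bool agreeUpTo(std::size_t maxLen, float thresh) {
    for (std::size_t n = 0; n <= maxLen; ++n) {
        std::vector<float> a(n);
        for (std::size_t i = 0; i < n; ++i) {
            a[i] = static_cast<float>((i * 7) % 19);
        }
        if (n > 0) {
            a[n / 2] = std::numeric_limits<float>::quiet_NaN();
        }
        std::vector<float> b = a;
        thresholdScalar(a.data(), n, thresh);
        thresholdAvx(b.data(), n, thresh);
        if (a != b) {
            return false;
        }
    }
    return true;
}
```

Calling agreeUpTo(40, 9.0f) covers every possible tail length from 0 to 7 several times. With GCC or Clang, building with -mavx ensures the vector path is actually compiled in.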
