Find Maximum Element in Array using C++ SIMD

Find Maximum Element in Array using C++ SIMD

Finding the maximum element in an array is a common operation in many applications, including graphics, data analysis, and machine learning. While a straightforward scalar approach can effectively handle small datasets, it becomes inefficient for larger arrays due to its sequential nature. SIMD allows us to perform operations on multiple data elements simultaneously, greatly enhancing performance.

Here's a simple scalar implementation:

#include <iostream>
#include <vector>

float max(const float *data, const size_t n) {
    float value = data[0];
    for (size_t i = 1; i < n; ++i) {
        if (data[i] > value) {
            value = data[i];
        }
    }

    return value;
}

int main() {
    std::vector<float> a = {
        -2.1, -3.5, 4.7, 9.8, -7.2, 0, 3.3, -1.9, 2.1,
        15, 1.4, 8.2, -8.3, -5.5, -4.2, 6.1, 9.9, -2.8,
    };

    auto value = max(a.data(), a.size());
    std::cout << value;

    return 0;
}

In this code, we iterate through each element to find the maximum value. The output is 15.

This approach works fine for small datasets but can be slow when dealing with large arrays due to the linear iteration. The AVX2 version:

#include <immintrin.h>

float max(const float *data, const size_t n) {
    __m256 vmax = _mm256_set1_ps(data[0]);

    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 vdata = _mm256_loadu_ps(&data[i]);
        vmax = _mm256_max_ps(vmax, vdata);
    }

    float maxArray[8];
    _mm256_storeu_ps(maxArray, vmax);
    float value = data[0];
    for (size_t j = 1; j < 8; ++j) {
        if (maxArray[j] > value) {
            value = maxArray[j];
        }
    }

    for (; i < n; ++i) {
        if (data[i] > value) {
            value = data[i];
        }
    }

    return value;
}

Explanation of AVX2 code:

  • _mm256_set1_ps initializes the maximum value with the first element of the data array.
  • _mm256_loadu_ps loads 8 elements at a time from the data array.
  • _mm256_max_ps compares the current maximum values in vmax with the newly loaded data in vdata, updating vmax with the larger values.

After processing all the elements in batches, we store the maximum values from vmax into a temporary array, for further comparison. The remaining elements are processed in the final loop.

Leave a Comment

Cancel reply

Your email address will not be published.