Finding the maximum element in an array is a common operation in many applications, including graphics, data analysis, and machine learning. While a straightforward scalar approach can effectively handle small datasets, it becomes inefficient for larger arrays due to its sequential nature. SIMD allows us to perform operations on multiple data elements simultaneously, greatly enhancing performance.
Here's a simple scalar implementation:
#include <iostream>
#include <vector>
float max(const float *data, const size_t n) {
float value = data[0];
for (size_t i = 1; i < n; ++i) {
if (data[i] > value) {
value = data[i];
}
}
return value;
}
int main() {
std::vector<float> a = {
-2.1, -3.5, 4.7, 9.8, -7.2, 0, 3.3, -1.9, 2.1,
15, 1.4, 8.2, -8.3, -5.5, -4.2, 6.1, 9.9, -2.8,
};
auto value = max(a.data(), a.size());
std::cout << value;
return 0;
}
In this code, we iterate through each element to find the maximum value. The output is 15.
This approach works fine for small datasets but can be slow when dealing with large arrays due to the linear iteration. The AVX2 version:
#include <immintrin.h>
float max(const float *data, const size_t n) {
__m256 vmax = _mm256_set1_ps(data[0]);
size_t i = 0;
for (; i + 8 <= n; i += 8) {
__m256 vdata = _mm256_loadu_ps(&data[i]);
vmax = _mm256_max_ps(vmax, vdata);
}
float maxArray[8];
_mm256_storeu_ps(maxArray, vmax);
float value = data[0];
for (size_t j = 1; j < 8; ++j) {
if (maxArray[j] > value) {
value = maxArray[j];
}
}
for (; i < n; ++i) {
if (data[i] > value) {
value = data[i];
}
}
return value;
}
Explanation of AVX2 code:
_mm256_set1_ps
initializes the maximum value with the first element of the data array._mm256_loadu_ps
loads 8 elements at a time from the data array._mm256_max_ps
compares the current maximum values invmax
with the newly loaded data invdata
, updatingvmax
with the larger values.
After processing all the elements in batches, we store the maximum values from vmax
into a temporary array, for further comparison. The remaining elements are processed in the final loop.
Leave a Comment
Cancel reply