Finding the maximum element in an array is a common operation in many applications, including graphics, data analysis, and machine learning. While a straightforward scalar approach can effectively handle small datasets, it becomes inefficient for larger arrays due to its sequential nature. SIMD allows us to perform operations on multiple data elements simultaneously, greatly enhancing performance.

Here's a simple scalar implementation:

```
#include <iostream>
#include <vector>
float max(const float *data, const size_t n) {
float value = data[0];
for (size_t i = 1; i < n; ++i) {
if (data[i] > value) {
value = data[i];
}
}
return value;
}
int main() {
std::vector<float> a = {
-2.1, -3.5, 4.7, 9.8, -7.2, 0, 3.3, -1.9, 2.1,
15, 1.4, 8.2, -8.3, -5.5, -4.2, 6.1, 9.9, -2.8,
};
auto value = max(a.data(), a.size());
std::cout << value;
return 0;
}
```

In this code, we iterate through each element to find the maximum value. The output is 15.

This approach works fine for small datasets but can be slow when dealing with large arrays due to the linear iteration. The AVX2 version:

```
#include <immintrin.h>
float max(const float *data, const size_t n) {
__m256 vmax = _mm256_set1_ps(data[0]);
size_t i = 0;
for (; i + 8 <= n; i += 8) {
__m256 vdata = _mm256_loadu_ps(&data[i]);
vmax = _mm256_max_ps(vmax, vdata);
}
float maxArray[8];
_mm256_storeu_ps(maxArray, vmax);
float value = data[0];
for (size_t j = 1; j < 8; ++j) {
if (maxArray[j] > value) {
value = maxArray[j];
}
}
for (; i < n; ++i) {
if (data[i] > value) {
value = data[i];
}
}
return value;
}
```

Explanation of AVX2 code:

`_mm256_set1_ps`

initializes the maximum value with the first element of the data array.`_mm256_loadu_ps`

loads 8 elements at a time from the data array.`_mm256_max_ps`

compares the current maximum values in`vmax`

with the newly loaded data in`vdata`

, updating`vmax`

with the larger values.

After processing all the elements in batches, we store the maximum values from `vmax`

into a temporary array, for further comparison. The remaining elements are processed in the final loop.

## Leave a Comment

Cancel reply