Negating all elements in an array is a common operation in applications such as signal processing, image manipulation, and scientific computing. For small datasets, a scalar loop that negates one element at a time is adequate, but its running time grows with the array size and can become significant for large arrays. SIMD (single instruction, multiple data) instructions can speed up negation by processing several elements with a single instruction.

Here's a basic scalar implementation:

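```cpp
#include <cstddef>
#include <iostream>
#include <vector>

// Negate every element of data[0..n) in place, one element per iteration.
void negate(float *data, const std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        data[i] = -data[i];
    }
}

int main() {
    std::vector<float> a = {
        -1.5f, 0, 1, -1.5f, 2, -2.5f, 3, -3.5f, 4, -5, 6, -7, 8, -9, 10, -11, 12, -13,
    };
    negate(a.data(), a.size());
    // Print the negated values separated by spaces.
    for (float value : a) {
        std::cout << value << " ";
    }
    std::cout << "\n";
    return 0;
}
```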

This code negates each element of the array in place and prints the result. Note that negating `0` yields negative zero, which is printed as `-0`:

`1.5 -0 -1 1.5 -2 2.5 -3 3.5 -4 5 -6 7 -8 9 -10 11 -12 13`

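This implementation handles one element per loop iteration, which is fine for small arrays but can become a bottleneck for large ones. Optimizing compilers may auto-vectorize a loop this simple at higher optimization levels, so the actual gain from hand-written SIMD depends on the compiler and its flags.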

Here's a vectorized implementation using 256-bit AVX intrinsics (available on every AVX2-capable CPU), which negates 8 floats per instruction:

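```cpp
#include <cstddef>
#include <immintrin.h>

// Negate data[0..n) in place using 256-bit AVX vectors (8 floats at a time).
// Compile with -mavx (or -mavx2 / -march=native) and run on a CPU with AVX.
void negate(float *data, const std::size_t n) {
    // -0.0f has only the sign bit set (0x80000000) in each lane.
    const __m256 signMask = _mm256_set1_ps(-0.0f);
    std::size_t i = 0;
    // Vector loop: process full blocks of 8 floats.
    for (; i + 8 <= n; i += 8) {
        __m256 vdata = _mm256_loadu_ps(&data[i]);      // unaligned load
        __m256 vneg = _mm256_xor_ps(signMask, vdata);  // flip the sign bits
        _mm256_storeu_ps(&data[i], vneg);              // unaligned store
    }
    // Scalar tail: the remaining n % 8 elements.
    for (; i < n; ++i) {
        data[i] = -data[i];
    }
}
```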

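In this implementation:

- `_mm256_loadu_ps` loads 8 consecutive floats from the array; the `u` means the address does not need to be 32-byte aligned.
- `_mm256_xor_ps` negates each element by XORing it with the sign mask. Because `-0.0f` has only the sign bit set, the XOR flips each element's sign bit and leaves the exponent and mantissa unchanged, which is exactly IEEE 754 negation.
- `_mm256_storeu_ps` stores the negated results back into the array.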

When `n` is not a multiple of 8, the remaining `n % 8` elements are negated by the final scalar loop.
