Negating all elements in an array is a common operation in applications such as signal processing, image manipulation, and scientific computations. For small datasets, a scalar approach that handles one element at a time is adequate. However, as the array grows, that element-by-element loop becomes a bottleneck. SIMD (Single Instruction, Multiple Data) instructions can speed up the negation by processing several elements with a single instruction.
Here's a basic scalar implementation:
#include <iostream>
#include <vector>

void negate(float *data, const size_t n) {
    for (size_t i = 0; i < n; ++i) {
        data[i] = -data[i];
    }
}

int main() {
    std::vector<float> a = {
        -1.5, 0, 1, -1.5, 2, -2.5, 3, -3.5, 4, -5, 6, -7, 8, -9, 10, -11, 12, -13,
    };
    negate(a.data(), a.size());
    for (auto value: a) {
        std::cout << value << " ";
    }
    return 0;
}
This code negates every element of the vector in place and prints the result:
1.5 -0 -1 1.5 -2 2.5 -3 3.5 -4 5 -6 7 -8 9 -10 11 -12 13
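Negating 0 yields -0 (a zero with the sign bit set), which std::cout prints as -0. Because the loop handles a single float per iteration, this implementation works well for small arrays but can be slow for large ones.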
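To see how the scalar version scales on a particular machine, a rough timing sketch like the one below may help. The helper time_negate_ms is hypothetical and not part of the original code. Results depend on the compiler, flags, and hardware; in particular, GCC and Clang may auto-vectorize a loop this simple at higher optimization levels, which narrows the gap to hand-written SIMD.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Scalar negation, same behavior as the version above.
void negate(float *data, const std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        data[i] = -data[i];
    }
}

// Hypothetical helper (not from the article): negates n floats once and
// returns the elapsed wall-clock time in milliseconds.
double time_negate_ms(const std::size_t n) {
    assert(n > 0);
    std::vector<float> v(n, 1.0f);
    const auto start = std::chrono::steady_clock::now();
    negate(v.data(), v.size());
    const auto stop = std::chrono::steady_clock::now();
    // Read one result so the optimizer cannot treat the negation as dead code.
    volatile float sink = v[n / 2];
    (void)sink;
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

For example, comparing time_negate_ms(1'000'000) with time_negate_ms(100'000'000) shows how the cost grows with n; averaging several runs gives steadier numbers.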
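Here's an optimized implementation using AVX2. Strictly speaking, every intrinsic it calls belongs to the original AVX instruction set, so with GCC or Clang compiling with -mavx is sufficient (-mavx2 works as well):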
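#include <immintrin.h>
#include <cstddef>

void negate(float *data, const size_t n) {
    // -0.0f has only the sign bit set; broadcast it to all 8 lanes.
    __m256 signMask = _mm256_set1_ps(-0.0f);
    size_t i = 0;
    // Main loop: negate 8 floats per iteration.
    for (; i + 8 <= n; i += 8) {
        __m256 vdata = _mm256_loadu_ps(&data[i]);
        __m256 vneg = _mm256_xor_ps(signMask, vdata);
        _mm256_storeu_ps(&data[i], vneg);
    }
    // Scalar tail: the remaining n % 8 elements.
    for (; i < n; ++i) {
        data[i] = -data[i];
    }
}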
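In this AVX2 implementation:
- _mm256_set1_ps(-0.0f) builds a sign mask by broadcasting -0.0f to all 8 lanes.
- _mm256_loadu_ps loads 8 floating-point numbers from the array (the "u" means the address does not need to be 32-byte aligned).
- _mm256_xor_ps negates all 8 elements at once by XORing them with the sign mask, which flips only the sign bit of each value.
- _mm256_storeu_ps stores the negated results back into the array.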
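The XOR works because an IEEE 754 single-precision float keeps its sign in bit 31, and -0.0f is the value whose only set bit is that sign bit (0x80000000). The sketch below shows what happens in each of the 8 lanes, assuming 32-bit IEEE 754 floats; flip_sign_bit is a hypothetical name used only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Scalar illustration of one lane of _mm256_xor_ps(signMask, vdata):
// XOR the bits of x with the bits of -0.0f, flipping only the sign bit.
float flip_sign_bit(float x) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects 32-bit float");
    const float mask = -0.0f;
    std::uint32_t xBits, maskBits;
    std::memcpy(&xBits, &x, sizeof xBits);        // reinterpret float as bits
    std::memcpy(&maskBits, &mask, sizeof maskBits);
    assert(maskBits == 0x80000000u);              // only the sign bit is set
    const std::uint32_t resultBits = xBits ^ maskBits;
    float result;
    std::memcpy(&result, &resultBits, sizeof result);
    return result;
}
```

Since only bit 31 changes, the result matches -x for ordinary values, zeros (0.0f becomes -0.0f), and infinities, without any floating-point arithmetic.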
If n is not a multiple of 8, the final scalar loop negates the remaining elements (at most 7).
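A self-contained way to check the AVX version against plain scalar negation is sketched below. It assumes GCC or Clang on x86: the target("avx") attribute lets the SIMD function compile without passing -mavx for the whole file, and __builtin_cpu_supports selects the AVX path only on CPUs that report AVX. The names negate_avx and matches_scalar are hypothetical. The comparison uses std::memcmp rather than ==, because 0.0f == -0.0f would hide a wrong sign.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

// Same algorithm as above; the attribute enables AVX for this function only.
__attribute__((target("avx")))
static void negate_avx(float *data, std::size_t n) {
    const __m256 signMask = _mm256_set1_ps(-0.0f);
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {  // 8 floats per iteration
        __m256 vdata = _mm256_loadu_ps(&data[i]);
        _mm256_storeu_ps(&data[i], _mm256_xor_ps(signMask, vdata));
    }
    for (; i < n; ++i) {  // scalar tail: n % 8 leftover elements
        data[i] = -data[i];
    }
}
#endif

// Hypothetical dispatcher: AVX path if the CPU supports it, scalar otherwise.
void negate(float *data, std::size_t n) {
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx")) {
        negate_avx(data, n);
        return;
    }
#endif
    for (std::size_t i = 0; i < n; ++i) {
        data[i] = -data[i];
    }
}

// Returns true if negate() gives bit-identical results to scalar -x.
bool matches_scalar(std::vector<float> values) {
    assert(!values.empty());
    std::vector<float> expected(values);
    for (float &x : expected) {
        x = -x;
    }
    negate(values.data(), values.size());
    return std::memcmp(values.data(), expected.data(),
                       values.size() * sizeof(float)) == 0;
}
```

An 18-element input like the one in the article exercises both loops: two full 8-float iterations followed by a 2-element scalar tail.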