In many applications, such as signal processing, machine learning, and data analysis, it's common to apply a sign function to array elements. The function maps each element to 1 if it is positive, -1 if it is negative, and 0 if it is zero. A scalar loop is fine for small arrays, but on large ones SIMD can be considerably faster because each instruction processes several values at once (eight floats per 256-bit AVX register).
Scalar implementation:
#include <cstddef>
#include <iostream>
#include <vector>

// Replace every element with its sign: 1, -1, or 0.
void sign(float *data, const size_t n) {
    for (size_t i = 0; i < n; ++i) {
        if (data[i] > 0.0f) {
            data[i] = 1.0f;
        } else if (data[i] < 0.0f) {
            data[i] = -1.0f;
        } else {
            data[i] = 0.0f;  // zero (and NaN) end up here
        }
    }
}

int main() {
    std::vector<float> a = {
        -2.1, -3.5, 4.7, 9.8, -7.2, 0, 3.3, -1.9, 2.1,
        15, 1.4, 8.2, -8.3, -5.5, -4.2, 6.1, 9.9, -2.8,
    };
    sign(a.data(), a.size());
    for (auto value : a) {
        std::cout << value << " ";
    }
    return 0;
}
This code loops over the array, compares each element with zero, and overwrites it with 1, -1, or 0 accordingly. Output:
-1 -1 1 1 -1 0 1 -1 1 1 1 1 -1 -1 -1 1 1 -1
This approach is straightforward and works well for small arrays, but it handles one element per iteration and branches on every value, so it can become slow on large datasets.
Code optimization with AVX2:
#include <cstddef>
#include <immintrin.h>

void sign(float *data, const size_t n) {
    // Constants broadcast to all 8 lanes.
    __m256 zero = _mm256_set1_ps(0.0f);
    __m256 one = _mm256_set1_ps(1.0f);
    __m256 negOne = _mm256_set1_ps(-1.0f);
    size_t i = 0;
    // Main loop: 8 floats per iteration.
    for (; i + 8 <= n; i += 8) {
        __m256 vdata = _mm256_loadu_ps(&data[i]);
        // Each lane of a mask is all ones where the comparison holds, else all zeros.
        __m256 posMask = _mm256_cmp_ps(vdata, zero, _CMP_GT_OQ);
        __m256 negMask = _mm256_cmp_ps(vdata, zero, _CMP_LT_OQ);
        // Keep 1.0f in positive lanes and -1.0f in negative lanes; other lanes stay 0.
        __m256 pos = _mm256_and_ps(posMask, one);
        __m256 neg = _mm256_and_ps(negMask, negOne);
        __m256 vresult = _mm256_or_ps(pos, neg);
        _mm256_storeu_ps(&data[i], vresult);
    }
    // Scalar tail for the remaining n % 8 elements.
    for (; i < n; ++i) {
        if (data[i] > 0.0f) {
            data[i] = 1.0f;
        } else if (data[i] < 0.0f) {
            data[i] = -1.0f;
        } else {
            data[i] = 0.0f;
        }
    }
}
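One practical detail when building the function above: with GCC or Clang the intrinsics are normally only available when AVX code generation is enabled (for example with -mavx or -mavx2). The sketch below is one possible way to keep a single source file that builds either way, by guarding the vector loop with the __AVX__ macro and letting the scalar loop handle everything when the macro is not defined. The names signScalar and signGuarded are made up for this illustration and are not part of the original code.

```cpp
#include <cstddef>
#if defined(__AVX__)
#include <immintrin.h>
#endif

// Scalar sign of one value: (x > 0) - (x < 0) evaluates to 1, -1, or 0.
// Hypothetical helper name, used for the tail and as the fallback path.
static inline float signScalar(float x) {
    return static_cast<float>((x > 0.0f) - (x < 0.0f));
}

// Hypothetical wrapper: uses the AVX loop only when the compiler
// was asked to generate AVX code, otherwise stays fully scalar.
void signGuarded(float *data, std::size_t n) {
    std::size_t i = 0;
#if defined(__AVX__)
    const __m256 zero = _mm256_setzero_ps();
    const __m256 one = _mm256_set1_ps(1.0f);
    const __m256 negOne = _mm256_set1_ps(-1.0f);
    for (; i + 8 <= n; i += 8) {
        __m256 v = _mm256_loadu_ps(data + i);
        __m256 pos = _mm256_and_ps(_mm256_cmp_ps(v, zero, _CMP_GT_OQ), one);
        __m256 neg = _mm256_and_ps(_mm256_cmp_ps(v, zero, _CMP_LT_OQ), negOne);
        _mm256_storeu_ps(data + i, _mm256_or_ps(pos, neg));
    }
#endif
    // Handles the tail after the vector loop, or the whole array without AVX.
    for (; i < n; ++i) {
        data[i] = signScalar(data[i]);
    }
}
```

Calling signGuarded(a.data(), a.size()) from the earlier main should give the same values as the scalar version. Note that this is a compile-time choice only; it does not check at run time whether the CPU actually supports AVX.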
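Explanation of the key intrinsics (these floating-point operations are part of AVX, so they are also available on every AVX2-capable CPU):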
_mm256_loadu_ps: loads 8 float values from the array.
_mm256_cmp_ps: compares elements to 0 to create a mask for values that are positive or negative.
_mm256_and_ps: applies 1 to elements where the positive mask is true and -1 where the negative mask is true.
_mm256_or_ps: combines the positive and negative results.
_mm256_storeu_ps: stores the result back to the array.
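To make the mask logic more concrete, here is a small single-lane model of the compare, AND, and OR steps written with plain 32-bit integers. It is only an illustration of what happens inside each of the 8 lanes, not how the AVX code is implemented; the helper names toBits, fromBits, and signOneLane are invented for this sketch.

```cpp
#include <cstdint>
#include <cstring>

// Reinterpret the bits of a float as a 32-bit integer (hypothetical helper).
static std::uint32_t toBits(float x) {
    std::uint32_t u;
    std::memcpy(&u, &x, sizeof u);
    return u;
}

// Reinterpret a 32-bit integer as a float (hypothetical helper).
static float fromBits(std::uint32_t u) {
    float x;
    std::memcpy(&x, &u, sizeof x);
    return x;
}

// One-lane model of the vector sequence: compare, AND, OR.
float signOneLane(float x) {
    // A comparison that holds produces all ones, otherwise all zeros.
    std::uint32_t posMask = (x > 0.0f) ? 0xFFFFFFFFu : 0u;
    std::uint32_t negMask = (x < 0.0f) ? 0xFFFFFFFFu : 0u;
    // AND with an all-ones mask keeps the constant's bits; AND with zero clears them.
    std::uint32_t pos = posMask & toBits(1.0f);
    std::uint32_t neg = negMask & toBits(-1.0f);
    // At most one of pos/neg is nonzero, so OR simply selects it.
    // If neither mask is set (zero or NaN input), the result is all zero bits, i.e. 0.0f.
    return fromBits(pos | neg);
}
```

For example, signOneLane(4.7f) keeps the bits of 1.0f, signOneLane(-2.1f) keeps the bits of -1.0f, and signOneLane(0.0f) yields 0.0f because both masks are zero.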