Element-wise subtraction is a common operation in numerical computing, signal processing, and data science, where corresponding elements of two arrays are subtracted and stored in a result array. Modern CPUs provide powerful SIMD instructions that can greatly accelerate this process.
Here's the basic implementation:
#include <cstddef>
#include <iostream>
#include <vector>

// Subtracts b from a element by element and writes the differences to result.
void vectorSubtract(const float *a, const float *b, float *result, const size_t n) {
    for (size_t i = 0; i < n; ++i) {
        result[i] = a[i] - b[i];
    }
}

int main() {
    std::vector<float> a = {
        17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0,
    };
    std::vector<float> b = {
        1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
    };
    std::vector<float> result(a.size());
    vectorSubtract(a.data(), b.data(), result.data(), a.size());
    for (auto value : result) {
        std::cout << value << " ";
    }
    return 0;
}
This code iterates over the arrays, subtracts each element of b from the corresponding element of a, and stores the difference in result. Output:
16 14 12 10 8 6 4 2 0 -2 -4 -6 -8 -10 -12 -14 -16 -18
However, for large arrays, this approach can be slow: unless the compiler auto-vectorizes the loop, it processes only one element per iteration.
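Here's the optimized implementation using 256-bit AVX intrinsics (the floating-point operations used here require only AVX, so they also run on any AVX2-capable CPU):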
#include <cstddef>
#include <immintrin.h>

void vectorSubtract(const float *a, const float *b, float *result, const size_t n) {
    size_t i = 0;
    // Main loop: process 8 floats per iteration in 256-bit registers.
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(&a[i]);
        __m256 vb = _mm256_loadu_ps(&b[i]);
        __m256 vresult = _mm256_sub_ps(va, vb);
        _mm256_storeu_ps(&result[i], vresult);
    }
    // Tail loop: handle the last n % 8 elements one at a time.
    for (; i < n; ++i) {
        result[i] = a[i] - b[i];
    }
}
Here's how it works:
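_mm256_loadu_ps loads 8 single-precision floating-point numbers from an array; the "u" means the address does not need to be 32-byte aligned.
_mm256_sub_ps performs element-wise subtraction of two vectors.
_mm256_storeu_ps stores the result to the array, again without an alignment requirement.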
The remaining elements (if any) are handled in a fallback loop.
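One way to gain confidence in the split between the 8-wide loop and the fallback loop is to compare the vectorized version against the scalar one for lengths on both sides of a multiple of 8. The sketch below is not part of the original code: the names subtractScalar, subtractAvx, vectorizedCount, and resultsMatch are hypothetical, and it assumes GCC or Clang on x86-64, since it relies on the compiler-specific target attribute and __builtin_cpu_supports. Without that attribute, the file would typically need to be compiled with -mavx.

```cpp
#include <cstddef>
#include <vector>
#include <immintrin.h>

// Scalar reference, equivalent to the basic implementation.
void subtractScalar(const float *a, const float *b, float *result, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        result[i] = a[i] - b[i];
    }
}

// AVX version. The target attribute (GCC/Clang-specific, an assumption here)
// enables AVX for this one function without a file-wide -mavx flag.
__attribute__((target("avx")))
void subtractAvx(const float *a, const float *b, float *result, std::size_t n) {
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(result + i, _mm256_sub_ps(va, vb));
    }
    for (; i < n; ++i) {
        result[i] = a[i] - b[i];
    }
}

// Hypothetical helper: number of elements covered by the 8-wide loop.
// The remaining n % 8 elements go through the fallback loop.
constexpr std::size_t vectorizedCount(std::size_t n) {
    return n - n % 8;
}

// Hypothetical helper: true when both versions produce identical output
// for arrays of length n. Uses the scalar path if the CPU lacks AVX.
bool resultsMatch(std::size_t n) {
    std::vector<float> a(n), b(n), expected(n), actual(n);
    for (std::size_t i = 0; i < n; ++i) {
        a[i] = static_cast<float>(n - i);
        b[i] = static_cast<float>(i) * 0.5f;
    }
    subtractScalar(a.data(), b.data(), expected.data(), n);
    if (__builtin_cpu_supports("avx")) {
        subtractAvx(a.data(), b.data(), actual.data(), n);
    } else {
        subtractScalar(a.data(), b.data(), actual.data(), n);
    }
    return expected == actual;
}
```

For the 18-element arrays used earlier, vectorizedCount(18) is 16, so two iterations of the 8-wide loop run and the last 2 elements fall through to the fallback loop. Calling resultsMatch with lengths such as 0, 7, 8, 9, and 18 exercises the empty case, a tail-only case, an exact multiple of 8, and mixed cases.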