The Mean Absolute Percentage Error (MAPE) is a commonly used metric for measuring the accuracy of a predictive model. It averages the absolute difference between actual and predicted values, each normalized by the actual value, and expresses the result as a percentage. A basic scalar implementation of MAPE is fine for small datasets, but vectorizing the calculation with SIMD can significantly boost performance on larger ones.
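Written out as a formula, the metric looks as follows. The symbols are chosen here to mirror the code: a_i are the actual values, b_i the predicted values, and n the number of elements.

```latex
\mathrm{MAPE} = \frac{100}{n} \sum_{i=1}^{n} \left| \frac{a_i - b_i}{a_i} \right|
```

The expression is undefined whenever some a_i is zero, so the inputs are assumed to contain no zero actual values.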
Here's the basic implementation:
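#include <cmath>
#include <cstddef>
#include <iostream>
#include <vector>

// Mean absolute percentage error of the predictions b against the actual values a.
// Assumes n > 0 and that no a[i] is zero (the division is undefined otherwise).
float mape(const float *a, const float *b, const std::size_t n) {
    float sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        // Absolute relative error of element i.
        sum += std::fabs((a[i] - b[i]) / a[i]);
    }
    // Average the errors, then scale to a percentage.
    return (sum / static_cast<float>(n)) * 100;
}

int main() {
    std::vector<float> a = {
        1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,
        12, 13, 14, 15, 16, 17, 18,
    };
    std::vector<float> b = {
        0.5, 1, 2.5, 3, 4.5, 5, 6.5, 7, 8.5,
        9, 10.5, 11, 12.5, 13, 14.5, 15, 16.5, 17,
    };
    float value = mape(a.data(), b.data(), a.size());
    std::cout << value << '\n';
    return 0;
}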
In this code, we iterate through the arrays, compute the absolute percentage difference for each pair of elements, and accumulate the result. Dividing the sum by the number of elements and then multiplying by 100 gives the mean absolute percentage error. Output:
13.6378
This method is effective for small datasets, but because it processes one element per iteration, it can become a performance bottleneck with larger data.
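Here's the optimized implementation using 256-bit AVX intrinsics, which process 8 floats per iteration. The floating-point instructions used here require only AVX, so any AVX2-capable CPU runs them as well; with GCC or Clang, compile with -mavx or -mavx2: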
#include <immintrin.h>
#include <cmath>
#include <cstddef>

float mape(const float *a, const float *b, const std::size_t n) {
    // -0.0f has only the sign bit set, so it serves as a sign-bit mask.
    const __m256 signMask = _mm256_set1_ps(-0.0f);
    __m256 vsum = _mm256_setzero_ps();
    std::size_t i = 0;
    // Main loop: 8 floats per iteration.
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(&a[i]);
        __m256 vb = _mm256_loadu_ps(&b[i]);
        __m256 vdiff = _mm256_sub_ps(va, vb);
        vdiff = _mm256_div_ps(vdiff, va);
        // Clear the sign bit of every lane: |x| = (~signMask) & x.
        vdiff = _mm256_andnot_ps(signMask, vdiff);
        vsum = _mm256_add_ps(vsum, vdiff);
    }
    // Horizontal reduction: add the upper 128 bits to the lower, then sum the 4 lanes.
    __m128 bottom = _mm256_castps256_ps128(vsum);
    __m128 top = _mm256_extractf128_ps(vsum, 1);
    bottom = _mm_add_ps(bottom, top);
    bottom = _mm_hadd_ps(bottom, bottom);
    bottom = _mm_hadd_ps(bottom, bottom);
    float sum = _mm_cvtss_f32(bottom);
    // Scalar tail for the remaining n % 8 elements.
    for (; i < n; ++i) {
        sum += std::fabs((a[i] - b[i]) / a[i]);
    }
    return (sum / static_cast<float>(n)) * 100;
}
Explanation of the AVX code:
_mm256_loadu_ps loads 8 floating-point elements from each array.
_mm256_sub_ps and _mm256_div_ps perform the subtraction and division that compute the relative difference between elements.
_mm256_andnot_ps computes the absolute value of the differences by clearing the sign bit of each lane.
_mm256_add_ps accumulates the results in the vsum register.
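The sign-bit trick behind _mm256_andnot_ps may be easier to see one lane at a time. The following is a minimal scalar sketch, assuming 32-bit IEEE 754 floats; abs_via_sign_mask is a hypothetical helper name used only for illustration and is not part of the original code:

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "this sketch assumes 32-bit floats");

// Scalar model of a single lane of _mm256_andnot_ps(signMask, x),
// where signMask holds -0.0f (a value whose only set bit is the sign bit).
float abs_via_sign_mask(float x) {
    const float neg_zero = -0.0f;
    std::uint32_t mask = 0;
    std::uint32_t bits = 0;
    std::memcpy(&mask, &neg_zero, sizeof mask);  // bit pattern of -0.0f
    std::memcpy(&bits, &x, sizeof bits);         // bit pattern of x
    bits = ~mask & bits;                         // andnot: (NOT mask) AND x clears the sign bit
    float result = 0.0f;
    std::memcpy(&result, &bits, sizeof result);
    return result;
}
```

Calling abs_via_sign_mask(-1.5f) yields 1.5f, and a positive input comes back unchanged, because only the sign bit is ever modified.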
To convert the SIMD result back to a scalar, we first split vsum into two 128-bit halves, add them together, and then reduce the four remaining lanes to a single value with two horizontal adds.
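To make the lane movement concrete, here is a scalar model of that reduction. It is a sketch rather than the intrinsic code itself: the name horizontal_sum and the use of std::array to stand in for the registers are assumptions made for illustration.

```cpp
#include <array>

// Scalar model of the reduction; v holds the eight lanes of vsum.
float horizontal_sum(const std::array<float, 8> &v) {
    // _mm256_castps256_ps128 and _mm256_extractf128_ps(vsum, 1):
    // the lower and upper 128-bit halves.
    std::array<float, 4> bottom = {v[0], v[1], v[2], v[3]};
    const std::array<float, 4> top = {v[4], v[5], v[6], v[7]};

    // _mm_add_ps: lane-wise addition of the two halves.
    for (int k = 0; k < 4; ++k) {
        bottom[k] += top[k];
    }

    // First _mm_hadd_ps(bottom, bottom): adjacent pairs are added,
    // giving [b0+b1, b2+b3, b0+b1, b2+b3].
    bottom = {bottom[0] + bottom[1], bottom[2] + bottom[3],
              bottom[0] + bottom[1], bottom[2] + bottom[3]};

    // Second _mm_hadd_ps: every lane now holds the full total.
    bottom = {bottom[0] + bottom[1], bottom[2] + bottom[3],
              bottom[0] + bottom[1], bottom[2] + bottom[3]};

    // _mm_cvtss_f32: read lane 0.
    return bottom[0];
}
```

For lanes holding 1 through 8, the intermediate states are [6, 8, 10, 12], then [14, 22, 14, 22], then [36, 36, 36, 36], so the function returns 36.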
The remaining elements are processed using a scalar loop at the end.
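The split between the main loop and the scalar tail can also be modeled without intrinsics. The sketch below is a portable stand-in, not the AVX version: mape_chunked is a hypothetical name, and an array of eight partial sums plays the role of vsum. Like the original functions, it assumes n > 0 and no zero actual values.

```cpp
#include <cmath>
#include <cstddef>

// Portable model of the vectorized loop structure.
float mape_chunked(const float *a, const float *b, std::size_t n) {
    float lanes[8] = {0};  // stands in for the eight lanes of vsum
    std::size_t i = 0;

    // Main loop: runs only while a full group of 8 elements remains.
    for (; i + 8 <= n; i += 8) {
        for (std::size_t k = 0; k < 8; ++k) {
            lanes[k] += std::fabs((a[i + k] - b[i + k]) / a[i + k]);
        }
    }

    // Reduce the eight partial sums to one scalar.
    float sum = 0;
    for (std::size_t k = 0; k < 8; ++k) {
        sum += lanes[k];
    }

    // Scalar tail: the remaining n % 8 elements.
    for (; i < n; ++i) {
        sum += std::fabs((a[i] - b[i]) / a[i]);
    }
    return (sum / static_cast<float>(n)) * 100;
}
```

With the 18-element arrays from the first example, the main loop covers 16 elements and the tail handles the last 2. Because the partial sums are added in a different order than in the plain scalar loop, the result can differ from it in the last few digits; floating-point addition is not associative.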