The Mean Absolute Percentage Error (MAPE) is a commonly used metric for measuring the accuracy of a predictive model. It averages the absolute differences between actual and predicted values, each divided by the corresponding actual value, and expresses the result as a percentage. A basic scalar implementation is fine for small datasets, but vectorizing the calculation with SIMD, which here processes eight floats per instruction, can make it considerably faster on large arrays.
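Written out, the definition looks like this, where $a_i$ are the actual values, $b_i$ the predicted values (matching the array names `a` and `b` in the code that follows), and $n$ the number of elements:

$$
\mathrm{MAPE} = \frac{100}{n} \sum_{i=1}^{n} \left| \frac{a_i - b_i}{a_i} \right|
$$

Note that the expression is undefined whenever an actual value $a_i$ is zero, so the inputs are assumed to contain no zeros.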

Here's the basic implementation:

```
#include <cmath>
#include <cstddef>
#include <iostream>
#include <vector>

// Scalar MAPE: a holds the actual values, b the predictions.
// Every a[i] must be non-zero, otherwise the division is undefined.
float mape(const float *a, const float *b, const size_t n) {
    float sum = 0;
    for (size_t i = 0; i < n; ++i) {
        // Absolute error relative to the actual value
        sum += std::fabs((a[i] - b[i]) / a[i]);
    }
    // Mean of the ratios, scaled to a percentage
    return (sum / (float) n) * 100;
}

int main() {
    std::vector<float> a = {
        1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,
        12, 13, 14, 15, 16, 17, 18,
    };
    std::vector<float> b = {
        0.5, 1, 2.5, 3, 4.5, 5, 6.5, 7, 8.5,
        9, 10.5, 11, 12.5, 13, 14.5, 15, 16.5, 17,
    };

    float value = mape(a.data(), b.data(), a.size());
    std::cout << value;

    return 0;
}
```

In this code, we iterate through the arrays, compute the absolute percentage difference for each pair of elements, accumulate the results, divide the sum by the number of elements, and multiply by 100 to get the mean absolute percentage error. Output:

`13.6378`

This scalar loop handles one element per iteration, which is fine for small datasets but can become a performance bottleneck on large arrays.

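Here's the optimized implementation using AVX2 intrinsics. The 256-bit float operations used here are part of AVX, so any AVX2-capable CPU supports them; with GCC or Clang, enable them with `-mavx2` (or `-mavx`):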

```
#include <cmath>
#include <cstddef>
#include <immintrin.h>

float mape(const float *a, const float *b, const size_t n) {
    // -0.0f has only the sign bit set in every lane
    __m256 signMask = _mm256_set1_ps(-0.0f);
    __m256 vsum = _mm256_setzero_ps();

    size_t i = 0;
    // Main loop: 8 floats per iteration
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(&a[i]);
        __m256 vb = _mm256_loadu_ps(&b[i]);
        __m256 vdiff = _mm256_sub_ps(va, vb);
        vdiff = _mm256_div_ps(vdiff, va);
        // Clear the sign bit to get the absolute value
        vdiff = _mm256_andnot_ps(signMask, vdiff);
        vsum = _mm256_add_ps(vsum, vdiff);
    }

    // Horizontal sum: fold the upper 128 bits onto the lower, then add pairs
    __m128 bottom = _mm256_castps256_ps128(vsum);
    __m128 top = _mm256_extractf128_ps(vsum, 1);
    bottom = _mm_add_ps(bottom, top);
    bottom = _mm_hadd_ps(bottom, bottom);
    bottom = _mm_hadd_ps(bottom, bottom);
    float sum = _mm_cvtss_f32(bottom);

    // Scalar tail for the remaining n % 8 elements
    for (; i < n; ++i) {
        sum += std::fabs((a[i] - b[i]) / a[i]);
    }

    return (sum / (float) n) * 100;
}
```

AVX2 code explanation:

`_mm256_loadu_ps` loads 8 floating-point elements from each array; the `u` variant does not require aligned addresses.

`_mm256_sub_ps` and `_mm256_div_ps` perform the subtraction and division that give the relative difference for each pair of elements.

`_mm256_andnot_ps` computes the absolute value of those differences by clearing the sign bit of every lane, using the `-0.0f` mask.

`_mm256_add_ps` accumulates the results in the `vsum` register.

To convert the SIMD result back to a scalar, we first split `vsum` into two 128-bit halves, add them together, and then apply two horizontal adds to reduce the four remaining lanes to a single value.

The remaining elements (fewer than eight, when `n` is not a multiple of 8) are processed by a scalar loop at the end.
