Calculate Mean Absolute Percentage Error using C++ SIMD

October 23, 2024
C++

The Mean Absolute Percentage Error (MAPE) is a commonly used metric for measuring the accuracy of a model's predictions. It averages the absolute differences between actual and predicted values, each divided by the corresponding actual value, and expresses the result as a percentage. A straightforward scalar loop is fine for small datasets, but for large ones, vectorizing the calculation with SIMD instructions can significantly boost performance.
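Written as a formula, the definition above looks as follows. The symbols are chosen here to match the arrays in the code: a_i is the i-th actual value, b_i the i-th predicted value, and n the number of elements.

```latex
\mathrm{MAPE} = \frac{100}{n} \sum_{i=1}^{n} \left| \frac{a_i - b_i}{a_i} \right|
```

Because each term divides by a_i, the metric is undefined whenever an actual value is zero.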

Here's the basic implementation:

#include <iostream>
#include <vector>
#include <cmath>
#include <cstddef>

// a holds the actual values, b the predicted values.
// Every a[i] must be nonzero, because each term divides by it.
float mape(const float *a, const float *b, const std::size_t n) {
    float sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += std::fabs((a[i] - b[i]) / a[i]);
    }

    // Average the relative errors, then scale to a percentage.
    return (sum / (float) n) * 100;
}

int main() {
    std::vector<float> a = {
        1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,
        12, 13, 14, 15, 16, 17, 18,
    };
    std::vector<float> b = {
        0.5, 1, 2.5, 3, 4.5, 5, 6.5, 7, 8.5,
        9, 10.5, 11, 12.5, 13, 14.5, 15, 16.5, 17,
    };

    float value = mape(a.data(), b.data(), a.size());
    std::cout << value;

    return 0;
}

In this code, we iterate through the arrays, compute the absolute relative difference for each pair of elements, and accumulate the results. We then divide the sum by the number of elements and multiply by 100 to get the mean absolute percentage error. Output:

13.6378

This method works well for small datasets, but processing one element per iteration can become a performance bottleneck on larger data.

Here's the optimized implementation using AVX2:

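#include <immintrin.h>
#include <cmath>
#include <cstddef>

// Build with AVX2 enabled, e.g. -mavx2 on GCC/Clang.
float mape(const float *a, const float *b, const std::size_t n) {
    // -0.0f has only the sign bit set; it is used below to clear that bit.
    __m256 signMask = _mm256_set1_ps(-0.0f);
    __m256 vsum = _mm256_setzero_ps();

    // Main loop: 8 floats per iteration.
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(&a[i]);
        __m256 vb = _mm256_loadu_ps(&b[i]);
        __m256 vdiff = _mm256_sub_ps(va, vb);
        vdiff = _mm256_div_ps(vdiff, va);
        vdiff = _mm256_andnot_ps(signMask, vdiff);  // absolute value
        vsum = _mm256_add_ps(vsum, vdiff);
    }

    // Horizontal sum: fold the upper 128-bit half onto the lower half...
    __m128 bottom = _mm256_castps256_ps128(vsum);
    __m128 top = _mm256_extractf128_ps(vsum, 1);

    bottom = _mm_add_ps(bottom, top);
    // ...then add adjacent pairs twice so the lowest lane holds the total.
    bottom = _mm_hadd_ps(bottom, bottom);
    bottom = _mm_hadd_ps(bottom, bottom);

    // Scalar tail: the last n % 8 elements.
    float sum = _mm_cvtss_f32(bottom);
    for (; i < n; ++i) {
        sum += std::fabs((a[i] - b[i]) / a[i]);
    }

    return (sum / (float) n) * 100;
}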

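AVX2 code explanation:

_mm256_loadu_ps loads 8 floating-point elements from each array; the "u" variant does not require aligned addresses.
_mm256_sub_ps and _mm256_div_ps perform the subtraction and division that give the relative difference for each pair of elements.
_mm256_andnot_ps takes the absolute value of the differences: signMask holds -0.0f, whose only set bit is the sign bit, so computing (NOT signMask) AND vdiff clears the sign bit in every lane.
_mm256_add_ps accumulates the results in the vsum register.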
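To make the absolute-value trick concrete, here is a minimal scalar sketch of what the and-not step does to a single lane. The helper name clear_sign_bit is hypothetical and not part of the original code, and the sketch assumes a 32-bit IEEE 754 float.

```cpp
#include <cassert>   // for the assert-based checks in the usage note
#include <cstdint>
#include <cstring>

// Hypothetical helper: clears the sign bit of one float, mirroring what
// _mm256_andnot_ps(signMask, x) does in each of the eight lanes.
float clear_sign_bit(float x) {
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "this sketch assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);         // view the float as raw bits
    const std::uint32_t signMask = 0x80000000u;  // bit pattern of -0.0f
    bits = ~signMask & bits;                     // and-not: (NOT mask) AND value
    std::memcpy(&x, &bits, sizeof x);            // back to a float
    return x;
}
```

For example, assert(clear_sign_bit(-2.5f) == 2.5f) holds, and positive inputs pass through unchanged.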

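To convert the SIMD result back to a scalar, we split vsum into its lower and upper 128-bit halves and add them to get four partial sums. We then apply _mm_hadd_ps twice so that the total ends up in the lowest lane, which _mm_cvtss_f32 extracts.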
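The reduction can be followed step by step with a scalar model of the lanes. This is an illustrative sketch only: hadd4 and reduce8 are hypothetical names that imitate the intrinsics with plain arrays, and they are not a replacement for the vector code.

```cpp
#include <array>
#include <cassert>   // for the assert-based check in the usage note

using Lanes4 = std::array<float, 4>;
using Lanes8 = std::array<float, 8>;

// Scalar model of _mm_hadd_ps(x, y): sums adjacent pairs of x, then of y.
Lanes4 hadd4(const Lanes4 &x, const Lanes4 &y) {
    return {x[0] + x[1], x[2] + x[3], y[0] + y[1], y[2] + y[3]};
}

// Scalar model of the horizontal sum applied to an 8-lane register.
float reduce8(const Lanes8 &v) {
    Lanes4 bottom = {v[0], v[1], v[2], v[3]};  // _mm256_castps256_ps128
    Lanes4 top = {v[4], v[5], v[6], v[7]};     // _mm256_extractf128_ps(v, 1)
    for (int k = 0; k < 4; ++k) {
        bottom[k] += top[k];                   // _mm_add_ps: 4 partial sums
    }
    bottom = hadd4(bottom, bottom);            // 2 distinct partial sums
    bottom = hadd4(bottom, bottom);            // every lane holds the total
    return bottom[0];                          // _mm_cvtss_f32
}
```

With lanes holding 1 through 8, the partial sums go from {6, 8, 10, 12} to {14, 22, 14, 22} to 36 in every lane, so assert(reduce8(v) == 36.0f) holds for that input.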

Any remaining elements (fewer than eight) are processed by the scalar loop at the end.
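As a quick way to see how the work is divided between the two loops, the following sketch computes the split for a given length. LoopSplit and split_for_width are hypothetical names introduced here for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper type: how many elements each loop handles.
struct LoopSplit {
    std::size_t vectorized;  // elements consumed by the vector loop
    std::size_t tail;        // elements left for the scalar loop
};

// Mirrors the loop condition `i + width <= n` with a step of `width`.
LoopSplit split_for_width(std::size_t n, std::size_t width) {
    assert(width > 0);
    const std::size_t vectorized = n - n % width;
    return {vectorized, n - vectorized};
}
```

For the 18-element example arrays and a width of 8, this gives 16 elements for the vector loop (two iterations) and 2 for the scalar tail.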