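The Mean Absolute Percentage Error (MAPE) is a commonly used metric for measuring the accuracy of a model's predictions. It averages the absolute difference between each actual and predicted value, divided by the actual value, and expresses the result as a percentage. Because each term divides by the actual value, the metric is undefined when an actual value is zero. A basic scalar implementation is fine for small datasets, but vectorizing the calculation with SIMD can significantly improve performance on larger ones.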
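Written out as a formula, the metric looks roughly as follows. The symbols are assumptions chosen to match the code in this article: $a_i$ are the actual values, $b_i$ the predicted values, and $n$ the number of elements.

```latex
\mathrm{MAPE} = \frac{100}{n} \sum_{i=1}^{n} \left| \frac{a_i - b_i}{a_i} \right|
```

For example, a single pair with $a_i = 4$ and $b_i = 3$ contributes $|4 - 3| / 4 = 0.25$, that is 25% before averaging.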
Here's the basic implementation:
#include <cmath>
#include <cstddef>
#include <iostream>
#include <vector>
// Mean absolute percentage error; a holds the actual values, b the predictions.
float mape(const float *a, const float *b, const size_t n) {
    float sum = 0;
    for (size_t i = 0; i < n; ++i) {
        // Absolute relative error of one pair; assumes a[i] != 0.
        sum += std::fabs((a[i] - b[i]) / a[i]);
    }
    // Average the terms and scale to a percentage.
    return (sum / (float) n) * 100;
}
int main() {
    std::vector<float> a = {
        1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,
        12, 13, 14, 15, 16, 17, 18,
    };
    std::vector<float> b = {
        0.5, 1, 2.5, 3, 4.5, 5, 6.5, 7, 8.5,
        9, 10.5, 11, 12.5, 13, 14.5, 15, 16.5, 17,
    };
    float value = mape(a.data(), b.data(), a.size());
    std::cout << value;
    return 0;
}
In this code, we iterate through the arrays, compute the absolute relative difference for each pair of elements, accumulate the results, divide by the number of elements, and multiply by 100 to get the mean absolute percentage error. Output:
13.6378
This method is effective for small datasets, but it becomes a performance bottleneck with larger data.
Here's the optimized implementation using AVX2:
#include <immintrin.h>
#include <cmath>
#include <cstddef>
// Build with AVX2 enabled, e.g. -mavx2 on GCC/Clang.
float mape(const float *a, const float *b, const size_t n) {
    __m256 signMask = _mm256_set1_ps(-0.0f); // only the sign bit set in each lane
    __m256 vsum = _mm256_setzero_ps();       // 8 partial sums
    size_t i = 0;
    // Main loop: process 8 elements per iteration.
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(&a[i]);
        __m256 vb = _mm256_loadu_ps(&b[i]);
        __m256 vdiff = _mm256_sub_ps(va, vb);
        vdiff = _mm256_div_ps(vdiff, va);
        vdiff = _mm256_andnot_ps(signMask, vdiff); // clear the sign bit: |x|
        vsum = _mm256_add_ps(vsum, vdiff);
    }
    // Horizontal reduction: 8 lanes -> 4 lanes -> 1 value.
    __m128 bottom = _mm256_castps256_ps128(vsum);
    __m128 top = _mm256_extractf128_ps(vsum, 1);
    bottom = _mm_add_ps(bottom, top);
    bottom = _mm_hadd_ps(bottom, bottom);
    bottom = _mm_hadd_ps(bottom, bottom);
    float sum = _mm_cvtss_f32(bottom);
    // Scalar tail for the remaining n % 8 elements.
    for (; i < n; ++i) {
        sum += std::fabs((a[i] - b[i]) / a[i]);
    }
    return (sum / (float) n) * 100;
}
AVX2 code explanation:
- _mm256_loadu_ps loads 8 floating-point elements from each array (unaligned loads, so the data needs no special alignment).
- _mm256_sub_ps and _mm256_div_ps perform the subtraction and division that compute the relative difference between elements.
- _mm256_andnot_ps calculates the absolute value of the differences by clearing the sign bit of each element.
- _mm256_add_ps accumulates the results in the vsum register.
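The sign-mask step is perhaps the least obvious of these, so here is a minimal scalar sketch of what it does to a single lane. The function name abs_via_sign_mask is hypothetical and used only for illustration, and the sketch assumes a 32-bit IEEE 754 float.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: scalar model of one lane of
// _mm256_andnot_ps(signMask, x), which computes (~signMask) & x.
float abs_via_sign_mask(float x) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");
    const std::uint32_t sign_mask = 0x80000000u; // bit pattern of -0.0f
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits); // reinterpret the float as raw bits
    bits = ~sign_mask & bits;            // clear the sign bit, keep exponent and mantissa
    std::memcpy(&x, &bits, sizeof x);    // reinterpret the bits as a float again
    return x;
}
```

Calling abs_via_sign_mask(-2.5f) yields 2.5f, and positive inputs pass through unchanged, which is why a single bitwise instruction is enough for the absolute value.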
To convert the SIMD result back to a scalar, we first split vsum into its two 128-bit halves, add them together, and then apply two horizontal adds so that the total of all eight lanes ends up in the lowest element.
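The reduction can be traced with a portable scalar model. This is only a sketch: hadd_self and horizontal_sum8 are hypothetical names, and std::array stands in for the __m128 and __m256 registers.

```cpp
#include <array>

// Hypothetical helper: scalar model of _mm_hadd_ps(x, x).
// The result lanes are [x0+x1, x2+x3, x0+x1, x2+x3].
std::array<float, 4> hadd_self(const std::array<float, 4>& x) {
    return {x[0] + x[1], x[2] + x[3], x[0] + x[1], x[2] + x[3]};
}

// Hypothetical helper: scalar model of the 8-lane reduction.
float horizontal_sum8(const std::array<float, 8>& v) {
    std::array<float, 4> bottom = {v[0], v[1], v[2], v[3]}; // _mm256_castps256_ps128
    std::array<float, 4> top = {v[4], v[5], v[6], v[7]};    // _mm256_extractf128_ps(v, 1)
    for (int k = 0; k < 4; ++k) {
        bottom[k] += top[k];                                // _mm_add_ps
    }
    bottom = hadd_self(bottom);                             // first _mm_hadd_ps
    bottom = hadd_self(bottom);                             // second _mm_hadd_ps
    return bottom[0];                                       // _mm_cvtss_f32
}
```

With lanes 1 through 8, the halves add to [6, 8, 10, 12], the first horizontal add gives [14, 22, 14, 22], and the second leaves 36 in every lane.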
Any remaining elements (at most 7, when n is not a multiple of 8) are processed using a scalar loop at the end.
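The overall structure, a main loop in blocks of 8 followed by a scalar tail, can also be sketched without intrinsics. The function mape_blocked below is a hypothetical, portable model, not a replacement for the AVX2 version: eight plain float accumulators stand in for the lanes of vsum.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical portable model of the vectorized loop structure.
float mape_blocked(const float* a, const float* b, std::size_t n) {
    float lanes[8] = {0}; // one partial sum per SIMD lane
    std::size_t i = 0;
    // Main loop: full blocks of 8 elements, like one SIMD iteration each.
    for (; i + 8 <= n; i += 8) {
        for (std::size_t k = 0; k < 8; ++k) {
            lanes[k] += std::fabs((a[i + k] - b[i + k]) / a[i + k]);
        }
    }
    // Reduce the partial sums to a single value.
    float sum = 0;
    for (std::size_t k = 0; k < 8; ++k) {
        sum += lanes[k];
    }
    // Scalar tail: the last n % 8 elements.
    for (; i < n; ++i) {
        sum += std::fabs((a[i] - b[i]) / a[i]);
    }
    return (sum / static_cast<float>(n)) * 100;
}
```

With the 18-element arrays from the basic example, two full blocks cover 16 elements and the tail loop handles the last 2. Because the terms are summed in a different order than in the plain scalar loop, the result may differ from it in the last few digits.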