Calculate Mean Absolute Deviation of Array Elements using C++ SIMD

Calculate Mean Absolute Deviation of Array Elements using C++ SIMD

The Mean Absolute Deviation (MAD) measures the average absolute difference between each array element and the mean of the array. It is a common statistical tool for analyzing variability. Using SIMD operations, we can significantly optimize the MAD calculation by processing multiple elements in parallel.

The scalar implementation:

#include <iostream>
#include <vector>

float mad(const float *data, const size_t n) {
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        sum += data[i];
    }

    float mean = sum / (float) n;

    sum = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        sum += std::abs(data[i] - mean);
    }

    return sum / (float) n;
}

int main() {
    std::vector<float> a = {
        1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
    };

    auto value = mad(a.data(), a.size());
    std::cout << value;

    return 0;
}

The MAD is calculated in two steps:

  • Compute the mean of the array.
  • Calculate the mean of the absolute deviations from the mean.

The code output is 4.5. This approach processes one element at a time, making it less efficient for large arrays.

AVX2 implementation:

#include <immintrin.h>

float mad(const float *data, const size_t n) {
    __m256 vsum = _mm256_setzero_ps();

    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 vdata = _mm256_loadu_ps(&data[i]);
        vsum = _mm256_add_ps(vsum, vdata);
    }

    __m128 bottom = _mm256_castps256_ps128(vsum);
    __m128 top = _mm256_extractf128_ps(vsum, 1);

    bottom = _mm_add_ps(bottom, top);
    bottom = _mm_hadd_ps(bottom, bottom);
    bottom = _mm_hadd_ps(bottom, bottom);

    float sum = _mm_cvtss_f32(bottom);
    for (; i < n; ++i) {
        sum += data[i];
    }

    float mean = sum / (float) n;
    __m256 vmean = _mm256_set1_ps(mean);
    __m256 signMask = _mm256_set1_ps(-0.0f);
    vsum = _mm256_setzero_ps();

    i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 vdata = _mm256_loadu_ps(&data[i]);
        __m256 vdiff = _mm256_sub_ps(vdata, vmean);
        vdiff = _mm256_andnot_ps(signMask, vdiff);
        vsum = _mm256_add_ps(vsum, vdiff);
    }

    bottom = _mm256_castps256_ps128(vsum);
    top = _mm256_extractf128_ps(vsum, 1);

    bottom = _mm_add_ps(bottom, top);
    bottom = _mm_hadd_ps(bottom, bottom);
    bottom = _mm_hadd_ps(bottom, bottom);

    sum = _mm_cvtss_f32(bottom);
    for (; i < n; ++i) {
        sum += std::abs(data[i] - mean);
    }

    return sum / (float) n;
}

Code explanation:

  • Mean Calculation:

The code computes the sum by loading 8 elements at a time using _mm256_loadu_ps, then accumulates these into vsum with _mm256_add_ps. Once all chunks are processed, the partial sums in vsum are reduced horizontally using _mm_hadd_ps and combined. Any remaining elements are summed in a scalar loop, and the mean is computed by dividing the total sum by n.

  • Absolute Deviation Accumulation:

For each chunk of 8 elements, the difference from the mean is calculated using _mm256_sub_ps. The absolute value of these differences is computed with _mm256_andnot_ps to clear the sign bit. These absolute deviations are then accumulated into vsum using _mm256_add_ps. Finally, the partial sums in vsum are horizontally reduced with _mm_hadd_ps and combined, while any leftover elements are handled in a scalar loop.

  • Mean Absolute Deviation (MAD) Calculation:

The total of absolute deviations is divided by n to compute the MAD.

Leave a Comment

Cancel reply

Your email address will not be published.