The Mean Absolute Deviation (MAD) measures the average absolute difference between each array element and the mean of the array. It is a common statistical tool for analyzing variability. Using SIMD operations, we can significantly optimize the MAD calculation by processing multiple elements in parallel.
The scalar implementation:
#include <iostream>
#include <vector>
float mad(const float *data, const size_t n) {
float sum = 0.0f;
for (size_t i = 0; i < n; ++i) {
sum += data[i];
}
float mean = sum / (float) n;
sum = 0.0f;
for (size_t i = 0; i < n; ++i) {
sum += std::abs(data[i] - mean);
}
return sum / (float) n;
}
int main() {
std::vector<float> a = {
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
};
auto value = mad(a.data(), a.size());
std::cout << value;
return 0;
}
The MAD is calculated in two steps:
- Compute the mean of the array.
- Calculate the mean of the absolute deviations from the mean.
The code output is 4.5. This approach processes one element at a time, making it less efficient for large arrays.
AVX2 implementation:
#include <immintrin.h>
float mad(const float *data, const size_t n) {
__m256 vsum = _mm256_setzero_ps();
size_t i = 0;
for (; i + 8 <= n; i += 8) {
__m256 vdata = _mm256_loadu_ps(&data[i]);
vsum = _mm256_add_ps(vsum, vdata);
}
__m128 bottom = _mm256_castps256_ps128(vsum);
__m128 top = _mm256_extractf128_ps(vsum, 1);
bottom = _mm_add_ps(bottom, top);
bottom = _mm_hadd_ps(bottom, bottom);
bottom = _mm_hadd_ps(bottom, bottom);
float sum = _mm_cvtss_f32(bottom);
for (; i < n; ++i) {
sum += data[i];
}
float mean = sum / (float) n;
__m256 vmean = _mm256_set1_ps(mean);
__m256 signMask = _mm256_set1_ps(-0.0f);
vsum = _mm256_setzero_ps();
i = 0;
for (; i + 8 <= n; i += 8) {
__m256 vdata = _mm256_loadu_ps(&data[i]);
__m256 vdiff = _mm256_sub_ps(vdata, vmean);
vdiff = _mm256_andnot_ps(signMask, vdiff);
vsum = _mm256_add_ps(vsum, vdiff);
}
bottom = _mm256_castps256_ps128(vsum);
top = _mm256_extractf128_ps(vsum, 1);
bottom = _mm_add_ps(bottom, top);
bottom = _mm_hadd_ps(bottom, bottom);
bottom = _mm_hadd_ps(bottom, bottom);
sum = _mm_cvtss_f32(bottom);
for (; i < n; ++i) {
sum += std::abs(data[i] - mean);
}
return sum / (float) n;
}
Code explanation:
- Mean Calculation:
The code computes the sum by loading 8 elements at a time using _mm256_loadu_ps
, then accumulates these into vsum
with _mm256_add_ps
. Once all chunks are processed, the partial sums in vsum
are reduced horizontally using _mm_hadd_ps
and combined. Any remaining elements are summed in a scalar loop, and the mean is computed by dividing the total sum by n
.
- Absolute Deviation Accumulation:
For each chunk of 8 elements, the difference from the mean is calculated using _mm256_sub_ps
. The absolute value of these differences is computed with _mm256_andnot_ps
to clear the sign bit. These absolute deviations are then accumulated into vsum
using _mm256_add_ps
. Finally, the partial sums in vsum
are horizontally reduced with _mm_hadd_ps
and combined, while any leftover elements are handled in a scalar loop.
- Mean Absolute Deviation (MAD) Calculation:
The total of absolute deviations is divided by n
to compute the MAD.
Leave a Comment
Cancel reply