Apply Ceil Function for Array Elements using C++ SIMD

November 15, 2024
C++

The ceil function rounds a value up to the nearest integer, that is, to the smallest integer not less than the value. For example, applying ceil to 4.3 returns 5, while applying it to -2.7 returns -2. When working with large arrays, applying even a simple function like ceil to each element one at a time can become a performance bottleneck. SIMD (Single Instruction, Multiple Data) allows us to apply the same operation to multiple data elements simultaneously, using the vector registers of modern CPUs.
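
To make the rounding direction concrete, here is a minimal sketch that contrasts ceil with truncation toward zero. The helper names round_up and round_toward_zero are made up for this illustration and are not used elsewhere in the article.

```cpp
#include <cmath>

// Illustrative helper (hypothetical name): rounds toward positive infinity.
// 4.3 -> 5, -2.7 -> -2, and an integer value such as 4.0 stays 4.
float round_up(float x) {
    return std::ceil(x);
}

// Shown only for contrast (hypothetical name): rounds toward zero.
// 4.3 -> 4, -2.7 -> -2. It differs from ceil only for positive fractions.
float round_toward_zero(float x) {
    return std::trunc(x);
}
```

Both functions agree for negative inputs, which is why -2.7 becomes -2 rather than -3 under ceil.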

Here's a basic implementation:

#include <iostream>
#include <vector>
#include <cmath>

void ceil(float *data, const size_t n) {
    for (size_t i = 0; i < n; ++i) {
        data[i] = std::ceil(data[i]);
    }
}

int main() {
    std::vector<float> a = {
        -2.1, -3.5, 4.7, 9.8, -7.2, 0, 3.3, -1.9, 2.1,
        15, 1.4, 8.2, -8.3, -5.5, -4.2, 6.1, 9.9, -2.8,
    };

    ceil(a.data(), a.size());
    for (auto value: a) {
        std::cout << value << " ";
    }

    return 0;
}

This code uses the standard C++ library function std::ceil, which processes one element at a time. Output:

-2 -3 5 10 -7 0 4 -1 3 15 2 9 -8 -5 -4 7 10 -2

It is a straightforward implementation, but it is not optimized for large arrays.

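Here's the optimized version using AVX intrinsics. The 256-bit float operations used here require only AVX, which every AVX2-capable CPU also supports. With GCC or Clang, compile with AVX enabled, for example with -mavx.

#include <immintrin.h>
#include <cmath>
#include <cstddef>

void ceil(float *data, const size_t n) {
    size_t i = 0;
    // Main loop: round eight floats per iteration.
    for (; i + 8 <= n; i += 8) {
        __m256 vdata = _mm256_loadu_ps(&data[i]);
        __m256 vresult = _mm256_ceil_ps(vdata);
        _mm256_storeu_ps(&data[i], vresult);
    }

    // Tail loop: handle the last n % 8 elements one at a time.
    for (; i < n; ++i) {
        data[i] = std::ceil(data[i]);
    }
}

Explanation of the AVX code: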

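_mm256_loadu_ps loads eight floats from the array into a 256-bit register. The u suffix means the address does not have to be 32-byte aligned.
_mm256_ceil_ps applies the ceil function to each of the eight elements simultaneously.
_mm256_storeu_ps writes the eight results back to the same positions in the original array.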
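
The three intrinsics above can be tried on a single group of eight values. This is a sketch under assumptions, not part of the original code. The helper names ceil8 and cpu_has_avx are hypothetical, and the target attribute and __builtin_cpu_supports are GCC/Clang extensions on x86, used here so the file builds without -mavx.

```cpp
#include <immintrin.h>
#include <array>

// Hypothetical helper: rounds exactly eight floats with one AVX operation.
// Assumption: GCC or Clang on x86-64. The target attribute enables AVX for
// this function only, so no -mavx flag is needed for the rest of the file.
__attribute__((target("avx")))
std::array<float, 8> ceil8(const std::array<float, 8>& in) {
    std::array<float, 8> out{};
    __m256 v = _mm256_loadu_ps(in.data());  // load 8 floats, no alignment required
    __m256 r = _mm256_ceil_ps(v);           // round all 8 lanes toward +infinity
    _mm256_storeu_ps(out.data(), r);        // store 8 floats
    return out;
}

// Hypothetical helper: true when the running CPU reports AVX support.
// Call ceil8 only when this returns true.
bool cpu_has_avx() {
    return __builtin_cpu_supports("avx");
}
```

Passing the first eight values of the sample array (-2.1, -3.5, 4.7, 9.8, -7.2, 0, 3.3, -1.9) to ceil8 should give the same first eight numbers as the scalar output shown earlier.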

When the array length is not a multiple of eight, a final scalar loop processes the remaining elements (at most seven) individually, so that every element is rounded.
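
One way to gain confidence in the tail handling is to compare the vectorized function against the scalar one for lengths that are and are not multiples of eight. The following is a rough sketch under assumptions. The names ceil_scalar, ceil_avx and avx_matches_scalar are hypothetical (the article calls both versions ceil), and the target attribute and __builtin_cpu_supports are GCC/Clang extensions on x86.

```cpp
#include <immintrin.h>
#include <cmath>
#include <cstddef>
#include <vector>

// Scalar reference: the basic implementation, renamed for this sketch.
void ceil_scalar(float* data, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        data[i] = std::ceil(data[i]);
    }
}

// Vectorized version, renamed for this sketch.
// Assumption: GCC or Clang on x86-64; the attribute enables AVX here only.
__attribute__((target("avx")))
void ceil_avx(float* data, std::size_t n) {
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 v = _mm256_loadu_ps(&data[i]);
        _mm256_storeu_ps(&data[i], _mm256_ceil_ps(v));
    }
    for (; i < n; ++i) {  // tail: fewer than eight elements left
        data[i] = std::ceil(data[i]);
    }
}

// Hypothetical check: builds n test values, runs both versions, and reports
// whether the results are identical. Falls back to the scalar version when
// the CPU does not report AVX support.
bool avx_matches_scalar(std::size_t n) {
    std::vector<float> a(n);
    for (std::size_t i = 0; i < n; ++i) {
        // Negative and positive values with fractional parts.
        a[i] = -4.75f + 0.5f * static_cast<float>(i);
    }
    std::vector<float> b = a;
    ceil_scalar(a.data(), n);
    if (__builtin_cpu_supports("avx")) {
        ceil_avx(b.data(), n);
    } else {
        ceil_scalar(b.data(), n);
    }
    return a == b;
}
```

Calling avx_matches_scalar with lengths such as 7, 8, 9 and 18 exercises the cases with no full vector, no tail, and a tail of one or two elements.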
