Apply Ceil Function for Array Elements using C++ SIMD

The ceil function rounds a value up to the smallest integer that is not less than it, so applying it to an array rounds every element toward positive infinity. For example, applying ceil to 4.3 returns 5, while applying it to -2.7 returns -2. On large arrays, even a simple per-element ceil can affect performance. SIMD lets us apply the same operation to several data elements at once, using the vector units of modern CPUs.

Here's a basic implementation:

    #include <iostream>
    #include <vector>
    #include <cmath>

    void ceil(float *data, const size_t n) {
        for (size_t i = 0; i < n; ++i) {
            data[i] = std::ceil(data[i]);
        }
    }

    int main() {
        std::vector<float> a = {
            -2.1, -3.5, 4.7, 9.8, -7.2, 0, 3.3, -1.9, 2.1,
            15, 1.4, 8.2, -8.3, -5.5, -4.2, 6.1, 9.9, -2.8,
        };

        ceil(a.data(), a.size());

        for (auto value: a) {
            std::cout << value << " ";
        }

        return 0;
    }

This code uses the standard C++ library function std::ceil, which processes one element at a time. Output:

-2 -3 5 10 -7 0 4 -1 3 15 2 9 -8 -5 -4 7 10 -2

It is a straightforward implementation, but it is not optimized for large arrays.

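Here's the optimized version for AVX2-capable CPUs. The three intrinsics it uses belong to the original AVX instruction set, so compiling with -mavx (or -mavx2) on GCC or Clang is enough: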

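    #include <immintrin.h>
    #include <cmath>    // std::ceil for the scalar tail
    #include <cstddef>  // size_t

    void ceil(float *data, const size_t n) {
        size_t i = 0;

        // Main loop: round eight floats per iteration.
        for (; i + 8 <= n; i += 8) {
            __m256 vdata = _mm256_loadu_ps(&data[i]);
            __m256 vresult = _mm256_ceil_ps(vdata);
            _mm256_storeu_ps(&data[i], vresult);
        }

        // Tail loop: handle the remaining (fewer than eight) elements.
        for (; i < n; ++i) {
            data[i] = std::ceil(data[i]);
        }
    }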

Explanation of the AVX2 code:

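  • _mm256_loadu_ps loads eight consecutive floats from the array into a 256-bit register; the address does not need to be 32-byte aligned.
  • _mm256_ceil_ps applies the ceil function to each of the eight elements simultaneously.
  • _mm256_storeu_ps writes the eight results back to the same position in the original array, again without an alignment requirement.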

A final scalar loop processes the remaining elements (fewer than eight) one at a time, so arrays whose length is not a multiple of eight are still handled completely.
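One way to gain confidence in the tail handling is to compare the vectorized routine against a plain std::ceil loop for every array length from 0 up to a few dozen elements, so that lengths which are and are not multiples of eight both get exercised. The following is only a sketch under stated assumptions, not part of the original code. The names ceil_avx, ceil_scalar, ceil_dispatch and ceil_matches_scalar are hypothetical. It also assumes the GCC/Clang extensions __attribute__((target("avx"))) and __builtin_cpu_supports, which let it build without -mavx. On other compilers or CPUs it falls back to the scalar loop.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Assumption: GCC or Clang on x86. Elsewhere only the scalar path is built.
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define CEIL_HAS_AVX_PATH 1

// The target attribute enables AVX code generation for this function only.
__attribute__((target("avx")))
static void ceil_avx(float *data, std::size_t n) {
    std::size_t i = 0;
    // Main loop: eight floats per iteration.
    for (; i + 8 <= n; i += 8) {
        __m256 v = _mm256_loadu_ps(&data[i]);
        _mm256_storeu_ps(&data[i], _mm256_ceil_ps(v));
    }
    // Tail loop: the remaining n % 8 elements.
    for (; i < n; ++i) {
        data[i] = std::ceil(data[i]);
    }
}
#endif

// Reference implementation: one element at a time.
static void ceil_scalar(float *data, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        data[i] = std::ceil(data[i]);
    }
}

// Hypothetical dispatcher: use AVX when the running CPU reports it.
void ceil_dispatch(float *data, std::size_t n) {
#ifdef CEIL_HAS_AVX_PATH
    if (__builtin_cpu_supports("avx")) {
        ceil_avx(data, n);
        return;
    }
#endif
    ceil_scalar(data, n);
}

// Returns true if both paths agree for every length in [0, max_n].
bool ceil_matches_scalar(std::size_t max_n) {
    for (std::size_t n = 0; n <= max_n; ++n) {
        std::vector<float> a(n);
        for (std::size_t i = 0; i < n; ++i) {
            a[i] = 0.37f * static_cast<float>(i) - 5.0f;  // mixed signs
        }
        std::vector<float> b = a;
        ceil_dispatch(a.data(), n);
        ceil_scalar(b.data(), n);
        for (std::size_t i = 0; i < n; ++i) {
            if (a[i] != b[i]) {
                return false;
            }
        }
    }
    return true;
}
```

A call such as ceil_matches_scalar(40) is expected to return true, since rounding up is exact for finite floats in both paths. Lengths like 7, 8 and 9 cover the tail-only case, the exact-multiple case and the mixed case.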
