When working with numerical data, a common task is to convert arrays of integers into arrays of floating-point numbers. Although the basic scalar approach works perfectly fine, it may not be the most efficient when handling large datasets. Modern CPUs offer SIMD capabilities that allow us to process multiple elements in parallel, significantly accelerating such conversions.
The straightforward way to convert an array of int32_t to float is to iterate through each element and cast it manually. Here's how it looks:
#include <cstdint>
#include <iostream>
#include <vector>

void int32float(const int32_t *data, float *result, const size_t n) {
    for (size_t i = 0; i < n; ++i) {
        result[i] = (float) data[i];
    }
}

int main() {
    std::vector<int32_t> a = {
        -1, 0, 1, -2, 3, -4, 5, -6, 7, -8, 9, -10, 11, -12, 13, -14, 15, -16,
    };
    std::vector<float> result(a.size());
    int32float(a.data(), result.data(), a.size());
    for (auto value : result) {
        std::cout << value << " ";
    }
    return 0;
}
While this method is simple and portable, it doesn't take advantage of CPU vectorization capabilities.
Here's the optimized implementation using AVX2:
#include <cstddef>
#include <cstdint>
#include <immintrin.h>

void int32float(const int32_t *data, float *result, const size_t n) {
    size_t i = 0;
    // Convert 8 elements per iteration using AVX2.
    for (; i + 8 <= n; i += 8) {
        __m256i va = _mm256_loadu_si256((const __m256i *) &data[i]);
        __m256 vresult = _mm256_cvtepi32_ps(va);
        _mm256_storeu_ps(&result[i], vresult);
    }
    // Scalar fallback for the remaining elements.
    for (; i < n; ++i) {
        result[i] = (float) data[i];
    }
}
Explanation:
- _mm256_loadu_si256 loads 8 int32_t elements from the array.
- _mm256_cvtepi32_ps converts the 8 signed integers into 8 floating-point values.
- _mm256_storeu_ps writes these 8 floats into the result array.
The scalar fallback loop ensures that if the number of elements isn't a multiple of 8, the leftovers are processed at the end.