Multiply Array Elements by Index Position using C++ SIMD

Multiplying each element of an array by its index is a common operation in data processing, allowing us to scale elements based on their position within the array. A basic approach iterates over the elements and performs one multiplication at a time. SIMD can accelerate this by multiplying several elements with a single instruction; with 256-bit registers, that is eight floats at once.

The scalar version:

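#include <cstddef>
#include <iostream>
#include <vector>

// Multiply each element by its zero-based index: result[i] = data[i] * i.
void multipleByIndex(const float *data, float *result, const std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        result[i] = data[i] * static_cast<float>(i);
    }
}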

int main() {
    std::vector<float> a = {
        1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
    };
    std::vector<float> result(a.size());

    multipleByIndex(a.data(), result.data(), a.size());
    for (auto value: result) {
        std::cout << value << " ";
    }

    return 0;
}

This code iterates over the input array data, multiplies each element by its zero-based index, and stores the product at the same position in the result array. Output:

0 2 6 12 20 30 42 56 72 90 110 132 156 182 210 240 272 306

Optimized implementation using 256-bit AVX intrinsics (every intrinsic used here is part of AVX, so AVX2 is not strictly required):

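#include <immintrin.h>
#include <cstddef>

void multipleByIndex(const float *data, float *result, const std::size_t n) {
    // _mm256_set_ps lists lanes from highest to lowest, so lane 0 holds 0.0f.
    __m256 vindex = _mm256_set_ps(7.0f, 6.0f, 5.0f, 4.0f, 3.0f, 2.0f, 1.0f, 0.0f);
    // Step added to every lane after each block of eight elements.
    __m256 vinc = _mm256_set1_ps(8.0f);

    std::size_t i = 0;
    // Main loop: process eight floats per iteration.
    for (; i + 8 <= n; i += 8) {
        __m256 vdata = _mm256_loadu_ps(&data[i]);       // unaligned load
        __m256 vresult = _mm256_mul_ps(vdata, vindex);  // data[i + k] * (i + k)
        _mm256_storeu_ps(&result[i], vresult);          // unaligned store
        vindex = _mm256_add_ps(vindex, vinc);           // advance indices by 8
    }

    // Scalar tail: the last n % 8 elements.
    for (; i < n; ++i) {
        result[i] = data[i] * static_cast<float>(i);
    }
}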

Here's how it works:

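  • _mm256_set_ps initializes the index vector with the first eight indices, 0 through 7. Its arguments are listed from the highest lane to the lowest, which is why 7.0f comes first and 0.0f last.
  • _mm256_set1_ps fills every lane of a second vector with the value eight; adding it to the index vector advances all indices to the next block.
  • _mm256_loadu_ps loads eight consecutive floats from the data array. The address does not need to be 32-byte aligned.
  • _mm256_mul_ps multiplies the loaded elements by the index vector, lane by lane.
  • _mm256_storeu_ps writes the eight products to the result array, again without an alignment requirement.
  • _mm256_add_ps adds eight to each index, setting up the next loop iteration.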
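
To make the steps above easier to follow, here is a minimal sketch that emulates the vector loop in portable C++, with a plain eight-element array standing in for a __m256 register. The function name multiplyByIndexEmulated and the constant kLanes are made up for this illustration and are not part of the code above; the sketch runs no faster than the scalar version and is only meant to show how the index vector evolves.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper for illustration only: mimics the AVX loop lane by lane
// using ordinary scalar code, so it compiles without any SIMD flags.
std::vector<float> multiplyByIndexEmulated(const std::vector<float> &data) {
    constexpr std::size_t kLanes = 8;  // a __m256 holds eight floats

    // Stand-in for vindex: lane k starts out holding the index k.
    std::array<float, kLanes> vindex = {0.0f, 1.0f, 2.0f, 3.0f,
                                        4.0f, 5.0f, 6.0f, 7.0f};
    const float vinc = 8.0f;  // stand-in for the vector filled with eights

    std::vector<float> result(data.size());

    std::size_t i = 0;
    for (; i + kLanes <= data.size(); i += kLanes) {
        for (std::size_t lane = 0; lane < kLanes; ++lane) {
            // Corresponds to load + _mm256_mul_ps + store for one lane.
            result[i + lane] = data[i + lane] * vindex[lane];
            // Corresponds to _mm256_add_ps(vindex, vinc) for one lane.
            vindex[lane] += vinc;
        }
    }

    // Scalar tail, same as in the real implementation.
    for (; i < data.size(); ++i) {
        result[i] = data[i] * static_cast<float>(i);
    }

    assert(i == data.size());  // every element was visited exactly once
    return result;
}
```

Calling it with the 18-element input from the scalar example should give the same values as the output shown earlier: two passes of the lane loop cover indices 0 to 15, and the tail covers 16 and 17.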

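When n is not a multiple of eight, the remaining n % 8 elements are handled by the scalar loop at the end. For the 18-element input used earlier, 16 elements go through the vector loop and the last two through the scalar loop.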
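
As a small illustration of how the work is divided between the two loops, the following sketch computes how many elements fall into full vector iterations and how many are left for the scalar tail. The LoopSplit struct and the splitForLanes function are hypothetical names introduced here, not part of the implementation above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper type, named here only for illustration.
struct LoopSplit {
    std::size_t vectorized;  // elements covered by full vector iterations
    std::size_t tail;        // leftover elements for the scalar loop
};

// Splits n elements into a part divisible by `lanes` and the remainder.
LoopSplit splitForLanes(std::size_t n, std::size_t lanes) {
    assert(lanes > 0);  // a zero lane count would divide by zero
    const std::size_t tail = n % lanes;
    return LoopSplit{n - tail, tail};
}
```

With n = 18 and eight lanes, this yields 16 vectorized elements and a tail of 2, which matches the loop condition i + 8 <= n stopping at i = 16.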
