Reversing an array is a classic example that introduces essential CUDA concepts such as memory management, thread indexing, and kernel launches. Unlike serial CPU code, where operations run one after another, CUDA lets thousands of lightweight threads execute concurrently on the GPU, enabling massive parallelism.
The code below defines a CUDA kernel function, reverse, which is executed by many GPU threads in parallel. Each thread handles a single output element: thread i reads the element at position n - i - 1 of the input array and writes it to position i of the output array.
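For comparison (this is an illustration, not part of the original listing), the same index mapping in plain serial C++ is a single loop; the kernel effectively hands each iteration of this loop to its own thread:

#include <cstddef>

// Serial reference: the loop body is exactly what one CUDA thread does for its index i.
void reverse_serial(const float *data, float *result, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        result[i] = data[n - i - 1];
    }
}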
In the main function, we begin by initializing host vectors (arrays in CPU memory) and then allocate memory on the device (GPU) using cudaMalloc. The input array is transferred from host to device with cudaMemcpy so that the GPU can read it. The kernel is then launched with enough blocks and threads to cover every element; with 18 elements and a block size of 256, a single block of 256 threads is launched, and the bounds check inside the kernel keeps the surplus threads from writing out of range. Once the kernel has finished, the reversed result is copied back to the host, printed to the console, and the allocated GPU memory is freed to prevent memory leaks.
#include <cuda_runtime.h>
#include <iostream>
#include <vector>

// Each thread writes one output element: thread i reads data[n - i - 1].
__global__ void reverse(const float *data, float *result, const size_t n) {
    unsigned int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) {
        result[i] = data[n - i - 1];
    }
}

int main() {
    std::vector<float> a = {
        1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
    };
    std::vector<float> result(a.size());
    size_t bytes = a.size() * sizeof(float);

    // Allocate device buffers for the input and the reversed output.
    float *da, *dresult;
    cudaMalloc(&da, bytes);
    cudaMalloc(&dresult, bytes);

    // Copy the input array from host to device.
    cudaMemcpy(da, a.data(), bytes, cudaMemcpyHostToDevice);

    // Launch enough blocks of 256 threads to cover every element.
    size_t blockSize = 256;
    size_t numBlocks = (a.size() + blockSize - 1) / blockSize;
    reverse<<<numBlocks, blockSize>>>(da, dresult, a.size());

    // Copy the result back; this blocking copy also waits for the kernel to finish.
    cudaMemcpy(result.data(), dresult, bytes, cudaMemcpyDeviceToHost);

    for (auto value : result) {
        std::cout << value << " ";
    }
    std::cout << "\n";

    // Free device memory to avoid leaks.
    cudaFree(da);
    cudaFree(dresult);
    return 0;
}
Output:
18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
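The listing above omits error handling to keep the example short. As a rough sketch (not part of the original program, and with an illustrative helper name), the launch and the device calls could be checked with the standard CUDA runtime error API:

#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Illustrative helper: abort with a readable message if a CUDA call fails.
static void checkCuda(cudaError_t status, const char *what) {
    if (status != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(status));
        std::exit(EXIT_FAILURE);
    }
}

// Usage after the kernel launch:
//   reverse<<<numBlocks, blockSize>>>(da, dresult, a.size());
//   checkCuda(cudaGetLastError(), "kernel launch");          // launch/configuration errors
//   checkCuda(cudaDeviceSynchronize(), "kernel execution");  // errors raised while the kernel runs

Assuming the source file is saved as reverse.cu, it can be compiled with nvcc, for example: nvcc reverse.cu -o reverse.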