Get CUDA Device Peak Memory Bandwidth using C++

Peak memory bandwidth is an important metric in high-performance computing, as it represents the maximum rate at which data can be transferred between a GPU's memory and its processing units. On CUDA devices, peak memory bandwidth is typically expressed in gigabytes per second (GB/s). It can be calculated from the memory clock rate and the memory bus width. This tutorial shows how to get CUDA device peak memory bandwidth using C++.

The cudaGetDeviceProperties function can be used to retrieve information about the CUDA device, including its memory clock rate and memory bus width. Using this information, peak memory bandwidth can be calculated.

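In the calculation below, the factor of 2.0 accounts for double data rate (DDR) memory, which transfers data twice per memory clock cycle. The memory bus width is reported in bits, so it is divided by 8.0 to convert to bytes. The memory clock rate is reported in kilohertz, so dividing by 1.0e6 combines two conversions: kilohertz to hertz (multiply by 1.0e3) and bytes to gigabytes (divide by 1.0e9).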

After calculation, peak memory bandwidth is printed to the console.

#include <iostream>
#include <cuda_runtime.h>

int main() {
    // Query how many CUDA devices are available.
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);

    for (int i = 0; i < deviceCount; ++i) {
        cudaDeviceProp prop{};
        cudaGetDeviceProperties(&prop, i);

        // memoryClockRate is in kHz, memoryBusWidth is in bits.
        double bandwidth = 2.0 * prop.memoryClockRate * (prop.memoryBusWidth / 8.0) / 1.0e6;

        std::cout << "Device Number: " << i << std::endl;
        std::cout << " Device Name: " << prop.name << std::endl;
        std::cout << " Peak Memory Bandwidth (GB/s): " << bandwidth << std::endl;
    }

    return 0;
}

Here's an example of the output:

Device Number: 0
 Device Name: NVIDIA GeForce RTX 3070 Laptop GPU
 Peak Memory Bandwidth (GB/s): 384.064
