Get NVIDIA GPU Peak Memory Bandwidth using NVML and C++

October 26, 2023

NVIDIA GPUs are well known for their computing power, especially in parallel processing, machine learning, and scientific simulations. One key metric of GPU performance is peak memory bandwidth: the maximum rate at which data can be transferred between a GPU's memory and its processing units. Peak memory bandwidth on NVIDIA GPU devices is typically expressed in gigabytes per second (GB/s) and can be calculated from the memory clock rate and the memory bus width. This tutorial shows how to get NVIDIA GPU peak memory bandwidth using NVML and C++.

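The code uses NVML (the NVIDIA Management Library) to query NVIDIA GPU devices. For each device, it calls the nvmlDeviceGetClockInfo function with NVML_CLOCK_MEM to retrieve the memory clock rate in MHz and the nvmlDeviceGetMemoryBusWidth function to obtain the memory bus width in bits. These values are then used to calculate the peak memory bandwidth. Note that nvmlDeviceGetClockInfo reports the current clock, so the result reflects the memory clock at the time of the query; nvmlDeviceGetMaxClockInfo reports the maximum clock instead.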

The factor of 2.0 accounts for double data rate (DDR) memory, which transfers data twice per memory clock cycle. Dividing the memory bus width by 8.0 converts it from bits to bytes. The final division by 1.0e3 combines two unit conversions: multiplying by 1.0e6 to convert the clock from megahertz to hertz, and dividing by 1.0e9 to convert from bytes to gigabytes.

The program then prints each device's index, name, and peak memory bandwidth to the console.

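#include <iostream>
#include <nvml.h>

int main()
{
    // Initialize NVML and stop early if it is unavailable.
    nvmlReturn_t result = nvmlInit();
    if (result != NVML_SUCCESS) {
        std::cerr << "Failed to initialize NVML: " << nvmlErrorString(result) << std::endl;
        return 1;
    }

    unsigned int deviceCount = 0;
    nvmlDeviceGetCount(&deviceCount);

    for (unsigned int i = 0; i < deviceCount; ++i) {
        nvmlDevice_t device;
        // Skip any device whose handle cannot be obtained.
        if (nvmlDeviceGetHandleByIndex(i, &device) != NVML_SUCCESS) {
            continue;
        }

        char name[NVML_DEVICE_NAME_V2_BUFFER_SIZE] = "";
        nvmlDeviceGetName(device, name, NVML_DEVICE_NAME_V2_BUFFER_SIZE);

        // Memory clock rate in MHz.
        unsigned int memoryClockRate = 0;
        nvmlDeviceGetClockInfo(device, NVML_CLOCK_MEM, &memoryClockRate);

        // Memory bus width in bits.
        unsigned int memoryBusWidth = 0;
        nvmlDeviceGetMemoryBusWidth(device, &memoryBusWidth);

        // 2 transfers per clock * clock (MHz) * bytes per transfer, scaled to GB/s.
        double bandwidth = 2.0 * memoryClockRate * (memoryBusWidth / 8.0) / 1.0e3;

        std::cout << "Device Number: " << i << std::endl;
        std::cout << " Device Name: " << name << std::endl;
        std::cout << " Peak Memory Bandwidth (GB/s): " << bandwidth << std::endl;
    }

    nvmlShutdown();

    return 0;
}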

Here's an example of the output:

Device Number: 0
 Device Name: NVIDIA GeForce RTX 3070 Laptop GPU
 Peak Memory Bandwidth (GB/s): 384
