Get CUDA Device Properties using C++

Understanding the properties and capabilities of the GPU devices in use is important for optimizing CUDA applications. A key part of CUDA device management is retrieving the properties of a GPU device, such as its name, memory bus width, total global memory, and memory clock rate. This information helps you decide how best to use the device for a specific application. This tutorial shows how to get CUDA device properties using C++.

The following code retrieves and prints various properties of each CUDA device in the system, including device name, compute capability, total global memory, maximum threads per block, etc. You can access other properties from the cudaDeviceProp structure as needed, based on your requirements.

Let's dive into the code. We use the cudaGetDeviceCount function to retrieve the number of available CUDA devices and store it in the deviceCount variable. For each device index, we declare a cudaDeviceProp structure to hold the properties of that device. The cudaGetDeviceProperties function retrieves the properties of the device with the given index and stores them in the structure. Finally, various device properties are printed to the console. Both functions return a cudaError_t status that is worth checking against cudaSuccess.

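#include <cstdlib>
#include <iostream>
#include <cuda_runtime.h>

int main()
{
    // Ask the runtime how many CUDA-capable devices are visible.
    int deviceCount = 0;
    cudaError_t err = cudaGetDeviceCount(&deviceCount);
    if (err != cudaSuccess) {
        std::cerr << "cudaGetDeviceCount failed: " << cudaGetErrorString(err) << std::endl;
        return EXIT_FAILURE;
    }

    for (int i = 0; i < deviceCount; ++i) {
        // Fill prop with the properties of device i.
        cudaDeviceProp prop{};
        err = cudaGetDeviceProperties(&prop, i);
        if (err != cudaSuccess) {
            std::cerr << "cudaGetDeviceProperties failed for device " << i << ": "
                      << cudaGetErrorString(err) << std::endl;
            continue;
        }

        std::cout << "Device Number: " << i << std::endl;
        std::cout << " Device Name: " << prop.name << std::endl;
        std::cout << " Compute Capability: " << prop.major << "." << prop.minor << std::endl;
        std::cout << " Total Global Memory (bytes): " << prop.totalGlobalMem << std::endl;
        std::cout << " Max Threads per Block: " << prop.maxThreadsPerBlock << std::endl;
        // memoryClockRate is marked deprecated in recent CUDA toolkits; if your toolkit lacks it,
        // cudaDeviceGetAttribute with cudaDevAttrMemoryClockRate is the alternative.
        std::cout << " Memory Clock Rate (kHz): " << prop.memoryClockRate << std::endl;
        std::cout << " Memory Bus Width (bits): " << prop.memoryBusWidth << std::endl;
        std::cout << " L2 Cache Size (bytes): " << prop.l2CacheSize << std::endl;
    }

    return 0;
}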

Here's an example of the output you might see in the console:

Device Number: 0
 Device Name: NVIDIA GeForce RTX 3070 Laptop GPU
 Compute Capability: 8.6
 Total Global Memory (bytes): 8361017344
 Max Threads per Block: 1024
 Memory Clock Rate (kHz): 6001000
 Memory Bus Width (bits): 256
 L2 Cache Size (bytes): 4194304
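
The raw values above are easier to act on once converted into derived figures. The following is a minimal sketch of three small helpers that do this; they are hypothetical functions written for this tutorial and are not part of the CUDA API. The bandwidth estimate assumes double-data-rate memory (two transfers per clock cycle), which is a common rule of thumb and not something the device reports directly.

```cpp
#include <cstddef>

// Hypothetical helpers for interpreting cudaDeviceProp values.
// They are plain C++ and are not part of the CUDA runtime API.

// Convert a byte count such as prop.totalGlobalMem to GiB (2^30 bytes).
double bytesToGiB(std::size_t bytes)
{
    return static_cast<double>(bytes) / (1024.0 * 1024.0 * 1024.0);
}

// Estimate the theoretical peak memory bandwidth in GB/s (10^9 bytes per second)
// from prop.memoryClockRate (kHz) and prop.memoryBusWidth (bits).
// Assumption: double-data-rate memory, i.e. two transfers per clock cycle.
double peakBandwidthGBps(int memoryClockRateKHz, int memoryBusWidthBits)
{
    const double transfersPerSecond = 2.0 * memoryClockRateKHz * 1000.0;
    const double bytesPerTransfer = memoryBusWidthBits / 8.0;
    return transfersPerSecond * bytesPerTransfer / 1.0e9;
}

// Check whether a device's compute capability (prop.major, prop.minor)
// is at least the required version.
bool meetsComputeCapability(int major, int minor, int requiredMajor, int requiredMinor)
{
    if (major != requiredMajor) {
        return major > requiredMajor;
    }
    return minor >= requiredMinor;
}
```

With the sample output above, bytesToGiB(8361017344) gives roughly 7.79 GiB and peakBandwidthGBps(6001000, 256) gives roughly 384 GB/s. A call such as meetsComputeCapability(prop.major, prop.minor, 7, 0) could be used to skip devices that are too old for a given kernel.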
