CUDA Device Info Class – Stephen Conover

I adapted the NVIDIA CUDA 6.5 Device Query example, wrapping it in a cleaner class structure. The code is undocumented for now, but it is fairly straightforward: it presents the parameters of each CUDA device in the system. The CCUDAInfo class is small; it holds the device count and an array of the devices themselves. The CCUDADeviceInfo class contains the bulk of the useful information. Both classes overload the ostream << operator and throw an exception if a CUDA call fails. CUDA must be initialized with cuInit before either class is used.
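To give a feel for the shape described above, here is a minimal sketch of the pattern: a small container of per-device records, a stream operator for each type, and a helper that turns a failed status code into an exception. It is not the actual CCUDAInfo source. The names DeviceInfoSketch, InfoSketch and throwOnError are hypothetical, only a handful of fields are shown, and the sketch deliberately avoids the CUDA headers so it compiles anywhere.

```cpp
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for CCUDADeviceInfo: a few of the per-device fields.
struct DeviceInfoSketch {
    int id = 0;
    std::string name;
    int computeMajor = 0;
    int computeMinor = 0;
    int multiprocessorCount = 0;
};

// Hypothetical stand-in for CCUDAInfo: the device count is devices.size().
struct InfoSketch {
    std::vector<DeviceInfoSketch> devices;
};

// Turns a non-zero status code into an exception.
// Assumption: 0 means success, as CUDA_SUCCESS does in the driver API.
inline void throwOnError(int status, const std::string& what) {
    if (status != 0) {
        std::ostringstream msg;
        msg << what << " failed with error code " << status;
        throw std::runtime_error(msg.str());
    }
}

// Prints one device in the same "Label: value" style as the real output.
inline std::ostream& operator<<(std::ostream& os, const DeviceInfoSketch& d) {
    os << "*** DEVICE " << d.id << " ***\n"
       << "Name: " << d.name << "\n"
       << "Compute Capability: " << d.computeMajor << "." << d.computeMinor << "\n"
       << "Multiprocessor Count: " << d.multiprocessorCount << "\n";
    return os;
}

// Prints the device count followed by every device.
inline std::ostream& operator<<(std::ostream& os, const InfoSketch& info) {
    os << "Device Count: " << info.devices.size() << "\n";
    for (const auto& d : info.devices) {
        os << d;
    }
    return os;
}
```

In a real implementation the fields would be filled from driver calls such as cuDeviceGetCount, cuDeviceGetName and cuDeviceGetAttribute, with each returned CUresult passed through a check like throwOnError.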

The class can be used as follows:

#include <cstdlib>
#include <exception>
#include <iostream>

#include <cuda.h>
#include <helper_cuda_drvapi.h>
#include <drvapi_error_string.h>

#include "CCUDAInfo.h"

int main()
{
    std::cout << "Starting ... \n";

    // Initialize the CUDA driver API; this must happen before CCUDAInfo is used.
    CUresult error_id = cuInit(0);

    if (error_id != CUDA_SUCCESS)
    {
        std::cerr << "cuInit Failed. Returned " << error_id << ": "
                  << getCudaDrvErrorString(error_id) << std::endl;
        std::cerr << "Result = FAIL\n";
        return EXIT_FAILURE;
    }

    // Load and display the CUDA info class; it throws if a CUDA call fails.
    try
    {
        CCUDAInfo cinfo;
        std::cout << cinfo << "\n";
    }
    catch (const std::exception &ex)
    {
        std::cerr << "Error: " << ex.what() << "\n";
        return EXIT_FAILURE;
    }

    return EXIT_SUCCESS;
}

On my machine this produces the following output:

Starting ...
CUDA Driver Version: 6.5
Device Count: 1
*** DEVICE 0 ***
Name: GeForce GT 650M
Compute Capability: 3.0
Clock Rate: 835000
Compute Mode: 0
CUDA CORES: 384
Cores Per MP: 192
Device ID: 0
ECC Enabled: No
Is Tesla: No
Kernel Timeout Enabled: Yes
L2 Cache Size: 262144
Max Block Dim: 1024, 1024, 64
Max Grid Dim: 2147483647, 65535, 65535
Max 1D Texture Size: 65536
Max 1D Layered Texture Size: 16384, 2048
Max 2D Texture Size: 65536, 65536
Max 2D Layers Texture Size: 16384, 16384, 2048
Max 3D Texture Size: 4096, 4096, 4096
Max Threads Per Block: 1024
Max Threads Per Multiprocessor: 2048
Memory Bus Width: 128
Memory Clock Rate: 2 Ghz
Memory Pitch Bytes: 2147483647
Multiprocessor Count: 2
PCI Bus ID: 1
PCI Device ID: 0
Registers Per Block: 65536
Shared Memory Per Block: 49152
Total Constant Memory Bytes: 65536
Total Global Memory Bytes: 1073741824
Warp Size: 32
Supports Concurrent Kernels: Yes
Supports GPU Overlap: Yes
Supports Integrated GPU Sharing Host Memory: No
Supports Map Host Memory: Yes
Supports Unified Addressing: No
Surface Alignment Required: Yes
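Two of the values above are easier to read with a little arithmetic. The CUDA core count is the cores per multiprocessor times the multiprocessor count (192 × 2 = 384 here), and the driver API reports the clock rate in kHz, so 835000 corresponds to 835 MHz. The sketch below shows that arithmetic. The function names are hypothetical and are not part of the class, and the lookup only covers compute capabilities from the CUDA 6.5 era (1.x, 2.x, 3.x and 5.0), following the per-architecture core counts used in NVIDIA's sample helpers.

```cpp
#include <stdexcept>

// Cores per multiprocessor for a given compute capability.
// Assumption: only architectures known to CUDA 6.5 are covered.
inline int coresPerMultiprocessor(int major, int minor) {
    switch (major) {
        case 1: return 8;                     // compute capability 1.x
        case 2: return minor == 0 ? 32 : 48;  // 2.0 has 32 cores, 2.1 has 48
        case 3: return 192;                   // 3.x (Kepler)
        case 5: return 128;                   // 5.0 (Maxwell)
        default:
            throw std::invalid_argument("unknown compute capability");
    }
}

// Total CUDA cores = cores per multiprocessor * number of multiprocessors.
inline int totalCores(int major, int minor, int multiprocessorCount) {
    return coresPerMultiprocessor(major, minor) * multiprocessorCount;
}

// Converts a clock rate from kHz (as reported by the driver) to MHz.
inline double kilohertzToMegahertz(int kilohertz) {
    return kilohertz / 1000.0;
}
```

For the GT 650M listed above, totalCores(3, 0, 2) gives 384 and kilohertzToMegahertz(835000) gives 835.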

The files can be found here.
