Matching SM architectures (CUDA arch and CUDA gencode) for various NVIDIA cards

I’ve seen some confusion regarding NVIDIA’s nvcc sm flags and what they’re used for:
When compiling with NVCC, the arch flag (‘-arch’) specifies the name of the NVIDIA GPU architecture that the CUDA files will be compiled for.
The gencode flag (‘-gencode’) pairs a virtual architecture (the PTX target, compute_XX) with the code to generate from it (machine code for sm_XX, or PTX for compute_XX), and it can be repeated many times for different architectures.

When should different ‘gencodes’ or ‘cuda arch’ be used?

When you compile CUDA code, you should always compile with only one ‘-arch’ flag that matches your most used GPU cards. Because the GPU machine code is then generated at compile time, the driver does not have to JIT-compile it when the program starts.
If the binary contains only PTX (compute_XX) and no machine code (sm_XX) that the running GPU can execute, the GPU code generation is done at load time by the JIT compiler in the CUDA driver.
When you want to speed up CUDA compilation, reduce the number of irrelevant ‘-gencode’ flags. However, sometimes you may want better CUDA backwards compatibility, which means adding more comprehensive ‘-gencode’ flags.
First, find out which GPU and which CUDA version you have.

How to check which CUDA version is installed on Linux

There are several ways to check which CUDA version is installed on your Linux box:

Identify the CUDA location and version with NVCC

Run which nvcc to check whether nvcc is installed properly.
You should see something like /usr/bin/nvcc. If that appears, your NVCC is installed in the standard directory.


~ $ which nvcc
/usr/bin/nvcc

If you have installed the CUDA toolkit but which nvcc returns no results, you might need to add the directory to your path.
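A minimal sketch of that fix, assuming the toolkit lives under the default /usr/local/cuda prefix (your install may instead use a versioned directory such as /usr/local/cuda-8.0). The function name add_cuda_to_path is hypothetical; it prepends the bin directory only if nvcc is actually there and the directory is not already on PATH:

```bash
#!/usr/bin/env bash
# Hypothetical helper: prepend <prefix>/bin to PATH when it contains an
# executable nvcc and is not already on PATH. Prefix defaults to /usr/local/cuda.
add_cuda_to_path() {
  local prefix="${1:-/usr/local/cuda}"
  local bindir="$prefix/bin"
  # Nothing to do if there is no nvcc under this prefix
  [ -x "$bindir/nvcc" ] || return 1
  # Only add the directory once
  case ":$PATH:" in
    *":$bindir:"*) ;;
    *) export PATH="$bindir:$PATH" ;;
  esac
}
```

Calling add_cuda_to_path from ~/.bashrc (or add_cuda_to_path /usr/local/cuda-8.0 for a versioned install) and opening a new shell should make which nvcc succeed.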
Run nvcc --version to get the CUDA compiler version, which matches the toolkit version:


~ $ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61

This means that we have CUDA version 8.0.61 installed.
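If a build script needs that version, it can be pulled out of the "Cuda compilation tools, release X.Y, ..." line. A minimal sketch, assuming the output keeps the layout shown above; cuda_release is a hypothetical name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the toolkit release (e.g. "8.0") from
# `nvcc --version` output read on stdin.
cuda_release() {
  # Keep only the number that follows "release " and precedes the comma
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\),.*/\1/p'
}
```

Usage: nvcc --version | cuda_release prints 8.0 for the output above.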

Identify CUDA version from CUDA code

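Use the cudaDriverGetVersion() API call, which returns the latest CUDA version supported by the installed driver; cudaRuntimeGetVersion() returns the version of the CUDA runtime the program uses. You can find an example of using cudaDriverGetVersion() here.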
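cudaDriverGetVersion() reports the version as a single integer encoded as 1000*major + 10*minor, so CUDA 8.0 comes back as 8000 and CUDA 9.2 as 9020. If you print that integer from a small CUDA program and want to use it in a shell script, a decoding step might look like this (decode_cuda_version is a hypothetical name):

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn the integer from cudaDriverGetVersion()
# (1000*major + 10*minor) into "major.minor", e.g. 9020 -> 9.2.
decode_cuda_version() {
  local v="$1"
  local major=$(( v / 1000 ))
  local minor=$(( (v % 1000) / 10 ))
  echo "${major}.${minor}"
}
```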

Identifying which CUDA driver version is installed and active in the kernel


~ $ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module  367.48  Sat Sep  3 18:21:08 PDT 2016
GCC version:  gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC)
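To get just the driver number (367.48 above) for a script, a sketch like the following pulls the first dotted version from the "NVRM version" line. It assumes only that this line contains the version as the first number with a dot in it; the function name is hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the loaded kernel-module version from the text of
# /proc/driver/nvidia/version, read from the file given as $1 or from stdin.
loaded_driver_version() {
  grep -m1 'NVRM version' ${1:+"$1"} |
    grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' |
    head -n 1
}
```

Usage: loaded_driver_version /proc/driver/nvidia/version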

Identifying which GPU card is installed and what version

Using nvidia-smi is the best way I’ve found to get a holistic view of everything – both GPU card model and driver version.
The driver version is 367.48 as seen below, and the cards are two Tesla K40m.


~ $ nvidia-smi
Tue Jun  6 12:43:17 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.48                 Driver Version: 367.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K40m          On   | 0000:04:00.0     Off |                   0* |
| N/A   48C    P0    67W / 235W |  12MiB / 11439MiB    |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K40m          On   | 0000:42:00.0     Off |                   0* |
| N/A   54C    P0    68W / 235W |  0MiB / 11439MiB     |      0%      Default |
+-------------------------------+----------------------+----------------------+
 
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================

Troubleshooting

After installing a new version of CUDA, some situations require rebooting the machine before the new driver version loads properly. I recommend rebooting after upgrading or installing the kernel headers and again after installing CUDA, and then verifying that everything is loaded correctly.
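A mismatch between the driver module loaded in the kernel and the one installed on disk is a typical sign that a reboot is still pending. One way to check this from a script is sketched below, assuming the kernel module is named nvidia and that modinfo -F version nvidia reports the installed version; both function names are hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical check: returns 0 (reboot recommended) when no NVIDIA module is
# loaded or when the loaded version differs from the installed one.
#   $1 = loaded version, e.g. parsed from /proc/driver/nvidia/version
#   $2 = installed version, e.g. from `modinfo -F version nvidia`
driver_needs_reboot() {
  local loaded="$1" installed="$2"
  if [ -z "$loaded" ]; then
    echo "no NVIDIA kernel module loaded" >&2
    return 0
  fi
  if [ "$loaded" != "$installed" ]; then
    echo "loaded $loaded != installed $installed: reboot recommended" >&2
    return 0
  fi
  return 1
}

# Gathers both versions on the current machine and compares them.
check_nvidia_driver() {
  local loaded installed
  loaded=$(grep -m1 'NVRM version' /proc/driver/nvidia/version 2>/dev/null |
    grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n 1)
  installed=$(modinfo -F version nvidia 2>/dev/null)
  driver_needs_reboot "$loaded" "$installed"
}
```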

Supported SM and Gencode variations

Below are the supported SM variations and sample cards from each generation:

Supported on CUDA 7 and later

Fermi (CUDA 3.2 until CUDA 8) (no longer supported from CUDA 9):
- SM20 or SM_20, compute_20 – Older cards such as GeForce 400, 500, 600, GT-630
Kepler (CUDA 5 and later):
- SM30 or SM_30, compute_30 – Kepler architecture (generic – Tesla K40/K80, GeForce 700, GT-730)
  Adds support for unified memory programming
- SM35 or SM_35, compute_35 – More specific: Tesla K40. Adds support for dynamic parallelism. Shows no real benefit over SM30 in my experience.
- SM37 or SM_37, compute_37 – More specific: Tesla K80. Adds a few more registers. Shows no real benefit over SM30 in my experience.
Maxwell (CUDA 6 and later):
- SM50 or SM_50, compute_50 – Tesla/Quadro M series
- SM52 or SM_52, compute_52 – Quadro M6000, GeForce 900, GTX-970, GTX-980, GTX Titan X
- SM53 or SM_53, compute_53 – Tegra (Jetson) TX1 / Tegra X1
Pascal (CUDA 8 and later):
- SM60 or SM_60, compute_60 – GP100/Tesla P100 – DGX-1 (Generic Pascal)
- SM61 or SM_61, compute_61 – GTX 1080, GTX 1070, GTX 1060, GTX 1050, GT 1030, Titan Xp, Tesla P40, Tesla P4, Discrete GPU on the NVIDIA Drive PX2
- SM62 or SM_62, compute_62 – Integrated GPU on the NVIDIA Drive PX2, Tegra (Jetson) TX2
Volta (CUDA 9 and later):
- SM70 or SM_70, compute_70 – Tesla V100, Titan V
- SM71 or SM_71, compute_71 – not used by any released GPU
- SM72 or SM_72, compute_72 – Tegra (Jetson) AGX Xavier
Turing (CUDA 10 and later):
- SM75 or SM_75, compute_75 – RTX 2080, Titan RTX, Quadro RTX 8000
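To go from the card name to an SM number automatically, a lookup over a few of the cards listed above could look like this. It is a hypothetical sketch that covers only a handful of cards (it returns 1 for anything else) and matches on substrings of the names nvidia-smi reports:

```bash
#!/usr/bin/env bash
# Hypothetical lookup: print the SM number for a GPU name such as
# "Tesla K40m" or "GeForce GTX 1080 Ti"; return 1 for unknown cards.
sm_for_gpu() {
  case "$1" in
    *"Tesla K40"*)             echo 35 ;;
    *"Tesla K80"*)             echo 37 ;;
    *"GTX 970"*|*"GTX 980"*)   echo 52 ;;
    *P100*)                    echo 60 ;;
    *"GTX 10"[5-8]0*)          echo 61 ;;
    *V100*)                    echo 70 ;;
    *"RTX 2080"*)              echo 75 ;;
    *)                         return 1 ;;
  esac
}
```

nvidia-smi --query-gpu=name --format=csv,noheader prints one GPU name per line, so something like nvidia-smi --query-gpu=name --format=csv,noheader | while read -r name; do sm_for_gpu "$name"; done should print 35 twice for the two Tesla K40m cards shown earlier.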

Sample Flags

According to NVIDIA:

The arch= clause of the -gencode= command-line option to nvcc specifies the front-end compilation target and must always be a PTX version. The code= clause specifies the back-end compilation target and can either be cubin or PTX or both. Only the back-end target version(s) specified by the code= clause will be retained in the resulting binary; at least one must be PTX to provide Volta compatibility.
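Put differently, each -gencode names a PTX front end (compute_XX) and the back end(s) to keep, and one extra PTX-only entry for the newest architecture lets future GPUs JIT-compile the code. A hypothetical helper that builds flags in this pattern, assuming the SM numbers are passed in ascending order so that the last one is the newest:

```bash
#!/usr/bin/env bash
# Hypothetical helper: build -gencode flags for the given SM numbers.
# Emits machine code (code=sm_XX) for every listed SM plus a PTX entry
# (code=compute_XX) for the last one, so newer GPUs can JIT-compile it.
gencode_flags() {
  local out="" last="" sm
  for sm in "$@"; do
    out="$out -gencode=arch=compute_${sm},code=sm_${sm}"
    last="$sm"
  done
  if [ -n "$last" ]; then
    out="$out -gencode=arch=compute_${last},code=compute_${last}"
  fi
  # Strip the leading space added by the first concatenation
  printf '%s\n' "${out# }"
}
```

For example, nvcc $(gencode_flags 50 52 60 61 70) -o app app.cu (app.cu being a placeholder) would pass the same -gencode list as the CUDA 9 sample below, without its leading -arch=sm_50.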

Sample flags for generation on CUDA 7 for maximum compatibility:


-arch=sm_30 \
 -gencode=arch=compute_20,code=sm_20 \
 -gencode=arch=compute_30,code=sm_30 \
 -gencode=arch=compute_50,code=sm_50 \
 -gencode=arch=compute_52,code=sm_52 \
 -gencode=arch=compute_52,code=compute_52

Sample flags for generation on CUDA 8 for maximum compatibility:


-arch=sm_30 \
 -gencode=arch=compute_20,code=sm_20 \
 -gencode=arch=compute_30,code=sm_30 \
 -gencode=arch=compute_50,code=sm_50 \
 -gencode=arch=compute_52,code=sm_52 \
 -gencode=arch=compute_60,code=sm_60 \
 -gencode=arch=compute_61,code=sm_61 \
 -gencode=arch=compute_61,code=compute_61

Sample flags for generation on CUDA 9 for maximum compatibility with Volta cards.
Note the removed SM_20:


-arch=sm_50 \
 -gencode=arch=compute_50,code=sm_50 \
 -gencode=arch=compute_52,code=sm_52 \
 -gencode=arch=compute_60,code=sm_60 \
 -gencode=arch=compute_61,code=sm_61 \
 -gencode=arch=compute_70,code=sm_70 \
 -gencode=arch=compute_70,code=compute_70

Sample flags for generation on CUDA 10 for maximum compatibility with Turing cards:


-arch=sm_50   \
 -gencode=arch=compute_50,code=sm_50 \
 -gencode=arch=compute_52,code=sm_52 \
 -gencode=arch=compute_60,code=sm_60 \
 -gencode=arch=compute_61,code=sm_61 \
 -gencode=arch=compute_70,code=sm_70 \
 -gencode=arch=compute_75,code=sm_75 \
 -gencode=arch=compute_75,code=compute_75
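To confirm which architectures actually ended up in a compiled binary, the toolkit's cuobjdump can list the embedded cubins (--list-elf) and PTX (--list-ptx). Since the exact listing format may differ between toolkit versions, the hypothetical helper below only extracts the distinct sm_NN tokens from whatever text it is given:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the distinct sm_NN tokens found in the
# text read from stdin, one per line, in lexical order.
embedded_sms() {
  grep -oE 'sm_[0-9]+' | sort -u
}
```

Usage: cuobjdump --list-elf ./app | embedded_sms, where ./app is your compiled program.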
