nvidia – Page 3 – Tarik Billa

GPU-accelerated video processing with ffmpeg

March 5, 2023 by Tarik

FFmpeg provides a subsystem for hardware acceleration, which includes NVIDIA support (see https://trac.ffmpeg.org/wiki/HWAccelIntro). To enable GPU-assisted encoding with an NVIDIA GPU, you need: a supported GPU; supported drivers for your operating system; the NVIDIA Codec SDK; and ffmpeg configured with --enable-nvenc (default if the drivers are detected while configuring).
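As a rough illustration, the sketch below assembles an ffmpeg command line that asks for the NVENC H.264 encoder. The file names are placeholders and the helper `build_nvenc_command` is hypothetical; it assumes an ffmpeg build in which the `h264_nvenc` encoder and the `cuda` hwaccel are available. The command is only built and printed, not run.

```python
import shlex


def build_nvenc_command(src, dst, decode_on_gpu=True):
    """Return an ffmpeg argument list that encodes video with NVENC (H.264)."""
    cmd = ["ffmpeg"]
    if decode_on_gpu:
        # Also request GPU-assisted decoding of the input, where supported.
        cmd += ["-hwaccel", "cuda"]
    # h264_nvenc selects the NVIDIA hardware encoder for the video stream.
    cmd += ["-i", src, "-c:v", "h264_nvenc", dst]
    return cmd


# Placeholder file names, for illustration only.
cmd = build_nvenc_command("input.mp4", "output.mp4")
print(shlex.join(cmd))
```

Running `ffmpeg -encoders` on the target machine is a simple way to check whether the NVENC encoders were actually compiled in before relying on a command like this.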

CUDA determining threads per block, blocks per grid

March 2, 2023 by Tarik

In general you want to size your blocks/grid to match your data and simultaneously maximize occupancy, that is, how many threads are active at one time. The major factors influencing occupancy are shared memory usage, register usage, and thread block size. A CUDA-enabled GPU has its processing capability split up into SMs (streaming multiprocessors), … Read more
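To make the sizing and occupancy idea concrete, here is a small Python sketch. It computes the number of blocks needed to cover a data set and gives a simplified occupancy estimate from the three factors named above. The per-SM limits used as defaults are illustrative assumptions, not the figures for any particular GPU, and the model ignores allocation granularity, so treat the result as a rough guide only.

```python
def blocks_per_grid(n_elements, threads_per_block):
    """Ceiling division: enough blocks so every element gets a thread."""
    return (n_elements + threads_per_block - 1) // threads_per_block


def occupancy(threads_per_block, regs_per_thread, smem_per_block,
              max_threads_per_sm=2048, max_blocks_per_sm=32,
              regs_per_sm=65536, smem_per_sm=49152):
    """Fraction of an SM's thread slots filled by resident blocks.

    The per-SM limits are assumed, illustrative values.
    """
    # How many blocks fit on one SM under each resource limit.
    by_threads = max_threads_per_sm // threads_per_block
    by_regs = regs_per_sm // (regs_per_thread * threads_per_block)
    by_smem = smem_per_sm // smem_per_block if smem_per_block else max_blocks_per_sm
    resident_blocks = min(by_threads, by_regs, by_smem, max_blocks_per_sm)
    return resident_blocks * threads_per_block / max_threads_per_sm


print(blocks_per_grid(1_000_000, 256))   # blocks needed for one million elements
print(occupancy(256, 32, 0))             # limited by thread slots and registers
print(occupancy(256, 64, 0))             # doubling register use halves occupancy
print(occupancy(256, 32, 16384))         # shared memory becomes the limit
```

Under these assumed limits, raising register or shared memory use per block lowers the number of resident blocks, which is the trade-off the excerpt describes.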

Error Message : Cannot find or open the PDB file

February 25, 2023 by Tarik

The PDB file is a Visual Studio-specific file that has the debugging symbols for your project. You can ignore those messages, unless you’re hoping to step into the code for those DLLs with the debugger (which is doubtful, as those are system DLLs). In other words, you can and should ignore them, as you … Read more

Horrible redraw performance of the DataGridView on one of my two screens

January 29, 2023 by Tarik

You just need to make a custom class derived from DataGridView so you can enable its double buffering. That’s it! class CustomDataGridView: DataGridView { public CustomDataGridView() { DoubleBuffered = true; } } As long as all of my instances of the grid are using this custom version, all is well. If I ever run into … Read more

nvidia-smi Volatile GPU-Utilization explanation?

January 14, 2023 by Tarik

It is a sampled measurement over a time period. For a given time period, it reports what percentage of time one or more GPU kernel(s) was active (i.e. running). It doesn’t tell you anything about how many SMs were used, or how “busy” the code was, or what it was doing exactly, or in what … Read more
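The definition above can be modelled in a few lines of Python. This is only a sketch of the description, not how the driver measures anything: the function `sampled_utilization` and its interval format are hypothetical. Given the start and end times of kernels, it reports the percentage of a sample period during which at least one kernel was running.

```python
def sampled_utilization(kernel_intervals, period_start, period_end):
    """Percent of the sample period during which at least one kernel ran."""
    # Keep only the parts of each kernel interval that fall inside the period.
    clipped = sorted(
        (max(start, period_start), min(end, period_end))
        for start, end in kernel_intervals
        if end > period_start and start < period_end
    )
    busy = 0.0
    cur_start = cur_end = None
    for start, end in clipped:
        if cur_end is None or start > cur_end:
            # Gap before this interval: close the previous merged run.
            if cur_end is not None:
                busy += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping kernels count only once.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        busy += cur_end - cur_start
    return 100.0 * busy / (period_end - period_start)


# Two overlapping kernels covering the first half of a one-second period.
print(sampled_utilization([(0.0, 0.25), (0.125, 0.5)], 0.0, 1.0))
# One kernel running for the whole period, however small it is.
print(sampled_utilization([(0.0, 1.0)], 0.0, 1.0))
```

Note that the second case reports full utilization even if that kernel used a single thread, which is why the number says nothing about how many SMs were busy.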

Streaming multiprocessors, Blocks and Threads (CUDA)

January 14, 2023 by Tarik

The thread / block layout is described in detail in the CUDA programming guide. In particular, chapter 4 states: The CUDA architecture is built around a scalable array of multithreaded Streaming Multiprocessors (SMs). When a CUDA program on the host CPU invokes a kernel grid, the blocks of the grid are enumerated and distributed to … Read more

NVIDIA vs AMD: GPGPU performance

January 2, 2023 by Tarik

Metaphorically speaking, ATI has a good engine compared to NVIDIA, but NVIDIA has a better car 😀 This is mostly because NVIDIA has invested a good amount of its resources (in money and people) in developing important libraries required for scientific computing (BLAS, FFT), and then did a good job again in promoting them. This may be … Read more

What is a bank conflict? (Doing Cuda/OpenCL programming)

December 31, 2022 by Tarik

For NVIDIA (and AMD, for that matter) GPUs, the local memory (called shared memory in CUDA) is divided into memory banks. Each bank can only address one dataset at a time, so if a half-warp tries to load/store data from/to the same bank, the access has to be serialized (this is a bank conflict). For GT200 GPUs there are 16 banks … Read more
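A small Python model can show how access patterns map onto banks. It uses the 16 banks mentioned above and assumes 32-bit (4-byte) wide banks with consecutive words assigned to consecutive banks; that width, and the function name `conflict_degree`, are assumptions made for this sketch. Threads that read the same word are counted once, which is a simplification of the hardware's broadcast behaviour.

```python
from collections import defaultdict

NUM_BANKS = 16   # from the text (GT200)
WORD_BYTES = 4   # assumption: each bank is 32 bits wide


def conflict_degree(byte_addresses):
    """Largest number of distinct words that fall into the same bank.

    1 means no conflict; N means the access is serialized into N steps.
    """
    words_per_bank = defaultdict(set)
    for addr in byte_addresses:
        word = addr // WORD_BYTES
        # Consecutive 32-bit words map to consecutive banks.
        words_per_bank[word % NUM_BANKS].add(word)
    return max(len(words) for words in words_per_bank.values())


half_warp = range(16)
print(conflict_degree([4 * t for t in half_warp]))    # stride of 1 word
print(conflict_degree([8 * t for t in half_warp]))    # stride of 2 words
print(conflict_degree([64 * t for t in half_warp]))   # stride of 16 words
```

In this model a stride of one word is conflict-free, a stride of two words gives a 2-way conflict, and a stride of sixteen words sends every thread to the same bank.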

Is it possible to run CUDA on AMD GPUs?

December 30, 2022 by Tarik

Nope, you can’t use CUDA for that. CUDA is limited to NVIDIA hardware. OpenCL would be the best alternative. Khronos itself has a list of resources, as does the StreamComputing.eu website. For AMD-specific resources, you might want to have a look at AMD’s APP SDK page. Note that at this time there are … Read more

How do I select which GPU to run a job on?

December 18, 2022 by Tarik

The problem was caused by not setting the CUDA_VISIBLE_DEVICES variable within the shell correctly. To specify CUDA device 1, for example, you would set CUDA_VISIBLE_DEVICES using either export CUDA_VISIBLE_DEVICES=1 or CUDA_VISIBLE_DEVICES=1 ./cuda_executable The former sets the variable for the life of the current shell, the latter only for the lifespan of that particular executable invocation. … Read more
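The per-invocation form can also be reproduced when launching a job from a script. The Python sketch below sets CUDA_VISIBLE_DEVICES only in the child process's environment, leaving the parent untouched. The helper `run_on_device` is hypothetical, and a child Python process that echoes the variable stands in for ./cuda_executable so the example runs without a GPU.

```python
import os
import subprocess
import sys


def run_on_device(argv, device_ids):
    """Run argv with CUDA_VISIBLE_DEVICES set for that one process only."""
    env = dict(os.environ)  # copy, so the parent environment is not modified
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in device_ids)
    return subprocess.run(argv, env=env, capture_output=True, text=True, check=True)


# Stand-in for ./cuda_executable: print the value the child process sees.
child = [sys.executable, "-c", "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"]
result = run_on_device(child, [1])
print(result.stdout.strip())
```

Passing several ids, for example `[0, 2]`, produces a comma-separated list, which is the format the variable uses to expose more than one device.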