opencl – Tarik Billa

OpenCL vs OpenMP performance [closed]

December 16, 2023 by Tarik

The benchmarks I’ve seen indicate that OpenCL and OpenMP running on the same hardware are usually comparable in performance, or that OpenMP performs slightly better. However, I haven’t seen any benchmarks that I would consider conclusive, because most of them lack a detailed explanation of their methodology. Still, there are a few useful things to … Read more

OpenCL, Vulkan, Sycl

August 25, 2023 by Tarik

How does OpenCL relate to Vulkan? I understand that OpenCL is higher level and abstracts the devices, but does it (or could it) use Vulkan internally? They’re not related to each other at all. Well, they do technically use the same intermediate shader language, but Vulkan forbids the Kernel execution model, and … Read more

Causes for CL_INVALID_WORK_GROUP_SIZE

August 18, 2023 by Tarik

CL_DEVICE_MAX_WORK_GROUP_SIZE should return a single size_t value (for example 512, but I don’t know what it’d be on your system). This is the maximum number of work-items in a work-group, not the maximum in each dimension. So in your case you are trying to make a 2D work-group with 32*32 = 1024 work-items, and presumably … Read more
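The arithmetic in this excerpt can be sketched in plain C. This is only an illustration: the helper names `work_group_items` and `work_group_fits` are hypothetical, and `max_work_group_size` stands in for the value a real host program would query with `clGetDeviceInfo` and `CL_DEVICE_MAX_WORK_GROUP_SIZE`.

```c
#include <assert.h>
#include <stddef.h>

/* Total number of work-items in one work-group: the product of the
 * per-dimension local sizes. */
size_t work_group_items(const size_t *local_size, unsigned dims)
{
    size_t total = 1;
    for (unsigned i = 0; i < dims; ++i)
        total *= local_size[i];
    return total;
}

/* Returns 1 if the work-group fits within the device limit, 0 otherwise.
 * max_work_group_size is an assumed input here; a real program would
 * obtain it from clGetDeviceInfo(..., CL_DEVICE_MAX_WORK_GROUP_SIZE, ...). */
int work_group_fits(const size_t *local_size, unsigned dims,
                    size_t max_work_group_size)
{
    assert(dims >= 1 && dims <= 3);
    return work_group_items(local_size, dims) <= max_work_group_size;
}
```

With a limit of 512, a 32 x 32 work-group (1024 work-items) fails this check, while 16 x 16 (256 work-items) passes. A device may impose further limits, such as `CL_DEVICE_MAX_WORK_ITEM_SIZES` per dimension or the per-kernel `CL_KERNEL_WORK_GROUP_SIZE`, which this sketch does not cover.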

How to get a “random” number in OpenCL

August 9, 2023 by Tarik

I was working on this “no random” issue for the last few days and came up with three different approaches: Xorshift – I created a generator based on this one. All you have to do is provide one uint2 number (seed) for the whole kernel, and every work-item will compute its own random number // ‘randoms’ is … Read more
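For readers unfamiliar with xorshift, here is a minimal sketch in plain C. It is not the generator from the post, whose code is cut off above: it uses Marsaglia's 32-bit xorshift with the shift triple 13, 17, 5, and the per-work-item seeding in `seed_for_item` is a hypothetical scheme chosen only for illustration. In a kernel, `item_id` would typically come from `get_global_id(0)`.

```c
#include <assert.h>
#include <stdint.h>

/* One step of Marsaglia's 32-bit xorshift (shifts 13, 17, 5).
 * The state must be nonzero, because zero maps to zero forever. */
uint32_t xorshift32(uint32_t state)
{
    assert(state != 0);
    state ^= state << 13;
    state ^= state >> 17;
    state ^= state << 5;
    return state;
}

/* Hypothetical per-work-item seeding (an assumption, not the post's method):
 * mix the shared seed with the work-item id so that each work-item starts
 * from a different state, and avoid the forbidden zero state. */
uint32_t seed_for_item(uint32_t seed, uint32_t item_id)
{
    uint32_t s = seed ^ (item_id * 2654435761u); /* wraps modulo 2^32 */
    return s != 0 ? s : 1u;
}
```

Each work-item would call `seed_for_item` once and then feed the result through `xorshift32` repeatedly, keeping the returned value as its next state. This kind of generator is fast but not cryptographically secure, and streams started from related seeds may be correlated.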

Debugger for OpenCL [closed]

July 25, 2023 by Tarik

You may also want to look into CodeXL: https://gpuopen.com/compute-product/codexl/ CodeXL was originally developed by AMD, but was later released as an open-source project.

How do I use local memory in OpenCL?

June 8, 2023 by Tarik

Check out the samples in the NVIDIA or AMD SDKs; they should point you in the right direction. Matrix transpose would use local memory, for example. Using your squaring kernel, you could stage the data in an intermediate buffer. Remember to pass in the additional parameter. __kernel void square( __global float *input, __global float *output, __local … Read more

Questions about global and local work size

May 17, 2023 by Tarik

In general you can choose global_work_size as big as you want, while local_work_size is constrained by the underlying device/hardware, so all query results will tell you the possible dimensions for local_work_size rather than global_work_size. The only constraint on global_work_size is that it must be a multiple of local_work_size (in each dimension). The … Read more
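A common way to satisfy the multiple-of-local-size constraint is to round the problem size up to the next multiple. The sketch below shows that rounding in plain C; the function name `round_up_global_size` is hypothetical and not taken from the post.

```c
#include <assert.h>
#include <stddef.h>

/* Rounds n up to the nearest multiple of local_size, so the result can be
 * used as the global work size for one dimension. */
size_t round_up_global_size(size_t n, size_t local_size)
{
    assert(local_size > 0);
    size_t remainder = n % local_size;
    return remainder == 0 ? n : n + (local_size - remainder);
}
```

For example, 1000 elements with a local size of 64 give a global size of 1024. The extra 24 work-items have no data to process, so the kernel would normally return early for them, for instance with a check such as `if (get_global_id(0) >= n) return;`. OpenCL 2.0 and later can relax the multiple-of requirement through non-uniform work-groups, but rounding up works across versions.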

Barriers in OpenCL

May 12, 2023 by Tarik

As you have stated, barriers may only synchronize threads in the same workgroup. There is no way to synchronize different workgroups in a kernel. Now to answer your question, the specification was not clear to me either, but it seems to me that section 6.11.9 contains the answer: CLK_LOCAL_MEM_FENCE – The barrier function will either … Read more