What is actually a Queue family in Vulkan?

To understand queue families, you first have to understand queues.

A queue is something you submit command buffers to, and command buffers submitted to a queue are executed in order[*1] relative to each other. Command buffers submitted to different queues are unordered relative to each other unless you explicitly synchronize them with VkSemaphore. You can only submit work to a queue from one thread at a time, but different threads can submit work to different queues simultaneously.

Each queue can only perform certain kinds of operations. Graphics queues can run graphics pipelines started by vkCmdDraw* commands. Compute queues can run compute pipelines started by vkCmdDispatch*. Transfer queues can perform transfer (copy) operations from vkCmdCopy*. Sparse binding queues can change the binding of sparse resources to memory with vkQueueBindSparse (note this is an operation submitted directly to a queue, not a command in a command buffer). Some queues can perform multiple kinds of operations. In the spec, every command that can be submitted to a queue has a “Command Properties” table that lists which queue types can execute the command.

A queue family just describes a set of queues with identical properties. So in your example, the device supports three kinds of queues:

One kind can do graphics, compute, transfer, and sparse binding operations, and you can create up to 16 queues of that type.
Another kind can only do transfer operations, and you can only create one queue of this kind. Usually this is for asynchronously DMAing data between host and device memory on discrete GPUs, so transfers can be done concurrently with independent graphics/compute operations.
Finally, you can create up to 8 queues that are only capable of compute operations.
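
To make the three families above concrete, here is a minimal sketch of picking a family by capability. To stay self-contained it doesn't include the Vulkan headers: the flag constants carry the same values as the VK_QUEUE_*_BIT flags, while QueueFamily, exampleFamilies and findFamily are hypothetical names of mine. In a real app you'd fill the list from vkGetPhysicalDeviceQueueFamilyProperties instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-ins with the same values as VK_QUEUE_GRAPHICS_BIT, VK_QUEUE_COMPUTE_BIT,
// VK_QUEUE_TRANSFER_BIT and VK_QUEUE_SPARSE_BINDING_BIT.
constexpr uint32_t kGraphics = 0x1;
constexpr uint32_t kCompute  = 0x2;
constexpr uint32_t kTransfer = 0x4;
constexpr uint32_t kSparse   = 0x8;

// Hypothetical mirror of the two VkQueueFamilyProperties fields used here.
struct QueueFamily {
    uint32_t queueFlags;  // bitmask of operation kinds the family supports
    uint32_t queueCount;  // how many queues can be created from the family
};

// The three families from the example, in the order they are listed.
std::vector<QueueFamily> exampleFamilies() {
    return {
        {kGraphics | kCompute | kTransfer | kSparse, 16},
        {kTransfer, 1},
        {kCompute, 8},
    };
}

// Returns the index of the first family that supports every bit in `required`
// and none of the bits in `excluded`, or -1 if no family qualifies.
int findFamily(const std::vector<QueueFamily>& families,
               uint32_t required, uint32_t excluded) {
    assert((required & excluded) == 0);  // a bit cannot be both wanted and unwanted
    for (std::size_t i = 0; i < families.size(); ++i) {
        const QueueFamily& f = families[i];
        if (f.queueCount > 0 &&
            (f.queueFlags & required) == required &&
            (f.queueFlags & excluded) == 0) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

With this, findFamily(exampleFamilies(), kTransfer, kGraphics | kCompute) selects the transfer-only family, and findFamily(exampleFamilies(), kCompute, kGraphics) selects the compute-only one. Excluding bits is how you ask for the specialized family rather than the universal one that happens to come first.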

Some queues might only correspond to separate queues in the host-side scheduler; others might correspond to actual independent queues in hardware. For example, many GPUs only have one hardware graphics queue, so even if you create two VkQueues from a graphics-capable queue family, command buffers submitted to those queues will progress through the kernel driver’s command buffer scheduler independently, but will execute in some serial order on the GPU. But some GPUs have multiple compute-only hardware queues, so two VkQueues from a compute-only queue family might actually proceed independently and concurrently all the way through the GPU. Vulkan doesn’t expose which of these cases applies to a given queue family.

Bottom line, decide how many queues you can usefully use, based on how much concurrency you have. For many apps, a single “universal” queue is all they need. More advanced ones might have one graphics+compute queue, a separate compute-only queue for asynchronous compute work, and a transfer queue for async DMA. Then map what you’d like onto what’s available; you may need to do your own multiplexing, e.g. on a device that doesn’t have a compute-only queue family, you might create multiple graphics+compute queues instead, or serialize your async compute jobs onto your single graphics+compute queue yourself.
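
One way to sketch that mapping step, under some assumptions: the app wants three roles (graphics, async compute, transfer), a graphics+compute family exists, and any graphics- or compute-capable family can also do transfers (the spec makes reporting the transfer bit optional for those). QueueSlot, QueuePlan and planQueues are hypothetical names, and the flag constants are stand-ins with the same values as the VK_QUEUE_*_BIT flags, so the sketch compiles without the Vulkan headers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-ins with the same values as the corresponding VK_QUEUE_*_BIT flags.
constexpr uint32_t kGraphics = 0x1;
constexpr uint32_t kCompute  = 0x2;
constexpr uint32_t kTransfer = 0x4;
constexpr uint32_t kSparse   = 0x8;

// Hypothetical mirror of the VkQueueFamilyProperties fields used here.
struct QueueFamily { uint32_t queueFlags; uint32_t queueCount; };

// Where a role ends up: family index plus queue index within that family.
struct QueueSlot { int family; uint32_t index; };
struct QueuePlan { QueueSlot graphics, compute, transfer; };

// First family with all `required` bits and no `excluded` bits, or -1.
static int findFamily(const std::vector<QueueFamily>& fams,
                      uint32_t required, uint32_t excluded) {
    for (std::size_t i = 0; i < fams.size(); ++i) {
        if (fams[i].queueCount > 0 &&
            (fams[i].queueFlags & required) == required &&
            (fams[i].queueFlags & excluded) == 0) {
            return static_cast<int>(i);
        }
    }
    return -1;
}

QueuePlan planQueues(const std::vector<QueueFamily>& fams) {
    QueuePlan plan{};
    const int universal = findFamily(fams, kGraphics | kCompute, 0);
    assert(universal >= 0 && "sketch assumes a graphics+compute family exists");
    plan.graphics = {universal, 0};
    uint32_t usedUniversal = 1;  // queue 0 of the universal family is taken

    // Fallback: take another queue from the universal family if one is left;
    // otherwise share queue 0, in which case the app must serialize the work itself.
    auto fallback = [&]() -> QueueSlot {
        if (usedUniversal < fams[static_cast<std::size_t>(universal)].queueCount) {
            return {universal, usedUniversal++};
        }
        return {universal, 0};
    };

    const int computeOnly = findFamily(fams, kCompute, kGraphics);
    plan.compute = computeOnly >= 0 ? QueueSlot{computeOnly, 0} : fallback();

    const int transferOnly = findFamily(fams, kTransfer, kGraphics | kCompute);
    plan.transfer = transferOnly >= 0 ? QueueSlot{transferOnly, 0} : fallback();
    return plan;
}
```

On a device with dedicated compute-only and transfer-only families, each role lands on its own family. On a device with a single graphics+compute family of two queues, compute gets the second queue and transfer shares queue 0 with graphics, which is the do-your-own-multiplexing case.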

[*1] Oversimplifying a bit. They start in order, but are allowed to proceed independently after that and complete out of order. Independent progress of different queues is not guaranteed though. I’ll leave it at that for this question.
