It looks like PyTorch's caching allocator reserves a fixed amount of memory even when no tensors exist, and this reservation is triggered by the first CUDA memory access
(torch.cuda.empty_cache() releases unused cached blocks back to the driver, but the cache itself still uses some memory).
Even with a tiny 1-element tensor, after del followed by torch.cuda.empty_cache(), GPUtil.showUtilization(all=True) reports exactly the same amount of GPU memory in use as it does for a huge tensor (and both torch.cuda.memory_cached() and torch.cuda.memory_allocated() return zero).
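
Here is a rough sketch of how the comparison can be reproduced from inside PyTorch, without GPUtil. The helper name memory_after_release is made up for illustration. I'm assuming a PyTorch version recent enough to have torch.cuda.memory_reserved() (the newer name for memory_cached()) and torch.cuda.mem_get_info(). As far as I understand, most of the footprint that remains is the CUDA context created on the first CUDA call, which PyTorch's own counters do not track, so treat the numbers as indicative only.

```python
import torch


def memory_after_release(num_elements):
    """Allocate a tensor on the GPU, free it, and report what is still in use.

    Hypothetical helper, not part of PyTorch. Returns None when no GPU is present.
    """
    if not torch.cuda.is_available():
        return None

    # The first CUDA call initializes the device, whatever the tensor size.
    x = torch.zeros(num_elements, device="cuda")
    del x
    torch.cuda.empty_cache()  # return unused cached blocks to the driver

    # Driver-level view of the whole device (includes other processes, if any).
    free_bytes, total_bytes = torch.cuda.mem_get_info()
    return {
        "allocated": torch.cuda.memory_allocated(),  # bytes held by live tensors
        "reserved": torch.cuda.memory_reserved(),    # bytes held by the allocator cache
        "used_on_device": total_bytes - free_bytes,  # what tools like GPUtil see
    }


for n in (1, 10_000_000):
    print(n, memory_after_release(n))
```

On a GPU that nothing else is using, I would expect "allocated" and "reserved" to be zero in both cases while "used_on_device" stays at roughly the same nonzero value, which matches what GPUtil shows.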