The most fundamental point here is probably that you don’t want to transmit full static images, only the changes between them, which is essentially analogous to a video stream.
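To make the "send only the changes" idea concrete, here is a minimal sketch in Python. It assumes a frame is simply a list of pixel rows and splits it into fixed-size tiles. The names `changed_tiles` and `apply_tiles` are made up for illustration and say nothing about how any real remote desktop protocol is implemented.

```python
def changed_tiles(prev, curr, tile=2):
    """Return {(top, left): tile_pixels} for every tile that differs between two frames."""
    height, width = len(curr), len(curr[0])
    updates = {}
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            old = [row[left:left + tile] for row in prev[top:top + tile]]
            new = [row[left:left + tile] for row in curr[top:top + tile]]
            if old != new:
                # Only tiles whose content changed are queued for transmission.
                updates[(top, left)] = new
    return updates


def apply_tiles(frame, updates):
    """Rebuild the new frame on the receiving side from the old frame plus the updates."""
    out = [list(row) for row in frame]
    for (top, left), pixels in updates.items():
        for dy, row in enumerate(pixels):
            out[top + dy][left:left + len(row)] = row
    return out


# Hypothetical 4x4 frame in which a single pixel changes.
prev_frame = [[0] * 4 for _ in range(4)]
curr_frame = [list(row) for row in prev_frame]
curr_frame[3][3] = 9

updates = changed_tiles(prev_frame, curr_frame)
rebuilt = apply_tiles(prev_frame, updates)
```

In this example only one of the four tiles has to be sent, and the receiver can still rebuild the full frame.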
My best guess is that it uses some very efficient (heavily specialized and optimized) motion compensation algorithm, because most of the actual change in generic desktop usage is linear movement of elements (scrolling text, moving windows, etc.), as opposed to transformation of elements.
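As a rough illustration of why motion compensation pays off for scrolling, here is a sketch in Python that handles vertical scrolling only. It assumes a frame is a list of rows and that the whole frame moves by one offset. Real codecs search per block and in two dimensions, and the names `find_vertical_shift`, `encode_scroll` and `decode_scroll` are hypothetical, not taken from any actual product.

```python
def find_vertical_shift(prev, curr, max_shift=8):
    """Return the offset dy for which curr[y] is best predicted by prev[y + dy]."""
    height = len(prev)

    def cost(dy):
        # Count rows that cannot be predicted from the previous frame.
        misses = 0
        for y in range(height):
            src = y + dy
            if not (0 <= src < height) or prev[src] != curr[y]:
                misses += 1
        return misses

    # Prefer the lowest cost; on ties prefer the smallest movement.
    return min(range(-max_shift, max_shift + 1), key=lambda dy: (cost(dy), abs(dy)))


def encode_scroll(prev, curr, max_shift=8):
    """Encode curr as a shift plus only the rows the shift does not explain."""
    dy = find_vertical_shift(prev, curr, max_shift)
    height = len(prev)
    residual = {}
    for y in range(height):
        src = y + dy
        if not (0 <= src < height) or prev[src] != curr[y]:
            residual[y] = curr[y]
    return dy, residual


def decode_scroll(prev, dy, residual):
    """Rebuild the new frame from the old frame, the shift and the residual rows."""
    return [residual[y] if y in residual else prev[y + dy] for y in range(len(prev))]


# Hypothetical text window scrolled up by two lines.
prev_rows = ["line %d" % i for i in range(10)]
curr_rows = prev_rows[2:] + ["line 10", "line 11"]

dy, residual = encode_scroll(prev_rows, curr_rows)
decoded = decode_scroll(prev_rows, dy, residual)
```

Here the encoder sends one offset and the two newly exposed lines instead of all ten rows, which is the kind of saving that makes linear movement cheap to transmit.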
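The DirectX 3D performance of 1 FPS seems to confirm my guess to some extent: 3D rendering changes most of the screen through transformations rather than simple linear movement, which is presumably the case where motion compensation helps least.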