This is because VSQRTPS (the 256-bit AVX instruction) takes exactly twice as many cycles as SQRTPS (the 128-bit SSE instruction) on a Sandy Bridge processor, so handling twice as many floats per instruction gives no net gain. See Agner Fog’s optimization guide: instruction tables, page 88.
Instructions like square root and division don’t benefit from AVX. On the other hand, additions, multiplications, etc., do.
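
To make the arithmetic concrete, here is a small sketch of the cost per float. The cycle counts are illustrative placeholders, not figures quoted from the instruction tables: the square root values are only chosen to reflect the 2:1 ratio described above, and the addition values assume the same cost at both vector widths. The helper name `cycles_per_element` is made up for this example.

```python
# Per-element cost of one vector instruction.
# All cycle counts below are illustrative assumptions, not measured values.

SSE_LANES = 4  # floats in a 128-bit register
AVX_LANES = 8  # floats in a 256-bit register


def cycles_per_element(cycles_per_instruction, lanes):
    """Cycles spent per float when one instruction handles `lanes` floats."""
    return cycles_per_instruction / lanes


# Square root: the AVX form is assumed to cost twice the SSE form.
sqrt_sse = cycles_per_element(14, SSE_LANES)
sqrt_avx = cycles_per_element(28, AVX_LANES)

# Addition: assumed to cost the same at either width.
add_sse = cycles_per_element(1, SSE_LANES)
add_avx = cycles_per_element(1, AVX_LANES)

sqrt_speedup = sqrt_sse / sqrt_avx  # 1.0: no gain from the wider vector
add_speedup = add_sse / add_avx     # 2.0: the wider vector halves the cost

print("sqrt speedup from AVX:", sqrt_speedup)
print("add speedup from AVX:", add_speedup)
```

Under these assumptions the wider register only helps when the instruction's cost does not grow with the vector width, which is the case for additions and multiplications but not for square root or division here.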