Using AVX intrinsics instead of SSE does not improve speed — why?

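This is because the 256-bit VSQRTPS (the AVX instruction) takes twice as many cycles as the 128-bit SQRTPS (the SSE instruction) on a Sandy Bridge processor, so processing eight floats per instruction instead of four gives no net gain. See Agner Fog’s optimization guides: instruction tables, page 88.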

Instructions like square root and division don’t benefit from AVX on this processor. Additions, multiplications, and similar operations, on the other hand, do.
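
To make the arithmetic concrete, here is a minimal sketch in C of the per-element cost model behind this answer. The function names `cycles_per_element` and `speedup` are made up for illustration, and the cycle counts you pass in are placeholders you would take from the instruction tables, not measured values.

```c
/* Average cycles spent per float when one instruction processes `lanes`
   floats and costs `cycles_per_instr` cycles (hypothetical helper). */
double cycles_per_element(double cycles_per_instr, int lanes)
{
    return cycles_per_instr / (double)lanes;
}

/* Speedup of the wide (AVX) version over the narrow (SSE) version.
   A result of 1.0 means the wider instruction gives no gain. */
double speedup(double narrow_cycles, int narrow_lanes,
               double wide_cycles, int wide_lanes)
{
    return cycles_per_element(narrow_cycles, narrow_lanes) /
           cycles_per_element(wide_cycles, wide_lanes);
}
```

With placeholder costs of 14 cycles for the 4-lane instruction and 28 cycles for the 8-lane one, `speedup(14.0, 4, 28.0, 8)` evaluates to 1.0: twice the lanes at twice the cost. An instruction that costs the same at both widths, as assumed here for addition or multiplication, gives `speedup(1.0, 4, 1.0, 8)` of 2.0.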
