I am not going to try to answer the question about SIMD speedup. Instead, here are some detailed comments on poor coding in the scalar version that carried over to the vector version, at a length that doesn’t fit in an SO comment.
This code in Intersect(Circle) is just absurd:
// Step 3: compute the substitutions, check if there is a collision.
float a = dx * dx + dy * dy;
This is a sum of squares, so a is guaranteed non-negative.
float b = r + s;
float c = p * p + q * q - cr * cr;
float DSqrt = b * b - 4 * a * c;
This isn’t the square root of D, it’s the square of D.
// no collision possible! Commented out to make the benchmark more fair
//if (DSqrt < 0)
//{ return false; }
// Step 4: compute the substitutions.
float D = (float)Math.Sqrt(DSqrt);
sqrt has a limited domain. Avoiding the call for negative input doesn’t just save the average cost of the square root, it also prevents you from having to handle NaNs, which are very, very slow.
Also, D is non-negative, since Math.Sqrt returns either the positive branch or NaN.
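To make the NaN path concrete, here is a small sketch of what the scalar code does when the early-out stays commented and the discriminant is negative. The class and method names are made up for illustration, and the constant 1f merely stands in for -b. The NaN flows through every later operation, and the final comparison rejects it.

```csharp
using System;

static class NaNDemo
{
    // Hypothetical stand-in for the tail of the scalar routine when the
    // discriminant is negative and the early-out is commented out.
    public static bool HitWithNegativeDiscriminant()
    {
        float d = (float)Math.Sqrt(-1.0);  // Math.Sqrt returns NaN for negative input
        float t0 = (1f + d) / 2f;          // NaN propagates through the arithmetic
        float t1 = (1f - d) / 2f;          // NaN as well
        float ti = Math.Min(t0, t1);       // Math.Min returns NaN if either argument is NaN
        // Every ordered comparison against NaN is false, so this returns false.
        return ti > 0 && ti < float.MaxValue;
    }
}
```

So the scalar result is still "no hit", but only after a square root, two divisions and a Math.Min have been spent on a value that could have been rejected with a single comparison.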
float t0 = (-b + D) / (2 * a);
float t1 = (-b - D) / (2 * a);
The difference between these two is t0 - t1 = D / a. The ratio of two non-negative variables is also non-negative. Therefore t1 is never larger.
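Spelled out, assuming D is not NaN and a is strictly positive (a = 0 would mean a zero-length direction and a division by zero in the original code anyway):

```latex
t_0 - t_1
  = \frac{-b + D}{2a} - \frac{-b - D}{2a}
  = \frac{2D}{2a}
  = \frac{D}{a} \;\ge\; 0
\quad\Longrightarrow\quad t_1 \le t_0 .
```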
float ti = Math.Min(t0, t1);
This call always selects t1. Computing t0 and testing which is larger is a waste.
if(ti > 0 && ti < t)
{
t = ti;
return true;
}
Recalling that ti is always t1, and that a is non-negative (strictly positive for any ray with a non-zero direction), the first test is equivalent to -b - D > 0, or b < -D.
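Putting these observations together, the tail of the scalar routine could shrink to something like the sketch below. The helper name TryNearestHit and its parameter list are hypothetical (the original method derives a, b and c from the ray and circle), and the code assumes a > 0.

```csharp
using System;

static class CircleHit
{
    // Hypothetical helper: a, b, c are the quadratic coefficients from Step 3,
    // t is the closest hit distance found so far. Assumes a > 0.
    public static bool TryNearestHit(float a, float b, float c, ref float t)
    {
        float disc = b * b - 4 * a * c;    // the discriminant (named DSqrt in the question)
        if (disc < 0)
            return false;                  // no real roots, and no NaN ever reaches Math.Sqrt

        float d = (float)Math.Sqrt(disc);  // d >= 0
        float t1 = (-b - d) / (2 * a);     // always the smaller root, so no t0 and no Math.Min

        if (t1 > 0 && t1 < t)
        {
            t = t1;
            return true;
        }
        return false;
    }
}
```

For example, with a = 1, b = -6, c = 8 (roots 2 and 4) and t = 10, the call returns true and sets t to 2. For finite inputs this accepts and rejects the same cases as the original code, including reporting no hit when the smaller root is negative.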
In the SIMD version, the Vector.SquareRoot documentation does not describe the behavior when inputs are negative. And the Vector.LessThan documentation does not describe the behavior when inputs are NaN.
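One way to avoid depending on those unspecified cases is to keep negative values out of Vector.SquareRoot and apply the "no real roots" test as a separate lane mask. The sketch below uses System.Numerics.Vector<float>. The method name and parameter layout are hypothetical, it assumes finite inputs with a > 0 in every lane, and it returns a per-lane mask rather than reproducing the original method's signature.

```csharp
using System.Numerics;

static class CircleHitSimd
{
    // Hypothetical helper: a, b, c hold the quadratic coefficients per lane,
    // t holds the closest hit distance so far per lane.
    // Returns a mask with all bits set in the lanes that hit, and updates t there.
    public static Vector<int> NearestHit(
        Vector<float> a, Vector<float> b, Vector<float> c, ref Vector<float> t)
    {
        Vector<float> zero = Vector<float>.Zero;
        Vector<float> disc = b * b - 4f * a * c;

        // Lanes with a negative discriminant have no real roots.
        Vector<int> hasRoots = Vector.GreaterThanOrEqual(disc, zero);

        // Clamp before the square root so it never sees a negative input.
        Vector<float> d = Vector.SquareRoot(Vector.Max(disc, zero));

        // Only the smaller root is needed (t1 <= t0 when a > 0).
        Vector<float> t1 = (-b - d) / (2f * a);

        Vector<int> hit = hasRoots
                        & Vector.GreaterThan(t1, zero)
                        & Vector.LessThan(t1, t);

        // Take t1 in the lanes that hit and keep the old t elsewhere.
        t = Vector.ConditionalSelect(hit, t1, t);
        return hit;
    }
}
```

The clamp costs one extra Vector.Max per batch, but every comparison then operates on ordinary finite numbers, so the result no longer depends on how the library treats NaN.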