C# and SIMD: High and low speedups. What is happening?

Question

I am not going to try to answer the question about SIMD speedup. Instead, here are some detailed comments on poor coding in the scalar version that carried over to the vector version, at a length that doesn’t fit in an SO comment.

This code in Intersect(Circle) is just absurd:

// Step 3: compute the substitutions, check if there is a collision.
float a = dx * dx + dy * dy;

a is a sum of squares, so it is guaranteed to be non-negative.

float b = r + s;
float c = p * p + q * q - cr * cr;

float DSqrt = b * b - 4 * a * c;

Despite its name, DSqrt isn’t the square root of D, it’s the square of D (the discriminant).

// no collision possible! Commented out to make the benchmark more fair
//if (DSqrt < 0)
//{ return false; }

// Step 4: compute the substitutions.
float D = (float)Math.Sqrt(DSqrt);

Math.Sqrt has a limited domain: it is only defined for non-negative input and returns NaN otherwise. Avoiding the call for negative input doesn’t just save the average cost of the square root, it also spares you from having to handle NaNs, which can be very slow on some hardware.

Also, D is never negative, since Math.Sqrt returns either the non-negative (principal) root or NaN.
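
To make the NaN path concrete, here is a minimal sketch of what an unguarded negative discriminant turns into. The class and method names are made up for illustration and are not part of the code under review; only Math.Sqrt is a real .NET call.

```csharp
using System;

// Hypothetical demo class, not part of the code under review.
static class SqrtDomainDemo
{
    // Mirrors the pattern in the question: no guard before the square root.
    public static float UnguardedSqrt(float discriminant)
    {
        return (float)Math.Sqrt(discriminant); // NaN when discriminant < 0
    }

    // Mirrors the shape of the final range test, "ti > 0 && ti < t".
    public static bool InRange(float ti, float t)
    {
        return ti > 0 && ti < t; // every ordered comparison with NaN is false
    }
}
```

SqrtDomainDemo.UnguardedSqrt(-4f) yields NaN, and InRange(float.NaN, 10f) is false, so the range test fails and no hit is recorded, but only after paying for the square root and both divisions.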

float t0 = (-b + D) / (2 * a);
float t1 = (-b - D) / (2 * a);

The difference between these two is t0 - t1 = 2 * D / (2 * a) = D / a. The ratio of two non-negative variables is also non-negative. Therefore t1 is never larger than t0.

float ti = Math.Min(t0, t1);

This call always selects t1. Computing t0 and testing which is larger is a waste.

if(ti > 0 && ti < t)
{
    t = ti;
    return true;
}

Recalling that ti is always t1, and that a is positive whenever dx and dy are not both zero, the first test is equivalent to -b - D > 0, that is, b < -D.
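
Putting these observations together, the tail of the scalar routine could be reduced to something like the sketch below. It is not the asker's code: the name TryNearRoot and its parameter list are hypothetical, it takes a, b and c as already computed, and it assumes a > 0.

```csharp
using System;

// Hypothetical helper, a sketch of the simplifications discussed above.
static class RootSketch
{
    // Assumes a > 0. Updates t and returns true only if the smaller
    // root of a*x^2 + b*x + c = 0 lies strictly between 0 and t.
    public static bool TryNearRoot(float a, float b, float c, ref float t)
    {
        // t1 > 0 requires b < -D, and D >= 0, so b must be negative.
        if (b >= 0f)
            return false;

        float discriminant = b * b - 4f * a * c;

        // No real roots: skip the square root and never create a NaN.
        if (discriminant < 0f)
            return false;

        float d = (float)Math.Sqrt(discriminant);

        // t1 is always the smaller root, so t0 and Math.Min are not needed.
        float t1 = (-b - d) / (2f * a);

        if (t1 > 0f && t1 < t)
        {
            t = t1;
            return true;
        }
        return false;
    }
}
```

With a = 1, b = -3, c = 2 the roots are 1 and 2, so a call with t = 10 returns true and lowers t to 1. The extra b >= 0 early-out follows directly from the b < -D condition and avoids the square root in that case as well.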

In the SIMD version, the Vector.SquareRoot documentation does not describe the behavior when inputs are negative, and the Vector.LessThan documentation does not describe the behavior when inputs are NaN.
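
One way to sidestep both undocumented cases is to never pass a negative value to Vector.SquareRoot and to mask out the affected lanes explicitly. The following is a sketch under assumptions, not the asker's vector code: the helper name NearRootHit and its parameters are hypothetical, inputs are assumed finite, and a is assumed positive in every lane.

```csharp
using System;
using System.Numerics;

// Hypothetical helper, a sketch of a NaN-free vector root test.
static class SimdRootSketch
{
    // Per lane: all bits set (-1) where the smaller root of
    // a*x^2 + b*x + c = 0 lies strictly between 0 and tMax, else 0.
    public static Vector<int> NearRootHit(
        Vector<float> a, Vector<float> b, Vector<float> c,
        Vector<float> tMax, out Vector<float> t1)
    {
        Vector<float> discriminant = b * b - 4f * a * c;

        // Remember which lanes have real roots before touching the sqrt.
        Vector<int> hasRealRoots =
            Vector.GreaterThanOrEqual(discriminant, Vector<float>.Zero);

        // Clamp to zero so SquareRoot only ever sees non-negative input.
        Vector<float> d =
            Vector.SquareRoot(Vector.Max(discriminant, Vector<float>.Zero));

        // Only the smaller root is needed, as in the scalar analysis.
        t1 = (-b - d) / (2f * a);

        Vector<int> inRange =
            Vector.GreaterThan(t1, Vector<float>.Zero) &
            Vector.LessThan(t1, tMax);

        // Lanes with a negative discriminant are rejected by the mask.
        return hasRealRoots & inRange;
    }
}
```

The returned mask can then be used to merge results, for example with Vector.ConditionalSelect(hit, t1, t), so lanes without a valid root keep their previous t.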
