Efficient floating-point division with constant integer divisors
Let me restart for the third time. We are trying to accelerate q = x / y, where y is an integer constant and q, x, and y are all IEEE 754-2008 binary32 floating-point values. Below, fmaf(a,b,c) denotes a fused multiply-add that computes a * b + c on binary32 operands with a single rounding. The naive algorithm is via a …