Even shorter, slightly sweeter:
(x[1:] + x[:-1]) / 2
This is faster:
>>> python -m timeit -s "import numpy; x = numpy.random.random(1000000)" "x[:-1] + numpy.diff(x)/2" 100 loops, best of 3: 6.03 msec per loop >>> python -m timeit -s "import numpy; x = numpy.random.random(1000000)" "(x[1:] + x[:-1]) / 2" 100 loops, best of 3: 4.07 msec per loop
This is perfectly accurate:
Consider each element in
x[1:] + x[:-1]. So consider
x₁, the first and second elements.
x₀ + x₁is calculated to perfect precision and then rounded, in accordance to IEEE. It would therefore be the correct answer if that was all that was needed.
(x₀ + x₁) / 2is just half of that value. This can almost always be done by reducing the exponent by one, except in two cases:
x₀ + x₁overflows. This will result in an infinity (of either sign). That’s not what is wanted, so the calculation will be wrong.
x₀ + x₁underflows. As the size is reduced, rounding will be perfect and thus the calculation will be correct.
In all other cases, the calculation will be correct.
x[:-1] + numpy.diff(x) / 2. This, by inspection of the source, evaluates directly to
x[:-1] + (x[1:] - x[:-1]) / 2
and so consider again
x₁ - x₀will have severe “problems” with underflow for many values. This will also lose precision with large cancellations. It’s not immediately clear that this doesn’t matter if the signs are the same, though, as the error effectively cancels out on addition. What does matter is that rounding occurs.
(x₁ - x₀) / 2will be no less rounded, but then
x₀ + (x₁ - x₀) / 2involves another rounding. This means that errors will creep in. Proof:
import numpy wins = draws = losses = 0 for _ in range(100000): a = numpy.random.random() b = numpy.random.random() / 0.146 x = (a+b)/2 y = a + (b-a)/2 error_mine = (a-x) - (x-b) error_theirs = (a-y) - (y-b) if x != y: if abs(error_mine) < abs(error_theirs): wins += 1 elif abs(error_mine) == abs(error_theirs): draws += 1 else: losses += 1 else: draws += 1 wins / 1000 #>>> 12.44 draws / 1000 #>>> 87.56 losses / 1000 #>>> 0.0
This shows that for the carefully chosen constant of
1.46, a full 12-13% of answers are wrong with the
diffvariant! As expected, my version is always right.
Now consider underflow. Although my variant has overflow problems, these are much less big a deal than cancellation problems. It should be obvious why the double-rounding from the above logic is very problematic. Proof:
... a = numpy.random.random() b = -numpy.random.random() ... wins / 1000 #>>> 25.149 draws / 1000 #>>> 74.851 losses / 1000 #>>> 0.0
Yeah, it gets 25% wrong!
In fact, it doesn’t take much pruning to get this up to 50%:
... a = numpy.random.random() b = -a + numpy.random.random()/256 ... wins / 1000 #>>> 49.188 draws / 1000 #>>> 50.812 losses / 1000 #>>> 0.0
Well, it’s not that bad. It’s only ever 1 least-significant-bit off as long as the signs are the same, I think.
So there you have it. My answer is the best unless you’re finding the average of two values whose sum exceeds
1.7976931348623157e+308 or is smaller than