Collatz Conjecture Python – Incorrect Output Above 2 Trillion (Only!)

Question

This line:

n = int(n/2)

… converts n to a float, divides that float by 2, then converts back to an int by throwing away the fractional part.

For integers up to 2**52, converting to float is lossless, but for anything larger, it has to round to the nearest 53-bit number, which loses information.

Of course 2 trillion is well under that 2**53 limit for float precision—but the Collatz sequence starting at N frequently goes much, much higher than N. It’s not at all implausible that many numbers around 2 trillion have sequences that go past 2**53, while very few numbers below it do. It’s even possible that a whole long sequence of numbers starting at exactly 2 trillion go past 2**53 but not a single number below it does. But I have no idea how to prove such a thing without building the entire sequence for every number up to 2 trillion. (If there is a proof, it would probably lean heavily on the existing partial proofs of the conjecture under various different conditions, which are above my paygrade…)

Anyway, the solution is simple: you want to use integer division:

n = n // 2

Here’s an example to demonstrate:

>>> n = 2**53 + 3
>>> n
9007199254740995
>>> int(n/2)
4503599627370498
>>> n//2
4503599627370497

To verify that this is actually happening in your code, try this:

def collatz(n):
    overflow = False
    i = 0
    while n > 1:
        if n > 2**53:
            overflow=True
        if n % 2 == 0:
            n = int(n/2)
            i += 1
        else:
            n = int((3*n)+1)
            i += 1
    return i, overflow

if __name__ == '__main__':
    import sys
    for arg in sys.argv[1:]:
        num = int(arg.replace(',', ''))
        result, overflow = collatz(num)
        print(f'{arg:>30}: {result:10,} {overflow}')

When I run this:

$ python3 collatz.py 989,345,275,647 1,122,382,791,663 1,444,338,092,271 1,899,148,184,679 2,081,751,768,559 2,775,669,024,745 3,700,892,032,993 3,743,559,068,799

… it gives me:

           989,345,275,647:      1,348 False
         1,122,382,791,663:      1,356 False
         1,444,338,092,271:      1,408 False
         1,899,148,184,679:      1,411 False
         2,081,751,768,559:        385 True
         2,775,669,024,745:        388 True
         3,700,892,032,993:        391 True
         3,743,559,068,799:        497 True

So, we went past 2**53 in exactly the same cases where we got the wrong answer.

And to verify the fix, change the int(n/2) to n//2:

           989,345,275,647:      1,348 False
         1,122,382,791,663:      1,356 False
         1,444,338,092,271:      1,408 False
         1,899,148,184,679:      1,411 False
         2,081,751,768,559:      1,437 True
         2,775,669,024,745:      1,440 True
         3,700,892,032,993:      1,443 True
         3,743,559,068,799:      1,549 True

So, why is it always off by the same amount?

Well, that’s mostly just a coincidence of the specific numbers you happen to be using.

When you pass 2**53 via 3n+1, you’re going to convert either the last bit, or the last 2 bits, to 0, which means you normally cut off a big part of the chain and replace it with just 1 or 2 divisions. But there are obviously going to be a few numbers where the chain you end up jumping to is longer than the correct one. In fact, it only took me 3 tries to find one: 3,743,559,068,799,123 should take 326 steps, but it takes 370.

I suspect (but again, I can’t even imagine how to prove) that many big numbers are going to end up in that same range around 375, a little shorter as they get (logarithmically) bigger. Why? Well, there’s only so many numbers you can round to—and most of them are probably in cycles with each other you start doing truncating division. So, let’s say that almost every number near 2**53 has a rounding cycle length of a bit over 50, and most numbers in the trillions range reach that 2**53 range in a bit over 300 steps… then most of them are going to end up around 375. (Those numbers are pulled out of thin air, of course, but you could do a Monte Carlo simulation to see how far from reality they actually are…)

Leave a Comment Cancel reply