Why is the simpler loop slower?

Question

I checked the CPython source (Python 3.11.6) and found that, of the instructions executed inside the two loops, only JUMP_BACKWARD runs a warmup function, which triggers specialization in Python 3.11 once it has been executed enough times:

PyObject* _Py_HOT_FUNCTION
_PyEval_EvalFrameDefault(PyThreadState *tstate, _PyInterpreterFrame *frame, int throwflag)
{
    /* ... */
        TARGET(JUMP_BACKWARD) {
            _PyCode_Warmup(frame->f_code);
            JUMP_TO_INSTRUCTION(JUMP_BACKWARD_QUICK);
        }
    /* ... */
}

static inline void
_PyCode_Warmup(PyCodeObject *code)
{
    if (code->co_warmup != 0) {
        code->co_warmup++;
        if (code->co_warmup == 0) {
            _PyCode_Quicken(code);
        }
    }
}

Among all bytecodes, only JUMP_BACKWARD and RESUME call _PyCode_Warmup().
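To make the counter logic concrete, here is a toy Python model of the warmup mechanism. It is only a sketch: CodeModel and run_loop are hypothetical names, and the starting value of -8 is an assumption based on my recollection of the 3.11 default, not something shown in the excerpt above.

```python
class CodeModel:
    """Toy stand-in for the co_warmup field of a code object."""

    def __init__(self, initial_warmup=-8):
        # Assumption: the counter starts at a small negative value.
        self.co_warmup = initial_warmup
        self.quickened = False

    def warmup(self):
        # Same shape as _PyCode_Warmup: count up toward zero and
        # quicken exactly once, when the counter reaches zero.
        if self.co_warmup != 0:
            self.co_warmup += 1
            if self.co_warmup == 0:
                self.quickened = True


def run_loop(code, iterations, has_jump_backward):
    """Simulate one call of a function whose loop runs `iterations` times."""
    code.warmup()  # RESUME runs once per call
    for _ in range(iterations):
        if has_jump_backward:
            code.warmup()  # JUMP_BACKWARD runs once per iteration
    return code.quickened


if __name__ == "__main__":
    print(run_loop(CodeModel(), 10**4, has_jump_backward=True))
    print(run_loop(CodeModel(), 10**4, has_jump_backward=False))
```

Under these assumptions, a loop that ends in JUMP_BACKWARD is quickened during its first call, while a loop without it only advances the counter once per call, through RESUME.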

Specialization appears to speed up several of the bytecodes these functions use, which would explain the significant difference in speed:

void
_PyCode_Quicken(PyCodeObject *code)
{
    /* ... */
            switch (opcode) {
                case EXTENDED_ARG:  /* ... */
                case JUMP_BACKWARD: /* ... */
                case RESUME:        /* ... */
                case LOAD_FAST:     /* ... */
                case STORE_FAST:    /* ... */
                case LOAD_CONST:    /* ... */
            }
    /* ... */
}

After a single call, the bytecode of complex has been specialized, while that of simple is unchanged:

In [_]: %timeit -n 1 -r 1 complex(10 ** 8)
2.7 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)

In [_]: dis(complex, adaptive=True)
  5           0 RESUME_QUICK             0

  6           2 NOP

  7           4 LOAD_FAST                0 (n)
              6 POP_JUMP_FORWARD_IF_TRUE     2 (to 12)

  8           8 LOAD_CONST               0 (None)
             10 RETURN_VALUE

  9     >>   12 LOAD_FAST__LOAD_CONST     0 (n)
             14 LOAD_CONST               2 (1)
             16 BINARY_OP_SUBTRACT_INT    23 (-=)
             20 STORE_FAST               0 (n)

  6          22 JUMP_BACKWARD_QUICK     10 (to 4)

In [_]: %timeit -n 1 -r 1 simple(10 ** 8)
4.78 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)

In [_]: dis(simple, adaptive=True)
  1           0 RESUME                   0

  2           2 LOAD_FAST                0 (n)
              4 POP_JUMP_FORWARD_IF_FALSE     9 (to 24)

  3     >>    6 LOAD_FAST                0 (n)
              8 LOAD_CONST               1 (1)
             10 BINARY_OP               23 (-=)
             14 STORE_FAST               0 (n)

  2          16 LOAD_FAST                0 (n)
             18 POP_JUMP_BACKWARD_IF_TRUE     7 (to 6)
             20 LOAD_CONST               0 (None)
             22 RETURN_VALUE
        >>   24 LOAD_CONST               0 (None)
             26 RETURN_VALUE
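The check can be repeated with a short script. The two function bodies below are reconstructed from the disassembly listings and are an assumption about the original source; opnames and has_jump_backward are hypothetical helpers. Note that complex shadows the builtin of the same name, and that the result depends on the interpreter version, since other CPython releases compile these loops differently.

```python
import dis


def simple(n):
    while n:
        n -= 1


def complex(n):
    while True:
        if not n:
            return
        n -= 1


def opnames(func):
    """Return the set of opcode names in the function's bytecode."""
    return {ins.opname for ins in dis.get_instructions(func)}


def has_jump_backward(func):
    """True if the compiled function contains a JUMP_BACKWARD instruction."""
    return "JUMP_BACKWARD" in opnames(func)


if __name__ == "__main__":
    for func in (simple, complex):
        print(func.__name__, has_jump_backward(func))
```

On 3.11, dis.dis(func, adaptive=True) after a call additionally shows whether the instructions were replaced by their specialized forms, as in the listings above.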
