I checked the CPython source (Python 3.11.6) and found that, of the jump instructions in the disassembled bytecode, it seems that only `JUMP_BACKWARD` calls a warmup function. In Python 3.11, once that function has been executed enough times, it triggers quickening, which is what makes specialization possible:
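```c
/* Python/ceval.c */
PyObject* _Py_HOT_FUNCTION
_PyEval_EvalFrameDefault(PyThreadState *tstate, _PyInterpreterFrame *frame, int throwflag)
{
    /* ... */
    TARGET(JUMP_BACKWARD) {
        /* Tick the code object's warmup counter, then run the quick version. */
        _PyCode_Warmup(frame->f_code);
        JUMP_TO_INSTRUCTION(JUMP_BACKWARD_QUICK);
    }
    /* ... */
}

/* Include/internal/pycore_code.h */
static inline void
_PyCode_Warmup(PyCodeObject *code)
{
    /* co_warmup starts negative and counts up toward 0. The increment that
       reaches 0 quickens the code object; after that the counter stays at 0
       and this function does nothing. */
    if (code->co_warmup != 0) {
        code->co_warmup++;
        if (code->co_warmup == 0) {
            _PyCode_Quicken(code);
        }
    }
}
```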
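To make the counter logic concrete, here is a small Python model of it (a sketch, not CPython code). It assumes `co_warmup` starts at -8, which I believe is 3.11's `QUICKENING_WARMUP_DELAY`; `CodeModel`, `run_simple` and `run_complex` are hypothetical names, and the two loops are assumed to have the same shapes as `simple` and `complex` in the disassembly further down.

```python
# Simplified model of the 3.11 warmup counter (not CPython code).
# Assumption: co_warmup starts at -8 (QUICKENING_WARMUP_DELAY in 3.11).
WARMUP_DELAY = 8


class CodeModel:
    """Hypothetical stand-in for the warmup state of a PyCodeObject."""

    def __init__(self) -> None:
        self.co_warmup = -WARMUP_DELAY
        self.quickened = False

    def warmup(self) -> None:
        # Mirrors _PyCode_Warmup(): count up toward 0, quicken exactly once.
        if self.co_warmup != 0:
            self.co_warmup += 1
            if self.co_warmup == 0:
                self.quickened = True  # stands in for _PyCode_Quicken()


def run_simple(code: CodeModel, n: int) -> None:
    # `while n: n -= 1`: RESUME ticks once per call; the loop closes with
    # POP_JUMP_BACKWARD_IF_TRUE, which never calls warmup.
    code.warmup()  # RESUME
    while n:
        n -= 1


def run_complex(code: CodeModel, n: int) -> None:
    # `while True: ...`: RESUME ticks once, then JUMP_BACKWARD every iteration.
    code.warmup()  # RESUME
    while True:
        if not n:
            return
        n -= 1
        code.warmup()  # JUMP_BACKWARD


simple_code, complex_code = CodeModel(), CodeModel()
run_simple(simple_code, 100)
run_complex(complex_code, 100)
print(simple_code.quickened, complex_code.quickened)  # False True
```

In this model a single call of `run_complex` quickens after seven loop iterations, while `run_simple` only ticks the counter once per call, however long its loop runs.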
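Among all bytecodes, only `JUMP_BACKWARD` and `RESUME` call `_PyCode_Warmup()`. `RESUME` runs once per call, at function entry, while `JUMP_BACKWARD` runs on every iteration of a loop that ends with it.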
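You can check which of these two opcodes each function contains with `dis.get_instructions()`. The definitions of `simple` and `complex` below are my reconstruction from the disassembly further down (an assumption), and `warmup_ops` is a hypothetical helper.

```python
import dis
import sys


# Reconstructed from the disassembly (assumption). `complex` shadows the built-in.
def simple(n):
    while n:
        n -= 1


def complex(n):
    while True:
        if not n:
            return
        n -= 1


# The two opcodes that call _PyCode_Warmup() in 3.11.
WARMUP_OPS = {"RESUME", "JUMP_BACKWARD"}


def warmup_ops(func):
    """Return the opnames in func's unquickened bytecode that call warmup."""
    return [ins.opname for ins in dis.get_instructions(func) if ins.opname in WARMUP_OPS]


print(sys.version_info[:2])
print("simple: ", warmup_ops(simple))   # on 3.11: ['RESUME']
print("complex:", warmup_ops(complex))  # on 3.11: ['RESUME', 'JUMP_BACKWARD']
```

Without `adaptive=True`, `dis` shows the unquickened instructions, so the result does not depend on whether the functions have already run.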
`_PyCode_Quicken()` then rewrites several of the bytecodes in use into quickened forms, which appears to be what makes the loop significantly faster:
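```c
/* Python/specialize.c */
void
_PyCode_Quicken(PyCodeObject *code)
{
    /* ... */
    switch (opcode) {
        case EXTENDED_ARG: /* ... */
        case JUMP_BACKWARD: /* ... */  /* becomes JUMP_BACKWARD_QUICK */
        case RESUME: /* ... */         /* becomes RESUME_QUICK */
        /* The next three can fuse with the previous instruction into
           superinstructions such as LOAD_FAST__LOAD_CONST. */
        case LOAD_FAST: /* ... */
        case STORE_FAST: /* ... */
        case LOAD_CONST: /* ... */
    }
    /* ... */
}
```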
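A rough Python model of what this pass does to `complex`, again a sketch under assumptions: the opcode names follow 3.11, but only the cases needed here are modelled (one adaptive opcode, no `EXTENDED_ARG`, no inline caches), `quicken` is a hypothetical name, and the superinstruction pairs are the ones I recall from 3.11's `specialize.c`.

```python
from typing import List, Optional

# Instructions with an adaptive variant (only BINARY_OP is modelled here).
ADAPTIVE = {"BINARY_OP": "BINARY_OP_ADAPTIVE"}

# Instructions with a plain "quick" replacement.
QUICK = {"RESUME": "RESUME_QUICK", "JUMP_BACKWARD": "JUMP_BACKWARD_QUICK"}

# (previous, current) pairs whose *first* instruction becomes a superinstruction.
SUPER = {
    ("LOAD_FAST", "LOAD_FAST"): "LOAD_FAST__LOAD_FAST",
    ("STORE_FAST", "LOAD_FAST"): "STORE_FAST__LOAD_FAST",
    ("LOAD_CONST", "LOAD_FAST"): "LOAD_CONST__LOAD_FAST",
    ("STORE_FAST", "STORE_FAST"): "STORE_FAST__STORE_FAST",
    ("LOAD_FAST", "LOAD_CONST"): "LOAD_FAST__LOAD_CONST",
}


def quicken(ops: List[str]) -> List[str]:
    """Simplified model of 3.11's _PyCode_Quicken() over a list of opnames."""
    out = list(ops)
    previous: Optional[str] = None
    for i, op in enumerate(ops):
        if op in ADAPTIVE:
            out[i] = ADAPTIVE[op]
            previous = None  # reset: nothing pairs with an adaptive instruction
            continue
        if op in QUICK:
            out[i] = QUICK[op]
        elif (previous, op) in SUPER:
            # The second instruction stays in place; the superinstruction
            # executes both and skips over it.
            out[i - 1] = SUPER[(previous, op)]
        previous = op
    return out


# Unquickened opcodes of `complex`, taken from its disassembly.
complex_ops = ["RESUME", "NOP", "LOAD_FAST", "POP_JUMP_FORWARD_IF_TRUE",
               "LOAD_CONST", "RETURN_VALUE", "LOAD_FAST", "LOAD_CONST",
               "BINARY_OP", "STORE_FAST", "JUMP_BACKWARD"]
print(quicken(complex_ops))
```

The result matches the adaptive disassembly of `complex` below, except that `BINARY_OP_ADAPTIVE` shows up there as `BINARY_OP_SUBTRACT_INT`, i.e. the adaptive instruction has since specialized for `int` operands at runtime.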
After one call, the bytecode of `complex` had changed, while that of `simple` had not:
```
In [_]: %timeit -n 1 -r 1 complex(10 ** 8)
2.7 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)

In [_]: dis(complex, adaptive=True)
  5           0 RESUME_QUICK             0
  6           2 NOP
  7           4 LOAD_FAST                0 (n)
              6 POP_JUMP_FORWARD_IF_TRUE     2 (to 12)
  8           8 LOAD_CONST               0 (None)
             10 RETURN_VALUE
  9     >>   12 LOAD_FAST__LOAD_CONST     0 (n)
             14 LOAD_CONST               2 (1)
             16 BINARY_OP_SUBTRACT_INT    23 (-=)
             20 STORE_FAST               0 (n)
  6          22 JUMP_BACKWARD_QUICK     10 (to 4)
```
```
In [_]: %timeit -n 1 -r 1 simple(10 ** 8)
4.78 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)

In [_]: dis(simple, adaptive=True)
  1           0 RESUME                   0
  2           2 LOAD_FAST                0 (n)
              4 POP_JUMP_FORWARD_IF_FALSE     9 (to 24)
  3     >>    6 LOAD_FAST                0 (n)
              8 LOAD_CONST               1 (1)
             10 BINARY_OP               23 (-=)
             14 STORE_FAST               0 (n)
  2          16 LOAD_FAST                0 (n)
             18 POP_JUMP_BACKWARD_IF_TRUE     7 (to 6)
             20 LOAD_CONST               0 (None)
             22 RETURN_VALUE
        >>   24 LOAD_CONST               0 (None)
             26 RETURN_VALUE
```
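If this explanation is right, `simple` is not stuck unquickened forever: `RESUME` still ticks its counter once per call, so after enough calls it should be quickened too. A sketch to check this on 3.11, assuming the warmup delay of 8 mentioned above; `adaptive_opnames` is a hypothetical helper, and the expected outputs are only claimed for 3.11 (later versions changed how quickening works).

```python
import dis
import sys


def simple(n):
    while n:
        n -= 1


def adaptive_opnames(func):
    # `adaptive=True` was added to dis.get_instructions() in Python 3.11.
    return [ins.opname for ins in dis.get_instructions(func, adaptive=True)]


if sys.version_info >= (3, 11):
    simple(10)
    print(adaptive_opnames(simple)[0])  # expected on 3.11: RESUME (one warmup tick)
    for _ in range(20):  # comfortably more than the assumed delay of 8
        simple(10)
    print(adaptive_opnames(simple)[0])  # expected on 3.11: RESUME_QUICK
```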