You are measuring empty benchmarks, not empty methods. In other words, you are measuring the minimal infrastructure code that handles the benchmark itself. This is easy to dissect, because you would expect only a few instructions on the hot path. JMH's -prof perfasm or -prof xperfasm would give you those hottest instructions in seconds.
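For concreteness, here is a minimal sketch of the kind of empty benchmark being discussed (class and package-level details are illustrative, not taken from the question):

import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class EmptyBench {

    @Benchmark
    public void empty() {
        // Intentionally empty: the only thing that runs is JMH's
        // measurement loop, so the hot path is just a few instructions.
    }
}

Running it with something like "java -jar benchmarks.jar EmptyBench -prof perfasm" annotates those few hot instructions, which is what the listings below show.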
I think the effect is due to Thread-Local Handshakes (JEP 312), see:
8u191: 0.389 ± 0.029 ns/op
[so far so good]
3.60% ↗ ...a2: movzbl 0x94(%r8),%r10d
0.63% │ ...aa: add $0x1,%rbp
32.82% │ ...ae: test %eax,0x1765654c(%rip) ; global safepoint poll
58.14% │ ...b4: test %r10d,%r10d
╰ ...b7: je ...a2
11.0.2: 0.585 ± 0.014 ns/op [oops, regression]
0.31% ↗ ...70: movzbl 0x94(%r9),%r10d
0.19% │ ...78: mov 0x108(%r15),%r11 ; reading the thread-local poll addr
25.62% │ ...7f: add $0x1,%rbp
35.10% │ ...83: test %eax,(%r11) ; thread-local safepoint poll
34.91% │ ...86: test %r10d,%r10d
╰ ...89: je ...70
11.0.2, -XX:-ThreadLocalHandshakes: 0.399 ± 0.048 ns/op [back to 8u perf]
5.64% ↗ ...62: movzbl 0x94(%r8),%r10d
0.91% │ ...6a: add $0x1,%rbp
34.36% │ ...6e: test %eax,0x179be88c(%rip) ; global safepoint poll
54.79% │ ...74: test %r10d,%r10d
╰ ...77: je ...62
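If you want to reproduce the comparison, a sketch of the command lines (jar name and benchmark selector are placeholders; JMH passes extra VM flags via -jvmArgs, and -XX:-ThreadLocalHandshakes only exists on JDK versions that still have the flag):

# JDK 11, default (thread-local handshakes enabled):
java -jar benchmarks.jar EmptyBench -prof perfasm

# JDK 11, thread-local handshakes disabled:
java -jar benchmarks.jar EmptyBench -prof perfasm -jvmArgs "-XX:-ThreadLocalHandshakes"

The difference between the two 11.0.2 listings is the extra load of the per-thread poll address (mov 0x108(%r15),%r11) followed by an indirect test against it, instead of a single RIP-relative test against the global poll page. In a loop this tight, that extra work per iteration shows up in the score.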
I think this effect is mostly visible in tight loops like this one.
UPD: Hopefully, there are more details here.