In Java there is a separate bytecode instruction for allocating multidimensional arrays – multianewarray
.
newArray
benchmark usesmultianewarray
bytecode;newArray2
invokes simplenewarray
in the loop.
The problem is that HotSpot JVM has no fast path* for multianewarray
bytecode. This instruction is always executed in VM runtime. Therefore, the allocation is not inlined in the compiled code.
The first benchmark has to pay performance penalty of switching between Java and VM Runtime contexts. Also, the common allocation code in the VM runtime (written in C++) is not as optimized as inlined allocation in JIT-compiled code, just because it is generic, i.e. not optimized for the particular object type or for the particular call site, it performs additional runtime checks, etc.
Here are the results of profiling both benchmarks with async-profiler. I used JDK 11.0.4, but for JDK 8 the picture looks similar.
In the first case, 99% time is spent inside OptoRuntime::multianewarray2_C
– the C++ code in the VM runtime.
In the second case, the most of the graph is green, meaning that the program runs mostly in Java context, actually executing JIT-compiled code optimized specifically for the given benchmark.
EDIT
* Just to clarify: in HotSpot multianewarray
is not optimized very well by design. It is rather costly to implement such a complex operation in both JIT compilers properly, while the benefits of such optimization would be questionable: allocation of multidimentional arrays is rarely a performance bottleneck in a typical application.