RFR: 8280481: Duplicated stubs to interpreter for static calls [v2]

Evgeny Astigeevich duke at openjdk.org
Thu Jun 30 21:29:39 UTC 2022


On Thu, 30 Jun 2022 08:45:21 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> @vnkozlov, after updating to the latest sources, everything passed: https://github.com/eastig/jdk/actions/runs/2583924985
>
> Hi @eastig,
> Are these memory savings shown on Renaissance with inlining disabled?
> Since static method resolution happens at compile time, smaller methods may get inlined, thus removing the emission of a stub.
> 
> We are improving the memory footprint of a method in the code cache; does it also lead to some improvement in benchmark throughput by any means?
> 
> Once the code cache is full, the runtime attempts to extend the allocation if the reserved code cache has space, followed by a [slow path of disabling compilation](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/code/codeCache.cpp#L595), which may impact performance.

Hi @jatin-bhateja,

> Are these memory savings shown on Renaissance with inlining disabled?

Apart from Java heap size tuning, the JVM was run in its default configuration. No changes to inlining were made.

> Since static method resolution happens at compile time, smaller methods may get inlined, thus removing the emission of a stub.

You are correct, and the data reflect this in the total number of nmethods with shared stubs. On arm64 a stub to the interpreter is 8 instructions; on x86 it is just 3: mov, jmp and nop. In terms of code size that is 32 bytes on arm64 vs 16 bytes on x86, so arm64 benefits more from the patch than x86.
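
To make the inlining point concrete, here is a small, hypothetical Java sketch (my own illustration, not from the patch; the class and method names are made up). Running it with the shown `CompileCommand` plus `-XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly` (requires hsdis) should show that the non-inlined call site keeps a real static call, and hence a to-interpreter stub, while the inlined one emits neither:

```java
// Run: java -XX:CompileCommand=dontinline,StubDemo::notInlined
//           -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly StubDemo
public class StubDemo {
    static int inlined(int x) {      // tiny: C2 inlines it into the caller,
        return x + 1;                // so there is no call and no stub
    }

    static int notInlined(int x) {   // kept out of line by the CompileCommand:
        return x + 1;                // the call site stays a real static call
    }                                // and the nmethod carries a stub for it

    public static void main(String[] args) {
        long s = 0;
        for (int i = 0; i < 1_000_000; i++) {  // warm up so C2 compiles the loop
            s += inlined(i);
            s += notInlined(i);
        }
        System.out.println(s);
    }
}
```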

> We are improving the memory footprint of a method in the code cache; does it also lead to some improvement in benchmark throughput by any means?

There are a few patches, including this one, which improve the memory footprint of a method. Each of them on its own does not show much performance improvement, but together they do, especially in benchmarks with tens of thousands of methods. For example, DaCapo eclipse shows ~4% improvement on arm64.

> Once the code cache is full, the runtime attempts to extend the allocation if the reserved code cache has space, followed by a [slow path of disabling compilation](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/code/codeCache.cpp#L595), which may impact performance.

It's even worse than that. Andrew mentioned CodeCache thrashing in the PR to change the default CodeCache size from 240M to 127M for arm64. With CodeCache thrashing, compilation does not stop: you constantly throw away compiled code, fall back to the interpreter, recompile, and jump to the newly compiled code. We have evidence from a real service that reducing the default size is not a good idea: the service suffered CodeCache thrashing, and the performance degradation it causes is huge and really hard to detect.
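
For anyone who wants to check how close a service is to that regime, here is a minimal sketch (my own illustration, not part of the patch) that reads code cache occupancy through the standard MemoryPoolMXBean API; pool names vary by configuration, e.g. a single "Code Cache" pool or the segmented "CodeHeap '...'" pools:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

public class CodeCacheWatch {
    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (!pool.getName().contains("Code")) continue; // skip Java heap pools
            long used = pool.getUsage().getUsed();
            long max  = pool.getUsage().getMax();           // -1 if undefined
            System.out.printf("%s: used %,d of %,d bytes%n",
                              pool.getName(), used, max);
        }
    }
}
```

The same numbers are available externally via `jcmd <pid> Compiler.codecache`, and `-XX:+PrintCodeCache` prints a summary at VM exit.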

-------------

PR: https://git.openjdk.org/jdk/pull/8816

