Discussion: 8172978: Remove Interpreter TOS optimization

Wed Feb 15 22:18:50 UTC 2017

Hello all,

We have filed a bug to remove the interpreter stack caching optimization 
for jdk10.  Ideally we can make this change *early* during the jdk10 
development cycle. See below for justification:

Bug: https://bugs.openjdk.java.net/browse/JDK-8172978

Stack caching has been around for a long time and is intended to replace 
some of the load/store (pop/push) operations with corresponding register 
operations. The need for this optimization arose before hardware caches 
could adequately lessen the cost of memory access. We have reevaluated the 
JVM stack caching optimization and have found that it has a high memory 
footprint and is very costly to maintain, but does not provide 
significant measurable or theoretical benefit for us when used with 
modern hardware.
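
For readers less familiar with the technique, here is a minimal 
illustrative sketch (not HotSpot code) of a toy stack machine run two 
ways: once with every operand kept in an in-memory stack, and once with 
the top of stack held in a local variable standing in for a register. 
The names Op, Insn, Result, run_uncached and run_cached are hypothetical, 
the input is assumed to be well formed, and the counters model only 
operand-stack traffic, not real hardware cost.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical three-instruction bytecode set, for illustration only.
enum class Op : std::uint8_t { Push, Add, Mul };

struct Insn {
    Op op;
    int operand;  // used only by Push
};

struct Result {
    int value;               // final top-of-stack value
    std::size_t mem_access;  // loads + stores on the in-memory operand stack
};

// Baseline: every operand lives in the in-memory stack.
Result run_uncached(const std::vector<Insn>& code) {
    std::vector<int> stack;
    std::size_t mem = 0;
    for (const Insn& i : code) {
        if (i.op == Op::Push) {
            stack.push_back(i.operand); ++mem;                // store
        } else {
            assert(stack.size() >= 2);                        // well-formed input
            int rhs = stack.back(); stack.pop_back(); ++mem;  // load
            int lhs = stack.back(); stack.pop_back(); ++mem;  // load
            stack.push_back(i.op == Op::Add ? lhs + rhs : lhs * rhs);
            ++mem;                                            // store
        }
    }
    assert(!stack.empty());
    return {stack.back(), mem};
}

// Cached: the top of stack is held in `tos` (a stand-in for a register).
// `tos_valid` plays the role of a two-state tos state: empty vs. int cached.
Result run_cached(const std::vector<Insn>& code) {
    std::vector<int> stack;
    std::size_t mem = 0;
    int tos = 0;
    bool tos_valid = false;
    for (const Insn& i : code) {
        if (i.op == Op::Push) {
            if (tos_valid) { stack.push_back(tos); ++mem; }   // spill old tos
            tos = i.operand;
            tos_valid = true;
        } else {
            assert(tos_valid && !stack.empty());              // well-formed input
            int lhs = stack.back(); stack.pop_back(); ++mem;  // load
            tos = (i.op == Op::Add) ? lhs + tos : lhs * tos;  // result stays cached
        }
    }
    assert(tos_valid);
    return {tos, mem};
}
```

Both functions compute the same value; the cached variant simply touches 
the in-memory stack less often, which is the saving the optimization 
was designed to capture.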

Minimal Theoretical Benefit.
Because modern hardware does not slap us with the same cost for 
accessing memory as it once did, the benefit of replacing memory access 
with register access is far less dramatic now than it once was. 
Additionally, the interpreter runs for a relatively short time before 
relevant code sections are compiled. When the VM starts running compiled 
code instead of interpreted code, performance should begin to move 
asymptotically towards that of compiled code, diluting any performance 
penalties from the interpreter to small performance variations.
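
The dilution argument above can be written as a simple weighted average. 
This is only a back-of-the-envelope sketch: the function name and both 
parameters are hypothetical, and it assumes compiled-code speed is 
unaffected by the change.

```cpp
// interp_fraction: share of baseline run time spent in the interpreter (0..1).
// interp_slowdown: factor by which interpreted code slows down (1.0 = none).
// Returns the overall run time relative to the baseline.
double overall_slowdown(double interp_fraction, double interp_slowdown) {
    return 1.0 + interp_fraction * (interp_slowdown - 1.0);
}
```

The smaller the share of time spent interpreting, the closer the result 
stays to 1.0, regardless of how large the interpreter-only penalty is.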

No Measurable Benefit.
Please see the results files attached to the bug page.  This change was 
adapted for x86 and SPARC, and interpreter performance was measured with 
SPECjvm98 (run with -Xint).  No significant decrease in performance was 
observed.

Memory footprint and code complexity.
Stack caching in the JVM is implemented by switching the instruction 
look-up table depending on the tos (top-of-stack) state. At any moment 
there is an active table consisting of one dispatch table for each 
of the 10 tos states.  When we enter a safepoint, we copy all 10 
safepoint dispatch tables into the active table.  The additional entry 
code makes this copy less efficient and makes any work in the 
interpreter harder to debug.
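
The layout described above can be sketched roughly as follows. This is 
an illustration, not the HotSpot source: the 10 states come from the 
description above, while the 256 slots per state (one per one-byte 
opcode), the type names, the stub handlers and the fill/install helpers 
are all assumptions made for the example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kTosStates = 10;   // from the description above
constexpr std::size_t kBytecodes = 256;  // assumption: one slot per opcode byte

using Entry = int (*)();  // stand-in for the address of a generated entry point
using StateTable = std::array<Entry, kBytecodes>;
using DispatchTable = std::array<StateTable, kTosStates>;

// Stub handlers so the sketch runs; real entries are generated code.
int normal_entry() { return 0; }
int safepoint_entry() { return 1; }

// Bytes of entry pointers needed for a table with `states` tos states.
constexpr std::size_t table_bytes(std::size_t states) {
    return states * kBytecodes * sizeof(Entry);
}

// Point every (state, bytecode) slot at the same handler.
void fill(DispatchTable& t, Entry e) {
    assert(e != nullptr);
    for (StateTable& per_state : t) per_state.fill(e);
}

// Model of the safepoint switch: overwrite every per-state table in the
// active table and report how many bytes of entries were copied.
std::size_t install(DispatchTable& active, const DispatchTable& source) {
    active = source;
    return table_bytes(kTosStates);
}
```

Under these assumptions and with 8-byte pointers, table_bytes(10) is 
20480 bytes per table against 2048 for a single-state table, and each 
switch into or out of a safepoint copies the full amount.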

If we remove this optimization, we will:
   - decrease memory usage in the interpreter,
   - eliminate wasteful memory transactions during safepoints,
   - decrease code complexity (a lot).

Please let me know what you think.
Thanks,
Max