Discussion: 8172978: Remove Interpreter TOS optimization

Christian Thalinger cthalinger at twitter.com
Wed Feb 15 23:18:32 UTC 2017


Yes, that’s a good idea.  And with AOT it should be even less of a problem.

> On Feb 15, 2017, at 12:18 PM, Max Ockner <max.ockner at oracle.com> wrote:
> 
> Hello all,
> 
> We have filed a bug to remove the interpreter stack caching optimization for jdk10.  Ideally we can make this change *early* during the jdk10 development cycle. See below for justification:
> 
> Bug: https://bugs.openjdk.java.net/browse/JDK-8172978
> 
> Stack caching has been around for a long time and is intended to replace some of the load/store (pop/push) operations with corresponding register operations. The need for this optimization arose before hardware caches could adequately lessen the burden of memory access. We have reevaluated the JVM stack caching optimization and found that it has a high memory footprint and is costly to maintain, but does not provide significant measurable or theoretical benefit on modern hardware.
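As a rough sketch of the idea being removed (not HotSpot code; the names and the `std::vector` stack are illustrative assumptions), compare an `iadd` in a plain stack interpreter with one that caches the top of stack in a local variable the compiler can keep in a register:

```cpp
#include <cstdint>
#include <vector>

// Plain version: both operands live on the in-memory operand stack.
int64_t iadd_plain(std::vector<int64_t>& stack) {
    int64_t b = stack.back(); stack.pop_back();   // memory load
    int64_t a = stack.back(); stack.pop_back();   // memory load
    stack.push_back(a + b);                       // memory store
    return stack.back();
}

// TOS-cached version: the top of stack travels in `tos`, so the add
// needs one memory load instead of two loads and a store.
int64_t iadd_tos(std::vector<int64_t>& stack, int64_t& tos) {
    int64_t a = stack.back(); stack.pop_back();   // single memory load
    tos = a + tos;                                // result stays in "register"
    return tos;
}
```

The saved load and store per bytecode is exactly the traffic that modern caches already absorb, which is why the benefit has shrunk.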
> 
> Minimal Theoretical Benefit.
> Because modern hardware does not slap us with the same cost for accessing memory as it once did, the benefit of replacing memory access with register access is far less dramatic now than it once was. Additionally, the interpreter runs for a relatively short time before relevant code sections are compiled. When the VM starts running compiled code instead of interpreted code, performance should begin to move asymptotically towards that of compiled code, diluting any performance penalties from the interpreter to small performance variations.
> 
> No Measurable Benefit.
> Please see the results files attached in the bug page.  This change was prototyped for x86 and SPARC, and interpreter performance was measured with SPECjvm98 (run with -Xint).  No significant decrease in performance was observed.
> 
> Memory footprint and code complexity.
> Stack caching in the JVM is implemented by switching the instruction look-up table depending on the tos (top-of-stack) state. At any moment there is an active table consisting of one dispatch table for each of the 10 tos states.  When we enter a safepoint, we copy all 10 safepoint dispatch tables into the active table.  The additional table-switching code makes this copy less efficient and makes any work in the interpreter harder to debug.
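A minimal sketch of the table layout and the safepoint copy described above (loosely modeled on the interpreter's dispatch tables; the exact names, the stub entry points, and the flat arrays here are illustrative assumptions, not HotSpot's actual declarations):

```cpp
#include <cstring>

typedef void (*entry_point)();           // one interpreter entry per bytecode

const int number_of_states = 10;         // one dispatch table per tos state
const int table_length     = 256;        // one entry per bytecode

entry_point active_table[number_of_states][table_length];
entry_point normal_table[number_of_states][table_length];
entry_point safept_table[number_of_states][table_length];

// Placeholder entry points, purely for illustration.
void normal_stub() {}
void safept_stub() {}

// Entering a safepoint: overwrite all 10 active dispatch tables with
// the safepoint versions -- the wasteful memory traffic the mail refers to.
void notice_safepoints() {
    std::memcpy(active_table, safept_table, sizeof(active_table));
}

// Leaving the safepoint: restore the normal tables.
void ignore_safepoints() {
    std::memcpy(active_table, normal_table, sizeof(active_table));
}
```

With a single tos state, each copy would touch one table instead of ten, and the dispatch logic would no longer need to select a table by tos state at every bytecode.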
> 
> If we remove this optimization, we will:
>  - decrease memory usage in the interpreter,
>  - eliminate wasteful memory transactions during safepoints,
>  - decrease code complexity (a lot).
> 
> Please let me know what you think.
> Thanks,
> Max
> 

More information about the hotspot-dev mailing list