Discussion: 8172978: Remove Interpreter TOS optimization
Daniel D. Daugherty
daniel.daugherty at oracle.com
Thu Feb 16 15:40:49 UTC 2017
Hi Max,
Added a note to your bug. Interesting idea, but I think your data is
a bit incomplete at the moment.
Dan
On 2/15/17 3:18 PM, Max Ockner wrote:
> Hello all,
>
> We have filed a bug to remove the interpreter stack caching
> optimization for jdk10. Ideally we can make this change *early*
> during the jdk10 development cycle. See below for justification:
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8172978
>
> Stack caching has been around for a long time and is intended to
> replace some of the load/store (pop/push) operations with
> corresponding register operations. The need for this optimization
> arose before hardware caching could adequately lessen the burden of memory
> access. We have reevaluated the JVM stack caching optimization and
> have found that it has a high memory footprint and is very costly to
> maintain, but does not provide significant measurable or theoretical
> benefit for us when used with modern hardware.
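> 
> To make the mechanism concrete, here is a rough sketch of what the
> optimization buys at the bytecode level (simplified C++ pseudocode for
> an iadd-style bytecode, not the actual HotSpot template code):
> 
>     // Without TOS caching: both operands and the result go through
>     // the in-memory expression stack.
>     void do_iadd_plain(int*& sp) {
>       int b = *sp++;      // pop right operand
>       int a = *sp++;      // pop left operand
>       *--sp = a + b;      // push result back to memory
>     }
> 
>     // With TOS caching: the top-of-stack value is kept in a register
>     // ('tos' here), so one pop and the push are elided.
>     void do_iadd_cached(int*& sp, int& tos /* lives in a register */) {
>       int a = *sp++;      // only the second operand comes from memory
>       tos = a + tos;      // result stays cached for the next bytecode
>     }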
>
> Minimal Theoretical Benefit.
> Because modern hardware does not slap us with the same cost for
> accessing memory as it once did, the benefit of replacing memory
> access with register access is far less dramatic now than it once was.
> Additionally, the interpreter runs for a relatively short time before
> relevant code sections are compiled. When the VM starts running
> compiled code instead of interpreted code, performance should begin to
> move asymptotically towards that of compiled code, diluting any
> performance penalty from the interpreter to a small overall
> variation.
>
> No Measurable Benefit.
> Please see the results files attached to the bug. This change
> was adapted for x86 and SPARC, and interpreter performance was
> measured with SPECjvm98 (run with -Xint). No significant decrease in
> performance was observed.
>
> Memory footprint and code complexity.
> Stack caching in the JVM is implemented by switching the instruction
> look-up table depending on the tos (top-of-stack) state. At any moment
> there is an active table consisting of one dispatch table for each
> of the 10 tos states. When we enter a safepoint, we copy all 10
> safepoint dispatch tables into the active table. The additional entry
> code makes this copy less efficient and makes any work in the
> interpreter harder to debug.
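> 
> For reference, a simplified sketch of that machinery (names modeled on
> the real TemplateInterpreter code, but reduced to the bare idea; the
> actual tables hold generated entry points per bytecode):
> 
>     typedef void (*EntryPoint)();          // code stub for one bytecode
>     const int number_of_tos_states = 10;   // btos, ztos, ..., vtos
>     const int number_of_bytecodes  = 256;
> 
>     // One dispatch row per tos state: 10 x 256 entry points.
>     struct DispatchTable {
>       EntryPoint _entry[number_of_tos_states][number_of_bytecodes];
>     };
> 
>     DispatchTable _active_table;  // indexed by the running interpreter
>     DispatchTable _normal_table;  // regular entries
>     DispatchTable _safept_table;  // entries that also poll for a safepoint
> 
>     // Entering a safepoint copies all 10 rows over the active table;
>     // leaving it copies the normal entries back.
>     void notice_safepoints() { _active_table = _safept_table; }
>     void ignore_safepoints() { _active_table = _normal_table; }
> 
> Without the tos-state dimension each table collapses to a single row,
> so the safepoint copy above shrinks by roughly a factor of 10.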
>
> If we remove this optimization, we will:
> - decrease memory usage in the interpreter,
> - eliminate wasteful memory transactions during safepoints,
> - decrease code complexity (a lot).
>
> Please let me know what you think.
> Thanks,
> Max
>