Discussion: 8172978: Remove Interpreter TOS optimization
Claes Redestad
claes.redestad at oracle.com
Sat Feb 18 10:46:59 UTC 2017
Hi,
I've seen that Max has run plenty of tests on our internal performance
infrastructure, and everything I've seen there corroborates the
idea that this removal is OK from a performance point of view: the
footprint improvements are small but significant, and any negative
impact on throughput benchmarks is at noise levels, even
with -Xint. (It appears many benchmarks time out with this setting
both before and after, though; Max, let's discuss offline how to
deal with that. :-))
I expect this will be tested more thoroughly once adapted to all
platforms (which I assume is the intent?), but I see no concern from
a performance testing point of view: do it!
Thanks!
/Claes
On 2017-02-16 16:40, Daniel D. Daugherty wrote:
> Hi Max,
>
> Added a note to your bug. Interesting idea, but I think your data is
> a bit incomplete at the moment.
>
> Dan
>
>
> On 2/15/17 3:18 PM, Max Ockner wrote:
>> Hello all,
>>
>> We have filed a bug to remove the interpreter stack caching
>> optimization for jdk10. Ideally we can make this change *early*
>> during the jdk10 development cycle. See below for justification:
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8172978
>>
>> Stack caching has been around for a long time and is intended to
>> replace some of the load/store (pop/push) operations with
>> corresponding register operations. The need for this optimization
>> arose before caching could adequately lessen the burden of memory
>> access. We have reevaluated the JVM stack caching optimization and
>> have found that it has a high memory footprint and is very costly to
>> maintain, but does not provide significant measurable or theoretical
>> benefit for us when used with modern hardware.
>>
>> Minimal Theoretical Benefit.
>> Because memory access on modern hardware is not nearly as costly,
>> relative to register access, as it once was, the benefit of replacing
>> memory accesses with register accesses is far less dramatic now than
>> it once was.
>> Additionally, the interpreter runs for a relatively short time before
>> relevant code sections are compiled. When the VM starts running
>> compiled code instead of interpreted code, performance should begin to
>> move asymptotically towards that of compiled code, diluting any
>> performance penalties from the interpreter to small performance
>> variations.
>>
>> No Measurable Benefit.
>> Please see the results files attached to the bug. This change
>> was adapted for x86 and SPARC, and interpreter performance was
>> measured with SPECjvm98 (run with -Xint). No significant decrease in
>> performance was observed.
>>
>> Memory footprint and code complexity.
>> Stack caching in the JVM is implemented by switching the instruction
>> look-up table depending on the tos (top-of-stack) state. At any moment
>> there is an active table consisting of one dispatch table for each
>> of the 10 tos states. When we enter a safepoint, we copy all 10
>> safepoint dispatch tables into the active table. The additional entry
>> code makes this copy less efficient and makes any work in the
>> interpreter harder to debug.
>>
>> If we remove this optimization, we will:
>> - decrease memory usage in the interpreter,
>> - eliminate wasteful memory transactions during safepoints,
>> - decrease code complexity (a lot).
>>
>> Please let me know what you think.
>> Thanks,
>> Max
>>
>
More information about the hotspot-dev mailing list