Discussion: 8172978: Remove Interpreter TOS optimization
Daniel D. Daugherty
daniel.daugherty at oracle.com
Sat Feb 18 15:50:55 UTC 2017
If Claes is happy with the perf testing, then I'm happy. :-)
Dan
On 2/18/17 3:46 AM, Claes Redestad wrote:
> Hi,
>
> I've seen that Max has run plenty of tests on our internal performance
> infrastructure, and everything I've seen there corroborates the
> idea that this removal is OK from a performance point of view: the
> footprint improvements are small but significant, and any negative
> impact on throughput benchmarks is at noise levels even
> with -Xint. (It appears many benchmarks time out with this setting
> both before and after, though; Max, let's discuss offline how to
> deal with that :-))
>
> I expect this will be tested more thoroughly once adapted to all
> platforms (which I assume is the intent?), but see no concern from
> a performance testing point of view: Do it!
>
> Thanks!
>
> /Claes
>
> On 2017-02-16 16:40, Daniel D. Daugherty wrote:
>> Hi Max,
>>
>> Added a note to your bug. Interesting idea, but I think your data is
>> a bit incomplete at the moment.
>>
>> Dan
>>
>>
>> On 2/15/17 3:18 PM, Max Ockner wrote:
>>> Hello all,
>>>
>>> We have filed a bug to remove the interpreter stack caching
>>> optimization for jdk10. Ideally we can make this change *early*
>>> during the jdk10 development cycle. See below for justification:
>>>
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8172978
>>>
>>> Stack caching has been around for a long time and is intended to
>>> replace some of the load/store (pop/push) operations with
>>> corresponding register operations. The need for this optimization
>>> arose before hardware caches could adequately lessen the cost of
>>> memory access. We have reevaluated the JVM stack caching optimization
>>> and found that it has a high memory footprint and is very costly to
>>> maintain, but provides no significant measurable or theoretical
>>> benefit on modern hardware.
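To make the pop/push-vs-register trade-off concrete, here is a small illustrative sketch (the toy stack machine and function names are invented for this example, not HotSpot code): the same "a + b" computed first with every operand going through the in-memory operand stack, then with the top of stack cached in a local that the compiler can keep in a register.

```cpp
#include <cassert>

// Plain stack machine: each bytecode pushes/pops through memory.
static int add_no_cache(int a, int b) {
  int stack[4]; int sp = 0;
  stack[sp++] = a;       // iload_0: push a
  stack[sp++] = b;       // iload_1: push b
  int r = stack[--sp];   // iadd: pop b...
  r += stack[--sp];      //       ...pop a, add
  stack[sp++] = r;       //       push result
  return stack[--sp];    // ireturn: pop result
}

// TOS-cached variant: the top of stack lives in 'tos', saving a
// memory push/pop pair per operation.
static int add_tos_cached(int a, int b) {
  int stack[4]; int sp = 0;
  int tos = a;           // iload_0: value lands directly in the cache
  stack[sp++] = tos;     // iload_1: spill the old TOS...
  tos = b;               //          ...new value stays in the register
  tos += stack[--sp];    // iadd: one memory pop instead of two
  return tos;            // ireturn: result already in the register
}
```

The saved memory traffic is exactly what modern cache hierarchies have made cheap, which is the crux of the argument above.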
>>>
>>> Minimal Theoretical Benefit.
>>> Because modern hardware does not slap us with the same cost for
>>> accessing memory as it once did, the benefit of replacing memory
>>> access with register access is far less dramatic now than it once was.
>>> Additionally, the interpreter runs for a relatively short time before
>>> relevant code sections are compiled. When the VM starts running
>>> compiled code instead of interpreted code, overall performance
>>> approaches that of compiled code asymptotically, diluting any
>>> interpreter penalties into small performance variations.
>>>
>>> No Measurable Benefit.
>>> Please see the results files attached in the bug page. This change
>>> was adapted for x86 and SPARC, and interpreter performance was
>>> measured with SPECjvm98 (run with -Xint). No significant decrease in
>>> performance was observed.
>>>
>>> Memory footprint and code complexity.
>>> Stack caching in the JVM is implemented by switching the instruction
>>> look-up table depending on the tos (top-of-stack) state. At any moment
>>> there is an active table consisting of one dispatch table for each
>>> of the 10 tos states. When we enter a safepoint, we copy all 10
>>> safepoint dispatch tables into the active table. The additional entry
>>> code makes this copy less efficient and makes any work in the
>>> interpreter harder to debug.
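The table-switching described above can be sketched as follows. This is a simplified illustration, not HotSpot's actual TemplateInterpreter code: the table shapes, handler functions, and `fill_tables` helper are invented, though HotSpot does have functions named `notice_safepoints`/`ignore_safepoints` that swap dispatch tables.

```cpp
#include <cstring>

constexpr int kTosStates = 10;  // e.g. btos, ctos, stos, itos, ltos, ...
constexpr int kBytecodes = 256; // one handler entry per bytecode

using EntryPoint = void (*)();

static void normal_handler()    {}  // placeholder bytecode handler
static void safepoint_handler() {}  // placeholder safepoint variant

// The active table the dispatch loop indexes, plus the two sources.
static EntryPoint active_table[kTosStates][kBytecodes];
static EntryPoint normal_table[kTosStates][kBytecodes];
static EntryPoint safepoint_table[kTosStates][kBytecodes];

static void fill_tables() {
  for (int t = 0; t < kTosStates; ++t)
    for (int b = 0; b < kBytecodes; ++b) {
      normal_table[t][b]    = normal_handler;
      safepoint_table[t][b] = safepoint_handler;
    }
}

// Entering a safepoint: bulk-copy all 10 per-TOS dispatch tables into
// the active table -- the "wasteful memory transactions" the proposal
// wants to eliminate. With a single TOS state, the tables and this copy
// both shrink by roughly 10x.
static void notice_safepoints() {
  std::memcpy(active_table, safepoint_table, sizeof(active_table));
}

// Leaving the safepoint: restore the normal dispatch tables.
static void ignore_safepoints() {
  std::memcpy(active_table, normal_table, sizeof(active_table));
}
```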
>>>
>>> If we remove this optimization, we will:
>>> - decrease memory usage in the interpreter,
>>> - eliminate wasteful memory transactions during safepoints,
>>> - decrease code complexity (a lot).
>>>
>>> Please let me know what you think.
>>> Thanks,
>>> Max
>>>
>>
More information about the hotspot-dev mailing list