Discussion: 8172978: Remove Interpreter TOS optimization
coleen.phillimore at oracle.com
Sun Feb 19 22:11:11 UTC 2017
On 2/18/17 11:14 AM, coleen.phillimore at oracle.com wrote:
> When Max gets back from the long weekend, he'll post the platforms in
> your bug.
>
> It's amazing that for -Xint there's no significant difference. I've
> seen -Xint performance of 15% slower cause a 2% slowdown with server
> but that was before tiered compilation.
I should clarify this. I've seen this slowdown for *different*
interpreter optimizations, which *can* affect server performance. I was
measuring SPECjvm98 on Linux x64. If there's no significant difference
for this TOS optimization, there is no chance of a degradation in
overall performance.
Coleen
>
> The reason for this query was to see what developers for the other
> platform ports think, since this change would affect all of the
> platforms.
>
> Thanks,
> Coleen
>
> On 2/18/17 10:50 AM, Daniel D. Daugherty wrote:
>> If Claes is happy with the perf testing, then I'm happy. :-)
>>
>> Dan
>>
>>
>> On 2/18/17 3:46 AM, Claes Redestad wrote:
>>> Hi,
>>>
>>> I've seen that Max has run plenty of tests on our internal performance
>>> infrastructure, and everything I've seen there corroborates the
>>> idea that this removal is OK from a performance point of view: the
>>> footprint improvements are small but significant, and any negative
>>> performance impact on throughput benchmarks is at noise levels, even
>>> with -Xint. (It appears many benchmarks time out with this setting
>>> both before and after, though; Max, let's discuss offline how to
>>> deal with that :-))
>>>
>>> I expect this will be tested more thoroughly once adapted to all
>>> platforms (which I assume is the intent?), but see no concern from
>>> a performance testing point of view: Do it!
>>>
>>> Thanks!
>>>
>>> /Claes
>>>
>>> On 2017-02-16 16:40, Daniel D. Daugherty wrote:
>>>> Hi Max,
>>>>
>>>> Added a note to your bug. Interesting idea, but I think your data is
>>>> a bit incomplete at the moment.
>>>>
>>>> Dan
>>>>
>>>>
>>>> On 2/15/17 3:18 PM, Max Ockner wrote:
>>>>> Hello all,
>>>>>
>>>>> We have filed a bug to remove the interpreter stack caching
>>>>> optimization for jdk10. Ideally we can make this change *early*
>>>>> during the jdk10 development cycle. See below for justification:
>>>>>
>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8172978
>>>>>
>>>>> Stack caching has been around for a long time and is intended to
>>>>> replace some of the load/store (pop/push) operations with
>>>>> corresponding register operations. The need for this optimization
>>>>> arose before caching could adequately lessen the burden of memory
>>>>> access. We have reevaluated the JVM stack caching optimization and
>>>>> have found that it has a high memory footprint and is very costly to
>>>>> maintain, but does not provide significant measurable or theoretical
>>>>> benefit for us when used with modern hardware.
>>>>>
>>>>> Minimal Theoretical Benefit.
>>>>> Because modern hardware does not penalize memory access as heavily
>>>>> as it once did, the benefit of replacing memory accesses with
>>>>> register accesses is far less dramatic now than it once was.
>>>>> Additionally, the interpreter runs for a relatively short time
>>>>> before the relevant code sections are compiled. Once the VM starts
>>>>> running compiled code instead of interpreted code, performance
>>>>> should move asymptotically towards that of compiled code, diluting
>>>>> any performance penalties from the interpreter into small
>>>>> performance variations.
>>>>>
>>>>> No Measurable Benefit.
>>>>> Please see the results files attached in the bug page. This change
>>>>> was adapted for x86 and sparc, and interpreter performance was
>>>>> measured with SPECjvm98 (run with -Xint). No significant decrease in
>>>>> performance was observed.
>>>>>
>>>>> Memory footprint and code complexity.
>>>>> Stack caching in the JVM is implemented by switching the instruction
>>>>> look-up table depending on the tos (top-of-stack) state. At any
>>>>> moment there is an active table consisting of one dispatch table
>>>>> for each of the 10 tos states. When we enter a safepoint, we copy
>>>>> all 10 safepoint dispatch tables into the active table. The
>>>>> additional entry code makes this copy less efficient and makes any
>>>>> work in the interpreter harder to debug.
>>>>>
>>>>> If we remove this optimization, we will:
>>>>> - decrease memory usage in the interpreter,
>>>>> - eliminate wasteful memory transactions during safepoints,
>>>>> - decrease code complexity (a lot).
>>>>>
>>>>> Please let me know what you think.
>>>>> Thanks,
>>>>> Max
>>>>>
>>>>
>>
>
More information about the hotspot-dev mailing list