status of VM long loop optimizations - call for action

Fri Dec 10 22:33:30 UTC 2021

Hi Ty,
there is a simple trick to make sure you get the best performance.

When you create the VarHandle, call withInvokeExactBehavior() [1] on it.
The returned VarHandle throws a WrongMethodTypeException at runtime instead
of trying to convert the arguments.
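
As a rough illustration of the behavior described above, here is a minimal sketch. It assumes an array-element VarHandle over long[] rather than a memory segment handle, so that it runs on JDK 16+ without the incubator module; the class and method names are made up for the example. The int-versus-long mismatch mirrors the "0" versus "0L" case discussed in this thread.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.lang.invoke.WrongMethodTypeException;

public class ExactVarHandleDemo {
    // Plain handle: the access mode type of set is (long[], int, long)void.
    static final VarHandle LENIENT = MethodHandles.arrayElementVarHandle(long[].class);
    // Same handle, but call sites must match the access mode type exactly.
    static final VarHandle EXACT = LENIENT.withInvokeExactBehavior();

    /** Writes an int literal through the lenient handle; the int is widened to long. */
    static long lenientSetWithInt() {
        long[] data = new long[1];
        LENIENT.set(data, 0, 5);      // call site is (long[], int, int)void: converted
        return data[0];
    }

    /** Returns true if the exact handle rejects an int where a long is expected. */
    static boolean exactRejectsInt() {
        long[] data = new long[1];
        try {
            EXACT.set(data, 0, 5);    // same inexact call site: no conversion attempted
            return false;
        } catch (WrongMethodTypeException e) {
            return true;
        }
    }

    /** Writes a long literal through the exact handle; the call site matches exactly. */
    static long exactSetWithLong() {
        long[] data = new long[1];
        EXACT.set(data, 0, 5L);       // call site is (long[], int, long)void: exact match
        return data[0];
    }

    public static void main(String[] args) {
        System.out.println(lenientSetWithInt());
        System.out.println(exactRejectsInt());
        System.out.println(exactSetWithLong());
    }
}
```

The point of the exact handle is that an accidental inexact call site fails loudly on its first execution, instead of silently going through the argument-converting path.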

Rémi

[1] https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/lang/invoke/VarHandle.html#withInvokeExactBehavior()

----- Original Message -----
> From: "Ty Young" <youngty1997 at gmail.com>
> To: "Maurizio Cimadamore" <maurizio.cimadamore at oracle.com>, "panama-dev at openjdk.java.net" <panama-dev at openjdk.java.net>
> Sent: Friday, December 10, 2021 11:18:45 PM
> Subject: Re: status of VM long loop optimizations - call for action

> Yeah, I forgot that. Apologies.
> 
> 
> On 12/10/21 4:06 PM, Maurizio Cimadamore wrote:
>> Hi,
>> I don't think the 1ns difference is real - if you look at the error in
>> the second run, it is higher than that, so the difference is in the noise.
>>
>> And, since there's no loop, I don't think this specific kind of
>> benchmark should be affected in any way by the VM improvements. What
>> the VM can help with is to remove bound checks when you keep accessing
>> a segment in a loop, as C2 is now able to correctly apply an
>> optimization called "bound check elimination" or BCE. This
>> optimization is routinely applied on Java array access, but it used to
>> fail for memory segments because the bound of a memory segment is
>> stored in a long variable, not an int.
>>
>> That said, note that you are passing inexact arguments to the var
>> handle (e.g. you are passing an int offset instead of a long one; try
>> to use "0L" instead of "0").
>>
>> Maurizio
>>
>>
>> On 10/12/2021 21:34, Ty Young wrote:
>>> A simple write benchmark I had already made for specialized
>>> VarHandles (AKA insertCoordinates) seems to be consistently about 1ns
>>> faster, so I guess these changes helped a bit?
>>>
>>>
>>> Before:
>>>
>>>
>>> Benchmark                                    Mode  Cnt   Score   Error  Units
>>> VarHandleBenchmark.genericHandleBenchmark    avgt    5  21.155 ± 0.145  ns/op
>>> VarHandleBenchmark.specFinalHandleBenchmark  avgt    5   0.678 ± 0.201  ns/op
>>> VarHandleBenchmark.specHandleBenchmark       avgt    5  17.323 ± 1.324  ns/op
>>>
>>>
>>> After:
>>>
>>>
>>> Benchmark                                    Mode  Cnt   Score   Error  Units
>>> VarHandleBenchmark.genericHandleBenchmark    avgt    5  20.304 ± 1.466  ns/op
>>> VarHandleBenchmark.specFinalHandleBenchmark  avgt    5   0.652 ± 0.156  ns/op
>>> VarHandleBenchmark.specHandleBenchmark       avgt    5  17.266 ± 1.712  ns/op
>>>
>>>
>>> Benchmark:
>>>
>>>
>>>     public static final MemorySegment SEGMENT =
>>>         MemorySegment.allocateNative(ValueLayout.JAVA_INT, ResourceScope.newSharedScope());
>>>
>>>     public static final VarHandle GENERIC_HANDLE =
>>>         MemoryHandles.varHandle(ValueLayout.JAVA_INT);
>>>
>>>     public static VarHandle SPEC_HANDLE =
>>>         MemoryHandles.insertCoordinates(GENERIC_HANDLE, 0, SEGMENT, 0);
>>>
>>>     public static final VarHandle SPEC_HANDLE_FINAL =
>>>         MemoryHandles.insertCoordinates(GENERIC_HANDLE, 0, SEGMENT, 0);
>>>
>>>     @Benchmark
>>>     @BenchmarkMode(Mode.AverageTime)
>>>     @OutputTimeUnit(TimeUnit.NANOSECONDS)
>>>     public void genericHandleBenchmark()
>>>     {
>>>         GENERIC_HANDLE.set(SEGMENT, 0, 5);
>>>     }
>>>
>>>     @Benchmark
>>>     @BenchmarkMode(Mode.AverageTime)
>>>     @OutputTimeUnit(TimeUnit.NANOSECONDS)
>>>     public void specHandleBenchmark()
>>>     {
>>>         SPEC_HANDLE.set(5);
>>>     }
>>>
>>>     @Benchmark
>>>     @BenchmarkMode(Mode.AverageTime)
>>>     @OutputTimeUnit(TimeUnit.NANOSECONDS)
>>>     public void specFinalHandleBenchmark()
>>>     {
>>>         SPEC_HANDLE_FINAL.set(5);
>>>     }
>>>
>>>
>>> Sort of off-topic but... I don't remember anyone saying previously
>>> that insertCoordinates would make that big of a difference (or any at
>>> all!), so it's surprising to me. I was expecting a performance
>>> decrease due to the handle no longer being static-final. Could javac
>>> maybe optimize this, so that wherever a call such as:
>>>
>>>
>>> GENERIC_HANDLE.set(SEGMENT, 0, 5);
>>>
>>>
>>> appears, an optimized VarHandle equivalent to SPEC_HANDLE is created
>>> at compile time and inserted there instead?
>>>
>>>
>>> On 12/10/21 4:55 AM, Maurizio Cimadamore wrote:
>>>> (resending since mailing lists were down yesterday - I apologize if
>>>> this results in duplicates).
>>>>
>>>> Hi,
>>>> a few days ago some VM enhancements were integrated [1, 2], so it is
>>>> time to take another look at where we are.
>>>>
>>>> I put together a branch which removes all workarounds (both for long
>>>> loops and for alignment checks):
>>>>
>>>> https://github.com/mcimadamore/jdk/tree/long_loop_workarounds_removal
>>>>
>>>> I also ran memory access benchmarks before/after, to see what the
>>>> difference is like - here's a visual report:
>>>>
>>>> https://jmh.morethan.io/?gists=dfa7075db33f7e6a2690ac80a64aa252,7f894f48460a6a0c9891cbe3158b43a7
>>>>
>>>>
>>>> Overall, I think the numbers are solid. The branch w/o workarounds
>>>> keeps up with mainline in basically all cases but one (UnrolledAccess
>>>> - this code pattern needs more work in the VM, but Roland Westrelin
>>>> has identified a possible fix for it). In some cases (parallel
>>>> tests) we see quite a big jump forward.
>>>>
>>>> I think it's hard to say how these results will translate to the real
>>>> world - my gut feeling is that the simpler bound checking logic will
>>>> almost invariably result in performance improvements with more
>>>> complex code patterns, despite what synthetic benchmarks might say
>>>> (the current logic in mainline is fragile as it has to guard against
>>>> integer overflow, which in turn sometimes kills BCE optimizations).
>>>>
>>>> So I'd be inclined to integrate these changes in 18.
>>>>
>>>> If you have a project that works against the Java 18 API, it would be
>>>> very helpful for us if you could try it on the above branch and
>>>> report back. This will help us make a more informed decision.
>>>>
>>>> Cheers
>>>> Maurizio
>>>>
>>>> [1] - https://bugs.openjdk.java.net/browse/JDK-8276116
>>>> [2] - https://bugs.openjdk.java.net/browse/JDK-8277850
>>>>
>>>>