status of VM long loop optimizations - call for action
Ty Young
youngty1997 at gmail.com
Fri Dec 10 21:34:59 UTC 2021
A simple write benchmark I had already made for specialized
VarHandles (i.e. those created with insertCoordinates) seems to come out
slightly faster across the board, by up to about 1 ns, so I guess these
changes helped a bit?
Before:
Benchmark                                    Mode  Cnt   Score   Error  Units
VarHandleBenchmark.genericHandleBenchmark    avgt    5  21.155 ± 0.145  ns/op
VarHandleBenchmark.specFinalHandleBenchmark  avgt    5   0.678 ± 0.201  ns/op
VarHandleBenchmark.specHandleBenchmark       avgt    5  17.323 ± 1.324  ns/op
After:
Benchmark                                    Mode  Cnt   Score   Error  Units
VarHandleBenchmark.genericHandleBenchmark    avgt    5  20.304 ± 1.466  ns/op
VarHandleBenchmark.specFinalHandleBenchmark  avgt    5   0.652 ± 0.156  ns/op
VarHandleBenchmark.specHandleBenchmark       avgt    5  17.266 ± 1.712  ns/op
Benchmark:
    public static final MemorySegment SEGMENT =
        MemorySegment.allocateNative(ValueLayout.JAVA_INT,
            ResourceScope.newSharedScope());

    public static final VarHandle GENERIC_HANDLE =
        MemoryHandles.varHandle(ValueLayout.JAVA_INT);

    public static VarHandle SPEC_HANDLE =
        MemoryHandles.insertCoordinates(GENERIC_HANDLE, 0, SEGMENT, 0);

    public static final VarHandle SPEC_HANDLE_FINAL =
        MemoryHandles.insertCoordinates(GENERIC_HANDLE, 0, SEGMENT, 0);

    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    public void genericHandleBenchmark()
    {
        GENERIC_HANDLE.set(SEGMENT, 0, 5);
    }

    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    public void specHandleBenchmark()
    {
        SPEC_HANDLE.set(5);
    }

    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    public void specFinalHandleBenchmark()
    {
        SPEC_HANDLE_FINAL.set(5);
    }
Sort of off-topic, but... I don't remember anyone previously saying that
insertCoordinates would make that big of a difference (or any at all!), so
it's surprising to me. I was expecting a performance decrease due to the
handle no longer being static-final. Could javac maybe optimize this, so
that wherever a call like:
    GENERIC_HANDLE.set(SEGMENT, 0, 5);
appears, an optimized VarHandle equivalent to SPEC_HANDLE is created at
compile time and used there instead?
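
To illustrate what "equivalent to SPEC_HANDLE" means here, below is a
minimal sketch of the same coordinate-binding idea using only the stable
java.lang.invoke API rather than the incubating jdk.incubator.foreign one.
It is an analogy, not the benchmarked code: a heap int[] stands in for the
native segment, the bound setter is a MethodHandle rather than a VarHandle,
and the names BoundSetterSketch, STORAGE, GENERIC and BOUND_SET are made up
for the example. It says nothing about performance, only that the bound and
unbound calls write the same location.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

class BoundSetterSketch {
    // Hypothetical one-element heap array standing in for the native segment.
    static final int[] STORAGE = new int[1];

    // Generic handle: its coordinates are (int[] array, int index).
    static final VarHandle GENERIC =
        MethodHandles.arrayElementVarHandle(int[].class);

    // Setter with both coordinates pre-bound; the remaining type is (int)void.
    static final MethodHandle BOUND_SET = MethodHandles.insertArguments(
        GENERIC.toMethodHandle(VarHandle.AccessMode.SET), 0, STORAGE, 0);

    // Unbound form: every call passes the array and the index explicitly.
    static int writeGeneric(int value) {
        GENERIC.set(STORAGE, 0, value);
        return STORAGE[0];
    }

    // Bound form: only the value to store is passed.
    static int writeBound(int value) {
        try {
            BOUND_SET.invokeExact(value);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
        return STORAGE[0];
    }

    public static void main(String[] args) {
        System.out.println(writeGeneric(5));
        System.out.println(writeBound(7));
    }
}
```

Both methods end up storing into STORAGE[0]; the only difference is whether
the coordinates are supplied at each call or fixed once when the handle is
built, which is the same distinction as GENERIC_HANDLE vs. SPEC_HANDLE above.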
On 12/10/21 4:55 AM, Maurizio Cimadamore wrote:
> (resending since mailing lists were down yesterday - I apologize if
> this results in duplicates).
>
> Hi,
> a few days ago some VM enhancements were integrated [1, 2], so it is
> time to take a look again at where we are.
>
> I put together a branch which removes all workarounds (both for long
> loops and for alignment checks):
>
> https://github.com/mcimadamore/jdk/tree/long_loop_workarounds_removal
>
> I also ran memory access benchmarks before/after, to see what the
> difference is like - here's a visual report:
>
> https://jmh.morethan.io/?gists=dfa7075db33f7e6a2690ac80a64aa252,7f894f48460a6a0c9891cbe3158b43a7
>
>
> Overall, I think the numbers are solid. The branch w/o workarounds
> keeps up with mainline in basically all cases but one (UnrolledAccess -
> this code pattern needs more work in the VM, but Roland Westrelin has
> identified a possible fix for it). In some cases (parallel tests) we
> see quite a big jump forward.
>
> I think it's hard to say how these results will translate to the real
> world - my gut feeling is that the simpler bound checking logic will
> almost invariably result in performance improvements with more complex
> code patterns, despite what synthetic benchmarks might say (the current
> logic in mainline is fragile as it has to guard against integer
> overflow, which in turn sometimes kills BCE optimizations).
>
> So I'd be inclined to integrate these changes in 18.
>
> If you have a project that works against the Java 18 API, it would be
> very helpful for us if you could try it on the above branch and report
> back. This will help us make a more informed decision.
>
> Cheers
> Maurizio
>
> [1] - https://bugs.openjdk.java.net/browse/JDK-8276116
> [2] - https://bugs.openjdk.java.net/browse/JDK-8277850
>
>
>