Apparent regression in memory segment access in JDK 23?
Chris Hegarty
chegar999 at gmail.com
Fri Sep 27 16:17:41 UTC 2024
Thanks for your quick reply, Maurizio.
Lemme try out those and I'll report back.
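
For anyone following along, here is a minimal sketch of the kind of access under discussion: reading long values from a memory segment in a loop. It is an illustration only, not the Lucene code. The class and method names are hypothetical, and a confined-arena native allocation stands in for the mapped segment that appears in the traces. It assumes a JDK where the java.lang.foreign API is final (22 or later).

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

// Hypothetical stand-alone example; not taken from Lucene or Elasticsearch.
public class SegmentLongRead {

    // Sums `count` longs read from `segment`, one 8-byte stride at a time.
    // Each get(...) goes through the layout's var handle, which is the call
    // path shown in the inlining traces.
    static long sumLongs(MemorySegment segment, int count) {
        long sum = 0;
        for (int i = 0; i < count; i++) {
            sum += segment.get(ValueLayout.JAVA_LONG_UNALIGNED, (long) i * Long.BYTES);
        }
        return sum;
    }

    public static void main(String[] args) {
        int count = 1024;
        // Assumption: native memory from a confined arena instead of a mapped file.
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment segment = arena.allocate((long) count * Long.BYTES);
            for (int i = 0; i < count; i++) {
                segment.set(ValueLayout.JAVA_LONG_UNALIGNED, (long) i * Long.BYTES, i);
            }
            System.out.println(sumLongs(segment, count));
        }
    }
}
```

Running something like this once with and once without the suggested launcher options, with inlining diagnostics enabled, would be one way to compare the resulting traces, though a toy loop may well not reproduce the problem.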
-Chris.
On 27/09/2024 16:19, Maurizio Cimadamore wrote:
> So,
> some concrete things for you to try out, which might help us diagnose
> this further (we think we have some idea of what's going on).
>
> One thing to try is to disable var handle guards. You can do that by
> passing the following option to the launcher:
>
> ```
> -Djava.lang.invoke.VarHandle.VAR_HANDLE_GUARDS=false
> ```
>
> Another is to force inlining for MethodHandle::asType. This can be done
> with:
>
> ```
> -XX:CompileCommand=inline,java.lang.invoke.MethodHandle::asType
> ```
>
> (and you can repeat that command if you see that other methods get in
> the way).
>
> Hopefully one of these should work.
>
> Cheers
> Maurizio
>
> On 27/09/2024 15:43, Maurizio Cimadamore wrote:
>> Hi Chris,
>> the only thing I can think of is:
>>
>> https://git.openjdk.org/jdk/pull/19251
>>
>> Which I believe you already zeroed in on.
>>
>> We ran all our benchmarks before/after the fix:
>>
>> https://jmh.morethan.io/?sources=https://corsproxy.io/?https://cr.openjdk.org/~mcimadamore/jdk/8331865/loop_over_00_baseline.json,https://corsproxy.io/?https://cr.openjdk.org/~mcimadamore/jdk/8331865/loop_over_01_8331865.json
>>
>> And the results were largely neutral. The patch should not add more
>> checks, but mostly reshuffle the existing checks.
>>
>> But from your inlining traces, it does seem that something fails to
>> inline, which then means bounds check elimination will not be applied.
>>
>> Now, it's hard to say: maybe this code was already 95% near some
>> threshold, and the new code shape (after our fix) pushes it over the
>> fence. Or maybe there's something suboptimal in our fix (but our JMH
>> doesn't seem to indicate that).
>>
>> The var handle caching you mention should affect the number of var
>> handles being created, but not peak performance. E.g. if you see more
>> heap being used, that might be a sign that we're creating multiple
>> copies for the same VH. But creating redundant copies should not
>> prevent inlining... so something else is up here.
>>
>> Can you try against the latest 24 ea build? We made other changes in
>> the area (mainly to improve startup of memory segment var handles). It
>> would be interesting to know if they help bring things back into shape
>> (in which case we might consider backporting the startup fixes).
>>
>> Cheers
>> Maurizio
>>
>>
>> On 27/09/2024 15:21, Chris Hegarty wrote:
>>> Hi,
>>>
>>> I'm trying to track down what appears to be a regression when accessing
>>> long values from a memory segment, when moving from JDK 22 to JDK 23.
>>> Approx 100-150% slower.
>>>
>>> There are some details in this GH issue [1], but not a lot more than
>>> what is in this email.
>>>
>>> I'm still debugging, but git bisect on JDK 23 builds was not all that
>>> helpful. I did see some changes in b25 and further in b26 (to restore
>>> a varhandle cache).
>>>
>>> I'm still trying to put together a basic JMH benchmark, but so far I've
>>> been unable to reproduce this. I'll keep trying.
>>>
>>> Any thoughts or comments would be greatly appreciated.
>>>
>>>
>>> JDK 23:
>>> @ 16 org.apache.lucene.util.packed.DirectReader$DirectPackedReader40::get (34 bytes) inline (hot)
>>> \-> TypeProfile (11846/11846 counts) = org/apache/lucene/util/packed/DirectReader$DirectPackedReader40
>>> @ 14 org.apache.lucene.store.MemorySegmentIndexInput$SingleSegmentImpl::readLong (31 bytes) inline (hot)
>>> \-> TypeProfile (11494/11494 counts) = org/apache/lucene/store/MemorySegmentIndexInput$SingleSegmentImpl
>>> @ 8 jdk.internal.foreign.AbstractMemorySegmentImpl::get (12 bytes) force inline by annotation
>>> \-> TypeProfile (11022/11022 counts) = jdk/internal/foreign/MappedMemorySegmentImpl
>>> @ 1 jdk.internal.foreign.layout.ValueLayouts$AbstractValueLayout::varHandle (43 bytes) force inline by annotation
>>> @ 8 java.lang.invoke.VarHandleGuards::guard_LJ_J (84 bytes) force inline by annotation
>>> @ 3 java.lang.invoke.IndirectVarHandle::checkAccessModeThenIsDirect (8 bytes) force inline by annotation
>>> @ 2 java.lang.invoke.VarHandle::checkAccessModeThenIsDirect (29 bytes) force inline by annotation
>>> @ 59 java.lang.invoke.VarHandle::getMethodHandle (41 bytes) force inline by annotation
>>> @ 71 java.lang.invoke.MethodHandle::asType (32 bytes) failed to inline: already compiled into a big method
>>> @ 75 java.lang.invoke.IndirectVarHandle::asDirect (5 bytes) accessor
>>> @ 80 java.lang.invoke.MethodHandle::invokeBasic(LLJ)J (0 bytes) failed to inline: receiver not constant
>>> JDK 22:
>>> @ 16 org.apache.lucene.util.packed.DirectReader$DirectPackedReader40::get (34 bytes) inline (hot)
>>> \-> TypeProfile (6998/6998 counts) = org/apache/lucene/util/packed/DirectReader$DirectPackedReader40
>>> @ 14 org.apache.lucene.store.MemorySegmentIndexInput$SingleSegmentImpl::readLong (31 bytes) inline (hot)
>>> \-> TypeProfile (7475/7475 counts) = org/apache/lucene/store/MemorySegmentIndexInput$SingleSegmentImpl
>>> @ 8 jdk.internal.foreign.AbstractMemorySegmentImpl::get (12 bytes) force inline by annotation
>>> \-> TypeProfile (10270/10270 counts) = jdk/internal/foreign/MappedMemorySegmentImpl
>>> @ 1 jdk.internal.foreign.layout.ValueLayouts$AbstractValueLayout::varHandle (26 bytes) force inline by annotation
>>> @ 8 java.lang.invoke.VarHandleGuards::guard_LJ_J (84 bytes) force inline by annotation
>>> @ 3 java.lang.invoke.VarHandle::checkAccessModeThenIsDirect (29 bytes) force inline by annotation
>>> @ 46 java.lang.invoke.VarForm::getMemberName (38 bytes) force inline by annotation
>>> @ 49 java.lang.invoke.VarHandleSegmentAsLongs::get (52 bytes) force inline by annotation
>>> @ 14 java.lang.invoke.VarHandleSegmentAsLongs::checkAddress (21 bytes) force inline by annotation
>>> @ 1 java.util.Objects::requireNonNull (14 bytes) force inline by annotation
>>> @ 15 jdk.internal.foreign.AbstractMemorySegmentImpl::checkAccess (30 bytes) force inline by annotation
>>> @ 26 jdk.internal.foreign.AbstractMemorySegmentImpl::checkBounds (54 bytes) force inline by annotation
>>> @ 16 jdk.internal.util.Preconditions::checkIndex (22 bytes) (intrinsic)
>>> @ 24 jdk.internal.foreign.AbstractMemorySegmentImpl::sessionImpl (5 bytes) accessor
>>> @ 29 jdk.internal.foreign.NativeMemorySegmentImpl::unsafeGetBase (2 bytes) inline (hot)
>>> @ 40 java.lang.invoke.VarHandleSegmentAsLongs::offsetPlain (39 bytes) force inline by annotation
>>> @ 1 jdk.internal.foreign.NativeMemorySegmentImpl::unsafeGetOffset (5 bytes) accessor
>>> @ 13 jdk.internal.foreign.NativeMemorySegmentImpl::maxAlignMask (2 bytes) inline (hot)
>>> @ 48 jdk.internal.misc.ScopedMemoryAccess::getLongUnaligned (18 bytes) force inline by annotation
>>> @ 6 jdk.internal.misc.ScopedMemoryAccess::getLongUnalignedInternal (36 bytes) force inline by annotation
>>> @ 5 jdk.internal.foreign.MemorySessionImpl::checkValidStateRaw (33 bytes) force inline by annotation
>>> @ 15 jdk.internal.misc.Unsafe::getLongUnaligned (12 bytes) inline (hot)
>>> @ 5 jdk.internal.misc.Unsafe::getLongUnaligned (173 bytes) (intrinsic)
>>> @ 8 jdk.internal.misc.Unsafe::convEndian (16 bytes) inline (hot)
>>> @ 21 java.lang.ref.Reference::reachabilityFence (1 bytes) force inline by annotation
>>> -Chris
>>>
>>> [1] https://github.com/elastic/elasticsearch/issues/113030
>>>