RFR: 8278020: ~13% variation in Renaissance-Scrabble [v2]
Ioi Lam
iklam at openjdk.java.net
Wed Dec 15 06:26:06 UTC 2021
On Wed, 15 Dec 2021 04:19:35 GMT, Ioi Lam <iklam at openjdk.org> wrote:
>> We found that when CDS is enabled, there is a ~13% variation in the Renaissance-Scrabble benchmark between different builds of the JDK. In one example, only two core-lib classes, unrelated to the benchmark, changed between two builds, but one build is consistently faster than the other.
>>
>> When CDS is disabled, we do not see such variations.
>>
>> In the slow case, there seem to be frequent dcache misses when loading the `Klass::_vtable_len` field, which is at offset 24 from the beginning of the Klass (see [bug report](https://bugs.openjdk.java.net/browse/JDK-8278020) for details).
>>
>> We suspect that the problem is with the layout of the CDS archive. Specifically, in CDS, Klass objects are intermixed with other metadata objects (such as Methods). In contrast, when CDS is disabled (on 64-bit platforms with compressed klass pointers), Klass objects are allocated in their own space, separate from other metadata objects.
>>
>> My theory is: when CDS is enabled, perhaps the modification of an object that sits immediately above the Klass invalidates the cacheline that holds `Klass::_vtable_len`. In a different JDK build, the exact addresses of the metadata objects in the CDS archive may be slightly nudged so we don't see the cacheline effect anymore.
>>
>> As an experiment, I swapped `Klass::_vtable_len` with `Klass::_modifier_flags` (which was at offset 164 before this patch), and the variation stopped. Both fields are 32 bits in size.
>>
>> I have no concrete proof that my theory is correct, but this change seems to be harmless. @ericcaspole has run all the benchmarks in Oracle's CI and found consistent improvement with Renaissance-Scrabble, and no degradation in other benchmarks.
>
> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision:
>
> added comments about the location of vtable_len
>
> Hi Ioi,
>
> The fix looks fine.
>
> This is interesting to me, because in the context of Lilliput ([openjdk/lilliput#13](https://github.com/openjdk/lilliput/pull/13)) I was kind of counting on CDS to intermix Klass and non-class metadata, since that way CDS uses the larger Klass alignment gaps. In fact, I have this wild idea to shape metaspace in that form, merging Klass and non-class metadata into one larger class space. It would be really good to have a better idea of these interactions.
>
> What tool did you use to measure the dcache misses?
>
> Cheers, Thomas
Hi Thomas,
@ericcaspole did the measurements so he will have more information, but I believe he used https://github.com/jvm-profiling-tools/async-profiler to generate traces like this (which I pasted into the bug report):
Column 1: cycles (125424 events)
Column 2: l1d_pend_miss.pending_cycles (56716 events)
Column 3: CYCLE_ACTIVITY.CYCLES_L2_MISS (66170 events)
0.08% 0.02% 0.03% 0x00007f488cda2dc8: mov 0x10(%r10),%r11d
12.26% 16.97% 16.23% 0x00007f488cda2dcc: lea 0x1b8(%r10,%r11,8),%r11
@vnkozlov I found that most Klasses in CDS are preceded by a Method. Does the jitted code write into a Method often?
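To make the cacheline theory quoted above a bit more concrete, here is a minimal sketch of the address arithmetic. It is not HotSpot code: the 64-byte cache line size, the 8-byte metadata alignment, and the helper names (`cache_line_of`, `shares_line_with_predecessor`, `count_shared_placements`) are all assumptions made for illustration. Only the field offsets 24 and 164 come from the PR description. The question it answers is whether a field at a given offset in a Klass can sit in the same cache line as the last byte of whatever object ends right before that Klass.

```cpp
#include <cstdint>

// Assumption: 64-byte cache lines, as on most current x86-64 CPUs.
constexpr std::uintptr_t kCacheLineSize = 64;
// Assumption: metadata objects in the archive start on 8-byte boundaries.
constexpr std::uintptr_t kAlignment = 8;

// Index of the cache line that contains the given address.
constexpr std::uintptr_t cache_line_of(std::uintptr_t addr) {
    return addr / kCacheLineSize;
}

// True if the field at klass_base + field_offset lies in the same cache line
// as the last byte of the object that ends right where the Klass begins.
// klass_base must be non-zero.
constexpr bool shares_line_with_predecessor(std::uintptr_t klass_base,
                                            std::uintptr_t field_offset) {
    return cache_line_of(klass_base + field_offset) ==
           cache_line_of(klass_base - 1);
}

// Counts, over every aligned start position within one cache line, how many
// placements put the field in the predecessor's cache line.
int count_shared_placements(std::uintptr_t field_offset) {
    // Hypothetical line-aligned region start; any multiple of 64 works.
    const std::uintptr_t region = 0x10000;
    int shared = 0;
    for (std::uintptr_t start = 0; start < kCacheLineSize; start += kAlignment) {
        if (shares_line_with_predecessor(region + start, field_offset)) {
            ++shared;
        }
    }
    return shared;
}
```

Under these assumptions, `count_shared_placements(24)` is 4 out of the 8 possible placements, while `count_shared_placements(164)` is 0, since an offset larger than one cache line can never land in the predecessor's line. That would be consistent with a small nudge of the archive layout flipping the behavior for a field at offset 24, but it only shows the effect is possible, not that it is the actual cause.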
-------------
PR: https://git.openjdk.java.net/jdk/pull/6838