RFR: 8278020: ~13% variation in Renaissance-Scrabble [v2]

Ioi Lam iklam at openjdk.java.net
Wed Dec 15 06:26:06 UTC 2021


On Wed, 15 Dec 2021 04:19:35 GMT, Ioi Lam <iklam at openjdk.org> wrote:

>> We found that when CDS is enabled, there is a ~13% variation in the Renaissance-Scrabble benchmark between different builds of the JDK. In one example, only two core-lib classes, unrelated to the benchmark, changed between two builds, but one build is consistently faster than the other.
>> 
>> When CDS is disabled, we do not see such variations.
>> 
>> In the slow case, there seem to be frequent dcache misses when loading the `Klass::_vtable_len` field, which is at offset 24 from the beginning of the Klass (see [bug report](https://bugs.openjdk.java.net/browse/JDK-8278020) for details).
>> 
>> We suspect that the problem is with the layout of the CDS archive. Specifically, in CDS, Klass objects are intermixed with other metadata objects (such as Methods). In contrast, when CDS is disabled (on 64-bit platforms with compressed klass pointers), Klass objects are allocated in their own space, separated from other metadata objects.
>> 
>> My theory is: when CDS is enabled, perhaps the modification of an object that sits immediately above the Klass invalidates the cacheline that holds `Klass::_vtable_len`. In a different JDK build, the exact addresses of the metadata objects in the CDS archive may be slightly nudged so we don't see the cacheline effect anymore.
>> 
>> As an experiment, I swapped `Klass::_vtable_len` with `Klass::_modifier_flags` (which was at offset 164 before this patch), and the variation stopped. Both fields are 32 bits in size.
>> 
>> I have no concrete proof that my theory is correct, but this change seems to be harmless. @ericcaspole has run all the benchmarks in Oracle's CI and found consistent improvement with Renaissance-Scrabble, and no degradation in other benchmarks.
>
> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision:
> 
>   added comments about the location of vtable_len

> Hi Ioi,
> 
> The fix looks fine.
> 
> This is interesting to me, because in the context of Lilliput ([openjdk/lilliput#13](https://github.com/openjdk/lilliput/pull/13)) I was kind of counting on CDS to intermix Klass and non-class metadata, since that way CDS uses the larger Klass alignment gaps. In fact, I have this wild idea to shape metaspace in that form, merging Klass and non-class metadata into one larger class space. It would be really good to have a better idea of these interactions.
> 
> What tool did you use to measure the dcache misses?
> 
> Cheers, Thomas

Hi Thomas,

@ericcaspole did the measurements so he will have more information, but I believe he used https://github.com/jvm-profiling-tools/async-profiler to generate traces like this (which I pasted into the bug report):


 Column 1: cycles (125424 events)
 Column 2: l1d_pend_miss.pending_cycles (56716 events)
 Column 3: CYCLE_ACTIVITY.CYCLES_L2_MISS (66170 events)

  0.08%   0.02%   0.03%     0x00007f488cda2dc8:  mov  0x10(%r10),%r11d
 12.26%  16.97%  16.23%     0x00007f488cda2dcc:  lea  0x1b8(%r10,%r11,8),%r11


@vnkozlov I found that most Klasses in CDS are preceded by a Method. Does the jitted code write into a Method often?
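
To make the cacheline theory concrete, here is a rough sketch of the address arithmetic, not HotSpot code. It assumes 64-byte cache lines, and the names `cache_line_of` and `shares_line_with_predecessor` are hypothetical helpers made up for this illustration. It checks whether a field at a given offset inside a Klass lands in the same cache line as the last byte of whatever object sits immediately before the Klass (for example a Method):

```cpp
#include <cstdint>

// Assumption: 64-byte cache lines (typical on current x86-64 hardware).
constexpr std::uint64_t kCacheLineSize = 64;

// Index of the cache line that contains the given address.
constexpr std::uint64_t cache_line_of(std::uint64_t addr) {
    return addr / kCacheLineSize;
}

// True if the field at `field_offset` inside an object starting at `base`
// lies in the same cache line as the byte just before the object, i.e. the
// tail of the preceding metadata object.
constexpr bool shares_line_with_predecessor(std::uint64_t base,
                                            std::uint64_t field_offset) {
    return base > 0 &&
           cache_line_of(base + field_offset) == cache_line_of(base - 1);
}

// Offset 24 (old position of _vtable_len): shared when the Klass starts
// a little past a line boundary, e.g. at 0x1008.
static_assert(shares_line_with_predecessor(0x1008, 24), "offset 24 shares");

// The same field is not shared when the Klass happens to be line-aligned,
// which may be why a small layout nudge between builds hides the effect.
static_assert(!shares_line_with_predecessor(0x1000, 24), "aligned: no share");

// Offset 164 (old position of _modifier_flags) is more than one line past
// the start of the Klass, so it can never share a line with the predecessor.
static_assert(!shares_line_with_predecessor(0x1008, 164), "offset 164 safe");
```

Under these assumptions, a field at offset 24 is exposed to writes into the preceding object for some start addresses but not others, while a field at offset 164 is never exposed in this way. This only illustrates the theory; it does not prove that this is what happens in the benchmark.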

-------------

PR: https://git.openjdk.java.net/jdk/pull/6838

