RFR: JDK-8314890: Reduce number of loads for Klass decoding in static code [v6]

Aleksey Shipilev shade at openjdk.org
Fri Oct 6 17:21:14 UTC 2023


On Thu, 5 Oct 2023 06:06:01 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> Small change that reduces the number of loads generated by the C++ compiler for a narrow Klass decoding operation (`CompressedKlassPointers::decode_xxx()`.
>> 
>> Stock: three loads (with two probably sharing a cache line) - UseCompressedClassPointers, encoding base and shift.
>> 
>> 
>>   8b7b62:   48 8d 05 7f 1b c3 00    lea    0xc31b7f(%rip),%rax        # 14e96e8 <UseCompressedClassPointers>
>>   8b7b69:   0f b6 00                movzbl (%rax),%eax
>>   8b7b6c:   84 c0                   test   %al,%al
>>   8b7b6e:   0f 84 9c 00 00 00       je     8b7c10 <_ZN10HeapRegion14object_iterateEP13ObjectClosure+0x260>
>>   8b7b74:   48 8d 15 05 62 c6 00    lea    0xc66205(%rip),%rdx        # 151dd80 <_ZN23CompressedKlassPointers6_shiftE>
>>   8b7b7b:   8b 7b 08                mov    0x8(%rbx),%edi
>>   8b7b7e:   8b 0a                   mov    (%rdx),%ecx
>>   8b7b80:   48 8d 15 01 62 c6 00    lea    0xc66201(%rip),%rdx        # 151dd88 <_ZN23CompressedKlassPointers5_baseE>
>>   8b7b87:   48 d3 e7                shl    %cl,%rdi
>>   8b7b8a:   48 03 3a                add    (%rdx),%rdi
>> 
>> 
>>   Patched: one load loads all three. Since shift occupies the lowest 8 bits, compiled code uses 8bit register; ditto the UseCompressedOops flag.
>> 
>> 
>>   8ba302:   48 8d 05 97 9c c2 00    lea    0xc29c97(%rip),%rax        # 14e3fa0 <_ZN23CompressedKlassPointers6_comboE>
>>   8ba309:   48 8b 08                mov    (%rax),%rcx
>>   8ba30c:   f6 c5 01                test   $0x1,%ch        # use compressed klass pointers?
>>   8ba30f:   0f 84 9b 00 00 00       je     8ba3b0 <_ZN10HeapRegion14object_iterateEP13ObjectClosure+0x260>
>>   8ba315:   8b 7b 08                mov    0x8(%rbx),%edi
>>   8ba318:   48 d3 e7                shl    %cl,%rdi        # shift 
>>   8ba31b:   66 31 c9                xor    %cx,%cx         # zero out lower 16 bits of base
>>   8ba31e:   48 01 cf                add    %rcx,%rdi       # add base 
>>   8ba321:   8b 4f 08                mov    0x8(%rdi),%ecx
>> 
>> ---
>> 
>> Performance measurements: 
>> 
>> G1, doing a full GC over a heap filled with 256 mio life j.l.Object instances.
>> 
>> I see a reduction of Full Pause times between 1.2% and 5%. I am unsure how reliable these numbers are since, despite my efforts (running tests on isolated CPUs etc.), the standard deviation was quite high at ˜4%. Still, in general, numbers seemed to go down rather than up.
>> 
>> ---
>> 
>> Future extensions: 
>> 
>> This patch uses the fact that the encoding base is aligned to metaspace reser...
>
> Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits:
> 
>  - Merge branch 'openjdk:master' into optimize-narrow-klass-decoding-in-c++
>  - Merge branch 'openjdk:master' into optimize-narrow-klass-decoding-in-c++
>  - Merge branch 'openjdk:master' into optimize-narrow-klass-decoding-in-c++
>  - APH feedback
>  - Merge branch 'master' into optimize-narrow-klass-decoding-in-c++
>  - fix -UseCCP case
>  - use 16 bit alignment
>  - with raw bit ops

A few stylistic comments.

What is confusing to me is that combo flag initialization is basically conditional on `UseCompressedClassPointers` (i.e. assert in new `set_base_and_shift`). But at the same time, we ask for `CompressedKlassPointers::use_compressed_klass_pointers()` in `oop`methods. This works "only" because the -UseCCP generates the same combo as the initial value of `0`? Seems fragile. I wonder if we want to initialize combo unconditionally.

src/hotspot/share/oops/compressedKlass.cpp line 40:

> 38: 
> 39: void CompressedKlassPointers::set_base_and_shift(address thebase, int theshift) {
> 40:   assert(UseCompressedClassPointers, "Why are we here?");

All other Hotspot asserts have this form:

Suggestion:

  assert(UseCompressedClassPointers, "only for compressed klass code");

src/hotspot/share/oops/compressedKlass.cpp line 46:

> 44: 
> 45:   // we keep a composite word, `_combo`, containing base+shift+UseCCP, to load
> 46:   // all three information with a single 64-bit load.

Suggestion:

  // Encode all three base+shift+UseCCP into a single 64-bit word.
  // This would allow optimizing the fast-path with a single load.

src/hotspot/share/oops/compressedKlass.cpp line 53:

> 51:   _combo = (uint64_t)_base | (uint64_t)_shift | (1 << bitpos_useccp);
> 52: 
> 53:   // validate combo.

Suggestion:

src/hotspot/share/oops/compressedKlass.cpp line 56:

> 54:   assert(base() == _base, "combo encoding");
> 55:   assert(shift() == _shift, "combo encoding");
> 56:   assert(use_compressed_class_pointers() == UseCompressedClassPointers, "combo encoding");

Pre-condition assert means `UseCompressedClassPointers` is always `true` here, can simplify the assert.

src/hotspot/share/oops/compressedKlass.hpp line 66:

> 64:   // compiler.
> 65:   // - Bit  [0-7]   shift
> 66:   // - Bit  8       UseCompressedOops

Suggestion:

  // - Bit  8       UseCompressedClassPointers

-------------

PR Review: https://git.openjdk.org/jdk/pull/15389#pullrequestreview-1662323705
PR Review Comment: https://git.openjdk.org/jdk/pull/15389#discussion_r1348986318
PR Review Comment: https://git.openjdk.org/jdk/pull/15389#discussion_r1348990132
PR Review Comment: https://git.openjdk.org/jdk/pull/15389#discussion_r1348990824
PR Review Comment: https://git.openjdk.org/jdk/pull/15389#discussion_r1348989259
PR Review Comment: https://git.openjdk.org/jdk/pull/15389#discussion_r1348994530


More information about the hotspot-dev mailing list