RFR: JDK-8314890: Reduce number of loads for Klass decoding in static code [v9]
Aleksey Shipilev
shade at openjdk.org
Wed Nov 8 10:43:03 UTC 2023
On Thu, 26 Oct 2023 16:11:04 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:
>> Small change that reduces the number of loads generated by the C++ compiler for a narrow Klass decoding operation (`CompressedKlassPointers::decode_xxx()`.
>>
>> Stock: three loads (with two probably sharing a cache line) - UseCompressedClassPointers, encoding base and shift.
>>
>>
>> 8b7b62: 48 8d 05 7f 1b c3 00 lea 0xc31b7f(%rip),%rax # 14e96e8 <UseCompressedClassPointers>
>> 8b7b69: 0f b6 00 movzbl (%rax),%eax
>> 8b7b6c: 84 c0 test %al,%al
>> 8b7b6e: 0f 84 9c 00 00 00 je 8b7c10 <_ZN10HeapRegion14object_iterateEP13ObjectClosure+0x260>
>> 8b7b74: 48 8d 15 05 62 c6 00 lea 0xc66205(%rip),%rdx # 151dd80 <_ZN23CompressedKlassPointers6_shiftE>
>> 8b7b7b: 8b 7b 08 mov 0x8(%rbx),%edi
>> 8b7b7e: 8b 0a mov (%rdx),%ecx
>> 8b7b80: 48 8d 15 01 62 c6 00 lea 0xc66201(%rip),%rdx # 151dd88 <_ZN23CompressedKlassPointers5_baseE>
>> 8b7b87: 48 d3 e7 shl %cl,%rdi
>> 8b7b8a: 48 03 3a add (%rdx),%rdi
>>
>>
>> Patched: one load loads all three. Since shift occupies the lowest 8 bits, compiled code uses 8bit register; ditto the UseCompressedOops flag.
>>
>>
>> 8ba302: 48 8d 05 97 9c c2 00 lea 0xc29c97(%rip),%rax # 14e3fa0 <_ZN23CompressedKlassPointers6_comboE>
>> 8ba309: 48 8b 08 mov (%rax),%rcx
>> 8ba30c: f6 c5 01 test $0x1,%ch # use compressed klass pointers?
>> 8ba30f: 0f 84 9b 00 00 00 je 8ba3b0 <_ZN10HeapRegion14object_iterateEP13ObjectClosure+0x260>
>> 8ba315: 8b 7b 08 mov 0x8(%rbx),%edi
>> 8ba318: 48 d3 e7 shl %cl,%rdi # shift
>> 8ba31b: 66 31 c9 xor %cx,%cx # zero out lower 16 bits of base
>> 8ba31e: 48 01 cf add %rcx,%rdi # add base
>> 8ba321: 8b 4f 08 mov 0x8(%rdi),%ecx
>>
>> ---
>>
>> Performance measurements:
>>
>> G1, doing a full GC over a heap filled with 256 mio life j.l.Object instances.
>>
>> I see a reduction of Full Pause times between 1.2% and 5%. I am unsure how reliable these numbers are since, despite my efforts (running tests on isolated CPUs etc.), the standard deviation was quite high at ˜4%. Still, in general, numbers seemed to go down rather than up.
>>
>> ---
>>
>> Future extensions:
>>
>> This patch uses the fact that the encoding base is aligned to metaspace reser...
>
> Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits:
>
> - Renamed _combo
> - Merge branch 'master' into optimize-narrow-klass-decoding-in-c++
> - simplify assert
> - add comment
> - Update src/hotspot/share/oops/compressedKlass.hpp
>
> Co-authored-by: Aleksey Shipilëv <shipilev at amazon.de>
> - Update src/hotspot/share/oops/compressedKlass.cpp
>
> Co-authored-by: Aleksey Shipilëv <shipilev at amazon.de>
> - Update src/hotspot/share/oops/compressedKlass.cpp
>
> Co-authored-by: Aleksey Shipilëv <shipilev at amazon.de>
> - Update src/hotspot/share/oops/compressedKlass.cpp
>
> Co-authored-by: Aleksey Shipilëv <shipilev at amazon.de>
> - Merge branch 'openjdk:master' into optimize-narrow-klass-decoding-in-c++
> - Merge branch 'openjdk:master' into optimize-narrow-klass-decoding-in-c++
> - ... and 6 more: https://git.openjdk.org/jdk/compare/9864951d...56cde2a9
Looks okay, but I still have a few questions.
src/hotspot/share/oops/compressedKlass.cpp line 36:
> 34: size_t CompressedKlassPointers::_range = 0;
> 35: // Note: initialization value is unchanged for -UseCompressedClassPointers, so
> 36: // the bit mirroring UseCompressedClassPointers is off and maches the switch.
Suggestion:
// the bit mirroring UseCompressedClassPointers is off and matches the switch.
src/hotspot/share/oops/compressedKlass.cpp line 45:
> 43: assert(theshift == 0 || theshift == LogKlassAlignmentInBytes, "invalid shift for klass ptrs");
> 44: _base = thebase;
> 45: _shift = theshift;
Do we even need `_base` and `_shift` as separate fields after this change then?
src/hotspot/share/oops/compressedKlass.hpp line 67:
> 65: // - Bit [0-7] shift
> 66: // - Bit 8 UseCompressedClassPointers
> 67: // - Bits [16-64] the base.
Suggestion:
// - Bit [0-7] shift
// - Bit 8 UseCompressedClassPointers
// - Bits [16-64] heap base
src/hotspot/share/oops/compressedKlass.hpp line 68:
> 66: // - Bit 8 UseCompressedClassPointers
> 67: // - Bits [16-64] the base.
> 68: static uint64_t _compressionInfo;
C++ style:
Suggestion:
static uint64_t _compression_info;
-------------
PR Review: https://git.openjdk.org/jdk/pull/15389#pullrequestreview-1719981601
PR Review Comment: https://git.openjdk.org/jdk/pull/15389#discussion_r1386397717
PR Review Comment: https://git.openjdk.org/jdk/pull/15389#discussion_r1386414395
PR Review Comment: https://git.openjdk.org/jdk/pull/15389#discussion_r1386416204
PR Review Comment: https://git.openjdk.org/jdk/pull/15389#discussion_r1386415450
More information about the hotspot-dev
mailing list