RFR: JDK-8314890: Reduce number of loads for Klass decoding in static code [v3]
Thomas Stuefe
stuefe at openjdk.org
Tue Aug 29 10:14:26 UTC 2023
> Small change that reduces the number of loads generated by the C++ compiler for a narrow Klass decoding operation (`CompressedKlassPointers::decode_xxx()`).
>
> Stock: three loads (with two probably sharing a cache line) - UseCompressedClassPointers, encoding base and shift.
>
>
> 8b7b62: 48 8d 05 7f 1b c3 00 lea 0xc31b7f(%rip),%rax # 14e96e8 <UseCompressedClassPointers>
> 8b7b69: 0f b6 00 movzbl (%rax),%eax
> 8b7b6c: 84 c0 test %al,%al
> 8b7b6e: 0f 84 9c 00 00 00 je 8b7c10 <_ZN10HeapRegion14object_iterateEP13ObjectClosure+0x260>
> 8b7b74: 48 8d 15 05 62 c6 00 lea 0xc66205(%rip),%rdx # 151dd80 <_ZN23CompressedKlassPointers6_shiftE>
> 8b7b7b: 8b 7b 08 mov 0x8(%rbx),%edi
> 8b7b7e: 8b 0a mov (%rdx),%ecx
> 8b7b80: 48 8d 15 01 62 c6 00 lea 0xc66201(%rip),%rdx # 151dd88 <_ZN23CompressedKlassPointers5_baseE>
> 8b7b87: 48 d3 e7 shl %cl,%rdi
> 8b7b8a: 48 03 3a add (%rdx),%rdi
>
>
> Patched: a single load fetches all three. Since the shift occupies the lowest 8 bits, compiled code uses an 8-bit register for it; ditto the UseCompressedClassPointers flag.
>
>
> 8ba302: 48 8d 05 97 9c c2 00 lea 0xc29c97(%rip),%rax # 14e3fa0 <_ZN23CompressedKlassPointers6_comboE>
> 8ba309: 48 8b 08 mov (%rax),%rcx
> 8ba30c: f6 c5 01 test $0x1,%ch # use compressed klass pointers?
> 8ba30f: 0f 84 9b 00 00 00 je 8ba3b0 <_ZN10HeapRegion14object_iterateEP13ObjectClosure+0x260>
> 8ba315: 8b 7b 08 mov 0x8(%rbx),%edi
> 8ba318: 48 d3 e7 shl %cl,%rdi # shift
> 8ba31b: 66 31 c9 xor %cx,%cx # zero out lower 16 bits of base
> 8ba31e: 48 01 cf add %rcx,%rdi # add base
> 8ba321: 8b 4f 08 mov 0x8(%rdi),%ecx
>
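For readers following along, the patched sequence can be approximated in C++ roughly as below. This is only a sketch under assumptions read off the disassembly comments: the shift sits in bits 0..7, the "use compressed class pointers" flag in bit 8 (what `test $0x1,%ch` checks), and the base occupies the remaining bits with its low 16 bits zero. The names `PackedKlassEncoding`, `make`, `decode` and `decode_stock` are hypothetical and are not the identifiers used in the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical packed word, layout inferred from the patched disassembly:
//   bits 0..7   shift
//   bit  8      "use compressed class pointers" flag
//   bits 16..63 encoding base (low 16 bits are zero because of alignment)
struct PackedKlassEncoding {
    uint64_t combo;

    static PackedKlassEncoding make(uint64_t base, unsigned shift, bool use_ccp) {
        assert((base & 0xFFFF) == 0 && "base must leave the low 16 bits free");
        assert(shift < 64);
        return PackedKlassEncoding{ base | (uint64_t(use_ccp) << 8) | shift };
    }

    bool use_compressed() const { return ((combo >> 8) & 1) != 0; }
    unsigned shift() const { return unsigned(combo & 0xFF); }
    uint64_t base() const { return combo & ~uint64_t(0xFFFF); }

    // Decode a narrow Klass value from one read of the packed word.
    uint64_t decode(uint32_t narrow_klass) const {
        const uint64_t c = combo;                      // the single load
        const uint64_t b = c & ~uint64_t(0xFFFF);      // zero out the low 16 bits
        return b + (uint64_t(narrow_klass) << (c & 0xFF));
    }
};

// Reference version with base and shift passed as separate values,
// mirroring the stock code that loads each of them individually.
inline uint64_t decode_stock(uint64_t base, unsigned shift, uint32_t narrow_klass) {
    return base + (uint64_t(narrow_klass) << shift);
}
```

Both variants should yield the same address for any base whose low 16 bits are zero; the packed form only changes how many memory reads are needed to obtain the flag, base and shift.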
> ---
>
> Performance measurements:
>
> G1, doing a full GC over a heap filled with 256 million live j.l.Object instances.
>
> I see a reduction of Full Pause times between 1.2% and 5%. I am unsure how reliable these numbers are since, despite my efforts (running tests on isolated CPUs etc.), the standard deviation was quite high at ~4%. Still, in general, numbers seemed to go down rather than up.
>
> ---
>
> Future extensions:
>
> This patch uses the fact that the encoding base is aligned to metaspace reserve alignment (16 MB). We only use 16 of those 24 bits of alignment shadow and could us...
Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision:
- APH feedback
- Merge branch 'master' into optimize-narrow-klass-decoding-in-c++
- fix -UseCCP case
- use 16 bit alignment
- with raw bit ops
-------------
Changes:
- all: https://git.openjdk.org/jdk/pull/15389/files
- new: https://git.openjdk.org/jdk/pull/15389/files/12d19a06..09a7971e
Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=15389&range=02
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=15389&range=01-02
Stats: 6820 lines in 258 files changed: 4661 ins; 797 del; 1362 mod
Patch: https://git.openjdk.org/jdk/pull/15389.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/15389/head:pull/15389
PR: https://git.openjdk.org/jdk/pull/15389