[16] RFR(XS) 8076985: Allocation path: biased locking + compressed oops code quality
Vladimir Kozlov
vladimir.kozlov at oracle.com
Thu Jul 2 00:23:56 UTC 2020
https://bugs.openjdk.java.net/browse/JDK-8076985
https://cr.openjdk.java.net/~kvn/8076985/webrev.00/
First, this is about how C2 generates code for *constant* class pointers.
A little history here. When we implemented compressed oops and class pointers we had PermGen, and classes were Java
objects. We used the same decoding/encoding code for oops and classes, with the same register holding the heap base
address. It was profitable to decode a constant class once and reuse it [1]. We also benefited greatly on SPARC, since
decoding a 32-bit constant required 4 instructions instead of up to 7 instructions to load a 64-bit constant.
Now compressed class decoding is different and always takes 2 instructions on x86 [2] if either base or shift is not 0.
As a result, we generated 3 instructions to get the full class pointer from a compressed 32-bit constant (example for
base = 0, shift = 3):
movl $0x200001d5,%r11d
movabs $0x0,%r10
lea (%r10,%r11,8),%r10
Also, when we store a compressed class pointer into a new object's header we no longer use a register on x86, so
keeping the value in a register does not help now:
movl $0x200001d5,0x8(%rax)
Aleksey suggested using a single instruction to load the full 64-bit class pointer:
movq $0x100000EA8,%r10
It frees one register and uses 10 bytes instead of up to 20 bytes of code on x86.
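The two constants in the listings above are related by the decode arithmetic, full = base + (narrow << shift). A minimal
sketch in Java, assuming that formula and the base = 0, shift = 3 values from the example; the class and method names
are hypothetical and are not HotSpot code:

```java
public class NarrowKlassDecode {
    // Hypothetical helper: decode a compressed class pointer as base + (narrow << shift).
    static long decode(long base, int shift, int narrow) {
        // Treat the 32-bit constant as unsigned before widening and shifting.
        return base + (Integer.toUnsignedLong(narrow) << shift);
    }

    public static void main(String[] args) {
        // Values taken from the example in this mail: base = 0, shift = 3.
        long full = decode(0L, 3, 0x200001d5);
        System.out.println("0x" + Long.toHexString(full).toUpperCase()); // prints 0x100000EA8
    }
}
```

With these inputs the result is the immediate used by the single movq, which is why the compiler can fold the decode
into one 64-bit constant.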
In JDK 9 SAP contributed a nice change [3] that lets us choose between 'compressed class pointer + decoding' and a full
'64-bit constant class pointer'. It significantly simplified the changes for this RFE.
I ran performance testing but did not see a difference - we don't use biased locking now, and as a result we don't need
to load the prototype header from the class. But there are other places where we need to load from the class.
Thanks,
Vladimir K
[1] https://bugs.openjdk.java.net/browse/JDK-6709093
http://hg.openjdk.java.net/jdk8u/jdk8u-dev/hotspot/rev/44abbb0d4c18
To generate instead of this:
movl R11, narrowoop: precise klass jnt/scimark2/Random: 0x000000000083b418:Constant:exact * # compressed ptr
movq R10, precise klass jnt/scimark2/Random: 0x000000000083b418:Constant:exact * # ptr
movq R10, [R10 + #176 (32-bit)] # ptr
movq [RAX], R10 # ptr
movl [RAX + #8 (8-bit)], R11 # compressed ptr
generate this:
movl R11, narrowoop: precise klass Point: 0x00000000007ad518:Constant:exact * # compressed ptr
movq R10, [R12 + R11 << 3 + #176] (compressed oop addressing) # ptr
movq [R8], R10 # ptr
movl [R8 + #8 (8-bit)], R11 # compressed ptr
[2] http://hg.openjdk.java.net/jdk/jdk/file/c5ed42533134/src/hotspot/cpu/x86/macroAssembler_x86.cpp#l4609
[3] https://bugs.openjdk.java.net/browse/JDK-8155729