Merging gc state tests for two dependent loads
Roman Kennke
rkennke at redhat.com
Fri Mar 25 15:21:26 UTC 2022
Hello Yude,
I have analyzed the full disassembly of that method:
https://gist.github.com/rkennke/e2610ec8b87e481edc2ebfe66f98c1a0
The problem is here:
0x00007f2621981d2b: mov 0x2b0(%r15),%rsi
0x00007f2621981d32: mov (%rsi),%rbx
<--- Seemingly missing LRB here --->
0x00007f2621981d35: mov 0x58(%rbx),%r13
;*invokestatic currentThread {reexecute=0 rethrow=0 return_oop=0}
; -
java.lang.ThreadLocal::set at 0 (line 219)
0x00007f2621981d39: testb $0x1,0x20(%r15)
0x00007f2621981d3e: jne 0x00007f2621981df1
The merged LRBs are found at 0x00007f2621981df1:
;; B14: # out( B27 B15 ) <- in( B1 ) Freq: 0,000999987
<--- LRB mid-path for %rbx --->
0x00007f2621981df1: mov %rbx,%r10
0x00007f2621981df4: shr $0x16,%r10
0x00007f2621981df8: cmpb $0x0,0x10000(%r10)
0x00007f2621981e00: jne 0x00007f2621981f34
;*invokestatic currentThread {reexecute=0 rethrow=0 return_oop=0}
; -
java.lang.ThreadLocal::set at 0 (line 219)
;; B15: # out( B28 B16 ) <- in( B14 B27 ) Freq: 0,000999987
<--- LRB midpath for %13 --->
0x00007f2621981e06: mov %r13,%r10
0x00007f2621981e09: shr $0x16,%r10
0x00007f2621981e0d: cmpb $0x0,0x10000(%r10)
0x00007f2621981e15: jne 0x00007f2621981f50
Now, the *slow* paths are here:
; B27: # out( B15 ) <- in( B14 ) Freq: 9,99974e-07
0x00007f2621981f34: mov %rbx,%rdi
0x00007f2621981f37: movabs $0x7f2634778de0,%r10
0x00007f2621981f41: callq *%r10
0x00007f2621981f44: mov %rax,%rbx
<--- Reloads %r13 if necessary --->
0x00007f2621981f47: mov 0x58(%rbx),%r13
0x00007f2621981f4b: jmpq 0x00007f2621981e06
IOW, the fast-path speculatively assumes that GC is inactive, and fixes
it in the slow-path, by re-doing the "mov 0x58(%rbx),%r13" when necessary.
@Roland can you confirm this?
Thanks,
Roman
> Copied from 8283243 for visibility:
>
> --------
>
> Shenandoah will merge two gc state tests (from two LRBs) if they satisfy certain control flow constraint (in ShenandoahBarrierC2Support::merge_back_to_back_tests). I think there could be an issue if the two loads the two LRBs guard are dependent.
>
> To reproduce, I ran java -XX:-UseCompressedOops -XX:+UseShenandoahGC -XX:CompileCommand=print,*ThreadLocal.set -jar SPECjvm2008.jar -coe -ict -ikv -i 2 -bt 8 derby
>
> The C2 outcome for ThreadLocal.set has:
>
> ..
> 0: mov 0x268(%r15),%rsi ; Thread t = Thread.currentThread(); // get oop handle
> 1: mov (%rsi),%r12 ; Thread t = Thread.currentThread(); // get oop out of oop handle
> 2: mov 0x58(%r12),%rbx ; ThreadLocalMap map = getMap(t);
> 3: nopl 0x0(%rax)
> 4: testb $0x1,0x20(%r15) ; gc state test is too late here
> 5: jne 0x00007f0a3508f6e7
> ..
>
> There is no barrier between line 1 and line 2. There was a barrier, but it was merged with the barrier for line 2, (the gc state tests are merged) and the merged test is placed at line 4. The problem is line 2 could visit from-space object and get outdated data.
>
> If I comment out the call to ShenandoahBarrierC2Support::merge_back_to_back_tests, the outcome is more like what I expected:
>
> ..
> 0: mov 0x268(%r15),%rsi
> 1: mov (%rsi),%r12
> 2: nopw 0x0(%rax,%rax,1)
> 3: testb $0x1,0x20(%r15) ; first gc state test
> 4: jne 0x00007ff55d077207
> 5: mov 0x58(%r12),%rbx
> 6: testb $0x1,0x20(%r15) ; second gc state test
> 7: jne 0x00007ff55d077235
> ..
>
> I think the merging didn't consider that the second load could use the result from the first load (in this case, r12). Can anyone confirm if this is an issue?
More information about the shenandoah-dev
mailing list