LRB and 32-bit compressed oops
Aleksey Shipilev
shade at redhat.com
Mon Mar 25 17:17:34 UTC 2019
Hi again,
I was following up on experiments with LRB (load reference barrier) vs non-LRB, and spotted
something odd with 32-bit compressed oops.
Run the gc-bench test that writes a single int:
https://icedtea.classpath.org/hg/gc-bench/
$ ~/trunks/shenandoah-jdk/build/linux-x86_64-server-release/images/jdk/bin/java -jar
target/benchmarks.jar -jvmArgs "-XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC"
writes.Plain.test_int -prof perfasm:printMargin=30 2>&1 | tee lrb.perfasm
Run with -Xmx20g, which enables compressed oops with a 3-bit shift, and you will see this:
[Verified Entry Point]
6.94% 0x00007f60c0497050: mov %eax,-0x14000(%rsp)
5.80% 0x00007f60c0497057: push %rbp
0.30% 0x00007f60c0497058: sub $0x10,%rsp
11.81% 0x00007f60c049705c: mov 0xc(%rsi),%r11d
0.82% 0x00007f60c0497060: mov %r11,%r9
0.48% 0x00007f60c0497063: shl $0x3,%r9
.......................... LRB fastpath check ..........................
5.29% 0x00007f60c0497067: testb $0x1,0x20(%r15)
5.49% ╭ 0x00007f60c049706c: jne 0x00007f60c0497086
.........│......... LRB fastpath ends, store to %r9 follows ............
0.87% │↗ ↗↗ 0x00007f60c049706e: movl $0x2a,0xc(%r9)
7.59% ││ ││ 0x00007f60c0497076: add $0x10,%rsp
6.12% ││ ││ 0x00007f60c049707a: pop %rbp
1.01% ││ ││ 0x00007f60c049707b: mov 0x108(%r15),%r10
0.63% ││ ││ 0x00007f60c0497082: test %eax,(%r10)
6.73% ││ ││ 0x00007f60c0497085: retq
---------││-││----------- LRB midpath starts --------------------------
.........│|.|│............ checking in-cset ...........................
↘│ ││ 0x00007f60c0497086: mov %r9,%r10
│ ││ 0x00007f60c0497089: shr $0x17,%r10
│ ││ 0x00007f60c049708d: movabs $0x7f60d00919f0,%r8
│ ││ 0x00007f60c0497097: cmpb $0x0,(%r8,%r10,1)
╰ ││ 0x00007f60c049709c: je 0x00007f60c049706e
............││............ checking is-forwarded ......................
││ 0x00007f60c049709e: mov -0x8(%r12,%r11,8),%r9
││ 0x00007f60c04970a3: lea (%r12,%r11,8),%r10
││ 0x00007f60c04970a7: cmp %r10,%r9
╰│ 0x00007f60c04970aa: jne 0x00007f60c049706e
.............│............... slow path call ..........................
│ 0x00007f60c04970ac: mov %r9,%rdi
│ 0x00007f60c04970af: movabs $0x7f60d7775030,%r10
│ 0x00007f60c04970b9: callq *%r10
│ 0x00007f60c04970bc: mov %rax,%r9
╰ 0x00007f60c04970bf: jmp 0x00007f60c049706e
This is actually good code. But if you add -Xmx1g, thus enabling 32-bit compressed oops, you would
expect the decode to go away in favor of just using the (zero-extended) 32-bit value. The shifts are
indeed gone, but the register moves are still there. And that, I think, wastes registers, see:
[Verified Entry Point]
6.85% 0x00007fb1284982d0: mov %eax,-0x14000(%rsp)
5.71% 0x00007fb1284982d7: push %rbp
3.14% 0x00007fb1284982d8: sub $0x10,%rsp
7.32% 0x00007fb1284982dc: mov 0xc(%rsi),%r11d
2.46% 0x00007fb1284982e0: mov %r11,%r9 <---- !!!!
.......................... LRB fastpath check ..........................
2.97% 0x00007fb1284982e3: testb $0x1,0x20(%r15)
3.45% ╭ 0x00007fb1284982e8: jne 0x00007fb128498302
.........│......... LRB fastpath ends, store to %r9 follows ............
3.51% │↗ ↗↗ 0x00007fb1284982ea: movl $0x2a,0xc(%r9)
7.30% ││ ││ 0x00007fb1284982f2: add $0x10,%rsp
3.12% ││ ││ 0x00007fb1284982f6: pop %rbp
2.91% ││ ││ 0x00007fb1284982f7: mov 0x108(%r15),%r10
3.23% ││ ││ 0x00007fb1284982fe: test %eax,(%r10)
4.63% ││ ││ 0x00007fb128498301: retq
---------││-││----------- LRB midpath starts --------------------------
.........│|.|│............ checking in-cset ...........................
↘│ ││ 0x00007fb128498302: mov %r9,%r10
│ ││ 0x00007fb128498305: shr $0x13,%r10
│ ││ 0x00007fb128498309: movabs $0x7fb13808d770,%r8
│ ││ 0x00007fb128498313: cmpb $0x0,(%r8,%r10,1)
╰ ││ 0x00007fb128498318: je 0x00007fb1284982ea
............││............ checking is-forwarded ......................
││ 0x00007fb12849831a: mov -0x8(%r11),%r9
││ 0x00007fb12849831e: mov %r11,%r10 <---- !!!!
││ 0x00007fb128498321: cmp %r10,%r9
╰│ 0x00007fb128498324: jne 0x00007fb1284982ea
.............│............... slow path call ..........................
│ 0x00007fb128498326: mov %r9,%rdi
│ 0x00007fb128498329: movabs $0x7fb13f963030,%r10
│ 0x00007fb128498333: callq *%r10
│ 0x00007fb128498336: mov %rax,%r9
╰ 0x00007fb128498339: jmp 0x00007fb1284982ea
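For reference, the decode that differs between the two listings can be sketched in Java. This is only an illustration of the arithmetic, not HotSpot code: the class name, method name and parameters are made up here, and the heap base is assumed to be whatever %r12 holds.

```java
// Hypothetical sketch of compressed oop decoding; names are illustrative only.
final class CompressedOopSketch {
    // narrowOop: the 32-bit value loaded by "mov 0xc(%rsi),%r11d"
    // heapBase:  assumed to correspond to %r12 in the first listing
    // shift:     3 in the first listing (shl $0x3), 0 in the second
    static long decode(int narrowOop, long heapBase, int shift) {
        // The 32-bit load zero-extends into the 64-bit register.
        long zeroExtended = narrowOop & 0xFFFF_FFFFL;
        return heapBase + (zeroExtended << shift);
    }
}
```

With a zero base and a zero shift the decode is the identity on the zero-extended value, which is why the `mov %r11,%r9` and `mov %r11,%r10` copies marked above look redundant.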
32-bit compressed oops mode is interesting because it covers the heap sizes typical for
microservices. I am not sure whether this is an LRB problem or a generic C2 one.
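For anyone following the annotations in the listings, the barrier's control flow can be modelled roughly as below. This is a simplified sketch read off the assembly, not the actual Shenandoah implementation: all names are hypothetical, the forwarding word at -0x8 is modelled as a map, and the slow path call is modelled as a callback.

```java
import java.util.Map;
import java.util.function.LongUnaryOperator;

// Hypothetical model of the LRB paths seen in the listings; names are illustrative only.
final class LrbSketch {
    static long loadReferenceBarrier(long obj, int flagByte, byte[] inCset, int regionShift,
                                     Map<Long, Long> fwdPtr, LongUnaryOperator slowPath) {
        // Fastpath check: testb $0x1,0x20(%r15). If the bit is clear, use obj as is.
        if ((flagByte & 0x1) == 0) {
            return obj;
        }
        // Midpath, in-cset check: index a byte table by (obj >> regionShift),
        // where regionShift is 0x17 in the first listing and 0x13 in the second.
        if (inCset[(int) (obj >>> regionShift)] == 0) {
            return obj;
        }
        // Midpath, is-forwarded check: the word at obj-8 is compared against obj.
        // Modelled here as a map lookup that defaults to obj itself.
        long fwd = fwdPtr.getOrDefault(obj, obj);
        if (fwd != obj) {
            return fwd;
        }
        // Slow path call: the result replaces the reference.
        return slowPath.applyAsLong(obj);
    }
}
```

In both listings only the first check sits on the hot path. Everything after the `jne` is out of line, so the extra moves matter mostly for register pressure on the fastpath.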
Thanks,
-Aleksey