RFR: 8309953: Strengthen and optimize oopDesc age methods
David Holmes
dholmes at openjdk.org
Wed Jun 14 00:09:57 UTC 2023
On Tue, 13 Jun 2023 20:04:48 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:
> See the RFE for discussion. Basically, there is little reason to do two loads of the mark word when we can do one.
>
> Sample generated code for `oopDesc::age` can be seen if we turn that method from `inline` into a regular method:
>
>
> # Before
>
> 000000000080f440 <oopDesc::age>:
> 80f440: ff 83 00 d1 sub sp, sp, #32
> 80f444: fd 7b 01 a9 stp x29, x30, [sp, #16]
> 80f448: fd 43 00 91 add x29, sp, #16
> 80f44c: 08 00 40 f9 ldr x8, [x0] ; <-- first mark load
> 80f450: 89 27 00 d0 adrp x9, 0xd01000
> 80f454: 1f 20 03 d5 nop
> 80f458: 29 95 4a b9 ldr w9, [x9, #2708]
> 80f45c: 0a 05 40 92 and x10, x8, #0x3
> 80f460: 5f 09 00 f1 cmp x10, #2
> 80f464: ea 17 9f 1a cset w10, eq
> 80f468: 1f 01 40 f2 tst x8, #0x1
> 80f46c: e8 17 9f 1a cset w8, eq
> 80f470: 3f 09 00 71 cmp w9, #2
> 80f474: 48 01 88 1a csel w8, w10, w8, eq
> 80f478: 1f 05 00 71 cmp w8, #1
> 80f47c: 21 01 00 54 b.ne 0x80f4a0
> 80f480: 08 00 40 f9 ldr x8, [x0] ; <-- second mark load
> 80f484: e8 07 00 f9 str x8, [sp, #8]
> 80f488: e0 23 00 91 add x0, sp, #8
> 80f48c: c4 ed fd 97 bl 0x78ab9c
> 80f490: 00 18 03 53 ubfx w0, w0, #3, #4
> 80f494: fd 7b 41 a9 ldp x29, x30, [sp, #16]
> 80f498: ff 83 00 91 add sp, sp, #32
> 80f49c: c0 03 5f d6 ret
> 80f4a0: 00 00 40 f9 ldr x0, [x0]
> 80f4a4: 00 18 03 53 ubfx w0, w0, #3, #4
> 80f4a8: fd 7b 41 a9 ldp x29, x30, [sp, #16]
> 80f4ac: ff 83 00 91 add sp, sp, #32
> 80f4b0: c0 03 5f d6 ret
>
> # After
>
> 000000000080f480 <oopDesc::age>:
> 80f480: ff 83 00 d1 sub sp, sp, #32
> 80f484: fd 7b 01 a9 stp x29, x30, [sp, #16]
> 80f488: fd 43 00 91 add x29, sp, #16
> 80f48c: 00 00 40 f9 ldr x0, [x0] ; <-- load mark once
> 80f490: e0 07 00 f9 str x0, [sp, #8]
> 80f494: 88 27 00 d0 adrp x8, 0xd01000
> 80f498: 1f 20 03 d5 nop
> 80f49c: 08 95 4a b9 ldr w8, [x8, #2708]
> 80f4a0: 09 04 40 92 and x9, x0, #0x3
> 80f4a4: 3f 09 00 f1 cmp x9, #2
> 80f4a8: e9 17 9f 1a cset w9, eq
> 80f4ac: 1f 00 40 f2 tst x0, #0x1
> 80f4b0: ea 17 9f 1a cset w10, eq
> 80f4b4: 1f 09 00 71 cmp w8, #2
> 80f4b8: 28 01 8a 1a csel w8, w9, w10, eq
> 80f4bc: 1f 05 00 71 cmp w8, #1
> 80f4c0: 61 00 00 54 b.ne 0x80f4cc
> 80f4c4: e0 23 00 91 add x0, sp, #8
> 80f4c8: c5 ed fd 97 bl 0x78abdc
> 80f4cc: 00 18 03 53 ubfx w0, w...
I think this issue is overstated, as the code is not intended to be thread-safe in the way suggested. So it is just a micro-optimisation whose value has not been shown, and which makes the source code somewhat clunky IMO.
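
For readers following along, the two-load versus one-load pattern visible in the quoted disassembly can be sketched as a standalone toy model. This is not the HotSpot source: `ToyObject`, `age_two_loads`, `age_one_load` and the helper names are hypothetical, and the displaced word is assumed to sit directly at the address encoded in the mark, which is a simplification. Only the bit positions are taken from the listing (`and ..., #0x3`, `cmp ..., #2`, `ubfx w0, w0, #3, #4`).

```cpp
#include <atomic>
#include <cstdint>

// Toy model only; names and layout are hypothetical, not HotSpot's.
constexpr uint64_t kLockMask   = 0x3;  // low two bits, as in `and x, x, #0x3`
constexpr uint64_t kMonitorTag = 0x2;  // value tested by `cmp x, #2`
constexpr unsigned kAgeShift   = 3;    // from `ubfx w0, w0, #3, #4`
constexpr uint64_t kAgeMask    = 0xF;  // 4-bit age field

struct ToyObject {
    std::atomic<uint64_t> mark{0};
};

inline unsigned age_of(uint64_t m) { return (m >> kAgeShift) & kAgeMask; }

inline bool is_displaced(uint64_t m) { return (m & kLockMask) == kMonitorTag; }

// Assumption: the saved word lives at the address left after clearing the tag bits.
inline uint64_t displaced_word(uint64_t m) {
    return *reinterpret_cast<const uint64_t*>(m & ~kLockMask);
}

// "Before" shape: the mark is loaded once for the check and again for the use.
unsigned age_two_loads(const ToyObject& o) {
    if (is_displaced(o.mark.load(std::memory_order_relaxed))) {
        return age_of(displaced_word(o.mark.load(std::memory_order_relaxed)));
    }
    return age_of(o.mark.load(std::memory_order_relaxed));
}

// "After" shape: one load, and every decision is made on that single snapshot.
unsigned age_one_load(const ToyObject& o) {
    const uint64_t m = o.mark.load(std::memory_order_relaxed);
    return age_of(is_displaced(m) ? displaced_word(m) : m);
}
```

With no concurrent writer the two functions return the same value, so in that setting the difference is only the extra load. They can diverge only if the mark changes between the check and the use.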
-------------
PR Comment: https://git.openjdk.org/jdk/pull/14456#issuecomment-1590230832