RFR: 8309953: Strengthen and optimize oopDesc age methods

David Holmes dholmes at openjdk.org
Wed Jun 14 00:09:57 UTC 2023


On Tue, 13 Jun 2023 20:04:48 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> See the RFE for discussion. Basically, there is little reason to do two loads of the mark word when we can do one.
> 
> Sample generated code for `oopDesc::age` can be seen if we turn that method from `inline` into a regular method:
> 
> 
> # Before
> 
> 000000000080f440 <oopDesc::age>:
>   80f440: ff 83 00 d1   sub     sp, sp, #32
>   80f444: fd 7b 01 a9   stp     x29, x30, [sp, #16]
>   80f448: fd 43 00 91   add     x29, sp, #16
>   80f44c: 08 00 40 f9   ldr     x8, [x0]          ; <-- first mark load
>   80f450: 89 27 00 d0   adrp    x9, 0xd01000 
>   80f454: 1f 20 03 d5   nop     
>   80f458: 29 95 4a b9   ldr     w9, [x9, #2708]
>   80f45c: 0a 05 40 92   and     x10, x8, #0x3
>   80f460: 5f 09 00 f1   cmp     x10, #2
>   80f464: ea 17 9f 1a   cset    w10, eq
>   80f468: 1f 01 40 f2   tst     x8, #0x1
>   80f46c: e8 17 9f 1a   cset    w8, eq
>   80f470: 3f 09 00 71   cmp     w9, #2
>   80f474: 48 01 88 1a   csel    w8, w10, w8, eq
>   80f478: 1f 05 00 71   cmp     w8, #1
>   80f47c: 21 01 00 54   b.ne    0x80f4a0
>   80f480: 08 00 40 f9   ldr     x8, [x0]          ; <-- second mark load
>   80f484: e8 07 00 f9   str     x8, [sp, #8]
>   80f488: e0 23 00 91   add     x0, sp, #8
>   80f48c: c4 ed fd 97   bl      0x78ab9c
>   80f490: 00 18 03 53   ubfx    w0, w0, #3, #4
>   80f494: fd 7b 41 a9   ldp     x29, x30, [sp, #16]
>   80f498: ff 83 00 91   add     sp, sp, #32
>   80f49c: c0 03 5f d6   ret     
>   80f4a0: 00 00 40 f9   ldr     x0, [x0]
>   80f4a4: 00 18 03 53   ubfx    w0, w0, #3, #4
>   80f4a8: fd 7b 41 a9   ldp     x29, x30, [sp, #16]
>   80f4ac: ff 83 00 91   add     sp, sp, #32
>   80f4b0: c0 03 5f d6   ret    
> 
> # After
> 
> 000000000080f480 <oopDesc::age>:
>   80f480: ff 83 00 d1   sub     sp, sp, #32
>   80f484: fd 7b 01 a9   stp     x29, x30, [sp, #16]
>   80f488: fd 43 00 91   add     x29, sp, #16
>   80f48c: 00 00 40 f9   ldr     x0, [x0]          ; <-- load mark once
>   80f490: e0 07 00 f9   str     x0, [sp, #8]
>   80f494: 88 27 00 d0   adrp    x8, 0xd01000  
>   80f498: 1f 20 03 d5   nop     
>   80f49c: 08 95 4a b9   ldr     w8, [x8, #2708]
>   80f4a0: 09 04 40 92   and     x9, x0, #0x3
>   80f4a4: 3f 09 00 f1   cmp     x9, #2
>   80f4a8: e9 17 9f 1a   cset    w9, eq
>   80f4ac: 1f 00 40 f2   tst     x0, #0x1
>   80f4b0: ea 17 9f 1a   cset    w10, eq
>   80f4b4: 1f 09 00 71   cmp     w8, #2
>   80f4b8: 28 01 8a 1a   csel    w8, w9, w10, eq
>   80f4bc: 1f 05 00 71   cmp     w8, #1
>   80f4c0: 61 00 00 54   b.ne    0x80f4cc 
>   80f4c4: e0 23 00 91   add     x0, sp, #8
>   80f4c8: c5 ed fd 97   bl      0x78abdc
>   80f4cc: 00 18 03 53   ubfx    w0, w...

I think this issue is overstated, as the code is not intended to be thread-safe in the way suggested. So it is just a micro-optimisation whose value has not been shown, and which makes the source code somewhat clunky IMO.
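
To make the two-load versus one-load difference concrete, here is a minimal C++ sketch. It is not HotSpot code: `ToyObject`, `has_displaced`, `displaced`, `age_two_loads` and `age_one_load` are hypothetical names, and the displaced-header check is reduced to a single low-bit test as an assumption. Only the age field layout (4 bits starting at bit 3) is taken from the `ubfx w0, w0, #3, #4` in the listing above.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for an object header; NOT HotSpot's oopDesc.
struct ToyObject {
    std::atomic<std::uintptr_t> mark{0x1};  // low bit set: header stored inline
};

// Age field layout as implied by `ubfx w0, w0, #3, #4`: 4 bits from bit 3.
constexpr unsigned kAgeShift = 3;
constexpr std::uintptr_t kAgeMask = 0xF;

inline unsigned age_of(std::uintptr_t m) {
    return static_cast<unsigned>((m >> kAgeShift) & kAgeMask);
}

// Assumption for illustration: a clear low bit means the mark is a pointer
// to the real (displaced) header word stored elsewhere.
inline bool has_displaced(std::uintptr_t m) { return (m & 0x1) == 0; }

inline std::uintptr_t displaced(std::uintptr_t m) {
    return *reinterpret_cast<const std::uintptr_t*>(m);
}

// "Before" shape: the mark is loaded once for the check and again for the use.
inline unsigned age_two_loads(const ToyObject& o) {
    if (has_displaced(o.mark.load(std::memory_order_relaxed))) {
        return age_of(displaced(o.mark.load(std::memory_order_relaxed)));
    }
    return age_of(o.mark.load(std::memory_order_relaxed));
}

// "After" shape: one load, and the check and the use see the same value.
inline unsigned age_one_load(const ToyObject& o) {
    const std::uintptr_t m = o.mark.load(std::memory_order_relaxed);
    return age_of(has_displaced(m) ? displaced(m) : m);
}
```

In a single-threaded run both functions return the same age. They can only differ if another thread rewrites the mark between the two loads, which is the scenario whose relevance is in question here.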

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14456#issuecomment-1590230832


More information about the hotspot-dev mailing list