RFR: 8290965: PPC64: Implement post-call NOPs [v4]
Martin Doerr
mdoerr at openjdk.org
Thu Jan 11 13:21:28 UTC 2024
On Thu, 11 Jan 2024 08:57:52 GMT, Richard Reingruber <rrich at openjdk.org> wrote:
>> #### Implementation of post call nops (PCNs) on ppc64.
>>
>> Depends on https://github.com/openjdk/jdk/pull/17150
>>
>> About post call nops:
>>
>> - instruction(s) at return addresses of compiled java calls
>> - emitted iff vm continuations are enabled to support virtual threads
>> - encode data that can be used to find the corresponding CodeBlob and oop map faster
>> - mt-safe patchable to trigger deoptimization
>>
>> Background:
>>
>> - Frames in continuation StackChunks are not visited if their compiled method is made not entrant (in contrast to frames on stack).
>> Instead all PCNs of the compiled method are patched to trigger deoptimization when control returns to such frames.
>> - With vm continuations, stacks are walked and inspected more frequently. This requires lookup of metadata like frame size and oop maps. As an optimization the offset of the CodeBlob to the PCN and the oop map slot are encoded as data in the PCN.
>>
>> Post call nops on ppc64
>>
>> - 1 instruction, i.e. 4 bytes (either CMPI or CMPLI[1])
>> x86_64: 1 instruction, 8 bytes
>> aarch64: 3 instructions, 12 bytes
>> [1] 3.1.10 Fixed Point Compare Instructions in Power ISA 3.1B
>> https://openpowerfoundation.org/specifications/isa/
>>
>> - 26 bits data payload
>> x86_64: 32 bits; aarch64: 32 bits
>> - 9 bits dedicated to oop map slot. With 8 bits there were cases with SPECjvm2008 where the slot could not be encoded (on ppc64 and x86_64).
>> x86_64: 8 bits; aarch64: 8 bits
>> - 17 bits dedicated to cb offset. Effectively 19 bits due to instruction alignment.
>> x86_64: 24 bits; aarch64: 24 bits
>> - Also used when reconstructing the back chain after thawing continuation frames (see `Thaw<ConfigT>::patch_caller_links`)
>>
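The payload split quoted above (9 bits for the oop map slot, 17 bits for the cb offset, 26 bits in total) can be illustrated with a minimal sketch. This is not the code from the pull request: the function names, the `PcnData` struct, and the field order inside the 26 bits (slot in the low bits, offset above it) are assumptions made for illustration only. It only shows why a 17-bit field covers an effective 19-bit byte offset when every offset is a multiple of the 4-byte instruction size.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout, NOT taken from the patch: the low 9 bits hold the
// oop map slot, the next 17 bits hold the CodeBlob offset in 4-byte units.
constexpr uint32_t kOopMapSlotBits = 9;
constexpr uint32_t kCbOffsetBits   = 17;
constexpr uint32_t kInstrAlignLog2 = 2;  // ppc64 instructions are 4 bytes

constexpr uint32_t kOopMapSlotMax = (1u << kOopMapSlotBits) - 1;
// Largest encodable byte offset: 17 bits of instruction-sized units.
constexpr uint32_t kCbOffsetMaxBytes =
    ((1u << kCbOffsetBits) - 1) << kInstrAlignLog2;

struct PcnData {
  uint32_t oopmap_slot;
  uint32_t cb_offset_bytes;
};

// Packs both values into a 26-bit payload. Returns false (and leaves
// *payload untouched) if either value does not fit or the offset is not
// instruction aligned.
bool pack_pcn_data(uint32_t oopmap_slot, uint32_t cb_offset_bytes,
                   uint32_t* payload) {
  if (oopmap_slot > kOopMapSlotMax) return false;
  if ((cb_offset_bytes & ((1u << kInstrAlignLog2) - 1)) != 0) return false;
  if (cb_offset_bytes > kCbOffsetMaxBytes) return false;
  // Drop the two always-zero alignment bits before storing the offset.
  *payload = ((cb_offset_bytes >> kInstrAlignLog2) << kOopMapSlotBits) |
             oopmap_slot;
  return true;
}

// Inverse of pack_pcn_data: restores the alignment bits of the offset.
PcnData unpack_pcn_data(uint32_t payload) {
  PcnData d;
  d.oopmap_slot = payload & kOopMapSlotMax;
  d.cb_offset_bytes = (payload >> kOopMapSlotBits) << kInstrAlignLog2;
  return d;
}
```

Under these assumptions the largest encodable offset is 524284 bytes, just below 2^19, and a slot index of 512 or more cannot be packed, which is the kind of encoding failure the 9-bit slot field is meant to make rarer.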
>> - Refactored frame constructors to make use of fast CodeBlob lookup based on PCNs.
>> The fast lookup may only be used if the pc is known to be in the code cache because `CodeCache::find_blob_fast` can yield wrong results if it finds instructions outside the code cache that look just like PCNs. Callers of the frame class constructors need to pass `frame::kind::native` in that case to avoid errors. Other platforms don't make this explicit which is a problem in my eyes. Picking the wrong constructor can cause errors when porting and in future development.
>>
>> - Currently only the PCNs in nmethods are initialized. Therefore we don't even try to make a fast lookup based on PCNs if we know the CodeBlob is, e.g., a RuntimeStub. To achieve this we call the frame cons...
>
> Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision:
>
> Review Martin
Thanks for the updates! The constructors should still be used with care, but I think your code is at least as good as other platforms (rather better IMHO).
-------------
Marked as reviewed by mdoerr (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/17171#pullrequestreview-1815566892
More information about the hotspot-compiler-dev
mailing list