RFR: 8290965: PPC64: Implement post-call NOPs [v5]

Richard Reingruber rrich at openjdk.org
Tue Jan 16 07:05:30 UTC 2024


> #### Implementation of post call nops (PCNs) on ppc64.
> 
> Depends on https://github.com/openjdk/jdk/pull/17150
> 
> About post call nops:
> 
> - instruction(s) at return addresses of compiled java calls
> - emitted iff vm continuations are enabled to support virtual threads
> - encode data that can be used to find the corresponding CodeBlob and oop map faster
> - mt-safe patchable to trigger deoptimization
> 
> Background:
> 
> - Frames in continuation StackChunks are not visited if their compiled method is made not entrant (in contrast to frames on stack).
>   Instead all PCNs of the compiled method are patched to trigger deoptimization when control returns to such frames.
> - With vm continuations, stacks are walked and inspected more frequently. This requires lookup of metadata like frame size and oop maps. As an optimization, the offset from the CodeBlob to the PCN and the oop map slot are encoded as data in the PCN.
> 
> Post call nops on ppc64
> 
> - 1 instruction, i.e. 4 bytes (either CMPI or CMPLI[1])
>   x86_64: 1 instruction, 8 bytes
>   aarch64: 3 instructions, 12 bytes
>   [1] 3.1.10 Fixed Point Compare Instructions in Power ISA 3.1B
>        https://openpowerfoundation.org/specifications/isa/
> 
> - 26 bits data payload
>   x86_64: 32 bits; aarch64: 32 bits
> - 9 bits dedicated to oop map slot. With 8 bits there were cases with SPECjvm2008 where the slot could not be encoded (on ppc64 and x86_64).
>   x86_64: 8 bits; aarch64: 8 bits
> - 17 bits dedicated to cb offset. Effectively 19 bits due to instruction alignment.
>   x86_64: 24 bits; aarch64: 24 bits
> - Also used when reconstructing the back chain after thawing continuation frames (see `Thaw<ConfigT>::patch_caller_links`)
> 
> - Refactored frame constructors to make use of fast CodeBlob lookup based on PCNs.
>   The fast lookup may only be used if the pc is known to be in the code cache because `CodeCache::find_blob_fast` can yield wrong results if it finds instructions outside the code cache that look just like PCNs. Callers of the frame class constructors need to pass `frame::kind::native` in that case to avoid errors. Other platforms don't make this explicit which is a problem in my eyes. Picking the wrong constructor can cause errors when porting and in future development.
> 
> - Currently only the PCNs in nmethods are initialized. Therefore we don't even try to make a fast lookup based on PCNs if we know the CodeBlob is, e.g., a RuntimeStub. To achieve this we call the frame constructor passing `frame::kind::code_blob`.
> 
> #### Statistics
> 
> 
> | SpecJVM2008...
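
The 26-bit ppc64 payload described in the quoted text (9 bits for the oop map slot, 17 bits for the cb offset, with the offset effectively covering 19 bits because of instruction alignment) can be illustrated with a small packing sketch. This is not the HotSpot code: the names `pack_pcn_data`, `unpack_pcn_data` and `PcnData`, and the choice of which bits hold which field, are assumptions made for illustration only. It assumes C++17 for `std::optional`.

```cpp
#include <cstdint>
#include <optional>

// Field widths are taken from the PR description; the bit positions used
// below are an assumption, not the layout used by HotSpot.
constexpr uint32_t kSlotBits = 9;         // oop map slot
constexpr uint32_t kOffsetBits = 17;      // cb offset, stored in instruction units
constexpr uint32_t kInstrAlignShift = 2;  // 4-byte instruction alignment

struct PcnData {
  uint32_t oopmap_slot;
  uint32_t cb_offset;  // byte offset of the PCN from the start of the CodeBlob
};

// Packs both fields into a 26-bit payload. Returns std::nullopt if a value
// does not fit, in which case a caller would have to use the slow lookup.
std::optional<uint32_t> pack_pcn_data(uint32_t oopmap_slot, uint32_t cb_offset) {
  if (oopmap_slot >= (1u << kSlotBits)) return std::nullopt;
  // The low two bits must be zero; that is what makes dropping them lossless.
  if ((cb_offset & ((1u << kInstrAlignShift) - 1)) != 0) return std::nullopt;
  const uint32_t scaled = cb_offset >> kInstrAlignShift;
  if (scaled >= (1u << kOffsetBits)) return std::nullopt;
  return (oopmap_slot << kOffsetBits) | scaled;
}

// Inverse of pack_pcn_data for a payload that was produced by it.
PcnData unpack_pcn_data(uint32_t payload) {
  const uint32_t scaled = payload & ((1u << kOffsetBits) - 1);
  return PcnData{payload >> kOffsetBits, scaled << kInstrAlignShift};
}
```

In this sketch the largest encodable byte offset is 524284 (2^19 - 4), which is where the "effectively 19 bits" comes from, and slot values of 512 or more are rejected rather than truncated.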

Richard Reingruber has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits:

 - Merge branch 'master'
 - Review Martin
 - Merge branch 'master'
 - Fix comment
   
   Co-authored-by: Andrew Haley <aph-open at littlepinkcloud.com>
 - 8290965: PPC64: Implement post-call NOPs
 - 8322294: Cleanup NativePostCallNop

-------------

Changes: https://git.openjdk.org/jdk/pull/17171/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17171&range=04
  Stats: 132 lines in 13 files changed: 96 ins; 0 del; 36 mod
  Patch: https://git.openjdk.org/jdk/pull/17171.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/17171/head:pull/17171

PR: https://git.openjdk.org/jdk/pull/17171

