RFR: 8290965: PPC64: Implement post-call NOPs [v2]

Richard Reingruber rrich at openjdk.org
Tue Jan 9 14:17:23 UTC 2024


On Sat, 23 Dec 2023 11:56:10 GMT, Richard Reingruber <rrich at openjdk.org> wrote:

>> #### Implementation of post call nops (PCNs) on ppc64.
>> 
>> Depends on https://github.com/openjdk/jdk/pull/17150
>> 
>> About post call nops:
>> 
>> - instruction(s) at return addresses of compiled java calls
>> - emitted iff vm continuations are enabled to support virtual threads
>> - encode data that can be used to find the corresponding CodeBlob and oop map faster
>> - mt-safe patchable to trigger deoptimization
>> 
>> Background:
>> 
>> - Frames in continuation StackChunks are not visited if their compiled method is made not entrant (in contrast to frames on stack).
>>   Instead all PCNs of the compiled method are patched to trigger deoptimization when control returns to such frames.
>> - With vm continuations, stacks are walked and inspected more frequently. This requires lookup of metadata like frame size and oop maps. As an optimization, the offset of the PCN from the start of its CodeBlob and the oop map slot are encoded as data in the PCN.
>> 
>> Post call nops on ppc64
>> 
>> - 1 instruction, i.e. 4 bytes (either CMPI or CMPLI[1])
>>   x86_64: 1 instruction, 8 bytes
>>   aarch64: 3 instructions, 12 bytes
>>   [1] 3.1.10 Fixed Point Compare Instructions in Power ISA 3.1B
>>        https://openpowerfoundation.org/specifications/isa/
>> 
>> - 26 bits data payload
>>   x86_64: 32 bits; aarch64: 32 bits
>> - 9 bits dedicated to oop map slot. With 8 bits there were cases with SPECjvm2008 where the slot could not be encoded (on ppc64 and x86_64).
>>   x86_64: 8 bits; aarch64: 8 bits
>> - 17 bits dedicated to cb offset. Effectively 19 bits due to instruction alignment.
>>   x86_64: 24 bits; aarch64: 24 bits
>> - Also used when reconstructing the back chain after thawing continuation frames (see `Thaw<ConfigT>::patch_caller_links`)
>> 
>> - Refactored frame constructors to make use of fast CodeBlob lookup based on PCNs.
>>   The fast lookup may only be used if the pc is known to be in the code cache because `CodeCache::find_blob_fast` can yield wrong results if it finds instructions outside the code cache that look just like PCNs. Callers of the frame class constructors need to pass `frame::kind::native` in that case to avoid errors. Other platforms don't make this explicit which is a problem in my eyes. Picking the wrong constructor can cause errors when porting and in future development.
>> 
>> - Currently only the PCNs in nmethods are initialized. Therefore we don't even try to make a fast lookup based on PCNs if we know the CodeBlob is, e.g., a RuntimeStub. To achieve this we call the frame cons...
>
> Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Fix comment
>   
>   Co-authored-by: Andrew Haley <aph-open at littlepinkcloud.com>

> _Mailing list message from [Andrew Haley](mailto:aph-open at littlepinkcloud.com) on [hotspot-compiler-dev](mailto:hotspot-compiler-dev at mail.openjdk.org):_
> 
> On 12/20/23 20:36, Richard Reingruber wrote:
> 
> > | test/jdk/java/lang/Thread/virtual/stress/Skynet.java | ppc64le   | x86_64    |
> > |------------------------------------------------------|-----------|-----------|
> > | PCN lookup success                                   | 306955525 | 247185016 |
> > | PCN lookup failure                                   |    500975 |    421098 |
> > | PCN decode success   (C2)                            | 306951893 | 247181691 |
> > | PCN decode failure                                   |      3168 |        59 |
> > | PCN patch success                                    |      2080 |      2662 |
> > | PCN patch cb offset failure                          |         0 |         0 |
> > | PCN patch oopmap slot failure                        |         0 |         0 |
> 
> These data are really interesting. How did you gather them? Thanks.

This is the code for the stats based on master: https://github.com/openjdk/jdk/commit/c376fcc9099251a3f62edc246748f26d0a54e2c0
This is the version for this pr: https://github.com/openjdk/jdk/commit/ae2b6ba70bfdca6a58f9af6b3a675c0f2aec7d85
(Actually these are cleaner reimplementations of the original code.)
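
For readers following the "patch cb offset failure" and "patch oopmap slot failure" rows, the payload split described in the quoted PR text (9 bits oop map slot, 17 bits cb offset counted in 4-byte instructions, 26 bits in total) can be sketched roughly as below. This is only an illustration: the names `pcn_pack` and `pcn_unpack`, the `PcnData` struct, and the order of the two fields within the payload are assumptions, not the code in the PR, and the sketch does not show how the payload is placed into the CMPI/CMPLI instruction.

```cpp
#include <cstdint>

// Hypothetical layout for illustration: field order within the 26-bit
// payload is an assumption and may differ from the actual PPC64 port.
constexpr int      kOopMapBits      = 9;   // oop map slot
constexpr int      kCbOffsetBits    = 17;  // cb offset in instruction units
constexpr int      kInstrAlignShift = 2;   // instructions are 4-byte aligned
constexpr uint32_t kOopMapMask      = (1u << kOopMapBits) - 1;
constexpr uint32_t kCbOffsetMask    = (1u << kCbOffsetBits) - 1;

struct PcnData {
  uint32_t cb_offset_bytes;  // distance from CodeBlob start to the PCN
  uint32_t oopmap_slot;      // index of the oop map for this return address
};

// Returns false if either value does not fit, which corresponds to a
// "patch failure" in the statistics; the caller then has to fall back
// to the slow lookup.
bool pcn_pack(uint32_t cb_offset_bytes, uint32_t oopmap_slot, uint32_t* payload) {
  if ((cb_offset_bytes & ((1u << kInstrAlignShift) - 1)) != 0) {
    return false;  // not instruction aligned
  }
  uint32_t cb_offset_instrs = cb_offset_bytes >> kInstrAlignShift;
  if (cb_offset_instrs > kCbOffsetMask || oopmap_slot > kOopMapMask) {
    return false;  // value too large for its field
  }
  *payload = (cb_offset_instrs << kOopMapBits) | oopmap_slot;
  return true;
}

// Inverse of pcn_pack for a payload that was packed successfully.
PcnData pcn_unpack(uint32_t payload) {
  PcnData d;
  d.cb_offset_bytes = ((payload >> kOopMapBits) & kCbOffsetMask) << kInstrAlignShift;
  d.oopmap_slot     = payload & kOopMapMask;
  return d;
}
```

Because the offset is stored in instruction units, 17 bits cover byte offsets up to 0x7FFFC, which is the "effectively 19 bits" mentioned in the PR description.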

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17171#issuecomment-1883125887

