RFR: 8290965: PPC64: Implement post-call NOPs [v2]
Martin Doerr
mdoerr at openjdk.org
Wed Dec 27 18:39:55 UTC 2023
On Sat, 23 Dec 2023 11:56:10 GMT, Richard Reingruber <rrich at openjdk.org> wrote:
>> #### Implementation of post call nops (PCNs) on ppc64.
>>
>> Depends on https://github.com/openjdk/jdk/pull/17150
>>
>> About post call nops:
>>
>> - instruction(s) at return addresses of compiled java calls
>> - emitted iff vm continuations are enabled to support virtual threads
>> - encode data that can be be used to find the corresponding CodeBlob and oop map faster
>> - mt-safe patchable to trigger deoptimization
>>
>> Background:
>>
>> - Frames in continuation StackChunks are not visited if their compiled method is made not entrant (in contrast to frames on stack).
>> Instead all PCNs of the compiled method are patched to trigger deoptimization when control returns to such frames.
>> - With vm continuations, stacks are walked and inspected more frequently. This requires lookup of metadata like frame size and oop maps. As an optimization the offset of the CodeBlob to the PCN and the oop map slot are encoded as data in the PCN.
>>
>> Post call nops on ppc64
>>
>> - 1 instruction, i.e. 4 bytes (either CMPI or CMPLI[1])
>> x86_64: 1 instruction, 8 bytes
>> aarch64: 3 instruction, 12 bytes
>> [1] 3.1.10 Fixed Point Compare Instructions in Power ISA 3.1B
>> https://openpowerfoundation.org/specifications/isa/
>>
>> - 26 bits data payload
>> x86_64: 32 bits; aarch64: 32 bits
>> - 9 bits dedicated to oop map slot. With 8 bits there where cases with SPECjvm2008 where the slot could not be encoded (on ppc64 and x86_64).
>> x86_64: 8 bits; aarch64: 8 bits
>> - 17 bits dedicated to cb offset. Effectively 19 bits due to instruction alignment.
>> x86_64: 24 bits; aarch64: 24 bits
>> - Also used when reconstructing the back chain after thawing continuation frames (see `Thaw<ConfigT>::patch_caller_links`)
>>
>> - Refactored frame constructors to make use of fast CodeBlob lookup based on PCNs.
>> The fast lookup may only be used if the pc is known to be in the code cache because `CodeCache::find_blob_fast` can yield wrong results if it finds instructions outside the code cache that look just like PCNs. Callers of the frame class constructors need to pass `frame::kind::native` in that case to avoid errors. Other platforms don't make this explicit which is a problem in my eyes. Picking the wrong constructor can cause errors when porting and in future development.
>>
>> - Currently only the PCNs in nmethods are initialized. Therefore we don't even try to make a fast lookup based on PCNs if we know the CodeBlob is, e.g., a RuntimeStub. To achieve this we call the frame cons...
>
> Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision:
>
> Fix comment
>
> Co-authored-by: Andrew Haley <aph-open at littlepinkcloud.com>
Usage of CMPI/CMPLI looks great. Assuming `kind::nmethod` by default will likely work, but I wonder if we could avoid that without measurable performance loss (see comments below).
src/hotspot/cpu/ppc/frame_ppc.hpp line 398:
> 396: enum class kind {
> 397: native, // The frame's pc is not necessarily in the CodeCache.
> 398: // CodeCache::find_blob_fast(void* pc) can yield wrong results in this case and must not be used.
I'd probably call it `unknown`.
src/hotspot/cpu/ppc/frame_ppc.hpp line 414:
> 412: // Constructors
> 413: inline frame(intptr_t* sp, intptr_t* fp, address pc);
> 414: inline frame(intptr_t* sp, address pc, kind knd = kind::nmethod);
I think using `kind::nmethod` by default is potentially dangerous. The pc may be outside of the code cache and calling find_blob_fast would be unreliable. It's used by pns for debugging code. It doesn't look performance critical and we could use a conservative default.
I guess that we don't see issues because native code doesn't set bit 9 in CMPI/CMPLI.
src/hotspot/cpu/ppc/frame_ppc.inline.hpp line 44:
> 42: }
> 43:
> 44: if (_cb == nullptr ) {
Please remove the whitespace!
src/hotspot/cpu/ppc/frame_ppc.inline.hpp line 45:
> 43:
> 44: if (_cb == nullptr ) {
> 45: _cb = knd == kind::nmethod ? CodeCache::find_blob_fast(_pc) : CodeCache::find_blob(_pc);
`(knd == kind::nmethod)` would look better.
src/hotspot/cpu/ppc/frame_ppc.inline.hpp line 92:
> 90: _on_heap(false), DEBUG_ONLY(_frame_index(-1) COMMA) _unextended_sp(nullptr), _fp(nullptr) {}
> 91:
> 92: inline frame::frame(intptr_t* sp) : frame(sp, nullptr, kind::nmethod) {}
Same here. Potentially dangerous default value. Not performance critical AFAICS.
src/hotspot/cpu/ppc/frame_ppc.inline.hpp line 105:
> 103: : _sp(sp), _pc(pc), _cb(cb), _oop_map(nullptr),
> 104: _on_heap(false), DEBUG_ONLY(_frame_index(-1) COMMA) _unextended_sp(unextended_sp), _fp(fp) {
> 105: setup(kind::nmethod);
I think `kind::nmethod` should only be used if cb != nullptr which is not checked, here. Is this one performance critical?
src/hotspot/cpu/ppc/macroAssembler_ppc.cpp line 1191:
> 1189: }
> 1190: // We use CMPI/CMPLI instructions to encode post call nops.
> 1191: // We set bit 9 to distinguish post call nops from real CMPI/CMPI instructions
Should be CMPI/CMPLI. Maybe add that CMPI and CMPLI opcodes only differ in one bit which we use to encode data.
src/hotspot/cpu/ppc/nativeInst_ppc.hpp line 519:
> 517: // | |4 bits | | 22 bits |
> 518: //
> 519: // Bit 9 is alwys 1 for PCNs to distinguish them from CMPI/CMPLI
`always`, maybe distinguish from "regular CMPI/CMPLI".
-------------
PR Review: https://git.openjdk.org/jdk/pull/17171#pullrequestreview-1797379733
PR Review Comment: https://git.openjdk.org/jdk/pull/17171#discussion_r1437165683
PR Review Comment: https://git.openjdk.org/jdk/pull/17171#discussion_r1437192737
PR Review Comment: https://git.openjdk.org/jdk/pull/17171#discussion_r1437161661
PR Review Comment: https://git.openjdk.org/jdk/pull/17171#discussion_r1437161983
PR Review Comment: https://git.openjdk.org/jdk/pull/17171#discussion_r1437192999
PR Review Comment: https://git.openjdk.org/jdk/pull/17171#discussion_r1437167565
PR Review Comment: https://git.openjdk.org/jdk/pull/17171#discussion_r1437171154
PR Review Comment: https://git.openjdk.org/jdk/pull/17171#discussion_r1437172336
More information about the hotspot-compiler-dev
mailing list