RFR: 8286302: Port JEP 425 to PPC64 [v5]
Martin Doerr
mdoerr at openjdk.org
Thu Nov 17 19:52:25 UTC 2022
On Wed, 16 Nov 2022 10:24:24 GMT, Richard Reingruber <rrich at openjdk.org> wrote:
>> Hi,
>>
>> this is the port of [JEP 425: Virtual Threads (Preview)](https://openjdk.org/jeps/425)) to PPC64.
>> More precisely it is the port of vm continuations in hotspot to PPC64. It allows to run with `-XX:+VMContinuations` which is a prerequisit for 'real' virtual threads (oxymoron?).
>>
>> Most of the shared code changes are related to a new platform dependent constant `frame::metadata_words_at_top`. It is either added or subtracted to a frame address or size.
>>
>> The following is supposed to explain (without real life details) why it is needed in addition to the existing `frame::metadata_words`. The diagram shows a frame at `SP` and its stack arguments. The caller frame is located at `callers_SP`.
>>
>>
>> X86 / AARCH64 PPC64:
>>
>> : : : :
>> : : : :
>> | | | |
>> |-----------------| |-----------------|
>> | | | |
>> | stack arguments | | stack arguments |
>> | |<- callers_SP | |
>> =================== |-----------------|
>> | | | |
>> | metadata at bottom | | metadata at top |
>> | | | |<- callers_SP
>> |-----------------| ===================
>> | | | |
>> | | | |
>> | | | |
>> | | | |
>> | |<- SP | |
>> =================== |-----------------|
>> | |
>> | metadata at top |
>> | |<- SP
>> ===================
>>
>>
>> On X86 and AARCH64 metadata (that's return address, saved SP/FP, etc.) is stored at the frame bottom (`metadata at bottom`). On PPC64 it is stored at the frame top (`metadata at top`) where it affects size and address calculations. Shared code deals with this by making use of the platform dependent constant `frame::metadata_words_at_top`.
>>
>> * size required to 'freeze' a single frame with its stack arguments in a `StackChunk`:
>> `sizeof(frame) + sizeof(stack arguments) + frame::metadata_words_at_top`
>>
>> * address of stack arguments:
>> `callers_SP + frame::metadata_words_at_top`
>>
>> * The value of `frame::metadata_words_at_top` is 0 words on X86 and AARCH64. On PPC64 it is 4 words.
>>
>> Please refer to comments I've added for more details (e.g. the comment on StackChunkFrameStream<frame_kind>::frame_size()). Recently I've given a talk about vm continuations and the PPC64 port to my colleagues. It's rather an overview than a deep dive. [The slides](http://cr.openjdk.java.net/~rrich/webrevs/2022/8286302/202210_loom_ppc64_port.pdf) might serve as well as an intro to the matter.
>>
>> The pr includes the new test jdk/jdk/internal/vm/Continuation/BasicExp.java which I wrote while doing the port. The test cases vary from simple to not-so-simple. One of the main features is that a CompilationPolicy passed as argument controls which frames are compiled/interpreted when freezing by defining a sliding window of compiled / interpreted frames which produces interesting transitions with and without stack arguments. There is overlap with Basic.java and Fuzz.java. Let me know wether to remove or keep BasicExp.java. Runtime with fastdebug: 2m on Apple M1 Pro and 3m on Intel Xeon E5-2660 v3 @ 2.60GHz. Note that -XX:+VerifyContinuations is explicitly set as a found it very useful, it increases the runtime quite a bit though.
>>
>> Testing: the change passed our CI testing: most JCK and JTREG tests, also in Xcomp mode, SPECjvm2008, SPECjbb2015, Renaissance Suite and SAP specific tests with fastdebug and release builds on the standard platforms plus PPC64. These tests do include hotspot_loom and jdk_loom JTREG tests which I've also run with TEST_VM_OPTS="-XX:+VerifyContinuations" on X86_64, PPC64, and AARCH64.
>>
>> Thanks, Richard.
>
> Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision:
>
> Cleanup BasicExp.java
Great work! Looks very good to me. I'm requesting precise node sizes in C2 (let's keep this invariant: nodes should either have a precisely precomputed size or use dynamic computation). The rest are just minor comments and suggestions.
I need more time to study all explanations and to review the test, but I can't see anything which prevents us from shipping it with JDK 20.
Thanks for the ASCII art pictures!
src/hotspot/cpu/aarch64/frame_aarch64.hpp line 110:
> 108: // between a callee frame and its stack arguments, where it is part
> 109: // of the caller/callee overlap
> 110: metadata_words_at_top = 0,
I like the `_at_bottom` and `_at_top` versions. The old version could possibly get replaced completely in a separate RFE (this change is already large enough).
src/hotspot/cpu/ppc/continuationFreezeThaw_ppc.inline.hpp line 277:
> 275: // esp points one slot below the last argument
> 276: intptr_t* x86_64_like_unextended_sp = f.interpreter_frame_esp() + 1 - frame::metadata_words_at_top;
> 277: sp = fp - (f.fp() - x86_64_like_unextended_sp);
Nice workaround for the strange `unextended_sp` which doesn't fit well for other platforms!
src/hotspot/cpu/ppc/continuationFreezeThaw_ppc.inline.hpp line 498:
> 496: // align fp
> 497: int padding = fp - align_down(fp, frame::frame_alignment);
> 498: fp -= padding;
Additional whitespaces should better get removed.
src/hotspot/cpu/ppc/continuationFreezeThaw_ppc.inline.hpp line 520:
> 518:
> 519: if ((bottom && argsize > 0) || caller.is_interpreted_frame()) {
> 520: frame_sp -= argsize + frame::metadata_words_at_top;
whitespace
src/hotspot/cpu/ppc/frame_ppc.hpp line 370:
> 368:
> 369: union {
> 370: intptr_t* _fp; // frame pointer
whitespace
src/hotspot/cpu/ppc/frame_ppc.inline.hpp line 113:
> 111: // In thaw, non-heap frames use this constructor to pass oop_map. I don't know why.
> 112: assert(_on_heap || _cb != nullptr, "these frames are always heap frames");
> 113: if (cb != NULL) {
Better use `nullptr`.
src/hotspot/cpu/ppc/nativeInst_ppc.cpp line 448:
> 446: }
> 447:
> 448: // Inserts an undefined instruction at a given pc
More precisely: An instruction which is specified to cause a SIGILL.
src/hotspot/cpu/ppc/ppc.ad line 14386:
> 14384:
> 14385: format %{ "CALL,static $meth \t// ==> " %}
> 14386: size(8);
Please make the sizes precise. They are only 8 Byte when continuations are enabled.
src/hotspot/cpu/ppc/sharedRuntime_ppc.cpp line 1803:
> 1801:
> 1802: // Read interpreter arguments into registers (this is an ad-hoc i2c adapter)
> 1803: __ ld(reg_cont_obj, Interpreter::stackElementSize*3, R15_esp);
Would you like to align the indentation?
src/hotspot/cpu/ppc/stackChunkFrameStream_ppc.inline.hpp line 194:
> 192: + 1 // for the mirror oop
> 193: + ((intptr_t*)f.interpreter_frame_monitor_begin()
> 194: - (intptr_t*)f.interpreter_frame_monitor_end())/BasicObjectLock::size();
whitespace
src/hotspot/cpu/ppc/templateInterpreterGenerator_ppc.cpp line 634:
> 632: }
> 633:
> 634: __ restore_interpreter_state(R11_scratch1, false /*bcp_and_mdx_only*/, true /*restore_top_frame_sp*/);
Nice cleanup!
src/hotspot/share/runtime/continuationFreezeThaw.cpp line 488:
> 486: assert(!Interpreter::contains(_cont.entryPC()), "");
> 487: static const int doYield_stub_frame_size = NOT_PPC64(frame::metadata_words)
> 488: PPC64_ONLY(frame::abi_reg_args_size >> LogBytesPerWord);
Unfortunate that we still need to distinguish, here. But, ok. I don't have a better idea atm.
src/hotspot/share/runtime/continuationFreezeThaw.cpp line 1024:
> 1022: if (f.is_interpreted_frame()) {
> 1023: assert(hf.is_heap_frame(), "should be");
> 1024: ContinuationHelper::InterpretedFrame::patch_sender_sp(hf, caller);
Thanks for removing `unextended_sp`, here. It's not generic enough for shared code.
-------------
Changes requested by mdoerr (Reviewer).
PR: https://git.openjdk.org/jdk/pull/10961
More information about the hotspot-compiler-dev
mailing list