RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v34]

Richard Reingruber rrich at openjdk.org
Mon May 22 22:03:04 UTC 2023


On Mon, 22 May 2023 21:36:12 GMT, Martin Doerr <mdoerr at openjdk.org> wrote:

>> Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification".
>> 
>> This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `<stdarg.h>` (2.2.4 Variable Argument Lists).
>> 
>> Big Endian support is implemented to some extend, but not complete. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only.
>> 
>> There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Resolved after merging of [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017)
>> 
>> The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both, a FP reg and a GP reg or stack slot (see "no partial DW rule"). This cases are not covered by the existing tests.
>> 
>> I had to make changes to shared code and code for other platforms:
>> 1. Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons:
>> - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that.
>> - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation in single FP registers is always IEEE 754 double precision on PPC64.
>> - Big Endian also needs usage of the precise size. Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian!
>> 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separat...
>
> Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 41 commits:
> 
>  - Adaptation for JDK-8308276.
>  - Merge remote-tracking branch 'origin' into PPC64_Panama
>  - Add comment about Register Save Area.
>  - Replace abstract method useABIv2().
>  - Cleanup imports, improve comments, updates from other platforms.
>  - Add NONZERO check for downcall_stub_address_offset_in_bytes().
>  - Replace NULL by nullptr.
>  - libTestHFA: Add explicit type conversion to avoid build warning.
>  - Add test case for passing a double value in a GP register. Use better instructions for moving between FP and GP reg. Improve comments.
>  - Merge remote-tracking branch 'origin' into PPC64_Panama
>  - ... and 31 more: https://git.openjdk.org/jdk/compare/939344b8...08a5c143

Hi Martin,

there seems to be a mismatch between this pr and the [64-bit ELF ABI V2 for PPC](https://openpowerfoundation.org/specifications/64bitelfabi/). In fact all dynamically generated calls that have to conform to ABI V2 are affected. I'm giving a short summary of our discussion this afternoon.

Very briefly: ABI V2 states that a Parameter Save Area (PSA) shall be allocated _unless_ all parameters can be passed in registers as indicated by the caller's prototype, whereas the port always allocates a PSA of 8 double words.
(Details under "Parameter Save Area" in "2.2.3.3. Optional Save Areas" of ELF ABI V2)

It is not wrong what we're doing. It is like we didn't know the prototype of the call targets. But for most calls [1] we are wasting stack space (and confusing everybody that tries to match the implementation with the spec).
Interestingly ABI V1 states that a PSA of at least 8 double words is always needed. Looks like we've missed that change.

I have conducted a little experiment and compiled the following test program using Compiler Explorer [2]


#include <stdint.h>

int64_t test_callee(int64_t p1, int64_t p2, int64_t p3, int64_t p4);

int64_t test_call(int64_t p1, int64_t p2, int64_t p3, int64_t p4) {
    return test_callee(p1, p2, p3, p4);
}


This is the -O2 output for ELF ABI V2 (little endian)
Note: the stdu allocates just the minimal frame of 4 double words without PSA.


test_call:                              # @test_call
.Lfunc_gep0:
        addis 2, 12, .TOC.-.Lfunc_gep0 at ha
        addi 2, 2, .TOC.-.Lfunc_gep0 at l
        mflr 0
        stdu 1, -32(1)
        std 0, 48(1)
        bl test_callee
        nop
        addi 1, 1, 32
        ld 0, 16(1)
        mtlr 0
        blr


This is the -O2 output for ELF ABI V1 (big endian)


test_call:                              # @test_call
        .quad   .Lfunc_begin0
        .quad   .TOC. at tocbase
        .quad   0
.Lfunc_begin0:
        mflr 0
        stdu 1, -112(1)
        std 0, 128(1)
        bl test_callee
        nop
        addi 1, 1, 112
        ld 0, 16(1)
        mtlr 0
        blr

Note: the stdu allocates a much larger frame because it accomodates a PSA of 8 double words.

I'd suggest to keep the current well tested version but add comments to the code that a PSA is always allocated even though ABI V2 does not require it. This should also be explained in the JBS item.

Furthermore RFEs should be filed to adopt ABI V2 in the FFM API port and in the hotspot port to PPC64le. There's quite a bit of room for improvement there. Ironically I've very recently fixed [JDK-8306111](https://bugs.openjdk.org/browse/JDK-8306111) citing the ABI V2 spec without realizing that the fix is not needed for V2, just for V1.


[1] Exceptions that _do_ require a PSA are calls with a really long or variable parameter list or if a prototype is not available

[2] [Experiment with "Compiler Explorer"](https://godbolt.org/#z:OYLghAFBqd5QCxAYwPYBMCmBRdBLAF1QCcAaPECAMzwBtMA7AQwFtMQByARg9KtQYEAysib0QXACx8BBAKoBnTAAUAHpwAMvAFYgATKVpMGoAPrnkpJfWQE8Ayo3QBhVLQCuLBhNJOAMngMmAByngBGmMQgABykAA6oCoT2DK4eXj4JSXYCAUGhLBFRsdaYtilCBEzEBGme3lxWmDY5DJXVBHkh4ZExVlU1dRmNCgOdgd2FvdEAlFao7sTI7BwApHoAzIHIHlgA1KsbzqP4ggB0CIfYqxoAgje3gQQAbJKmBHsEmKOmorT0mAgT1e7z2cUae2Bbw%2BcQMkMEIJhG1I8Je0LBkhmhwAQg8HlDQV8fn9aECEejwSiCTC4dSwcjUYiMTMDgB2XF3PZcvbETAERYMT7fAi/MQAiCUsFwuIMuKYnF41kAEQeHDmtE4AFZeN4OFpSKhOM49goFktMAdNnpeARNGq5gBrECSACcZw0Gi4Ls1zxdklZXGeekkyI1HEkOrtBs4vAUIA0pFterVpDgsBgiBQqBYcTokXIlDQObzUWQcTiyFeNiMJlIWAAbnhlgA1PCYADuAHk4oxODw%2BHQvsQ4xAwlGwoFqgBPPu8CfMYhTzthbRlJP9otsQSdhi0GfJuuYFjGYDiA/4XnlevfKOYVRldxfWfkQTNKO0PBhYjT1xYKMEYg8BYWc5ioIxgAUVsO27XtuF4fhBBEMR2CkGRBEUFR1APXRGhrMwLEMT840gOZUDiVo4w4ABaTtrQNa9iEArBiIgOZSnKBwICcIYGl8Bh0C6AoikyRJkgEHiROyFJBJ6KIRmaNcKjGCT5JaJSOhkqY5P6DoVJ0mpNOErg2LNZZ9HVLVIwPQ0ODBVB20iKsLR2E89ggAD3AYB0WQgXBCBIS0Nj0GYbTtGY5gQTAmCwKJW
 NIJ0NlZM5NUMTgI1IYDNQTXV9Rs2N40TMLUwzCAkCLXN6DICgIHKksUHLStJBc2sGybTAoK7HtdX7GhaCHEcxwPedp2fYbF2XVdbGfTdGAIHc9yjLBjxMM99QvRTr0o/U7wfJ84JfL4w31D8vx/DAVn1ACgJAvhwMgttOtgnrZCQ8RUIQ%2BQlDUKNdAMPCQHMX5CLCFjSPIlJKJoujUAYpibxIpo1M47i3HqCRYn8CYhN6QN4lE1oVNiLIxIYQyceeRHFIEdpBlR4ZKY4toxjJ7TRl0uneLZgysdkiQTMWMzgtSjhtVIHLeBsuyHOICtXj2ZrgDcjyvJ8vyiGIQLgtC5NwtISLot6OKw3SzLsqjPKrAKpMtF1p0srOF0XWiDQQw0TU9GiRLNVZYWNis3KY0KnWLI4OjxejDhtZtuYGKSBxJCAA%3D%3D)

src/java.base/share/classes/jdk/internal/foreign/abi/ppc64/CallArranger.java line 205:

> 203:                 stack = stackAlloc(4, 4);
> 204:             } else {
> 205:                 stack = stackAlloc(is32Bit ? 4 : 8, STACK_SLOT_SIZE);

This looks like a stack slot is always allocated. Please explain that for ABI V2 this is actually only required if it is know from a prototype that not all parameters can be passed in registers and that we plan to change this.

-------------

Changes requested by rrich (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/12708#pullrequestreview-1437623272
PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1201198779


More information about the hotspot-dev mailing list