aarch64 call arranger misclassifies structures

Nick Gasson nick.gasson at arm.com
Mon Oct 7 13:28:53 UTC 2024


Hi Maurizio,

> The FFM code seems to be very conservative: if there’s a member layout
> in the struct/union that’s not a value layout (padding, struct/union,
> sequence), then the struct/union is not an HFA. It is possible that’s
> too restrictive, but we’d need some authoritative opinion on how the
> rules have to be interpreted (cc’ing Nick).

Yes, you're right: it's too conservative at the moment and should
recursively consider the members of any embedded structs/unions.
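
A minimal sketch of what such a recursive classification could look
like, written in C to match the examples in this thread. The Layout
model, the Kind enum and all function names here are hypothetical and
are not the FFM TypeClass code. Treating a union as the maximum over its
members, and a zero-length sequence as contributing no members, are
assumptions based on the intent described further down, not settled
rules.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical layout model; these names are not part of the FFM API. */
typedef enum { K_NONE, K_FLOAT, K_DOUBLE, K_INT, K_PADDING,
               K_SEQUENCE, K_STRUCT, K_UNION } Kind;

typedef struct Layout {
    Kind kind;
    size_t count;                   /* elements (sequence) or members (group) */
    const struct Layout **members;  /* members[0] is the element of a sequence */
} Layout;

/* Returns the number of fundamental members, or -1 if the layout cannot
 * be an HFA. *base records the floating-point base type seen so far. */
static int count_members(const Layout *l, Kind *base)
{
    switch (l->kind) {
    case K_FLOAT:
    case K_DOUBLE:
        if (*base == K_NONE)
            *base = l->kind;
        return *base == l->kind ? 1 : -1;
    case K_SEQUENCE: {
        if (l->count == 0)
            return 0;               /* zero-length: contributes no members */
        int n = count_members(l->members[0], base);
        if (n <= 0)
            return n;
        return l->count > 4 ? -1 : n * (int)l->count;
    }
    case K_STRUCT:
    case K_UNION: {
        int total = 0;
        for (size_t i = 0; i < l->count; i++) {
            int n = count_members(l->members[i], base);
            if (n < 0)
                return -1;
            if (l->kind == K_STRUCT)
                total += n;         /* struct members sit side by side */
            else if (n > total)
                total = n;          /* union members overlap */
        }
        return total;
    }
    default:
        return -1;                  /* integers and padding disqualify (assumption) */
    }
}

/* An HFA has between one and four members of a single floating-point type. */
bool is_hfa(const Layout *l)
{
    Kind base = K_NONE;
    int n = count_members(l, &base);
    return n >= 1 && n <= 4;
}

/* Models a union of { struct { struct { int x; } arr1[arr1_len]; }; float arr2[3]; }. */
bool example_union_is_hfa(size_t arr1_len)
{
    Layout i32 = { K_INT, 0, NULL }, f32 = { K_FLOAT, 0, NULL };
    const Layout *inner_m[] = { &i32 };
    Layout inner = { K_STRUCT, 1, inner_m };
    const Layout *arr1_m[] = { &inner };
    Layout arr1 = { K_SEQUENCE, arr1_len, arr1_m };
    const Layout *anon_m[] = { &arr1 };
    Layout anon = { K_STRUCT, 1, anon_m };
    const Layout *arr2_m[] = { &f32 };
    Layout arr2 = { K_SEQUENCE, 3, arr2_m };
    const Layout *top_m[] = { &anon, &arr2 };
    Layout top = { K_UNION, 2, top_m };
    return is_hfa(&top);
}
```

With arr1_len == 0 the int member is never visited, so the union is
classified as an HFA of three floats; with arr1_len == 1 the int member
disqualifies it.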

> So, there seems to be some issue here, although, given the very vague
> definition of HFA in the ABI, it is not clear to me whether we’re
> staring at a FFM bug, an llvm bug, or a gcc bug.

I asked some internal ABI experts about this and apparently it's a known
inconsistency between Clang and GCC, and both seem to interpret the ABI
incorrectly in some cases.  The intent is that these HFA rules apply
after mapping the language level types to the fundamental data types
defined by the ABI, and a zero-length array or struct has no members
after this mapping so something like the following _should_ be an HFA:

  typedef union {
      struct {
          struct {
              int x;
          } arr1[0];
      };
      float arr2[3];
  } T22;

This is currently the case when compiled with Clang but not with GCC,
although neither of them treats the following case as an HFA:

  typedef union {
      struct {
          struct {
              long long x;
          } arr1[0];
      };
      float arr2[3];
  } T23;
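
One observable difference between the two unions can be checked
directly. The ABI text quoted below says the size of a homogeneous
aggregate is the base type size times the number of members, and that
its alignment is that of the base type. The following sketch tests that
shape for T22 and T23; the helper function names are made up for
illustration, zero-length arrays need GCC or Clang, and the result for
T23 assumes a target where long long is more strictly aligned than
float (as on aarch64). Whether this is what the compilers actually key
on is not something this shows.

```c
#include <stddef.h>

/* T22 and T23 as in the examples above (zero-length arrays are an extension). */
typedef union {
    struct {
        struct {
            int x;
        } arr1[0];
    };
    float arr2[3];
} T22;

typedef union {
    struct {
        struct {
            long long x;
        } arr1[0];
    };
    float arr2[3];
} T23;

/* Nonzero if T22 has the size and alignment of three packed floats. */
int t22_has_hfa_shape(void)
{
    return sizeof(T22) == 3 * sizeof(float)
        && _Alignof(T22) == _Alignof(float);
}

/* Same check for T23; the zero-length long long array still raises the
 * alignment of the union, and with it the padded size. */
int t23_has_hfa_shape(void)
{
    return sizeof(T23) == 3 * sizeof(float)
        && _Alignof(T23) == _Alignof(float);
}
```

So although arr1 has zero size in both unions, in T23 it can still
change the layout of the enclosing type.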

The SVE vector ABI that was introduced more recently has a similar
concept of a "Pure Scalable Type" and the spec there was clarified to
avoid these ambiguities:

> Composite Types have at least one member and the type of each member
> is either a Fundamental Data Type or another Composite Type.  Since
> all Fundamental Data Types have nonzero size, it follows that all
> members of a Composite Type have nonzero size.
>
> Any language-level members that have zero size must therefore
> disappear in the language-to-ABI mapping and do not affect whether the
> containing type is a Pure Scalable Type.

For the example which has different behaviour when compiled as C vs C++:

  typedef struct {
      struct {
          struct {
          } arr1[1];
      };
      float arr2[3];
  } T18;

This happens because the C++ standard defines an empty struct to have
non-zero size and a unique address, so when compiled as C++ there is
actually an implicit one-byte element in the struct, which is visible to
the HFA classification logic.
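
The C side of this can be observed with sizeof and offsetof. This is
only a sketch: the two helper functions are hypothetical names, and
empty structs in C are a GCC/Clang extension. When built as C with
those compilers, the empty struct is expected to have size zero, so
arr2 starts at offset 0 and the whole type is just three floats. Built
as C++, the same declaration would be expected to place arr2 after the
one-byte element plus padding.

```c
#include <stddef.h>

/* T18 as in the example above; the empty struct is an extension in C. */
typedef struct {
    struct {
        struct {
        } arr1[1];
    };
    float arr2[3];
} T18;

/* Total size of T18 as laid out by the C compiler. */
size_t t18_size(void)
{
    return sizeof(T18);
}

/* Offset of the float array; 0 means nothing precedes it in the layout. */
size_t t18_arr2_offset(void)
{
    return offsetof(T18, arr2);
}
```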

Also note that zero-length arrays are not actually valid in standard C;
GCC and Clang accept them as an extension.

--
Nick


On 07/10/24 09:46 am, Maurizio Cimadamore wrote:
> Hi,
> thanks for the report.
>
> By reading the Aarch64 ABI:
>
>> 5.9.5 Homogeneous Aggregates
>> A Homogeneous Aggregate is a composite type where all of the
>> Fundamental Data Types of the members
>> that compose the type are the same. The test for homogeneity is
>> applied after data layout is completed and
>> without regard to access control or other source language
>> restrictions. Note that for short-vector types the
>> fundamental types are 64-bit vector and 128-bit vector; the type of
>> the elements in the short vector does not
>> form part of the test for homogeneity.
>> A Homogeneous Aggregate has a Base Type, which is the Fundamental Data
>> Type of each Member. The
>> overall size is the size of the Base Type multiplied by the number
>> uniquely addressable Members; its
>> alignment will be the alignment of the Base Type.
>> 5.9.5.1 Homogeneous Floating-point Aggregates (HFA)
>> A Homogeneous Floating-point Aggregate (HFA) is a Homogeneous
>> Aggregate with a Fundamental Data
>> Type that is a Floating-Point type and at most four uniquely
>> addressable members.
>> 5.9.5.2 Homogeneous Short-Vector Aggregates (HVA)
>> A Homogeneous Short-Vector Aggregate (HVA) is a Homogeneous Aggregate
>> with a Fundamental Data
>> Type that is a Short-Vector type and at most four uniquely addressable
>> members.
>
> I’m frankly not at all clear on what should happen in some of the cases
> you have in your examples. For instance, this:
>
>   // HFA
>   typedef union {
>       struct {
>           struct {
>               int x;
>           } arr1[0];
>       };
>       float arr2[3];
>   } T22;
>
> Works as an HFA, I suppose, because “x” is not an addressable
> member. But then…
>
>   // not HFA, int registers
>   typedef union {
>       struct {
>           struct {
>               long long x;
>           } arr1[0];
>       };
>       float arr2[3];
>   } T23;
>
> This also only has one “addressable member” (whatever that means). But
> clang uses float registers in the former, not the latter.
>
> Also, switching godbolt to use GCC I don’t see any difference in the
> assembly generated for the two examples. They both seem to use int
> registers (e.g. mov, not fmov).
>
> So, there seems to be some issue here, although, given the very vague
> definition of HFA in the ABI, it is not clear to me whether we’re
> staring at a FFM bug, an llvm bug, or a gcc bug.
>
> The FFM code seems to be very conservative: if there’s a member layout
> in the struct/union that’s not a value layout (padding, struct/union,
> sequence), then the struct/union is not an HFA. It is possible that’s
> too restrictive, but we’d need some authoritative opinion on how the
> rules have to be interpreted (cc’ing Nick).
>
> Cheers
> Maurizio
>
> On 06/10/2024 10:16, Владимир Козелков wrote:
>
>> Hello!
>> Under the pull request <https://github.com/openjdk/jdk/pull/21041> I
>> was advised to write about new issues to this email.
>>
>> I'm building my own port of the Panama project to Android, which uses
>> LibLLVM for native code generation, so I've been doing a lot of
>> research into how the calling convention works there.
>>
>> Based on the data I received, I created my own classifier
>> <https://github.com/vova7878/PanamaPort/blob/master/AndroidPanama/src/main/java/com/v7878/foreign/_LLVMCallingConvention.java>
>> of structures for the arranger with an eye on LLVM (In most cases, it
>> calculates registers itself)
>>
>> Initially my implementation rejected structures and sequences of size
>> 0, but then I added support for them back. Everything worked pretty
>> well until I dug into the godbolt tests and compared my implementation
>> to the reference implementation from OpenJDK.
>>
>> It seems to me
>> that jdk.internal.foreign.abi.aarch64.TypeClass.isHomogeneousFloatAggregate
>> <https://github.com/openjdk/jdk/blob/260d4658aefe370d8994574c20057de07fd6f197/src/java.base/share/classes/jdk/internal/foreign/abi/aarch64/TypeClass.java#L85>
>> does not work correctly for union types and sequences with nested
>> GroupLayout. Additionally, I made examples in godbolt that show how it
>> determines HFA
>>
>> C++ https://godbolt.org/z/fcsMvaKhd
>> C https://godbolt.org/z/er8Yev7ME
>>
>> I don't have a PC on aarch64 to run tests on java, but it looks to me
>> like OpenJDK implementation is not correct for some cases
>
>

