RFR: 7903649: Field and global variables of array type should have indexed accessors

Tue Jan 30 11:42:39 UTC 2024

On Tue, 30 Jan 2024 10:18:00 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:

>> P.S. I guess what I'm implying here is: maybe we need a way to make getting the length of an array field easier as well
>
> Good observation... unfortunately I'm not too sure how to address this - in the sense that a single field can have N length accessors (one per dimension) - well, or at least N static fields. We could simply expose the field layout (and leave it to the client to decode the dimensions) - but that's not too usable, and it also feels a bit redundant (since the field layout can be derived, as you show, from the struct layout). Also, for globals we ditched layouts (because of accessor naming troubles), so the same approach wouldn't work there.
> 
> It feels like doubling down on layouts is the right way to go - and it would be generally useful to maybe have a layout per field/global variable. But the logistics of it are complicated, especially for global variables (where you'd need an extra holder class for each global variable to store the layout), and naming can be a pain too.

Thinking a bit more... while the hardwired array lengths are not ideal, I guess I still see the glass as half-full: the javadoc of the accessor shows the array declaration in full, and that will have the correct sizes displayed. Yes, it's a bit of a pity that the sizes are only captured in the javadoc, but I'm not sure we can fix that w/o moving to a more complex translation scheme. Also, the indices passed to the array accessor are validated dynamically (as the array handle is derived from the layout), so there's at least some dynamic safety.
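
To make the dynamic check concrete, here is a minimal sketch, assuming the Java 22 `java.lang.foreign` API. The class name `IndexedAccessDemo` and the field declaration `int data[3][4]` are made up for illustration; this is not the code jextract emits.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemoryLayout;
import java.lang.foreign.MemoryLayout.PathElement;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.SequenceLayout;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.VarHandle;

class IndexedAccessDemo {
    // Hypothetical layout for a C field declared as `int data[3][4]`.
    static final SequenceLayout DATA_LAYOUT =
            MemoryLayout.sequenceLayout(3,
                    MemoryLayout.sequenceLayout(4, ValueLayout.JAVA_INT));

    // Handle derived from the layout. In Java 22 its coordinates are
    // (MemorySegment, long baseOffset, long i, long j).
    static final VarHandle DATA_HANDLE =
            DATA_LAYOUT.varHandle(PathElement.sequenceElement(),
                                  PathElement.sequenceElement());

    // Writes `value` at data[i][j] in a fresh segment and reads it back.
    static int roundTrip(long i, long j, int value) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment segment = arena.allocate(DATA_LAYOUT);
            DATA_HANDLE.set(segment, 0L, i, j, value);
            return (int) DATA_HANDLE.get(segment, 0L, i, j);
        }
    }

    // Returns true if the handle rejects the given indices at run time.
    static boolean rejects(long i, long j) {
        try {
            roundTrip(i, j, 0);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(2, 3, 42)); // in-bounds access
        System.out.println(rejects(3, 0));       // first index past the bound
        System.out.println(rejects(0, 4));       // second index past the bound
    }
}
```

The bounds come from the sequence layouts themselves, so the client never has to pass the sizes in.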

If we want to capture the dimensions in the generated code, we need at least a single `long[]` array. But then the problem is with naming: if we call this `<fieldName>$dims`, we go back to dollar names (which we tried to back away from with recent changes). The biggest issue of using simpler names (as we do now) is that we don't really have a great story for adding extra accessors in a way that is easily discoverable for clients (we have this same issue for exposing layouts of global variables, or function descriptors of native functions).
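
For reference, this is roughly what a client has to do today to recover the dimensions from a field layout, and what a generated `long[]` constant would replace. It is a sketch assuming the Java 22 `java.lang.foreign` API; `ArrayDims`, `dimensions` and `exampleDims` are hypothetical names, not part of jextract.

```java
import java.lang.foreign.MemoryLayout;
import java.lang.foreign.SequenceLayout;
import java.lang.foreign.ValueLayout;
import java.util.ArrayList;
import java.util.List;

class ArrayDims {
    // Peels nested sequence layouts, outermost first, collecting element counts.
    // A non-array layout yields an empty array.
    static long[] dimensions(MemoryLayout layout) {
        List<Long> dims = new ArrayList<>();
        while (layout instanceof SequenceLayout seq) {
            dims.add(seq.elementCount());
            layout = seq.elementLayout();
        }
        return dims.stream().mapToLong(Long::longValue).toArray();
    }

    // Hypothetical example: the layout of a C field declared as `int data[3][4]`.
    static long[] exampleDims() {
        MemoryLayout dataLayout = MemoryLayout.sequenceLayout(3,
                MemoryLayout.sequenceLayout(4, ValueLayout.JAVA_INT));
        return dimensions(dataLayout);
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(exampleDims()));
    }
}
```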

The only half-smart idea I have in this respect is to add, for each symbol S, a static class with the same name which contains all the relevant constants. E.g.

class Foo_h {
    int x() { ... }
    void x(int val) { ... }
    public static class x {
        static final MemoryLayout LAYOUT = ...;
        ...
    }
}

That is, make the holder class we generate _public_, so that it can be a placeholder for constants that can be useful to access from clients of the generated code. At least for the header class, this approach doesn't add any overhead, given that we emit a holder class per function/global variable _anyway_. For structs, we currently do not generate holder classes, so this would increase overhead in the generated code. But, perhaps, a more regular (even if dumber) translation scheme for _all_ variables/fields might result in more predictability/discoverability? Separately, it feels like the code generation for structs is doing quite a bit of stuff in its static initializers, like calling `MemoryLayout::byteOffset` or `MemoryLayout::varHandle`, so perhaps moving this logic away into a holder class might be beneficial anyway.
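
A self-contained sketch of how the naming could work out, in plain Java so it runs without the foreign API. Everything here is hypothetical: the global `int x[2][3]`, the `DIMS` constant and the `int[]` standing in for native memory are assumptions for illustration, not what jextract generates. The point is only that a method `x` and a nested class `x` can coexist in the same class, since methods and types live in separate namespaces.

```java
import java.util.Objects;

class Foo_h {
    // Stand-in storage for a hypothetical global `int x[2][3]`.
    // Real bindings would read and write a memory segment instead.
    private static final int[] STORAGE = new int[2 * 3];

    // Public holder with the same name as the symbol; clients find
    // the constants for `x` under `Foo_h.x`.
    public static final class x {
        // Array dimensions, outermost first (a sketch: a real constant
        // should not expose a mutable array).
        public static final long[] DIMS = {2, 3};

        private x() {}
    }

    // Indexed getter for x[i][j].
    public static int x(long i, long j) {
        return STORAGE[offset(i, j)];
    }

    // Indexed setter for x[i][j].
    public static void x(long i, long j, int val) {
        STORAGE[offset(i, j)] = val;
    }

    // Validates both indices against DIMS, then flattens them (row-major).
    private static int offset(long i, long j) {
        Objects.checkIndex(i, x.DIMS[0]);
        Objects.checkIndex(j, x.DIMS[1]);
        return (int) (i * x.DIMS[1] + j);
    }

    public static void main(String[] args) {
        Foo_h.x(1, 2, 7);
        System.out.println(Foo_h.x(1, 2));
        System.out.println(Foo_h.x.DIMS.length);
    }
}
```

A client would then write `Foo_h.x(i, j)` to read an element and `Foo_h.x.DIMS` to query the sizes, with no dollar names involved.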

-------------

PR Review Comment: https://git.openjdk.org/jextract/pull/198#discussion_r1471064012