add support for secondary carriers to memory access API
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Fri Jul 2 21:13:19 UTC 2021
Hi,
the Foreign Memory Access API does not support certain carrier types,
such as boolean or MemoryAddress. Some in-depth discussion on support
for booleans is captured in this excellent email [1] from John. I've
recently picked up those comments again, to see if the Memory Access API
can be enhanced to support more carriers.
Let's start with booleans. Providing support for boolean carriers would
be a nice to have, since the C language (since C99) has builtin support
for boolean types; while prior to C99, booleans were usually modeled as
ints in C code, C99 changes that and adds a new `bool` type which is
represented more efficiently (depending on the ABI). AFAIK, on all
64-bit platforms (both Arm and x64), a `bool`/`_Bool` is always
represented with _one_ byte, which is fortunate, given that this is also
the way in which the JVM represents booleans.
There are, however, important distinctions between C booleans and JVM
booleans:
* in the JVM, a boolean value can be only 1 or 0. No other value is
allowed (as per JVMS). This invariant is preserved even though the
bytecode instructions that operate on boolean arrays (baload/bastore)
are the _same_ ones used for byte arrays (that is, the JVM applies a
normalization step when storing values into boolean arrays)
* because of the above, the JVM test for "truth" is typically (value &
1), whereas in C the same test is typically expressed as (value != 0).
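A minimal sketch of how the two tests can disagree, in plain Java. The class and method names are hypothetical and only mirror the two expressions above; nothing here is part of the Memory Access API.

```java
// Hypothetical helper class: contrasts the two "truth" tests on a raw byte.
final class TruthTests {
    // JVM-style test: only the lowest bit is considered.
    static boolean jvmStyle(byte value) {
        return (value & 1) != 0;
    }

    // C-style test: any non-zero value counts as true.
    static boolean cStyle(byte value) {
        return value != 0;
    }

    public static void main(String[] args) {
        byte raw = 2; // a legal "true" for C code, but not a normalized JVM boolean
        System.out.println("jvmStyle(2) = " + jvmStyle(raw));
        System.out.println("cStyle(2)   = " + cStyle(raw));
    }
}
```

For the normalized values 0 and 1 both tests agree; they diverge only on bytes such as 2, which native code is free to produce.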
So, can we "just" add support for booleans to the memory access API? Not
quite, at least not w/o some tweaks. Given the distinctions mentioned
above, it is almost always a bad idea to use the Memory Access API to
access a Java boolean[]. If we allowed that, we would need to worry
about code like the one below:
```
boolean[] array = ...
MemorySegment segment = MemorySegment.ofArray(array);
MemoryAccess.setFloat(segment, 4.2f); // ???
```
That is, if we can _alias_ a boolean[] with a memory segment, then we
can just write a float value into the boolean[], which surely will
violate the invariant that each element of a boolean[] must be either 1
or 0.
So, unless we want to do some heroics - e.g. detect that the segment is
indeed backed by a boolean[], and then apply a normalization step on top
- which are likely to introduce extra costs and overheads, a much
saner solution seems to be to just _forbid_ memory segment views of heap
boolean[]. That is, no MemorySegment.ofArray(boolean[]).
That's good, but there's a bunch of related APIs which we need to
consider as well:
1) MemorySegment.toBooleanArray()
2) MemoryCopy.toArray(.... boolean[] ... )
3) MemoryCopy.fromArray(.... boolean[] ... )
These API points all allow, in one way or another, transferring the
contents of a memory segment into a boolean array on the heap (and
back). Now, for the reasons mentioned above, bulk-copying a segment into
a boolean[] is almost always a bad idea, since there's no guarantee that
the values in the source memory segment will be normalized according to
the JVMS. We could, in principle, add a copy operation which, instead of
using a simple bulk transfer, used a loop, and then normalized elements
one by one. Overall, I'm not convinced it makes sense to mix "true" bulk
transfer operations with "loopy" transfers under the same API names - as
that would create confusion on the performance expectations the users
would have when calling these API methods. So, that seems to suggest
that supporting (1) and (2) is also a no go.
But, if we disallowed these, then I don't think there's much point in
supporting (3) alone, even though, in itself, it would not be
problematic. In other words, I propose _not_ to add any of the above
three methods to the Foreign Memory Access API.
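To make the "loopy" transfer concrete, here is a rough sketch of what an element-by-element normalizing copy could look like. It uses a java.nio.ByteBuffer as a stand-in for a memory segment, and the class and method names are hypothetical; it is not code from the draft PR.

```java
import java.nio.ByteBuffer;

// Hypothetical helper: copies raw bytes into a boolean[] one element at a
// time, normalizing each value with the C-style test (value != 0).
final class NormalizingCopy {
    static boolean[] toBooleanArray(ByteBuffer src) {
        boolean[] dst = new boolean[src.remaining()];
        int base = src.position();
        for (int i = 0; i < dst.length; i++) {
            // Absolute get: does not move the buffer's position.
            dst[i] = src.get(base + i) != 0;
        }
        return dst;
    }
}
```

Every element goes through a compare, so this cannot be a single bulk memory copy, which is the performance mismatch described above.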
So, what can you do with booleans in this world? You can:
* create a VarHandle that dereferences a memory (byte) location, viewing
it as a Java boolean
* use boolean as a target carrier when defining a downcall method handle
I think this is still a nice outcome in terms of usability, especially
when considering code that wants to interop with C libraries using the
C99 `bool` type.
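As an illustration of the first bullet, the sketch below uses only java.lang.invoke: a byte[] stands in for the memory location, and a byte-typed VarHandle is wrapped so that callers see a boolean. The class name, the method names and the choice of (value != 0) when reading are assumptions; the VarHandle produced by the draft PR may adapt values differently.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical sketch: view a byte location as a Java boolean.
final class BooleanView {
    // Standard VarHandle for byte[] elements, coordinates: (byte[], int).
    private static final VarHandle BYTE_AT =
            MethodHandles.arrayElementVarHandle(byte[].class);

    // Read: any non-zero byte is treated as true (assumed C-style test).
    static boolean getBoolean(byte[] mem, int index) {
        byte raw = (byte) BYTE_AT.get(mem, index);
        return raw != 0;
    }

    // Write: always store a normalized 1 or 0.
    static void setBoolean(byte[] mem, int index, boolean value) {
        BYTE_AT.set(mem, index, (byte) (value ? 1 : 0));
    }
}
```

Because the adaptation happens per access, it fits the single-element dereference model and does not run into the bulk-copy concerns discussed earlier.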
For MemoryAddress, I think similar arguments apply, although adapting a
memory address into a long and back is not as problematic as it is for
booleans. The main issue with MemoryAddress is that, again, bulk
operations are only really "bulk" at the surface; for instance, moving
data from a memory segment into a MemoryAddress[] is done (a) by
bulk-copying a memory segment into a long[] (or int[] if on 32-bits!)
and then by (b) adapting each long back into a MemoryAddress instance.
Again, to encourage a world where API methods have a similar performance
model, I think that bulk operations involving MemoryAddress[] are
misleading, and go beyond the scope of the Foreign Memory Access API.
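The two steps (a) and (b) can be sketched as follows, assuming a 64-bit platform. A java.nio.ByteBuffer stands in for the memory segment and the nested Address class is a hypothetical stand-in for MemoryAddress; none of these names come from the API.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch of a "bulk" copy into an array of address objects.
final class AddressCopy {
    // Stand-in for MemoryAddress: just wraps the raw 64-bit value.
    static final class Address {
        final long raw;
        Address(long raw) { this.raw = raw; }
    }

    static Address[] toAddressArray(ByteBuffer src) {
        // Step (a): a genuine bulk copy into a long[].
        long[] raw = new long[src.remaining() / Long.BYTES];
        src.duplicate().order(ByteOrder.nativeOrder()).asLongBuffer().get(raw);

        // Step (b): per-element adaptation, one allocation per address.
        Address[] out = new Address[raw.length];
        for (int i = 0; i < raw.length; i++) {
            out[i] = new Address(raw[i]);
        }
        return out;
    }
}
```

Only step (a) behaves like a bulk transfer; step (b) is an ordinary loop whose cost grows with the number of elements.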
Circling back, it seems we have reached a state where the Foreign Memory
Access API provides support for _two_ kinds of carriers:
* primary carriers: byte, short, char, int, long, float, double
* secondary carriers: boolean, MemoryAddress
While the story, so far, has been to focus on the primary carriers, I
think there is room to expand that story and add support for "secondary"
carriers as well, albeit with the limitations described above (e.g. no
support for bulk copy).
A draft PR for this work is available here:
https://github.com/openjdk/panama-foreign/pull/564
Maurizio
[1] -
https://mail.openjdk.java.net/pipermail/panama-dev/2021-March/012580.html