add support for secondary carriers to memory access API

Maurizio Cimadamore maurizio.cimadamore at oracle.com
Fri Jul 2 21:13:19 UTC 2021


Hi,
the Foreign Memory Access API does not support certain carrier types, 
such as boolean or MemoryAddress. Some in-depth discussion on support 
for booleans is captured in this excellent email [1] from John. I've 
recently picked up those comments again, to see if the Memory Access API 
can be enhanced to support more carriers.

Let's start with booleans. Providing support for boolean carriers would 
be nice to have, since the C language (since C99) has built-in support 
for boolean types: while prior to C99 booleans were usually modeled as 
ints in C code, C99 changes that and adds a new `bool` type which is 
represented more efficiently (depending on the ABI). AFAIK, on all
64-bit platforms (both Arm and x64), a `bool`/`_Bool` is always 
represented with _one_ byte, which is fortunate, given that this is also 
the way in which the JVM represents booleans.

There are, however, important distinctions between C booleans and JVM 
booleans:

* in the JVM, a boolean value can only be 1 or 0. No other value is 
allowed (as per the JVMS). This invariant is preserved even though the 
bytecode instructions operating on boolean arrays (baload/bastore) are 
the _same_ ones used for byte arrays (that is, the JVM applies a 
normalization step when storing values into boolean arrays)
* because of the above, the JVM test for "truth" is typically (value & 
1), whereas in C the same test is typically expressed as (value != 0).
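To make the difference between the two truth tests concrete, the small sketch below (plain Java, no Memory Access API involved; the class and method names are made up for illustration) applies both tests to the same raw byte values. The tests agree on 0 and 1, but can disagree on other values: for a byte such as 2, which can never be a valid JVM boolean, the JVM-style test yields false while the C-style test yields true.

```java
public class TruthTests {
    // JVM-style test: only the lowest bit matters (valid for normalized 0/1 booleans)
    static boolean jvmTruth(byte value) {
        return (value & 1) != 0;
    }

    // C-style test: any non-zero value counts as true
    static boolean cTruth(byte value) {
        return value != 0;
    }

    public static void main(String[] args) {
        byte[] samples = {0, 1, 2, (byte) 0xFF};
        for (byte b : samples) {
            System.out.printf("value=%d jvm=%b c=%b%n", b, jvmTruth(b), cTruth(b));
        }
    }
}
```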

So, can we "just" add support for booleans to the memory access API? Not 
quite, at least not w/o some tweaks. Given the distinctions mentioned 
above, it is almost always a bad idea to use the Memory Access API to 
access a Java boolean[]. If we allowed that, we would need to worry 
about code like the one below:

```
boolean[] array = ...
MemorySegment segment = MemorySegment.ofArray(array); // if this overload existed
MemoryAccess.setFloat(segment, 4.2f); // ???
```

That is, if we can _alias_ a boolean[] with a memory segment, then we 
can just write a float value into the boolean[], which will surely 
violate the invariant that each element of a boolean[] must be either 1
or 0.
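Since MemorySegment.ofArray(boolean[]) is precisely the method under discussion, the sketch below does not use it. Instead it assumes a little-endian platform and uses a plain java.nio.ByteBuffer to compute the four bytes that a store such as setFloat(segment, 4.2f) would leave in the aliased array, then counts how many of them would not be valid JVM booleans (class and method names are hypothetical):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

public class FloatIntoBooleans {
    // Returns the bytes a 4-byte float store would leave in memory
    // (little-endian assumed, as on x64 and AArch64)
    static byte[] floatBytes(float value) {
        return ByteBuffer.allocate(Float.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putFloat(0, value)
                .array();
    }

    // Counts bytes that would be invalid if read back as JVM booleans (not 0 or 1)
    static int invalidBooleanCount(byte[] bytes) {
        int count = 0;
        for (byte b : bytes) {
            if (b != 0 && b != 1) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        byte[] bytes = floatBytes(4.2f);
        System.out.println(Arrays.toString(bytes));
        System.out.println("invalid boolean elements: " + invalidBooleanCount(bytes));
    }
}
```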

So, unless we want to do some heroics - e.g. detect that the segment is 
indeed backed by a boolean[], and then apply a normalization step on top 
- which are likely to introduce extra costs and overheads, a much 
saner solution seems to be to just _forbid_ memory segment views of heap 
boolean[]. That is, no MemorySegment.ofArray(boolean[]).

That's good, but there are a bunch of related APIs which we need to 
consider as well:

1) MemorySegment.toBooleanArray()
2) MemoryCopy.toArray(.... boolean[] ... )
3) MemoryCopy.fromArray(.... boolean[] ... )

These API points all allow, in one way or another, transferring the 
contents of a memory segment into a boolean array on the heap (and 
back). Now, for the reasons mentioned above, bulk-copying a segment into
a boolean[] is almost always a bad idea, since there's no guarantee that 
the values in the source memory segment will be normalized according to 
the JVMS. We could, in principle, add a copy operation which, instead of 
using a simple bulk transfer, used a loop, and then normalized elements 
one by one. Overall, I'm not convinced it makes sense to mix "true" bulk 
transfer operations with "loopy" transfers under the same API names - as 
that would create confusion on the performance expectations the users 
would have when calling these API methods. So, that seems to suggest 
that supporting (1) and (2) is also a no-go.
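The two copy strategies can be contrasted with a minimal sketch. It assumes a java.nio.ByteBuffer as a stand-in for a memory segment, the method names are hypothetical, and normalizing with the C-style (value != 0) test is an assumption about what such a copy would do. The bulk variant moves all bytes with a single call, while the normalizing variant has to touch every element:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class NormalizingCopy {
    // A "true" bulk transfer: one call copies all bytes, no per-element work
    static void bulkCopy(ByteBuffer src, byte[] dst) {
        src.duplicate().get(dst); // duplicate() leaves src's position untouched
    }

    // A "loopy" transfer: every byte is read and normalized to a valid JVM
    // boolean using the C-style (value != 0) truth test
    static void normalizingCopy(ByteBuffer src, boolean[] dst) {
        for (int i = 0; i < dst.length; i++) {
            dst[i] = src.get(i) != 0;
        }
    }

    public static void main(String[] args) {
        ByteBuffer src = ByteBuffer.wrap(new byte[] {0, 1, 2, -1});
        boolean[] flags = new boolean[4];
        normalizingCopy(src, flags);
        System.out.println(Arrays.toString(flags)); // [false, true, true, true]
    }
}
```

Both methods move the same number of bytes, but only the first maps to a single bulk transfer; the second is an ordinary element-by-element loop, which is the kind of mismatch in performance expectations that sharing one API name would hide.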

But, if we disallowed these, then I don't think there's much point in 
supporting (3) alone, even though, in itself, it would not be 
problematic. In other words, I propose _not_ to add any of the above 
three methods to the Foreign Memory Access API.

So, what can you do with booleans in this world? You can:

* create a VarHandle that dereferences a memory (byte) location, viewing 
it as a Java boolean
* use boolean as a target carrier when defining a downcall method handle
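The boolean view var handle itself is not shown here; the sketch below only illustrates, under stated assumptions, the conversions such a view would presumably perform on each access. It uses a direct java.nio.ByteBuffer as a stand-in for native memory and hypothetical helper names, and it assumes reads use the C-style (value != 0) test while writes store a normalized 0 or 1 in a single byte, matching the one-byte `bool` layout mentioned above:

```java
import java.nio.ByteBuffer;

public class BooleanAccess {
    // Read: view the byte at offset as a Java boolean; any non-zero byte reads
    // as true, so the result is always a valid (normalized) JVM boolean
    static boolean getBoolean(ByteBuffer memory, int offset) {
        return memory.get(offset) != 0;
    }

    // Write: store a normalized 0 or 1 in a single byte
    static void setBoolean(ByteBuffer memory, int offset, boolean value) {
        memory.put(offset, (byte) (value ? 1 : 0));
    }

    public static void main(String[] args) {
        ByteBuffer memory = ByteBuffer.allocateDirect(2);
        memory.put(0, (byte) 42);                  // a raw, non-normalized byte
        setBoolean(memory, 1, true);
        System.out.println(getBoolean(memory, 0)); // true
        System.out.println(memory.get(1));         // 1
    }
}
```

Because a single-element access is inherently per-element, the normalization step here does not undermine any bulk-transfer performance expectations, unlike in the copy case.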

I think this is still a nice outcome in terms of usability, especially 
when considering code that wants to interop with C libraries using the 
C99 `bool` type.

For MemoryAddress, I think similar arguments apply, although adapting a 
memory address into a long and back is not as problematic as it is for 
booleans. The main issue with MemoryAddress is that, again, bulk 
operations are only really "bulk" at the surface; for instance, moving 
data from a memory segment into a MemoryAddress[] is done (a) by 
bulk-copying a memory segment into a long[] (or int[] if on 32-bits!) 
and then by (b) adapting each long back into a MemoryAddress instance. 
Again, to encourage a world where API methods have a similar performance 
model, I think that bulk operations involving MemoryAddress[] are 
misleading, and go beyond the scope of the Foreign Memory Access API.
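The two-step shape of such a MemoryAddress[] copy can be sketched as follows. This assumes a 64-bit platform (so each address fits in a long), uses a java.nio.ByteBuffer in native byte order as a stand-in for a memory segment, and uses a hypothetical Address class in place of MemoryAddress:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class AddressArrayCopy {
    // Hypothetical stand-in for MemoryAddress: just wraps a raw 64-bit address
    static final class Address {
        final long raw;

        Address(long raw) {
            this.raw = raw;
        }
    }

    static Address[] toAddressArray(ByteBuffer src, int count) {
        // (a) a genuine bulk copy into a long[] (would be int[] on 32-bit platforms)
        long[] raw = new long[count];
        src.duplicate().order(ByteOrder.nativeOrder()).asLongBuffer().get(raw);
        // (b) one wrapper allocation per element: this part is a plain loop
        Address[] result = new Address[count];
        for (int i = 0; i < count; i++) {
            result[i] = new Address(raw[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        ByteBuffer memory = ByteBuffer.allocateDirect(2 * Long.BYTES).order(ByteOrder.nativeOrder());
        memory.putLong(0, 0x1000L).putLong(Long.BYTES, 0x2000L);
        Address[] addresses = toAddressArray(memory, 2);
        System.out.println(Long.toHexString(addresses[1].raw)); // 2000
    }
}
```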

Circling back, it seems we have reached a state where the Foreign Memory 
Access API provides support for _two_ kinds of carriers:

* primary carriers: byte, short, char, int, float, long, double
* secondary carriers: boolean, MemoryAddress

While the story, so far, has been to focus on the primary carriers, I 
think there is room to expand that story and add support for "secondary" 
carriers as well, albeit with the limitations described above (e.g. no 
support for bulk copy).

A draft PR for this work is available here:

https://github.com/openjdk/panama-foreign/pull/564

Maurizio

[1] - https://mail.openjdk.java.net/pipermail/panama-dev/2021-March/012580.html




More information about the panama-dev mailing list