[foreign-jextract] RFR: Refresh docs to reflect Java 18 API changes

Tue Jan 18 14:52:09 UTC 2022

On Tue, 18 Jan 2022 13:59:16 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:

> This patch updates the memory access and linker docs to reflect the Java 18 API changes.

Looks good! Some suggestions inline.

doc/panama_ffi.md line 7:

> 5: **Maurizio Cimadamore**
> 6: 
> 7: Panama support foreign functions through the Foreign Memory Access API, which has been available as an [incubating](https://openjdk.java.net/jeps/11) API since Java [16](https://openjdk.java.net/jeps/389). The central abstraction in the Foreign Linker API is the *foreign linker*, which allows clients to construct *downcall* method handles — that is, method handles whose invocation targets a native function defined in some native library. In other words, Panama foreign function support is completely expressed in terms of Java code and no intermediate native code is required.

Suggestion:

Panama supports foreign functions through the Foreign Linker API, which has been available as an [incubating](https://openjdk.java.net/jeps/11) API since Java [16](https://openjdk.java.net/jeps/389). The central abstraction in the Foreign Linker API is the *foreign linker*, which allows clients to construct *downcall* method handles — that is, method handles whose invocation targets a native function defined in some native library. In other words, Panama foreign function support is completely expressed in terms of Java code and no intermediate native code is required.

doc/panama_ffi.md line 13:

> 11: Before we dive into the specifics of the foreign function support, it would be useful to briefly recap some of the main concepts we have learned when exploring the [foreign memory access support](panama_memaccess.md). The Foreign Memory Access API allows client to create and manipulate *memory segments*. A memory segment is a view over a memory source (either on- or off-heap) which is spatially bounded, temporally bounded and thread-confined. The guarantees ensure that dereferencing a segment that has been created by Java code is always *safe*, and can never result in a VM crash, or, worse, in silent memory corruption.
> 12: 
> 13: Now, in the case of memory segments, the above properties (spatial bounds, temporal bounds and confinement) can be known *in full* when the segment is created. But when we interact with native libraries we often receiving *raw* pointers; such pointers have no spatial bounds (does a `char*` in C refer to one `char`, or a `char` array of a given size?), no notion of temporal bounds, nor thread-confinement. Raw addresses in our interop support are modeled using the `MemoryAddress` abstraction.

Suggestion:

Now, in the case of memory segments, the above properties (spatial bounds, temporal bounds and confinement) can be known *in full* when the segment is created. But when we interact with native libraries we often receive *raw* pointers; such pointers have no spatial bounds (does a `char*` in C refer to one `char`, or a `char` array of a given size?), no notion of temporal bounds, nor thread-confinement. Raw addresses in our interop support are modeled using the `MemoryAddress` abstraction.

doc/panama_ffi.md line 109:

> 107: ```java
> 108: interface CLinker {
> 109:     MethodHandle downcallHandle(NativeSymbol func, MethodType type, FunctionDescriptor function);

Suggestion:

    MethodHandle downcallHandle(NativeSymbol func, FunctionDescriptor function);

doc/panama_ffi.md line 176:

> 174: Here we are using a native segment allocator to convert a Java string into an off-heap memory segment which contains a `NULL` terminated C string. We then pass that segment to the method handle and retrieve our result in a Java `long`. Note how all this is possible *without* any piece of intervening native code — all the interop code can be expressed in (low level) Java. Note also how we use an explicit resource scope to control the lifecycle of the allocated C string, which ensures timely deallocation of the memory segment holding the native string.
> 175: 
> 176: The `CLinker` interface also supports linking of native function without an address known at link time; when that happens, an address (of type `Addressable`) must be provided when the method handle returned by the linker is invoked — this is very useful to support *virtual calls*. For instance, the above code can be rewritten as follows:

Suggestion:

The `CLinker` interface also supports linking of native functions without an address known at link time; when that happens, an address (of type `Addressable`) must be provided when the method handle returned by the linker is invoked — this is very useful to support *virtual calls*. For instance, the above code can be rewritten as follows:

doc/panama_ffi.md line 194:

> 192: It is important to note that, albeit the interop code is written in Java, the above code can *not* be considered 100% safe. There are many arbitrary decisions to be made when setting up downcall method handles such as the one above, some of which might be obvious to us (e.g. how many parameters does the function take), but which cannot ultimately be verified by the Panama runtime. After all, a symbol in a dynamic library is nothing but a numeric offset and, unless we are using a shared library with debugging information, no type information is attached to a given library symbol. This means that the Panama runtime has to *trust* the function descriptor passed in<a href="#3"><sup>3</sup></a>; for this reason, access to the foreign linker is a restricted operation, which can only be performed if the requesting module is listed in the `--enable-native-access` command-line flag.
> 193: 
> 194: If a native function returns a raw pointer (of type `MemoryAddress`), it is then up to the client to make sure that the address is being accessed and disposed of correctly, compatibly with the requirements of the underlying native library. If a native function returns a struct by value, a *fresh*, memory segment is allocated off-heap and returned to the callee. In such cases, the foreign linker API will request an additional prefix `SegmentAllocator` (see above) parameter which will be used by `CLinker` to allocate the returned segment. The allocation will likely associate the segment with a *resource scope* that is known to the callee and which can then be used to release the memory associated with that segment. 

Suggestion:

If a native function returns a raw pointer (of type `MemoryAddress`), it is then up to the client to make sure that the address is being accessed and disposed of correctly, compatibly with the requirements of the underlying native library. If a native function returns a struct by value, a *fresh* memory segment is allocated off-heap and returned to the caller. In such cases, the downcall method handle will feature an additional prefix `SegmentAllocator` (see above) parameter which will be used by the downcall method handle to allocate the returned segment. The allocation will likely associate the segment with a *resource scope* that is known to the caller and which can then be used to release the memory associated with that segment.

doc/panama_ffi.md line 288:

> 286: try (ResourceScope scope = ResourceScope.newConfinedScope()) {
> 287:     SegmentAllocator malloc = SegmentAllocator.nativeAllocator(scope);
> 288: 	printf.invoke(malloc.allocateUtf8String("%d plus %d equals %d"), 2, 2, 4); //prints "2 plus 2 equals 4"

Suggestion:

    SegmentAllocator malloc = SegmentAllocator.nativeAllocator(scope);
    printf.invoke(malloc.allocateUtf8String("%d plus %d equals %d"), 2, 2, 4); //prints "2 plus 2 equals 4"

doc/panama_memaccess.md line 113:

> 111: In addition to spatial bounds, memory segments also feature temporal bounds as well as thread-confinement. In the examples shown so far, we have always used the API in its simpler form, leaving the runtime to handle details such as whether it was safe or not to reclaim memory associated with a given memory segment. But there are cases where this behavior is not desirable: consider the case where a large memory segment is mapped from a file (this is possible using `MemorySegment::map`); in this case, an application would probably prefer to deterministically release (e.g. unmap) the memory associated with this segment, to ensure that memory doesn't remain available for longer than in needs to (and therefore potentially impacting the performance of the application).
> 112: 
> 113: Memory segments support deterministic deallocation, through an abstraction called `ResourceScope`. A resource scope models the lifecycle associated with one or more resources (in this document, by resources we mean mostly memory segments); a resource scope has a state: it starts off in the *alive* state, which means that all the resources it manages can be safely accessed — and, at the user request, it can be *closed*. After a resource scope is closed, access to resources managed by that scope is no longer allowed. Resource scopes support the `AutoCloseable` interface, and can therefore be used with the *try-with-resources* construct, as demonstrated in the following code:

Suggestion:

Memory segments support deterministic deallocation, through an abstraction called `ResourceScope`. A resource scope models the lifecycle associated with one or more resources (in this document, by resources we mean mostly memory segments); a resource scope has a state: it starts off in the *alive* state, which means that all the resources it manages can be safely accessed — and, at the user's request, it can be *closed*. After a resource scope is closed, access to resources managed by that scope is no longer allowed. Resource scopes implement the `AutoCloseable` interface, and can therefore be used with the *try-with-resources* construct, as demonstrated in the following code:
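
As an aside, the alive/closed lifecycle described in that paragraph can be illustrated with a toy model in plain Java. This is only a sketch: `ToyScope` and `ToyResource` are hypothetical names made up for illustration, they are not part of the Panama API, and the real `ResourceScope` does considerably more (confinement, cleanup actions, and so on).

```java
public class ScopeSketch {
    /** Hypothetical stand-in for a resource scope: alive until closed. */
    static final class ToyScope implements AutoCloseable {
        private boolean alive = true;

        boolean isAlive() {
            return alive;
        }

        /** Throws if the scope has already been closed. */
        void checkAlive() {
            if (!alive) {
                throw new IllegalStateException("scope is closed");
            }
        }

        @Override
        public void close() {
            alive = false;
        }
    }

    /** Hypothetical resource whose accessors are guarded by its scope. */
    static final class ToyResource {
        private final ToyScope scope;
        private int value;

        ToyResource(ToyScope scope) {
            this.scope = scope;
        }

        int get() {
            scope.checkAlive();
            return value;
        }

        void set(int v) {
            scope.checkAlive();
            value = v;
        }
    }

    /** Returns true if access after the try-with-resources block is rejected. */
    static boolean accessAfterCloseFails() {
        ToyScope scope = new ToyScope();
        ToyResource resource = new ToyResource(scope);
        try (scope) {              // scope is closed when this block exits
            resource.set(42);      // allowed: the scope is still alive
        }
        try {
            resource.get();        // the scope is closed by now
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(accessAfterCloseFails());
    }
}
```

The point of the sketch is only that every access goes through a liveness check owned by the scope, so closing the scope invalidates all resources attached to it at once.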

doc/panama_memaccess.md line 223:

> 221: That said, it is sometimes necessary to create a segment out of an existing memory source, which might be managed by native code. This is the case, for instance, if we want to create a segment out of a memory region managed by a *custom allocator*.
> 222: 
> 223: The ByteBuffer API allows such a move, through a JNI [method](https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/functions.html#NewDirectByteBuffer), namely `NewDirectByteBuffer`. This native method can be used to wrap a long address in a fresh byte direct buffer instance which is then returned to unsuspecting Java code.

Suggestion:

The ByteBuffer API allows such a move, through a JNI [method](https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/functions.html#NewDirectByteBuffer), namely `NewDirectByteBuffer`. This native method can be used to wrap a long address in a fresh direct byte buffer instance which is then returned to unsuspecting Java code.

-------------

Marked as reviewed by jvernee (Committer).

PR: https://git.openjdk.java.net/panama-foreign/pull/630