RFR: 8254162: Implementation of Foreign-Memory Access API (Third Incubator) [v9]

Tue Oct 13 10:27:43 UTC 2020

> This patch contains the changes associated with the third incubation round of the foreign memory access API
> (see JEP 393 [1]). This iteration focuses on improving the usability of the API in 3 main ways:
> * first, by providing a way to obtain truly *shared* segments, which can be accessed and closed concurrently from
>   multiple threads
> * second, by providing a way to register a memory segment against a `Cleaner`, so as to have some (optional) guarantee
>   that the memory will be deallocated, eventually
> * third, by not requiring users to dive deep into var handles when they first pick up the API; a new `MemoryAccess` class
>   has been added, which defines several useful dereference routines; these are really just thin wrappers around memory
>   access var handles, but they make the barrier to entry for using this API somewhat lower.
> 
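To illustrate the second point, here is a rough sketch of the general pattern, built on the standard `java.lang.ref.Cleaner`. It is not the code behind `MemorySegment::registerCleaner`; `CleanableResource` and `State` are hypothetical names made up for this example, and the "memory" being released is just a flag. The sketch assumes the usual rule that the cleanup action must not capture the object it is registered against.

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for an object that owns some off-heap memory.
final class CleanableResource implements AutoCloseable {
    private static final Cleaner CLEANER = Cleaner.create();

    // The cleanup action lives in a static nested class so that it does not
    // capture the resource; otherwise the resource could never become unreachable.
    private static final class State implements Runnable {
        final AtomicBoolean released = new AtomicBoolean(false);

        @Override
        public void run() {
            released.set(true); // real code would free the memory here
        }
    }

    private final State state = new State();
    private final Cleaner.Cleanable cleanable = CLEANER.register(this, state);

    boolean isReleased() {
        return state.released.get();
    }

    // Explicit release; the Cleaner guarantees the action runs at most once.
    @Override
    public void close() {
        cleanable.clean();
    }

    public static void main(String[] args) {
        CleanableResource resource = new CleanableResource();
        System.out.println(resource.isReleased());
        resource.close();
        System.out.println(resource.isReleased());
    }
}
```

If `close` is never called, the same action runs at some point after the resource becomes unreachable, which is the "(optional) guarantee" mentioned above.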
> A big conceptual shift that comes with this API refresh is that the role of `MemorySegment` and `MemoryAddress` is not
> the same as it used to be; it used to be the case that a memory address could (sometimes, not always) have a back link
> to the memory segment which originated it; additionally, memory access var handles used `MemoryAddress` as a basic unit
> of dereference.
> 
> This has all changed as per this API refresh; now a `MemoryAddress` is just a dumb carrier which wraps a pair of
> object/long addressing coordinates; `MemorySegment` has become the star of the show, as far as dereferencing memory is
> concerned. You cannot dereference memory if you don't have a segment. This improves usability in a number of ways:
> first, it is a lot easier to wrap native addresses (`long`, essentially) into a `MemoryAddress`; secondly, it is
> crystal clear what a client has to do in order to dereference memory: if a client has a segment, it can use that;
> otherwise, if the client only has an address, it will have to create a segment *unsafely* (this can be done by calling
> `MemoryAddress::asSegmentRestricted`).
> 
> A list of the API, implementation and test changes is provided below. If you have any questions, or need more detailed
> explanations, I (and the rest of the Panama team) will be happy to point at existing discussions, and/or to provide
> the feedback required.
> 
> A big thank you to Erik Osterlund, Vladimir Ivanov and David Holmes, without whom the work on shared memory segments
> would not have been possible; also I'd like to thank Paul Sandoz, whose insights on API design have been very helpful
> in this journey.
> 
> Thanks
> Maurizio
> 
> Javadoc:
> 
> http://cr.openjdk.java.net/~mcimadamore/8254162_v1/javadoc/jdk/incubator/foreign/package-summary.html
> 
> Specdiff:
> 
> http://cr.openjdk.java.net/~mcimadamore/8254162_v1/specdiff/jdk/incubator/foreign/package-summary.html
> 
> CSR:
> 
> https://bugs.openjdk.java.net/browse/JDK-8254163
> 
> 
> 
> ### API Changes
> 
> * `MemorySegment`
>   * drop factory for restricted segment (this has been moved to `MemoryAddress`, see below)
>   * add a no-arg factory for a native restricted segment representing the entire native heap
>   * rename `withOwnerThread` to `handoff`
>   * add new `share` method, to create shared segments
>   * add new `registerCleaner` method, to register a segment against a cleaner
>   * add more helpers to create arrays from a segment e.g. `toIntArray`
>   * add some `asSlice` overloads (to make up for the fact that now segments are more frequently used as cursors)
>   * rename `baseAddress` to `address` (so that `MemorySegment` can implement `Addressable`)
> * `MemoryAddress`
>   * drop `segment` accessor
>   * drop `rebase` method and replace it with `segmentOffset` which returns the offset (a `long`) of this address relative
>     to a given segment
> * `MemoryAccess`
>   * New class supporting several static dereference helpers; the helpers are organized by carrier and access mode, where a
>     carrier is one of the usual suspects (a Java primitive, minus `boolean`); the access mode can be simple (e.g. access
>     base address of given segment), or indexed, in which case the accessor takes a segment and either a low-level byte
>     offset, or a high-level logical index. The classification is reflected in the naming scheme (e.g. `getByte` vs.
>     `getByteAtOffset` vs. `getByteAtIndex`).
> * `MemoryHandles`
>   * drop `withOffset` combinator
>   * drop `withStride` combinator
>   * the basic memory access handle factory now returns a var handle which takes a `MemorySegment` and a `long` - from which
>     it is easy to derive all the other handles using plain var handle combinators.
> * `Addressable`
>   * This is a new interface which is attached to entities which can be projected to a `MemoryAddress`. For now, both
>     `MemoryAddress` and `MemorySegment` implement it; we have plans, with JEP 389 [2], to add more implementations. Clients
>     can largely ignore this interface, though it comes in really handy when defining native bindings with tools like `jextract`.
> * `MemoryLayouts`
>   * A new layout, for machine addresses, has been added to the mix.
> 
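As a rough illustration of the `MemoryAccess` naming scheme (simple vs. `AtOffset` vs. `AtIndex`), the sketch below mimics it on top of a plain `ByteBuffer`. `BufferAccess` is a hypothetical class written for this example, not part of the patch; it uses `int` offsets because that is what `ByteBuffer` takes. The point is only that an "index" accessor is a byte-offset accessor with the index scaled by the carrier size.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper mimicking the getX / getXAtOffset / getXAtIndex naming scheme.
final class BufferAccess {
    private BufferAccess() {}

    // Simple mode: access the base of the buffer (byte offset 0).
    static int getInt(ByteBuffer buf) {
        return getIntAtOffset(buf, 0);
    }

    // Offset mode: the coordinate is a low-level byte offset.
    static int getIntAtOffset(ByteBuffer buf, int byteOffset) {
        return buf.getInt(byteOffset);
    }

    // Index mode: the coordinate is a logical index, scaled by the carrier size.
    static int getIntAtIndex(ByteBuffer buf, int index) {
        return getIntAtOffset(buf, Math.multiplyExact(index, Integer.BYTES));
    }

    static void setIntAtIndex(ByteBuffer buf, int index, int value) {
        buf.putInt(Math.multiplyExact(index, Integer.BYTES), value);
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(16).order(ByteOrder.nativeOrder());
        for (int i = 0; i < 4; i++) {
            setIntAtIndex(buf, i, i * 10);
        }
        System.out.println(getInt(buf));            // element 0
        System.out.println(getIntAtOffset(buf, 8)); // byte offset 8 is element 2
        System.out.println(getIntAtIndex(buf, 3));  // element 3
    }
}
```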
> 
> 
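The `MemoryHandles` change above says that every other handle can be derived from one basic handle taking a segment and a `long`. The sketch below models that idea only loosely: the base accessor is a real `VarHandle` over a `ByteBuffer` (from `MethodHandles.byteBufferViewVarHandle`), but the derivation uses ordinary lambdas rather than var handle combinators. `DerivedAccessors`, `IntReader`, `withOffset` and `withStride` are hypothetical names for this example and are not the dropped `MemoryHandles` combinators.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class DerivedAccessors {
    private DerivedAccessors() {}

    // Base accessor: coordinates are (ByteBuffer, byte offset).
    static final VarHandle BASE =
            MethodHandles.byteBufferViewVarHandle(int[].class, ByteOrder.nativeOrder());

    // Hypothetical functional view of "read an int given a buffer and one coordinate".
    interface IntReader {
        int read(ByteBuffer buf, int coordinate);
    }

    static IntReader base() {
        return (buf, byteOffset) -> (int) BASE.get(buf, byteOffset);
    }

    // Adds a fixed byte offset to the coordinate before delegating.
    static IntReader withOffset(IntReader target, int extraBytes) {
        return (buf, coordinate) -> target.read(buf, coordinate + extraBytes);
    }

    // Turns a logical index into a byte offset by scaling it.
    static IntReader withStride(IntReader target, int strideBytes) {
        return (buf, index) -> target.read(buf, index * strideBytes);
    }

    public static void main(String[] args) {
        // Four records of { int x; int y; }, 8 bytes each.
        ByteBuffer buf = ByteBuffer.allocate(32).order(ByteOrder.nativeOrder());
        for (int i = 0; i < 4; i++) {
            buf.putInt(i * 8, i);
            buf.putInt(i * 8 + 4, 100 + i);
        }
        // Reader for field y of record i: byte offset = i * 8 + 4.
        IntReader y = withStride(withOffset(base(), 4), 8);
        System.out.println(y.read(buf, 2));
    }
}
```

Structural access (a field inside an array element, say) then falls out of composing a fixed offset and a stride on top of the one base form, with no need to generate a specialized accessor per layout.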
> ### Implementation changes
> 
> There are two main things to discuss here: support for shared segments, and the general simplification of the memory
> access var handle support.
> 
> #### Shared segments
> 
> The support for shared segments cuts in pretty deep in the VM. Support for shared segments is notoriously hard to
> achieve, at least in a way that guarantees optimal access performance. This is caused by the fact that, if a segment
> is shared, it would be possible for a thread to close it while another is accessing it.
> 
> After considering several options (see [3]), we zeroed in on an approach which is inspired by a happy idea that Andrew
> Haley had (and that he reminded me of at this year's OpenJDK committer workshop - thanks!). The idea is that if we
> could *freeze* the world (e.g. with a GC pause) while a segment is closed, we could then prevent segments from being
> accessed concurrently with a close operation. For this to work, it is crucial that no GC safepoints can occur between a
> segment liveness check and the access itself (otherwise it would be possible for the accessing thread to stop just
> right before an unsafe call). It also relies on the fact that hotspot/C2 should not be able to propagate loads across
> safepoints.
> 
> Sadly, neither of these conditions seems to hold in the current implementation, so we needed to resort to a bit of
> creativity. First, we noted that, if we could mark so-called *scoped* methods with an annotation, it would be very
> simple to check whether a thread was in the middle of a scoped method when we stopped the world for a close
> operation (btw, instead of stopping the world, we do a much more efficient, thread-local polling, thanks to JEP 312
> [4]).
> 
> The question is, then, once we detect that a thread is accessing the very segment we're about to close, what should
> happen? We first experimented with a solution which would install an *asynchronous* exception on the accessing thread,
> thus making it fail. This solution has some desirable properties, in that a `close` operation always succeeds.
> Unfortunately the machinery for async exceptions is a bit fragile (e.g. not all the code in hotspot checks for async
> exceptions); to minimize risks, we decided to revert to a simpler strategy, where `close` might fail when it finds that
> another thread is accessing the segment being closed.
> 
> As written in the javadoc, this doesn't mean that clients should just catch and try again; an exception on `close` is a
> bug in the user code, likely arising from lack of synchronization, and should be treated as such.
> 
> In terms of gritty implementation, we needed to centralize memory access routines in a single
> place, so that we could have a set of routines closely mimicking the primitives exposed by `Unsafe` but which, in
> addition, also provided a liveness check. This way we could mark all these routines with the special `@Scoped`
> annotation, which tells the VM that something important is going on.
> 
> To achieve this, we created a new (autogenerated) class, called `ScopedMemoryAccess`. This class contains all the main
> memory access primitives (including bulk access, like `copyMemory`, or `setMemory`), and accepts, in addition to the
> access coordinates, also a scope object, which is tested before access. A reachability fence is also thrown in the mix
> to make sure that the scope is kept alive during access (which is important when registering segments against cleaners).
> 
> Of course, to make memory access safe, memory access var handles, byte buffer var handles, and byte buffer API should
> use the new `ScopedMemoryAccess` class instead of `Unsafe`, so that a liveness check can be triggered (in case a scope
> is present).
> 
> `ScopedMemoryAccess` has a `closeScope` method, which initiates the thread-local handshakes, and returns `true` if the
> handshake completed successfully.
> 
> `MemoryScope` (now significantly simplified from what we had before) has two implementations, one for confined
> segments and one for shared segments; the main difference between the two is what happens when the scope is closed; a
> confined segment sets a boolean flag to false, and returns, whereas a shared segment goes into a `CLOSING` state, then
> starts the handshake, and then updates the state again, to either `CLOSED` or `ALIVE` depending on whether the
> handshake was successful or not. Note that when a shared segment is in the `CLOSING` state, `MemorySegment::isAlive`
> will still return `true`, while the liveness check upon memory access will fail.
> 
> #### Memory access var handles overhaul
> 
> The key realization here was that if all memory access var handles took a coordinate pair of `MemorySegment` and
> `long`, all other access types could be derived from this basic var handle form.
> 
> This allowed us to remove the on-the-fly var handle generation, and to simply derive structural access var handles
> (such as those obtained by calling `MemoryLayout::varHandle`) using *plain* var handle combinators, so that e.g. an
> additional offset is injected into a base memory access var handle.
> 
> This also helped in simplifying the implementation by removing the special `withStride` and `withOffset` combinators,
> which previously needed low-level access to the innards of the memory access var handle. All that code is now gone.
> 
> #### Test changes
> 
> Not much to see here - most of the tests needed to be updated because of the API changes. Some were beefed up (like the
> array test, since now segments can be projected into many different kinds of arrays). A test has been added to test the
> `Cleaner` functionality, and another stress test has been added for shared segments (`TestHandshake`). Some of the
> microbenchmarks also needed some tweaks - and some of them were also updated to test performance in the shared
> segment case.
> 
> [1] - https://openjdk.java.net/jeps/393
> [2] - https://openjdk.java.net/jeps/389
> [3] - https://mail.openjdk.java.net/pipermail/panama-dev/2020-May/009004.html
> [4] - https://openjdk.java.net/jeps/312
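The close protocol described for shared segments (`ALIVE` to `CLOSING`, then either `CLOSED` or back to `ALIVE`) can be sketched as a tiny state machine. This is a simplified model and not the `MemoryScope` code from the patch: `SharedScopeSketch`, `checkValidState`, the `IllegalStateException` choice and the `Predicate` standing in for the thread-local handshake are all assumptions made for this example.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;

final class SharedScopeSketch {
    static final int ALIVE = 0, CLOSING = 1, CLOSED = 2;

    private final AtomicInteger state = new AtomicInteger(ALIVE);

    // Stand-in for the handshake: returns true if no thread was caught
    // in the middle of accessing memory guarded by this scope.
    private final Predicate<SharedScopeSketch> handshake;

    SharedScopeSketch(Predicate<SharedScopeSketch> handshake) {
        this.handshake = handshake;
    }

    // Mirrors the described isAlive behavior: still true while CLOSING.
    boolean isAlive() {
        return state.get() != CLOSED;
    }

    // Liveness check done before each access: fails while CLOSING or CLOSED.
    void checkValidState() {
        if (state.get() != ALIVE) {
            throw new IllegalStateException("Segment is closed or being closed");
        }
    }

    void close() {
        if (!state.compareAndSet(ALIVE, CLOSING)) {
            throw new IllegalStateException("Already closed or being closed");
        }
        boolean success = handshake.test(this);
        state.set(success ? CLOSED : ALIVE);
        if (!success) {
            throw new IllegalStateException("Segment is being accessed by another thread");
        }
    }

    public static void main(String[] args) {
        SharedScopeSketch idle = new SharedScopeSketch(scope -> true);
        idle.close();
        System.out.println("idle alive after close: " + idle.isAlive());

        SharedScopeSketch busy = new SharedScopeSketch(scope -> false);
        try {
            busy.close();
        } catch (IllegalStateException e) {
            System.out.println("close failed: " + e.getMessage());
        }
        System.out.println("busy alive after failed close: " + busy.isAlive());
    }
}
```

Rolling the state back to `ALIVE` on a failed handshake is what lets `close` fail without leaving the segment unusable, matching the "simpler strategy" chosen over async exceptions.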

Maurizio Cimadamore has updated the pull request with a new target base due to a merge or a rebase. The incremental
webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits
since the last revision:

 - Merge branch 'master' into 8254162
 - Simplify example in the toplevel javadoc
 - Tweak support for mapped memory segments
 - Tweak referenced to MemoryAddressProxy in Utils.java
 - Fix performance issue with "small" segment mismatch
 - Address review comments
 - Fix indent in GensrcScopedMemoryAccess.gmk
 - Address review comments
 - Add modified files
 - RFR 8254162: Implementation of Foreign-Memory Access API (Third Incubator)

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/548/files
  - new: https://git.openjdk.java.net/jdk/pull/548/files/d14d06a4..8815d941

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=548&range=08
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=548&range=07-08

  Stats: 22231 lines in 447 files changed: 12727 ins; 6345 del; 3159 mod
  Patch: https://git.openjdk.java.net/jdk/pull/548.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/548/head:pull/548

PR: https://git.openjdk.java.net/jdk/pull/548