[foreign-memaccess] RFR: JDK-8241772: MemorySegment should offer a spliterator

Maurizio Cimadamore mcimadamore at openjdk.java.net
Fri Mar 27 18:55:29 UTC 2020


This patch rethinks how shared memory segments work; up to now, our story for sharing a segment across multiple threads
has been to support a ref-counting strategy (see MemorySegment::acquire). While this mechanism is very general, and can
be used to support different idioms, it also significantly increases the complexity surface of the memory access API;
also, the constrained nature of the acquire() mechanism is, on some occasions, still not general enough to support all
the use cases - the cases which cannot work around the confinement guarantees.

Because of this, I believe it is a better course of action to restrict the memory segment sharing capabilities to those
use cases that are well-behaved enough: parallel processing of a segment via divide-and-conquer. This can be achieved
rather succinctly by having a segment offering a spliterator, which can then be used to create e.g. a parallel stream
(see example in the test), or to create a fork/join recursive action (again, see test for some examples).
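
To make the divide-and-conquer idea a bit more concrete, here is a minimal sketch in plain Java. It deliberately does
not use the memory access API: a heap ByteBuffer stands in for the segment, and the names ParallelSumSketch,
SliceSpliterator and parallelSum are hypothetical, made up for this illustration only. Each split hands out a
non-overlapping range of elements, so every worker only ever touches its own slice.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Spliterator;
import java.util.function.Consumer;
import java.util.stream.StreamSupport;

// Hypothetical illustration; a ByteBuffer stands in for a memory segment.
final class ParallelSumSketch {

    // Walks the buffer in fixed-size elements and hands out one-element slices.
    static final class SliceSpliterator implements Spliterator<ByteBuffer> {
        private final ByteBuffer buffer; // backing storage; its position/limit are never changed here
        private final int elementSize;   // bytes per element
        private int index;               // first element not yet visited
        private final int fence;         // one past the last element of this range

        SliceSpliterator(ByteBuffer buffer, int elementSize) {
            this(buffer, elementSize, 0, buffer.capacity() / elementSize);
        }

        private SliceSpliterator(ByteBuffer buffer, int elementSize, int index, int fence) {
            this.buffer = buffer;
            this.elementSize = elementSize;
            this.index = index;
            this.fence = fence;
        }

        @Override
        public boolean tryAdvance(Consumer<? super ByteBuffer> action) {
            if (index >= fence) {
                return false;
            }
            action.accept(slice(index));
            index++;
            return true;
        }

        @Override
        public Spliterator<ByteBuffer> trySplit() {
            int remaining = fence - index;
            if (remaining < 2) {
                return null; // too small to divide further
            }
            int mid = index + remaining / 2;
            // The returned spliterator covers [index, mid); this one keeps [mid, fence).
            Spliterator<ByteBuffer> prefix = new SliceSpliterator(buffer, elementSize, index, mid);
            index = mid;
            return prefix;
        }

        @Override
        public long estimateSize() {
            return fence - index;
        }

        @Override
        public int characteristics() {
            return ORDERED | SIZED | SUBSIZED | NONNULL;
        }

        // Creates an independent view over a single element.
        private ByteBuffer slice(int element) {
            ByteBuffer view = buffer.duplicate();
            view.position(element * elementSize);
            view.limit((element + 1) * elementSize);
            // slice() resets the byte order, so restore the order of the backing buffer.
            return view.slice().order(buffer.order());
        }
    }

    // Copies the values into a buffer, then sums them through a parallel stream.
    static long parallelSum(int[] values) {
        ByteBuffer buffer = ByteBuffer.allocate(values.length * Integer.BYTES)
                .order(ByteOrder.nativeOrder());
        for (int i = 0; i < values.length; i++) {
            buffer.putInt(i * Integer.BYTES, values[i]);
        }
        return StreamSupport.stream(new SliceSpliterator(buffer, Integer.BYTES), true)
                .mapToLong(slice -> slice.getInt(0))
                .sum();
    }
}
```

The same spliterator could also drive a fork/join recursive action by calling trySplit() until the ranges drop below
some threshold; the actual MemorySegment::spliterator in the patch may well differ from this sketch in its details.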

Of course we will need some kind of escape hatch for when a "true" shared segment is required - this could be provided
by offering a way to register a segment against a Cleaner, and giving up the deterministic deallocation
guarantees in that case; the need for such GC-managed segment has already arisen when dealing with native interop use
cases (e.g. a binding might need to create long-lived constants which are basically never closed).

Alternatively, the escape hatch could be an unsafe segment which is not thread confined, and can be closed from any
thread (this is a solution that, I'm sure, some of the people I've chatted with will appreciate).
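
For the first of the two options above (the Cleaner-based one), the general shape could look something like the sketch
below. This is only an assumption about how such a registration might be structured, not what the API will offer:
GcManagedBufferSketch, Release, isReleased and releaseNow are hypothetical names, and an AtomicBoolean stands in for
the actual freeing of memory. Only java.lang.ref.Cleaner is a real JDK class here.

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical illustration of a GC-managed resource; no real memory is allocated.
final class GcManagedBufferSketch {
    private static final Cleaner CLEANER = Cleaner.create();

    // The cleanup action must not reference the owning object, otherwise the
    // owner would stay reachable and the cleaner would never run.
    private static final class Release implements Runnable {
        private final AtomicBoolean released;

        Release(AtomicBoolean released) {
            this.released = released;
        }

        @Override
        public void run() {
            released.set(true); // stand-in for freeing the underlying memory
        }
    }

    private final AtomicBoolean released = new AtomicBoolean();
    private final Cleaner.Cleanable cleanable;

    GcManagedBufferSketch() {
        // Runs Release once this object becomes phantom reachable, at a time chosen by the GC.
        this.cleanable = CLEANER.register(this, new Release(released));
    }

    boolean isReleased() {
        return released.get();
    }

    // Only here to make the sketch observable: clean() runs the action at most once.
    void releaseNow() {
        cleanable.clean();
    }
}
```

The point of the sketch is the trade-off: once the release is tied to reachability, nobody can say when it happens,
which is fine for long-lived constants but not for allocation-intensive code.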

Either way, it is important to realize that the long-lived, shared segment is pretty far from the design center of the
memory access API - whose goal is, after all, to improve over some of the things that direct buffers aren't so great
at. If you are after a big, long-lived slab of shared memory, chances are that the ByteBuffer is already good enough at
handling those.

Where the pressure comes from is, we believe, those allocation-intensive scenarios where the cleaner mechanism
(which is the only choice for direct buffers) is simply not enough to guarantee a reasonable throughput. These are also
the cases that are crucial to achieving efficient native interop support.

So, let's focus on what this API does best - which is managing confined memory segments - and let's build on that to
cover the divide and conquer use cases (through the spliterator machinery).

Sidebar: for those of you worried that the lack of acquire() will make native interop harder, we are planning to
_remove_ confinement restrictions from addresses obtained from native libraries - meaning that if you get a
MemoryAddress from a native library, you will be able to share it with any thread you want, and such threads will be
able to dereference the address without any need to work around the confinement guarantees.

Moving forward, there are other things we need to do after this patch is done - as mentioned, providing a GC-backed
escape hatch would probably be a good addition; but we also have to add more ways to share a segment with other threads
in serial-confinement mode (e.g. one thread at a time); the main operation we're looking at is a 'handoff' operation
which replaces the owner thread with a different one (useful in producer/consumer use cases) and, possibly, also a
detach/attach pair of operations which can be used to temporarily remove ownership from a segment, and then have it
picked up by a second thread - this is effectively similar to handoff, but where the two threads don't know each
other.
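
As a rough illustration of what serial confinement with a handoff could mean, here is a small self-contained sketch.
None of it is the proposed API: ConfinedResourceSketch, get, set, handoff and producerConsumerDemo are hypothetical
names, and a single long field stands in for the segment contents. The owner check is the confinement guarantee; the
handoff simply replaces the owner thread.

```java
import java.util.Objects;

// Hypothetical illustration of serial confinement: one owner thread at a time.
final class ConfinedResourceSketch {
    // Volatile so that the new owner observes everything the old owner wrote before the handoff.
    private volatile Thread owner = Thread.currentThread();
    private long value; // stand-in for the confined memory

    private void checkOwner() {
        if (Thread.currentThread() != owner) {
            throw new IllegalStateException("Accessed outside of the owner thread");
        }
    }

    long get() {
        checkOwner();
        return value;
    }

    void set(long newValue) {
        checkOwner();
        value = newValue;
    }

    // Transfers ownership; afterwards the calling thread can no longer access the resource.
    void handoff(Thread newOwner) {
        checkOwner();
        owner = Objects.requireNonNull(newOwner);
    }

    // Producer writes a value, hands the resource off, and the consumer updates it.
    static long producerConsumerDemo() {
        ConfinedResourceSketch resource = new ConfinedResourceSketch();
        resource.set(41);
        long[] seen = new long[1];
        Thread consumer = new Thread(() -> {
            resource.set(resource.get() + 1);
            seen[0] = resource.get();
        });
        resource.handoff(consumer);
        consumer.start();
        try {
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }
}
```

A detach/attach pair would differ only in that the intermediate state has no owner at all, so the first thread does
not need a reference to the second one; how that is best expressed is still open.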

In addition to that, I'm planning to add a bunch of helper methods to SequenceLayout to help with reshaping and
flattening sequence layouts; this operation is very useful when a client needs/wants to define a spliterator which
works on multiple elements at a time (see the ParallelSum benchmark for an example). I'll be filing a PR for this
separately.

Cheers
Maurizio

-------------

Commit messages:
 - Fix white spaces
 - * Fix javadoc
 - More fixes
 - Remove comments on TestSpliterator
 - Fix threshold in ParallelSum benchmark
 - Fix ParallelSum benchmark to always use default FJP
 - Fix semantics of tryAdvance
 - Add MemorySegment::spliterator

Changes: https://git.openjdk.java.net/panama-foreign/pull/71/files
 Webrev: https://webrevs.openjdk.java.net/panama-foreign/71/webrev.00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8241772
  Stats: 700 lines in 12 files changed: 618 ins; 29 del; 53 mod
  Patch: https://git.openjdk.java.net/panama-foreign/pull/71.diff
  Fetch: git fetch https://git.openjdk.java.net/panama-foreign pull/71/head:pull/71

PR: https://git.openjdk.java.net/panama-foreign/pull/71

