rethinking the role of MemorySegment vs. MemoryAddress

Fri Jul 17 15:47:21 UTC 2020

Hi,
over the last few weeks, as we’ve been running more real-world 
benchmarks [1] against the generated bindings, we’ve been looking again 
at the relationship between the MemorySegment and the MemoryAddress 
abstractions. To recap, in the current state, a memory address can be 
optionally attached to a segment (such an address is said to be 
*checked*) and the segment in turn is attached to some temporal bound 
info (aka MemoryScope). In order to dereference segment memory, the user 
must have a *checked* address and pass it to a memory access var 
handle. So we have the following chain (for checked addresses, at least):

MemoryAddress -> MemorySegment -> MemoryScope

The non-trivial relationship between memory segments and memory 
addresses raises a bunch of usability issues, both for clients of native 
bindings and for developers of libraries written on top of the 
foreign memory access API:

* dereference operations are MemoryAddress-centric. This suggests that, 
to dereference memory in a loop, we could either use an indexed var 
handle, or use a non-indexed var handle and keep offsetting the same 
pointer on each iteration (e.g. with MemoryAddress::addOffset). In 
reality, dereferencing w/o an indexed var handle is much slower, since 
it is hard for C2 to see that all the memory address instances being 
created are really just the combination of some fixed address 
(typically, the base address of some segment) plus an offset which is 
derived from the loop induction variable. Here it feels like the API 
should naturally drive the user towards the most efficient idiom, but 
that's currently not the case.

* Various API points (see CSupport::toJavaStringRestricted) which 
operate on MemoryAddress parameters have to defend themselves against 
the possibility that a given memory address might _not_ have a segment 
attached to it. While the distinction between checked and unchecked 
addresses seems to make sense, it also implies that a client can never be 
quite sure as to whether a given MemoryAddress instance can be operated 
upon freely.

* The cost of dereferencing a random memory location (e.g. an address 
obtained from some native call) is high; to do that we need to create a 
segment (with its own scope instance), and then a memory address based 
on it - that's 3 objects per dereference operation (address, segment and 
scope). While we can hope that Valhalla might be able, down the road, to 
reduce (or completely eliminate) the cost associated with allocation of 
new MemoryAddress instances, it is far harder to predict whether 
we will be able to do the same for MemorySegment (we certainly won’t 
get any help for MemoryScope, which is mutable). Here it's 
important to note that it might be ok in certain cases to pay extra 
cost, if the developer wanted extra safety, for instance. But it seems 
against the principle of the API to force this cost, especially in those 
cases where there's not much to be gained in terms of safety (C 
string dereference is a good example of this).

* Calls to MemorySegment::baseAddress() are frequent when interacting 
with native bindings, and tend to get in the way of readability; what 
we'd really like would be to just be able to pass a segment where a 
memory address is expected, possibly without distorting the API too much.
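
To make the first bullet concrete, here is a toy sketch of the two loop 
idioms. It does not use the foreign memory access API at all: ToyAddress 
and LoopIdioms are hypothetical names, and a plain ByteBuffer stands in 
for the segment. The point is only to show the shape of the code: the 
pointer-bumping loop allocates a fresh address object per iteration, 
while the indexed loop keeps a fixed base and derives the offset from 
the induction variable.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical stand-in for a checked address: a backing buffer plus a byte offset. */
final class ToyAddress {
    final ByteBuffer base;
    final int offset;

    ToyAddress(ByteBuffer base, int offset) {
        this.base = base;
        this.offset = offset;
    }

    /** Returns a NEW address object each time it is called. */
    ToyAddress addOffset(int bytes) {
        return new ToyAddress(base, offset + bytes);
    }

    int getInt() {
        return base.getInt(offset);
    }
}

final class LoopIdioms {
    /** Allocates a buffer holding the ints 0, 1, ..., count - 1. */
    static ByteBuffer filled(int count) {
        ByteBuffer buf = ByteBuffer.allocateDirect(count * Integer.BYTES)
                                   .order(ByteOrder.nativeOrder());
        for (int i = 0; i < count; i++) {
            buf.putInt(i * Integer.BYTES, i);
        }
        return buf;
    }

    /** Pointer-bumping idiom: one fresh address object per iteration. */
    static long sumByBumping(ByteBuffer buf, int count) {
        long sum = 0;
        ToyAddress addr = new ToyAddress(buf, 0);
        for (int i = 0; i < count; i++) {
            sum += addr.getInt();
            addr = addr.addOffset(Integer.BYTES);
        }
        return sum;
    }

    /** Indexed idiom: fixed base, offset derived from the loop induction variable. */
    static long sumByIndex(ByteBuffer buf, int count) {
        long sum = 0;
        for (int i = 0; i < count; i++) {
            sum += buf.getInt(i * Integer.BYTES);
        }
        return sum;
    }
}
```

Both methods compute the same result; the difference is in how much 
work the JIT has to do to prove that every access is just base plus a 
function of i.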

After a lot of consideration, we propose to address the above issues in 
the following 2 ways. First, we'd like to rethink memory dereference 
operations around MemorySegments, and demote MemoryAddress to be just a 
dumb carrier for an unsafe (Object/long) addressing pair. That is, 
memory access var handles will go from looking like this:

(MemoryAddress) -> T

To something like this:

(MemorySegment, long) -> T

where the long parameter expresses a byte offset (relative to the 
segment) at which memory should be dereferenced. This move makes it very 
clear that, in order to dereference memory, some segment is needed. This 
could be a segment created by the user, but we plan to make some sort 
of EVERYTHING segment available (through restricted access), so that 
users are not _forced_ to create a new segment every time they need to 
dereference an address obtained from native code. In other words, it’s 
up to the user to decide how close to the metal things should be: the 
closer, the fewer instances are created. As we move further away (e.g. by 
adding spatial bounds, and then also temporal bounds) we end up paying 
more and more, which might be totally justifiable in certain cases, but 
it’s ultimately a trade-off that seems best left to developers. Bonus 
point: if we go down this path, we no longer need to 
dynamically spin memory access var handles 
(MemoryAccessVarHandleGenerator) as we can just derive more complex 
handles from the basic form shown above using regular var handle 
combinators defined in MemoryHandles.
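
As a rough analogy for the (carrier, byte offset) -> T shape, the JDK 
already has a var handle with very similar coordinates: 
MethodHandles.byteBufferViewVarHandle yields a handle of type 
(ByteBuffer, int) -> T, where the int is a byte offset into the buffer. 
The sketch below uses that existing API, with a ByteBuffer standing in 
for the segment; it is not the proposed segment-based handle, and 
OffsetHandleSketch is a made-up name.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class OffsetHandleSketch {
    // Coordinates are (ByteBuffer, int byteOffset). The proposal's shape is
    // (MemorySegment, long byteOffset); ByteBuffer is only a stand-in here.
    static final VarHandle INT_AT_OFFSET =
            MethodHandles.byteBufferViewVarHandle(int[].class, ByteOrder.nativeOrder());

    /** Writes i into the i-th int slot, then reads all slots back and sums them. */
    static long fillAndSum(int count) {
        ByteBuffer carrier = ByteBuffer.allocateDirect(count * Integer.BYTES);
        for (int i = 0; i < count; i++) {
            INT_AT_OFFSET.set(carrier, i * Integer.BYTES, i);
        }
        long sum = 0;
        for (int i = 0; i < count; i++) {
            sum += (int) INT_AT_OFFSET.get(carrier, i * Integer.BYTES);
        }
        return sum;
    }

    /** An offset past the end is rejected by the carrier's bounds check. */
    static boolean rejectsOutOfBounds() {
        ByteBuffer carrier = ByteBuffer.allocateDirect(Integer.BYTES);
        try {
            int ignored = (int) INT_AT_OFFSET.get(carrier, Integer.BYTES); // one int past the end
            return false;
        } catch (IndexOutOfBoundsException expected) {
            return true;
        }
    }
}
```

Note how the carrier is what supplies the bounds check; the offset on 
its own carries no safety information, which is the same division of 
labour the proposal assigns to segments and offsets.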

Secondly, we'd like to introduce a notion of Addressable entities - that 
is, entities that can be mapped down to a MemoryAddress; turns out we 
have quite a few of these (albeit some of them are in disguise):

* MemoryAddress - trivially, can be turned into an address with the 
identity projection

* MemorySegment - can be mapped to an address via its baseAddress() method

* LibraryLookup symbols - right now LibraryLookup returns plain 
addresses, but we could enhance the API to return a library symbol (with 
given name and address)

* VaList, which can also be projected down to an address (VaList::address)

With such an abstraction, we can teach jextract to emit Addressable as a 
carrier instead of MemoryAddress in parameter positions - so that users 
could freely pass either a segment, or an address, or a valist, ... 
without the need to manually convert one into the other.
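
The idea can be sketched with a toy model. Everything below is 
hypothetical (ToyAddressable, ToyPointer, ToySegment and ToyBinding are 
made-up names, and a raw long stands in for MemoryAddress); it is not 
the interface in the branch, just an illustration of how a single 
parameter type lets a binding accept several kinds of argument.

```java
/** Hypothetical: anything that can be projected down to a raw address. */
interface ToyAddressable {
    long address();
}

/** A plain address maps to itself (the identity projection). */
final class ToyPointer implements ToyAddressable {
    private final long value;

    ToyPointer(long value) {
        this.value = value;
    }

    @Override
    public long address() {
        return value;
    }
}

/** A segment maps to its base address; its size plays no part in the projection. */
final class ToySegment implements ToyAddressable {
    private final long base;
    private final long byteSize;

    ToySegment(long base, long byteSize) {
        this.base = base;
        this.byteSize = byteSize;
    }

    @Override
    public long address() {
        return base;
    }

    long byteSize() {
        return byteSize;
    }
}

final class ToyBinding {
    /** Stand-in for a generated binding: returns the raw pointer it would pass to native code. */
    static long pointerArgument(ToyAddressable arg) {
        return arg.address();
    }
}
```

With this shape, a caller can write 
ToyBinding.pointerArgument(new ToySegment(0x2000L, 16)) or 
ToyBinding.pointerArgument(new ToyPointer(0x1000L)) without converting 
one into the other first.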

A branch which implements all the aforementioned changes (including 
jextract changes) is available at [2]. We also attempted to port the 
existing examples in [1] to use the slightly revised API (see [3]). We 
saw some notable improvements in this new world. First, when 
dereferencing a segment soon after creation, there's no longer any need to 
call baseAddress(), since dereference is now expressed in terms of 
memory segments (this is mostly noticeable in the various benchmarks we 
have). For instance, this:

     @Benchmark
     public void segment_loop() {
         MemorySegment segment = MemorySegment.allocateNative(ALLOC_SIZE);
         for (int i = 0; i < ELEM_SIZE; i++) {
             VH_int.set(segment.baseAddress(), (long) i, i);
         }
         segment.close();
     }

becomes:

     @Benchmark
     public void segment_loop() {
         MemorySegment segment = MemorySegment.allocateNative(ALLOC_SIZE);
         for (int i = 0; i < ELEM_SIZE; i++) {
             VH_int.set(segment, (long) i, i);
         }
         segment.close();
     }

Secondly, since jextract takes advantage of the new Addressable 
abstraction, interacting with native bindings is also easier if you have 
segments which need to be passed as pointers; for instance this:

         String script = "print(sum([33, 55, 66]));";
         Py_Initialize();
         try (var s = toCString(script)) {
             var str = s.baseAddress();
             PyRun_SimpleStringFlags(str, NULL);
             Py_Finalize();
         }

becomes:

       String script = "print(sum([33, 55, 66]));";
       Py_Initialize();
       try (var str = toCString(script)) {
           PyRun_SimpleStringFlags(str, NULL);
           Py_Finalize();
       }

Finally, dereferencing memory locations obtained from native code 
becomes much more straightforward; let's look at how the qsort 
comparator function in StdLibTest can be simplified, from this:

         static int qsortCompare(MemorySegment base, MemoryAddress addr1, MemoryAddress addr2) {
             return getIntAtOffset(base.baseAddress(), addr1.rebase(base).segmentOffset()) -
                    getIntAtOffset(base.baseAddress(), addr2.rebase(base).segmentOffset());
         }

to this:

         static int qsortCompare(MemorySegment base, MemoryAddress addr1, MemoryAddress addr2) {
             return getIntAtOffset(base, addr1.segmentOffset(base)) -
                    getIntAtOffset(base, addr2.segmentOffset(base));
         }

While, in isolation, these might look like small simplifications, we 
found that they add up considerably, resulting in more straightforward 
code pretty much across the board, while at the same time providing an 
API which feels _simpler_, as the separation of roles between segments 
and addresses becomes much clearer. As such, we plan, over the next few 
weeks, to start pushing the contents of that branch onto the various 
panama/foreign branches.

Cheers
Maurizio

[1] - https://github.com/sundararajana/panama-jextract-samples
[2] - https://github.com/mcimadamore/panama-foreign/tree/segment_address_split
[3] - https://github.com/sundararajana/panama-jextract-samples/tree/memoraddress-split%2Baddressable%2Bjextract