rethinking the role of MemorySegment vs. MemoryAddress
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Fri Jul 17 15:47:21 UTC 2020
Hi,
over the last few weeks, as we’ve been running more real-world
benchmarks [1] against the generated bindings, we’ve been looking again
at the relationship between the MemorySegment and the MemoryAddress
abstractions. To recap, in the current state, a memory address can be
optionally attached to a segment (such an address is said to be
*checked*) and the segment in turn is attached to some temporal bound
info (aka MemoryScope). In order to dereference segment memory, the user
must have a *checked* address and pass it to a memory access var
handle. So we have the following chain (for checked addresses, at least):
MemoryAddress -> MemorySegment -> MemoryScope
The non-trivial relationship between memory segments and memory
addresses raises a bunch of usability issues, both for clients of native
bindings and for developers of libraries written on top of the
foreign memory access API:
* dereference operations are MemoryAddress-centric. This suggests that,
to dereference memory in a loop, we could either use an indexed var
handle, or use a non-indexed var handle and keep offsetting the same
pointer on each iteration (e.g. with MemoryAddress::addOffset). In
reality, dereferencing w/o an indexed var handle is much slower, since
it is hard for C2 to see that all the memory address instances being
created are really just the combination of some fixed address
(typically, the base address of some segment) plus an offset which is
derived from the loop induction variable. Here it feels like the API
should naturally drive the user towards the most efficient idiom, but
that's currently not the case.
* Various API points (see CSupport::toJavaStringRestricted) which
operate on MemoryAddress parameters have to defend themselves against
the possibility that a given memory address might _not_ have a segment
attached to it. While the distinction between checked and unchecked
address seems to make sense, it also implies that a client can never be
quite sure as to whether a given MemoryAddress instance can be operated
upon freely.
* The cost of dereferencing a random memory location (e.g. an address
obtained from some native call) is high; to do that we need to create a
segment (with its own scope instance), and then a memory address based
on it - that's 3 objects per dereference operation (address, segment and
scope). While we can hope that Valhalla might be able, down the road, to
reduce (or completely eliminate) the cost associated with allocation of
new MemoryAddress instances, it is far harder to predict whether
we will be able to do the same for MemorySegment (we certainly won’t be
able to get any help for MemoryScope, which is mutable). Here it's
important to note that it might be ok in certain cases to pay extra
cost, if the developer wanted extra safety, for instance. But it seems
against the principle of the API to force this cost, especially in those
cases where there's not much to be gained in terms of safety (e.g. C
string dereference is a good example of this).
* Calls to MemorySegment::baseAddress() are frequent when interacting
with native bindings, and tend to get in the way of readability; what
we'd really like would be to just be able to pass a segment where a
memory address is expected, possibly without distorting the API too much.
After a lot of consideration, we propose to address the above issues in
the following 2 ways. First, we'd like to rethink memory dereference
operations around MemorySegments, and demote MemoryAddress to be just a
dumb carrier for an unsafe (Object/long) addressing pair. That is,
memory access var handle will go from being like this:
(MemoryAddress) -> T
To something like this:
(MemorySegment, long) -> T
where the long parameter expresses a byte offset (relative to the
segment) at which memory should be dereferenced. This move makes it very
clear that, in order to dereference memory, some segment is needed. This
could be a segment created by the user, but we plan to make some sort
of EVERYTHING segment available (through restricted access), so that
users are not _forced_ to create a new segment every time they need to
dereference an address obtained from native code. In other words, it’s
up to the user to decide how close to the metal things should be; the
closer, the fewer instances are created - as we move further (e.g. by
adding spatial bounds) and further (e.g. by also adding temporal bounds)
we end up paying more and more - which might be totally justifiable in
certain cases - but it’s ultimately a trade off that seems best left to
developers. Bonus point: if we go down this path, we no longer need to
dynamically spin memory access var handles
(MemoryAccessVarHandleGenerator) as we can just derive more complex
handles from the basic form shown above using regular var handle
combinators defined in MemoryHandles.
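
To give a feel for the (carrier, offset) access shape, here is a minimal sketch that runs on a stock JDK. It is only an analogy and not the proposed API: it uses a ByteBuffer as a stand-in for a MemorySegment and the existing MethodHandles::byteBufferViewVarHandle, whose coordinates are (ByteBuffer, int byte index) rather than (MemorySegment, long). The class and method names (OffsetAccessSketch, fillAndRead) are made up for illustration.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Analogy only: ByteBuffer plays the role of the segment here.
class OffsetAccessSketch {
    // Coordinates are (ByteBuffer, int byteIndex): a carrier plus a byte offset,
    // loosely mirroring the proposed (MemorySegment, long) -> T shape.
    static final VarHandle VH_INT =
            MethodHandles.byteBufferViewVarHandle(int[].class, ByteOrder.nativeOrder());

    // Writes 0..count-1 as ints at consecutive byte offsets, then reads them back.
    static int[] fillAndRead(int count) {
        ByteBuffer buffer = ByteBuffer.allocateDirect(count * Integer.BYTES);
        for (int i = 0; i < count; i++) {
            // The carrier stays fixed; only the offset depends on the loop variable.
            VH_INT.set(buffer, i * Integer.BYTES, i);
        }
        int[] out = new int[count];
        for (int i = 0; i < count; i++) {
            out[i] = (int) VH_INT.get(buffer, i * Integer.BYTES);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(fillAndRead(4)));
    }
}
```

Note that no per-iteration address object is created in the loop: the same carrier is passed on every access and only the offset changes, which is the idiom the revised var handle shape is meant to encourage.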
Secondly, we'd like to introduce a notion of Addressable entities - that
is, entities that can be mapped down to a MemoryAddress; turns out we
have quite a few of these (albeit some of them are in disguise):
* MemoryAddress - trivially, can be turned into an address with the
identity projection
* MemorySegment - can be mapped to an address via its baseAddress() method
* LibraryLookup symbols - right now LibraryLookup returns plain
addresses, but we could enhance the API to return a library symbol (with
given name and address)
* VaList, which can also be projected down to an address (VaList::address)
With such an abstraction, we can teach jextract to emit Addressable as a
carrier instead of MemoryAddress in parameter positions - so that users
could freely pass either a segment, or an address, or a valist, ...
without the need to manually convert one into the other.
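
As a rough illustration of the idea, here is a toy sketch of what such an abstraction could look like. Everything in it is hypothetical: the types ToyAddress, ToySegment and ToyBindings are invented for this example, the method name address() is an assumption, and a plain long stands in for MemoryAddress. It is not the code in the branch.

```java
// Hypothetical sketch: none of these types are the real API.
interface Addressable {
    long address(); // the raw address this entity maps down to
}

final class ToyAddress implements Addressable {
    private final long raw;
    ToyAddress(long raw) { this.raw = raw; }
    @Override public long address() { return raw; } // identity projection
}

final class ToySegment implements Addressable {
    private final long base;
    private final long byteSize;
    ToySegment(long base, long byteSize) { this.base = base; this.byteSize = byteSize; }
    long byteSize() { return byteSize; }
    @Override public long address() { return base; } // plays the role of baseAddress()
}

final class ToyBindings {
    // Stand-in for a generated wrapper that takes Addressable in pointer position.
    static long passPointer(Addressable ptr) {
        return ptr.address();
    }

    public static void main(String[] args) {
        // Either kind of entity can be passed, with no manual conversion at the call site.
        System.out.println(passPointer(new ToyAddress(0x1000L)));
        System.out.println(passPointer(new ToySegment(0x2000L, 16)));
    }
}
```

The point of the design is that the conversion to an address happens once, inside the binding, instead of at every call site.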
A branch which implements all the aforementioned changes (including
jextract changes) is available at [2]. We also attempted to port the
existing examples in [1] to use the slightly revised API (see [3]). We
noticed some notable improvements in this new world. First, when
dereferencing a segment soon after creation, there's no longer any need to
call baseAddress(), since dereference is now expressed in terms of
memory segments (this is mostly noticeable in the various benchmarks we
have). For instance, this:
    @Benchmark
    public void segment_loop() {
        MemorySegment segment = MemorySegment.allocateNative(ALLOC_SIZE);
        for (int i = 0; i < ELEM_SIZE; i++) {
            VH_int.set(segment.baseAddress(), (long) i, i);
        }
        segment.close();
    }
becomes:
    @Benchmark
    public void segment_loop() {
        MemorySegment segment = MemorySegment.allocateNative(ALLOC_SIZE);
        for (int i = 0; i < ELEM_SIZE; i++) {
            VH_int.set(segment, (long) i, i);
        }
        segment.close();
    }
Secondly, since jextract takes advantage of the new Addressable
abstraction, interacting with native bindings is also easier if you have
segments which need to be passed as pointers; for instance this:
    String script = "print(sum([33, 55, 66]))";
    Py_Initialize();
    try (var s = toCString(script)) {
        var str = s.baseAddress();
        PyRun_SimpleStringFlags(str, NULL);
        Py_Finalize();
    }
becomes:
    String script = "print(sum([33, 55, 66]))";
    Py_Initialize();
    try (var str = toCString(script)) {
        PyRun_SimpleStringFlags(str, NULL);
        Py_Finalize();
    }
Finally, dereferencing memory locations obtained from native code
becomes much more straightforward; let's look at how the qsort
comparator function in StdLibTest can be simplified, from this:
    static int qsortCompare(MemorySegment base, MemoryAddress addr1, MemoryAddress addr2) {
        return getIntAtOffset(base.baseAddress(), addr1.rebase(base).segmentOffset()) -
               getIntAtOffset(base.baseAddress(), addr2.rebase(base).segmentOffset());
    }
to this:
    static int qsortCompare(MemorySegment base, MemoryAddress addr1, MemoryAddress addr2) {
        return getIntAtOffset(base, addr1.segmentOffset(base)) -
               getIntAtOffset(base, addr2.segmentOffset(base));
    }
While, in isolation, these might look like small simplifications, we
found that they add up considerably, resulting in more straightforward
code pretty much across the board, while at the same time providing an
API which feels _simpler_, as the separation of roles between segments
and addresses becomes much clearer. As such, we plan, over the next few
weeks, to start pushing the contents of that branch onto the various
panama/foreign branches.
Cheers
Maurizio
[1] - https://github.com/sundararajana/panama-jextract-samples
[2] -
https://github.com/mcimadamore/panama-foreign/tree/segment_address_split
[3] -
https://github.com/sundararajana/panama-jextract-samples/tree/memoraddress-split%2Baddressable%2Bjextract