Exposing concrete types of segments and addresses

Mon Dec 20 22:29:38 UTC 2021

Hi,
thanks for your email. This is a really tricky area, where no optimal 
solution exists yet.

First, we have recently spotted an issue with escape analysis not 
working correctly with memory segments - for this I filed the issue:

https://bugs.openjdk.java.net/browse/JDK-8278429

Which has been closed as a duplicate of another VM bug which is being 
worked on. I believe that fix should generally improve all scenarios 
where there is a bottleneck due to failure of scalarization when 
creating new segments (e.g. slicing).

That said, this does not address your fundamental point that, at the end 
of the day, some of these optimizations depend on the ability of C2 to 
inline through code (but this is also true for the ByteBuffer API).

The ultimate solution would be IMHO to make memory segments _less_ 
polymorphic, by having a single implementation class which then 
delegates its memory access behavior to a secondary abstraction (which 
could be a constant, based on the access type: on-heap, or off-heap).

If we did that, a memory segment would become a dumb wrapper around a 
base object, a length and some (constant) access object helper.
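
To make the shape of that idea concrete, here is a minimal sketch in 
plain Java. Every name in it (Segment, Access, HeapByteAccess, 
DirectAccess) is hypothetical and not taken from the actual 
implementation; a direct ByteBuffer stands in for off-heap memory, and 
an offset field is added so that slices can share one base object.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: none of these types exist in the JDK.
class SingleImplDemo {
    public static void main(String[] args) {
        Segment heap = Segment.ofArray(new byte[] {1, 2, 3, 4});
        Segment offHeap = Segment.ofDirect(4);
        offHeap.set(2, (byte) 7);
        System.out.println(heap.slice(1, 2).get(0));    // prints 2
        System.out.println(offHeap.slice(2, 2).get(0)); // prints 7
    }
}

// Secondary abstraction: one constant instance per access kind.
interface Access {
    byte getByte(Object base, long offset);
    void putByte(Object base, long offset, byte value);
}

// On-heap access: the base object is a byte[].
final class HeapByteAccess implements Access {
    static final Access INSTANCE = new HeapByteAccess();
    public byte getByte(Object base, long offset) { return ((byte[]) base)[(int) offset]; }
    public void putByte(Object base, long offset, byte value) { ((byte[]) base)[(int) offset] = value; }
}

// Stand-in for off-heap access: the base object is a direct ByteBuffer.
final class DirectAccess implements Access {
    static final Access INSTANCE = new DirectAccess();
    public byte getByte(Object base, long offset) { return ((ByteBuffer) base).get((int) offset); }
    public void putByte(Object base, long offset, byte value) { ((ByteBuffer) base).put((int) offset, value); }
}

// The only segment class: a base object, an offset, a length and an access helper.
final class Segment {
    private final Object base;
    private final long offset;
    private final long length;
    private final Access access;

    private Segment(Object base, long offset, long length, Access access) {
        this.base = base;
        this.offset = offset;
        this.length = length;
        this.access = access;
    }

    static Segment ofArray(byte[] array) {
        return new Segment(array, 0, array.length, HeapByteAccess.INSTANCE);
    }

    static Segment ofDirect(int size) {
        return new Segment(ByteBuffer.allocateDirect(size), 0, size, DirectAccess.INSTANCE);
    }

    long length() { return length; }

    // Slicing only creates another instance of the same class.
    Segment slice(long from, long newLength) {
        if (from < 0 || newLength < 0 || from > length - newLength) {
            throw new IndexOutOfBoundsException();
        }
        return new Segment(base, offset + from, newLength, access);
    }

    byte get(long index) {
        checkIndex(index);
        return access.getByte(base, offset + index);
    }

    void set(long index, byte value) {
        checkIndex(index);
        access.putByte(base, offset + index, value);
    }

    private void checkIndex(long index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException();
        }
    }
}
```

In this shape every call on Segment has a single possible receiver 
class; whether an access is on-heap or off-heap is visible only through 
the access field.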

Unfortunately this solution (which we have tried) doesn't work - because 
Unsafe memory access needs to know whether access is going to be on- or 
off-heap (in order to remove important memory barriers). Currently this 
is done with the help of type profiling: if we are accessing memory on a 
type that C2 can prove to be "NativeMemorySegmentImpl", then C2 also 
knows that access is going to be off-heap - and unsafe access is fast. 
To have profiling working correctly we need one concrete segment type 
for each possible access type (native, mapped, and one for each 
primitive on-heap array). But if there's only one concrete type, there's 
no type profiling to go on, so we gain monomorphism, but we lose (very 
badly) when it comes to profile pollution exposure. To fix this, we need 
better ways to do type profiling (based not only on receiver/parameter 
types, but maybe the type of some fields in an instance).
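
As a rough illustration of why one class per access kind helps, 
consider the sketch below. The names (Seg, NativeSeg, HeapByteSeg) are 
made up and only mimic the shape of the real implementation classes; a 
direct ByteBuffer again stands in for native memory.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: these classes only mimic the shape of the real ones.
class PerKindDemo {
    // The call s.get(i) has a receiver, so the VM can profile which
    // concrete classes reach this call site.
    static long sum(Seg s) {
        long total = 0;
        for (long i = 0; i < s.length(); i++) {
            total += s.get(i);
        }
        return total;
    }

    public static void main(String[] args) {
        NativeSeg n = new NativeSeg(3);
        n.set(0, (byte) 5);
        System.out.println(sum(n));
        System.out.println(sum(new HeapByteSeg(new byte[] {1, 2, 3})));
    }
}

abstract class Seg {
    abstract long length();
    abstract byte get(long index);
    abstract boolean isNative();
}

// Knowing the receiver is a NativeSeg already implies off-heap access.
final class NativeSeg extends Seg {
    private final ByteBuffer memory;
    NativeSeg(int size) { memory = ByteBuffer.allocateDirect(size); }
    long length() { return memory.capacity(); }
    byte get(long index) { return memory.get((int) index); }
    void set(long index, byte value) { memory.put((int) index, value); }
    boolean isNative() { return true; }
}

// Knowing the receiver is a HeapByteSeg implies on-heap access to a byte[].
final class HeapByteSeg extends Seg {
    private final byte[] array;
    HeapByteSeg(byte[] array) { this.array = array; }
    long length() { return array.length; }
    byte get(long index) { return array[(int) index]; }
    boolean isNative() { return false; }
}
```

If only NativeSeg instances ever reach sum, the receiver profile at 
s.get(i) is enough to tell the access kind. With a single segment class 
the same information would live in a field, which receiver/parameter 
profiling does not look at.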

Now, in the current implementation we can hide the polymorphism, pretty 
much like ByteBuffer does, under a common interface. Exposing concrete 
types as you suggest is going to be painful - as users will see 9 more 
segment types (7 primitive arrays + mapped + native), which 
would increase the size of the API quite considerably. Maybe some 
intermediate point might also be useful to consider (e.g. perhaps only 
two types - for native segments and heap segments, but do not 
differentiate between mapped/native or between byte[] and long[] in the 
public API). But we need to consider any such moves very carefully: while 
we can add these types very easily in the future, if it proves to be the 
only possible path (e.g. even after Valhalla) in order to use memory 
segments sanely, the reverse is not true: if we add these new types now, 
and later on we discover these new types to be superseded by some new VM 
optimization, or better support thanks to Valhalla, we'd be stuck with 
these types for a long time.

I think at this point in time we'd like to know where the performance 
potholes are - so if you happen to have a benchmark which shows the 
problem you discussed, we'd be very happy to take a look. Our experience 
so far seems to suggest that performance is acceptable - even in cases 
where segments are created in very hot paths (we do have a spliterator 
test which inundates the system with slices - and that doesn't seem to 
perform too badly). At the same time, I can believe you when you say that 
some of the optimizations we might rely upon are fragile (I've been 
there when using the API on my own, so the mileage of certain idioms can 
vary).
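
For reference, the kind of hot-path slicing pattern in question looks 
roughly like the sketch below. It uses ByteBuffer as a stand-in rather 
than the segment API, and the class and method names (SliceLoopDemo, 
sumChunkHeads) are invented for illustration. It only shows the shape of 
the loop; a real measurement would need a proper harness such as JMH.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of "slicing in a hot loop", with ByteBuffer as a stand-in.
class SliceLoopDemo {
    // Creates one short-lived view and one short-lived slice per chunk and
    // reads the first byte of each slice.
    static long sumChunkHeads(ByteBuffer buffer, int chunkSize) {
        long sum = 0;
        for (int pos = 0; pos + chunkSize <= buffer.capacity(); pos += chunkSize) {
            ByteBuffer view = buffer.duplicate(); // shares content, own position/limit
            view.position(pos);
            view.limit(pos + chunkSize);
            ByteBuffer slice = view.slice();      // covers [pos, pos + chunkSize)
            sum += slice.get(0);
        }
        return sum;
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocateDirect(1024);
        for (int i = 0; i < buffer.capacity(); i++) {
            buffer.put(i, (byte) (i / 256)); // chunk k is filled with the value k
        }
        System.out.println(sumChunkHeads(buffer, 256)); // prints 6 (0 + 1 + 2 + 3)
    }
}
```

Whether the per-iteration objects are actually scalarized here depends 
on inlining and escape analysis, which is exactly the part whose mileage 
can vary.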

Unfortunately this is a bigger problem IMHO than just MemorySegments: 
currently writing immutable APIs in Java can lead to spotty performance. 
The hope is that Valhalla will give us tools to help us manage that kind 
of complexity - but even then, some of the optimizations (e.g. 
scalarization) might be gated by excessive polymorphism and/or lack of 
inlining. If we can improve the VM enough to do the type profiling we 
need to keep unsafe access sharp even in the face of a "monomorphic" 
implementation, then I believe the current API could take advantage of 
Valhalla in a more straightforward fashion (and we could, in the future, 
add Valhalla optimizations to special case treatment for sealed 
interfaces whose only implementation is a primitive class).

[Btw, this discussion is really about MemorySegment - for MemoryAddress, 
in my own experiments I could already see Valhalla making quick work of 
all the address instantiations - as MemoryAddressImpl is the only 
implementation of MemoryAddress].

Maurizio

On 20/12/2021 07:21, Quân Anh Mai wrote:
> Hi,
>
> Currently, we can only access MemorySegments and MemoryAddresses through
> the respective interface. While this provides a nice interface for all
> kinds of memory segments, the lack of ability to use the concrete types
> leads to a lot of performance caveats.
>
> Firstly, polymorphism disables scalarization. While a non-escaped object
> can be scalarized in most cases, there are still circumstances that scalar
> replacement fails (e.g when we want to continuously slice a segment in a
> loop). Furthermore, this makes us become dependent on the inlining ability
> of the compiler, which is unpredictable and limits the use of segments and
> addresses for desired performance. On the other hand, scalarization of
> polymorphic types in fields and calling convention seems to be really
> really complicated. With primitive classes, we could make the performance
> of foreign API become much more predictable with the elimination of
> allocations as well as pointer chasings where we can and want to limit the
> kind of segment we operate on.
>
> The above caveats lead to possible usage of foreign API to pass around the
> naked addresses as long values and only construct segments where it is
> needed. This approach, while being an ugly hack, is still not ideal because
> multiple methods may fail to be inlined.
>
> Secondly, polymorphism limits specialisation. With JEP 218, we may have
> multiple specialisations of the same methods operating on different kinds
> of segments. While it is still possible, to some extent, to have
> specialisation with a polymorphic type MemorySegment, it would likely be a
> fragile optimisation that relies on inlining and a lot of type checks.
>
> Furthermore, while having common aspects, MemorySegments expose different
> behaviours on the others. E.g. HeapMemorySegment is not Addressable,
> MappedMemorySegment has various additional specific methods. While this is
> not an argument for the design of foreign API, it is a small bonus point
> over those above.
>
> Overall, the current status of foreign API seems to put us in a position
> that relies too much on the compiler to get the desired performance.
> Exposing the concrete types would enable us to write more predictable code
> where it is needed and flexible code (i.e. using polymorphic MemorySegment,
> MemoryAddress, etc) where it is more desirable.
>
> My apologies if this question has been addressed before. Thank you very
> much.
> Quan Anh