Exposing concrete types of segments and addresses
Quân Anh Mai
anhmdq at gmail.com
Thu Dec 23 12:12:59 UTC 2021
Indeed, your idea makes a lot of sense. I think that, with the help of sealed
classes and method specialization, we can achieve the same result as
JDK-8278390 <https://bugs.openjdk.java.net/browse/JDK-8278390>:
Scalarization of nullable inline types in the calling convention. This kind
of automatic specialization could be applied to field stores too, though the
implications there would be much more complicated.
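
To make the idea concrete, here is a minimal hand-written sketch of what
"sealed classes plus method specialization" could look like. It is only an
illustration under stated assumptions: Segment, HeapSegment, NativeSegment and
the sum* methods are hypothetical names, not part of the foreign API, and the
dispatch is written by hand here rather than generated by the JIT. It needs
Java 17 or later.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-ins for segment kinds; none of these names exist in the foreign API.
final class SealedDispatchSketch {
    sealed interface Segment permits HeapSegment, NativeSegment {}

    // On-heap variant: a byte[] plus a window into it.
    record HeapSegment(byte[] base, int offset, int length) implements Segment {}

    // Off-heap variant: a direct ByteBuffer plays the role of native memory here.
    record NativeSegment(ByteBuffer buffer) implements Segment {}

    // Polymorphic entry point: one type test, then a call whose argument type is exact.
    static long sum(Segment segment) {
        if (segment instanceof HeapSegment heap) {
            return sumHeap(heap);
        }
        // The hierarchy is sealed, so the only remaining case is NativeSegment.
        return sumNative((NativeSegment) segment);
    }

    // Specialised version for on-heap data; the receiver type is known statically.
    static long sumHeap(HeapSegment heap) {
        long total = 0;
        for (int i = 0; i < heap.length(); i++) {
            total += heap.base()[heap.offset() + i];
        }
        return total;
    }

    // Specialised version for off-heap data, using absolute reads.
    static long sumNative(NativeSegment nat) {
        long total = 0;
        for (int i = 0; i < nat.buffer().limit(); i++) {
            total += nat.buffer().get(i);
        }
        return total;
    }

    public static void main(String[] args) {
        Segment onHeap = new HeapSegment(new byte[] {1, 2, 3, 4}, 1, 2);
        Segment offHeap = new NativeSegment(ByteBuffer.allocateDirect(3));
        System.out.println(sum(onHeap));
        System.out.println(sum(offHeap));
    }
}
```

Because the set of implementations is closed, the type test in sum() is
exhaustive, and each specialised method sees a single concrete type whose
fields could in principle be passed as scalars.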
Regards,
Quan Anh
On Thu, 23 Dec 2021 at 18:40, Bhateja, Jatin <jatin.bhateja at intel.com>
wrote:
> > Yes, it's not possible to detect the actual instance held by a class
> > reference at compile time in a polymorphic class hierarchy, thus
> > allocation elimination and scalarization are not possible in the closed
> > world context under which compilation is performed.
>
> Apart from depending on type profiling, as was mentioned, which will
> eventually result in jitting multiple versions of methods
> with different predication checks under unverified entry points.
>
> Best Regards,
> Jatin
>
> > -----Original Message-----
> > From: Bhateja, Jatin
> > Sent: Thursday, December 23, 2021 4:55 PM
> > To: Quân Anh Mai <anhmdq at gmail.com>; Maurizio Cimadamore
> > <maurizio.cimadamore at oracle.com>
> > Cc: panama-dev at openjdk.java.net
> > Subject: RE: Exposing concrete types of segments and addresses
> >
> > Hi Quan Anh,
> >
> > > > >
> > > > > Currently, we can only access MemorySegments and MemoryAddresses
> > > > > through the respective interface. While this provides a nice
> > > > > interface for all kinds of memory segments, the lack of ability to
> > > > > use the concrete types leads to a lot of performance caveats.
> > > > >
> >
> > Yes, it's not possible to detect the actual instance held by a class
> > reference at compile time in a polymorphic class hierarchy, thus
> > allocation elimination and scalarization are not possible in the closed
> > world context under which compilation is performed, apart from depending
> > on type profiling as was mentioned.
> >
> > >
> > > Regarding a non-optimal circumstance, I discovered an interesting case
> > > where I want to read a long value from a byte array when the number of
> > > bytes to read might be less than 8. The benchmark is as follows:
> > >
> >
> > One possible solution, which builds on your idea of segment
> > specialization, could be to use an approach similar to the Vector API,
> > which removes redundant boxes of concrete Vector types during unboxing
> > idealizations.
> >
> > Best Regards,
> > Jatin
> >
> > > -----Original Message-----
> > > From: panama-dev <panama-dev-retn at openjdk.java.net> On Behalf Of Quân
> > > Anh Mai
> > > Sent: Tuesday, December 21, 2021 8:22 PM
> > > To: Maurizio Cimadamore <maurizio.cimadamore at oracle.com>
> > > Cc: panama-dev at openjdk.java.net
> > > Subject: Re: Exposing concrete types of segments and addresses
> > >
> > > Thank you very much for the detailed explanation. I agree that we need
> > > to be patient, as adding more types to the API is easier than removing
> > > them. I can imagine that later on, we can expose only
> > > HeapMemorySegment<T>, NativeMemorySegment and MappedMemorySegment if we
> > > are forced to do so.
> > >
> > > Regarding a non-optimal circumstance, I discovered an interesting case
> > > where I want to read a long value from a byte array when the number of
> > > bytes to read might be less than 8. The benchmark is as follows:
> > >
> > > @Benchmark
> > > public long read() {
> > >     int length = this.length;
> > >     var segment = MemorySegment.ofArray(this.array);
> > >     long result = 0;
> > >     long offset = this.offset;
> > >     if ((length & Byte.BYTES) != 0) {
> > >         result = Byte.toUnsignedLong(segment.get(ValueLayout.JAVA_BYTE, offset));
> > >         offset += Byte.BYTES;
> > >     }
> > >     if ((length & Short.BYTES) != 0) {
> > >         result = (result << Short.SIZE) |
> > >                 Short.toUnsignedLong(segment.get(ValueLayout.JAVA_SHORT, offset));
> > >         offset += Short.BYTES;
> > >     }
> > >     if ((length & Integer.BYTES) != 0) {
> > >         result = (result << Integer.SIZE) |
> > >                 Integer.toUnsignedLong(segment.get(ValueLayout.JAVA_INT, offset));
> > >     }
> > >     return result;
> > > }
> > >
> > > Running with a fairly recent revision of openjdk/jdk (it is 12 commits
> > > behind as of right now, which means the running JVM already contains
> > > the fix for the bug you mentioned), the generated assembly does not
> > > seem to be optimal, with the segment failing to be scalarized.
> > >
> > > Regards,
> > > Quan Anh
> > >
> > > On Tue, 21 Dec 2021 at 05:29, Maurizio Cimadamore <
> > > maurizio.cimadamore at oracle.com> wrote:
> > >
> > > > Hi,
> > > > thanks for your email. This is a really tricky area, where no
> > > > optimal solution exists yet.
> > > >
> > > > First, we have recently spotted an issue with escape analysis not
> > > > working correctly with memory segments - for this I filed the issue:
> > > >
> > > > https://bugs.openjdk.java.net/browse/JDK-8278429
> > > >
> > > > Which has been closed as a duplicate of another VM bug which is
> > > > being worked on. I believe that fix should generally improve all
> > > > scenarios where there is a bottleneck due to failure of scalarization
> > > > when creating new segments (e.g. slicing).
> > > >
> > > > That said, this does not address your fundamental point that, at the
> > > > end of the day, some of these optimizations depend on the ability of
> > > > C2 to inline through code (but this is also true for the ByteBuffer
> > > > API).
> > > >
> > > > The ultimate solution would be IMHO to make memory segment _less_
> > > > polymorphic, by having a single implementation class which then
> > > > delegates its memory access behavior to a secondary abstraction
> > > > (which could be a constant, based on the access type: on-heap, or
> > > > off-heap).
> > > >
> > > > If we did that, a memory segment would become a dumb wrapper around
> > > > a base object, a length and some (constant) access object helper.
> > > >
> > > > Unfortunately this solution (which we have tried) doesn't work -
> > > > because Unsafe memory access needs to know whether access is going
> > > > to be on- or off-heap (in order to remove important memory barriers).
> > > > Currently this is done with the help of type profiling: if we are
> > > > accessing memory on a type that C2 can prove to be
> > > > "NativeMemorySegmentImpl", then C2 also knows that access is going
> > > > to be off-heap - and unsafe access is fast.
> > > > To have profiling working correctly we need one concrete segment
> > > > type for each possible access type (native, mapped, and one for each
> > > > primitive on-heap array). But if there's only one concrete type,
> > > > there's no type profiling to go on, so we gain monomorphism, but we
> > > > lose (very badly) when it comes to profile pollution exposure. To fix
> > > > this, we need better ways to do type profiling (based not only on
> > > > receiver/parameter types, but maybe the type of some fields in an
> > > > instance).
> > > >
> > > > Now, in the current implementation we can hide the polymorphism,
> > > > pretty much like ByteBuffer does, under a common interface. Exposing
> > > > concrete types as you suggest is going to be painful - as users will
> > > > see another 9 segment types (7 primitive arrays, + mapped + native), which
> > > > would increase the size of the API quite considerably. Maybe some
> > > > intermediate point might also be useful to consider (e.g. perhaps
> > > > only two types - for native segments and heap segments, but do not
> > > > differentiate between mapped/native or between byte[] and long[] in
> > > > the public API). But we need to consider any such moves very carefully:
> > > > while we can add these types very easily in the future, if it proves
> > > > to be the only possible path (e.g. even after Valhalla) in order to
> > > > use memory segments sanely, the reverse is not true: if we add these
> > > > new types now, and later on we discover these new types to be
> > > > superseded by some new VM optimization, or better support thanks to
> > > > Valhalla, we'd be stuck with these types for a long time.
> > > >
> > > > I think at this point in time we'd like to know where the
> > > > performance potholes are - so if you happen to have a benchmark
> > > > which shows the problem you discussed, we'd be very happy to take a
> > > > look. Our experience so far seems to suggest that performance is
> > > > acceptable - even in cases where segments are created in very hot
> > > > paths (we do have a spliterator test which inundates the system
> > > > with slices - and that doesn't seem to perform too badly). At the same
> > > > time, I can believe you when you say that some of the optimizations
> > > > we might rely upon are fragile (I've been there when using the API
> > > > on my own, so the mileage of certain idioms can vary).
> > > >
> > > > Unfortunately this is a bigger problem IMHO than just MemorySegments:
> > > > currently writing immutable APIs in Java can lead to spotty
> > > > performance.
> > > > The hope is that Valhalla will give us tools to help us manage that
> > > > kind of complexity - but even then, some of the optimizations (e.g.
> > > > scalarization) might be gated by excessive polymorphism and/or lack
> > > > of inlining. If we can improve the VM enough to do the type
> > > > profiling we need to keep unsafe access sharp even in the face of a
> > > > "monomorphic" implementation, then I believe the current API could
> > > > take advantage of Valhalla in a more straightforward fashion (and we
> > > > could, in the future, add Valhalla optimizations to special case
> > > > treatment for sealed interfaces whose only implementation is a
> > > > primitive class).
> > > >
> > > > [Btw, this discussion is really about MemorySegment - for
> > > > MemoryAddress, in my own experiments I could already see Valhalla
> > > > making quick work of all the address instantiations - as
> > > > MemoryAddressImpl is the only implementation of MemoryAddress].
> > > >
> > > > Maurizio
> > > >
> > > >
> > > > On 20/12/2021 07:21, Quân Anh Mai wrote:
> > > > > Hi,
> > > > >
> > > > > Currently, we can only access MemorySegments and MemoryAddresses
> > > > > through the respective interface. While this provides a nice
> > > > > interface for all kinds of memory segments, the lack of ability to
> > > > > use the concrete types leads to a lot of performance caveats.
> > > > >
> > > > > Firstly, polymorphism disables scalarization. While a non-escaped
> > > > > object can be scalarized in most cases, there are still
> > > > > circumstances in which scalar replacement fails (e.g. when we want
> > > > > to continuously slice a segment in a loop). Furthermore, this makes
> > > > > us dependent on the inlining ability of the compiler, which is
> > > > > unpredictable and limits the use of segments and addresses where a
> > > > > given level of performance is required. On the other hand,
> > > > > scalarization of polymorphic types in fields and calling
> > > > > conventions seems to be really complicated. With primitive
> > > > > classes, we could make the performance of the foreign API much
> > > > > more predictable by eliminating allocations as well as pointer
> > > > > chasing, where we can and want to limit the kind of segment we
> > > > > operate on.
> > > > >
> > > > > The above caveats lead to a possible usage of the foreign API in
> > > > > which naked addresses are passed around as long values and
> > > > > segments are constructed only where they are needed. This
> > > > > approach, besides being an ugly hack, is still not ideal because
> > > > > multiple methods may fail to be inlined.
> > > > >
> > > > > Secondly, polymorphism limits specialisation. With JEP 218, we may
> > > > > have multiple specialisations of the same methods operating on
> > > > > different kinds of segments. While it is still possible, to some
> > > > > extent, to have specialisation with a polymorphic type
> > > > > MemorySegment, it would likely be a fragile optimisation that
> > > > > relies on inlining and a lot of type checks.
> > > > >
> > > > > Furthermore, while having common aspects, MemorySegments differ
> > > > > from one another in behaviour. E.g. HeapMemorySegment is not
> > > > > Addressable, and MappedMemorySegment has various additional
> > > > > specific methods. While this is not an argument for the design of
> > > > > the foreign API, it is a small bonus point on top of those above.
> > > > >
> > > > > Overall, the current status of the foreign API seems to put us in a
> > > > > position that relies too much on the compiler to get the desired
> > > > > performance. Exposing the concrete types would enable us to write
> > > > > more predictable code where it is needed and flexible code (i.e.
> > > > > using polymorphic MemorySegment, MemoryAddress, etc.) where that is
> > > > > more desirable.
> > > > >
> > > > > My apologies if this question has been addressed before. Thank you
> > > > > very much.
> > > > > Quan Anh
> > > >
>
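
For reference, the bit-test pattern used in the benchmark quoted above can be
reproduced without the foreign API. The sketch below is an approximation under
stated assumptions, not the benchmark itself: it swaps MemorySegment for a
plain ByteBuffer, takes an int offset instead of a long, and makes the byte
order an explicit parameter (the ValueLayout constants in the benchmark use,
as far as I know, the platform's native order). TailReadSketch and readTail
are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper mirroring the benchmark's logic on top of ByteBuffer.
final class TailReadSketch {
    // Reads `length` bytes (expected range 0..7) starting at `offset` and packs them into a long.
    // Each set bit of `length` selects one fixed-width read: 1, then 2, then 4 bytes.
    static long readTail(byte[] array, int offset, int length, ByteOrder order) {
        ByteBuffer buffer = ByteBuffer.wrap(array).order(order);
        long result = 0;
        if ((length & Byte.BYTES) != 0) {
            result = Byte.toUnsignedLong(buffer.get(offset));
            offset += Byte.BYTES;
        }
        if ((length & Short.BYTES) != 0) {
            result = (result << Short.SIZE) | Short.toUnsignedLong(buffer.getShort(offset));
            offset += Short.BYTES;
        }
        if ((length & Integer.BYTES) != 0) {
            result = (result << Integer.SIZE) | Integer.toUnsignedLong(buffer.getInt(offset));
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] data = {0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07};
        // With big-endian order and length 7 this packs all seven bytes in array order.
        System.out.println(Long.toHexString(readTail(data, 0, 7, ByteOrder.BIG_ENDIAN)));
    }
}
```

With at most three reads, any length from 0 to 7 is covered without a
byte-by-byte loop, which is why it matters that the wrapper object created for
the array does not survive as an allocation.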