[foreign-memaccess+abi] RFR: 8270376: Finalize API for memory copy

Wed Jul 14 00:26:12 UTC 2021

On Jul 13, 2021, at 2:34 PM, Maurizio Cimadamore <mcimadamore at openjdk.java.net> wrote:

Thanks for the comments John - I agree pretty much with all the points you raise, so I think it's worth spending some more cycles to try and get this right. I do think that there is a useful distinction between primitive copy methods and sugary ones, and we should try to make that distinction clearer in the API.

You are welcome.  Thanks for letting me throw in my $0.02 in this important work.

One thing I wasn't too sure about was whether having a single entry point taking Object would create havoc - as the code will have to access the array length reflectively, and all that - and I guess it can get bad under profile pollution? This was the main reason why, a few years ago, we went with the `MemorySegment::ofArray` API with all the overloads (instead of having only one taking an object) - because we had inlining issues with the monomorphic version. Maybe it's time to try again - or are there "tricks" that I overlooked?

One trick:

Make the primitive @ForceInline.  Then the type of the array (or Class mirror) will be exact, and the JIT won’t need to look at the profile (whether polluted or not).

Second trick:

Make the primitive take an explicit length value as its argument.  Take a starting index too, for that matter.  The sugar method passes (0, a.length) or something like that.  Then there’s no Array.getLength call.  (But the first trick should fix that in any case.  Reflection is cheap when the JIT knows exact types.)
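
A minimal sketch of the second trick, under stated assumptions: the class and method names (`ArrayCopySketch`, `copyRange`, `copy`) are hypothetical and are not part of the foreign-memaccess API, and `System.arraycopy` merely stands in for the real bulk copy. It only illustrates the split between one Object-typed primitive and typed sugar overloads.

```java
// Hypothetical names throughout; this is not the foreign-memaccess API.
final class ArrayCopySketch {
    private ArrayCopySketch() {}

    // "Primitive": a single entry point typed as Object, with an explicit
    // starting index and length. Inside the JDK this is where @ForceInline
    // would go (it is a JDK-internal annotation, so it is omitted here).
    // System.arraycopy is only a stand-in for the real bulk copy.
    static void copyRange(Object src, int srcIndex, Object dst, int dstIndex, int length) {
        System.arraycopy(src, srcIndex, dst, dstIndex, length);
    }

    // "Sugar": statically typed overloads that pass (0, a.length), so the
    // primitive never has to query the array length reflectively.
    static void copy(int[] src, int[] dst) {
        copyRange(src, 0, dst, 0, src.length);
    }

    static void copy(double[] src, double[] dst) {
        copyRange(src, 0, dst, 0, src.length);
    }
}
```

A call such as `ArrayCopySketch.copy(a, b)` resolves to a typed overload at compile time, so once the primitive is inlined the array type at the call site is exact.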

I'm assuming the code would need to access at least:

* array length (can be done with java.lang.reflect.Array.getLength, which is an intrinsic)
* unsafe base/stride (can be done with Unsafe::arrayBaseOffset/arrayIndexScale)

Getting the base/stride should be cheap after the first trick is played.  If there is a further problem with it, we can fix by making arrayBaseOffset etc. be an intrinsic.

Or if we fix the higher-leverage bug https://bugs.openjdk.java.net/browse/JDK-8238260 then we could use a ClassValue, and expect the JIT to constant-fold it.  (That will be a good trick for all sorts of ad hoc var-handles, I think.)
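
A rough sketch of the ClassValue idea, with several assumptions: `ArrayLayoutSketch`, `Layout`, and `layoutOf` are hypothetical names, and `sun.misc.Unsafe` (obtained reflectively) stands in for the JDK-internal Unsafe that real library code would use. It shows only the shape of the cache; whether the JIT constant-folds the lookup is exactly what the bug above is about.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical helper: caches base offset and index scale per array class.
final class ArrayLayoutSketch {
    private static final Unsafe UNSAFE = lookupUnsafe();

    // Immutable pair of the two values the bulk copy needs.
    static final class Layout {
        final int baseOffset;
        final int indexScale;

        Layout(int baseOffset, int indexScale) {
            this.baseOffset = baseOffset;
            this.indexScale = indexScale;
        }
    }

    // Computed at most once per array class, then reused on every lookup.
    private static final ClassValue<Layout> LAYOUTS = new ClassValue<Layout>() {
        @Override
        protected Layout computeValue(Class<?> type) {
            if (!type.isArray()) {
                throw new IllegalArgumentException("not an array class: " + type);
            }
            return new Layout(UNSAFE.arrayBaseOffset(type), UNSAFE.arrayIndexScale(type));
        }
    };

    static Layout layoutOf(Object array) {
        return LAYOUTS.get(array.getClass());
    }

    // Standard reflective lookup of the sun.misc.Unsafe singleton.
    private static Unsafe lookupUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }
}
```

The base offset depends on the VM's object header layout, so only the index scale (for example 4 for `int[]`) is predictable across configurations.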

In order to perform the bulk copy call. Do you foresee performance issues if we go down that path? If not, does that mean that we should retest our assumption when it comes to MemorySegment::ofArray as well?

Without new intrinsics or ClassValue optimization, I think we can get good performance in all these cases by leaning heavily on @ForceInline and on treating the `a.getClass()` value or the `atype` value (as the case may be) as a constant.

To make things easier, we might build a single intrinsic (buried in Unsafe) which produces 64 bits of steering data for an array type.  The steering data can start with the base offset and the element size, and can be extended later, perhaps, to encode alignment or other conditions.  If the JIT can constant-fold this (or compute it rapidly by some other means) then we can take it apart as bitfields and use them (as constants or cheaply computed values loaded from the `_klass` field).  Or, since it’s only two ints, and we already know how to make intrinsics like that, just add two more cases to `LibraryCallKit::inline_native_Class_query`.
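
To make the bitfield idea concrete, here is a small sketch with an assumed encoding: base offset in bits 0..31, element size in bits 32..39, and the remaining bits left free for later extensions. The class name `SteeringDataSketch`, its methods, and the bit layout are all hypothetical; no such intrinsic or format exists today.

```java
// Hypothetical encoding of per-array-type "steering data" in one long.
final class SteeringDataSketch {
    private SteeringDataSketch() {}

    // Assumed layout: bits 0..31 = base offset, bits 32..39 = element size,
    // bits 40..63 = reserved (e.g. for alignment flags later on).
    static long pack(int baseOffset, int elementSize) {
        if (baseOffset < 0 || elementSize <= 0 || elementSize > 0xFF) {
            throw new IllegalArgumentException("out of range");
        }
        return ((long) elementSize << 32) | (baseOffset & 0xFFFF_FFFFL);
    }

    // Low 32 bits hold the base offset.
    static int baseOffset(long steering) {
        return (int) steering;
    }

    // Next 8 bits hold the element size in bytes.
    static int elementSize(long steering) {
        return (int) ((steering >>> 32) & 0xFF);
    }

    // Byte offset of element `index` from the start of the array object.
    static long byteOffset(long steering, int index) {
        return baseOffset(steering) + (long) index * elementSize(steering);
    }
}
```

If the packed value is a compile-time constant for a given array class, the shifts and masks fold away and only the final address arithmetic remains.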

— John