[foreign-memaccess] RFR 8231402: layout API implementation is not constant enough

Tue Sep 24 14:06:04 UTC 2019

Hi,
as the subject says, the implementation classes of the layout API do not 
always store their properties into final fields, and they resort to lazy 
computation, etc. This negatively impacts C2 scrutability of same data 
structures.

This patch fixes this situation, by changing size/alignment to be final 
fields in AbstractLayout - so that they will have to be provided before 
hand. I've added, for clarity, and extra 'default' constructor to all 
layout implementation classes which allows to create a layout with 
standard alignment and empty name.

There are also few minor changes:

* I've tweaked VM to also trust final fields in the layout package

* I've rearranged some some scope classes so that their creation is less 
straightforward, more transparent and requires less reflective checks. 
This is particularly evident in HeapScope and BufferScope. Note that I 
also changed the public API of MemorySegment::ofArray and replaced that 
with multiple overloads (one per primitive array). This is good because 
it makes the code more 'static' and also because it removes the 
possibility for the user to pass in a wrong array type.

* I've re-ordered the way in which scope vs. segment is created - that 
is, instead of this

new XYZSegment(.., ..., ..., new XYZScope(...))

We now do this:

XYZScope scope = new XYZScope(...);
new XYZSegment(.., ..., ..., scope)

As this makes a difference for C2 (Vlad pointed this out).

Webrev:

http://cr.openjdk.java.net/~mcimadamore/panama/8231402_v2/

With this patch, the level of performances of the memory access API is 
virtually on par with Unsafe in our suite of synthetic benchmarks, at 
least when using the Graal compiler (*). With C2 there are still issues 
which have to do mainly with (i) escape analysis not being aggressive 
enough (a VM patch is required) and (ii) inlining not working well in 
relatively 'cold' code (e.g. segment creation/closure), so that some 
manual sprinkling of @ForceInline annotations is  required. I will 
pursue these in a follow up patch.

Maurizio

(*) the only exception to this is a test which performs indexed access, 
in which Graal compiler is not able to vectorize the loop when using the 
memory access API (because of the presence of address operation on 
longs); that said, performance of code compiled by the Graal compiler 
with the memory access API is still superior than that of C2 using 
unsafe (I'm also following up with the Graal compiler team on this 
issue). There also seems to be an issue with the liveness check which is 
never hoisted out of hot loops - leading to slightly slower 
performances; in C2 we have a similar issue and it's caused by the fact 
that the VM puts a memory barrier after Unsafe memory access calls 
(since the Unsafe call could touch the loop invariant itself!). We 
obviously need to relax some of these checks if the Unsafe call occurs 
from within the memory access API (which does not allow arbitrary 
read/write of Java fields).