[foreign-memaccess] RFR 8231402: layout API implementation is not constant enough
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Tue Sep 24 14:06:04 UTC 2019
Hi,
as the subject says, the implementation classes of the layout API do not
always store their properties into final fields, and they resort to lazy
computation, etc. This negatively impacts C2 scrutability of same data
structures.
This patch fixes this situation, by changing size/alignment to be final
fields in AbstractLayout - so that they will have to be provided before
hand. I've added, for clarity, and extra 'default' constructor to all
layout implementation classes which allows to create a layout with
standard alignment and empty name.
There are also few minor changes:
* I've tweaked VM to also trust final fields in the layout package
* I've rearranged some some scope classes so that their creation is less
straightforward, more transparent and requires less reflective checks.
This is particularly evident in HeapScope and BufferScope. Note that I
also changed the public API of MemorySegment::ofArray and replaced that
with multiple overloads (one per primitive array). This is good because
it makes the code more 'static' and also because it removes the
possibility for the user to pass in a wrong array type.
* I've re-ordered the way in which scope vs. segment is created - that
is, instead of this
new XYZSegment(.., ..., ..., new XYZScope(...))
We now do this:
XYZScope scope = new XYZScope(...);
new XYZSegment(.., ..., ..., scope)
As this makes a difference for C2 (Vlad pointed this out).
Webrev:
http://cr.openjdk.java.net/~mcimadamore/panama/8231402_v2/
With this patch, the level of performances of the memory access API is
virtually on par with Unsafe in our suite of synthetic benchmarks, at
least when using the Graal compiler (*). With C2 there are still issues
which have to do mainly with (i) escape analysis not being aggressive
enough (a VM patch is required) and (ii) inlining not working well in
relatively 'cold' code (e.g. segment creation/closure), so that some
manual sprinkling of @ForceInline annotations is required. I will
pursue these in a follow up patch.
Maurizio
(*) the only exception to this is a test which performs indexed access,
in which Graal compiler is not able to vectorize the loop when using the
memory access API (because of the presence of address operation on
longs); that said, performance of code compiled by the Graal compiler
with the memory access API is still superior than that of C2 using
unsafe (I'm also following up with the Graal compiler team on this
issue). There also seems to be an issue with the liveness check which is
never hoisted out of hot loops - leading to slightly slower
performances; in C2 we have a similar issue and it's caused by the fact
that the VM puts a memory barrier after Unsafe memory access calls
(since the Unsafe call could touch the loop invariant itself!). We
obviously need to relax some of these checks if the Unsafe call occurs
from within the memory access API (which does not allow arbitrary
read/write of Java fields).
More information about the panama-dev
mailing list