[foreign-memaccess] RFR 8231402: layout API implementation is not constant enough

Wed Sep 25 11:53:21 UTC 2019

Hi,

Patch looks good!

Some minor comments:

MemorySegment.java
   - Minor typo @58: "using the one of the provided factory methods"
   - You removed the SecurityManager check from ofByteBuffer. Isn't it 
needed?

Cheers,
Jorn

On 24/09/2019 16:06, Maurizio Cimadamore wrote:
> Hi,
> as the subject says, the implementation classes of the layout API do 
> not always store their properties into final fields, and they resort 
> to lazy computation, etc. This negatively impacts C2 scrutability of 
> same data structures.
>
> This patch fixes this situation, by changing size/alignment to be 
> final fields in AbstractLayout - so that they will have to be provided 
> before hand. I've added, for clarity, and extra 'default' constructor 
> to all layout implementation classes which allows to create a layout 
> with standard alignment and empty name.
>
> There are also few minor changes:
>
> * I've tweaked VM to also trust final fields in the layout package
>
> * I've rearranged some some scope classes so that their creation is 
> less straightforward, more transparent and requires less reflective 
> checks. This is particularly evident in HeapScope and BufferScope. 
> Note that I also changed the public API of MemorySegment::ofArray and 
> replaced that with multiple overloads (one per primitive array). This 
> is good because it makes the code more 'static' and also because it 
> removes the possibility for the user to pass in a wrong array type.
>
> * I've re-ordered the way in which scope vs. segment is created - that 
> is, instead of this
>
> new XYZSegment(.., ..., ..., new XYZScope(...))
>
> We now do this:
>
> XYZScope scope = new XYZScope(...);
> new XYZSegment(.., ..., ..., scope)
>
> As this makes a difference for C2 (Vlad pointed this out).
>
> Webrev:
>
> http://cr.openjdk.java.net/~mcimadamore/panama/8231402_v2/
>
>
> With this patch, the level of performances of the memory access API is 
> virtually on par with Unsafe in our suite of synthetic benchmarks, at 
> least when using the Graal compiler (*). With C2 there are still 
> issues which have to do mainly with (i) escape analysis not being 
> aggressive enough (a VM patch is required) and (ii) inlining not 
> working well in relatively 'cold' code (e.g. segment 
> creation/closure), so that some manual sprinkling of @ForceInline 
> annotations is  required. I will pursue these in a follow up patch.
>
> Maurizio
>
> (*) the only exception to this is a test which performs indexed 
> access, in which Graal compiler is not able to vectorize the loop when 
> using the memory access API (because of the presence of address 
> operation on longs); that said, performance of code compiled by the 
> Graal compiler with the memory access API is still superior than that 
> of C2 using unsafe (I'm also following up with the Graal compiler team 
> on this issue). There also seems to be an issue with the liveness 
> check which is never hoisted out of hot loops - leading to slightly 
> slower performances; in C2 we have a similar issue and it's caused by 
> the fact that the VM puts a memory barrier after Unsafe memory access 
> calls (since the Unsafe call could touch the loop invariant itself!). 
> We obviously need to relax some of these checks if the Unsafe call 
> occurs from within the memory access API (which does not allow 
> arbitrary read/write of Java fields).
>
>
>