VarHandle optimization

Thu Oct 31 18:45:38 UTC 2024

New subject... (was Unsafe vs MemorySegments / Bounds checking...)

One thing that surprises me is that VarHandles don't have some sort of 
"compile" step which can reduce them to a smaller form. As far as I can 
tell, before JDK 22, VarHandles didn't support any kind of transforms, 
and so they were always tiny and inlinable.

Whether I act upon a VarHandle directly or indirectly (via a 
MemorySegment method), the full chain of transforms gets expanded and 
processed again and again at each call site. My assumption was that 
there would be some intermediate optimized form which could be re-used.
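
To make "chain of transforms" concrete, here is a minimal sketch of the 
kind of adapted VarHandle I have in mind. It assumes JDK 22 or later 
(for MethodHandles.filterValue and MethodHandles.insertCoordinates). The 
class name and the encode/decode helpers are made up for illustration, 
and it uses a plain int[] rather than a MemorySegment to stay small:

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.VarHandle;

// Illustrative only: the class and helper names are hypothetical.
class AdaptedVarHandleSketch {
    // Hypothetical value filters; XOR with a constant is its own inverse.
    static int encode(int v) { return v ^ 0x5A5A5A5A; }
    static int decode(int v) { return v ^ 0x5A5A5A5A; }

    // Unadapted handle: coordinates are (int[] array, int index).
    static final VarHandle PLAIN =
            MethodHandles.arrayElementVarHandle(int[].class);

    // Adapted handle: two transforms layered on top of PLAIN.
    static final VarHandle ADAPTED = buildAdapted();

    static VarHandle buildAdapted() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodType intToInt = MethodType.methodType(int.class, int.class);
            MethodHandle enc = lookup.findStatic(
                    AdaptedVarHandleSketch.class, "encode", intToInt);
            MethodHandle dec = lookup.findStatic(
                    AdaptedVarHandleSketch.class, "decode", intToInt);

            // Transform 1: encode values on write, decode them on read.
            VarHandle filtered = MethodHandles.filterValue(PLAIN, enc, dec);

            // Transform 2: bind the index coordinate to 0, leaving (int[]).
            return MethodHandles.insertCoordinates(filtered, 1, 0);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }
}
```

With this, ADAPTED.set(array, 42) stores the encoded value in array[0] 
and (int) ADAPTED.get(array) decodes it again. Every access through 
ADAPTED goes through both adaptation layers; as far as I know there is 
no public API to ask for a flattened or pre-optimized equivalent.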

Is it even possible to support some sort of compile/optimize/reduce step 
with VarHandles? Manually or automatically?

On 2024-10-30 03:09 PM, Maurizio Cimadamore wrote:
> On 30/10/2024 21:49, Maurizio Cimadamore wrote:
> 
>> What I would like to do in the meantime is (when I have some cycles) 
>> to write a benchmark which tests memory segment in a loop with random 
>> access (e.g. using random offsets). And then evaluate the costs 
>> associated with the various approaches in isolation and see if 
>> anything pops up. I have a couple of ideas on how to possibly improve 
>> the story for the unbounded MAX_VALUE access -- but I'd like to see 
>> where we are w.r.t. random access in vanilla Java 24 first. 
> 
> I find this email from a year ago or so interesting:
> 
> https://mail.openjdk.org/pipermail/panama-dev/2023-July/019478.html
> 
> Some kind of issue /was indeed/ identified in there. Surely, adding an 
> extra |< 0| check in all memory segment accessors shouldn’t do much, but 
> the numbers suggested that the patched code was almost 2x as fast 
> (although not as fast as Unsafe).
> 
> Then I had the other idea to adapt var handles (the one you are 
> currently using):
> 
> https://mail.openjdk.org/pipermail/panama-dev/2023-July/019487.html
> 
> Which, in retrospect, was perhaps going too far. Sure, the numbers in 
> the synthetic benchmark looked very very good… but such an approach does 
> add problems when it comes to the shape and simplicity of the generated 
> code. There’s an allocation in every hot path, and a call to reinterpret 
> (which is restricted, so there’s a check for that too). All that stuff 
> is not normally on the critical path for memory access, but the trick of 
> putting it inside the var handle adaptation code makes it part of the 
> hot path, which I think solves some problems (it’s fast!) and creates 
> some new ones (it’s brittle!).
> 
> Hence my suggestion to go back a little, and see what we can do to speed 
> up access for a segment created with:
> 
> |MemorySegment.NULL.reinterpret(Long.MAX_VALUE) |
> 
> (which, as Ron correctly points out, might not mean /exactly as fast as 
> Unsafe/)
> 
> Maurizio