[lworld] RFR: Prototype inline cursors for List

Sergey Kuksenko sergey.kuksenko at oracle.com
Thu Mar 26 20:37:20 UTC 2020


It won't work for ArrayList.

I told about that on Valhalla meeting (half year ago) and showed it's in 
my presentation about Valhalla performance 
(http://cr.openjdk.java.net/~skuksenko/valhalla/talk/vp.pdf, slides 68-75).

There is no difference cost between memory access to stack and heap. You 
won't get performance benefits on any range checks. Iterator/Cursor 
memory allocation itself is very cheap.

The only place where it may have significant effect (and it has) is GC 
write barriers.  Simple not-tuned Cursor for HashMap give +40% boost. If 
we want to see performance difference for Cursor we need an Iterator 
where Iterator's state contains references and these references should 
be updated on each "next()" invocation. ArrayList needs only update int 
field - no GC barriers.

My raw experiments were published in last August 
http://cr.openjdk.java.net/~skuksenko/valhalla/playground/.

It's still not enough to observe performance improvement on 
microbenchmarks. HotSpot is really good at 
Iterators-inside-microbenchmarks scalarization. Iterator should be 
forced DO NOT to scalarize the iterator. The simplest way to do it:

     @CompilerControl(CompilerControl.Mode.DONT_INLINE)
     private static Iterator<Integer> hide(Iterator<Integer> it) {
         return it;
     }
         
  @Benchmark
     public int sumIteratorHidden() {
         int s = 0;
         for (Iterator<Integer> iterator = hide(map.keyIterator()); iterator.hasNext(); ) {
             s += iterator.next();
         }
         return s;
     }


Minor note: Don't run it with ZGC. ZGC has no write barriers. Default G1 
is the best option due to highest cost of write barriers. ParallelGC 
write barriers has much less cost than G1 and Cursor still shows 
performance boost on ParallelGC, but much less than on G1.


On 3/26/20 10:45 AM, Brian Goetz wrote:
> These don't look so good!  I suspect that the Iterator is performing 
> so well because it's getting good EA, but not sure why cursors aren't 
> doing better?  Guess we'll have to dig into the generated code....
>
> On 3/26/2020 1:14 PM, Roger Riggs wrote:
>> Implementation of Cursors and jmh tests comparing
>> typical List traversal via direct index, iterator,
>> inline cursor, and an iterator implemented on top of cursor.
>>
>> Sample results:
>>
>>                                     (size)  Mode  Cnt Score       
>> Error  Units
>> XArrayListCursorTest.getViaArray            100000  avgt    5 
>> 507793.484  7086.038  ns/op
>> XArrayListCursorTest.getViaCursorForLoop    100000  avgt    5 
>> 656461.958  52488.547  ns/op
>> XArrayListCursorTest.getViaCursorWhileLoop  100000  avgt    5 
>> 641963.323  32219.409  ns/op
>> XArrayListCursorTest.getViaIterator         100000  avgt    5 
>> 558863.817  23539.256  ns/op
>> XArrayListCursorTest.getViaIteratorCurs     100000  avgt    5 
>> 733161.466  33721.881  ns/op
>>
>> -------------
>>
>> Commit messages:
>>   - Prototype inline cursors for List
>>
>> Changes: https://git.openjdk.java.net/valhalla/pull/5/files
>>   Webrev: https://webrevs.openjdk.java.net/valhalla/5/webrev.00
>>    Stats: 2139 lines in 3 files changed: 2139 ins; 0 del; 0 mod
>>    Patch: https://git.openjdk.java.net/valhalla/pull/5.diff
>>    Fetch: git fetch https://git.openjdk.java.net/valhalla 
>> pull/5/head:pull/5
>>
>> PR: https://git.openjdk.java.net/valhalla/pull/5
>


More information about the valhalla-dev mailing list