RFR: jsr166 jdk9 integration wave 12

Wed Nov 16 20:14:41 UTC 2016

Thanks to Vitaly and Doug for being more concerned about offer/poll
performance than I was initially.  ArrayDeque is now back to using the head
+ tail + spare slot strategy, which keeps optimal performance for non-bulk
operations, while everything else gets faster.  We have new tests and new
benchmarks, as a result of which new bugs were found and fixed, and this
integration wave has grown rather larger than originally intended.

The JIT-friendly nested loop strategy was successful for making bulk
operations on circular arrays faster, and this is used for both ArrayDeque
and ArrayBlockingQueue.

+    /*
+     * VMs excel at optimizing simple array loops where indices are
+     * incrementing or decrementing over a valid slice, e.g.
+     *
+     * for (int i = start; i < end; i++) ... elements[i]
+     *
+     * Because in a circular array, elements are in general stored in
+     * two disjoint such slices, we help the VM by writing unusual
+     * nested loops for all traversals over the elements.
+     */

Bulk removal operations on array-based collections retreat to a two-pass
algorithm, which is slightly slower (but still O(n)), but continues to
support (questionable) predicates that reentrantly access the collection in
read mode.

http://cr.openjdk.java.net/~martin/webrevs/openjdk9/jsr166-jdk9-integration/

On Fri, Oct 21, 2016 at 12:32 PM, Martin Buchholz <martinrb at google.com>
wrote:

>
>
> On Tue, Oct 18, 2016 at 4:18 PM, Martin Buchholz <martinrb at google.com>
> wrote:
>
>>
>>
>> On Tue, Oct 18, 2016 at 4:00 PM, Vitaly Davidovich <vitalyd at gmail.com>
>> wrote:
>>
>>>
>>>> > * Change in allocation/capacity policy.
>>>> >
>>>> > The removal of the power-of-two restriction, and applying a 1.5x
>>>> growth
>>>> > factor (same as ArrayList) seems sensible. Does this mean that the
>>>> ability
>>>> > to compute the proper array index by using x & (length-1) wasn't
>>>> worth it?
>>>> > Or at least not worth the extra tail wastage?
>>>> >
>>>>
>>>> There's no integer division to optimize, as with hash tables.
>>>
>>> But it does add some new branches, doesn't it? Potentially branches that
>>> aren't taken prior to JIT compilation, but taken later = deopt.
>>>
>>
>> If it's a smidgeon slower, I don't care that much; improvement in
>> flexibility and scalability are more important.
>>
>> Of course, I do care about algorithmic improvements to e.g. removeIf
>>
>>
>>> Curious if you ran some benchmarks on ArrayDeque.
>>>
>>
>> Not yet.  Who would like to show how slow the code is?
>>
>
> I revived my ancient IteratorMicroBenchmark for the latest webrev, made it
> a proper jtreg test, added ArrayDeque Jobs, and verified that the new code
> is measurably faster than the old code, at least for the mundane job of
> iterating.
> http://cr.openjdk.java.net/~martin/webrevs/openjdk9/
> jsr166-jdk9-integration/ArrayList/
>
>