RFR: jsr166 jdk9 integration wave 12

Paul Sandoz Paul.Sandoz at oracle.com
Thu Nov 17 20:03:00 UTC 2016


> On 16 Nov 2016, at 16:03, Martin Buchholz <martinrb at google.com> wrote:
> 
> Apologies for having this wave turn into a tsunami.
> 
> CopyOnWriteArrayList was rewritten due to finding https://bugs.openjdk.java.net/browse/JDK-8169738
> ArrayDeque and ArrayBlockingQueue now both use efficient nested loop idiom for all traversals.
> Most bulk remove methods in all collection classes were rewritten for efficiency, using similar code.
> Lots of tests and benchmarks added.

Ok, i will just go through all of it.


Collection-optimization
—

Perhaps consolidate the repeated mini bit set methods as package private static methods on AbstractCollection? Unclear where the j.u.concurrent ones should go though...

It’s purely a subjective thing but i would be inclined to change the variable name “deathRow" to something more neutral such as “removedBitSet”.

The 2 layer nested loop is quite clever, certainly twists the mind when first encountered. It’s arguably more readable if there are two separate, (not-nested) loops, but that also requires two explicit invocations of the action, which may perturb the performance characteristics.


ArrayDeque
—

 188     public ArrayDeque(int numElements) {
 189         elements = new Object[Math.max(1, numElements + 1)];
 190     }

What if Integer.MAX_VALUE is passed?


 202     public ArrayDeque(Collection<? extends E> c) {
 203         elements = new Object[c.size() + 1];
 204         addAll(c);
 205     }

What if c.size() returns Integer.MAX_VALUE?


 226      * Adds i and j, mod modulus.
 227      * Precondition and postcondition: 0 <= i < modulus, 0 <= j <= modulus.
 228      */
 229     static final int add(int i, int j, int modulus) {

Is that upper bound correct for j? 0 <= j < modulus ?


 317         c.forEach(e -> addLast(e));

this::addLast, up to you which you prefer


 843         public boolean tryAdvance(Consumer<? super E> action) {
 844             if (action == null)
 845                 throw new NullPointerException();
 846             int t, i;
 847             if ((t = fence) < 0) t = getFence();

Is that for optimisation purposes, since the same check is also performed in getFence? If so that seems like overkill


 848             if (t == (i = cursor))
 849                 return false;
 850             final Object[] es;
 851             action.accept(nonNullElementAt(es = elements, i));
 852             cursor = inc(i, es.length);

Recommend updating cursor before the action


 853             return true;
 854         }



Collection8Test
—

 429     public void testRemoveAfterForEachRemaining() {

Are tests run by default with testImplementationDetails == true? I am guessing so.


Semaphore
—

 633     /**
 634      * Acquires and returns all permits that are immediately available.
 635      * Upon return, zero permits are available.
 636      *
 637      * @return the number of permits acquired
 638      */
 639     public int drainPermits() {
 640         return sync.drainPermits();
 641     }

Arguably, if positive acquires all permits, otherwise releases all permits. Perhaps:

  Acquires and returns all permits that are immediately available, otherwise releases all permits (if negative).
  Upton return, zero permits are available.

  @return the number of permits acquired, or the number of permits release (a negative value)

Probably requires a CCC which i can manage.



SplittableRandom
—

 533     /**
 534      * Generates a pseudorandom number with the indicated number of
 535      * low-order bits.  Because this class has no subclasses, this
 536      * method cannot be invoked or overridden.
 537      *
 538      * @param  bits random bits
 539      * @return the next pseudorandom value from this random number
 540      *         generator's sequence
 541      */
 542     protected int next(int bits) {
 543         return nextInt() >>> (32 - bits);
 544     }
 545

Unless i am missing something really subtle, I don’t think this is required since SplittableRandom does not extend from Random (unlike in ThreadLocalRandom), and no associated reflective test is required.


ForkJoin
—

  69  * tasks that are never joined. All worker threads are initialized
  70  * with {@link Thread#isDaemon} set {@code true}.

CCC?


Thanks,
Paul.



> 
> 
> On Wed, Nov 16, 2016 at 3:34 PM, Paul Sandoz <paul.sandoz at oracle.com> wrote:
> Hi Martin,
> 
> Glad you looked more closely at the perf aspects of ArrayDeque.
> 
> I am struggling to determine what has changed since the last revision. Apart from ArrayDeque what files should i be looking at?
> 
> Thanks,
> Paul.
> 
> > On 16 Nov 2016, at 12:14, Martin Buchholz <martinrb at google.com> wrote:
> >
> > Thanks to Vitaly and Doug for being more concerned about offer/poll
> > performance than I was initially.  ArrayDeque is now back to using the head
> > + tail + spare slot strategy, which keeps optimal performance for non-bulk
> > operations, while everything else gets faster.  We have new tests and new
> > benchmarks, as a result of which new bugs were found and fixed, and this
> > integration wave has grown rather larger than originally intended.
> >
> > The JIT-friendly nested loop strategy was successful for making bulk
> > operations on circular arrays faster, and this is used for both ArrayDeque
> > and ArrayBlockingQueue.
> >
> > +    /*
> > +     * VMs excel at optimizing simple array loops where indices are
> > +     * incrementing or decrementing over a valid slice, e.g.
> > +     *
> > +     * for (int i = start; i < end; i++) ... elements[i]
> > +     *
> > +     * Because in a circular array, elements are in general stored in
> > +     * two disjoint such slices, we help the VM by writing unusual
> > +     * nested loops for all traversals over the elements.
> > +     */
> >
> >
> > Bulk removal operations on array-based collections retreat to a two-pass
> > algorithm, which is slightly slower (but still O(n)), but continues to
> > support (questionable) predicates that reentrantly access the collection in
> > read mode.
> >
> > http://cr.openjdk.java.net/~martin/webrevs/openjdk9/jsr166-jdk9-integration/
> >
> > On Fri, Oct 21, 2016 at 12:32 PM, Martin Buchholz <martinrb at google.com>
> > wrote:
> >
> >>
> >>
> >> On Tue, Oct 18, 2016 at 4:18 PM, Martin Buchholz <martinrb at google.com>
> >> wrote:
> >>
> >>>
> >>>
> >>> On Tue, Oct 18, 2016 at 4:00 PM, Vitaly Davidovich <vitalyd at gmail.com>
> >>> wrote:
> >>>
> >>>>
> >>>>>> * Change in allocation/capacity policy.
> >>>>>>
> >>>>>> The removal of the power-of-two restriction, and applying a 1.5x
> >>>>> growth
> >>>>>> factor (same as ArrayList) seems sensible. Does this mean that the
> >>>>> ability
> >>>>>> to compute the proper array index by using x & (length-1) wasn't
> >>>>> worth it?
> >>>>>> Or at least not worth the extra tail wastage?
> >>>>>>
> >>>>>
> >>>>> There's no integer division to optimize, as with hash tables.
> >>>>
> >>>> But it does add some new branches, doesn't it? Potentially branches that
> >>>> aren't taken prior to JIT compilation, but taken later = deopt.
> >>>>
> >>>
> >>> If it's a smidgeon slower, I don't care that much; improvement in
> >>> flexibility and scalability are more important.
> >>>
> >>> Of course, I do care about algorithmic improvements to e.g. removeIf
> >>>
> >>>
> >>>> Curious if you ran some benchmarks on ArrayDeque.
> >>>>
> >>>
> >>> Not yet.  Who would like to show how slow the code is?
> >>>
> >>
> >> I revived my ancient IteratorMicroBenchmark for the latest webrev, made it
> >> a proper jtreg test, added ArrayDeque Jobs, and verified that the new code
> >> is measurably faster than the old code, at least for the mundane job of
> >> iterating.
> >> http://cr.openjdk.java.net/~martin/webrevs/openjdk9/
> >> jsr166-jdk9-integration/ArrayList/
> >>
> >>
> 
> 



More information about the core-libs-dev mailing list