Alternative Spliterator for SpinedBuffer

Fri Apr 12 08:10:53 PDT 2013

On 04/11/2013 09:53 PM, brian.goetz at oracle.com wrote:
> Changeset: 5e7f8797b9c1
> Author:    briangoetz
> Date:      2013-04-11 15:53 -0400
> URL:       http://hg.openjdk.java.net/lambda/lambda/jdk/rev/5e7f8797b9c1
>
> Spec pass on StreamBuilder
>
> ! src/share/classes/java/util/stream/StreamBuilder.java
> ! src/share/classes/java/util/stream/Streams.java
>
>

Hi,

I played a little with StreamBuilder and it works quite optimally using 
private SpinedBuffer under the hood. The only sub-optimality is when it 
starts small (min. 16 elements) and then grows big and after that it is 
used in a parallel computation where leaf task maximum size is more than 
16 elements. For example, this code:

public class SpinedBufferTest {

     static int dump(Spliterator<Integer> spliterator, int level, int 
iterateSize) {
System.out.print(" ".substring(0, level));
         System.out.print(spliterator.estimateSize());
         Spliterator<Integer> spl2;
         if (spliterator.estimateSize() > iterateSize && (spl2 = 
spliterator.trySplit()) != null) {
             System.out.println(": split " + spl2.estimateSize() + " + " 
+ spliterator.estimateSize());
             return dump(spl2, level + 1, iterateSize) +
                    dump(spliterator, level + 1, iterateSize);
         }
         else {
             Integer[] minMax = new Integer[2];
             spliterator.forEachRemaining(x -> {
                 minMax[0] = minMax[0] == null ? x : Math.min(minMax[0], x);
                 minMax[1] = minMax[1] == null ? x : Math.max(minMax[1], x);
             });
             System.out.println(": (" + minMax[0] + "..." + minMax[1] + 
")");
             return 1;
         }
     }

     public static void main(String[] args) {
         SpinedBuffer<Integer> buf = new SpinedBuffer<>();
         for (int i = 0; i < 16384; i++) {
             buf.accept(i);
         }
         int leafs = dump(buf.spliterator(), 0, 1024);
         System.out.println("\nTotal leafs: " + leafs);
     }
}

Prints the splits that would occur for 16k elements when max. leaf task 
size is 1k:

16384: split 16 + 16368
  16: (0...15)
  16368: split 16 + 16352
   16: (16...31)
   16352: split 32 + 16320
    32: (32...63)
    16320: split 64 + 16256
     64: (64...127)
     16256: split 128 + 16128
      128: (128...255)
      16128: split 256 + 15872
       256: (256...511)
       15872: split 512 + 15360
        512: (512...1023)
        15360: split 1024 + 14336
         1024: (1024...2047)
         14336: split 2048 + 12288
          2048: split 1024 + 1024
           1024: (2048...3071)
           1024: (3072...4095)
          12288: split 4096 + 8192
           4096: split 2048 + 2048
            2048: split 1024 + 1024
             1024: (4096...5119)
             1024: (5120...6143)
            2048: split 1024 + 1024
             1024: (6144...7167)
             1024: (7168...8191)
           8192: split 4096 + 4096
            4096: split 2048 + 2048
             2048: split 1024 + 1024
              1024: (8192...9215)
              1024: (9216...10239)
             2048: split 1024 + 1024
              1024: (10240...11263)
              1024: (11264...12287)
            4096: split 2048 + 2048
             2048: split 1024 + 1024
              1024: (12288...13311)
              1024: (13312...14335)
             2048: split 1024 + 1024
              1024: (14336...15359)
              1024: (15360...16383)

Total leafs: 22

7 leaf tasks are half the max. size or less.

Now SpinedBuffer has chunks with sizes increasing exponentially using 
power of 2:  16, 16, 32, 64, 128, 256, 512, ... Each chunk has capacity 
equal to the sum of capacities of all previous chunks. So nearly optimal 
split (modulo last non-full chunk) would be to always split so that the 
last chunk is cut-off and the returned Spliterator contains all the 
chunks up to the last and this Spliterator only contains the last chunk.

Here's my attempt at modifying the spliterator to act like that (I only 
did it for the generic Object variant in 1st try, but I can prepare the 
primitive variants too, if desired):

http://dl.dropboxusercontent.com/u/101777488/jdk8-lambda/SpinedBufferAltSpliterator/webrev.01/index.html

With this modification, the above code prints:

16384: split 8192 + 8192
  8192: split 4096 + 4096
   4096: split 2048 + 2048
    2048: split 1024 + 1024
     1024: (0...1023)
     1024: (1024...2047)
    2048: split 1024 + 1024
     1024: (2048...3071)
     1024: (3072...4095)
   4096: split 2048 + 2048
    2048: split 1024 + 1024
     1024: (4096...5119)
     1024: (5120...6143)
    2048: split 1024 + 1024
     1024: (6144...7167)
     1024: (7168...8191)
  8192: split 4096 + 4096
   4096: split 2048 + 2048
    2048: split 1024 + 1024
     1024: (8192...9215)
     1024: (9216...10239)
    2048: split 1024 + 1024
     1024: (10240...11263)
     1024: (11264...12287)
   4096: split 2048 + 2048
    2048: split 1024 + 1024
     1024: (12288...13311)
     1024: (13312...14335)
    2048: split 1024 + 1024
     1024: (14336...15359)
     1024: (15360...16383)

Total leafs: 16

Regards, Peter