Alternative Spliterator for SpinedBuffer
Brian Goetz
brian.goetz at oracle.com
Fri Apr 12 10:10:48 PDT 2013
Thanks for taking a look at this code! Your approach does look like it
would yield a better decomposition -- and this is a really good time to
be looking at this, as the APIs have mostly settled.
I agree that there are improvements to be made here. Have your
performance measurements as improved runtime as well as reduced task
count?
On 4/12/2013 11:10 AM, Peter Levart wrote:
> On 04/11/2013 09:53 PM, brian.goetz at oracle.com wrote:
>> Changeset: 5e7f8797b9c1
>> Author: briangoetz
>> Date: 2013-04-11 15:53 -0400
>> URL: http://hg.openjdk.java.net/lambda/lambda/jdk/rev/5e7f8797b9c1
>>
>> Spec pass on StreamBuilder
>>
>> ! src/share/classes/java/util/stream/StreamBuilder.java
>> ! src/share/classes/java/util/stream/Streams.java
>>
>>
>
> Hi,
>
> I played a little with StreamBuilder and it works quite optimally using
> private SpinedBuffer under the hood. The only sub-optimality is when it
> starts small (min. 16 elements) and then grows big and after that it is
> used in a parallel computation where leaf task maximum size is more than
> 16 elements. For example, this code:
>
>
> public class SpinedBufferTest {
>
> static int dump(Spliterator<Integer> spliterator, int level, int
> iterateSize) {
> System.out.print(" ".substring(0, level));
> System.out.print(spliterator.estimateSize());
> Spliterator<Integer> spl2;
> if (spliterator.estimateSize() > iterateSize && (spl2 =
> spliterator.trySplit()) != null) {
> System.out.println(": split " + spl2.estimateSize() + " + "
> + spliterator.estimateSize());
> return dump(spl2, level + 1, iterateSize) +
> dump(spliterator, level + 1, iterateSize);
> }
> else {
> Integer[] minMax = new Integer[2];
> spliterator.forEachRemaining(x -> {
> minMax[0] = minMax[0] == null ? x : Math.min(minMax[0],
> x);
> minMax[1] = minMax[1] == null ? x : Math.max(minMax[1],
> x);
> });
> System.out.println(": (" + minMax[0] + "..." + minMax[1] +
> ")");
> return 1;
> }
> }
>
> public static void main(String[] args) {
> SpinedBuffer<Integer> buf = new SpinedBuffer<>();
> for (int i = 0; i < 16384; i++) {
> buf.accept(i);
> }
> int leafs = dump(buf.spliterator(), 0, 1024);
> System.out.println("\nTotal leafs: " + leafs);
> }
> }
>
>
> Prints the splits that would occur for 16k elements when max. leaf task
> size is 1k:
>
>
> 16384: split 16 + 16368
> 16: (0...15)
> 16368: split 16 + 16352
> 16: (16...31)
> 16352: split 32 + 16320
> 32: (32...63)
> 16320: split 64 + 16256
> 64: (64...127)
> 16256: split 128 + 16128
> 128: (128...255)
> 16128: split 256 + 15872
> 256: (256...511)
> 15872: split 512 + 15360
> 512: (512...1023)
> 15360: split 1024 + 14336
> 1024: (1024...2047)
> 14336: split 2048 + 12288
> 2048: split 1024 + 1024
> 1024: (2048...3071)
> 1024: (3072...4095)
> 12288: split 4096 + 8192
> 4096: split 2048 + 2048
> 2048: split 1024 + 1024
> 1024: (4096...5119)
> 1024: (5120...6143)
> 2048: split 1024 + 1024
> 1024: (6144...7167)
> 1024: (7168...8191)
> 8192: split 4096 + 4096
> 4096: split 2048 + 2048
> 2048: split 1024 + 1024
> 1024: (8192...9215)
> 1024: (9216...10239)
> 2048: split 1024 + 1024
> 1024: (10240...11263)
> 1024: (11264...12287)
> 4096: split 2048 + 2048
> 2048: split 1024 + 1024
> 1024: (12288...13311)
> 1024: (13312...14335)
> 2048: split 1024 + 1024
> 1024: (14336...15359)
> 1024: (15360...16383)
>
> Total leafs: 22
>
>
> 7 leaf tasks are half the max. size or less.
>
> Now SpinedBuffer has chunks with sizes increasing exponentially using
> power of 2: 16, 16, 32, 64, 128, 256, 512, ... Each chunk has capacity
> equal to the sum of capacities of all previous chunks. So nearly optimal
> split (modulo last non-full chunk) would be to always split so that the
> last chunk is cut-off and the returned Spliterator contains all the
> chunks up to the last and this Spliterator only contains the last chunk.
>
> Here's my attempt at modifying the spliterator to act like that (I only
> did it for the generic Object variant in 1st try, but I can prepare the
> primitive variants too, if desired):
>
> http://dl.dropboxusercontent.com/u/101777488/jdk8-lambda/SpinedBufferAltSpliterator/webrev.01/index.html
>
>
>
> With this modification, the above code prints:
>
> 16384: split 8192 + 8192
> 8192: split 4096 + 4096
> 4096: split 2048 + 2048
> 2048: split 1024 + 1024
> 1024: (0...1023)
> 1024: (1024...2047)
> 2048: split 1024 + 1024
> 1024: (2048...3071)
> 1024: (3072...4095)
> 4096: split 2048 + 2048
> 2048: split 1024 + 1024
> 1024: (4096...5119)
> 1024: (5120...6143)
> 2048: split 1024 + 1024
> 1024: (6144...7167)
> 1024: (7168...8191)
> 8192: split 4096 + 4096
> 4096: split 2048 + 2048
> 2048: split 1024 + 1024
> 1024: (8192...9215)
> 1024: (9216...10239)
> 2048: split 1024 + 1024
> 1024: (10240...11263)
> 1024: (11264...12287)
> 4096: split 2048 + 2048
> 2048: split 1024 + 1024
> 1024: (12288...13311)
> 1024: (13312...14335)
> 2048: split 1024 + 1024
> 1024: (14336...15359)
> 1024: (15360...16383)
>
> Total leafs: 16
>
>
> Regards, Peter
>
More information about the lambda-dev
mailing list