StreamBuilder is awesome

Fri Apr 12 02:09:39 PDT 2013

The underlying implementation can split well only if individual items in
the stream take about the same amount of time to process. If they are of
variable size, that should also be taken into account when determining
split points.

Say you are processing a list of binary files, given as a File[]. A File
takes more time to filter(), map(), etc. if it's larger. So,
Arrays.stream() gives you a "balanced" Stream<File> only if the files are
all roughly the same size. If they are not, you could do something like
this with the proposed split():

long chunkSize = 0;  // File.length() returns a long
for (File file : files) {
    builder.accept(file);

    // Hint a split once the accumulated bytes pass the threshold.
    chunkSize += file.length();
    if (chunkSize > threshold) {
        builder.split();
        chunkSize = 0;
    }
}

This is much easier than the only other option, writing a custom Spliterator.
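
Since split() is only a suggestion and not an existing method, here is a
rough sketch of how a similar effect might be approximated with plain
collections: group the items into chunks of roughly equal weight, then let
the framework parallelize over the chunks. The names WeightedChunks,
chunkByWeight and weightBalancedStream are made up for illustration, and
the sketch assumes the stream API as it eventually shipped in Java 8.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.ToLongFunction;
import java.util.stream.Stream;

// Hypothetical helper, not part of any library.
public class WeightedChunks {

    // Groups items into consecutive chunks. A chunk is closed as soon as its
    // accumulated weight exceeds the threshold (same rule as the loop above).
    public static <T> List<List<T>> chunkByWeight(
            List<T> items, ToLongFunction<? super T> weight, long threshold) {
        List<List<T>> chunks = new ArrayList<>();
        List<T> current = new ArrayList<>();
        long chunkWeight = 0;
        for (T item : items) {
            current.add(item);
            chunkWeight += weight.applyAsLong(item);
            if (chunkWeight > threshold) {
                chunks.add(current);
                current = new ArrayList<>();
                chunkWeight = 0;
            }
        }
        if (!current.isEmpty()) {
            chunks.add(current);  // keep the trailing partial chunk
        }
        return chunks;
    }

    // Parallelizes over the chunks, then flattens back to single items.
    public static <T> Stream<T> weightBalancedStream(
            List<T> items, ToLongFunction<? super T> weight, long threshold) {
        return chunkByWeight(items, weight, threshold)
                .parallelStream()
                .flatMap(List::stream);
    }

    public static void main(String[] args) {
        // Plain numbers stand in for file sizes here.
        List<Long> sizes = Arrays.asList(5L, 1L, 1L, 9L, 2L);
        System.out.println(chunkByWeight(sizes, Long::longValue, 6));
        // prints [[5, 1, 1], [9], [2]]
    }
}
```

For the File[] case this would be called as
weightBalancedStream(Arrays.asList(files), File::length, threshold). The
framework still splits by the number of chunks, not by their weight, so
this only evens things out to the extent that the chunks themselves come
out similar in size.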

On Fri, Apr 12, 2013 at 4:46 AM, Brian Goetz <brian.goetz at oracle.com> wrote:

> That doesn't make sense to me.  The choice of split points is structural,
> not semantic.  But since you have no idea what the structure is underneath,
> how could you possibly choose the right split points?
>
>
> On 4/11/2013 6:45 PM, Ali Lahijani wrote:
>
>> It would be nice to have a way to hint the StreamBuilder implementation
>> about splitting points. One such way is a split() method in
>> StreamBuilder. One would call builder.split() to hint that the next call
>> to builder.accept() should create a new split.
>>
>>
>> On Fri, Apr 12, 2013 at 12:16 AM, Brian Goetz <brian.goetz at oracle.com> wrote:
>>
>>     Yes, the precipitating cause was to mitigate the pain of taking away
>>     FlatMapper, but as you point out, it has other uses too.
>>
>>     Note that you could always use ArrayList as a stream builder, so it's
>>     not like this is something you couldn't do at all before.  But it
>>     should be more efficient, since it requires fewer object creations
>>     and doesn't have to copy elements every time you overflow a buffer.
>>
>>     The streams that are built still retain reasonable parallelism.  The
>>     internal representation is an array of increasing-size arrays, so
>>     the spliterator first splits the "spine" array and then can split
>>     into the individual arrays.
>>
>>     We're still twiddling with the API to try to ensure that it can be
>>     done with minimum overhead for flatMap-like usages (after all, if we
>>     didn't care about overhead, we'd just say "use ArrayList".)
>>
>>
>>     On 4/11/2013 3:31 PM, Ali Lahijani wrote:
>>
>>         I just want to express my delight at this great new feature.
>>
>>         StreamBuilder can be used to append to or concatenate Streams:
>>
>>                   StreamBuilder<E> builder = Streams.builder();
>>                   s1.forEach(builder);
>>                   builder.accept(x);
>>                   builder.accept(y);
>>                   s2.forEach(builder);
>>                   Stream<E> stream = builder.build();
>>
>>         And it can be used to create streams fluently, and in push mode,
>>         which is arguably the way that feels most natural in many
>>         situations.
>>
>>                   StreamBuilder.OfInt builder = Streams.intBuilder();
>>                   for (int i = 0; i < 1000; i++) {
>>                       if (isPrime(i)) {
>>                           builder.accept(i);
>>                       }
>>                   }
>>                   IntStream stream = builder.build();
>>
>>         Though admittedly, streams built this way might not benefit much
>>         from the Stream framework's support for parallelism.
>>
>>         An observation:
>>         The call to build() at the end should always be put there, and
>>         exactly once. After that point, the builder should no longer be
>>         used. In line with these rules, I would prefer the following
>>         syntax:
>>
>>                   Stream<E> stream = Streams.build((builder) -> {
>>                       s1.forEach(builder);
>>                       builder.accept(x);
>>                       builder.accept(y);
>>                       s2.forEach(builder);
>>                   });
>>
>>         The implicit call to build() is inserted after the lambda returns,
>>         after which point the builder is no longer available to be used. I
>>         think it feels more natural, and a bit more Lambda-like.
>>
>>         Once again, thanks for giving us this great new tool!
>>
>>         Best
>>
>>
>>