RFR: JDK-8067969 Optimize Stream.count for SIZED Streams

Chris Hegarty chris.hegarty at oracle.com
Thu Mar 12 12:07:24 UTC 2015


On 12 Mar 2015, at 11:25, Paul Sandoz <paul.sandoz at oracle.com> wrote:

> 
> On Mar 12, 2015, at 12:05 PM, Chris Hegarty <chris.hegarty at oracle.com> wrote:
> 
>> 
>> On 12 Mar 2015, at 09:44, Paul Sandoz <paul.sandoz at oracle.com> wrote:
>> 
>>> 
>>> On Mar 11, 2015, at 1:45 PM, Aggelos Biboudis <biboudis at gmail.com> wrote:
>>> 
>>>> Hi all,
>>>> 
>>>> Please review the patch for the count terminal operator on SIZED streams.
>>>> 
>>>> https://bugs.openjdk.java.net/browse/JDK-8067969
>>>> http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8067969-optimize-stream-count/webrev/
>>>> 
>>>> Thanks Paul Sandoz for sponsoring this.
>>>> 
>>> 
>>> This looks good. The code is nicely contained, and there is less of it than I initially anticipated.
>> 
>> This does indeed look nice.
>> 
>> One trivial question: why call spliterator.getExactSizeIfKnown() and not spliterator.estimateSize()?
> 
> Because the latter might return an estimate and not the exact size. For example, say we revamp Files.lines for UTF-8 and optimize it: the root spliterator could report the size of the file as an estimate of the number of lines, but it would not know the exact number of lines.

OK, got it.
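
For the archive, a minimal sketch of that distinction. The class SizeEstimateDemo and the helper estimatedLines are hypothetical names made up for illustration and are not part of the webrev; the 1024 figure stands in for a file size used as a rough estimate. The spliterator does not report SIZED, so only estimateSize() returns the guess:

```java
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;

public class SizeEstimateDemo {
    // Hypothetical source: it knows a rough size estimate (for example the
    // byte length of a file) but not the exact number of elements, so it
    // deliberately does not report the SIZED characteristic.
    static Spliterator<String> estimatedLines(String[] lines, long estimate) {
        return new Spliterators.AbstractSpliterator<String>(estimate, Spliterator.ORDERED) {
            private int next = 0;

            @Override
            public boolean tryAdvance(Consumer<? super String> action) {
                if (next >= lines.length) {
                    return false;
                }
                action.accept(lines[next++]);
                return true;
            }
        };
    }

    public static void main(String[] args) {
        Spliterator<String> s = estimatedLines(new String[] {"a", "b", "c"}, 1024L);
        System.out.println(s.estimateSize());        // 1024, only an estimate
        System.out.println(s.getExactSizeIfKnown()); // -1, exact size unknown
    }
}
```

A count computed from estimateSize() here would be wrong, whereas getExactSizeIfKnown() returning -1 signals that the pipeline has to be traversed.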

>>> I am pondering adding an API note to the count methods to head off any surprises, as the stream pipeline may now not be executed.
>> 
>> I think it would be good to add a note to the spec, as this could be surprising.
>> 
>> So really this comes down to the type of intermediate operations, right?
> 
> And what optimizations the implementation can do.
> 
> 
>> For example, filter will always be executed:
>> 
>> IntStream.of(1, 2, 3, 4).peek(System.out::println).filter(x -> true).count();
>> 
> 
> Yes.
> 
> 
>> Should the note capture something about the type of the intermediate operations?
>> 
> 
> How about:
> 
> * @apiNote
> * An implementation may choose to not execute the stream pipeline (either
> * sequentially or in parallel) if it is capable of computing the count
> * directly from the stream source.  In such cases no source elements will
> * be traversed and no intermediate operations will be evaluated.
> * Behavioral parameters with side-effects, which are strongly discouraged
> * except for harmless cases such as debugging, may be affected.  For
> * example, consider the following stream:
> * <pre>{@code
> *     List<String> l = ...
> *     long count = l.stream().peek(System.out::println).count();
> * }</pre>
> * The number of elements covered by the stream source, a {@code List}, is
> * known and the intermediate operation, {@code peek}, does not inject into
> * or remove elements from the stream (as may be the case for
> * {@code flatMap} or {@code filter} operations).  Thus the count is the
> * size of the {@code List} and there is no need to execute the pipeline
> * and, as a side-effect, print out the list elements.

Looks good to me.
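
A rough sketch of how one might observe the behaviour the note describes. The class and method names (CountSideEffects, countWithPeek, countWithPeekAndFilter) are made up for illustration, and a counter replaces System.out::println so the side-effect can be inspected. The returned count is the same either way; whether peek runs in the first pipeline depends on whether the implementation applies the optimization:

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class CountSideEffects {
    // Returns {count, number of peek invocations} for a peek-only pipeline.
    // An implementation may compute the count directly from the List source,
    // in which case peek is never invoked.
    static long[] countWithPeek(List<String> l) {
        AtomicInteger peeked = new AtomicInteger();
        long count = l.stream()
                      .peek(e -> peeked.incrementAndGet())
                      .count();
        return new long[] {count, peeked.get()};
    }

    // Same pipeline with a filter added. The filter may remove elements, so
    // the size is no longer known and the pipeline has to be traversed.
    static long[] countWithPeekAndFilter(List<String> l) {
        AtomicInteger peeked = new AtomicInteger();
        long count = l.stream()
                      .peek(e -> peeked.incrementAndGet())
                      .filter(e -> true)
                      .count();
        return new long[] {count, peeked.get()};
    }

    public static void main(String[] args) {
        List<String> l = Arrays.asList("a", "b", "c");
        long[] plain = countWithPeek(l);
        long[] filtered = countWithPeekAndFilter(l);
        // The count is 3 in both cases; the number of peek invocations in the
        // first case is implementation dependent (0 if the count is optimized).
        System.out.println(plain[0] + " elements, peek ran " + plain[1] + " times");
        System.out.println(filtered[0] + " elements, peek ran " + filtered[1] + " times");
    }
}
```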

> I want to tread lightly here and focus on operations that might legitimately be used for harmless side-effects.

Makes sense.

-Chris.

> Paul.



