Streams and Spliterator characteristics confusion
Kasper Nielsen
kasperni at gmail.com
Sat Jun 28 15:40:13 UTC 2014
Thanks,
followup questions inlined.
On Fri, Jun 27, 2014 at 11:43 AM, Paul Sandoz <paul.sandoz at oracle.com>
wrote:
> Internally in the stream pipeline we keep track of certain characteristics
> for optimization purposes and those are conveniently used to determine the
> characteristics of the Spliterator, so there are some idiosyncrasies poking
> through.
>
>
> > s.sorted().spliterator() -> Spliterator.SORTED = true
> > But if I specify a comparator, the stream is not sorted
> > s.sorted((a,b) -> 1).spliterator() -> Spliterator.SORTED = false
> >
>
> Right, there is an optimization internally that ensures if the upstream
> stream is already sorted then the sort operation becomes a no-op, e.g.
>
> s.sorted().sorted();
>
> This optimization cannot apply when a comparator is passed in since we
> don't know if two comparators are identical in their behaviour e.g:
>
> s.sorted((a, b) ->
> a.compareTo(b)).sorted(Comparator.naturalOrder()).sorted()
>
>
What initially made me wonder was the javadoc of
Spliterator#getComparator(),
which states: "If this Spliterator's source is SORTED by a Comparator, returns
that Comparator."
So I assumed s.sorted((a,b) -> 1).spliterator().getComparator() would
return said comparator.
It just feels a bit inconsistent compared to, for example,
new TreeMap(comparator).keySet().spliterator(),
which reports Spliterator.SORTED = true and returns the comparator.
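
To make the comparison concrete, here is a minimal sketch that probes the three cases side by side. The class and method names (SortedProbe, describeComparator, and so on) are made up for illustration, and Comparator.reverseOrder() stands in for an arbitrary comparator. Note that getComparator() throws IllegalStateException when SORTED is not reported, so the sketch checks the flag first.

```java
import java.util.Comparator;
import java.util.Spliterator;
import java.util.TreeMap;
import java.util.stream.Stream;

// Hypothetical helper class, not part of the JDK.
class SortedProbe {
    // True if the spliterator advertises SORTED.
    static boolean isSorted(Spliterator<?> sp) {
        return sp.hasCharacteristics(Spliterator.SORTED);
    }

    // Describes the comparator; checks SORTED first because
    // getComparator() throws IllegalStateException otherwise.
    static String describeComparator(Spliterator<?> sp) {
        if (!isSorted(sp)) {
            return "n/a (not SORTED)";
        }
        return sp.getComparator() == null ? "null (natural order)" : "explicit comparator";
    }

    // s.sorted().spliterator()
    static Spliterator<String> naturalSorted() {
        return Stream.of("b", "a", "c").sorted().spliterator();
    }

    // s.sorted(comparator).spliterator()
    static Spliterator<String> comparatorSorted() {
        return Stream.of("b", "a", "c")
                .sorted(Comparator.<String>reverseOrder())
                .spliterator();
    }

    // new TreeMap(comparator).keySet().spliterator()
    static Spliterator<String> treeMapKeys() {
        TreeMap<String, Integer> map = new TreeMap<>(Comparator.<String>reverseOrder());
        map.put("a", 1);
        map.put("b", 2);
        return map.keySet().spliterator();
    }

    static void report(String label, Spliterator<?> sp) {
        System.out.println(label + ": SORTED = " + isSorted(sp)
                + ", comparator = " + describeComparator(sp));
    }

    public static void main(String[] args) {
        report("sorted()", naturalSorted());
        report("sorted(comparator)", comparatorSorted());
        report("TreeMap(comparator).keySet()", treeMapKeys());
    }
}
```
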
>
> > s.distinct().spliterator() -> Spliterator.DISTINCT = true
> > but limiting the number of distinct elements makes the stream non-distinct
> > s.distinct().limit(10).spliterator() -> Spliterator.DISTINCT = false
>
> I don't observe that (see program below).
Right, that was an error on my part.
But still, I think there are some cases where the flag should be
maintained.
For example, I think the following program should print 4 'true'
values, but it only prints 1.
The second one especially puzzles me: invoking distinct() makes the stream
non-distinct?
    static IntStream s() {
        return StreamSupport.intStream(
            Spliterators.spliterator(new int[] { 12, 34 }, Spliterator.DISTINCT), false);
    }

    public static void main(String[] args) {
        System.out.println(s().spliterator().hasCharacteristics(Spliterator.DISTINCT));
        System.out.println(s().distinct().spliterator().hasCharacteristics(Spliterator.DISTINCT));
        System.out.println(s().boxed().spliterator().hasCharacteristics(Spliterator.DISTINCT));
        System.out.println(s().asDoubleStream().spliterator().hasCharacteristics(Spliterator.DISTINCT));
    }
>
> > On the other hand something like Spliterator.SORTED is maintained when I
> > invoke limit
> > s.sorted().limit(10).spliterator() -> Spliterator.SORTED = true
> >
> >
> > A flag such as Spliterator.NONNULL is also cleared in situations where it
> > should not be.
>
> That is because it is not tracked in the pipeline as there is no gain
> optimisation-wise (if it was it would be cleared for map/flatMap operations
> and preserved for other ops like filter as you say below).
>
> It's not that difficult to support this and should add no measurable
> performance cost, we deliberately left space in the bit fields, however
> since spliterator() is an escape-hatch for doing stuff that cannot be done
> by other operations I think the value of supporting NONNULL is marginal.
>
>
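
As an illustration of the NONNULL point, the sketch below builds a stream from a spliterator that declares NONNULL and checks the flag before and after a filter() that cannot introduce nulls. NonNullProbe and its method names are hypothetical, and the sketch only reports what the running JDK does; it makes no claim about what the behaviour should be.

```java
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical helper class, not part of the JDK.
class NonNullProbe {
    // A sequential stream whose source spliterator declares NONNULL.
    static Stream<String> source() {
        return StreamSupport.stream(
                Spliterators.<String>spliterator(new String[] { "a", "b" }, Spliterator.NONNULL),
                false);
    }

    // Flag as seen directly on the unmodified stream.
    static boolean sourceReportsNonNull() {
        return source().spliterator().hasCharacteristics(Spliterator.NONNULL);
    }

    // Flag after an operation that cannot introduce null elements.
    static boolean filteredReportsNonNull() {
        return source().filter(x -> true).spliterator()
                .hasCharacteristics(Spliterator.NONNULL);
    }

    public static void main(String[] args) {
        System.out.println("source NONNULL   = " + sourceReportsNonNull());
        System.out.println("filtered NONNULL = " + filteredReportsNonNull());
    }
}
```
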
I am trying to implement the stream interfaces and I want to make sure that
my implementation behaves similarly to the default implementation in
java.util.stream. The interoperability between streams and
Spliterator.characteristics is the only thing I'm having serious issues
with. I feel the current state is more a result of how streams are
implemented at the moment than part of a public API.
I think something like a table, with non-terminal stream operations as rows
and characteristics as columns, where each cell is either "cleared",
"set" or "maintained", would make sense.
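
A rough sketch of how such a table could be derived empirically from the running JDK: run each operation once on a source that declares a characteristic and once on a source that does not, then compare what the resulting spliterator reports. Everything here (CharacteristicsTable, classify, the choice of operations and columns) is an assumption for illustration. SIZED and SUBSIZED are left out because the array-backed source used here always reports them.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.UnaryOperator;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical helper class, not part of the JDK.
class CharacteristicsTable {
    static final String[] NAMES = { "DISTINCT", "SORTED", "ORDERED", "NONNULL" };
    static final int[] FLAGS = {
        Spliterator.DISTINCT, Spliterator.SORTED, Spliterator.ORDERED, Spliterator.NONNULL
    };

    // Rows of the table: a sample of non-terminal operations.
    static final Map<String, UnaryOperator<Stream<Integer>>> OPS = new LinkedHashMap<>();
    static {
        OPS.put("filter", s -> s.filter(x -> true));
        OPS.put("map", s -> s.map(x -> x));
        OPS.put("distinct", s -> s.distinct());
        OPS.put("sorted", s -> s.sorted());
        OPS.put("limit", s -> s.limit(2));
        OPS.put("skip", s -> s.skip(1));
        OPS.put("peek", s -> s.peek(x -> { }));
    }

    // Applies the operation to a source declaring the given characteristics
    // and reports whether the resulting spliterator has the flag.
    // The elements 1, 2, 3 are distinct, sorted and non-null, so any
    // declared characteristic is truthful.
    static boolean reports(String opName, int sourceCharacteristics, int flag) {
        Stream<Integer> source = StreamSupport.stream(
                Spliterators.<Integer>spliterator(new Integer[] { 1, 2, 3 }, sourceCharacteristics),
                false);
        return OPS.get(opName).apply(source).spliterator().hasCharacteristics(flag);
    }

    // "set": reported whether or not the source had it;
    // "cleared": never reported; "maintained": follows the source.
    static String classify(String opName, int flag) {
        boolean whenPresent = reports(opName, flag, flag);
        boolean whenAbsent = reports(opName, 0, flag);
        if (whenPresent && whenAbsent) return "set";
        if (!whenPresent && !whenAbsent) return "cleared";
        if (whenPresent) return "maintained";
        return "inverted"; // reported only when the source lacked it
    }

    public static void main(String[] args) {
        StringBuilder header = new StringBuilder(String.format("%-10s", "op"));
        for (String name : NAMES) {
            header.append(String.format("%-12s", name));
        }
        System.out.println(header);
        for (String op : OPS.keySet()) {
            StringBuilder row = new StringBuilder(String.format("%-10s", op));
            for (int flag : FLAGS) {
                row.append(String.format("%-12s", classify(op, flag)));
            }
            System.out.println(row);
        }
    }
}
```

The output reflects one JDK build and one sequential pipeline only, so it documents observed behaviour and not a specified contract.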
Cheers
Kasper