Streams and Spliterator characteristics confusion

Paul Sandoz paul.sandoz at oracle.com
Mon Jun 30 09:34:41 UTC 2014


On Jun 28, 2014, at 5:40 PM, Kasper Nielsen <kasperni at gmail.com> wrote:

> 
> > s.distinct().spliterator() -> Spliterator.DISTINCT = true
> > but limiting the number of distinct elements makes the stream non distinct
> > s.distinct().limit(10).spliterator() -> Spliterator.DISTINCT = false
> 
> I don't observe that (see program below).
> Right, that was an error on my part.
> 
> But still, I think there are some cases where the flag should be maintained.
> For example, I think the following program should print 4 'true' values, but it only prints 1.
> Especially the second one puzzles me, invoking distinct() makes it non-distinct?
> 
> static IntStream s() {
>   return StreamSupport.intStream(Spliterators.spliterator(new int[] { 12, 34 }, Spliterator.DISTINCT), false);
> }
> 
> public static void main(String[] args) {
>    System.out.println(s().spliterator().hasCharacteristics(Spliterator.DISTINCT));
>    System.out.println(s().distinct().spliterator().hasCharacteristics(Spliterator.DISTINCT));
>    System.out.println(s().boxed().spliterator().hasCharacteristics(Spliterator.DISTINCT));
>    System.out.println(s().asDoubleStream().spliterator().hasCharacteristics(Spliterator.DISTINCT));
> }
> 

The second is a good example of why this is an implementation detail. Here is the implementation (some may want to close their eyes!):

    public final IntStream distinct() {
        // While functional and quick to implement, this approach is not very efficient.
        // An efficient version requires an int-specific map/set implementation.
        return boxed().distinct().mapToInt(i -> i);
    }

We could work out how to inject DISTINCT back in, but since the spliterator is intended as an escape hatch I did not think it worth the effort.

Note that if the latter source were a long stream, it would not be possible to inject DISTINCT, because not all long values can be represented precisely as double values.
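
To illustrate the long-to-double point, here is a minimal sketch (the class and method names are made up for this example). It shows two distinct long values that widen to the same double, so a distinct LongStream need not stay distinct after asDoubleStream():

```java
import java.util.stream.LongStream;

public class LongToDoubleCollision {
    // Counts the distinct double values left after widening each long to double.
    static long distinctAfterWidening(long... values) {
        return LongStream.of(values).asDoubleStream().distinct().count();
    }

    public static void main(String[] args) {
        long a = 1L << 53;   // 2^53, exactly representable as a double
        long b = a + 1;      // 2^53 + 1, not representable; rounds to 2^53

        System.out.println(a != b);                        // true
        System.out.println((double) a == (double) b);      // true
        System.out.println(distinctAfterWidening(a, b));   // 1
    }
}
```

The int case does not have this problem, since every int value is exactly representable as a double.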
 

>  
> I am trying to implement the stream interfaces and I want to make sure that my implementation has similar behaviour to the default implementation in java.util.stream. The interoperability between streams and Spliterator.characteristics is the only thing I'm having serious issues with. I feel the current state is more a result of how streams are implemented at the moment than part of a public API.
> 
> I think something like a table, with non-terminal stream operations as rows and characteristics as columns, where each cell is either "cleared", "set" or "maintained", would make sense.
> 

We deliberately did not specify this aspect. The implementation could change, and we don't want to unduly constrain it based on an escape hatch (it's not the common case). Implementations can decide on the quality of that escape-hatch spliterator. For your implementation you are free to provide better-quality escape-hatch spliterators.
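
As a rough illustration only, one way an alternative implementation might report extra characteristics is a delegating wrapper such as the one below. The names ExtraCharacteristics and withExtra are hypothetical and not part of java.util.stream. The sketch assumes the caller already knows the extra flags hold for the elements; flags such as SIZED, SUBSIZED or SORTED carry further obligations and should not be added this way.

```java
import java.util.Comparator;
import java.util.Spliterator;
import java.util.function.Consumer;
import java.util.stream.Stream;

public class ExtraCharacteristics {
    // Hypothetical helper: wraps a spliterator and reports additional
    // characteristics (e.g. DISTINCT or NONNULL) that the caller guarantees.
    static <T> Spliterator<T> withExtra(Spliterator<T> delegate, int extra) {
        return new Spliterator<T>() {
            @Override public boolean tryAdvance(Consumer<? super T> action) {
                return delegate.tryAdvance(action);
            }
            @Override public void forEachRemaining(Consumer<? super T> action) {
                delegate.forEachRemaining(action);
            }
            @Override public Spliterator<T> trySplit() {
                // Keep the extra flags on the split-off part as well.
                Spliterator<T> split = delegate.trySplit();
                return split == null ? null : withExtra(split, extra);
            }
            @Override public long estimateSize() {
                return delegate.estimateSize();
            }
            @Override public int characteristics() {
                return delegate.characteristics() | extra;
            }
            @Override public Comparator<? super T> getComparator() {
                return delegate.getComparator();
            }
        };
    }

    public static void main(String[] args) {
        Spliterator<Integer> raw = Stream.of(12, 34).spliterator();
        Spliterator<Integer> wrapped = withExtra(raw, Spliterator.DISTINCT);
        System.out.println(wrapped.hasCharacteristics(Spliterator.DISTINCT)); // true
    }
}
```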

I think we should clarify the documentation on BaseStream.spliterator() to say something like:

  The characteristics of the returned spliterator need not correlate with characteristics of the stream source
  and those inferred from intermediate operations preceding this terminal operation.

  https://bugs.openjdk.java.net/browse/JDK-8048689

I have also logged the following issues:

  Spliterator.NONNULL
  https://bugs.openjdk.java.net/browse/JDK-8048690

  ~ORDERED & SORTED
  https://bugs.openjdk.java.net/browse/JDK-8048691

Thanks,
Paul.


