To Stream.slice(fromInclusive, toExclusive) or Stream.slice(toSkip, limit) that is the question

Brian Goetz brian.goetz at oracle.com
Fri Oct 11 07:18:38 PDT 2013


Rather than focus on "what do other Java APIs/languages do" (which is a 
useful data point, but rarely provides the whole story), we should ask 
"what's the goal."

I think the start:end notation makes essentially no sense for 
non-indexed entities.  For indexed entities like strings and lists, it 
is very natural, and the analogy with something similar but different is 
confusing us.  If you were slicing a String, you would probably do 
something like:

   int startPos = s.indexOf("[");
   // search for "]" only after the "[", so endPos cannot precede startPos
   int endPos = s.indexOf("]", startPos + 1);
   if (startPos != -1 && endPos != -1)
     s = s.substring(startPos + 1, endPos);

But, without being able to search through the characters to find the 
pattern you're looking for, how would you know where to end the slice? 
The same is true with lists.  In order for "stop at position N" to be 
useful, you have to have a way of searching through the elements to find 
at what position you might want to stop.

But streams don't let you do that.  So this use case, motivated by 
strings and other indexable entities, is a red herring.

However, with streams it is still useful to say "I want the n'th page of 
results."  Which translates to "Skip (n-1)*K, and limit to K".  It is 
useful to say "I want no more than K results."
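
As a rough illustration of the paging case, here is a minimal sketch using 
only skip() and limit().  The class name PagingSketch, the helper page(), 
and the sample data are made up for this example; they are not part of any 
proposed API.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class PagingSketch {
    // Hypothetical helper: the n'th page (1-based) of at most pageSize elements,
    // i.e. "skip (n-1)*K, and limit to K".
    static <T> Stream<T> page(Stream<T> source, long n, long pageSize) {
        return source.skip((n - 1) * pageSize).limit(pageSize);
    }

    public static void main(String[] args) {
        // 25 results, 10 per page: the third page holds the last five.
        List<Integer> third = page(IntStream.rangeClosed(1, 25).boxed(), 3, 10)
                .collect(Collectors.toList());
        System.out.println(third); // [21, 22, 23, 24, 25]
    }
}
```

Note that the caller never needs to know an end position here, only how 
many elements to skip and how many to keep.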

Several people have pointed out that it is surprising that 
.skip(n).limit(k) is inefficient enough to warrant a fused operation.  
Of course, in the sequential case, it's fine.  But because skip/limit 
are constrained to operate in encounter order, in the worst case 
(non-SIZED+SUBSIZED, non-UNORDERED), we have to buffer.  Doing two 
rounds of buffering would suck twice -- and this was the primary 
motivation for a fused operation.
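
To make the encounter-order constraint concrete, here is a small sketch.  
The slice(toSkip, limit) helper below is a hypothetical stand-in written 
as plain skip().limit(); it only shows the semantics under discussion and 
does no fusing of its own.  The class name and data are likewise 
assumptions for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class OrderedSliceSketch {
    // Hypothetical stand-in for the proposed operation: same parameter sense
    // as skip/limit, with no fusing performed here.
    static <T> Stream<T> slice(Stream<T> s, long toSkip, long limit) {
        return s.skip(toSkip).limit(limit);
    }

    public static void main(String[] args) {
        List<Integer> seq = slice(IntStream.range(0, 1000).boxed(), 10, 5)
                .collect(Collectors.toList());
        // An ordered parallel stream must still yield the elements at
        // encounter positions 10..14, which is what forces the extra work.
        List<Integer> par = slice(IntStream.range(0, 1000).boxed().parallel(), 10, 5)
                .collect(Collectors.toList());
        System.out.println(seq);             // [10, 11, 12, 13, 14]
        System.out.println(seq.equals(par)); // true

        // With unordered(), any five elements are an acceptable answer, so
        // only the size of the result is predictable.
        List<Integer> any = slice(IntStream.range(0, 1000).boxed().parallel().unordered(), 10, 5)
                .collect(Collectors.toList());
        System.out.println(any.size());      // 5
    }
}
```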

So I think there are two sensible choices here:

  - slice(toSkip, toLimit)
  - drop slice entirely

The cost of the latter is that those who need it in parallel in the 
unpleasant cases are even more likely to have to retreat to sequential.

On 10/11/2013 5:52 AM, Joe Bowbeer wrote:
> slice notation in groovy is also start..end, and slice notation in scala
> and clojure is start:end.
>
> Those are all the languages I know that have a slice. In what language
> is a slice parameterized by start:count?
>
> On Oct 11, 2013 1:12 AM, "Paul Sandoz" <paul.sandoz at oracle.com
> <mailto:paul.sandoz at oracle.com>> wrote:
>
>     Hi Joe,
>
>     I tend to think of slice(s, l) as the fused (optimal) form of
>     skip(s).limit(l). For parallel streams the fused form will
>     result in less wrapping and/or buffering (depending on the
>     properties of the input stream). Documentation-wise we should
>     probably include an api note on skip and limit referring to slice in
>     this respect.
>
>     Paul.
>
>     On Oct 11, 2013, at 3:28 AM, Joe Bowbeer <joe.bowbeer at gmail.com
>     <mailto:joe.bowbeer at gmail.com>> wrote:
>
>      > slice(start, end) would be more useful and more consistent with
>     its use in
>      > other languages (Python, Perl).
>      >
>      > In Python, the elements are start .. end-1 whereas the end is
>     inclusive in
>      > Perl.  But I think the use of start:end is fairly consistent for most
>      > implementations of slice.
>      >
>      > I claim this would be more useful because otherwise there's no
>     difference
>      > between slice and skip(start) + limit(end-start)
>      >
>      > --Joe
>      >
>      >
>      > On Thu, Oct 10, 2013 at 5:04 PM, Brian Goetz
>     <brian.goetz at oracle.com <mailto:brian.goetz at oracle.com>> wrote:
>      >
>      >> FWIW, this is what the semantics were originally, and they got
>     modified
>      >> to be consistent with substring() when we renamed slice to
>     substring()
>      >> originally.  So this is a reversion to where we were before.
>      >>
>      >>
>      >> On 10/10/2013 7:54 PM, Mike Duigou wrote:
>      >>
>      >>> Hello all;
>      >>>
>      >>> In the review of the renaming patch for Stream.substream() ->
>     slice()
>      >>> Brian asked me to consider also changing the semantics of the
>      >>> parameters from the current
>     Stream.slice(fromInclusive, toExclusive).
>      >>> The rationale is that we then have only one sense of usage in the
>      >>> parameters for skip/limit/slice. This also makes slice() more
>      >>> obviously equivalent to skip(toSkip).limit(limit). I am inclined to
>      >>> agree with him that using the same semantics for the parameters
>      >>> across the three methods has value.
>      >>>
>      >>> I will go forward with changing to Stream.slice(toSkip,limit)
>     before
>      >>> Monday assuming there is no outcry.
>      >>>
>      >>> Mike
>      >>>
>      >>>
>


More information about the lambda-libs-spec-experts mailing list