From dl at cs.oswego.edu  Mon Apr  1 05:37:59 2013
From: dl at cs.oswego.edu (Doug Lea)
Date: Mon, 01 Apr 2013 08:37:59 -0400
Subject: sorting and stability
In-Reply-To: <515318BB.2030805@cs.oswego.edu>
References: <5150DE09.3020505@cs.oswego.edu> <515318BB.2030805@cs.oswego.edu>
Message-ID: <51597FA7.9070207@cs.oswego.edu>

On 03/27/13 12:05, Doug Lea wrote:

>
> * The previous versions required temp workspace
> arrays as large as source array, even if only a
> portion was being sorted. Now they don't.

Except (and this took an embarrassingly long time to
track down), DualPivotQuickSort itself (as of JDK7)
sometimes creates a temp array, and if so, allocates
it to be as long as the array, not the slice. Now fixed.
This led to crazy anomalies during tests that made me suspect
all kinds of other problems with parallel versions.
As a side benefit though, it did lead to a few minor
improvements made while rechecking everything except
what I should have been looking at.

Another implementation note:
While I cleaned up some of it, Arrays.java and
the internal DualPivotQuickSort, TimSort, and
ComparableTimSort classes are still inconsistent about which
side does range checks and call conversion into
internal forms vs propagating convenience methods.
Someday this should be straightened out, so that
only Arrays.java does these, calling only expanded internal
forms in the sorter classes. Hilariously, this requires
that method rangeCheck be removed from TimSort.

Paul Sandoz: Please find sorts2.tar in the usual place.

-Doug


From kevinb at google.com  Mon Apr  1 07:23:27 2013
From: kevinb at google.com (Kevin Bourrillion)
Date: Mon, 1 Apr 2013 07:23:27 -0700
Subject: Spec and API review for {Int,Long,Double}SummaryStatistics
In-Reply-To: <1E7C3B20-8B4A-4782-BD59-B82ACD7AF4DB@oracle.com>
References: <514CD46F.9020508@oracle.com>
	<1BC38610-51E9-4A69-A1E7-192880618E5F@oracle.com>
	<CAGKkBku5OJTdxGA_NGjdbwe3T=k-C5d8NBPcKAx78yL1HOpQuA@mail.gmail.com>
	<1E7C3B20-8B4A-4782-BD59-B82ACD7AF4DB@oracle.com>
Message-ID: <CAGKkBkvBJa0Mw4Fx=bZRjZi+5H0kCUdux5YaW-3yuz18CFVgUQ@mail.gmail.com>

I'm confused, but I've seen nothing to change my impression that exposing
sumOfSquares is not helpful. As unpleasant as it may seem, if we want to
address the variance case at all, I think we have little choice but to
expose sampleVariance() and populationVariance() ourselves, and then
*those* can
use Kahan summation or whatever (which internally computes "sum of squares
of deltas", not sum of squares, as I (don't) understand it).


On Fri, Mar 29, 2013 at 3:16 PM, Brian Goetz <brian.goetz at oracle.com> wrote:

> > Also, while I'm here...
> >
> > Exposing sumOfSquares() does not permit users to safely calculate
> variance, which I believe makes it fairly useless and even dangerous:
> >
> > "The failure of Cauchy's fundamental inequality is another important
> example of the breakdown of traditional algebra in the presence of floating
> point arithmetic...Novice programmers who calculate the standard deviation
> of some observations by using the textbook formula [formula for the
> standard deviation in terms of the sum of squares] often find themselves
> taking the square root of a negative number!"  (Knuth AoCP vol 2, section
> 4.2.2)
>
> Thanks for raising this issue again -- I'd meant to respond earlier.  I
> ran this by our numerics guys.
>
> Basically, the problem is that for floating point numbers, since squaring
> makes small numbers smaller and big numbers bigger, summing squares in the
> obvious way risks the usual problem with adding numbers of grossly
> differing magnitudes. So while the naive factoring of population/sample
> variance allows you to compute them from sum(x) and sum(x^2), the latter is
> potentially numerically challenged.  (Note that this problem doesn't exist
> for int/long, assuming a long is big enough to compute sum(x^2) without
> overflow.)
>
> Still, I am not sure we do users a favor by leaving this out.  Many of
> them are likely to simply extend DoubleSummaryStatistics to calculate
> sum(x^2) anyway.  And the only other alternative is horrible; stream the
> data into a collection and make two passes on it, one for mean and one for
> variance.  That's at least 3x as expensive, if you can fit the whole thing
> in memory in the first place.
>
> The Knuth section you cite also offers a means to calculate variance more
> effectively in a single pass using a recurrence relation based on Kahan
> summation.  So I think the winning move is to provide a better
> implementation of sumsq than either of the naive implementations above, one
> that uses real numerics fu.  (We intend to provide a better implementation
> of summation for DoubleSummaryStatistics as well, based on Kahan.)
>
> Of course the crappy implementation that is in there now is less than
> ideal.
>
>
>


-- 
Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20130401/08d1bbd5/attachment.html 

From brian.goetz at oracle.com  Mon Apr  1 07:47:23 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 01 Apr 2013 10:47:23 -0400
Subject: Spec and API review for {Int,Long,Double}SummaryStatistics
In-Reply-To: <CAGKkBkvBJa0Mw4Fx=bZRjZi+5H0kCUdux5YaW-3yuz18CFVgUQ@mail.gmail.com>
References: <514CD46F.9020508@oracle.com>
	<1BC38610-51E9-4A69-A1E7-192880618E5F@oracle.com>
	<CAGKkBku5OJTdxGA_NGjdbwe3T=k-C5d8NBPcKAx78yL1HOpQuA@mail.gmail.com>
	<1E7C3B20-8B4A-4782-BD59-B82ACD7AF4DB@oracle.com>
	<CAGKkBkvBJa0Mw4Fx=bZRjZi+5H0kCUdux5YaW-3yuz18CFVgUQ@mail.gmail.com>
Message-ID: <51599DFB.1020405@oracle.com>

The motivation for sumOfSquares() is indeed to help in calculation of 
variance.  As you've noted, there are multiple forms this can take 
(e.g., sample vs population).  Modulo numerical issues, sum(sq) is an 
input to all the various forms, so we theoretically stay out of 
whack-a-mole territory by providing this form rather than trying to 
provide all the various forms people might want.

Note that *not* providing any help here is a disaster for those who want 
it; they have to materialize the collection and then make two passes. 
Its not like those users can just (safely) extend the summary statistics 
to also calculate the part they need.

Note also that for numeric types like long, there are no numerical 
issues.  So punishing long for his brother's instability just seems mean.

(For those following at home: the formalae for variance involve:
   sum((x_i - \bar x)^2)

where \bar x is the average of x.  Which means you would first have to 
make a pass to find the average, and then make another pass to calculate 
sum of squares of deviation from the mean.  Factoring the above:

   (x_i - \bar x)^2 = x_i^2 - 2 x_i \bar x + (\bar x)^2

So the sum can be expressed in terms of average and sum of squares, and 
done in a single pass.  But unfortunately since squaring makes big 
numbers bigger and small numbers smaller, you end up risking adding 
10^20 and 10^-20 and losing data when done with floating points.)

On 4/1/2013 10:23 AM, Kevin Bourrillion wrote:
> I'm confused, but I've seen nothing to change my impression that
> exposing sumOfSquares is not helpful. As unpleasant as it may seem, if
> we want to address the variance case at all, I think we have little
> choice but to expose sampleVariance() and populationVariance()
> ourselves, and then /those/ can use Kahan summation or whatever (which
> internally computes "sum of squares of deltas", not sum of squares, as I
> (don't) understand it).
>
>
> On Fri, Mar 29, 2013 at 3:16 PM, Brian Goetz <brian.goetz at oracle.com
> <mailto:brian.goetz at oracle.com>> wrote:
>
>      > Also, while I'm here...
>      >
>      > Exposing sumOfSquares() does not permit users to safely calculate
>     variance, which I believe makes it fairly useless and even dangerous:
>      >
>      > "The failure of Cauchy's fundamental inequality is another
>     important example of the breakdown of traditional algebra in the
>     presence of floating point arithmetic...Novice programmers who
>     calculate the standard deviation of some observations by using the
>     textbook formula [formula for the standard deviation in terms of the
>     sum of squares] often find themselves taking the square root of a
>     negative number!"  (Knuth AoCP vol 2, section 4.2.2)
>
>     Thanks for raising this issue again -- I'd meant to respond earlier.
>       I ran this by our numerics guys.
>
>     Basically, the problem is that for floating point numbers, since
>     squaring makes small numbers smaller and big numbers bigger, summing
>     squares in the obvious way risks the usual problem with adding
>     numbers of grossly differing magnitudes. So while the naive
>     factoring of population/sample variance allows you to compute them
>     from sum(x) and sum(x^2), the latter is potentially numerically
>     challenged.  (Note that this problem doesn't exist for int/long,
>     assuming a long is big enough to compute sum(x^2) without overflow.)
>
>     Still, I am not sure we do users a favor by leaving this out.  Many
>     of them are likely to simply extend DoubleSummaryStatistics to
>     calculate sum(x^2) anyway.  And the only other alternative is
>     horrible; stream the data into a collection and make two passes on
>     it, one for mean and one for variance.  That's at least 3x as
>     expensive, if you can fit the whole thing in memory in the first place.
>
>     The Knuth section you cite also offers a means to calculate variance
>     more effectively in a single pass using a recurrence relation based
>     on Kahan summation.  So I think the winning move is to provide a
>     better implementation of sumsq than either of the naive
>     implementations above, one that uses real numerics fu.  (We intend
>     to provide a better implementation of summation for
>     DoubleSummaryStatistics as well, based on Kahan.)
>
>     Of course the crappy implementation that is in there now is less
>     than ideal.
>
>
>
>
>
> --
> Kevin Bourrillion | Java Librarian | Google, Inc. |kevinb at google.com
> <mailto:kevinb at google.com>

From brian.goetz at oracle.com  Mon Apr  1 12:28:07 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 01 Apr 2013 15:28:07 -0400
Subject: Survey results
Message-ID: <5159DFC7.4070001@oracle.com>

Survey results for the last two surveys are here:
 
https://www.surveymonkey.com/sr.aspx?sm=Rmxo_2fOmocQqW5Txn1rPztBT4bwQsjNcCWzomugR5Fsg_3d

 
https://www.surveymonkey.com/sr.aspx?sm=KxnVsqG2kS7L_2bayV3Kg_2bu2Qi40QNOfB8penEX2R4Cuc_3d

Mike has already responded to the comments for XxxSummaryStatistics.

I have integrated the comments for Stream into a recent push, and 
propagated them forward to {Int,Long,Double}Stream.

I have removed forEachUntilCancelled based on evidence that people find 
it too confusing.  We're working on a new proposal for cancelation, stay 
tuned.


From paul.sandoz at oracle.com  Tue Apr  2 06:34:06 2013
From: paul.sandoz at oracle.com (Paul Sandoz)
Date: Tue, 2 Apr 2013 15:34:06 +0200
Subject: RFR JDK-8010096 : Initial java.util.Spliterator putback
In-Reply-To: <515AD100.5040103@oracle.com>
References: <09A8DF98-6FF6-452E-8150-E86D9113E580@oracle.com>
	<515AD100.5040103@oracle.com>
Message-ID: <C6849F7B-4665-44C9-8712-A2A3B55D007F@oracle.com>

On Apr 2, 2013, at 2:37 PM, Chris Hegarty <chris.hegarty at oracle.com> wrote:
> Nice work Paul, some small comments.
> 
> - new javadocs tags, @implSpec, @apiNote, etc. I really like the use of
>  implSpec to define the behavior of this implementations default
>  methods. There is probably a separate thread, but any idea when these
>  will be generated in the javadoc, not just the lambda docs?
> 

I do not know, Mike is the one who is very likely to know more.

 
> - Iterator.remove @since 1.8? I see there is a conflict here between
>  when the method was originally added and its default
> 

Right, that is most likely a mistake. How can we express that the default method is there since 1.8?


> - Spliterator class level examples are not showing in the specdiff.
>   Are these really API Notes? Maybe they are.
> 

The examples are non-normative so i think such docs can be categorized under @apiNote.

See here for generated JavaDoc from the lambda repo:

  http://cr.openjdk.java.net/~psandoz/lambda/spliterator/jdk-8010096/api/java/util/Spliterator.html

Paul.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20130402/897c3863/attachment.html 

From brian.goetz at oracle.com  Wed Apr  3 10:27:32 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Wed, 03 Apr 2013 13:27:32 -0400
Subject: Additional Collectors
Message-ID: <515C6684.8020007@oracle.com>

There's been some feedback on lambda-dev and from the recent Lambda Hack 
Day on Collectors.  There were two big categories:

1.  Need more / better docs.

2.  We want some more collectors.

The first is obvious and we've been working on those.  Here are some 
suggestions for simple additions to the Collector set.

  - count() (and possibly sum, min, max)

These are straighforward analogues of the specialized stream methods; 
they serve as a "gentle on ramp"  to understanding reduction.

People also expressed concern that the "toMap()" (nee mappedTo, 
joiningWith) is not flexible enough.  As a reminder, what toMap does is 
take a Stream<T> and a function T->U and produces a Map<T,U>.  Some 
people call this "backwards"; they would rather have something that 
takes a Stream<T> and function T->K and produces a Map<K,T>.  And others 
would rather have something that takes two functions T->K and T->U and 
produces a Map<K,U>.

All of these are useful enough.  The question is how to fit them into 
the API.  I think the name "toMap" is a bit of a challenge, since there 
are several "modes" and not all of them can be easily handled by 
overloads.  Maybe:

   toMap(T->U) // first version
   toMap(T->K, T->U) // third version

and leave the second version out, since the third version can easily 
simulate the second?


From forax at univ-mlv.fr  Wed Apr  3 10:41:17 2013
From: forax at univ-mlv.fr (Remi Forax)
Date: Wed, 03 Apr 2013 19:41:17 +0200
Subject: Additional Collectors
In-Reply-To: <515C6684.8020007@oracle.com>
References: <515C6684.8020007@oracle.com>
Message-ID: <515C69BD.3090505@univ-mlv.fr>

On 04/03/2013 07:27 PM, Brian Goetz wrote:
> There's been some feedback on lambda-dev and from the recent Lambda 
> Hack Day on Collectors.  There were two big categories:
>
> 1.  Need more / better docs.
>
> 2.  We want some more collectors.
>
> The first is obvious and we've been working on those.  Here are some 
> suggestions for simple additions to the Collector set.
>
>  - count() (and possibly sum, min, max)
>
> These are straighforward analogues of the specialized stream methods; 
> they serve as a "gentle on ramp"  to understanding reduction.
>
> People also expressed concern that the "toMap()" (nee mappedTo, 
> joiningWith) is not flexible enough.  As a reminder, what toMap does 
> is take a Stream<T> and a function T->U and produces a Map<T,U>.  Some 
> people call this "backwards"; they would rather have something that 
> takes a Stream<T> and function T->K and produces a Map<K,T>.  And 
> others would rather have something that takes two functions T->K and 
> T->U and produces a Map<K,U>.
>
> All of these are useful enough.  The question is how to fit them into 
> the API.  I think the name "toMap" is a bit of a challenge, since 
> there are several "modes" and not all of them can be easily handled by 
> overloads.  Maybe:

better if you rename U to V

>
>   toMap(T->V) // first version

produces a Map<T,V>

> toMap(T->K, T->V) // third version

produces a Map<K,V>.

why toMap(T -> V) is not toMap(T -> T, T, -> V) ?
in that case, we only need one toMap.

>
> and leave the second version out, since the third version can easily 
> simulate the second?
>

cheers,
R?mi


From tim at peierls.net  Wed Apr  3 10:49:40 2013
From: tim at peierls.net (Tim Peierls)
Date: Wed, 3 Apr 2013 13:49:40 -0400
Subject: Additional Collectors
In-Reply-To: <515C6684.8020007@oracle.com>
References: <515C6684.8020007@oracle.com>
Message-ID: <CA+F8eeQDVO8kzYimbknCERP1ijpuCrVjvvMF=cnMKPVpzQgYsQ@mail.gmail.com>

On Wed, Apr 3, 2013 at 1:27 PM, Brian Goetz <brian.goetz at oracle.com> wrote:

> People also expressed concern that the "toMap()" (nee mappedTo,
> joiningWith) is not flexible enough.  As a reminder, what toMap does is
> take a Stream<T> and a function T->U and produces a Map<T,U>.  Some people
> call this "backwards"; they would rather have something that takes a
> Stream<T> and function T->K and produces a Map<K,T>.  And others would
> rather have something that takes two functions T->K and T->U and produces a
> Map<K,U>.
>

The second form (Stream<T> and T->K producing Map<K, T>) could be called
"indexing", so toIndexMap or toIndex would seem appropriate. I don't have a
sense of a natural name for the third form. toMap still seems good for the
first form.


> All of these are useful enough.  The question is how to fit them into the
> API.  I think the name "toMap" is a bit of a challenge, since there are
> several "modes" and not all of them can be easily handled by overloads.
>  Maybe:
>
>   toMap(T->U) // first version
>   toMap(T->K, T->U) // third version
>
> and leave the second version out, since the third version can easily
> simulate the second?
>

Maybe, but I like the thought of toIndex or something like that.

--tim
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20130403/488e27e2/attachment.html 

From brian.goetz at oracle.com  Wed Apr  3 10:53:30 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Wed, 03 Apr 2013 13:53:30 -0400
Subject: Performance update
Message-ID: <515C6C9A.2000105@oracle.com>

With Doug's help, we've been beating on the performance of the Streams 
implementation.  We've been in pretty good shape all along with 
per-element overhead, since we departed from Iterator very early on. 
But we've been struggling with startup overhead.  As the API has 
stabilized and many simplifying assumptions have been made (e.g., recent 
simplification of sequential/parallel, outlawing "reuse", outlawing 
"forked" streams, etc), we've recently been able to make a refactoring 
pass that reduces the object count for setting up a stream.  Highlights 
of this include:
  - merging PipelineHelper into AbstractPipeline;
  - eliminating the Supplier<Spliterator> capture even when the client 
provides a late-binding Spliterator;
  - Recasting the Op implementations as "extends XxxPipeline" instead of 
having the pipeline object encapsulate the Op (2x reduction)
  - merging TerminalOp and TerminalSink for some operations, including 
forEach

Some of this is already in, but the rest should be going in the next few 
days.  It does not affect the public API at all.

We've also opened the door to implementing some parallel stateful 
operations without full barriers, so they can be better pipelined.  For 
example, limit/substream on a stream that is SIZED+SUBSIZED can be 
expressed as a wrapping spliterator without touching the data or 
computing elements that won't be part of the result.  The new 
implementation strategy permits this, though we have to do some more 
work to upgrade the candidate operations.

Further, many of the "expensive" setup operations are now avoided for 
sequential and stateless parallel pipelines, only being paid by parallel 
pipelines with stateful ops.

From brian.goetz at oracle.com  Wed Apr  3 11:02:33 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Wed, 03 Apr 2013 14:02:33 -0400
Subject: Additional Collectors
In-Reply-To: <CA+F8eeQDVO8kzYimbknCERP1ijpuCrVjvvMF=cnMKPVpzQgYsQ@mail.gmail.com>
References: <515C6684.8020007@oracle.com>
	<CA+F8eeQDVO8kzYimbknCERP1ijpuCrVjvvMF=cnMKPVpzQgYsQ@mail.gmail.com>
Message-ID: <515C6EB9.5030301@oracle.com>

There are basically three strategies here we could take:

1.  Annoint one direction as the "natural" direction and make the other 
either fit into the general form (as I proposed) or have a modified name 
(as Tim proposes.)  I am fine with either of these.  (There will always 
be those who say "you picked the wrong direction to annoint.)

2.  Lard up both names with a directionality.

3.  Pick totally new names, such as mappedTo and indexedFrom.

Also we want to avoid variant overload.  Right now we have two versions 
each of toMap, toConcurrentMap; a simple (mapping function only) one, 
and a kitchen-sink (mapping function + map ctor + merge function).  This 
was already a compromise to keep the count low.

If we go with my suggestion (keep T->U form, plus add one more general 
form) that is 4 new methods.  If we decide to have forms for all three 
forms, that's 8 new methods.

I think if we do
   Map<T,U> toMap(T->U)
and
   Map<K,U> toMap(T->K, T->U)

we can call them both toMap and people will get it.

On 4/3/2013 1:49 PM, Tim Peierls wrote:
> On Wed, Apr 3, 2013 at 1:27 PM, Brian Goetz <brian.goetz at oracle.com
> <mailto:brian.goetz at oracle.com>> wrote:
>
>     People also expressed concern that the "toMap()" (nee mappedTo,
>     joiningWith) is not flexible enough.  As a reminder, what toMap does
>     is take a Stream<T> and a function T->U and produces a Map<T,U>.
>       Some people call this "backwards"; they would rather have
>     something that takes a Stream<T> and function T->K and produces a
>     Map<K,T>.  And others would rather have something that takes two
>     functions T->K and T->U and produces a Map<K,U>.
>
>
> The second form (Stream<T> and T->K producing Map<K, T>) could be called
> "indexing", so toIndexMap or toIndex would seem appropriate. I don't
> have a sense of a natural name for the third form. toMap still seems
> good for the first form.
>
>     All of these are useful enough.  The question is how to fit them
>     into the API.  I think the name "toMap" is a bit of a challenge,
>     since there are several "modes" and not all of them can be easily
>     handled by overloads.  Maybe:
>
>        toMap(T->U) // first version
>        toMap(T->K, T->U) // third version
>
>     and leave the second version out, since the third version can easily
>     simulate the second?
>
>
> Maybe, but I like the thought of toIndex or something like that.
>
> --tim
>

From tim at peierls.net  Wed Apr  3 11:34:39 2013
From: tim at peierls.net (Tim Peierls)
Date: Wed, 3 Apr 2013 14:34:39 -0400
Subject: Additional Collectors
In-Reply-To: <515C6EB9.5030301@oracle.com>
References: <515C6684.8020007@oracle.com>
	<CA+F8eeQDVO8kzYimbknCERP1ijpuCrVjvvMF=cnMKPVpzQgYsQ@mail.gmail.com>
	<515C6EB9.5030301@oracle.com>
Message-ID: <CA+F8eeTmmeSmQDKUp0YMyx9g3_0nwNyo+w=1jLG_0+jybhK4Eg@mail.gmail.com>

On Wed, Apr 3, 2013 at 2:02 PM, Brian Goetz <brian.goetz at oracle.com> wrote:

> I think if we do
>   Map<T,U> toMap(T->U)
> and
>   Map<K,U> toMap(T->K, T->U)
>
> we can call them both toMap and people will get it.


I think that's fine, but if you do that it would be really nice to show how
to get the "toIndexMap" behavior in the docs for the two-arg toMap.

--tim
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20130403/b1845b2a/attachment.html 

From brian.goetz at oracle.com  Wed Apr  3 20:00:25 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Wed, 03 Apr 2013 23:00:25 -0400
Subject: unrdered()
Message-ID: <515CECC9.6080101@oracle.com>

At one point, we had an unordered() op.  I think it may be time to bring 
it back.

There are a growing number of ops that have optimized implementations 
for unordered streams:

   - distinct can be implemented with concurrent insertion into a CHS 
instead of merging if we don't care about order.  Not only is this less 
work (merging is expensive), but it makes distinct lazy (elements can 
flow through immediately once they've not been found in the CHS, instead 
of waiting for all the elements to be seen.)

   - sorted is non-stable in unordered streams.

   - limit/subsequence are far lighter for unordered streams (and can 
similarly be made lazy)

So a way of saying "I know you think this stream has ordering, but I 
don't care about it" is a way of opting into these optimizations.

Implementation is trivial.

Adding .unordered() could also enable us to get rid of 
.collectUnordered(), and allow more of the reduce-like ops to benefit 
from the embrace of "disorder" without API explosion.


From paul.sandoz at oracle.com  Thu Apr  4 02:11:51 2013
From: paul.sandoz at oracle.com (Paul Sandoz)
Date: Thu, 4 Apr 2013 11:11:51 +0200
Subject: unrdered()
In-Reply-To: <515CECC9.6080101@oracle.com>
References: <515CECC9.6080101@oracle.com>
Message-ID: <FDC198CD-9835-494A-BDAD-55F2400251C3@oracle.com>


On Apr 4, 2013, at 5:00 AM, Brian Goetz <brian.goetz at Oracle.COM> wrote:

> At one point, we had an unordered() op.  I think it may be time to bring it back.
> 
> There are a growing number of ops that have optimized implementations for unordered streams:
> 
>  - distinct can be implemented with concurrent insertion into a CHS instead of merging if we don't care about order.  Not only is this less work (merging is expensive), but it makes distinct lazy (elements can flow through immediately once they've not been found in the CHS, instead of waiting for all the elements to be seen.)
> 
>  - sorted is non-stable in unordered streams.
> 
>  - limit/subsequence are far lighter for unordered streams (and can similarly be made lazy)
> 
> So a way of saying "I know you think this stream has ordering, but I don't care about it" is a way of opting into these optimizations.
> 

Right, we previously thought "well lets just go with what the two ends of the pipeline define in terms of having order and preserving order respectively".

AFAICT unordered() would be useful for parallel pipelines with:

1) a source that has order

2) stateful operations that can be optimize if order need not be preserved

3) an order preserving terminal operation

and implying unordered() should be declared close to the source.


> Implementation is trivial.
> 
> Adding .unordered() could also enable us to get rid of .collectUnordered(), and allow more of the reduce-like ops to benefit from the embrace of "disorder" without API explosion.
> 

Although collectUnordered also back propagates lack of order upstream (just like forEach, or findAny). To remove collectUnordered we would need collectors to define whether they preserve order or not (I see in a recent change set to lambda you started work on that).

So a collect(toConcurrentMap()) should back propagate lack of order since CHM is used, but for the supplier version we cannot guarantee that since ConcurrentSkipListMap might be used. Ugh is all a bit complex.

Paul.

From paul.sandoz at oracle.com  Thu Apr  4 05:02:36 2013
From: paul.sandoz at oracle.com (Paul Sandoz)
Date: Thu, 4 Apr 2013 14:02:36 +0200
Subject: Additional Collectors
In-Reply-To: <CA+F8eeTmmeSmQDKUp0YMyx9g3_0nwNyo+w=1jLG_0+jybhK4Eg@mail.gmail.com>
References: <515C6684.8020007@oracle.com>
	<CA+F8eeQDVO8kzYimbknCERP1ijpuCrVjvvMF=cnMKPVpzQgYsQ@mail.gmail.com>
	<515C6EB9.5030301@oracle.com>
	<CA+F8eeTmmeSmQDKUp0YMyx9g3_0nwNyo+w=1jLG_0+jybhK4Eg@mail.gmail.com>
Message-ID: <C332F150-CF5A-4D38-B43E-48BA7D45C736@oracle.com>


On Apr 3, 2013, at 8:34 PM, Tim Peierls <tim at peierls.net> wrote:

> On Wed, Apr 3, 2013 at 2:02 PM, Brian Goetz <brian.goetz at oracle.com> wrote:
> 
>> I think if we do
>>  Map<T,U> toMap(T->U)
>> and
>>  Map<K,U> toMap(T->K, T->U)
>> 
>> we can call them both toMap and people will get it.
> 
> 
> I think that's fine, but if you do that it would be really nice to show how
> to get the "toIndexMap" behavior in the docs for the two-arg toMap.
> 

I quite like the fact, at the moment, that toMap always uses an element as the key and maps element to values. It is an easy rule to remember.

Where as groupingBy always requires a classifiying function to map an element to a key, plus many ways to collect elements associated the same key, the canonical one being to collect elements to a list [*].

There seems another basic use-case which is toUniqueIndexMap:

  Map<U, T> toUniqueIndexMap(T->U)

and then we don't need to merge values of T. If that is required then groupingBy could be used instead.

Perhaps documentation-wise it may be helpful to provide examples of how toMap etc could be implemented using groupingBy?

Paul.

[*] I  wondering whether we really need the explicit List returning variants of groupingBy.

From joe.bowbeer at gmail.com  Thu Apr  4 06:01:09 2013
From: joe.bowbeer at gmail.com (Joe Bowbeer)
Date: Thu, 4 Apr 2013 08:01:09 -0500
Subject: unrdered()
In-Reply-To: <515CECC9.6080101@oracle.com>
References: <515CECC9.6080101@oracle.com>
Message-ID: <CAHzJPEqDQwtvciaMMaJ3yLqhPaSNrjZY3cOsXSB7OkXKdPrKsQ@mail.gmail.com>

Consider making forEach ordered by default, and relying on unordered() to
disable this.
On Apr 3, 2013 10:00 PM, "Brian Goetz" <brian.goetz at oracle.com> wrote:

> At one point, we had an unordered() op.  I think it may be time to bring
> it back.
>
> There are a growing number of ops that have optimized implementations for
> unordered streams:
>
>   - distinct can be implemented with concurrent insertion into a CHS
> instead of merging if we don't care about order.  Not only is this less
> work (merging is expensive), but it makes distinct lazy (elements can flow
> through immediately once they've not been found in the CHS, instead of
> waiting for all the elements to be seen.)
>
>   - sorted is non-stable in unordered streams.
>
>   - limit/subsequence are far lighter for unordered streams (and can
> similarly be made lazy)
>
> So a way of saying "I know you think this stream has ordering, but I don't
> care about it" is a way of opting into these optimizations.
>
> Implementation is trivial.
>
> Adding .unordered() could also enable us to get rid of
> .collectUnordered(), and allow more of the reduce-like ops to benefit from
> the embrace of "disorder" without API explosion.
>
>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20130404/9e7eb177/attachment.html 

From brian.goetz at oracle.com  Thu Apr  4 06:56:53 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 04 Apr 2013 09:56:53 -0400
Subject: unrdered()
In-Reply-To: <FDC198CD-9835-494A-BDAD-55F2400251C3@oracle.com>
References: <515CECC9.6080101@oracle.com>
	<FDC198CD-9835-494A-BDAD-55F2400251C3@oracle.com>
Message-ID: <515D86A5.1030709@oracle.com>

> Although collectUnordered also back propagates lack of order upstream
> (just like forEach, or findAny). To remove collectUnordered we would
> need collectors to define whether they preserve order or not (I see
> in a recent change set to lambda you started work on that).

With
   foo.unordered()....collect()
vs
   foo...collectUnordered()

unless any of the ops in ... inject order (only candidate I can think of 
is sort, when you probably really want the order!), it will be unordered 
for all of the ... ops -- so do we really need the back-propagation?


From paul.sandoz at oracle.com  Thu Apr  4 07:08:24 2013
From: paul.sandoz at oracle.com (Paul Sandoz)
Date: Thu, 4 Apr 2013 16:08:24 +0200
Subject: unrdered()
In-Reply-To: <515D86A5.1030709@oracle.com>
References: <515CECC9.6080101@oracle.com>
	<FDC198CD-9835-494A-BDAD-55F2400251C3@oracle.com>
	<515D86A5.1030709@oracle.com>
Message-ID: <E138DBC1-1766-4A53-B7E7-B0EE1710F563@oracle.com>


On Apr 4, 2013, at 3:56 PM, Brian Goetz <brian.goetz at oracle.com> wrote:

>> Although collectUnordered also back propagates lack of order upstream
>> (just like forEach, or findAny). To remove collectUnordered we would
>> need collectors to define whether they preserve order or not (I see
>> in a recent change set to lambda you started work on that).
> 
> With
>  foo.unordered()....collect()
> vs
>  foo...collectUnordered()
> 
> unless any of the ops in ... inject order (only candidate I can think of is sort, when you probably really want the order!), it will be unordered for all of the ... ops -- so do we really need the back-propagation?
> 

I was thinking the same thing, we can get rid of the back propagation. It is complex, plus annoying to implement :-)

We can then also achieve what Joe proposes with forEach.

Paul.

From brian.goetz at oracle.com  Thu Apr  4 07:18:32 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 04 Apr 2013 10:18:32 -0400
Subject: unrdered()
In-Reply-To: <CAHzJPEqDQwtvciaMMaJ3yLqhPaSNrjZY3cOsXSB7OkXKdPrKsQ@mail.gmail.com>
References: <515CECC9.6080101@oracle.com>
	<CAHzJPEqDQwtvciaMMaJ3yLqhPaSNrjZY3cOsXSB7OkXKdPrKsQ@mail.gmail.com>
Message-ID: <515D8BB8.6080601@oracle.com>

> Consider making forEach ordered by default, and relying on unordered()
> to disable this.

We did consider this, and it is weird that this the only terminal that 
has unordered as its behavior, but I think the current behavior is right.

If someone does:

   seqStream.forEach(action)

They will expect that the action is performed sequentially in the 
calling thread.  If they do:

   seqStream.parallel().forEach(action)

I believe they will (reasonably) expect the action to happen in parallel 
across threads.  Constraining to encounter order gives up the vast 
majority of the parallelism.  I think if we did this people would say 
"parallel streams don't work."


Separately, Paul quite correctly points out that back-propagating 
unordered from the terminal is a pain.  In:

   seqStream.parallel().distinct().forEach()

Here, since the forEach will be unordered, there's no point in doing the 
more expensive ordered processing for distinct.  This only shows up for 
parallel pipelines with stateful operations.  In that case, we can walk 
backwards injecting unordered, but have to stop when we hit a 
short-circuit operation.  (Though we could just omit this for now, its 
just an optimization.)

From brian.goetz at oracle.com  Fri Apr  5 09:28:16 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Fri, 05 Apr 2013 12:28:16 -0400
Subject: Fwd: hg: lambda/lambda/jdk: Add .unordered() operation; eliminate
	.collectUnordered()
In-Reply-To: <20130405160957.BAADE480D2@hg.openjdk.java.net>
References: <20130405160957.BAADE480D2@hg.openjdk.java.net>
Message-ID: <515EFBA0.3010804@oracle.com>

Stream.collectUnordered has been removed in favor of a more general 
.unordered() method (which may be a no-op if the stream is already 
unordered.)  This allows more stateful and terminal ops to gain the 
benefit of opting out of ordering.

The collect(Collector) method currently performs a concurrent collection 
when all of the following are true:
  - the stream is parallel
  - the collector is concurrent
  - the collector is unordered OR the stream is unordered

Currently the groupingByConcurrent / toConcurrentMap collectors are not 
UNORDERED.  Meaning that users still have to have an unordered source 
(or ask for unordered explicitly) to get concurrent collection.  I'm 
currently working through what it would look like if this were reversed, 
and these collectors declared UNORDERED.

-------- Original Message --------
Subject: hg: lambda/lambda/jdk: Add .unordered() operation; eliminate 
.collectUnordered()
Date: Fri, 05 Apr 2013 16:09:44 +0000
From: brian.goetz at oracle.com
To: lambda-dev at openjdk.java.net

Changeset: adc363b47e78
Author:    briangoetz
Date:      2013-04-05 12:09 -0400
URL:       http://hg.openjdk.java.net/lambda/lambda/jdk/rev/adc363b47e78

Add .unordered() operation; eliminate .collectUnordered()

! src/share/classes/java/util/stream/AbstractPipeline.java
! src/share/classes/java/util/stream/BaseStream.java
! src/share/classes/java/util/stream/Collector.java
! src/share/classes/java/util/stream/DelegatingStream.java
! src/share/classes/java/util/stream/DoublePipeline.java
! src/share/classes/java/util/stream/IntPipeline.java
! src/share/classes/java/util/stream/LongPipeline.java
! src/share/classes/java/util/stream/ReferencePipeline.java
! src/share/classes/java/util/stream/Stream.java
! test-ng/bootlib/java/util/stream/OpTestCase.java
! test-ng/bootlib/java/util/stream/StreamTestData.java
! test-ng/tests/org/openjdk/tests/java/util/stream/TabulatorsTest.java


From brian.goetz at oracle.com  Fri Apr  5 12:47:35 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Fri, 05 Apr 2013 15:47:35 -0400
Subject: API and spec review for Stream
In-Reply-To: <514CD7E0.6030102@oracle.com>
References: <514CD7E0.6030102@oracle.com>
Message-ID: <515F2A57.8080507@oracle.com>

Updated based on comments from last survey.  New survey is up at:
   https://www.surveymonkey.com/s/VQ8MYBN

Includes Stream, IntStream, LongStream, DoubleStream, and a rough 
version of package doc.


On 3/22/2013 6:14 PM, Brian Goetz wrote:
> I have posted a survey at:
>    https://www.surveymonkey.com/s/59CTHS8
>
> This is a hopefully-final review of the API and preliminary review of
> the specification for the single class Stream.  Docs are linked from the
> survey.  Usual password.  Any and all constructive comments welcome.
>
> It is known that the specs are incomplete; what is here is a start.
> Suggestions for improvement are welcome.

From brian.goetz at oracle.com  Fri Apr  5 12:49:41 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Fri, 05 Apr 2013 15:49:41 -0400
Subject: API and spec review for Collector
Message-ID: <515F2AD5.90007@oracle.com>

I have posted a survey at:
   https://www.surveymonkey.com/s/VWC55PD

This is a review for the Collector API.  (Collectors will be separate.)


From brian.goetz at oracle.com  Fri Apr  5 12:51:32 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Fri, 05 Apr 2013 15:51:32 -0400
Subject: API and spec review for FlatMapper
Message-ID: <515F2B44.2000901@oracle.com>

I have posted a survey at:
   https://www.surveymonkey.com/s/VW5TZ5W

for the FlatMapper API.

From brian.goetz at oracle.com  Fri Apr  5 13:00:58 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Fri, 05 Apr 2013 16:00:58 -0400
Subject: Concurrent / unordered collection
Message-ID: <515F2D7A.4030109@oracle.com>

We've had some improvements in the model for managing ordering, so its 
time to take a look at whether these can flow into Collector as well.

  - There is now an .unordered() operation, and we removed the 
special-purpose .collectUnordered().

  - Terminal operations can have flags/characteristics just like 
intermediate operations.  The allowable flags for a terminal operation 
are ORDERED/NOT_ORDERED and SHORT_CIRCUIT.  The unordered status can 
back-propagate from a terminal up the pipeline:

   stream.distinct().forEach(...)

In the above, ordinarily distinct would be constrained to encounter 
order.  But, because we know there is an unordered forEach operation 
coming downstream, we can back-propagate UNORDERED up the chain, 
enabling the more efficient unordered version of distinct().

  - The Collector API has been enhanced with characteristic flags, just 
like Spliterator and Stream.  Defined characteristics include CONCURRENT 
and UNORDERED.  UNORDERED-ness of a Collector flows into the terminal 
flags of a collect() operation.  So, for example, toSet() is an 
unordered collector.

Until now, you only got a concurrent reduction when BOTH of the 
following were true:
  - the Collector is CONCURRENT
  - the source is unordered

This was because a concurrent collection fundamentally interferes with 
encounter order.  So if a user did:
   stream.collect(groupingByConcurrent())

they would NOT get a concurrent collection because the stream is ordered.

But, I think this may be surprising to users.  Now that a Collector can 
indicate that it is UNORDERED, I think we should consider making the 
concurrent-map collectors UNORDERED.  So if a user says:

   stream.collect(groupingByConcurrent())

he truly gets a concurrent collection.  If he is surprised that this is 
an unorderd collection, it is an opportunity to learn more about 
ordering.  But I think this is more consistent with user expectations 
and we can now more easily represent this in the API.


From brian.goetz at oracle.com  Sun Apr  7 15:47:15 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Sun, 07 Apr 2013 18:47:15 -0400
Subject: Whither FlatMapper?
Message-ID: <5161F773.6050705@oracle.com>

I started to work through the survey comments on FlatMapper, which 
amounted to "hate the name", "need more examples", "hard to understand." 
  As I started to write more examples, and consider some of the things 
that have changed in the implementation recently, I am starting to think 
that maybe now we *can* actually get away with only the "obvious" (but 
still less performant) form.

What people think they want is:

   flatMap(T -> Stream<U>)

And, in a perfect world, they would be right.  The reason this has 
historically been a bad fit is that the performance cost of this version 
over the "explicit" version was enormous.  (It was merely bad for the "I 
already have a collection lying around" case, but horrible for the "I am 
generating values" case.)

But, a lot has happened recently in the implementation.  Previously, 
each *iteration* would have generated a Spliterator, a 
Supplier<Spliterator>, a Pipeline, a PipelineHelper, and a ForEachTask 
-- just to pass the values down the stream.  Since then, the supplier 
and helper are gone, the spliterator can likely be merged with the 
pipeline, and the forEach eliminated in most cases.  And there is still 
quite a bit more running room to further decrease the cost of building 
small streams.  There's a dozen small things we can do -- many 
implementation-only, but some are small API additions (such as 
singletonStream(T)) -- to bring this cost down further.

Even with the general forms available, almost no one understands how 
they work, and even those who figure it out still can't figure out why 
they would want it.  The pretty version is just so attractive that no 
one is willing to believe that it is painfully slow compared to the ugly 
version.  Given that this adds seven new SAMs (a significant fraction of 
the public API surface area of java.util.stream), I'm having second 
thoughts on including these now.

So, concrete proposal:
  - Drop all FlatMapper.* SAMs;
  - Drop all forms of flatMap(FlatMapper*)
  - Add back flatMapToXxx(Function<T, XxxStream) to Stream


From sam at sampullara.com  Sun Apr  7 16:01:48 2013
From: sam at sampullara.com (Sam Pullara)
Date: Sun, 7 Apr 2013 16:01:48 -0700
Subject: Whither FlatMapper?
In-Reply-To: <5161F773.6050705@oracle.com>
References: <5161F773.6050705@oracle.com>
Message-ID: <CAMUF1SmgMq67e3kksT6QcDws1yitNNGSZLVDXd9Wu6tSFSKFsw@mail.gmail.com>

I'm a big fan of the current FlatMapper stuff that takes a Consumer. Much
more efficient and straightforward when you don't have a stream or
collection to just return. Here is some code that uses 3 of them for good
effect:

https://github.com/spullara/twitterprocessor/blob/master/src/main/java/twitterprocessor/App.java


On Sun, Apr 7, 2013 at 3:47 PM, Brian Goetz <brian.goetz at oracle.com> wrote:

> I started to work through the survey comments on FlatMapper, which
> amounted to "hate the name", "need more examples", "hard to understand."
>  As I started to write more examples, and consider some of the things that
> have changed in the implementation recently, I am starting to think that
> maybe now we *can* actually get away with only the "obvious" (but still
> less performant) form.
>
> What people think they want is:
>
>   flatMap(T -> Stream<U>)
>
> And, in a perfect world, they would be right.  The reason this has
> historically been a bad fit is that the performance cost of this version
> over the "explicit" version was enormous.  (It was merely bad for the "I
> already have a collection lying around" case, but horrible for the "I am
> generating values" case.)
>
> But, a lot has happened recently in the implementation.  Previously, each
> *iteration* would have generated a Spliterator, a Supplier<Spliterator>, a
> Pipeline, a PipelineHelper, and a ForEachTask -- just to pass the values
> down the stream.  Since then, the supplier and helper are gone, the
> spliterator can likely be merged with the pipeline, and the forEach
> eliminated in most cases.  And there is still quite a bit more running room
> to further decrease the cost of building small streams.  There's a dozen
> small things we can do -- many implementation-only, but some are small API
> additions (such as singletonStream(T)) -- to bring this cost down further.
>
> Even with the general forms available, almost no one understands how they
> work, and even those who figure it out still can't figure out why they
> would want it.  The pretty version is just so attractive that no one is
> willing to believe that it is painfully slow compared to the ugly version.
>  Given that this adds seven new SAMs (a significant fraction of the public
> API surface area of java.util.stream), I'm having second thoughts on
> including these now.
>
> So, concrete proposal:
>  - Drop all FlatMapper.* SAMs;
>  - Drop all forms of flatMap(FlatMapper*)
>  - Add back flatMapToXxx(Function<T, XxxStream) to Stream
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20130407/2da371a9/attachment.html 

From brian.goetz at oracle.com  Mon Apr  8 12:08:49 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 08 Apr 2013 15:08:49 -0400
Subject: Setting of UNORDERED on concurrent collectors
Message-ID: <516315C1.3080509@oracle.com>

Now that we've removed collectUnordered in favor of a more general 
unordered() op, we should consider what should be the default behavior for:

   orderedStream.collect(groupingByConcurrent(f))

Currently, the collect-to-ConcurrentMap collectors are *not* defined as 
UNORDERED.  Which means, if the stream is ordered, we will attempt to do 
an ordered collection anyway, which is incompatible with concurrent 
collection, and we will do the plain old partition-and-merge with 
ConcurrentMap.

Here, we have competing evidence for the user intent.  On the one hand, 
the stream is ordered, and the user could have chosen unordered.  On the 
other, the user has asked for concurrent grouping.  Its not 100% obvious 
which should win.

On the other hand, ordered map collections are so awful that they will 
almost certainly be unhappy with the performance if they forget to say 
unordered here in the parallel case (and it makes no difference in the 
sequential case.)  So I'm inclined to make groupingByConcurrent / 
toConcurrentMap be UNORDERED collections.

From dl at cs.oswego.edu  Mon Apr  8 12:27:53 2013
From: dl at cs.oswego.edu (Doug Lea)
Date: Mon, 08 Apr 2013 15:27:53 -0400
Subject: Whither FlatMapper?
In-Reply-To: <CAMUF1SmgMq67e3kksT6QcDws1yitNNGSZLVDXd9Wu6tSFSKFsw@mail.gmail.com>
References: <5161F773.6050705@oracle.com>
	<CAMUF1SmgMq67e3kksT6QcDws1yitNNGSZLVDXd9Wu6tSFSKFsw@mail.gmail.com>
Message-ID: <51631A39.30001@cs.oswego.edu>

On 04/07/13 19:01, Sam Pullara wrote:
> I'm a big fan of the current FlatMapper stuff that takes a Consumer. Much more
> efficient and straightforward when you don't have a stream or collection to just
> return. Here is some code that uses 3 of them for good effect:

I think the main issue is whether, given the user reactions so far, we
should insist on people using a generally better but non-obvious
approach to flat-mapping. Considering that anyone *could* write their own
FlatMappers layered on top of existing functionality (we could
even show how to do it as a code example somewhere), I'm with
Brian on this: give people the obvious forms in the API. People
who are most likely to use it are the least likely to be obsessive
about its performance. And when they are, they can learn about
alternatives.

-Doug


From joe.bowbeer at gmail.com  Mon Apr  8 12:36:37 2013
From: joe.bowbeer at gmail.com (Joe Bowbeer)
Date: Mon, 8 Apr 2013 12:36:37 -0700
Subject: Setting of UNORDERED on concurrent collectors
In-Reply-To: <516315C1.3080509@oracle.com>
References: <516315C1.3080509@oracle.com>
Message-ID: <CAHzJPEqcZup9RBXjjW0HZRf6jSEh2zDerFXqyyD5W8M_KeE4DA@mail.gmail.com>

What is groupingByConcurrent good for?  What's the difference between
parallel and concurrent in this context?

I've re-read the last 5 emails that mention groupingByConcurrent and it is
not clear to me what's going on.

The most succinct indication of its function is:

The collect(Collector) method currently performs a concurrent collection
> when all of the following are true:
>  - the stream is parallel
>  - the collector is *concurrent*
>  - the collector is unordered OR the stream is unordered


In other words, *if* I happen to use groupingByConcurrent *then* maybe a
concurrent collection will be performed, but maybe not, depending on a
couple other factors...

Can we make this simpler and more intuitive/predictable?  I realize that's
what you're addressing now, but can't we go a lot farther?

Can we, say, get rid of groupingByConcurrent and just assume that if the
stream is parallel?  What do we lose?  Do we lose any functionality that
can't be derived another way?

Please educate me!

--Joe


On Mon, Apr 8, 2013 at 12:08 PM, Brian Goetz <brian.goetz at oracle.com> wrote:

> Now that we've removed collectUnordered in favor of a more general
> unordered() op, we should consider what should be the default behavior for:
>
>   orderedStream.collect(**groupingByConcurrent(f))
>
> Currently, the collect-to-ConcurrentMap collectors are *not* defined as
> UNORDERED.  Which means, if the stream is ordered, we will attempt to do an
> ordered collection anyway, which is incompatible with concurrent
> collection, and we will do the plain old partition-and-merge with
> ConcurrentMap.
>
> Here, we have competing evidence for the user intent.  On the one hand,
> the stream is ordered, and the user could have chosen unordered.  On the
> other, the user has asked for concurrent grouping.  Its not 100% obvious
> which should win.
>
> On the other hand, ordered map collections are so awful that they will
> almost certainly be unhappy with the performance if they forget to say
> unordered here in the parallel case (and it makes no difference in the
> sequential case.)  So I'm inclined to make groupingByConcurrent /
> toConcurrentMap be UNORDERED collections.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20130408/90b8808e/attachment.html 

From brian.goetz at oracle.com  Mon Apr  8 12:50:48 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 08 Apr 2013 15:50:48 -0400
Subject: Setting of UNORDERED on concurrent collectors
In-Reply-To: <CAHzJPEqcZup9RBXjjW0HZRf6jSEh2zDerFXqyyD5W8M_KeE4DA@mail.gmail.com>
References: <516315C1.3080509@oracle.com>
	<CAHzJPEqcZup9RBXjjW0HZRf6jSEh2zDerFXqyyD5W8M_KeE4DA@mail.gmail.com>
Message-ID: <51631F98.8030304@oracle.com>

> What is groupingByConcurrent good for?  What's the difference between
> parallel and concurrent in this context?

For sequential streams, concurrent is irrelevant.  So this is only 
relevant for parallel streams.

When doing a reduction on a parallel stream, there are two obvious ways 
to do it:

  - Partition the input, reduce the chunks separately into isolated 
subresults, then combine the subresults "up the tree" into a complete 
result (call this a "traditional" parallel reduction)

  - Use some sort of thread-safe combiner, and blast input elements at 
some shared combiner from all threads (call this a "concurrent" 
reduction.)  This is more like a forEach than a reduce.  Requirements 
for this to be safe include: the combiner must be thread-safe, and the 
user must not care about order, since there's no telling in what order 
the elements will be blasted.

When the reduction is a groupBy into a Map, this can make a big 
difference because of the merging performance of HashMap.

The traditional reduction looks like this:
  - create a HashMap per partition
  - Insert the elements of this partition into the HashMap
  - Go up the tree, merging two HashMaps into one.  This involves 
iterating a key-by-key merge.  This is slow.

The concurrent reduction looks like:
  - create one ConcurrentHashMap
  - Blast elements into it using atomic methods like putIfAbsent
  - Return that, no merging

In most reasonable cases, concurrent parallel reduction with CHM blows 
away traditional parallel reduction with HashMap.  On the other hand, 
one of the casualties of the concurrent approach is ordering.

If your input is (ordered):

   [ 1, 2, 3, 4, 5, 6, 7, 8 ]

and your classifier function is:

   e % 2

then the traditional approach must yield:
  { 0 => [ 2, 4, 6, 8 ],
    1 => [ 1, 3, 5, 7] }

but the concurrent approach could yield:

  { 0 => [ 6, 2, 4, 8 ],
    1 => [ 7, 1, 3, 5 ] }

So the question is, when confronted with an obvious desire to use a 
concurrent-safe collector, do we infer that the user must not care about 
ordering?

> The most succinct indication of its function is:
>
>     The collect(Collector) method currently performs a
>     concurrent collection when all of the following are true:
>       - the stream is parallel
>       - the collector is *concurrent*
>       - the collector is unordered OR the stream is unordered

This is the current rule about whether or not collect() does a 
concurrent reduction.  My question here is whether we wish to make our 
existing concurrent collectors always be unordered, so that the last 
bullet is trivially satisfied for the built-in concurrent collectors.

> In other words, *if* I happen to use groupingByConcurrent *then* maybe a
> concurrent collection will be performed, but maybe not, depending on a
> couple other factors...

That is the current state of affairs.

> Can we make this simpler and more intuitive/predictable?  I realize
> that's what you're addressing now, but can't we go a lot farther?
>
> Can we, say, get rid of groupingByConcurrent and just assume that if the
> stream is parallel?  What do we lose?  Do we lose any functionality that
> can't be derived another way?

That would cause us to access a non-thread-safe HashMap concurrently 
from multiple threads.


From brian.goetz at oracle.com  Mon Apr  8 13:05:45 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 08 Apr 2013 16:05:45 -0400
Subject: Whither FlatMapper?
In-Reply-To: <51631A39.30001@cs.oswego.edu>
References: <5161F773.6050705@oracle.com>
	<CAMUF1SmgMq67e3kksT6QcDws1yitNNGSZLVDXd9Wu6tSFSKFsw@mail.gmail.com>
	<51631A39.30001@cs.oswego.edu>
Message-ID: <51632319.4040704@oracle.com>

A slight correction: if we remove the flatMap(FlatMapper), there is no 
fluent form that is as efficient as the removed form that accepts (T, 
Consumer<T>), since there's no other way to get your hands on the 
downstream Sink.  (Not that this dampens my enthusiasm for removing it 
much.)

For the truly diffident, a middle ground does exist: remove FlatMapper 
and its six brothers as a named SAM, and replace it with BiConsumer<T, 
Consumer<T>>, leaving both forms of flatMap methods in place:
   flatMap(Function<T,STream<U>>)
   flapMap(BiConsumer<T, Consumer<U>>)

The main advantage being that the package javadoc is not polluted by 
seven forms of FlatMapper.

On 4/8/2013 3:27 PM, Doug Lea wrote:
> On 04/07/13 19:01, Sam Pullara wrote:
>> I'm a big fan of the current FlatMapper stuff that takes a Consumer.
>> Much more
>> efficient and straightforward when you don't have a stream or
>> collection to just
>> return. Here is some code that uses 3 of them for good effect:
>
> I think the main issue is whether, given the user reactions so far, we
> should insist on people using a generally better but non-obvious
> approach to flat-mapping. Considering that anyone *could* write their own
> FlatMappers layered on top of existing functionality (we could
> even show how to do it as a code example somewhere), I'm with
> Brian on this: give people the obvious forms in the API. People
> who are most likely to use it are the least likely to be obsessive
> about its performance. And when they are, they can learn about
> alternatives.
>
> -Doug
>

From joe.bowbeer at gmail.com  Mon Apr  8 13:07:40 2013
From: joe.bowbeer at gmail.com (Joe Bowbeer)
Date: Mon, 8 Apr 2013 13:07:40 -0700
Subject: Setting of UNORDERED on concurrent collectors
In-Reply-To: <51631F98.8030304@oracle.com>
References: <516315C1.3080509@oracle.com>
	<CAHzJPEqcZup9RBXjjW0HZRf6jSEh2zDerFXqyyD5W8M_KeE4DA@mail.gmail.com>
	<51631F98.8030304@oracle.com>
Message-ID: <CAHzJPEo5WMZqKwoLTQ4Q+3GVkf7aEoRstTakcKQzKeb=gVFPWg@mail.gmail.com>

>
>
> That would cause us to access a non-thread-safe HashMap concurrently from
> multiple threads.


I assumed that a concurrent collection would use a concurrent map.  Isn't
 it reasonable to assume that operations on a parallel stream will use
thread-safe collections?

BTW, the other downside of the current state of affairs is experienced by
the user who specifies a parallel stream and even declares it unordered,
but still gets a non-concurrent collection because groupingBy was used
instead of groupingByConcurrent.

In your examples, the difference between the two results is primarily one
of order, not concurrency.  Can we reflect this choice more directly in the
API?

Joe


On Mon, Apr 8, 2013 at 12:50 PM, Brian Goetz <brian.goetz at oracle.com> wrote:

> What is groupingByConcurrent good for?  What's the difference between
>> parallel and concurrent in this context?
>>
>
> For sequential streams, concurrent is irrelevant.  So this is only
> relevant for parallel streams.
>
> When doing a reduction on a parallel stream, there are two obvious ways to
> do it:
>
>  - Partition the input, reduce the chunks separately into isolated
> subresults, then combine the subresults "up the tree" into a complete
> result (call this a "traditional" parallel reduction)
>
>  - Use some sort of thread-safe combiner, and blast input elements at some
> shared combiner from all threads (call this a "concurrent" reduction.)
>  This is more like a forEach than a reduce.  Requirements for this to be
> safe include: the combiner must be thread-safe, and the user must not care
> about order, since there's no telling in what order the elements will be
> blasted.
>
> When the reduction is a groupBy into a Map, this can make a big difference
> because of the merging performance of HashMap.
>
> The traditional reduction looks like this:
>  - create a HashMap per partition
>  - Insert the elements of this partition into the HashMap
>  - Go up the tree, merging two HashMaps into one.  This involves iterating
> a key-by-key merge.  This is slow.
>
> The concurrent reduction looks like:
>  - create one ConcurrentHashMap
>  - Blast elements into it using atomic methods like putIfAbsent
>  - Return that, no merging
>
> In most reasonable cases, concurrent parallel reduction with CHM blows
> away traditional parallel reduction with HashMap.  On the other hand, one
> of the casualties of the concurrent approach is ordering.
>
> If your input is (ordered):
>
>   [ 1, 2, 3, 4, 5, 6, 7, 8 ]
>
> and your classifier function is:
>
>   e % 2
>
> then the traditional approach must yield:
>  { 0 => [ 2, 4, 6, 8 ],
>    1 => [ 1, 3, 5, 7] }
>
> but the concurrent approach could yield:
>
>  { 0 => [ 6, 2, 4, 8 ],
>    1 => [ 7, 1, 3, 5 ] }
>
> So the question is, when confronted with an obvious desire to use a
> concurrent-safe collector, do we infer that the user must not care about
> ordering?
>
>
>  The most succinct indication of its function is:
>>
>>     The collect(Collector) method currently performs a
>>     concurrent collection when all of the following are true:
>>       - the stream is parallel
>>       - the collector is *concurrent*
>>       - the collector is unordered OR the stream is unordered
>>
>
> This is the current rule about whether or not collect() does a concurrent
> reduction.  My question here is whether we wish to make our existing
> concurrent collectors always be unordered, so that the last bullet is
> trivially satisfied for the built-in concurrent collectors.
>
>
>  In other words, *if* I happen to use groupingByConcurrent *then* maybe a
>> concurrent collection will be performed, but maybe not, depending on a
>> couple other factors...
>>
>
> That is the current state of affairs.
>
>
>  Can we make this simpler and more intuitive/predictable?  I realize
>> that's what you're addressing now, but can't we go a lot farther?
>>
>> Can we, say, get rid of groupingByConcurrent and just assume that if the
>> stream is parallel?  What do we lose?  Do we lose any functionality that
>> can't be derived another way?
>>
>
> That would cause us to access a non-thread-safe HashMap concurrently from
> multiple threads.
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20130408/71807564/attachment-0001.html 

From brian.goetz at oracle.com  Mon Apr  8 13:19:16 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 08 Apr 2013 16:19:16 -0400
Subject: Setting of UNORDERED on concurrent collectors
In-Reply-To: <CAHzJPEo5WMZqKwoLTQ4Q+3GVkf7aEoRstTakcKQzKeb=gVFPWg@mail.gmail.com>
References: <516315C1.3080509@oracle.com>
	<CAHzJPEqcZup9RBXjjW0HZRf6jSEh2zDerFXqyyD5W8M_KeE4DA@mail.gmail.com>
	<51631F98.8030304@oracle.com>
	<CAHzJPEo5WMZqKwoLTQ4Q+3GVkf7aEoRstTakcKQzKeb=gVFPWg@mail.gmail.com>
Message-ID: <51632644.2020502@oracle.com>

> I assumed that a concurrent collection would use a concurrent map.
>   Isn't  it reasonable to assume that operations on a parallel stream
> will use thread-safe collections?

ABSOLUTELY NOT!

Any non-thread-safe collection can be used as a source for a parallel 
stream, without any more synchronization than is already implicit in the 
FJ library.  (Some may partition better than others, though; linked 
lists are never going to be parallel screamers.)

Similarly, any reduction can be done in parallel into a non-thread-safe 
collection.  Many of our collectors use non-thread-safe result 
containers like ArrayList, StringBuilder, or HashMap but are still 
perfectly parallel-safe.  The library provides the necessary isolation, 
so that these non-thread-safe containers are serially thread-confined 
and still we can get decent parallelism.

The only thing the user has to be careful of in order to not undermine 
this wonderful gift is to avoid interference.  Interference includes 
things like:
  - Modifying the source while you're doing a stream operation on it.
  - Using "lambdas" that depend on state that might be modified during 
the course of the stream operation.

In other words, as long as you can hold relevant state constant for the 
duration of your query, you get all this parallelism for free without 
having to think about thread safety or use thread-safe collections. 
Effective immutability is a very powerful thing.

> BTW, the other downside of the current state of affairs is experienced
> by the user who specifies a parallel stream and even declares it
> unordered, but still gets a non-concurrent collection because groupingBy
> was used instead of groupingByConcurrent.

Right.  But he will still get a parallel reduction.  It just may be that 
in some cases, he gets a reduction that parallelizes poorly, because the 
combine step of the reduction happens to be way more expensive that the 
accumulate step, as it is when the combine step is a merge-maps-by-key. 
  (We have no way of knowing this a priori.  Some non-concurrent 
reductions will parallelize with fine performance and have no need of 
the additional benefit that a concurrent collection gives.)

> In your examples, the difference between the two results is primarily
> one of order, not concurrency.  Can we reflect this choice more directly
> in the API?

We used to have that -- the selection of ordering (collect vs 
collectUnordered) was orthogonal to the collector, and we did a 
concurrent collection if we were in the (unordered, concurrent) 
quadrant.  That's the most explicit.


From joe.bowbeer at gmail.com  Mon Apr  8 13:41:49 2013
From: joe.bowbeer at gmail.com (Joe Bowbeer)
Date: Mon, 8 Apr 2013 13:41:49 -0700
Subject: Setting of UNORDERED on concurrent collectors
In-Reply-To: <51632644.2020502@oracle.com>
References: <516315C1.3080509@oracle.com>
	<CAHzJPEqcZup9RBXjjW0HZRf6jSEh2zDerFXqyyD5W8M_KeE4DA@mail.gmail.com>
	<51631F98.8030304@oracle.com>
	<CAHzJPEo5WMZqKwoLTQ4Q+3GVkf7aEoRstTakcKQzKeb=gVFPWg@mail.gmail.com>
	<51632644.2020502@oracle.com>
Message-ID: <CAHzJPEquc3DBmnSx8RA_Ntc42-UUXbXz7+Q8dC3VSn_MJZRMfQ@mail.gmail.com>

>
> In other words, as long as you can hold relevant state constant for the
> duration of your query, you get all this parallelism for free without
> having to think about thread safety or use thread-safe collections.


I'm using the forms of collect that hide the collections completely (except
as a return type).  I was only thinking about the order vs unorder and
parallel vs sequential aspects -- and I'd prefer to keep it that way.  So,
for example:

collect(unordered+parallel) should perform a concurrent collection?

(But you've already indicated that yes I do, in addition, need to think
about the collection type in this case even if I don't handle the
construction, right?)

Whereas your question is:

collectConcurrent(ordered+parallel) should disregard order?

I'm OK with this, but I wish groupingByConcurrent could go away.

--Joe


On Mon, Apr 8, 2013 at 1:19 PM, Brian Goetz <brian.goetz at oracle.com> wrote:

> I assumed that a concurrent collection would use a concurrent map.
>>   Isn't  it reasonable to assume that operations on a parallel stream
>> will use thread-safe collections?
>>
>
> ABSOLUTELY NOT!
>
> Any non-thread-safe collection can be used as a source for a parallel
> stream, without any more synchronization than is already implicit in the FJ
> library.  (Some may partition better than others, though; linked lists are
> never going to be parallel screamers.)
>
> Similarly, any reduction can be done in parallel into a non-thread-safe
> collection.  Many of our collectors use non-thread-safe result containers
> like ArrayList, StringBuilder, or HashMap but are still perfectly
> parallel-safe.  The library provides the necessary isolation, so that these
> non-thread-safe containers are serially thread-confined and still we can
> get decent parallelism.
>
> The only thing the user has to be careful of in order to not undermine
> this wonderful gift is to avoid interference.  Interference includes things
> like:
>  - Modifying the source while you're doing a stream operation on it.
>  - Using "lambdas" that depend on state that might be modified during the
> course of the stream operation.
>
> In other words, as long as you can hold relevant state constant for the
> duration of your query, you get all this parallelism for free without
> having to think about thread safety or use thread-safe collections.
> Effective immutability is a very powerful thing.
>
>
>  BTW, the other downside of the current state of affairs is experienced
>> by the user who specifies a parallel stream and even declares it
>> unordered, but still gets a non-concurrent collection because groupingBy
>> was used instead of groupingByConcurrent.
>>
>
> Right.  But he will still get a parallel reduction.  It just may be that
> in some cases, he gets a reduction that parallelizes poorly, because the
> combine step of the reduction happens to be way more expensive that the
> accumulate step, as it is when the combine step is a merge-maps-by-key.
>  (We have no way of knowing this a priori.  Some non-concurrent reductions
> will parallelize with fine performance and have no need of the additional
> benefit that a concurrent collection gives.)
>
>
>  In your examples, the difference between the two results is primarily
>> one of order, not concurrency.  Can we reflect this choice more directly
>> in the API?
>>
>
> We used to have that -- the selection of ordering (collect vs
> collectUnordered) was orthogonal to the collector, and we did a concurrent
> collection if we were in the (unordered, concurrent) quadrant.  That's the
> most explicit.
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20130408/dec3c3bc/attachment.html 

From brian.goetz at oracle.com  Mon Apr  8 13:46:00 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 08 Apr 2013 16:46:00 -0400
Subject: Setting of UNORDERED on concurrent collectors
In-Reply-To: <CAHzJPEquc3DBmnSx8RA_Ntc42-UUXbXz7+Q8dC3VSn_MJZRMfQ@mail.gmail.com>
References: <516315C1.3080509@oracle.com>
	<CAHzJPEqcZup9RBXjjW0HZRf6jSEh2zDerFXqyyD5W8M_KeE4DA@mail.gmail.com>
	<51631F98.8030304@oracle.com>
	<CAHzJPEo5WMZqKwoLTQ4Q+3GVkf7aEoRstTakcKQzKeb=gVFPWg@mail.gmail.com>
	<51632644.2020502@oracle.com>
	<CAHzJPEquc3DBmnSx8RA_Ntc42-UUXbXz7+Q8dC3VSn_MJZRMfQ@mail.gmail.com>
Message-ID: <51632C88.4090309@oracle.com>

> I'm using the forms of collect that hide the collections completely
> (except as a return type).  I was only thinking about the order vs
> unorder and parallel vs sequential aspects -- and I'd prefer to keep it
> that way.  So, for example:
>
> collect(unordered+parallel) should perform a concurrent collection?

Most of the collectors hide the return type, but expose the 
concurrent-ness of the return type in their name.

Earlier, we had a separate bag (ConcurrentCollectors) for concurrent 
collectors.  I disliked this because, with the obvious static imports, 
the user couldn't tell whether

   parStream.collect(groupingBy(f))

would be a concurrent (unordered) reduction or a traditional (ordered) 
one.

> (But you've already indicated that yes I do, in addition, need to think
> about the collection type in this case even if I don't handle the
> construction, right?)

Not the specific collection type.  You do need to reason about shape 
(List vs Map), and you need to reason about concurrent vs not (HashMap 
vs CHM), but not necessarily about List vs Set.

> Whereas your question is:
>
> collectConcurrent(ordered+parallel) should disregard order?

More whether:

   Collectors.groupingByConcurrent(f)

should declare itself to be an unordered Collector, just as toSet() is.


From dl at cs.oswego.edu  Mon Apr  8 13:57:34 2013
From: dl at cs.oswego.edu (Doug Lea)
Date: Mon, 08 Apr 2013 16:57:34 -0400
Subject: Setting of UNORDERED on concurrent collectors
In-Reply-To: <CAHzJPEquc3DBmnSx8RA_Ntc42-UUXbXz7+Q8dC3VSn_MJZRMfQ@mail.gmail.com>
References: <516315C1.3080509@oracle.com>
	<CAHzJPEqcZup9RBXjjW0HZRf6jSEh2zDerFXqyyD5W8M_KeE4DA@mail.gmail.com>
	<51631F98.8030304@oracle.com>
	<CAHzJPEo5WMZqKwoLTQ4Q+3GVkf7aEoRstTakcKQzKeb=gVFPWg@mail.gmail.com>
	<51632644.2020502@oracle.com>
	<CAHzJPEquc3DBmnSx8RA_Ntc42-UUXbXz7+Q8dC3VSn_MJZRMfQ@mail.gmail.com>
Message-ID: <51632F3E.70505@cs.oswego.edu>

On 04/08/13 16:41, Joe Bowbeer wrote:

> I'm OK with this, but I wish groupingByConcurrent could go away.
>

These were the kinds of thoughts that led me last fall to suggest
that we just tell people to do it themselves as a little idiom:
   chm = ...
   c.parallelStream().forEach( chm.merge(x->keyFor(x), x, mergefn); }

The main downside is that this, the most commonly recommended
way of doing parallel groupBy, would not be in the family of
collect methods. Still maybe worth reconsidering though.

-Doug


From brian.goetz at oracle.com  Mon Apr  8 14:09:44 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 08 Apr 2013 17:09:44 -0400
Subject: Setting of UNORDERED on concurrent collectors
In-Reply-To: <51632F3E.70505@cs.oswego.edu>
References: <516315C1.3080509@oracle.com>
	<CAHzJPEqcZup9RBXjjW0HZRf6jSEh2zDerFXqyyD5W8M_KeE4DA@mail.gmail.com>
	<51631F98.8030304@oracle.com>
	<CAHzJPEo5WMZqKwoLTQ4Q+3GVkf7aEoRstTakcKQzKeb=gVFPWg@mail.gmail.com>
	<51632644.2020502@oracle.com>
	<CAHzJPEquc3DBmnSx8RA_Ntc42-UUXbXz7+Q8dC3VSn_MJZRMfQ@mail.gmail.com>
	<51632F3E.70505@cs.oswego.edu>
Message-ID: <51633218.6070906@oracle.com>

That option is always available regardless of what we do with Collectors.

Remember, where Collector really shines is not the simple things like 
this, but composite collections, like

   Map<Buyer, Map<Seller, Transaction>>
     biggestTransactionByBuyerSeller =
       stream.collect(groupingBy(Txn::buyer,
                                 groupingBy(Txn::seller,
                                            maxBy(comparing(Txn::amount))

The groupBy combinator lets you compose complex collections out of 
building blocks.  These would have to be manually inlined with the 
explicit parallel forEach version.


On 4/8/2013 4:57 PM, Doug Lea wrote:
> On 04/08/13 16:41, Joe Bowbeer wrote:
>
>> I'm OK with this, but I wish groupingByConcurrent could go away.
>>
>
> These were the kinds of thoughts that led me last fall to suggest
> that we just tell people to do it themselves as a little idiom:
>    chm = ...
>    c.parallelStream().forEach( chm.merge(x->keyFor(x), x, mergefn); }
>
> The main downside is that this, the most commonly recommended
> way of doing parallel groupBy, would not be in the family of
> collect methods. Still maybe worth reconsidering though.
>
> -Doug
>

From spullara at gmail.com  Mon Apr  8 14:40:32 2013
From: spullara at gmail.com (Sam Pullara)
Date: Mon, 8 Apr 2013 14:40:32 -0700
Subject: Whither FlatMapper?
In-Reply-To: <51632319.4040704@oracle.com>
References: <5161F773.6050705@oracle.com>
	<CAMUF1SmgMq67e3kksT6QcDws1yitNNGSZLVDXd9Wu6tSFSKFsw@mail.gmail.com>
	<51631A39.30001@cs.oswego.edu> <51632319.4040704@oracle.com>
Message-ID: <61E16080-A6C3-4B76-A42B-9D5E84A4D133@gmail.com>

I like this plan. I'd hate to lose the lower level API.

Sam

On Apr 8, 2013, at 1:05 PM, Brian Goetz <brian.goetz at oracle.com> wrote:

> A slight correction: if we remove the flatMap(FlatMapper), there is no fluent form that is as efficient as the removed form that accepts (T, Consumer<T>), since there's no other way to get your hands on the downstream Sink.  (Not that this dampens my enthusiasm for removing it much.)
> 
> For the truly diffident, a middle ground does exist: remove FlatMapper and its six brothers as a named SAM, and replace it with BiConsumer<T, Consumer<T>>, leaving both forms of flatMap methods in place:
>  flatMap(Function<T,STream<U>>)
>  flapMap(BiConsumer<T, Consumer<U>>)
> 
> The main advantage being that the package javadoc is not polluted by seven forms of FlatMapper.
> 
> On 4/8/2013 3:27 PM, Doug Lea wrote:
>> On 04/07/13 19:01, Sam Pullara wrote:
>>> I'm a big fan of the current FlatMapper stuff that takes a Consumer.
>>> Much more
>>> efficient and straightforward when you don't have a stream or
>>> collection to just
>>> return. Here is some code that uses 3 of them for good effect:
>> 
>> I think the main issue is whether, given the user reactions so far, we
>> should insist on people using a generally better but non-obvious
>> approach to flat-mapping. Considering that anyone *could* write their own
>> FlatMappers layered on top of existing functionality (we could
>> even show how to do it as a code example somewhere), I'm with
>> Brian on this: give people the obvious forms in the API. People
>> who are most likely to use it are the least likely to be obsessive
>> about its performance. And when they are, they can learn about
>> alternatives.
>> 
>> -Doug
>> 


From joe.bowbeer at gmail.com  Mon Apr  8 14:47:30 2013
From: joe.bowbeer at gmail.com (Joe Bowbeer)
Date: Mon, 8 Apr 2013 14:47:30 -0700
Subject: Setting of UNORDERED on concurrent collectors
In-Reply-To: <51633218.6070906@oracle.com>
References: <516315C1.3080509@oracle.com>
	<CAHzJPEqcZup9RBXjjW0HZRf6jSEh2zDerFXqyyD5W8M_KeE4DA@mail.gmail.com>
	<51631F98.8030304@oracle.com>
	<CAHzJPEo5WMZqKwoLTQ4Q+3GVkf7aEoRstTakcKQzKeb=gVFPWg@mail.gmail.com>
	<51632644.2020502@oracle.com>
	<CAHzJPEquc3DBmnSx8RA_Ntc42-UUXbXz7+Q8dC3VSn_MJZRMfQ@mail.gmail.com>
	<51632F3E.70505@cs.oswego.edu> <51633218.6070906@oracle.com>
Message-ID: <CAHzJPEp5YtRidAiLK6z2ndNbjnA6FHPixRM1xVx-DBiyi2GsAA@mail.gmail.com>

On Mon, Apr 8, 2013 at 2:09 PM, Brian Goetz <brian.goetz at oracle.com> wrote:

> That option is always available regardless of what we do with Collectors.
>
> Remember, where Collector really shines is not the simple things like
> this, but composite collections, like
>
>   Map<Buyer, Map<Seller, Transaction>>
>     biggestTransactionByBuyerSelle**r =
>       stream.collect(groupingBy(Txn:**:buyer,
>                                 groupingBy(Txn::seller,
>                                            maxBy(comparing(Txn::amount))
>
>
This is where groupingBy really shines :)

But how is someone supposed to decide if any or some or all of these
groupingBy's should really be groupingByConcurrent's?

If we eliminated groupingByConcurrent in favor of a more explicit form in
those cases, would that ruin the shine?

--Joe


> The groupBy combinator lets you compose complex collections out of
> building blocks.  These would have to be manually inlined with the explicit
> parallel forEach version.
>
>
>
> On 4/8/2013 4:57 PM, Doug Lea wrote:
>
>> On 04/08/13 16:41, Joe Bowbeer wrote:
>>
>>  I'm OK with this, but I wish groupingByConcurrent could go away.
>>>
>>>
>> These were the kinds of thoughts that led me last fall to suggest
>> that we just tell people to do it themselves as a little idiom:
>>    chm = ...
>>    c.parallelStream().forEach( chm.merge(x->keyFor(x), x, mergefn); }
>>
>> The main downside is that this, the most commonly recommended
>> way of doing parallel groupBy, would not be in the family of
>> collect methods. Still maybe worth reconsidering though.
>>
>> -Doug
>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20130408/8a5f88a3/attachment.html 

From brian.goetz at oracle.com  Mon Apr  8 14:54:33 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 08 Apr 2013 17:54:33 -0400
Subject: Setting of UNORDERED on concurrent collectors
In-Reply-To: <CAHzJPEp5YtRidAiLK6z2ndNbjnA6FHPixRM1xVx-DBiyi2GsAA@mail.gmail.com>
References: <516315C1.3080509@oracle.com>
	<CAHzJPEqcZup9RBXjjW0HZRf6jSEh2zDerFXqyyD5W8M_KeE4DA@mail.gmail.com>
	<51631F98.8030304@oracle.com>
	<CAHzJPEo5WMZqKwoLTQ4Q+3GVkf7aEoRstTakcKQzKeb=gVFPWg@mail.gmail.com>
	<51632644.2020502@oracle.com>
	<CAHzJPEquc3DBmnSx8RA_Ntc42-UUXbXz7+Q8dC3VSn_MJZRMfQ@mail.gmail.com>
	<51632F3E.70505@cs.oswego.edu> <51633218.6070906@oracle.com>
	<CAHzJPEp5YtRidAiLK6z2ndNbjnA6FHPixRM1xVx-DBiyi2GsAA@mail.gmail.com>
Message-ID: <51633C99.3030100@oracle.com>


>     Remember, where Collector really shines is not the simple things
>     like this, but composite collections, like
>
>        Map<Buyer, Map<Seller, Transaction>>
>          biggestTransactionByBuyerSelle__r =
>            stream.collect(groupingBy(Txn:__:buyer,
>                                      groupingBy(Txn::seller,
>
>       maxBy(comparing(Txn::amount))
>
> This is where groupingBy really shines :)
>
> But how is someone supposed to decide if any or some or all of these
> groupingBy's should really be groupingByConcurrent's?

Basically, if they care more about performance than ordering.  But 
groupingByConcurrent can do all the same cool composed collections that 
groupingBy can do.

> If we eliminated groupingByConcurrent in favor of a more explicit form
> in those cases, would that ruin the shine?

I think that would be silly; instead of choosing between "fast" and 
"ordered", you would have to choose between "fast" and "ordered and 
powerful and flexible."  Given that they can have powerful and flexible 
if they're willing to give up ordered, why would we do that?


From forax at univ-mlv.fr  Mon Apr  8 15:09:18 2013
From: forax at univ-mlv.fr (Remi Forax)
Date: Tue, 09 Apr 2013 00:09:18 +0200
Subject: Whither FlatMapper?
In-Reply-To: <51632319.4040704@oracle.com>
References: <5161F773.6050705@oracle.com>
	<CAMUF1SmgMq67e3kksT6QcDws1yitNNGSZLVDXd9Wu6tSFSKFsw@mail.gmail.com>
	<51631A39.30001@cs.oswego.edu> <51632319.4040704@oracle.com>
Message-ID: <5163400E.70002@univ-mlv.fr>

On 04/08/2013 10:05 PM, Brian Goetz wrote:
> A slight correction: if we remove the flatMap(FlatMapper), there is no 
> fluent form that is as efficient as the removed form that accepts (T, 
> Consumer<T>), since there's no other way to get your hands on the 
> downstream Sink.  (Not that this dampens my enthusiasm for removing it 
> much.)
>
> For the truly diffident, a middle ground does exist: remove FlatMapper 
> and its six brothers as a named SAM, and replace it with BiConsumer<T, 
> Consumer<T>>, leaving both forms of flatMap methods in place:
>   flatMap(Function<T,STream<U>>)
>   flapMap(BiConsumer<T, Consumer<U>>)
>

me trying to understand ...
we don't have more forms due to the primitive specialization ?

> The main advantage being that the package javadoc is not polluted by 
> seven forms of FlatMapper.

R?mi

>
> On 4/8/2013 3:27 PM, Doug Lea wrote:
>> On 04/07/13 19:01, Sam Pullara wrote:
>>> I'm a big fan of the current FlatMapper stuff that takes a Consumer.
>>> Much more
>>> efficient and straightforward when you don't have a stream or
>>> collection to just
>>> return. Here is some code that uses 3 of them for good effect:
>>
>> I think the main issue is whether, given the user reactions so far, we
>> should insist on people using a generally better but non-obvious
>> approach to flat-mapping. Considering that anyone *could* write their 
>> own
>> FlatMappers layered on top of existing functionality (we could
>> even show how to do it as a code example somewhere), I'm with
>> Brian on this: give people the obvious forms in the API. People
>> who are most likely to use it are the least likely to be obsessive
>> about its performance. And when they are, they can learn about
>> alternatives.
>>
>> -Doug
>>


From brian.goetz at oracle.com  Mon Apr  8 15:33:52 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 08 Apr 2013 18:33:52 -0400
Subject: Whither FlatMapper?
In-Reply-To: <5163400E.70002@univ-mlv.fr>
References: <5161F773.6050705@oracle.com>
	<CAMUF1SmgMq67e3kksT6QcDws1yitNNGSZLVDXd9Wu6tSFSKFsw@mail.gmail.com>
	<51631A39.30001@cs.oswego.edu> <51632319.4040704@oracle.com>
	<5163400E.70002@univ-mlv.fr>
Message-ID: <516345D0.3080103@oracle.com>

OK, let me be more explicit.

We currently have:

Stream:
     <R> Stream<R> flatMap(Function<? super T, ? extends Stream<? 
extends R>> mapper);
     <R> Stream<R> flatMap(FlatMapper<? super T, R> mapper);
     IntStream flatMapToInt(FlatMapper.ToInt<? super T> mapper);
     LongStream flatMapToLong(FlatMapper.ToLong<? super T> mapper);
     DoubleStream flatMapToDouble(FlatMapper.ToDouble<? super T> mapper);

Plus two forms in each of {Int,Long,Double}Stream:

     DoubleStream flatMap(DoubleFunction<? extends DoubleStream> mapper);
     DoubleStream flatMap(FlatMapper.OfDoubleToDouble mapper);

Plus seven variants of FlatMapper:
     FlatMapper
     FlatMapper.Of{Int,Long,Double}
     FlatMapper.OfXToX for X={Int,Long,Double}

The proposal was to:
  - Keep the first form under Stream
  - Keep the first form under each of {Int,Long,Double}Stream
  - Remove the other forms
  - Remove all FlatMapper SAM variants
  - Add back 3 new Obj-to-int specializations to Stream:
     <R> Stream<R> flatMapToXxx(Function<T, XxxStream> mapper);

Then *all* the flatMap forms would take some form of element -> Stream 
function.

The motivation is: no one can understand the (element, Consumer) 
versions of these, and, even when explained, most people can't 
understand why they would ever not use the T->Stream<U> form, and the 
(element, Consumer) forms generate a lot of API surface area (including 
7 classes in java.util.stream).

The downside is that the T->STream<U> form *is* intrinsically slower, 
though we've made pretty big progress lately on stream startup cost and 
anticipate making more.

The objection to the proposal, coming from a few advanced users, is: 
"but, now that I *finally* figured out how the (element, Consumer) 
versions work, I realize they're faster, so I don't want to give them 
up."  (Note that we can still always add them later.)

The fallback position is to keep the methods as is, but drop the 
FlatMapper name, and instead fall back to BiConsumer<Element, 
Consumer<Element>>.  Frankly, I think that makes the advanced forms even 
harder to understand.

I still like the original proposal.


On 4/8/2013 6:09 PM, Remi Forax wrote:
> On 04/08/2013 10:05 PM, Brian Goetz wrote:
>> A slight correction: if we remove the flatMap(FlatMapper), there is no
>> fluent form that is as efficient as the removed form that accepts (T,
>> Consumer<T>), since there's no other way to get your hands on the
>> downstream Sink.  (Not that this dampens my enthusiasm for removing it
>> much.)
>>
>> For the truly diffident, a middle ground does exist: remove FlatMapper
>> and its six brothers as a named SAM, and replace it with BiConsumer<T,
>> Consumer<T>>, leaving both forms of flatMap methods in place:
>>   flatMap(Function<T,STream<U>>)
>>   flapMap(BiConsumer<T, Consumer<U>>)
>>
>
> me trying to understand ...
> we don't have more forms due to the primitive specialization ?
>
>> The main advantage being that the package javadoc is not polluted by
>> seven forms of FlatMapper.
>
> R?mi
>
>>
>> On 4/8/2013 3:27 PM, Doug Lea wrote:
>>> On 04/07/13 19:01, Sam Pullara wrote:
>>>> I'm a big fan of the current FlatMapper stuff that takes a Consumer.
>>>> Much more
>>>> efficient and straightforward when you don't have a stream or
>>>> collection to just
>>>> return. Here is some code that uses 3 of them for good effect:
>>>
>>> I think the main issue is whether, given the user reactions so far, we
>>> should insist on people using a generally better but non-obvious
>>> approach to flat-mapping. Considering that anyone *could* write their
>>> own
>>> FlatMappers layered on top of existing functionality (we could
>>> even show how to do it as a code example somewhere), I'm with
>>> Brian on this: give people the obvious forms in the API. People
>>> who are most likely to use it are the least likely to be obsessive
>>> about its performance. And when they are, they can learn about
>>> alternatives.
>>>
>>> -Doug
>>>
>

From brian.goetz at oracle.com  Mon Apr  8 16:02:02 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 08 Apr 2013 19:02:02 -0400
Subject: Whither FlatMapper?
In-Reply-To: <51632319.4040704@oracle.com>
References: <5161F773.6050705@oracle.com>
	<CAMUF1SmgMq67e3kksT6QcDws1yitNNGSZLVDXd9Wu6tSFSKFsw@mail.gmail.com>
	<51631A39.30001@cs.oswego.edu> <51632319.4040704@oracle.com>
Message-ID: <51634C6A.1080301@oracle.com>

Actually, there is an allocation-free path to get almost the 
Consumer-version performance with the non-consumer version, using the 
proposed StreamBuilder type (that also implements Spliterator and 
Stream, so "building" is allocation-free), and stuffing that into a 
ThreadLocal:

   ThreadLocal<StreamBuilder> tl = ...

   ...

   stream.flatMap(e -> {
       StreamBuilder sb = tl.get();
       sb.init();
       // stuff elements into sb
       return sb.build();  // basically a no-op
   });

So I recant my earlier statement that there's no efficient way to 
simulate the consumer form.  Its just ugly.

And the above can be captured by a wrapping helper:

   Function<T, Stream<U>> = wrapWithThreadLocalStreamBuilder(
      (T t, Consumer<U> target) -> { /* old way */ });

So, I'm even more firmly in the "remove it" camp.

On 4/8/2013 4:05 PM, Brian Goetz wrote:
> A slight correction: if we remove the flatMap(FlatMapper), there is no
> fluent form that is as efficient as the removed form that accepts (T,
> Consumer<T>), since there's no other way to get your hands on the
> downstream Sink.  (Not that this dampens my enthusiasm for removing it
> much.)
>
> For the truly diffident, a middle ground does exist: remove FlatMapper
> and its six brothers as a named SAM, and replace it with BiConsumer<T,
> Consumer<T>>, leaving both forms of flatMap methods in place:
>    flatMap(Function<T,STream<U>>)
>    flapMap(BiConsumer<T, Consumer<U>>)
>
> The main advantage being that the package javadoc is not polluted by
> seven forms of FlatMapper.
>
> On 4/8/2013 3:27 PM, Doug Lea wrote:
>> On 04/07/13 19:01, Sam Pullara wrote:
>>> I'm a big fan of the current FlatMapper stuff that takes a Consumer.
>>> Much more
>>> efficient and straightforward when you don't have a stream or
>>> collection to just
>>> return. Here is some code that uses 3 of them for good effect:
>>
>> I think the main issue is whether, given the user reactions so far, we
>> should insist on people using a generally better but non-obvious
>> approach to flat-mapping. Considering that anyone *could* write their own
>> FlatMappers layered on top of existing functionality (we could
>> even show how to do it as a code example somewhere), I'm with
>> Brian on this: give people the obvious forms in the API. People
>> who are most likely to use it are the least likely to be obsessive
>> about its performance. And when they are, they can learn about
>> alternatives.
>>
>> -Doug
>>

From spullara at gmail.com  Mon Apr  8 16:14:34 2013
From: spullara at gmail.com (Sam Pullara)
Date: Mon, 8 Apr 2013 16:14:34 -0700
Subject: Whither FlatMapper?
In-Reply-To: <51634C6A.1080301@oracle.com>
References: <5161F773.6050705@oracle.com>
	<CAMUF1SmgMq67e3kksT6QcDws1yitNNGSZLVDXd9Wu6tSFSKFsw@mail.gmail.com>
	<51631A39.30001@cs.oswego.edu> <51632319.4040704@oracle.com>
	<51634C6A.1080301@oracle.com>
Message-ID: <CB902830-3601-473E-BBDC-4C3088A7121F@gmail.com>

That seems reasonable to me.

Sam

On Apr 8, 2013, at 4:02 PM, Brian Goetz <brian.goetz at oracle.com> wrote:

> Actually, there is an allocation-free path to get almost the Consumer-version performance with the non-consumer version, using the proposed StreamBuilder type (that also implements Spliterator and Stream, so "building" is allocation-free), and stuffing that into a ThreadLocal:
> 
>  ThreadLocal<StreamBuilder> tl = ...
> 
>  ...
> 
>  stream.flatMap(e -> {
>      StreamBuilder sb = tl.get();
>      sb.init();
>      // stuff elements into sb
>      return sb.build();  // basically a no-op
>  });
> 
> So I recant my earlier statement that there's no efficient way to simulate the consumer form.  Its just ugly.
> 
> And the above can be captured by a wrapping helper:
> 
>  Function<T, Stream<U>> = wrapWithThreadLocalStreamBuilder(
>     (T t, Consumer<U> target) -> { /* old way */ });
> 
> So, I'm even more firmly in the "remove it" camp.
> 
> On 4/8/2013 4:05 PM, Brian Goetz wrote:
>> A slight correction: if we remove the flatMap(FlatMapper), there is no
>> fluent form that is as efficient as the removed form that accepts (T,
>> Consumer<T>), since there's no other way to get your hands on the
>> downstream Sink.  (Not that this dampens my enthusiasm for removing it
>> much.)
>> 
>> For the truly diffident, a middle ground does exist: remove FlatMapper
>> and its six brothers as a named SAM, and replace it with BiConsumer<T,
>> Consumer<T>>, leaving both forms of flatMap methods in place:
>>   flatMap(Function<T,STream<U>>)
>>   flapMap(BiConsumer<T, Consumer<U>>)
>> 
>> The main advantage being that the package javadoc is not polluted by
>> seven forms of FlatMapper.
>> 
>> On 4/8/2013 3:27 PM, Doug Lea wrote:
>>> On 04/07/13 19:01, Sam Pullara wrote:
>>>> I'm a big fan of the current FlatMapper stuff that takes a Consumer.
>>>> Much more
>>>> efficient and straightforward when you don't have a stream or
>>>> collection to just
>>>> return. Here is some code that uses 3 of them for good effect:
>>> 
>>> I think the main issue is whether, given the user reactions so far, we
>>> should insist on people using a generally better but non-obvious
>>> approach to flat-mapping. Considering that anyone *could* write their own
>>> FlatMappers layered on top of existing functionality (we could
>>> even show how to do it as a code example somewhere), I'm with
>>> Brian on this: give people the obvious forms in the API. People
>>> who are most likely to use it are the least likely to be obsessive
>>> about its performance. And when they are, they can learn about
>>> alternatives.
>>> 
>>> -Doug
>>> 


From brian.goetz at oracle.com  Mon Apr  8 17:08:02 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 08 Apr 2013 20:08:02 -0400
Subject: Convenience Collector forms
Message-ID: <51635BE2.1010909@oracle.com>

One of the feedback items from the recent London Lambda Hack Day was 
"more convenience forms for Collectors please!".  One suggested was 
"count()" (related are min/max/sum).  Another is a dedicated form for 
frequency counting.

The idea is that:
  - They are easier to read than their obvious reduce expansion; 
everyone understands count(), even if they don't understand reduce (this 
was an argument in favor of sum() and friends on IntStream).
  - They provide more on-ramp for understanding reduction and 
composition of reduction; the Javadoc for count() can explain itself in 
terms of reduction, and simple examples like this help connect the dots 
better.
  - They are more discoverable that some of the idioms they expand to 
(once someone discovers Collectors.)

The implementations are of course trivial.

So, on the block are:

  - Collector<T,Long> counting()
  - Collector<T,T>    minBy(Comparator)
  - Collector<T,T>    maxBy(Comparator)
  - Collector<T,Long> sumBy(Function<T, Long>)
  - Collector<T,Map<T,Long>> countingFrequency()
  - Collector<T,Map<K,Long>> countingFrequency(T -> K classifier)

Q: Other Collector names are all of the form either toXxx or xxxing, 
which read relatively english-like:

   collect(groupingBy(f))
   collect(toList())

The minBy, maxBy, and sumBy don't follow this form, though still don't 
read terribly.  Sum can easily be "summingBy" but "minningBy" sucks.  Is 
this naming OK?

Q: Do we need separate long and int versions for sumBy()?


From brian.goetz at oracle.com  Mon Apr  8 17:10:34 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 08 Apr 2013 20:10:34 -0400
Subject: Additional Collectors
In-Reply-To: <515C6684.8020007@oracle.com>
References: <515C6684.8020007@oracle.com>
Message-ID: <51635C7A.9000805@oracle.com>

And, still need to close on this one:

> People also expressed concern that the "toMap()" (nee mappedTo,
> joiningWith) is not flexible enough.  As a reminder, what toMap does is
> take a Stream<T> and a function T->U and produces a Map<T,U>.  Some
> people call this "backwards"; they would rather have something that
> takes a Stream<T> and function T->K and produces a Map<K,T>.  And others
> would rather have something that takes two functions T->K and T->U and
> produces a Map<K,U>.
>
> All of these are useful enough.  The question is how to fit them into
> the API.  I think the name "toMap" is a bit of a challenge, since there
> are several "modes" and not all of them can be easily handled by
> overloads.  Maybe:
>
>    toMap(T->U) // first version
>    toMap(T->K, T->U) // third version
>
> and leave the second version out, since the third version can easily
> simulate the second?

From joe.bowbeer at gmail.com  Mon Apr  8 19:32:37 2013
From: joe.bowbeer at gmail.com (Joe Bowbeer)
Date: Mon, 8 Apr 2013 19:32:37 -0700
Subject: Convenience Collector forms
In-Reply-To: <51635BE2.1010909@oracle.com>
References: <51635BE2.1010909@oracle.com>
Message-ID: <CAHzJPErt4aoDxZ=TdfebWZyvuBAAU3pG0SQpcx+GN1Tkbgx8ow@mail.gmail.com>

Q: Why is the method on the block called counting() instead of the proposed
count()?

Except for possibly count(), I'm not liking any of these, because:

1. There is already enough exposed "reduce" surface area in max/min/sum.

2. map/reduce is where it's at.  It's easier for me to read code that uses
those familiar forms than it is to familiarize myself with a bunch of new
convenience methods.

I don't think these new forms are going to make Collectors easier to learn,
or collectors code easier to read (except at a very superficial level).


On Mon, Apr 8, 2013 at 5:08 PM, Brian Goetz <brian.goetz at oracle.com> wrote:

> One of the feedback items from the recent London Lambda Hack Day was "more
> convenience forms for Collectors please!".  One suggested was "count()"
> (related are min/max/sum).  Another is a dedicated form for frequency
> counting.
>
> The idea is that:
>  - They are easier to read than their obvious reduce expansion; everyone
> understands count(), even if they don't understand reduce (this was an
> argument in favor of sum() and friends on IntStream).
>  - They provide more on-ramp for understanding reduction and composition
> of reduction; the Javadoc for count() can explain itself in terms of
> reduction, and simple examples like this help connect the dots better.
>  - They are more discoverable that some of the idioms they expand to (once
> someone discovers Collectors.)
>
> The implementations are of course trivial.
>
> So, on the block are:
>
>  - Collector<T,Long> counting()
>  - Collector<T,T>    minBy(Comparator)
>  - Collector<T,T>    maxBy(Comparator)
>  - Collector<T,Long> sumBy(Function<T, Long>)
>  - Collector<T,Map<T,Long>> countingFrequency()
>  - Collector<T,Map<K,Long>> countingFrequency(T -> K classifier)
>
> Q: Other Collector names are all of the form either toXxx or xxxing, which
> read relatively english-like:
>
>   collect(groupingBy(f))
>   collect(toList())
>
> The minBy, maxBy, and sumBy don't follow this form, though still don't
> read terribly.  Sum can easily be "summingBy" but "minningBy" sucks.  Is
> this naming OK?
>
> Q: Do we need separate long and int versions for sumBy()?
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20130408/ff7fa1e2/attachment.html 

From tim at peierls.net  Tue Apr  9 06:27:06 2013
From: tim at peierls.net (Tim Peierls)
Date: Tue, 9 Apr 2013 09:27:06 -0400
Subject: Convenience Collector forms
In-Reply-To: <CAHzJPErt4aoDxZ=TdfebWZyvuBAAU3pG0SQpcx+GN1Tkbgx8ow@mail.gmail.com>
References: <51635BE2.1010909@oracle.com>
	<CAHzJPErt4aoDxZ=TdfebWZyvuBAAU3pG0SQpcx+GN1Tkbgx8ow@mail.gmail.com>
Message-ID: <CA+F8eeThXuQ1C3NoO4fVH9ndyaSyoK6FoQ_34AskbXV8CEM8sg@mail.gmail.com>

On Mon, Apr 8, 2013 at 10:32 PM, Joe Bowbeer <joe.bowbeer at gmail.com> wrote:

> Q: Why is the method on the block called counting() instead of the
> proposed count()?
>

I like the adverbial form because it reads more like English.


Except for possibly count(), I'm not liking any of these, because:
>
> 1. There is already enough exposed "reduce" surface area in max/min/sum.
>
> 2. map/reduce is where it's at.  It's easier for me to read code that uses
> those familiar forms than it is to familiarize myself with a bunch of new
> convenience methods.
>
> I don't think these new forms are going to make Collectors easier to
> learn, or collectors code easier to read (except at a very superficial
> level).
>

I think there are many folks for whom these convenience Collectors will
make the difference between ignoring and using streams. As long as they're
bundled as static factory methods in a Collectors class, I don't see the
problem.

--tim
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20130409/7abfe8ac/attachment.html 

From brian.goetz at oracle.com  Tue Apr  9 12:54:44 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Tue, 09 Apr 2013 15:54:44 -0400
Subject: Survey results
Message-ID: <51647204.4010108@oracle.com>

Closing three surveys, responses are here:

FlatMapper: 
https://www.surveymonkey.com/sr.aspx?sm=eqAnAfK4z0IjPKVllUu24NW38AIeF5NiPcBxcrdMTVc_3d

Resolution: FlatMapper removed as per discussion.

Collector: 
https://www.surveymonkey.com/sr.aspx?sm=eqAnAfK4z0IjPKVllUu24C2CNuL68Gm6quYPmGqoZ9A_3d

Resolution: spec adjusted as per comments -- additional spec work still 
needed.

Stream: 
https://www.surveymonkey.com/sr.aspx?sm=QyMHR9lw9a4qhahv_2bP4ePapqLUvfdbFSi_2fYBYkt2zgA_3d


From brian.goetz at oracle.com  Tue Apr  9 13:22:29 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Tue, 09 Apr 2013 16:22:29 -0400
Subject: Survey results
In-Reply-To: <51647204.4010108@oracle.com>
References: <51647204.4010108@oracle.com>
Message-ID: <51647885.6020604@oracle.com>

Updated Javadoc at:

   http://cr.openjdk.java.net/~briangoetz/JDK-8008682/api/java/util/stream/

On 4/9/2013 3:54 PM, Brian Goetz wrote:
> Closing three surveys, responses are here:
>
> FlatMapper:
> https://www.surveymonkey.com/sr.aspx?sm=eqAnAfK4z0IjPKVllUu24NW38AIeF5NiPcBxcrdMTVc_3d
>
>
> Resolution: FlatMapper removed as per discussion.
>
> Collector:
> https://www.surveymonkey.com/sr.aspx?sm=eqAnAfK4z0IjPKVllUu24C2CNuL68Gm6quYPmGqoZ9A_3d
>
>
> Resolution: spec adjusted as per comments -- additional spec work still
> needed.
>
> Stream:
> https://www.surveymonkey.com/sr.aspx?sm=QyMHR9lw9a4qhahv_2bP4ePapqLUvfdbFSi_2fYBYkt2zgA_3d
>
>

From brian.goetz at oracle.com  Tue Apr  9 13:25:28 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Tue, 09 Apr 2013 16:25:28 -0400
Subject: Survey: API review for static factory methods
Message-ID: <51647938.5080708@oracle.com>

I've posted a survey for the static factory methods in Streams at:
   https://www.surveymonkey.com/s/5WZ7NJL

We are also planning to add singletonStream() factories.

Usual password.


From brian.goetz at oracle.com  Tue Apr  9 14:16:58 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Tue, 09 Apr 2013 17:16:58 -0400
Subject: Possible groupingBy simplification?
Message-ID: <5164854A.3000006@oracle.com>

Paul suggested the following possible simplification for groupingBy.  It 
is somewhat counterintuitive at first glance, in that it removes the 
most commonly used form (!), but might make things easier to grasp in 
the long run (aided by good docs.)

Recall we currently have four forms of groupingBy:

     // classifier only -- maps keys to list of matching elements
     Collector<T, Map<K, List<T>>>
     groupingBy(Function<? super T, ? extends K> classifier)

     // Like above, but with explicit map ctor
     <T, K, M extends Map<K, List<T>>>
     Collector<T, M>
     groupingBy(Function<? super T, ? extends K> classifier,
                Supplier<M> mapFactory)

     // basic cascaded form
     Collector<T, Map<K, D>>
     groupingBy(Function<? super T, ? extends K> classifier,
                Collector<T, D> downstream)

     // cascaded form with explicit ctor
     <T, K, D, M extends Map<K, D>>
     Collector<T, M>
     groupingBy(Function<? super T, ? extends K> classifier,
                Supplier<M> mapFactory,
                Collector<T, D> downstream)

Plus four corresponding forms for groupingByConcurrent.

The first form is likely to be the most common, as it is the traditional 
"group by".  It is equivalent to:

   groupingBy(classifier, toList());

The proposal is: Drop the first two forms.  Just as users can learn that 
to collect elements into a list, you do:

   collect(toList())

people can learn that to do the simple form of groupBy, you can do:

   collect(groupingBy(f, toList());

Which also reads perfectly well.

By cutting the number of forms in half, it helps users to realize that 
groupingBy does just one thing -- classifies elements by key, and 
collects elements associated with that key.  Obviously the docs for 
groupingBy can show examples of the simple grouping as well as more 
sophisticated groupings.


From brian.goetz at oracle.com  Tue Apr  9 14:29:21 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Tue, 09 Apr 2013 17:29:21 -0400
Subject: toMap options
Message-ID: <51648831.4060301@oracle.com>

Currently we have:

     Collector<T, Map<T,U>>
     toMap(Function<? super T, ? extends U> mapper)

and

     <T, U, M extends Map<T, U>>
     Collector<T, M>
     toMap(Function<? super T, ? extends U> mapper,
           Supplier<M> mapSupplier,
           BinaryOperator<U> mergeFunction)

(plus concurrent versions of both of these.)  The former is just sugar for:

   toMap(mapper, HashMap::new, throwingMerger())

(We have predefined merge functions for throw-on-duplicates, first-wins, 
and last-wins, called throwingMerger, firstWinsMerger, and lastWinsMerger.)

As has been noted, we do not yet serve the use case of creating a map 
where the stream elements are the values of the map instead of the keys 
of the map.  Options for addressing this are:

1.  Leave toMap as is, add toIndexedMap (or toKeyedMap) variants.

2.  Leave toMap as is, add a two-function version of toMap:

     <T,K,U>
     Collector<T, Map<K,U>>
     toMap(Function<T, K> keyMapper,
           Function<T, U> valueMapper)

in which case the regular toMap becomes sugar for

     toMap(Function.identity(), mapper)

3.  Get rid of the current form of toMap, and just have the two-function 
form as in (2).

4.  Break free of the toMap naming (recall that until recently this was 
called mappedTo, and prior to that, joiningWith), and have two versions: 
mappedTo and mappedFrom.  This is explicit, but also doesn't address the 
use case where both key and value are functions of the stream elements.

Others?


From joe.bowbeer at gmail.com  Tue Apr  9 14:56:49 2013
From: joe.bowbeer at gmail.com (Joe Bowbeer)
Date: Tue, 9 Apr 2013 14:56:49 -0700
Subject: Possible groupingBy simplification?
In-Reply-To: <5164854A.3000006@oracle.com>
References: <5164854A.3000006@oracle.com>
Message-ID: <CAHzJPErv6iYwoe=JL+hCkCW_E2k1J33Yh_SxrKOri4j=UR3_2A@mail.gmail.com>

I like the most popular form.  In fact, I think it's the only one that I've
used.

The argument that users will gain by removing their most common form seems
kind of far-fetched.

In my experience, I do a ctrl-space and look for my target return type on
the right-hand-side of the IDE popup, and then I try to fill in the missing
information, such as parameters.  In this case, having to provide toList()
would probably be a stumbling block for me, as the IDE is not as good when
it comes to suggesting expressions for parameters.

I sort of like the symmetry with collect(toList()) but not enough to make
up for the loss.


On Tue, Apr 9, 2013 at 2:16 PM, Brian Goetz <brian.goetz at oracle.com> wrote:

> Paul suggested the following possible simplification for groupingBy.  It
> is somewhat counterintuitive at first glance, in that it removes the most
> commonly used form (!), but might make things easier to grasp in the long
> run (aided by good docs.)
>
> Recall we currently have four forms of groupingBy:
>
>     // classifier only -- maps keys to list of matching elements
>     Collector<T, Map<K, List<T>>>
>     groupingBy(Function<? super T, ? extends K> classifier)
>
>     // Like above, but with explicit map ctor
>     <T, K, M extends Map<K, List<T>>>
>     Collector<T, M>
>     groupingBy(Function<? super T, ? extends K> classifier,
>                Supplier<M> mapFactory)
>
>     // basic cascaded form
>     Collector<T, Map<K, D>>
>     groupingBy(Function<? super T, ? extends K> classifier,
>                Collector<T, D> downstream)
>
>     // cascaded form with explicit ctor
>     <T, K, D, M extends Map<K, D>>
>     Collector<T, M>
>     groupingBy(Function<? super T, ? extends K> classifier,
>                Supplier<M> mapFactory,
>                Collector<T, D> downstream)
>
> Plus four corresponding forms for groupingByConcurrent.
>
> The first form is likely to be the most common, as it is the traditional
> "group by".  It is equivalent to:
>
>   groupingBy(classifier, toList());
>
> The proposal is: Drop the first two forms.  Just as users can learn that
> to collect elements into a list, you do:
>
>   collect(toList())
>
> people can learn that to do the simple form of groupBy, you can do:
>
>   collect(groupingBy(f, toList());
>
> Which also reads perfectly well.
>
> By cutting the number of forms in half, it helps users to realize that
> groupingBy does just one thing -- classifies elements by key, and collects
> elements associated with that key.  Obviously the docs for groupingBy can
> show examples of the simple grouping as well as more sophisticated
> groupings.
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20130409/6212e15b/attachment.html 

From joe.bowbeer at gmail.com  Tue Apr  9 15:03:42 2013
From: joe.bowbeer at gmail.com (Joe Bowbeer)
Date: Tue, 9 Apr 2013 15:03:42 -0700
Subject: Convenience Collector forms
In-Reply-To: <CA+F8eeThXuQ1C3NoO4fVH9ndyaSyoK6FoQ_34AskbXV8CEM8sg@mail.gmail.com>
References: <51635BE2.1010909@oracle.com>
	<CAHzJPErt4aoDxZ=TdfebWZyvuBAAU3pG0SQpcx+GN1Tkbgx8ow@mail.gmail.com>
	<CA+F8eeThXuQ1C3NoO4fVH9ndyaSyoK6FoQ_34AskbXV8CEM8sg@mail.gmail.com>
Message-ID: <CAHzJPEoi80ZzcskZrh2q6H6YC52=ZW_NaB_sCNBHNbuVhDYNcQ@mail.gmail.com>

I didn't understand that these were proposed for the Collectors tools
class.  I don't see a problem with that either.


On Tue, Apr 9, 2013 at 6:27 AM, Tim Peierls <tim at peierls.net> wrote:

> On Mon, Apr 8, 2013 at 10:32 PM, Joe Bowbeer <joe.bowbeer at gmail.com>wrote:
>
>> Q: Why is the method on the block called counting() instead of the
>> proposed count()?
>>
>
> I like the adverbial form because it reads more like English.
>
>
> Except for possibly count(), I'm not liking any of these, because:
>>
>> 1. There is already enough exposed "reduce" surface area in max/min/sum.
>>
>> 2. map/reduce is where it's at.  It's easier for me to read code that
>> uses those familiar forms than it is to familiarize myself with a bunch of
>> new convenience methods.
>>
>> I don't think these new forms are going to make Collectors easier to
>> learn, or collectors code easier to read (except at a very superficial
>> level).
>>
>
> I think there are many folks for whom these convenience Collectors will
> make the difference between ignoring and using streams. As long as they're
> bundled as static factory methods in a Collectors class, I don't see the
> problem.
>
> --tim
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20130409/4eda6516/attachment-0001.html 

From joe.bowbeer at gmail.com  Tue Apr  9 15:34:45 2013
From: joe.bowbeer at gmail.com (Joe Bowbeer)
Date: Tue, 9 Apr 2013 15:34:45 -0700
Subject: Possible groupingBy simplification?
In-Reply-To: <CAHzJPErv6iYwoe=JL+hCkCW_E2k1J33Yh_SxrKOri4j=UR3_2A@mail.gmail.com>
References: <5164854A.3000006@oracle.com>
	<CAHzJPErv6iYwoe=JL+hCkCW_E2k1J33Yh_SxrKOri4j=UR3_2A@mail.gmail.com>
Message-ID: <CAHzJPEqze62k63BDv5rjoBpQYH5rP8THpPVqVmZU0j4Ngnw7sw@mail.gmail.com>

On a positive note, the shining example would be unchanged by this proposal:

  Map<Buyer, Map<Seller, Transaction>>
    biggestTransactionByBuyerSelle**r =
      stream.collect(groupingBy(Txn:**:buyer,
                                groupingBy(Txn::seller,
                                           maxBy(comparing(Txn::amount))

I suggest leading users to the general form by illustrating the equivalence
in the groupingBy(f) documentation.


On Tue, Apr 9, 2013 at 2:56 PM, Joe Bowbeer <joe.bowbeer at gmail.com> wrote:

> I like the most popular form.  In fact, I think it's the only one that
> I've used.
>
> The argument that users will gain by removing their most common form seems
> kind of far-fetched.
>
> In my experience, I do a ctrl-space and look for my target return type on
> the right-hand-side of the IDE popup, and then I try to fill in the missing
> information, such as parameters.  In this case, having to provide toList()
> would probably be a stumbling block for me, as the IDE is not as good when
> it comes to suggesting expressions for parameters.
>
> I sort of like the symmetry with collect(toList()) but not enough to make
> up for the loss.
>
>
>
> On Tue, Apr 9, 2013 at 2:16 PM, Brian Goetz <brian.goetz at oracle.com>wrote:
>
>> Paul suggested the following possible simplification for groupingBy.  It
>> is somewhat counterintuitive at first glance, in that it removes the most
>> commonly used form (!), but might make things easier to grasp in the long
>> run (aided by good docs.)
>>
>> Recall we currently have four forms of groupingBy:
>>
>>     // classifier only -- maps keys to list of matching elements
>>     Collector<T, Map<K, List<T>>>
>>     groupingBy(Function<? super T, ? extends K> classifier)
>>
>>     // Like above, but with explicit map ctor
>>     <T, K, M extends Map<K, List<T>>>
>>     Collector<T, M>
>>     groupingBy(Function<? super T, ? extends K> classifier,
>>                Supplier<M> mapFactory)
>>
>>     // basic cascaded form
>>     Collector<T, Map<K, D>>
>>     groupingBy(Function<? super T, ? extends K> classifier,
>>                Collector<T, D> downstream)
>>
>>     // cascaded form with explicit ctor
>>     <T, K, D, M extends Map<K, D>>
>>     Collector<T, M>
>>     groupingBy(Function<? super T, ? extends K> classifier,
>>                Supplier<M> mapFactory,
>>                Collector<T, D> downstream)
>>
>> Plus four corresponding forms for groupingByConcurrent.
>>
>> The first form is likely to be the most common, as it is the traditional
>> "group by".  It is equivalent to:
>>
>>   groupingBy(classifier, toList());
>>
>> The proposal is: Drop the first two forms.  Just as users can learn that
>> to collect elements into a list, you do:
>>
>>   collect(toList())
>>
>> people can learn that to do the simple form of groupBy, you can do:
>>
>>   collect(groupingBy(f, toList());
>>
>> Which also reads perfectly well.
>>
>> By cutting the number of forms in half, it helps users to realize that
>> groupingBy does just one thing -- classifies elements by key, and collects
>> elements associated with that key.  Obviously the docs for groupingBy can
>> show examples of the simple grouping as well as more sophisticated
>> groupings.
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20130409/c567c6ec/attachment.html 

From Donald.Raab at gs.com  Tue Apr  9 15:56:47 2013
From: Donald.Raab at gs.com (Raab, Donald)
Date: Tue, 9 Apr 2013 18:56:47 -0400
Subject: toMap options
In-Reply-To: <51648831.4060301@oracle.com>
References: <51648831.4060301@oracle.com>
Message-ID: <6712820CB52CFB4D842561213A77C05404C97093FF@GSCMAMP09EX.firmwide.corp.gs.com>

3 sounds good to me.  This is the only form we've supported over the years.  I don't recall anyone complaining about the lack of more sugar here.

http://www.goldmansachs.com/gs-collections/javadoc/3.0.0/com/gs/collections/api/RichIterable.html


> 1.  Leave toMap as is, add toIndexedMap (or toKeyedMap) variants.
> 
> 2.  Leave toMap as is, add a two-function version of toMap:
> 
>      <T,K,U>
>      Collector<T, Map<K,U>>
>      toMap(Function<T, K> keyMapper,
>            Function<T, U> valueMapper)
> 
> in which case the regular toMap becomes sugar for
> 
>      toMap(Function.identity(), mapper)
> 
> 3.  Get rid of the current form of toMap, and just have the two-
> function form as in (2).
> 
> 4.  Break free of the toMap naming (recall that until recently this was
> called mappedTo, and prior to that, joiningWith), and have two
> versions:
> mappedTo and mappedFrom.  This is explicit, but also doesn't address
> the use case where both key and value are functions of the stream
> elements.
> 
> Others?


From spullara at gmail.com  Tue Apr  9 16:28:29 2013
From: spullara at gmail.com (Sam Pullara)
Date: Tue, 9 Apr 2013 16:28:29 -0700
Subject: toMap options
In-Reply-To: <51648831.4060301@oracle.com>
References: <51648831.4060301@oracle.com>
Message-ID: <EEA7B286-CD23-4965-A465-68CE5A27690B@gmail.com>

I like version 3 as well. 

Sam

On Apr 9, 2013, at 2:29 PM, Brian Goetz <brian.goetz at oracle.com> wrote:

> Currently we have:
> 
>    Collector<T, Map<T,U>>
>    toMap(Function<? super T, ? extends U> mapper)
> 
> and
> 
>    <T, U, M extends Map<T, U>>
>    Collector<T, M>
>    toMap(Function<? super T, ? extends U> mapper,
>          Supplier<M> mapSupplier,
>          BinaryOperator<U> mergeFunction)
> 
> (plus concurrent versions of both of these.)  The former is just sugar for:
> 
>  toMap(mapper, HashMap::new, throwingMerger())
> 
> (We have predefined merge functions for throw-on-duplicates, first-wins, and last-wins, called throwingMerger, firstWinsMerger, and lastWinsMerger.)
> 
> As has been noted, we do not yet serve the use case of creating a map where the stream elements are the values of the map instead of the keys of the map.  Options for addressing this are:
> 
> 1.  Leave toMap as is, add toIndexedMap (or toKeyedMap) variants.
> 
> 2.  Leave toMap as is, add a two-function version of toMap:
> 
>    <T,K,U>
>    Collector<T, Map<K,U>>
>    toMap(Function<T, K> keyMapper,
>          Function<T, U> valueMapper)
> 
> in which case the regular toMap becomes sugar for
> 
>    toMap(Function.identity(), mapper)
> 
> 3.  Get rid of the current form of toMap, and just have the two-function form as in (2).
> 
> 4.  Break free of the toMap naming (recall that until recently this was called mappedTo, and prior to that, joiningWith), and have two versions: mappedTo and mappedFrom.  This is explicit, but also doesn't address the use case where both key and value are functions of the stream elements.
> 
> Others?
> 


From brian.goetz at oracle.com  Tue Apr  9 16:33:45 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Tue, 09 Apr 2013 19:33:45 -0400
Subject: toMap options
In-Reply-To: <EEA7B286-CD23-4965-A465-68CE5A27690B@gmail.com>
References: <51648831.4060301@oracle.com>
	<EEA7B286-CD23-4965-A465-68CE5A27690B@gmail.com>
Message-ID: <5164A559.1070600@oracle.com>

I'm good with #3.  Any objections?

On 4/9/2013 7:28 PM, Sam Pullara wrote:
> I like version 3 as well.
>
> Sam
>
> On Apr 9, 2013, at 2:29 PM, Brian Goetz <brian.goetz at oracle.com> wrote:
>
>> Currently we have:
>>
>>     Collector<T, Map<T,U>>
>>     toMap(Function<? super T, ? extends U> mapper)
>>
>> and
>>
>>     <T, U, M extends Map<T, U>>
>>     Collector<T, M>
>>     toMap(Function<? super T, ? extends U> mapper,
>>           Supplier<M> mapSupplier,
>>           BinaryOperator<U> mergeFunction)
>>
>> (plus concurrent versions of both of these.)  The former is just sugar for:
>>
>>   toMap(mapper, HashMap::new, throwingMerger())
>>
>> (We have predefined merge functions for throw-on-duplicates, first-wins, and last-wins, called throwingMerger, firstWinsMerger, and lastWinsMerger.)
>>
>> As has been noted, we do not yet serve the use case of creating a map where the stream elements are the values of the map instead of the keys of the map.  Options for addressing this are:
>>
>> 1.  Leave toMap as is, add toIndexedMap (or toKeyedMap) variants.
>>
>> 2.  Leave toMap as is, add a two-function version of toMap:
>>
>>     <T,K,U>
>>     Collector<T, Map<K,U>>
>>     toMap(Function<T, K> keyMapper,
>>           Function<T, U> valueMapper)
>>
>> in which case the regular toMap becomes sugar for
>>
>>     toMap(Function.identity(), mapper)
>>
>> 3.  Get rid of the current form of toMap, and just have the two-function form as in (2).
>>
>> 4.  Break free of the toMap naming (recall that until recently this was called mappedTo, and prior to that, joiningWith), and have two versions: mappedTo and mappedFrom.  This is explicit, but also doesn't address the use case where both key and value are functions of the stream elements.
>>
>> Others?
>>
>

From tim at peierls.net  Tue Apr  9 16:48:50 2013
From: tim at peierls.net (Tim Peierls)
Date: Tue, 9 Apr 2013 19:48:50 -0400
Subject: toMap options
In-Reply-To: <5164A559.1070600@oracle.com>
References: <51648831.4060301@oracle.com>
	<EEA7B286-CD23-4965-A465-68CE5A27690B@gmail.com>
	<5164A559.1070600@oracle.com>
Message-ID: <CA+F8eeQrD5qbtMtcmOpa6FGPrwTJiMEjGqnh+ChJpA5ZdjQZ3Q@mail.gmail.com>

No objection, but now it makes me wonder: How do you get the effect of
toMultimap(T->K, T->V)? In other words, how would you get a Map<K,
Collection<V>> from a Stream<T> given T->K and T->V mappings?

--tim

On Tue, Apr 9, 2013 at 7:33 PM, Brian Goetz <brian.goetz at oracle.com> wrote:

> I'm good with #3.  Any objections?
>
>
> On 4/9/2013 7:28 PM, Sam Pullara wrote:
>
>> I like version 3 as well.
>>
>> Sam
>>
>> On Apr 9, 2013, at 2:29 PM, Brian Goetz <brian.goetz at oracle.com> wrote:
>>
>>  Currently we have:
>>>
>>>     Collector<T, Map<T,U>>
>>>     toMap(Function<? super T, ? extends U> mapper)
>>>
>>> and
>>>
>>>     <T, U, M extends Map<T, U>>
>>>     Collector<T, M>
>>>     toMap(Function<? super T, ? extends U> mapper,
>>>           Supplier<M> mapSupplier,
>>>           BinaryOperator<U> mergeFunction)
>>>
>>> (plus concurrent versions of both of these.)  The former is just sugar
>>> for:
>>>
>>>   toMap(mapper, HashMap::new, throwingMerger())
>>>
>>> (We have predefined merge functions for throw-on-duplicates, first-wins,
>>> and last-wins, called throwingMerger, firstWinsMerger, and lastWinsMerger.)
>>>
>>> As has been noted, we do not yet serve the use case of creating a map
>>> where the stream elements are the values of the map instead of the keys of
>>> the map.  Options for addressing this are:
>>>
>>> 1.  Leave toMap as is, add toIndexedMap (or toKeyedMap) variants.
>>>
>>> 2.  Leave toMap as is, add a two-function version of toMap:
>>>
>>>     <T,K,U>
>>>     Collector<T, Map<K,U>>
>>>     toMap(Function<T, K> keyMapper,
>>>           Function<T, U> valueMapper)
>>>
>>> in which case the regular toMap becomes sugar for
>>>
>>>     toMap(Function.identity(), mapper)
>>>
>>> 3.  Get rid of the current form of toMap, and just have the two-function
>>> form as in (2).
>>>
>>> 4.  Break free of the toMap naming (recall that until recently this was
>>> called mappedTo, and prior to that, joiningWith), and have two versions:
>>> mappedTo and mappedFrom.  This is explicit, but also doesn't address the
>>> use case where both key and value are functions of the stream elements.
>>>
>>> Others?
>>>
>>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20130409/ef58bf78/attachment-0001.html 

From brian.goetz at oracle.com  Tue Apr  9 16:51:15 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Tue, 09 Apr 2013 19:51:15 -0400
Subject: toMap options
In-Reply-To: <CA+F8eeQrD5qbtMtcmOpa6FGPrwTJiMEjGqnh+ChJpA5ZdjQZ3Q@mail.gmail.com>
References: <51648831.4060301@oracle.com>
	<EEA7B286-CD23-4965-A465-68CE5A27690B@gmail.com>
	<5164A559.1070600@oracle.com>
	<CA+F8eeQrD5qbtMtcmOpa6FGPrwTJiMEjGqnh+ChJpA5ZdjQZ3Q@mail.gmail.com>
Message-ID: <5164A973.10402@oracle.com>

So you've got a Stream<T>, and you want a Map<K, Collection<V>>, and 
you've got a T->K called "f" and a T->V called "g".  Easy:

   Map<K, Collection<V>> multiMap
      stream.collect(groupingBy(f, mapping(g, toList()));

On 4/9/2013 7:48 PM, Tim Peierls wrote:
> No objection, but now it makes me wonder: How do you get the effect of
> toMultimap(T->K, T->V)? In other words, how would you get a Map<K,
> Collection<V>> from a Stream<T> given T->K and T->V mappings?
>
> --tim
>
> On Tue, Apr 9, 2013 at 7:33 PM, Brian Goetz <brian.goetz at oracle.com
> <mailto:brian.goetz at oracle.com>> wrote:
>
>     I'm good with #3.  Any objections?
>
>
>     On 4/9/2013 7:28 PM, Sam Pullara wrote:
>
>         I like version 3 as well.
>
>         Sam
>
>         On Apr 9, 2013, at 2:29 PM, Brian Goetz <brian.goetz at oracle.com
>         <mailto:brian.goetz at oracle.com>> wrote:
>
>             Currently we have:
>
>                  Collector<T, Map<T,U>>
>                  toMap(Function<? super T, ? extends U> mapper)
>
>             and
>
>                  <T, U, M extends Map<T, U>>
>                  Collector<T, M>
>                  toMap(Function<? super T, ? extends U> mapper,
>                        Supplier<M> mapSupplier,
>                        BinaryOperator<U> mergeFunction)
>
>             (plus concurrent versions of both of these.)  The former is
>             just sugar for:
>
>                toMap(mapper, HashMap::new, throwingMerger())
>
>             (We have predefined merge functions for throw-on-duplicates,
>             first-wins, and last-wins, called throwingMerger,
>             firstWinsMerger, and lastWinsMerger.)
>
>             As has been noted, we do not yet serve the use case of
>             creating a map where the stream elements are the values of
>             the map instead of the keys of the map.  Options for
>             addressing this are:
>
>             1.  Leave toMap as is, add toIndexedMap (or toKeyedMap)
>             variants.
>
>             2.  Leave toMap as is, add a two-function version of toMap:
>
>                  <T,K,U>
>                  Collector<T, Map<K,U>>
>                  toMap(Function<T, K> keyMapper,
>                        Function<T, U> valueMapper)
>
>             in which case the regular toMap becomes sugar for
>
>                  toMap(Function.identity(), mapper)
>
>             3.  Get rid of the current form of toMap, and just have the
>             two-function form as in (2).
>
>             4.  Break free of the toMap naming (recall that until
>             recently this was called mappedTo, and prior to that,
>             joiningWith), and have two versions: mappedTo and
>             mappedFrom.  This is explicit, but also doesn't address the
>             use case where both key and value are functions of the
>             stream elements.
>
>             Others?
>
>
>

From tim at peierls.net  Tue Apr  9 17:23:13 2013
From: tim at peierls.net (Tim Peierls)
Date: Tue, 9 Apr 2013 20:23:13 -0400
Subject: toMap options
In-Reply-To: <5164A973.10402@oracle.com>
References: <51648831.4060301@oracle.com>
	<EEA7B286-CD23-4965-A465-68CE5A27690B@gmail.com>
	<5164A559.1070600@oracle.com>
	<CA+F8eeQrD5qbtMtcmOpa6FGPrwTJiMEjGqnh+ChJpA5ZdjQZ3Q@mail.gmail.com>
	<5164A973.10402@oracle.com>
Message-ID: <CA+F8eeTY1b9tdstpjjcwmgDsR=JLR7175U1qEKHscMGTCECUSA@mail.gmail.com>

Easy if you know how! At any rate, it's doable, and this might serve as an
example for groupingBy.

--tim

On Tue, Apr 9, 2013 at 7:51 PM, Brian Goetz <brian.goetz at oracle.com> wrote:

> So you've got a Stream<T>, and you want a Map<K, Collection<V>>, and
> you've got a T->K called "f" and a T->V called "g".  Easy:
>
>   Map<K, Collection<V>> multiMap
>      stream.collect(groupingBy(f, mapping(g, toList()));
>
>
> On 4/9/2013 7:48 PM, Tim Peierls wrote:
>
>> No objection, but now it makes me wonder: How do you get the effect of
>> toMultimap(T->K, T->V)? In other words, how would you get a Map<K,
>> Collection<V>> from a Stream<T> given T->K and T->V mappings?
>>
>> --tim
>>
>> On Tue, Apr 9, 2013 at 7:33 PM, Brian Goetz <brian.goetz at oracle.com
>> <mailto:brian.goetz at oracle.com**>> wrote:
>>
>>     I'm good with #3.  Any objections?
>>
>>
>>     On 4/9/2013 7:28 PM, Sam Pullara wrote:
>>
>>         I like version 3 as well.
>>
>>         Sam
>>
>>         On Apr 9, 2013, at 2:29 PM, Brian Goetz <brian.goetz at oracle.com
>>         <mailto:brian.goetz at oracle.com**>> wrote:
>>
>>             Currently we have:
>>
>>                  Collector<T, Map<T,U>>
>>                  toMap(Function<? super T, ? extends U> mapper)
>>
>>             and
>>
>>                  <T, U, M extends Map<T, U>>
>>                  Collector<T, M>
>>                  toMap(Function<? super T, ? extends U> mapper,
>>                        Supplier<M> mapSupplier,
>>                        BinaryOperator<U> mergeFunction)
>>
>>             (plus concurrent versions of both of these.)  The former is
>>             just sugar for:
>>
>>                toMap(mapper, HashMap::new, throwingMerger())
>>
>>             (We have predefined merge functions for throw-on-duplicates,
>>             first-wins, and last-wins, called throwingMerger,
>>             firstWinsMerger, and lastWinsMerger.)
>>
>>             As has been noted, we do not yet serve the use case of
>>             creating a map where the stream elements are the values of
>>             the map instead of the keys of the map.  Options for
>>             addressing this are:
>>
>>             1.  Leave toMap as is, add toIndexedMap (or toKeyedMap)
>>             variants.
>>
>>             2.  Leave toMap as is, add a two-function version of toMap:
>>
>>                  <T,K,U>
>>                  Collector<T, Map<K,U>>
>>                  toMap(Function<T, K> keyMapper,
>>                        Function<T, U> valueMapper)
>>
>>             in which case the regular toMap becomes sugar for
>>
>>                  toMap(Function.identity(), mapper)
>>
>>             3.  Get rid of the current form of toMap, and just have the
>>             two-function form as in (2).
>>
>>             4.  Break free of the toMap naming (recall that until
>>             recently this was called mappedTo, and prior to that,
>>             joiningWith), and have two versions: mappedTo and
>>             mappedFrom.  This is explicit, but also doesn't address the
>>             use case where both key and value are functions of the
>>             stream elements.
>>
>>             Others?
>>
>>
>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20130409/6fb9d72c/attachment.html 

From brian.goetz at oracle.com  Tue Apr  9 18:52:39 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Tue, 09 Apr 2013 21:52:39 -0400
Subject: toMap options
In-Reply-To: <CA+F8eeTY1b9tdstpjjcwmgDsR=JLR7175U1qEKHscMGTCECUSA@mail.gmail.com>
References: <51648831.4060301@oracle.com>
	<EEA7B286-CD23-4965-A465-68CE5A27690B@gmail.com>
	<5164A559.1070600@oracle.com>
	<CA+F8eeQrD5qbtMtcmOpa6FGPrwTJiMEjGqnh+ChJpA5ZdjQZ3Q@mail.gmail.com>
	<5164A973.10402@oracle.com>
	<CA+F8eeTY1b9tdstpjjcwmgDsR=JLR7175U1qEKHscMGTCECUSA@mail.gmail.com>
Message-ID: <5164C5E7.5040501@oracle.com>

Note also that the separation of Collector from Stream allows you to 
write your own, and allows Guava to publish properly typed collectors 
for, say, their implementation of Multimap.

The pedagogical question remains, though -- how to spread these examples 
throughout Javadoc so people are exposed to the idioms.

On 4/9/2013 8:23 PM, Tim Peierls wrote:
> Easy if you know how! At any rate, it's doable, and this might serve as
> an example for groupingBy.
>
> --tim
>
> On Tue, Apr 9, 2013 at 7:51 PM, Brian Goetz <brian.goetz at oracle.com
> <mailto:brian.goetz at oracle.com>> wrote:
>
>     So you've got a Stream<T>, and you want a Map<K, Collection<V>>, and
>     you've got a T->K called "f" and a T->V called "g".  Easy:
>
>        Map<K, Collection<V>> multiMap
>           stream.collect(groupingBy(f, mapping(g, toList()));
>
>
>     On 4/9/2013 7:48 PM, Tim Peierls wrote:
>
>         No objection, but now it makes me wonder: How do you get the
>         effect of
>         toMultimap(T->K, T->V)? In other words, how would you get a Map<K,
>         Collection<V>> from a Stream<T> given T->K and T->V mappings?
>
>         --tim
>
>         On Tue, Apr 9, 2013 at 7:33 PM, Brian Goetz
>         <brian.goetz at oracle.com <mailto:brian.goetz at oracle.com>
>         <mailto:brian.goetz at oracle.com
>         <mailto:brian.goetz at oracle.com>__>> wrote:
>
>              I'm good with #3.  Any objections?
>
>
>              On 4/9/2013 7:28 PM, Sam Pullara wrote:
>
>                  I like version 3 as well.
>
>                  Sam
>
>                  On Apr 9, 2013, at 2:29 PM, Brian Goetz
>         <brian.goetz at oracle.com <mailto:brian.goetz at oracle.com>
>                  <mailto:brian.goetz at oracle.com
>         <mailto:brian.goetz at oracle.com>__>> wrote:
>
>                      Currently we have:
>
>                           Collector<T, Map<T,U>>
>                           toMap(Function<? super T, ? extends U> mapper)
>
>                      and
>
>                           <T, U, M extends Map<T, U>>
>                           Collector<T, M>
>                           toMap(Function<? super T, ? extends U> mapper,
>                                 Supplier<M> mapSupplier,
>                                 BinaryOperator<U> mergeFunction)
>
>                      (plus concurrent versions of both of these.)  The
>         former is
>                      just sugar for:
>
>                         toMap(mapper, HashMap::new, throwingMerger())
>
>                      (We have predefined merge functions for
>         throw-on-duplicates,
>                      first-wins, and last-wins, called throwingMerger,
>                      firstWinsMerger, and lastWinsMerger.)
>
>                      As has been noted, we do not yet serve the use case of
>                      creating a map where the stream elements are the
>         values of
>                      the map instead of the keys of the map.  Options for
>                      addressing this are:
>
>                      1.  Leave toMap as is, add toIndexedMap (or toKeyedMap)
>                      variants.
>
>                      2.  Leave toMap as is, add a two-function version
>         of toMap:
>
>                           <T,K,U>
>                           Collector<T, Map<K,U>>
>                           toMap(Function<T, K> keyMapper,
>                                 Function<T, U> valueMapper)
>
>                      in which case the regular toMap becomes sugar for
>
>                           toMap(Function.identity(), mapper)
>
>                      3.  Get rid of the current form of toMap, and just
>         have the
>                      two-function form as in (2).
>
>                      4.  Break free of the toMap naming (recall that until
>                      recently this was called mappedTo, and prior to that,
>                      joiningWith), and have two versions: mappedTo and
>                      mappedFrom.  This is explicit, but also doesn't
>         address the
>                      use case where both key and value are functions of the
>                      stream elements.
>
>                      Others?
>
>
>
>

From paul.sandoz at oracle.com  Wed Apr 10 02:35:33 2013
From: paul.sandoz at oracle.com (Paul Sandoz)
Date: Wed, 10 Apr 2013 11:35:33 +0200
Subject: Possible groupingBy simplification?
In-Reply-To: <CAHzJPErv6iYwoe=JL+hCkCW_E2k1J33Yh_SxrKOri4j=UR3_2A@mail.gmail.com>
References: <5164854A.3000006@oracle.com>
	<CAHzJPErv6iYwoe=JL+hCkCW_E2k1J33Yh_SxrKOri4j=UR3_2A@mail.gmail.com>
Message-ID: <CBC090C2-208F-403B-B55B-A774405AC06A@oracle.com>


On Apr 9, 2013, at 11:56 PM, Joe Bowbeer <joe.bowbeer at gmail.com> wrote:

> I like the most popular form.  In fact, I think it's the only one that I've
> used.
> 
> The argument that users will gain by removing their most common form seems
> kind of far-fetched.
> 

If each method in Collectors does just one conceptual thing we can concisely express in documentation it is easier to remember and therefore easier to read the code, easier to find in documentation be it using the IDE or otherwise. Thus to me that suggests removing conceptual variants or renaming them. 

If the list variants were called say groupingByToList that would ensure the "one conceptual thing":  classifies elements by key, and collects elements associated with that key to a list. But i suspect we might not require those methods if the leap of stream.collector(toList()) can be grasped.

The same applies to toMap. I think it is easier to understand/read if it does just one conceptual thing: elements are keys, elements are mapped to values, conflicting keys result in an exception. If that does not fit ones requirements use groupingBy.

Paul.

> In my experience, I do a ctrl-space and look for my target return type on
> the right-hand-side of the IDE popup, and then I try to fill in the missing
> information, such as parameters.  In this case, having to provide toList()
> would probably be a stumbling block for me, as the IDE is not as good when
> it comes to suggesting expressions for parameters.
> 
> I sort of like the symmetry with collect(toList()) but not enough to make
> up for the loss.
> 
> 
> 
> On Tue, Apr 9, 2013 at 2:16 PM, Brian Goetz <brian.goetz at oracle.com> wrote:
> 
>> Paul suggested the following possible simplification for groupingBy.  It
>> is somewhat counterintuitive at first glance, in that it removes the most
>> commonly used form (!), but might make things easier to grasp in the long
>> run (aided by good docs.)
>> 
>> Recall we currently have four forms of groupingBy:
>> 
>>    // classifier only -- maps keys to list of matching elements
>>    Collector<T, Map<K, List<T>>>
>>    groupingBy(Function<? super T, ? extends K> classifier)
>> 
>>    // Like above, but with explicit map ctor
>>    <T, K, M extends Map<K, List<T>>>
>>    Collector<T, M>
>>    groupingBy(Function<? super T, ? extends K> classifier,
>>               Supplier<M> mapFactory)
>> 
>>    // basic cascaded form
>>    Collector<T, Map<K, D>>
>>    groupingBy(Function<? super T, ? extends K> classifier,
>>               Collector<T, D> downstream)
>> 
>>    // cascaded form with explicit ctor
>>    <T, K, D, M extends Map<K, D>>
>>    Collector<T, M>
>>    groupingBy(Function<? super T, ? extends K> classifier,
>>               Supplier<M> mapFactory,
>>               Collector<T, D> downstream)
>> 
>> Plus four corresponding forms for groupingByConcurrent.
>> 
>> The first form is likely to be the most common, as it is the traditional
>> "group by".  It is equivalent to:
>> 
>>  groupingBy(classifier, toList());
>> 
>> The proposal is: Drop the first two forms.  Just as users can learn that
>> to collect elements into a list, you do:
>> 
>>  collect(toList())
>> 
>> people can learn that to do the simple form of groupBy, you can do:
>> 
>>  collect(groupingBy(f, toList());
>> 
>> Which also reads perfectly well.
>> 
>> By cutting the number of forms in half, it helps users to realize that
>> groupingBy does just one thing -- classifies elements by key, and collects
>> elements associated with that key.  Obviously the docs for groupingBy can
>> show examples of the simple grouping as well as more sophisticated
>> groupings.
>> 
>> 


From joe.bowbeer at gmail.com  Wed Apr 10 09:37:48 2013
From: joe.bowbeer at gmail.com (Joe Bowbeer)
Date: Wed, 10 Apr 2013 09:37:48 -0700
Subject: Possible groupingBy simplification?
In-Reply-To: <CBC090C2-208F-403B-B55B-A774405AC06A@oracle.com>
References: <5164854A.3000006@oracle.com>
	<CAHzJPErv6iYwoe=JL+hCkCW_E2k1J33Yh_SxrKOri4j=UR3_2A@mail.gmail.com>
	<CBC090C2-208F-403B-B55B-A774405AC06A@oracle.com>
Message-ID: <CAHzJPEoRYE0E4OJvgpUVYebXnrjWbqn89r64hh2DGBDJ6XDdsg@mail.gmail.com>

For consistency with minBy and friends, all the 'By' methods should take a
single argument: f. Hence grouping(f).

No-arg and one-arg forms are the easiest to use and maintain. Just the
additional comma, and which pair of parens contains it, is a significant
burden.

The most readable forms of collect that have an explicit toList() would be
of the form:

collect(grouping(f)).toList();

or maybe

collect(toList(), groupingBy(f));

Joe
 On Apr 10, 2013 2:35 AM, "Paul Sandoz" <paul.sandoz at oracle.com> wrote:

>
> On Apr 9, 2013, at 11:56 PM, Joe Bowbeer <joe.bowbeer at gmail.com> wrote:
>
> > I like the most popular form.  In fact, I think it's the only one that
> I've
> > used.
> >
> > The argument that users will gain by removing their most common form
> seems
> > kind of far-fetched.
> >
>
> If each method in Collectors does just one conceptual thing we can
> concisely express in documentation it is easier to remember and therefore
> easier to read the code, easier to find in documentation be it using the
> IDE or otherwise. Thus to me that suggests removing conceptual variants or
> renaming them.
>
> If the list variants were called say groupingByToList that would ensure
> the "one conceptual thing":  classifies elements by key, and collects
> elements associated with that key to a list. But i suspect we might not
> require those methods if the leap of stream.collector(toList()) can be
> grasped.
>
> The same applies to toMap. I think it is easier to understand/read if it
> does just one conceptual thing: elements are keys, elements are mapped to
> values, conflicting keys result in an exception. If that does not fit ones
> requirements use groupingBy.
>
> Paul.
>
> > In my experience, I do a ctrl-space and look for my target return type on
> > the right-hand-side of the IDE popup, and then I try to fill in the
> missing
> > information, such as parameters.  In this case, having to provide
> toList()
> > would probably be a stumbling block for me, as the IDE is not as good
> when
> > it comes to suggesting expressions for parameters.
> >
> > I sort of like the symmetry with collect(toList()) but not enough to make
> > up for the loss.
> >
> >
> >
> > On Tue, Apr 9, 2013 at 2:16 PM, Brian Goetz <brian.goetz at oracle.com>
> wrote:
> >
> >> Paul suggested the following possible simplification for groupingBy.  It
> >> is somewhat counterintuitive at first glance, in that it removes the
> most
> >> commonly used form (!), but might make things easier to grasp in the
> long
> >> run (aided by good docs.)
> >>
> >> Recall we currently have four forms of groupingBy:
> >>
> >>    // classifier only -- maps keys to list of matching elements
> >>    Collector<T, Map<K, List<T>>>
> >>    groupingBy(Function<? super T, ? extends K> classifier)
> >>
> >>    // Like above, but with explicit map ctor
> >>    <T, K, M extends Map<K, List<T>>>
> >>    Collector<T, M>
> >>    groupingBy(Function<? super T, ? extends K> classifier,
> >>               Supplier<M> mapFactory)
> >>
> >>    // basic cascaded form
> >>    Collector<T, Map<K, D>>
> >>    groupingBy(Function<? super T, ? extends K> classifier,
> >>               Collector<T, D> downstream)
> >>
> >>    // cascaded form with explicit ctor
> >>    <T, K, D, M extends Map<K, D>>
> >>    Collector<T, M>
> >>    groupingBy(Function<? super T, ? extends K> classifier,
> >>               Supplier<M> mapFactory,
> >>               Collector<T, D> downstream)
> >>
> >> Plus four corresponding forms for groupingByConcurrent.
> >>
> >> The first form is likely to be the most common, as it is the traditional
> >> "group by".  It is equivalent to:
> >>
> >>  groupingBy(classifier, toList());
> >>
> >> The proposal is: Drop the first two forms.  Just as users can learn that
> >> to collect elements into a list, you do:
> >>
> >>  collect(toList())
> >>
> >> people can learn that to do the simple form of groupBy, you can do:
> >>
> >>  collect(groupingBy(f, toList());
> >>
> >> Which also reads perfectly well.
> >>
> >> By cutting the number of forms in half, it helps users to realize that
> >> groupingBy does just one thing -- classifies elements by key, and
> collects
> >> elements associated with that key.  Obviously the docs for groupingBy
> can
> >> show examples of the simple grouping as well as more sophisticated
> >> groupings.
> >>
> >>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20130410/a0977b8a/attachment-0001.html 

From joe.bowbeer at gmail.com  Wed Apr 10 09:42:28 2013
From: joe.bowbeer at gmail.com (Joe Bowbeer)
Date: Wed, 10 Apr 2013 09:42:28 -0700
Subject: Possible groupingBy simplification?
In-Reply-To: <CAHzJPEoRYE0E4OJvgpUVYebXnrjWbqn89r64hh2DGBDJ6XDdsg@mail.gmail.com>
References: <5164854A.3000006@oracle.com>
	<CAHzJPErv6iYwoe=JL+hCkCW_E2k1J33Yh_SxrKOri4j=UR3_2A@mail.gmail.com>
	<CBC090C2-208F-403B-B55B-A774405AC06A@oracle.com>
	<CAHzJPEoRYE0E4OJvgpUVYebXnrjWbqn89r64hh2DGBDJ6XDdsg@mail.gmail.com>
Message-ID: <CAHzJPEoy_+qEKU=EBOUdTfE0M=hh9LhQcNENPbAsvBFDjRUGMQ@mail.gmail.com>

Correction: All the grouping(f) should be groupingBy(f)
On Apr 10, 2013 9:37 AM, "Joe Bowbeer" <joe.bowbeer at gmail.com> wrote:

> For consistency with minBy and friends, all the 'By' methods should take a
> single argument: f. Hence grouping(f).
>
> No-arg and one-arg forms are the easiest to use and maintain. Just the
> additional comma, and which pair of parens contains it, is a significant
> burden.
>
> The most readable forms of collect that have an explicit toList() would be
> of the form:
>
> collect(grouping(f)).toList();
>
> or maybe
>
> collect(toList(), groupingBy(f));
>
> Joe
>  On Apr 10, 2013 2:35 AM, "Paul Sandoz" <paul.sandoz at oracle.com> wrote:
>
>>
>> On Apr 9, 2013, at 11:56 PM, Joe Bowbeer <joe.bowbeer at gmail.com> wrote:
>>
>> > I like the most popular form.  In fact, I think it's the only one that
>> I've
>> > used.
>> >
>> > The argument that users will gain by removing their most common form
>> seems
>> > kind of far-fetched.
>> >
>>
>> If each method in Collectors does just one conceptual thing we can
>> concisely express in documentation it is easier to remember and therefore
>> easier to read the code, easier to find in documentation be it using the
>> IDE or otherwise. Thus to me that suggests removing conceptual variants or
>> renaming them.
>>
>> If the list variants were called say groupingByToList that would ensure
>> the "one conceptual thing":  classifies elements by key, and collects
>> elements associated with that key to a list. But i suspect we might not
>> require those methods if the leap of stream.collector(toList()) can be
>> grasped.
>>
>> The same applies to toMap. I think it is easier to understand/read if it
>> does just one conceptual thing: elements are keys, elements are mapped to
>> values, conflicting keys result in an exception. If that does not fit ones
>> requirements use groupingBy.
>>
>> Paul.
>>
>> > In my experience, I do a ctrl-space and look for my target return type
>> on
>> > the right-hand-side of the IDE popup, and then I try to fill in the
>> missing
>> > information, such as parameters.  In this case, having to provide
>> toList()
>> > would probably be a stumbling block for me, as the IDE is not as good
>> when
>> > it comes to suggesting expressions for parameters.
>> >
>> > I sort of like the symmetry with collect(toList()) but not enough to
>> make
>> > up for the loss.
>> >
>> >
>> >
>> > On Tue, Apr 9, 2013 at 2:16 PM, Brian Goetz <brian.goetz at oracle.com>
>> wrote:
>> >
>> >> Paul suggested the following possible simplification for groupingBy.
>>  It
>> >> is somewhat counterintuitive at first glance, in that it removes the
>> most
>> >> commonly used form (!), but might make things easier to grasp in the
>> long
>> >> run (aided by good docs.)
>> >>
>> >> Recall we currently have four forms of groupingBy:
>> >>
>> >>    // classifier only -- maps keys to list of matching elements
>> >>    Collector<T, Map<K, List<T>>>
>> >>    groupingBy(Function<? super T, ? extends K> classifier)
>> >>
>> >>    // Like above, but with explicit map ctor
>> >>    <T, K, M extends Map<K, List<T>>>
>> >>    Collector<T, M>
>> >>    groupingBy(Function<? super T, ? extends K> classifier,
>> >>               Supplier<M> mapFactory)
>> >>
>> >>    // basic cascaded form
>> >>    Collector<T, Map<K, D>>
>> >>    groupingBy(Function<? super T, ? extends K> classifier,
>> >>               Collector<T, D> downstream)
>> >>
>> >>    // cascaded form with explicit ctor
>> >>    <T, K, D, M extends Map<K, D>>
>> >>    Collector<T, M>
>> >>    groupingBy(Function<? super T, ? extends K> classifier,
>> >>               Supplier<M> mapFactory,
>> >>               Collector<T, D> downstream)
>> >>
>> >> Plus four corresponding forms for groupingByConcurrent.
>> >>
>> >> The first form is likely to be the most common, as it is the
>> traditional
>> >> "group by".  It is equivalent to:
>> >>
>> >>  groupingBy(classifier, toList());
>> >>
>> >> The proposal is: Drop the first two forms.  Just as users can learn
>> that
>> >> to collect elements into a list, you do:
>> >>
>> >>  collect(toList())
>> >>
>> >> people can learn that to do the simple form of groupBy, you can do:
>> >>
>> >>  collect(groupingBy(f, toList());
>> >>
>> >> Which also reads perfectly well.
>> >>
>> >> By cutting the number of forms in half, it helps users to realize that
>> >> groupingBy does just one thing -- classifies elements by key, and
>> collects
>> >> elements associated with that key.  Obviously the docs for groupingBy
>> can
>> >> show examples of the simple grouping as well as more sophisticated
>> >> groupings.
>> >>
>> >>
>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20130410/0b38d599/attachment.html 

From forax at univ-mlv.fr  Wed Apr 10 10:10:25 2013
From: forax at univ-mlv.fr (Remi Forax)
Date: Wed, 10 Apr 2013 19:10:25 +0200
Subject: Possible groupingBy simplification?
In-Reply-To: <CAHzJPEoy_+qEKU=EBOUdTfE0M=hh9LhQcNENPbAsvBFDjRUGMQ@mail.gmail.com>
References: <5164854A.3000006@oracle.com>
	<CAHzJPErv6iYwoe=JL+hCkCW_E2k1J33Yh_SxrKOri4j=UR3_2A@mail.gmail.com>
	<CBC090C2-208F-403B-B55B-A774405AC06A@oracle.com>
	<CAHzJPEoRYE0E4OJvgpUVYebXnrjWbqn89r64hh2DGBDJ6XDdsg@mail.gmail.com>
	<CAHzJPEoy_+qEKU=EBOUdTfE0M=hh9LhQcNENPbAsvBFDjRUGMQ@mail.gmail.com>
Message-ID: <51659D01.2020606@univ-mlv.fr>

Joe,
collect(toList(), groupingBy(f));
   => how do you express the fact that you may want to group in cascade ?

collect(groupingBy(f)).toList()
   => what is the resulting type of collect(groupingBy(f)) ?
         is it a super-type of Stream ?

Brian,
I'm fine with the proposed changes.

R?mi

On 04/10/2013 06:42 PM, Joe Bowbeer wrote:
>
> Correction: All the grouping(f) should be groupingBy(f)
>
> On Apr 10, 2013 9:37 AM, "Joe Bowbeer" <joe.bowbeer at gmail.com 
> <mailto:joe.bowbeer at gmail.com>> wrote:
>
>     For consistency with minBy and friends, all the 'By' methods
>     should take a single argument: f. Hence grouping(f).
>
>     No-arg and one-arg forms are the easiest to use and maintain. Just
>     the additional comma, and which pair of parens contains it, is a
>     significant burden.
>
>     The most readable forms of collect that have an explicit toList()
>     would be of the form:
>
>     collect(grouping(f)).toList();
>
>     or maybe
>
>     collect(toList(), groupingBy(f));
>
>     Joe
>
>     On Apr 10, 2013 2:35 AM, "Paul Sandoz" <paul.sandoz at oracle.com
>     <mailto:paul.sandoz at oracle.com>> wrote:
>
>
>         On Apr 9, 2013, at 11:56 PM, Joe Bowbeer
>         <joe.bowbeer at gmail.com <mailto:joe.bowbeer at gmail.com>> wrote:
>
>         > I like the most popular form.  In fact, I think it's the
>         only one that I've
>         > used.
>         >
>         > The argument that users will gain by removing their most
>         common form seems
>         > kind of far-fetched.
>         >
>
>         If each method in Collectors does just one conceptual thing we
>         can concisely express in documentation it is easier to
>         remember and therefore easier to read the code, easier to find
>         in documentation be it using the IDE or otherwise. Thus to me
>         that suggests removing conceptual variants or renaming them.
>
>         If the list variants were called say groupingByToList that
>         would ensure the "one conceptual thing":  classifies elements
>         by key, and collects elements associated with that key to a
>         list. But i suspect we might not require those methods if the
>         leap of stream.collector(toList()) can be grasped.
>
>         The same applies to toMap. I think it is easier to
>         understand/read if it does just one conceptual thing: elements
>         are keys, elements are mapped to values, conflicting keys
>         result in an exception. If that does not fit ones requirements
>         use groupingBy.
>
>         Paul.
>
>         > In my experience, I do a ctrl-space and look for my target
>         return type on
>         > the right-hand-side of the IDE popup, and then I try to fill
>         in the missing
>         > information, such as parameters.  In this case, having to
>         provide toList()
>         > would probably be a stumbling block for me, as the IDE is
>         not as good when
>         > it comes to suggesting expressions for parameters.
>         >
>         > I sort of like the symmetry with collect(toList()) but not
>         enough to make
>         > up for the loss.
>         >
>         >
>         >
>         > On Tue, Apr 9, 2013 at 2:16 PM, Brian Goetz
>         <brian.goetz at oracle.com <mailto:brian.goetz at oracle.com>> wrote:
>         >
>         >> Paul suggested the following possible simplification for
>         groupingBy.  It
>         >> is somewhat counterintuitive at first glance, in that it
>         removes the most
>         >> commonly used form (!), but might make things easier to
>         grasp in the long
>         >> run (aided by good docs.)
>         >>
>         >> Recall we currently have four forms of groupingBy:
>         >>
>         >>    // classifier only -- maps keys to list of matching elements
>         >>    Collector<T, Map<K, List<T>>>
>         >>    groupingBy(Function<? super T, ? extends K> classifier)
>         >>
>         >>    // Like above, but with explicit map ctor
>         >>    <T, K, M extends Map<K, List<T>>>
>         >>    Collector<T, M>
>         >>    groupingBy(Function<? super T, ? extends K> classifier,
>         >>               Supplier<M> mapFactory)
>         >>
>         >>    // basic cascaded form
>         >>    Collector<T, Map<K, D>>
>         >>    groupingBy(Function<? super T, ? extends K> classifier,
>         >>               Collector<T, D> downstream)
>         >>
>         >>    // cascaded form with explicit ctor
>         >>    <T, K, D, M extends Map<K, D>>
>         >>    Collector<T, M>
>         >>    groupingBy(Function<? super T, ? extends K> classifier,
>         >>               Supplier<M> mapFactory,
>         >>               Collector<T, D> downstream)
>         >>
>         >> Plus four corresponding forms for groupingByConcurrent.
>         >>
>         >> The first form is likely to be the most common, as it is
>         the traditional
>         >> "group by".  It is equivalent to:
>         >>
>         >>  groupingBy(classifier, toList());
>         >>
>         >> The proposal is: Drop the first two forms.  Just as users
>         can learn that
>         >> to collect elements into a list, you do:
>         >>
>         >>  collect(toList())
>         >>
>         >> people can learn that to do the simple form of groupBy, you
>         can do:
>         >>
>         >>  collect(groupingBy(f, toList());
>         >>
>         >> Which also reads perfectly well.
>         >>
>         >> By cutting the number of forms in half, it helps users to
>         realize that
>         >> groupingBy does just one thing -- classifies elements by
>         key, and collects
>         >> elements associated with that key.  Obviously the docs for
>         groupingBy can
>         >> show examples of the simple grouping as well as more
>         sophisticated
>         >> groupings.
>         >>
>         >>
>


From brian.goetz at oracle.com  Wed Apr 10 11:11:19 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Wed, 10 Apr 2013 14:11:19 -0400
Subject: Possible groupingBy simplification?
In-Reply-To: <51659D01.2020606@univ-mlv.fr>
References: <5164854A.3000006@oracle.com>
	<CAHzJPErv6iYwoe=JL+hCkCW_E2k1J33Yh_SxrKOri4j=UR3_2A@mail.gmail.com>
	<CBC090C2-208F-403B-B55B-A774405AC06A@oracle.com>
	<CAHzJPEoRYE0E4OJvgpUVYebXnrjWbqn89r64hh2DGBDJ6XDdsg@mail.gmail.com>
	<CAHzJPEoy_+qEKU=EBOUdTfE0M=hh9LhQcNENPbAsvBFDjRUGMQ@mail.gmail.com>
	<51659D01.2020606@univ-mlv.fr>
Message-ID: <5165AB47.5050408@oracle.com>

After staring at groupingBy and toMap for a while, I think there's a 
nice middle ground which should address the key use cases while reducing 
a little bit of the "which one do I use":

   groupingBy(f)
   groupingBy(f, downstreamCollector)
   groupingBy(f, mapSupplier, downstreamCollector)

   toMap(keyFn, valFn)
   toMap(keyFn, valFn, mergeFn)
   toMap(keyFn, valFn, mergeFn, mapSupplier)

This cuts variants of each from 4 to 3, but more importantly, orders 
them into a nice telescoping set.

Those wanting the groupingBy(f, mapSUpplier) version should be able to 
figure out easily (with aid from doc) that they can use groupingBy(f, 
mapSUpplier, toList()).

On 4/10/2013 1:10 PM, Remi Forax wrote:
> Joe,
> collect(toList(), groupingBy(f));
>    => how do you express the fact that you may want to group in cascade ?
>
> collect(groupingBy(f)).toList()
>    => what is the resulting type of collect(groupingBy(f)) ?
>          is it a super-type of Stream ?
>
> Brian,
> I'm fine with the proposed changes.
>
> R?mi
>
> On 04/10/2013 06:42 PM, Joe Bowbeer wrote:
>>
>> Correction: All the grouping(f) should be groupingBy(f)
>>
>> On Apr 10, 2013 9:37 AM, "Joe Bowbeer" <joe.bowbeer at gmail.com
>> <mailto:joe.bowbeer at gmail.com>> wrote:
>>
>>     For consistency with minBy and friends, all the 'By' methods
>>     should take a single argument: f. Hence grouping(f).
>>
>>     No-arg and one-arg forms are the easiest to use and maintain. Just
>>     the additional comma, and which pair of parens contains it, is a
>>     significant burden.
>>
>>     The most readable forms of collect that have an explicit toList()
>>     would be of the form:
>>
>>     collect(grouping(f)).toList();
>>
>>     or maybe
>>
>>     collect(toList(), groupingBy(f));
>>
>>     Joe
>>
>>     On Apr 10, 2013 2:35 AM, "Paul Sandoz" <paul.sandoz at oracle.com
>>     <mailto:paul.sandoz at oracle.com>> wrote:
>>
>>
>>         On Apr 9, 2013, at 11:56 PM, Joe Bowbeer
>>         <joe.bowbeer at gmail.com <mailto:joe.bowbeer at gmail.com>> wrote:
>>
>>         > I like the most popular form.  In fact, I think it's the
>>         only one that I've
>>         > used.
>>         >
>>         > The argument that users will gain by removing their most
>>         common form seems
>>         > kind of far-fetched.
>>         >
>>
>>         If each method in Collectors does just one conceptual thing we
>>         can concisely express in documentation it is easier to
>>         remember and therefore easier to read the code, easier to find
>>         in documentation be it using the IDE or otherwise. Thus to me
>>         that suggests removing conceptual variants or renaming them.
>>
>>         If the list variants were called say groupingByToList that
>>         would ensure the "one conceptual thing":  classifies elements
>>         by key, and collects elements associated with that key to a
>>         list. But i suspect we might not require those methods if the
>>         leap of stream.collector(toList()) can be grasped.
>>
>>         The same applies to toMap. I think it is easier to
>>         understand/read if it does just one conceptual thing: elements
>>         are keys, elements are mapped to values, conflicting keys
>>         result in an exception. If that does not fit ones requirements
>>         use groupingBy.
>>
>>         Paul.
>>
>>         > In my experience, I do a ctrl-space and look for my target
>>         return type on
>>         > the right-hand-side of the IDE popup, and then I try to fill
>>         in the missing
>>         > information, such as parameters.  In this case, having to
>>         provide toList()
>>         > would probably be a stumbling block for me, as the IDE is
>>         not as good when
>>         > it comes to suggesting expressions for parameters.
>>         >
>>         > I sort of like the symmetry with collect(toList()) but not
>>         enough to make
>>         > up for the loss.
>>         >
>>         >
>>         >
>>         > On Tue, Apr 9, 2013 at 2:16 PM, Brian Goetz
>>         <brian.goetz at oracle.com <mailto:brian.goetz at oracle.com>> wrote:
>>         >
>>         >> Paul suggested the following possible simplification for
>>         groupingBy.  It
>>         >> is somewhat counterintuitive at first glance, in that it
>>         removes the most
>>         >> commonly used form (!), but might make things easier to
>>         grasp in the long
>>         >> run (aided by good docs.)
>>         >>
>>         >> Recall we currently have four forms of groupingBy:
>>         >>
>>         >>    // classifier only -- maps keys to list of matching
>> elements
>>         >>    Collector<T, Map<K, List<T>>>
>>         >>    groupingBy(Function<? super T, ? extends K> classifier)
>>         >>
>>         >>    // Like above, but with explicit map ctor
>>         >>    <T, K, M extends Map<K, List<T>>>
>>         >>    Collector<T, M>
>>         >>    groupingBy(Function<? super T, ? extends K> classifier,
>>         >>               Supplier<M> mapFactory)
>>         >>
>>         >>    // basic cascaded form
>>         >>    Collector<T, Map<K, D>>
>>         >>    groupingBy(Function<? super T, ? extends K> classifier,
>>         >>               Collector<T, D> downstream)
>>         >>
>>         >>    // cascaded form with explicit ctor
>>         >>    <T, K, D, M extends Map<K, D>>
>>         >>    Collector<T, M>
>>         >>    groupingBy(Function<? super T, ? extends K> classifier,
>>         >>               Supplier<M> mapFactory,
>>         >>               Collector<T, D> downstream)
>>         >>
>>         >> Plus four corresponding forms for groupingByConcurrent.
>>         >>
>>         >> The first form is likely to be the most common, as it is
>>         the traditional
>>         >> "group by".  It is equivalent to:
>>         >>
>>         >>  groupingBy(classifier, toList());
>>         >>
>>         >> The proposal is: Drop the first two forms.  Just as users
>>         can learn that
>>         >> to collect elements into a list, you do:
>>         >>
>>         >>  collect(toList())
>>         >>
>>         >> people can learn that to do the simple form of groupBy, you
>>         can do:
>>         >>
>>         >>  collect(groupingBy(f, toList());
>>         >>
>>         >> Which also reads perfectly well.
>>         >>
>>         >> By cutting the number of forms in half, it helps users to
>>         realize that
>>         >> groupingBy does just one thing -- classifies elements by
>>         key, and collects
>>         >> elements associated with that key.  Obviously the docs for
>>         groupingBy can
>>         >> show examples of the simple grouping as well as more
>>         sophisticated
>>         >> groupings.
>>         >>
>>         >>
>>
>

From joe.bowbeer at gmail.com  Wed Apr 10 12:45:34 2013
From: joe.bowbeer at gmail.com (Joe Bowbeer)
Date: Wed, 10 Apr 2013 12:45:34 -0700
Subject: Possible groupingBy simplification?
In-Reply-To: <5165AB47.5050408@oracle.com>
References: <5164854A.3000006@oracle.com>
	<CAHzJPErv6iYwoe=JL+hCkCW_E2k1J33Yh_SxrKOri4j=UR3_2A@mail.gmail.com>
	<CBC090C2-208F-403B-B55B-A774405AC06A@oracle.com>
	<CAHzJPEoRYE0E4OJvgpUVYebXnrjWbqn89r64hh2DGBDJ6XDdsg@mail.gmail.com>
	<CAHzJPEoy_+qEKU=EBOUdTfE0M=hh9LhQcNENPbAsvBFDjRUGMQ@mail.gmail.com>
	<51659D01.2020606@univ-mlv.fr> <5165AB47.5050408@oracle.com>
Message-ID: <CAHzJPEoiX3sX9FSyDiKgKcLgpbj7F-YBMR-5hKPpKtD7ObajGA@mail.gmail.com>

Looks good. I like the retention of the simple forms, and the telescopes.
On Apr 10, 2013 11:11 AM, "Brian Goetz" <brian.goetz at oracle.com> wrote:

> After staring at groupingBy and toMap for a while, I think there's a nice
> middle ground which should address the key use cases while reducing a
> little bit of the "which one do I use":
>
>   groupingBy(f)
>   groupingBy(f, downstreamCollector)
>   groupingBy(f, mapSupplier, downstreamCollector)
>
>   toMap(keyFn, valFn)
>   toMap(keyFn, valFn, mergeFn)
>   toMap(keyFn, valFn, mergeFn, mapSupplier)
>
> This cuts variants of each from 4 to 3, but more importantly, orders them
> into a nice telescoping set.
>
> Those wanting the groupingBy(f, mapSUpplier) version should be able to
> figure out easily (with aid from doc) that they can use groupingBy(f,
> mapSUpplier, toList()).
>
> On 4/10/2013 1:10 PM, Remi Forax wrote:
>
>> Joe,
>> collect(toList(), groupingBy(f));
>>    => how do you express the fact that you may want to group in cascade ?
>>
>> collect(groupingBy(f)).toList(**)
>>    => what is the resulting type of collect(groupingBy(f)) ?
>>          is it a super-type of Stream ?
>>
>> Brian,
>> I'm fine with the proposed changes.
>>
>> R?mi
>>
>> On 04/10/2013 06:42 PM, Joe Bowbeer wrote:
>>
>>>
>>> Correction: All the grouping(f) should be groupingBy(f)
>>>
>>> On Apr 10, 2013 9:37 AM, "Joe Bowbeer" <joe.bowbeer at gmail.com
>>> <mailto:joe.bowbeer at gmail.com>**> wrote:
>>>
>>>     For consistency with minBy and friends, all the 'By' methods
>>>     should take a single argument: f. Hence grouping(f).
>>>
>>>     No-arg and one-arg forms are the easiest to use and maintain. Just
>>>     the additional comma, and which pair of parens contains it, is a
>>>     significant burden.
>>>
>>>     The most readable forms of collect that have an explicit toList()
>>>     would be of the form:
>>>
>>>     collect(grouping(f)).toList();
>>>
>>>     or maybe
>>>
>>>     collect(toList(), groupingBy(f));
>>>
>>>     Joe
>>>
>>>     On Apr 10, 2013 2:35 AM, "Paul Sandoz" <paul.sandoz at oracle.com
>>>     <mailto:paul.sandoz at oracle.com**>> wrote:
>>>
>>>
>>>         On Apr 9, 2013, at 11:56 PM, Joe Bowbeer
>>>         <joe.bowbeer at gmail.com <mailto:joe.bowbeer at gmail.com>**> wrote:
>>>
>>>         > I like the most popular form.  In fact, I think it's the
>>>         only one that I've
>>>         > used.
>>>         >
>>>         > The argument that users will gain by removing their most
>>>         common form seems
>>>         > kind of far-fetched.
>>>         >
>>>
>>>         If each method in Collectors does just one conceptual thing we
>>>         can concisely express in documentation it is easier to
>>>         remember and therefore easier to read the code, easier to find
>>>         in documentation be it using the IDE or otherwise. Thus to me
>>>         that suggests removing conceptual variants or renaming them.
>>>
>>>         If the list variants were called say groupingByToList that
>>>         would ensure the "one conceptual thing":  classifies elements
>>>         by key, and collects elements associated with that key to a
>>>         list. But i suspect we might not require those methods if the
>>>         leap of stream.collector(toList()) can be grasped.
>>>
>>>         The same applies to toMap. I think it is easier to
>>>         understand/read if it does just one conceptual thing: elements
>>>         are keys, elements are mapped to values, conflicting keys
>>>         result in an exception. If that does not fit ones requirements
>>>         use groupingBy.
>>>
>>>         Paul.
>>>
>>>         > In my experience, I do a ctrl-space and look for my target
>>>         return type on
>>>         > the right-hand-side of the IDE popup, and then I try to fill
>>>         in the missing
>>>         > information, such as parameters.  In this case, having to
>>>         provide toList()
>>>         > would probably be a stumbling block for me, as the IDE is
>>>         not as good when
>>>         > it comes to suggesting expressions for parameters.
>>>         >
>>>         > I sort of like the symmetry with collect(toList()) but not
>>>         enough to make
>>>         > up for the loss.
>>>         >
>>>         >
>>>         >
>>>         > On Tue, Apr 9, 2013 at 2:16 PM, Brian Goetz
>>>         <brian.goetz at oracle.com <mailto:brian.goetz at oracle.com**>>
>>> wrote:
>>>         >
>>>         >> Paul suggested the following possible simplification for
>>>         groupingBy.  It
>>>         >> is somewhat counterintuitive at first glance, in that it
>>>         removes the most
>>>         >> commonly used form (!), but might make things easier to
>>>         grasp in the long
>>>         >> run (aided by good docs.)
>>>         >>
>>>         >> Recall we currently have four forms of groupingBy:
>>>         >>
>>>         >>    // classifier only -- maps keys to list of matching
>>> elements
>>>         >>    Collector<T, Map<K, List<T>>>
>>>         >>    groupingBy(Function<? super T, ? extends K> classifier)
>>>         >>
>>>         >>    // Like above, but with explicit map ctor
>>>         >>    <T, K, M extends Map<K, List<T>>>
>>>         >>    Collector<T, M>
>>>         >>    groupingBy(Function<? super T, ? extends K> classifier,
>>>         >>               Supplier<M> mapFactory)
>>>         >>
>>>         >>    // basic cascaded form
>>>         >>    Collector<T, Map<K, D>>
>>>         >>    groupingBy(Function<? super T, ? extends K> classifier,
>>>         >>               Collector<T, D> downstream)
>>>         >>
>>>         >>    // cascaded form with explicit ctor
>>>         >>    <T, K, D, M extends Map<K, D>>
>>>         >>    Collector<T, M>
>>>         >>    groupingBy(Function<? super T, ? extends K> classifier,
>>>         >>               Supplier<M> mapFactory,
>>>         >>               Collector<T, D> downstream)
>>>         >>
>>>         >> Plus four corresponding forms for groupingByConcurrent.
>>>         >>
>>>         >> The first form is likely to be the most common, as it is
>>>         the traditional
>>>         >> "group by".  It is equivalent to:
>>>         >>
>>>         >>  groupingBy(classifier, toList());
>>>         >>
>>>         >> The proposal is: Drop the first two forms.  Just as users
>>>         can learn that
>>>         >> to collect elements into a list, you do:
>>>         >>
>>>         >>  collect(toList())
>>>         >>
>>>         >> people can learn that to do the simple form of groupBy, you
>>>         can do:
>>>         >>
>>>         >>  collect(groupingBy(f, toList());
>>>         >>
>>>         >> Which also reads perfectly well.
>>>         >>
>>>         >> By cutting the number of forms in half, it helps users to
>>>         realize that
>>>         >> groupingBy does just one thing -- classifies elements by
>>>         key, and collects
>>>         >> elements associated with that key.  Obviously the docs for
>>>         groupingBy can
>>>         >> show examples of the simple grouping as well as more
>>>         sophisticated
>>>         >> groupings.
>>>         >>
>>>         >>
>>>
>>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20130410/a66aa5f7/attachment.html 

From tim at peierls.net  Wed Apr 10 13:00:19 2013
From: tim at peierls.net (Tim Peierls)
Date: Wed, 10 Apr 2013 16:00:19 -0400
Subject: Possible groupingBy simplification?
In-Reply-To: <CAHzJPEoiX3sX9FSyDiKgKcLgpbj7F-YBMR-5hKPpKtD7ObajGA@mail.gmail.com>
References: <5164854A.3000006@oracle.com>
	<CAHzJPErv6iYwoe=JL+hCkCW_E2k1J33Yh_SxrKOri4j=UR3_2A@mail.gmail.com>
	<CBC090C2-208F-403B-B55B-A774405AC06A@oracle.com>
	<CAHzJPEoRYE0E4OJvgpUVYebXnrjWbqn89r64hh2DGBDJ6XDdsg@mail.gmail.com>
	<CAHzJPEoy_+qEKU=EBOUdTfE0M=hh9LhQcNENPbAsvBFDjRUGMQ@mail.gmail.com>
	<51659D01.2020606@univ-mlv.fr> <5165AB47.5050408@oracle.com>
	<CAHzJPEoiX3sX9FSyDiKgKcLgpbj7F-YBMR-5hKPpKtD7ObajGA@mail.gmail.com>
Message-ID: <CA+F8eeRu99cY9KKdF-+wo7+-ukt6QA-W3iKb9NAzPKnJ5DP2BQ@mail.gmail.com>

Agreed.

What mergeFn is used in two-arg toMap?

--tim

On Wed, Apr 10, 2013 at 3:45 PM, Joe Bowbeer <joe.bowbeer at gmail.com> wrote:

> Looks good. I like the retention of the simple forms, and the telescopes.
> On Apr 10, 2013 11:11 AM, "Brian Goetz" <brian.goetz at oracle.com> wrote:
>
>> After staring at groupingBy and toMap for a while, I think there's a nice
>> middle ground which should address the key use cases while reducing a
>> little bit of the "which one do I use":
>>
>>   groupingBy(f)
>>   groupingBy(f, downstreamCollector)
>>   groupingBy(f, mapSupplier, downstreamCollector)
>>
>>   toMap(keyFn, valFn)
>>   toMap(keyFn, valFn, mergeFn)
>>   toMap(keyFn, valFn, mergeFn, mapSupplier)
>>
>> This cuts variants of each from 4 to 3, but more importantly, orders them
>> into a nice telescoping set.
>>
>> Those wanting the groupingBy(f, mapSUpplier) version should be able to
>> figure out easily (with aid from doc) that they can use groupingBy(f,
>> mapSUpplier, toList()).
>>
>> On 4/10/2013 1:10 PM, Remi Forax wrote:
>>
>>> Joe,
>>> collect(toList(), groupingBy(f));
>>>    => how do you express the fact that you may want to group in cascade ?
>>>
>>> collect(groupingBy(f)).toList(**)
>>>    => what is the resulting type of collect(groupingBy(f)) ?
>>>          is it a super-type of Stream ?
>>>
>>> Brian,
>>> I'm fine with the proposed changes.
>>>
>>> R?mi
>>>
>>> On 04/10/2013 06:42 PM, Joe Bowbeer wrote:
>>>
>>>>
>>>> Correction: All the grouping(f) should be groupingBy(f)
>>>>
>>>> On Apr 10, 2013 9:37 AM, "Joe Bowbeer" <joe.bowbeer at gmail.com
>>>> <mailto:joe.bowbeer at gmail.com>**> wrote:
>>>>
>>>>     For consistency with minBy and friends, all the 'By' methods
>>>>     should take a single argument: f. Hence grouping(f).
>>>>
>>>>     No-arg and one-arg forms are the easiest to use and maintain. Just
>>>>     the additional comma, and which pair of parens contains it, is a
>>>>     significant burden.
>>>>
>>>>     The most readable forms of collect that have an explicit toList()
>>>>     would be of the form:
>>>>
>>>>     collect(grouping(f)).toList();
>>>>
>>>>     or maybe
>>>>
>>>>     collect(toList(), groupingBy(f));
>>>>
>>>>     Joe
>>>>
>>>>     On Apr 10, 2013 2:35 AM, "Paul Sandoz" <paul.sandoz at oracle.com
>>>>     <mailto:paul.sandoz at oracle.com**>> wrote:
>>>>
>>>>
>>>>         On Apr 9, 2013, at 11:56 PM, Joe Bowbeer
>>>>         <joe.bowbeer at gmail.com <mailto:joe.bowbeer at gmail.com>**> wrote:
>>>>
>>>>         > I like the most popular form.  In fact, I think it's the
>>>>         only one that I've
>>>>         > used.
>>>>         >
>>>>         > The argument that users will gain by removing their most
>>>>         common form seems
>>>>         > kind of far-fetched.
>>>>         >
>>>>
>>>>         If each method in Collectors does just one conceptual thing we
>>>>         can concisely express in documentation it is easier to
>>>>         remember and therefore easier to read the code, easier to find
>>>>         in documentation be it using the IDE or otherwise. Thus to me
>>>>         that suggests removing conceptual variants or renaming them.
>>>>
>>>>         If the list variants were called say groupingByToList that
>>>>         would ensure the "one conceptual thing":  classifies elements
>>>>         by key, and collects elements associated with that key to a
>>>>         list. But i suspect we might not require those methods if the
>>>>         leap of stream.collector(toList()) can be grasped.
>>>>
>>>>         The same applies to toMap. I think it is easier to
>>>>         understand/read if it does just one conceptual thing: elements
>>>>         are keys, elements are mapped to values, conflicting keys
>>>>         result in an exception. If that does not fit ones requirements
>>>>         use groupingBy.
>>>>
>>>>         Paul.
>>>>
>>>>         > In my experience, I do a ctrl-space and look for my target
>>>>         return type on
>>>>         > the right-hand-side of the IDE popup, and then I try to fill
>>>>         in the missing
>>>>         > information, such as parameters.  In this case, having to
>>>>         provide toList()
>>>>         > would probably be a stumbling block for me, as the IDE is
>>>>         not as good when
>>>>         > it comes to suggesting expressions for parameters.
>>>>         >
>>>>         > I sort of like the symmetry with collect(toList()) but not
>>>>         enough to make
>>>>         > up for the loss.
>>>>         >
>>>>         >
>>>>         >
>>>>         > On Tue, Apr 9, 2013 at 2:16 PM, Brian Goetz
>>>>         <brian.goetz at oracle.com <mailto:brian.goetz at oracle.com**>>
>>>> wrote:
>>>>         >
>>>>         >> Paul suggested the following possible simplification for
>>>>         groupingBy.  It
>>>>         >> is somewhat counterintuitive at first glance, in that it
>>>>         removes the most
>>>>         >> commonly used form (!), but might make things easier to
>>>>         grasp in the long
>>>>         >> run (aided by good docs.)
>>>>         >>
>>>>         >> Recall we currently have four forms of groupingBy:
>>>>         >>
>>>>         >>    // classifier only -- maps keys to list of matching
>>>> elements
>>>>         >>    Collector<T, Map<K, List<T>>>
>>>>         >>    groupingBy(Function<? super T, ? extends K> classifier)
>>>>         >>
>>>>         >>    // Like above, but with explicit map ctor
>>>>         >>    <T, K, M extends Map<K, List<T>>>
>>>>         >>    Collector<T, M>
>>>>         >>    groupingBy(Function<? super T, ? extends K> classifier,
>>>>         >>               Supplier<M> mapFactory)
>>>>         >>
>>>>         >>    // basic cascaded form
>>>>         >>    Collector<T, Map<K, D>>
>>>>         >>    groupingBy(Function<? super T, ? extends K> classifier,
>>>>         >>               Collector<T, D> downstream)
>>>>         >>
>>>>         >>    // cascaded form with explicit ctor
>>>>         >>    <T, K, D, M extends Map<K, D>>
>>>>         >>    Collector<T, M>
>>>>         >>    groupingBy(Function<? super T, ? extends K> classifier,
>>>>         >>               Supplier<M> mapFactory,
>>>>         >>               Collector<T, D> downstream)
>>>>         >>
>>>>         >> Plus four corresponding forms for groupingByConcurrent.
>>>>         >>
>>>>         >> The first form is likely to be the most common, as it is
>>>>         the traditional
>>>>         >> "group by".  It is equivalent to:
>>>>         >>
>>>>         >>  groupingBy(classifier, toList());
>>>>         >>
>>>>         >> The proposal is: Drop the first two forms.  Just as users
>>>>         can learn that
>>>>         >> to collect elements into a list, you do:
>>>>         >>
>>>>         >>  collect(toList())
>>>>         >>
>>>>         >> people can learn that to do the simple form of groupBy, you
>>>>         can do:
>>>>         >>
>>>>         >>  collect(groupingBy(f, toList());
>>>>         >>
>>>>         >> Which also reads perfectly well.
>>>>         >>
>>>>         >> By cutting the number of forms in half, it helps users to
>>>>         realize that
>>>>         >> groupingBy does just one thing -- classifies elements by
>>>>         key, and collects
>>>>         >> elements associated with that key.  Obviously the docs for
>>>>         groupingBy can
>>>>         >> show examples of the simple grouping as well as more
>>>>         sophisticated
>>>>         >> groupings.
>>>>         >>
>>>>         >>
>>>>
>>>>
>>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20130410/b1194396/attachment-0001.html 

From spullara at gmail.com  Wed Apr 10 13:38:21 2013
From: spullara at gmail.com (Sam Pullara)
Date: Wed, 10 Apr 2013 13:38:21 -0700
Subject: Possible groupingBy simplification?
In-Reply-To: <CA+F8eeRu99cY9KKdF-+wo7+-ukt6QA-W3iKb9NAzPKnJ5DP2BQ@mail.gmail.com>
References: <5164854A.3000006@oracle.com>
	<CAHzJPErv6iYwoe=JL+hCkCW_E2k1J33Yh_SxrKOri4j=UR3_2A@mail.gmail.com>
	<CBC090C2-208F-403B-B55B-A774405AC06A@oracle.com>
	<CAHzJPEoRYE0E4OJvgpUVYebXnrjWbqn89r64hh2DGBDJ6XDdsg@mail.gmail.com>
	<CAHzJPEoy_+qEKU=EBOUdTfE0M=hh9LhQcNENPbAsvBFDjRUGMQ@mail.gmail.com>
	<51659D01.2020606@univ-mlv.fr> <5165AB47.5050408@oracle.com>
	<CAHzJPEoiX3sX9FSyDiKgKcLgpbj7F-YBMR-5hKPpKtD7ObajGA@mail.gmail.com>
	<CA+F8eeRu99cY9KKdF-+wo7+-ukt6QA-W3iKb9NAzPKnJ5DP2BQ@mail.gmail.com>
Message-ID: <DDE88BA0-A158-4711-94E5-BB4C9D2F5F2A@gmail.com>

Don't know what it is, but I'd like it to throw an exception on clobber. My assumption is that in that case you know the keys are unique.

Sam

On Apr 10, 2013, at 1:00 PM, Tim Peierls <tim at peierls.net> wrote:

> Agreed. 
> 
> What mergeFn is used in two-arg toMap?
> 
> --tim
> 
> On Wed, Apr 10, 2013 at 3:45 PM, Joe Bowbeer <joe.bowbeer at gmail.com> wrote:
> Looks good. I like the retention of the simple forms, and the telescopes.
> 
> On Apr 10, 2013 11:11 AM, "Brian Goetz" <brian.goetz at oracle.com> wrote:
> After staring at groupingBy and toMap for a while, I think there's a nice middle ground which should address the key use cases while reducing a little bit of the "which one do I use":
> 
>   groupingBy(f)
>   groupingBy(f, downstreamCollector)
>   groupingBy(f, mapSupplier, downstreamCollector)
> 
>   toMap(keyFn, valFn)
>   toMap(keyFn, valFn, mergeFn)
>   toMap(keyFn, valFn, mergeFn, mapSupplier)
> 
> This cuts variants of each from 4 to 3, but more importantly, orders them into a nice telescoping set.
> 
> Those wanting the groupingBy(f, mapSUpplier) version should be able to figure out easily (with aid from doc) that they can use groupingBy(f, mapSUpplier, toList()).
> 
> On 4/10/2013 1:10 PM, Remi Forax wrote:
> Joe,
> collect(toList(), groupingBy(f));
>    => how do you express the fact that you may want to group in cascade ?
> 
> collect(groupingBy(f)).toList()
>    => what is the resulting type of collect(groupingBy(f)) ?
>          is it a super-type of Stream ?
> 
> Brian,
> I'm fine with the proposed changes.
> 
> R?mi
> 
> On 04/10/2013 06:42 PM, Joe Bowbeer wrote:
> 
> Correction: All the grouping(f) should be groupingBy(f)
> 
> On Apr 10, 2013 9:37 AM, "Joe Bowbeer" <joe.bowbeer at gmail.com
> <mailto:joe.bowbeer at gmail.com>> wrote:
> 
>     For consistency with minBy and friends, all the 'By' methods
>     should take a single argument: f. Hence grouping(f).
> 
>     No-arg and one-arg forms are the easiest to use and maintain. Just
>     the additional comma, and which pair of parens contains it, is a
>     significant burden.
> 
>     The most readable forms of collect that have an explicit toList()
>     would be of the form:
> 
>     collect(grouping(f)).toList();
> 
>     or maybe
> 
>     collect(toList(), groupingBy(f));
> 
>     Joe
> 
>     On Apr 10, 2013 2:35 AM, "Paul Sandoz" <paul.sandoz at oracle.com
>     <mailto:paul.sandoz at oracle.com>> wrote:
> 
> 
>         On Apr 9, 2013, at 11:56 PM, Joe Bowbeer
>         <joe.bowbeer at gmail.com <mailto:joe.bowbeer at gmail.com>> wrote:
> 
>         > I like the most popular form.  In fact, I think it's the
>         only one that I've
>         > used.
>         >
>         > The argument that users will gain by removing their most
>         common form seems
>         > kind of far-fetched.
>         >
> 
>         If each method in Collectors does just one conceptual thing we
>         can concisely express in documentation it is easier to
>         remember and therefore easier to read the code, easier to find
>         in documentation be it using the IDE or otherwise. Thus to me
>         that suggests removing conceptual variants or renaming them.
> 
>         If the list variants were called say groupingByToList that
>         would ensure the "one conceptual thing":  classifies elements
>         by key, and collects elements associated with that key to a
>         list. But i suspect we might not require those methods if the
>         leap of stream.collector(toList()) can be grasped.
> 
>         The same applies to toMap. I think it is easier to
>         understand/read if it does just one conceptual thing: elements
>         are keys, elements are mapped to values, conflicting keys
>         result in an exception. If that does not fit ones requirements
>         use groupingBy.
> 
>         Paul.
> 
>         > In my experience, I do a ctrl-space and look for my target
>         return type on
>         > the right-hand-side of the IDE popup, and then I try to fill
>         in the missing
>         > information, such as parameters.  In this case, having to
>         provide toList()
>         > would probably be a stumbling block for me, as the IDE is
>         not as good when
>         > it comes to suggesting expressions for parameters.
>         >
>         > I sort of like the symmetry with collect(toList()) but not
>         enough to make
>         > up for the loss.
>         >
>         >
>         >
>         > On Tue, Apr 9, 2013 at 2:16 PM, Brian Goetz
>         <brian.goetz at oracle.com <mailto:brian.goetz at oracle.com>> wrote:
>         >
>         >> Paul suggested the following possible simplification for
>         groupingBy.  It
>         >> is somewhat counterintuitive at first glance, in that it
>         removes the most
>         >> commonly used form (!), but might make things easier to
>         grasp in the long
>         >> run (aided by good docs.)
>         >>
>         >> Recall we currently have four forms of groupingBy:
>         >>
>         >>    // classifier only -- maps keys to list of matching
> elements
>         >>    Collector<T, Map<K, List<T>>>
>         >>    groupingBy(Function<? super T, ? extends K> classifier)
>         >>
>         >>    // Like above, but with explicit map ctor
>         >>    <T, K, M extends Map<K, List<T>>>
>         >>    Collector<T, M>
>         >>    groupingBy(Function<? super T, ? extends K> classifier,
>         >>               Supplier<M> mapFactory)
>         >>
>         >>    // basic cascaded form
>         >>    Collector<T, Map<K, D>>
>         >>    groupingBy(Function<? super T, ? extends K> classifier,
>         >>               Collector<T, D> downstream)
>         >>
>         >>    // cascaded form with explicit ctor
>         >>    <T, K, D, M extends Map<K, D>>
>         >>    Collector<T, M>
>         >>    groupingBy(Function<? super T, ? extends K> classifier,
>         >>               Supplier<M> mapFactory,
>         >>               Collector<T, D> downstream)
>         >>
>         >> Plus four corresponding forms for groupingByConcurrent.
>         >>
>         >> The first form is likely to be the most common, as it is
>         the traditional
>         >> "group by".  It is equivalent to:
>         >>
>         >>  groupingBy(classifier, toList());
>         >>
>         >> The proposal is: Drop the first two forms.  Just as users
>         can learn that
>         >> to collect elements into a list, you do:
>         >>
>         >>  collect(toList())
>         >>
>         >> people can learn that to do the simple form of groupBy, you
>         can do:
>         >>
>         >>  collect(groupingBy(f, toList());
>         >>
>         >> Which also reads perfectly well.
>         >>
>         >> By cutting the number of forms in half, it helps users to
>         realize that
>         >> groupingBy does just one thing -- classifies elements by
>         key, and collects
>         >> elements associated with that key.  Obviously the docs for
>         groupingBy can
>         >> show examples of the simple grouping as well as more
>         sophisticated
>         >> groupings.
>         >>
>         >>
> 
> 
> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20130410/c45b48a7/attachment.html 

From paul.sandoz at oracle.com  Thu Apr 11 05:09:58 2013
From: paul.sandoz at oracle.com (Paul Sandoz)
Date: Thu, 11 Apr 2013 14:09:58 +0200
Subject: Whither FlatMapper?
In-Reply-To: <CB902830-3601-473E-BBDC-4C3088A7121F@gmail.com>
References: <5161F773.6050705@oracle.com>
	<CAMUF1SmgMq67e3kksT6QcDws1yitNNGSZLVDXd9Wu6tSFSKFsw@mail.gmail.com>
	<51631A39.30001@cs.oswego.edu> <51632319.4040704@oracle.com>
	<51634C6A.1080301@oracle.com>
	<CB902830-3601-473E-BBDC-4C3088A7121F@gmail.com>
Message-ID: <6CCCD759-F544-48C8-9C79-62C316200B11@oracle.com>

An initial version of StreamBuilder has been pushed:

  http://hg.openjdk.java.net/lambda/lambda/jdk/rev/105d2c765fae

It is optimized for 0 and 1 elements (reused for singleton streams). In addition an optimization has been implemented when using forEach on the head of the stream. Those two optmizations should reduce the performance gap between the stream-based flatMap and the consumer-based flatMap.

Currently StreamBuilder does not allow for reuse, easy to add that though.

Paul.

On Apr 9, 2013, at 1:14 AM, Sam Pullara <spullara at gmail.com> wrote:

> That seems reasonable to me.
> 
> Sam
> 
> On Apr 8, 2013, at 4:02 PM, Brian Goetz <brian.goetz at oracle.com> wrote:
> 
>> Actually, there is an allocation-free path to get almost the Consumer-version performance with the non-consumer version, using the proposed StreamBuilder type (that also implements Spliterator and Stream, so "building" is allocation-free), and stuffing that into a ThreadLocal:
>> 
>> ThreadLocal<StreamBuilder> tl = ...
>> 
>> ...
>> 
>> stream.flatMap(e -> {
>>     StreamBuilder sb = tl.get();
>>     sb.init();
>>     // stuff elements into sb
>>     return sb.build();  // basically a no-op
>> });
>> 
>> So I recant my earlier statement that there's no efficient way to simulate the consumer form.  Its just ugly.
>> 
>> And the above can be captured by a wrapping helper:
>> 
>> Function<T, Stream<U>> = wrapWithThreadLocalStreamBuilder(
>>    (T t, Consumer<U> target) -> { /* old way */ });
>> 
>> So, I'm even more firmly in the "remove it" camp.
>> 
>> On 4/8/2013 4:05 PM, Brian Goetz wrote:
>>> A slight correction: if we remove the flatMap(FlatMapper), there is no
>>> fluent form that is as efficient as the removed form that accepts (T,
>>> Consumer<T>), since there's no other way to get your hands on the
>>> downstream Sink.  (Not that this dampens my enthusiasm for removing it
>>> much.)
>>> 
>>> For the truly diffident, a middle ground does exist: remove FlatMapper
>>> and its six brothers as a named SAM, and replace it with BiConsumer<T,
>>> Consumer<T>>, leaving both forms of flatMap methods in place:
>>>  flatMap(Function<T,STream<U>>)
>>>  flapMap(BiConsumer<T, Consumer<U>>)
>>> 
>>> The main advantage being that the package javadoc is not polluted by
>>> seven forms of FlatMapper.
>>> 
>>> On 4/8/2013 3:27 PM, Doug Lea wrote:
>>>> On 04/07/13 19:01, Sam Pullara wrote:
>>>>> I'm a big fan of the current FlatMapper stuff that takes a Consumer.
>>>>> Much more
>>>>> efficient and straightforward when you don't have a stream or
>>>>> collection to just
>>>>> return. Here is some code that uses 3 of them for good effect:
>>>> 
>>>> I think the main issue is whether, given the user reactions so far, we
>>>> should insist on people using a generally better but non-obvious
>>>> approach to flat-mapping. Considering that anyone *could* write their own
>>>> FlatMappers layered on top of existing functionality (we could
>>>> even show how to do it as a code example somewhere), I'm with
>>>> Brian on this: give people the obvious forms in the API. People
>>>> who are most likely to use it are the least likely to be obsessive
>>>> about its performance. And when they are, they can learn about
>>>> alternatives.
>>>> 
>>>> -Doug
>>>> 
> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20130411/0e52a8e9/attachment-0001.html 

From mike.duigou at oracle.com  Thu Apr 11 09:32:09 2013
From: mike.duigou at oracle.com (Mike Duigou)
Date: Thu, 11 Apr 2013 09:32:09 -0700
Subject: Map Default Methods
Message-ID: <C985CBA6-79F4-46FE-9954-081EB704C641@oracle.com>

Hi Doug;

I wanted to call your attention to three points in the the current ongoing review of the proposed Map default methods.

- I've added an additional default getOrDefault() to ConcurrentMap which preserves the atomic behaviour of ConcurrentMap at the cost of not supporting null values in maps.

- I've changed the method documentation warning regarding synchronization, atomicity, concurrency. Please ensure that it still matches your intent:

     * <p>The default implementation makes no guarantees about synchronization
     * or atomicity properties of this method. Any class which wishes to provide
     * specific synchronization, atomicity or concurrency behaviour should
     * override this method.

- The retry behaviour of the compute(), computeIfPresent() and merge() defaults makes sense for concurrent maps but possibly not for non-concurrent maps. For non-concurrent maps the retry behaviour will mask concurrent usage errors. How do you feel about moving these defaults (along with computeIfAbsent()) to ConcurrentMap and providing implementations that generate ConcurrentModificationException for the Map defaults?

Thanks,

Mike

From brian.goetz at oracle.com  Thu Apr 11 09:49:24 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 11 Apr 2013 12:49:24 -0400
Subject: Map Default Methods
In-Reply-To: <C985CBA6-79F4-46FE-9954-081EB704C641@oracle.com>
References: <C985CBA6-79F4-46FE-9954-081EB704C641@oracle.com>
Message-ID: <5166E994.5000603@oracle.com>

If getOrDefault is going to hold up the train here, we should consider 
peeling it off and handle separately, since it was only added as a 
"while we're here" and not currently used by any of the code that this 
putback is blocking.

On 4/11/2013 12:32 PM, Mike Duigou wrote:
> Hi Doug;
>
> I wanted to call your attention to three points in the the current ongoing review of the proposed Map default methods.
>
> - I've added an additional default getOrDefault() to ConcurrentMap which preserves the atomic behaviour of ConcurrentMap at the cost of not supporting null values in maps.
>
> - I've changed the method documentation warning regarding synchronization, atomicity, concurrency. Please ensure that it still matches your intent:
>
>       * <p>The default implementation makes no guarantees about synchronization
>       * or atomicity properties of this method. Any class which wishes to provide
>       * specific synchronization, atomicity or concurrency behaviour should
>       * override this method.
>
> - The retry behaviour of the compute(), computeIfPresent() and merge() defaults makes sense for concurrent maps but possibly not for non-concurrent maps. For non-concurrent maps the retry behaviour will mask concurrent usage errors. How do you feel about moving these defaults (along with computeIfAbsent()) to ConcurrentMap and providing implementations that generate ConcurrentModificationException for the Map defaults?
>
> Thanks,
>
> Mike
>

From mike.duigou at oracle.com  Thu Apr 11 09:52:58 2013
From: mike.duigou at oracle.com (Mike Duigou)
Date: Thu, 11 Apr 2013 09:52:58 -0700
Subject: Map Default Methods
In-Reply-To: <5166E994.5000603@oracle.com>
References: <C985CBA6-79F4-46FE-9954-081EB704C641@oracle.com>
	<5166E994.5000603@oracle.com>
Message-ID: <565BCBEF-41E4-450A-9D36-B23A377CE9BD@oracle.com>

I don't think any of these are blockers for the current review. We can change our answers later in future commits.

Mike

On Apr 11 2013, at 09:49 , Brian Goetz wrote:

> If getOrDefault is going to hold up the train here, we should consider peeling it off and handle separately, since it was only added as a "while we're here" and not currently used by any of the code that this putback is blocking.
> 
> On 4/11/2013 12:32 PM, Mike Duigou wrote:
>> Hi Doug;
>> 
>> I wanted to call your attention to three points in the the current ongoing review of the proposed Map default methods.
>> 
>> - I've added an additional default getOrDefault() to ConcurrentMap which preserves the atomic behaviour of ConcurrentMap at the cost of not supporting null values in maps.
>> 
>> - I've changed the method documentation warning regarding synchronization, atomicity, concurrency. Please ensure that it still matches your intent:
>> 
>>      * <p>The default implementation makes no guarantees about synchronization
>>      * or atomicity properties of this method. Any class which wishes to provide
>>      * specific synchronization, atomicity or concurrency behaviour should
>>      * override this method.
>> 
>> - The retry behaviour of the compute(), computeIfPresent() and merge() defaults makes sense for concurrent maps but possibly not for non-concurrent maps. For non-concurrent maps the retry behaviour will mask concurrent usage errors. How do you feel about moving these defaults (along with computeIfAbsent()) to ConcurrentMap and providing implementations that generate ConcurrentModificationException for the Map defaults?
>> 
>> Thanks,
>> 
>> Mike
>> 


From dl at cs.oswego.edu  Thu Apr 11 10:31:19 2013
From: dl at cs.oswego.edu (Doug Lea)
Date: Thu, 11 Apr 2013 13:31:19 -0400
Subject: Map Default Methods
In-Reply-To: <C985CBA6-79F4-46FE-9954-081EB704C641@oracle.com>
References: <C985CBA6-79F4-46FE-9954-081EB704C641@oracle.com>
Message-ID: <5166F367.70202@cs.oswego.edu>

On 04/11/13 12:32, Mike Duigou wrote:

> - I've added an additional default getOrDefault() to ConcurrentMap which preserves the atomic behaviour of ConcurrentMap at the cost of not supporting null values in maps.
>

I suppose this is OK. As mentioned in some list discussion, the
unfortunate part is that ConcurrentMap does not explicitly ban
null either. So all this does is push the issue one level deeper.

On the other hand, all known implementations ban nulls because it
would be  stupid to support them -- for example putIfAbsent is
useless is such cases. So the on-paper issue doesn't have any
interesting impact.


> - I've changed the method documentation warning regarding synchronization, atomicity, concurrency. Please ensure that it still matches your intent:
>
>       * <p>The default implementation makes no guarantees about synchronization
>       * or atomicity properties of this method. Any class which wishes to provide
>       * specific synchronization, atomicity or concurrency behaviour should
>       * override this method.
>

Change to to avoid wishery:

... Any implementation providing atomicity guarantees must override this method
and document its concurrency properties.

-Doug


From brian.goetz at oracle.com  Thu Apr 11 10:51:00 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 11 Apr 2013 13:51:00 -0400
Subject: Dividing Streams.java
Message-ID: <5166F804.50101@oracle.com>

Joe quite correctly pointed out in the survey that Streams.java is a mix 
of two things for two audiences:

  - Utility methods for users to generate streams, like intRange()
  - Low level methods for library writers to generate streams from 
things like iterators or spliterators.

Merging them in one file is confusing, because users come away with the 
idea that writing spliterators is something they're supposed to do, 
whereas in reality, if we've done our jobs, they should never even be 
aware that spliterators exist.  So I think we should separate them into 
a "high level" and "low level" bag of tricks.

Since today, Paul has added some new ones:
  - singletonStream(v) (four flavors)
  - builder() (four flavors)

So, we have to identify appropriate homes for the two groupings, and 
separate them.  Here's a first cut at separating them:

High level:
   xxxRange
   xxxBuilder
   emptyXxxStream
   singletonXxxStream
   concat
   zip

Low level:
   all spliterator-related stream building methods

Not sure where (or even if):
   iterate (given T0 and f, infinite stream of T0, f(T0), f(f(T0)), ...)
   generate (infinite stream of independent applications of a generator, 
good for infinite constant and random streams, though not much else, 
used by impl of Random.{ints,longs,gaussians}).

Others that we've talked about adding:
   ints(), longs()  // to enable things like ints().filter(...).limit(n)
   indexedGenerate(i -> T)


I think the high-level stuff should stay in Streams.  So we need a name 
for the low-level stuff.  (Which also then becomes the right home for 
"how do I turn my data sturcture into a stream" doc.)

What should we call that?

From tim at peierls.net  Thu Apr 11 11:08:44 2013
From: tim at peierls.net (Tim Peierls)
Date: Thu, 11 Apr 2013 14:08:44 -0400
Subject: Dividing Streams.java
In-Reply-To: <5166F804.50101@oracle.com>
References: <5166F804.50101@oracle.com>
Message-ID: <CA+F8eeTo6sqjvE0H5x_eqiDAhTbBokgTNV_Z7KFQOmmdsn9yFQ@mail.gmail.com>

On Thu, Apr 11, 2013 at 1:51 PM, Brian Goetz <brian.goetz at oracle.com> wrote:

> I think the high-level stuff should stay in Streams.  So we need a name
> for the low-level stuff.  (Which also then becomes the right home for "how
> do I turn my data sturcture into a stream" doc.)
>
> What should we call that?
>

Streams.Internal

Never mind that they aren't really internal. It needs to sound like you're
breaking the manufacturer's seal if you use it.

And having it nested means it's not too far away, but not in your face if
you're looking at Streams.

--tim
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20130411/1582215b/attachment.html 

From joe.bowbeer at gmail.com  Thu Apr 11 15:05:49 2013
From: joe.bowbeer at gmail.com (Joe Bowbeer)
Date: Thu, 11 Apr 2013 15:05:49 -0700
Subject: Dividing Streams.java
In-Reply-To: <CA+F8eeTo6sqjvE0H5x_eqiDAhTbBokgTNV_Z7KFQOmmdsn9yFQ@mail.gmail.com>
References: <5166F804.50101@oracle.com>
	<CA+F8eeTo6sqjvE0H5x_eqiDAhTbBokgTNV_Z7KFQOmmdsn9yFQ@mail.gmail.com>
Message-ID: <CAHzJPEpQ=0N04U0hcOO5CQYxajn3uO3wMWfDgYFZYmYPrkF0aw@mail.gmail.com>

I would hide everything that mentions Spliterator (or descendant) in its
signature.

I would not hide infinite stream toys such as iterate or generate.  These
are easy to understand and use, even if they have limited use -- which is
not the case with doubleParallelStream and friends.


On Thu, Apr 11, 2013 at 11:08 AM, Tim Peierls <tim at peierls.net> wrote:

> On Thu, Apr 11, 2013 at 1:51 PM, Brian Goetz <brian.goetz at oracle.com>wrote:
>
>> I think the high-level stuff should stay in Streams.  So we need a name
>> for the low-level stuff.  (Which also then becomes the right home for "how
>> do I turn my data sturcture into a stream" doc.)
>>
>> What should we call that?
>>
>
> Streams.Internal
>
> Never mind that they aren't really internal. It needs to sound like you're
> breaking the manufacturer's seal if you use it.
>
> And having it nested means it's not too far away, but not in your face if
> you're looking at Streams.
>
> --tim
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20130411/2f192a25/attachment.html 

From brian.goetz at oracle.com  Thu Apr 11 15:20:11 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 11 Apr 2013 18:20:11 -0400
Subject: Dividing Streams.java
In-Reply-To: <CAHzJPEpQ=0N04U0hcOO5CQYxajn3uO3wMWfDgYFZYmYPrkF0aw@mail.gmail.com>
References: <5166F804.50101@oracle.com>
	<CA+F8eeTo6sqjvE0H5x_eqiDAhTbBokgTNV_Z7KFQOmmdsn9yFQ@mail.gmail.com>
	<CAHzJPEpQ=0N04U0hcOO5CQYxajn3uO3wMWfDgYFZYmYPrkF0aw@mail.gmail.com>
Message-ID: <5167371B.9020605@oracle.com>

StreamImplementors ?
StreamViews ?
SpliteratorToStream ?


On 4/11/2013 6:05 PM, Joe Bowbeer wrote:
> I would hide everything that mentions Spliterator (or descendant) in its
> signature.
>
> I would not hide infinite stream toys such as iterate or generate.
>   These are easy to understand and use, even if they have limited use --
> which is not the case with doubleParallelStream and friends.
>
>
> On Thu, Apr 11, 2013 at 11:08 AM, Tim Peierls <tim at peierls.net
> <mailto:tim at peierls.net>> wrote:
>
>     On Thu, Apr 11, 2013 at 1:51 PM, Brian Goetz <brian.goetz at oracle.com
>     <mailto:brian.goetz at oracle.com>> wrote:
>
>         I think the high-level stuff should stay in Streams.  So we need
>         a name for the low-level stuff.  (Which also then becomes the
>         right home for "how do I turn my data sturcture into a stream" doc.)
>
>         What should we call that?
>
>
>     Streams.Internal
>
>     Never mind that they aren't really internal. It needs to sound like
>     you're breaking the manufacturer's seal if you use it.
>
>     And having it nested means it's not too far away, but not in your
>     face if you're looking at Streams.
>
>     --tim
>
>

From dl at cs.oswego.edu  Thu Apr 11 15:49:28 2013
From: dl at cs.oswego.edu (Doug Lea)
Date: Thu, 11 Apr 2013 18:49:28 -0400
Subject: Dividing Streams.java
In-Reply-To: <5166F804.50101@oracle.com>
References: <5166F804.50101@oracle.com>
Message-ID: <51673DF8.1050301@cs.oswego.edu>

On 04/11/13 13:51, Brian Goetz wrote:
> Joe quite correctly pointed out in the survey that Streams.java is a mix of two
> things for two audiences:
>
>   - Utility methods for users to generate streams, like intRange()
>   - Low level methods for library writers to generate streams from things like
> iterators or spliterators.
>

I'm not too tempted by this. Classes Collections and Arrays have lots
of stuff and people don't seem to complain.

-Doug


From joe.bowbeer at gmail.com  Thu Apr 11 16:01:44 2013
From: joe.bowbeer at gmail.com (Joe Bowbeer)
Date: Thu, 11 Apr 2013 16:01:44 -0700
Subject: Dividing Streams.java
In-Reply-To: <51673DF8.1050301@cs.oswego.edu>
References: <5166F804.50101@oracle.com>
	<51673DF8.1050301@cs.oswego.edu>
Message-ID: <CAHzJPEqCAvWTKzLZWObdmysJ8MMnA5VBfm1CuwXWbCe1qoSahw@mail.gmail.com>

But I am (and represent) Joe Programmer, and I've already complained :O

At the top of the list is the confusing name doubleParallelStream, which
does not create two parallel streams!

It's very difficult to find anything useful in there, and the ones that
take Spliterator arguments are a devil to figure out how to use, which adds
to Joe's frustration.

Simply removing everything that references a spliterator thing cleans it up
a lot.


On Thu, Apr 11, 2013 at 3:49 PM, Doug Lea <dl at cs.oswego.edu> wrote:

> On 04/11/13 13:51, Brian Goetz wrote:
>
>> Joe quite correctly pointed out in the survey that Streams.java is a mix
>> of two
>> things for two audiences:
>>
>>   - Utility methods for users to generate streams, like intRange()
>>   - Low level methods for library writers to generate streams from things
>> like
>> iterators or spliterators.
>>
>>
> I'm not too tempted by this. Classes Collections and Arrays have lots
> of stuff and people don't seem to complain.
>
> -Doug
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20130411/55499bfb/attachment.html 

From tim at peierls.net  Thu Apr 11 16:13:44 2013
From: tim at peierls.net (Tim Peierls)
Date: Thu, 11 Apr 2013 19:13:44 -0400
Subject: Dividing Streams.java
In-Reply-To: <CAHzJPEqCAvWTKzLZWObdmysJ8MMnA5VBfm1CuwXWbCe1qoSahw@mail.gmail.com>
References: <5166F804.50101@oracle.com> <51673DF8.1050301@cs.oswego.edu>
	<CAHzJPEqCAvWTKzLZWObdmysJ8MMnA5VBfm1CuwXWbCe1qoSahw@mail.gmail.com>
Message-ID: <CA+F8eeTou7k6SSQ7twOe7Mr5H2p1P6D2mHq_t+6Ah63G2CRMxQ@mail.gmail.com>

Agreed. What harm is there in parceling the Spliterator stuff off?

On Thu, Apr 11, 2013 at 7:01 PM, Joe Bowbeer <joe.bowbeer at gmail.com> wrote:

> But I am (and represent) Joe Programmer, and I've already complained :O
>
> At the top of the list is the confusing name doubleParallelStream, which
> does not create two parallel streams!
>
> It's very difficult to find anything useful in there, and the ones that
> take Spliterator arguments are a devil to figure out how to use, which adds
> to Joe's frustration.
>
> Simply removing everything that references a spliterator thing cleans it
> up a lot.
>
>
> On Thu, Apr 11, 2013 at 3:49 PM, Doug Lea <dl at cs.oswego.edu> wrote:
>
>> On 04/11/13 13:51, Brian Goetz wrote:
>>
>>> Joe quite correctly pointed out in the survey that Streams.java is a mix
>>> of two
>>> things for two audiences:
>>>
>>>   - Utility methods for users to generate streams, like intRange()
>>>   - Low level methods for library writers to generate streams from
>>> things like
>>> iterators or spliterators.
>>>
>>>
>> I'm not too tempted by this. Classes Collections and Arrays have lots
>> of stuff and people don't seem to complain.
>>
>> -Doug
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20130411/71d0d588/attachment.html 

From dl at cs.oswego.edu  Thu Apr 11 16:20:28 2013
From: dl at cs.oswego.edu (Doug Lea)
Date: Thu, 11 Apr 2013 19:20:28 -0400
Subject: Dividing Streams.java
In-Reply-To: <5166F804.50101@oracle.com>
References: <5166F804.50101@oracle.com>
Message-ID: <5167453C.8070102@cs.oswego.edu>

On 04/11/13 13:51, Brian Goetz wrote:
> What should we call that?

Still not too tempted, but I don't care enough to argue.

One name with precedent is StreamSupport (like j.u.c.locks.LockSupport).

-Doug


From tim at peierls.net  Thu Apr 11 16:24:53 2013
From: tim at peierls.net (Tim Peierls)
Date: Thu, 11 Apr 2013 19:24:53 -0400
Subject: Dividing Streams.java
In-Reply-To: <5167453C.8070102@cs.oswego.edu>
References: <5166F804.50101@oracle.com>
	<5167453C.8070102@cs.oswego.edu>
Message-ID: <CA+F8eeQ7gjGHN4Vose3M2hAxxa9Bp6C2BG6aK5p4UN1XtmQ4GA@mail.gmail.com>

I like StreamSupport.

--tim

On Thu, Apr 11, 2013 at 7:20 PM, Doug Lea <dl at cs.oswego.edu> wrote:

> On 04/11/13 13:51, Brian Goetz wrote:
>
>> What should we call that?
>>
>
> Still not too tempted, but I don't care enough to argue.
>
> One name with precedent is StreamSupport (like j.u.c.locks.LockSupport).
>
> -Doug
>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20130411/f86ff6e2/attachment.html 

From joe.bowbeer at gmail.com  Thu Apr 11 16:39:51 2013
From: joe.bowbeer at gmail.com (Joe Bowbeer)
Date: Thu, 11 Apr 2013 16:39:51 -0700
Subject: Dividing Streams.java
In-Reply-To: <CA+F8eeQ7gjGHN4Vose3M2hAxxa9Bp6C2BG6aK5p4UN1XtmQ4GA@mail.gmail.com>
References: <5166F804.50101@oracle.com> <5167453C.8070102@cs.oswego.edu>
	<CA+F8eeQ7gjGHN4Vose3M2hAxxa9Bp6C2BG6aK5p4UN1XtmQ4GA@mail.gmail.com>
Message-ID: <CAHzJPEoKErsDrzuCOef54DWZvcvfDdvKjhbgqtKUzfTYA6i0rA@mail.gmail.com>

I also like StreamSupport


On Thu, Apr 11, 2013 at 4:24 PM, Tim Peierls <tim at peierls.net> wrote:

> I like StreamSupport.
>
> --tim
>
>
> On Thu, Apr 11, 2013 at 7:20 PM, Doug Lea <dl at cs.oswego.edu> wrote:
>
>> On 04/11/13 13:51, Brian Goetz wrote:
>>
>>> What should we call that?
>>>
>>
>> Still not too tempted, but I don't care enough to argue.
>>
>> One name with precedent is StreamSupport (like j.u.c.locks.LockSupport).
>>
>> -Doug
>>
>>
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20130411/b27ec6eb/attachment-0001.html 

From brian.goetz at oracle.com  Thu Apr 11 17:14:38 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 11 Apr 2013 20:14:38 -0400
Subject: Dividing Streams.java
In-Reply-To: <CAHzJPEqCAvWTKzLZWObdmysJ8MMnA5VBfm1CuwXWbCe1qoSahw@mail.gmail.com>
References: <5166F804.50101@oracle.com> <51673DF8.1050301@cs.oswego.edu>
	<CAHzJPEqCAvWTKzLZWObdmysJ8MMnA5VBfm1CuwXWbCe1qoSahw@mail.gmail.com>
Message-ID: <516751EE.7090207@oracle.com>

I'm with Joe on this one.  Because streams are new, people are looking 
for the best way to get the stream they want, and often settle 
(incorrectly) on "I guess I have to write a spliterator."  Which is not 
the best way to have a good experience.

We have the same problem with docs.  There's a whole lot of 
documentation (that needs to be written) for people writing 
spliterators, that is totally confusing and overwhelming for people who 
just want an integer range.

On 4/11/2013 7:01 PM, Joe Bowbeer wrote:
> But I am (and represent) Joe Programmer, and I've already complained :O
>
> At the top of the list is the confusing name doubleParallelStream, which
> does not create two parallel streams!
>
> It's very difficult to find anything useful in there, and the ones that
> take Spliterator arguments are a devil to figure out how to use, which
> adds to Joe's frustration.
>
> Simply removing everything that references a spliterator thing cleans it
> up a lot.
>
>
> On Thu, Apr 11, 2013 at 3:49 PM, Doug Lea <dl at cs.oswego.edu
> <mailto:dl at cs.oswego.edu>> wrote:
>
>     On 04/11/13 13:51, Brian Goetz wrote:
>
>         Joe quite correctly pointed out in the survey that Streams.java
>         is a mix of two
>         things for two audiences:
>
>            - Utility methods for users to generate streams, like intRange()
>            - Low level methods for library writers to generate streams
>         from things like
>         iterators or spliterators.
>
>
>     I'm not too tempted by this. Classes Collections and Arrays have lots
>     of stuff and people don't seem to complain.
>
>     -Doug
>
>

From brian.goetz at oracle.com  Thu Apr 11 17:15:10 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 11 Apr 2013 20:15:10 -0400
Subject: Dividing Streams.java
In-Reply-To: <CAHzJPEoKErsDrzuCOef54DWZvcvfDdvKjhbgqtKUzfTYA6i0rA@mail.gmail.com>
References: <5166F804.50101@oracle.com> <5167453C.8070102@cs.oswego.edu>
	<CA+F8eeQ7gjGHN4Vose3M2hAxxa9Bp6C2BG6aK5p4UN1XtmQ4GA@mail.gmail.com>
	<CAHzJPEoKErsDrzuCOef54DWZvcvfDdvKjhbgqtKUzfTYA6i0rA@mail.gmail.com>
Message-ID: <5167520E.6010206@oracle.com>

Better than anything I came up with!

StreamSupport it is.

On 4/11/2013 7:39 PM, Joe Bowbeer wrote:
> I also like StreamSupport
>
>
> On Thu, Apr 11, 2013 at 4:24 PM, Tim Peierls <tim at peierls.net
> <mailto:tim at peierls.net>> wrote:
>
>     I like StreamSupport.
>
>     --tim
>
>
>     On Thu, Apr 11, 2013 at 7:20 PM, Doug Lea <dl at cs.oswego.edu
>     <mailto:dl at cs.oswego.edu>> wrote:
>
>         On 04/11/13 13:51, Brian Goetz wrote:
>
>             What should we call that?
>
>
>         Still not too tempted, but I don't care enough to argue.
>
>         One name with precedent is StreamSupport (like
>         j.u.c.locks.LockSupport).
>
>         -Doug
>
>
>
>
>

From brian.goetz at oracle.com  Thu Apr 11 17:37:32 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 11 Apr 2013 20:37:32 -0400
Subject: Dividing Streams.java
In-Reply-To: <CAHzJPEoKErsDrzuCOef54DWZvcvfDdvKjhbgqtKUzfTYA6i0rA@mail.gmail.com>
References: <5166F804.50101@oracle.com> <5167453C.8070102@cs.oswego.edu>
	<CA+F8eeQ7gjGHN4Vose3M2hAxxa9Bp6C2BG6aK5p4UN1XtmQ4GA@mail.gmail.com>
	<CAHzJPEoKErsDrzuCOef54DWZvcvfDdvKjhbgqtKUzfTYA6i0rA@mail.gmail.com>
Message-ID: <5167574C.7090902@oracle.com>

Done.

On 4/11/2013 7:39 PM, Joe Bowbeer wrote:
> I also like StreamSupport
>
>
> On Thu, Apr 11, 2013 at 4:24 PM, Tim Peierls <tim at peierls.net
> <mailto:tim at peierls.net>> wrote:
>
>     I like StreamSupport.
>
>     --tim
>
>
>     On Thu, Apr 11, 2013 at 7:20 PM, Doug Lea <dl at cs.oswego.edu
>     <mailto:dl at cs.oswego.edu>> wrote:
>
>         On 04/11/13 13:51, Brian Goetz wrote:
>
>             What should we call that?
>
>
>         Still not too tempted, but I don't care enough to argue.
>
>         One name with precedent is StreamSupport (like
>         j.u.c.locks.LockSupport).
>
>         -Doug
>
>
>
>
>

From brian.goetz at oracle.com  Thu Apr 11 18:13:04 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 11 Apr 2013 21:13:04 -0400
Subject: Dividing Streams.java
In-Reply-To: <5166F804.50101@oracle.com>
References: <5166F804.50101@oracle.com>
Message-ID: <51675FA0.3080800@oracle.com>

> Not sure where (or even if):
>    iterate (given T0 and f, infinite stream of T0, f(T0), f(f(T0)), ...)
>    generate (infinite stream of independent applications of a generator,
> good for infinite constant and random streams, though not much else,
> used by impl of Random.{ints,longs,gaussians}).

Anyone want to argue for narrowing or expanding this list?

> Others that we've talked about adding:
>    ints(), longs()  // to enable things like ints().filter(...).limit(n)

Anyone compelled by these?  I kind of like them.

Do we want to add inclusive as well as half-open ranges?

>    indexedGenerate(i -> T)

Anyone compelled by this one?


From paul.sandoz at oracle.com  Fri Apr 12 07:50:49 2013
From: paul.sandoz at oracle.com (Paul Sandoz)
Date: Fri, 12 Apr 2013 16:50:49 +0200
Subject: Streams.generate: infinite or finite?
Message-ID: <898A6DD0-35CC-46C8-990A-05B334927A79@oracle.com>

Hi,

Currently Streams.generate produces an infinite stream. This is theoretically nice but splits poorly (right-balanced trees). 

Implementation-wise Streams.generate creates a spliterator from an iterator:

    public static<T> Stream<T> generate(Supplier<T> s) {
        Objects.requireNonNull(s);
        InfiniteIterator<T> iterator = s::get;
        return StreamSupport.stream(Spliterators.spliteratorUnknownSize(
                iterator,
                Spliterator.ORDERED | Spliterator.IMMUTABLE));
    }

The method is used in java.util.Random:

    public IntStream ints() {
        return Streams.generateInt(this::nextInt);
    }

There might be a nasty surprise in store for developers that expect the randomly generated stream of int values to have the best parallel performance.


We can change Streams.generate to be finite (or not know to be finite in the time allotted to do some computation) by implementing as follows:

    public static<T> Stream<T> generate(Supplier<T> s) {
      return Streams.longRange(0, Long.MAX_VALUE).mapToObj(i -> s.get());
    }

This will yield better parallel performance because the splits are balanced. 

We can further change to:

    public static<T> Stream<T> generate(Supplier<T> s) {
      return Streams.longs().mapToObj(i -> s.get());
    }

if we introduce the longs() idiom.


I think we should go finite! and add Streams.longs().  Agree? or disagree?

Then it is actually questionable if Streams.generate should exist at all. It does have some pedagogic value since the idiom Streams.longs().map() may not be obvious. So i would be mostly inclined to keep it for that reason.

Paul.

From brian.goetz at oracle.com  Fri Apr 12 08:14:45 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Fri, 12 Apr 2013 11:14:45 -0400
Subject: Streams.generate: infinite or finite?
In-Reply-To: <898A6DD0-35CC-46C8-990A-05B334927A79@oracle.com>
References: <898A6DD0-35CC-46C8-990A-05B334927A79@oracle.com>
Message-ID: <516824E5.6000409@oracle.com>

I think this is slightly unfortunate but I think we're probably stuck 
doing it anyway.  The theoretical benefit of generating an infinite 
stream is not really worth the very real cost to people trying to use 
these in parallel and getting surprising performance.

+1 on ints(), longs()
+1 on making these finite
+1 on making generate(f) essentially be longs().map(f::get)

On 4/12/2013 10:50 AM, Paul Sandoz wrote:
> Hi,
>
> Currently Streams.generate produces an infinite stream. This is theoretically nice but splits poorly (right-balanced trees).
>
> Implementation-wise Streams.generate creates a spliterator from an iterator:
>
>      public static<T> Stream<T> generate(Supplier<T> s) {
>          Objects.requireNonNull(s);
>          InfiniteIterator<T> iterator = s::get;
>          return StreamSupport.stream(Spliterators.spliteratorUnknownSize(
>                  iterator,
>                  Spliterator.ORDERED | Spliterator.IMMUTABLE));
>      }
>
> The method is used in java.util.Random:
>
>      public IntStream ints() {
>          return Streams.generateInt(this::nextInt);
>      }
>
> There might be a nasty surprise in store for developers that expect the randomly generated stream of int values to have the best parallel performance.
>
>
> We can change Streams.generate to be finite (or not know to be finite in the time allotted to do some computation) by implementing as follows:
>
>      public static<T> Stream<T> generate(Supplier<T> s) {
>        return Streams.longRange(0, Long.MAX_VALUE).mapToObj(i -> s.get());
>      }
>
> This will yield better parallel performance because the splits are balanced.
>
> We can further change to:
>
>      public static<T> Stream<T> generate(Supplier<T> s) {
>        return Streams.longs().mapToObj(i -> s.get());
>      }
>
> if we introduce the longs() idiom.
>
>
> I think we should go finite! and add Streams.longs().  Agree? or disagree?
>
> Then it is actually questionable if Streams.generate should exist at all. It does have some pedagogic value since the idiom Streams.longs().map() may not be obvious. So i would be mostly inclined to keep it for that reason.
>
> Paul.
>

From brian.goetz at oracle.com  Fri Apr 12 14:22:22 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Fri, 12 Apr 2013 17:22:22 -0400
Subject: StreamBuilder
Message-ID: <51687B0E.2030602@oracle.com>

In the wake of taking away flatMap(FlatMapper), we had to provide a way 
for people to build streams by generation.  For object-valued streams, 
they could just use an ArrayList, but for primitive-valued streams, 
there's no easy buffering tool.  (Hopefully also we can make 
StreamBuffer more efficient that ArrayList (at least it doesn't have to 
copy elements on resize)).

What we've got now is:

interface StreamBuilder<T> extends Consumer<T> {
     Stream<T> build();
}

with nested specializations for OfInt, OfLong, OfDouble.

and factories in Streams to get one:

     static<T> StreamBuilder<T> builder();

Someone commented that it wasn't obvious that StreamBuilder is just a 
buffer, and the Stream class itself is a sort of builder for streams 
(you add stages one by one), so maybe a better name might be 
StreamBuffer?  And I guess the corresponding factories are 
Streams.makeBuffer()?  .newBuffer()?


From brian.goetz at oracle.com  Sat Apr 13 08:24:02 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Sat, 13 Apr 2013 11:24:02 -0400
Subject: Stream constructors for stream(Iterator) in StreamSupport?
Message-ID: <51697892.5010205@oracle.com>

Currently StreamSupport contains seq/par versions of
   stream(Spliterator)
   stream(Supplier<Spliterator>)
for ref/int/long/double.

In java.util.Spliterators, there are adapters to turn an Iterator into a 
Spliterator.

I think we should add convenience factories for

   stream(Iterator)

to StreamSupport as well.

From tim at peierls.net  Sat Apr 13 09:06:40 2013
From: tim at peierls.net (Tim Peierls)
Date: Sat, 13 Apr 2013 12:06:40 -0400
Subject: Stream constructors for stream(Iterator) in StreamSupport?
In-Reply-To: <51697892.5010205@oracle.com>
References: <51697892.5010205@oracle.com>
Message-ID: <CA+F8eeR383TGDE6Vhmc2yZbx5mkvM6Pu5PjZupQ6tMjDwfRcjw@mail.gmail.com>

Doesn't that seem like something that belongs in Streams? If you're stuck
with a legacy API that exposes Iterator but not Iterable, you'd still want
to be able to make a Stream out of it, and you wouldn't want to have to
look in StreamSupport for that. It's a lot different from
stream(Spliterator).

On Sat, Apr 13, 2013 at 11:24 AM, Brian Goetz <brian.goetz at oracle.com>wrote:

> Currently StreamSupport contains seq/par versions of
>   stream(Spliterator)
>   stream(Supplier<Spliterator>)
> for ref/int/long/double.
>
> In java.util.Spliterators, there are adapters to turn an Iterator into a
> Spliterator.
>
> I think we should add convenience factories for
>
>   stream(Iterator)
>
> to StreamSupport as well.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20130413/da2e345f/attachment.html 

From brian.goetz at oracle.com  Sat Apr 13 12:25:54 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Sat, 13 Apr 2013 15:25:54 -0400
Subject: Stream constructors for stream(Iterator) in StreamSupport?
In-Reply-To: <CA+F8eeR383TGDE6Vhmc2yZbx5mkvM6Pu5PjZupQ6tMjDwfRcjw@mail.gmail.com>
References: <51697892.5010205@oracle.com>
	<CA+F8eeR383TGDE6Vhmc2yZbx5mkvM6Pu5PjZupQ6tMjDwfRcjw@mail.gmail.com>
Message-ID: <5169B142.6000002@oracle.com>

Good question.  Here's my reasoning about why I thought it lives better 
in SS than S; let me know if you find this argument compelling.  (Also, 
this speaks to an area currently missing in the docs.)

There are lots of ways to make a stream, and some are better than 
others.  The absolute worst is via an Iterator.

Best way is to get one from your data source directly (e.g., 
ArrayList.stream()).  The streams provided by collections and other JDK 
classes have highly optimized spliterators (thanks Doug!), work directly 
with knowledge of the data structure, are late-binding to minimize 
CME-like interference, and preserve the most information (such as 
sorted-ness, sized-ness, distinct-ness) that the streams framework can 
use directly to optimize execution.

The next best way is via one of the factories in Streams -- things like 
intRange, iterate, generate.  These are mire flexible than they first 
appear; for example, if you have a function int -> T, and you want to 
generate a sequence of f(0), f(1), ... f(n) in a parallel-friendly way, 
you can just do:

   intRange(0, n).map(f);

The next best way is via a Spliterator that properly declares its 
properties, is SIZED, SUBSIZED, and has a good trySplit implementation. 
  These will ensure that things decompose well.  Many of the JDK 
spliterators have these characteristics.

We then slide down the scale of spliterator quality; SUBSIZED is 
probably the first to go, then SIZED, then trySplit.  As the spliterator 
quality degrades, the quality of decomposition and opportunity for 
pipeline optimization degrades too.

We then come to the bottom of the barrel, iterators.  Making a 
Spliterator from an iterator sucks in at least the following ways:
  - Splitting will suck.  We can still extract some parallelism for 
high-Q problems, but it will never be good, placing a lid on how much 
parallelism you can get.
  - Iterators throw away a lot of useful information about the 
underlying data source, such as its size.  It may be that whoever wrote 
the Iterator knows the size, but the Iterator does not.  (We've got an 
iterator+size to spliterator conversion, but that's brittle because of 
"early binding" to the size information.)
  - Element access overhead.  One of the reasons for doing Spliterator 
is that Iterator sucks so badly!  (High per-element cost; two method 
calls per element, often with redundant computation due to required 
defensive coding; Iterator protocol often requires lookahead and 
buffering; inherent race between hasNext() and next().)  So you're 
taking a sucky way to get elements out of a source, and wrapping it with 
more junk.


So, while Iterator to Stream is still a fine last resort, putting it in 
Streams will likely have the unfortunate effect of guiding users to the 
worst way of making a stream, without fully understanding the tradeoffs.


On 4/13/2013 12:06 PM, Tim Peierls wrote:
> Doesn't that seem like something that belongs in Streams? If you're
> stuck with a legacy API that exposes Iterator but not Iterable, you'd
> still want to be able to make a Stream out of it, and you wouldn't want
> to have to look in StreamSupport for that. It's a lot different from
> stream(Spliterator).
>
> On Sat, Apr 13, 2013 at 11:24 AM, Brian Goetz <brian.goetz at oracle.com
> <mailto:brian.goetz at oracle.com>> wrote:
>
>     Currently StreamSupport contains seq/par versions of
>        stream(Spliterator)
>        stream(Supplier<Spliterator>)
>     for ref/int/long/double.
>
>     In java.util.Spliterators, there are adapters to turn an Iterator
>     into a Spliterator.
>
>     I think we should add convenience factories for
>
>        stream(Iterator)
>
>     to StreamSupport as well.
>
>

From tim at peierls.net  Sat Apr 13 13:57:20 2013
From: tim at peierls.net (Tim Peierls)
Date: Sat, 13 Apr 2013 16:57:20 -0400
Subject: Stream constructors for stream(Iterator) in StreamSupport?
In-Reply-To: <5169B142.6000002@oracle.com>
References: <51697892.5010205@oracle.com>
	<CA+F8eeR383TGDE6Vhmc2yZbx5mkvM6Pu5PjZupQ6tMjDwfRcjw@mail.gmail.com>
	<5169B142.6000002@oracle.com>
Message-ID: <CA+F8eeSc6r-JY=atnuZzcTmyqS-476xmvL1UKQUJ+hkkfDCbAQ@mail.gmail.com>

On Sat, Apr 13, 2013 at 3:25 PM, Brian Goetz <brian.goetz at oracle.com> wrote:

> There are lots of ways to make a stream, and some are better than others.
>  ...
>
> Best way is to get one from your data source directly (e.g.,
> ArrayList.stream()).  ...
> The next best way is via one of the factories in Streams -- things like
> intRange, iterate, generate. ...
> The next best way is via a Spliterator that properly declares its
> properties, is SIZED, SUBSIZED, and has a good trySplit implementation.
> We then slide down the scale of spliterator quality; ...
> We then come to the bottom of the barrel, iterators.  ...
> So, while Iterator to Stream is still a fine last resort, putting it in
> Streams will likely have the unfortunate effect of guiding users to the
> worst way of making a stream, without fully understanding the tradeoffs.


That's a great taxonomy of ways to make a stream, but the division of
static factory methods into Streams and StreamSupport wasn't, as I
understood it, along those lines. It was about keeping concepts that most
users aren't going to want to mess with (i.e., Spliterator) out of their
line of sight.

If all you have is an Iterator, you don't want have to go down into the
basement to get something that turns it into a Stream. Put those tradeoffs
on the packaging but leave the package in the kitchen.

--tim
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20130413/a056c0f0/attachment.html 

From brian.goetz at oracle.com  Sat Apr 13 14:15:30 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Sat, 13 Apr 2013 17:15:30 -0400
Subject: Stream constructors for stream(Iterator) in StreamSupport?
In-Reply-To: <CA+F8eeSc6r-JY=atnuZzcTmyqS-476xmvL1UKQUJ+hkkfDCbAQ@mail.gmail.com>
References: <51697892.5010205@oracle.com>
	<CA+F8eeR383TGDE6Vhmc2yZbx5mkvM6Pu5PjZupQ6tMjDwfRcjw@mail.gmail.com>
	<5169B142.6000002@oracle.com>
	<CA+F8eeSc6r-JY=atnuZzcTmyqS-476xmvL1UKQUJ+hkkfDCbAQ@mail.gmail.com>
Message-ID: <5169CAF2.5070109@oracle.com>

> That's a great taxonomy of ways to make a stream, but the division of
> static factory methods into Streams and StreamSupport wasn't, as I
> understood it, along those lines. It was about keeping concepts that
> most users aren't going to want to mess with (i.e., Spliterator) out of
> their line of sight.

That was indeed part of it.  But the other part of it was guiding them 
away from low-level tools they might mistakenly (mis)use because we'd 
not sufficiently labeled things into bins of "for users" and "for 
library writers."  I still think stream-from-iterator is low-level, 
because it involves choosing things like stream flags (and doing it 
wrong will have bad results.)

> If all you have is an Iterator, you don't want have to go down into the
> basement to get something that turns it into a Stream. Put those
> tradeoffs on the packaging but leave the package in the kitchen.

So, the tension here is:
  - helping the poor users for whom all they can get is an Iterator, and 
they want a stream;
  - avoiding the moral hazard of encouraging people to think that 
Iterator is actually a *good* way to make a stream (which might even 
encourage them to write more Iterators!)  I want them to think is 
"Iterator is the last possible resort for making a stream, including a 
number of resorts that I should learn about first before writing an 
Iterator."

The current status quo is either better or worse in this, depending on 
which of the two above forces you are more compelled by.  The way to 
make a Stream from an iterator currently is:

   Streams.stream(Spliterators.spliteratorUnknownSize(iterator, flags));
   Streams.stream(Spliterators.spliterator(iterator, size, flags));

Which do the job but suffer from poor discoverability.  On the other 
hand, it has none of the moral hazard -- its pretty clear you're nailing 
bags on bags, and I don't think this status quo is so awful.


Another direction (as discussed previously without convergence) would be 
to augment Iterable with a stream() method.  This helps users of 
non-Collection Iterable classes, but still has some of the moral hazard 
as it does not put enough pressure on writers of Iterable classes to 
write better stream() implementations.


From joe.bowbeer at gmail.com  Sat Apr 13 14:32:05 2013
From: joe.bowbeer at gmail.com (Joe Bowbeer)
Date: Sat, 13 Apr 2013 14:32:05 -0700
Subject: Stream constructors for stream(Iterator) in StreamSupport?
In-Reply-To: <5169CAF2.5070109@oracle.com>
References: <51697892.5010205@oracle.com>
	<CA+F8eeR383TGDE6Vhmc2yZbx5mkvM6Pu5PjZupQ6tMjDwfRcjw@mail.gmail.com>
	<5169B142.6000002@oracle.com>
	<CA+F8eeSc6r-JY=atnuZzcTmyqS-476xmvL1UKQUJ+hkkfDCbAQ@mail.gmail.com>
	<5169CAF2.5070109@oracle.com>
Message-ID: <CAHzJPEpy+vyZN-5yvd_bggscW5a+Aq_5G5Zp8tzrTx0e55wxiQ@mail.gmail.com>

I think the signature fits better in support. (The default method
alternative negates this.) However another argument is based on the
expected users. If they are not library writers then it should not go in
the support class.
On Apr 13, 2013 2:15 PM, "Brian Goetz" <brian.goetz at oracle.com> wrote:

> That's a great taxonomy of ways to make a stream, but the division of
>> static factory methods into Streams and StreamSupport wasn't, as I
>> understood it, along those lines. It was about keeping concepts that
>> most users aren't going to want to mess with (i.e., Spliterator) out of
>> their line of sight.
>>
>
> That was indeed part of it.  But the other part of it was guiding them
> away from low-level tools they might mistakenly (mis)use because we'd not
> sufficiently labeled things into bins of "for users" and "for library
> writers."  I still think stream-from-iterator is low-level, because it
> involves choosing things like stream flags (and doing it wrong will have
> bad results.)
>
>  If all you have is an Iterator, you don't want have to go down into the
>> basement to get something that turns it into a Stream. Put those
>> tradeoffs on the packaging but leave the package in the kitchen.
>>
>
> So, the tension here is:
>  - helping the poor users for whom all they can get is an Iterator, and
> they want a stream;
>  - avoiding the moral hazard of encouraging people to think that Iterator
> is actually a *good* way to make a stream (which might even encourage them
> to write more Iterators!)  I want them to think is "Iterator is the last
> possible resort for making a stream, including a number of resorts that I
> should learn about first before writing an Iterator."
>
> The current status quo is either better or worse in this, depending on
> which of the two above forces you are more compelled by.  The way to make a
> Stream from an iterator currently is:
>
>   Streams.stream(Spliterators.**spliteratorUnknownSize(**iterator,
> flags));
>   Streams.stream(Spliterators.**spliterator(iterator, size, flags));
>
> Which do the job but suffer from poor discoverability.  On the other hand,
> it has none of the moral hazard -- its pretty clear you're nailing bags on
> bags, and I don't think this status quo is so awful.
>
>
> Another direction (as discussed previously without convergence) would be
> to augment Iterable with a stream() method.  This helps users of
> non-Collection Iterable classes, but still has some of the moral hazard as
> it does not put enough pressure on writers of Iterable classes to write
> better stream() implementations.
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20130413/fae89baf/attachment.html 

From brian.goetz at oracle.com  Sat Apr 13 15:02:55 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Sat, 13 Apr 2013 18:02:55 -0400
Subject: Stream constructors for stream(Iterator) in StreamSupport?
In-Reply-To: <CAHzJPEpy+vyZN-5yvd_bggscW5a+Aq_5G5Zp8tzrTx0e55wxiQ@mail.gmail.com>
References: <51697892.5010205@oracle.com>
	<CA+F8eeR383TGDE6Vhmc2yZbx5mkvM6Pu5PjZupQ6tMjDwfRcjw@mail.gmail.com>
	<5169B142.6000002@oracle.com>
	<CA+F8eeSc6r-JY=atnuZzcTmyqS-476xmvL1UKQUJ+hkkfDCbAQ@mail.gmail.com>
	<5169CAF2.5070109@oracle.com>
	<CAHzJPEpy+vyZN-5yvd_bggscW5a+Aq_5G5Zp8tzrTx0e55wxiQ@mail.gmail.com>
Message-ID: <5169D60F.3010300@oracle.com>

I think Tim's concern is that he recognizes two categories of potential 
users:

  - Library writers who want to expose a stream() method but are not 
ready to take the plunge to Spliterator;
  - Poor users who want a stream and all they can get out of their damn 
library is an Iterator.

The question is, is there a way to not hose the second category without 
making things worse?

On 4/13/2013 5:32 PM, Joe Bowbeer wrote:
> I think the signature fits better in support. (The default method
> alternative negates this.) However another argument is based on the
> expected users. If they are not library writers then it should not go in
> the support class.
>
> On Apr 13, 2013 2:15 PM, "Brian Goetz" <brian.goetz at oracle.com
> <mailto:brian.goetz at oracle.com>> wrote:
>
>         That's a great taxonomy of ways to make a stream, but the
>         division of
>         static factory methods into Streams and StreamSupport wasn't, as I
>         understood it, along those lines. It was about keeping concepts that
>         most users aren't going to want to mess with (i.e., Spliterator)
>         out of
>         their line of sight.
>
>
>     That was indeed part of it.  But the other part of it was guiding
>     them away from low-level tools they might mistakenly (mis)use
>     because we'd not sufficiently labeled things into bins of "for
>     users" and "for library writers."  I still think
>     stream-from-iterator is low-level, because it involves choosing
>     things like stream flags (and doing it wrong will have bad results.)
>
>         If all you have is an Iterator, you don't want have to go down
>         into the
>         basement to get something that turns it into a Stream. Put those
>         tradeoffs on the packaging but leave the package in the kitchen.
>
>
>     So, the tension here is:
>       - helping the poor users for whom all they can get is an Iterator,
>     and they want a stream;
>       - avoiding the moral hazard of encouraging people to think that
>     Iterator is actually a *good* way to make a stream (which might even
>     encourage them to write more Iterators!)  I want them to think is
>     "Iterator is the last possible resort for making a stream, including
>     a number of resorts that I should learn about first before writing
>     an Iterator."
>
>     The current status quo is either better or worse in this, depending
>     on which of the two above forces you are more compelled by.  The way
>     to make a Stream from an iterator currently is:
>
>        Streams.stream(Spliterators.__spliteratorUnknownSize(__iterator,
>     flags));
>        Streams.stream(Spliterators.__spliterator(iterator, size, flags));
>
>     Which do the job but suffer from poor discoverability.  On the other
>     hand, it has none of the moral hazard -- its pretty clear you're
>     nailing bags on bags, and I don't think this status quo is so awful.
>
>
>     Another direction (as discussed previously without convergence)
>     would be to augment Iterable with a stream() method.  This helps
>     users of non-Collection Iterable classes, but still has some of the
>     moral hazard as it does not put enough pressure on writers of
>     Iterable classes to write better stream() implementations.
>

From brian.goetz at oracle.com  Sun Apr 14 15:48:11 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Sun, 14 Apr 2013 18:48:11 -0400
Subject: Default methods for SAM types
Message-ID: <516B322B.8070207@oracle.com>

Here's a list of the default *instance* methods we've currently got (or 
should have, for consistency) for SAM types in java.util.function. 
Static methods will follow in a separate message.

Predicate<T>:

   Predicate<T> and(Predicate<? super T>)
   Predicate<T> or(Predicate<? super T>)
   Predicate<T> xor(Predicate<? super T>)
   Predicate<T> negate()

(same for {Int,Long,Double}Predicate, BiPredicate.)

Function<T,U>:

     <V> Function<V, R> compose(Function<? super V, ? extends T> before)
     <V> Function<T, V> andThen(Function<? super R, ? extends V> after)

BiFunction:

     <V> BiFunction<T, U, V> andThen(Function<? super R, ? extends V> after)

Consumer<T>:

     Consumer<T> chain(Consumer<? super T> other)

(Same for {Int,Long,Double}Consumer, BiConsumer.)

This seems a reasonable minimal set; not even clear whether 
BiFunction.andThen carries its weight.  Is there anything that's 
obviously missing?  Are there any of these that don't carry their weight?


From brian.goetz at oracle.com  Sun Apr 14 17:27:24 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Sun, 14 Apr 2013 20:27:24 -0400
Subject: Static methods for SAM types
Message-ID: <516B496C.9040506@oracle.com>

As of Java 8 we can have static methods in interfaces.  Here's a small 
set of static methods for the java.util.function SAM types.

Function:
     public static <T> Function<T, T> identity()

Actually, that's the end of my must-have list!

But there have also been some others suggested, here's a sampling.  Do 
any of these speak to anyone?

Predicate:

     // Like o::isEquals, but also works if target is null
     public static<T> Predicate<T> isEqual(Object target)

Function:

     static <T> Function<T, T> substitute(T subOut, T subIn) {
         return t -> Objects.equals(subOut, t) ? subIn : t;
     }

     static <T, R> Function<T, R> constant(R constant) {
         return t -> constant;
     }

     // Or could be default Predicate.asFunction(forTrue, forFalse)
     static<T, R> forPredicate(Predicate<T>, R forTrue, R forFalse)

     // Like map::get, but throws if not present
     static <R, T> Function<T, R> forMap(Map<? super T, ? extends R> map)


From mike.duigou at oracle.com  Mon Apr 15 16:30:08 2013
From: mike.duigou at oracle.com (Mike Duigou)
Date: Mon, 15 Apr 2013 16:30:08 -0700
Subject: RFR : 8010953: Add primitive summary statistics utils
Message-ID: <D67B8527-D5EF-4F7D-BDEE-3FBE4BFEF9FD@oracle.com>

Hello all;

Another integration review in the JSR-335 libraries series. These three classes provide a utility for conveniently finding count, sum, min,  max and average of ints, longs or doubles. They can be used with existing code but will most likely be used with the Collectors utilities or directly with primitive or boxed streams.

http://cr.openjdk.java.net/~mduigou/JDK-8010953/1/webrev/

(this is an updated version of the webrev sent to core-libs-dev).

Mike

From david.holmes at oracle.com  Tue Apr 16 03:10:17 2013
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 16 Apr 2013 20:10:17 +1000
Subject: RFR : 8010953: Add primitive summary statistics utils
In-Reply-To: <D67B8527-D5EF-4F7D-BDEE-3FBE4BFEF9FD@oracle.com>
References: <D67B8527-D5EF-4F7D-BDEE-3FBE4BFEF9FD@oracle.com>
Message-ID: <516D2389.2050609@oracle.com>

Hi Mike,

On 16/04/2013 9:30 AM, Mike Duigou wrote:
> Hello all;
>
> Another integration review in the JSR-335 libraries series. These three classes provide a utility for conveniently finding count, sum, min,  max and average of ints, longs or doubles. They can be used with existing code but will most likely be used with the Collectors utilities or directly with primitive or boxed streams.
>
> http://cr.openjdk.java.net/~mduigou/JDK-8010953/1/webrev/
>
> (this is an updated version of the webrev sent to core-libs-dev).

A couple of minor nits:

DoubleSummaryStatistics:

getMin/getMax:

The main doc should read the same as the @return. Presently the initial 
sentence:

  120      * Returns the recorded value closest to {@code 
Double.NEGATIVE_INFINITY},
  121      * {@code Double.POSITIVE_INFINITY} if no values have been 
recorded or if
  122      * any recorded value is NaN, then the result is NaN.

if very difficult to read and parse. The @return is much simpler - just 
say minimum/maximum value recorded, rather than "value closest to ...".

In all classes:

minimal -> minimum
maximal -> maximum

David

> Mike
>

From mike.duigou at oracle.com  Tue Apr 16 12:45:26 2013
From: mike.duigou at oracle.com (Mike Duigou)
Date: Tue, 16 Apr 2013 12:45:26 -0700
Subject: RFR : 8010953: Add primitive summary statistics utils
In-Reply-To: <516D2389.2050609@oracle.com>
References: <D67B8527-D5EF-4F7D-BDEE-3FBE4BFEF9FD@oracle.com>
	<516D2389.2050609@oracle.com>
Message-ID: <3E218825-9EE6-4F59-BC76-6424A6C67F19@oracle.com>


On Apr 16 2013, at 03:10 , David Holmes wrote:

> Hi Mike,
> 
> On 16/04/2013 9:30 AM, Mike Duigou wrote:
>> Hello all;
>> 
>> Another integration review in the JSR-335 libraries series. These three classes provide a utility for conveniently finding count, sum, min,  max and average of ints, longs or doubles. They can be used with existing code but will most likely be used with the Collectors utilities or directly with primitive or boxed streams.
>> 
>> http://cr.openjdk.java.net/~mduigou/JDK-8010953/1/webrev/
>> 
>> (this is an updated version of the webrev sent to core-libs-dev).
> 
> A couple of minor nits:
> 
> DoubleSummaryStatistics:
> 
> getMin/getMax:
> 
> The main doc should read the same as the @return. Presently the initial sentence:
> 
> 120      * Returns the recorded value closest to {@code Double.NEGATIVE_INFINITY},
> 121      * {@code Double.POSITIVE_INFINITY} if no values have been recorded or if
> 122      * any recorded value is NaN, then the result is NaN.
> 
> if very difficult to read and parse. The @return is much simpler - just say minimum/maximum value recorded, rather than "value closest to ...".

Done. (the  "value closest to ..." text was copied from Math.min/max").

> 
> In all classes:
> 
> minimal -> minimum
> maximal -> maximum

Done.


> David
> 
>> Mike
>> 


From brian.goetz at oracle.com  Tue Apr 16 12:47:32 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Tue, 16 Apr 2013 15:47:32 -0400
Subject: Setting of UNORDERED on concurrent collectors
In-Reply-To: <516315C1.3080509@oracle.com>
References: <516315C1.3080509@oracle.com>
Message-ID: <516DAAD4.1070506@oracle.com>

We never converged on this one.  Here's another stab at framing the 
problem.  (I'm pretty much ready to time out and make these collectors 
declare UNORDERED unless someone can convince me otherwise.)

Streams consist of source + intermediate ops + terminal.

Denote ordered/unordered variants of these as SO/SU, IO/IU/IA 
(A=agnostic), and TA/TU.  We can define the ordered-ness of any stream 
pipeline as follows:

ordered(SO) = true
ordered(SU) = false

ordered(X+IO) = true
ordered(X+IU) = false
ordered(X+IA) = ordered(X)

ordered(X+TA) = ordered(X)
ordered(X+TU) = false

A concurrent calculation may be performed if the stream is unordered 
*and* the destination is concurrent.

Collectors like toSet() are marked TU, and toList() are marked TA. 
Collectors like groupingByConcurrent will definitely be marked 
concurrent.  Question is, should it be marked TA or TU?  Either choice 
is defensible.

Note that collectors individually get to choose whether they are TA or 
TU.  Choices we make for our canned collectors need not affect 
user-written collectors.  The model can handle both and users can 
predict the behavior of both.

On 4/8/2013 3:08 PM, Brian Goetz wrote:
> Now that we've removed collectUnordered in favor of a more general
> unordered() op, we should consider what should be the default behavior for:
>
>    orderedStream.collect(groupingByConcurrent(f))
>
> Currently, the collect-to-ConcurrentMap collectors are *not* defined as
> UNORDERED.  Which means, if the stream is ordered, we will attempt to do
> an ordered collection anyway, which is incompatible with concurrent
> collection, and we will do the plain old partition-and-merge with
> ConcurrentMap.
>
> Here, we have competing evidence for the user intent.  On the one hand,
> the stream is ordered, and the user could have chosen unordered.  On the
> other, the user has asked for concurrent grouping.  Its not 100% obvious
> which should win.
>
> On the other hand, ordered map collections are so awful that they will
> almost certainly be unhappy with the performance if they forget to say
> unordered here in the parallel case (and it makes no difference in the
> sequential case.)  So I'm inclined to make groupingByConcurrent /
> toConcurrentMap be UNORDERED collections.

From brian.goetz at oracle.com  Wed Apr 17 11:48:39 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Wed, 17 Apr 2013 14:48:39 -0400
Subject: Survey: API review for Collectors
Message-ID: <516EEE87.3000103@oracle.com>

I've posted a survey for the static methods in Collectors at:
   https://www.surveymonkey.com/s/LGV85RH

I think the API here is mostly done; the spec and tutorial material 
still need work.

Usual password.


From brian.goetz at oracle.com  Wed Apr 17 12:56:09 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Wed, 17 Apr 2013 15:56:09 -0400
Subject: Survey: API review for Collectors
In-Reply-To: <516EEE87.3000103@oracle.com>
References: <516EEE87.3000103@oracle.com>
Message-ID: <516EFE59.8060006@oracle.com>

Sam asks:

Why not specifically return an immutable set from toSet()?

I'd like this too.  This is due to the limitation of not being able to 
support a post-function in Collector.  (See my recent post on this on 
lambda-dev: 
http://mail.openjdk.java.net/pipermail/lambda-dev/2013-April/009394.html).

Related question: why does toStringJoiner not expose prefix/suffix?

This one is related -- we don't have any easy way to treat the root 
caculation differently from sub-results.  If we used () -> new 
StringJoiner(", ", "[", "]") as our result container, then if we did a 
parallel collect of intRange(1,6) where it happens to get split in half, 
the reslut would be:

  [1,2,3],[4,5,6]

instead of

  [1,2,3,4,5,6]

TO be able to do this right, we'd have to use a different construction 
of the stringjoiner for non-root results.  Extending Collector to handle 
all these cases (efficiently) was going to be pretty disruptive.  So we 
said goodbye to these pretty use cases.


On 4/17/2013 2:48 PM, Brian Goetz wrote:
> I've posted a survey for the static methods in Collectors at:
>    https://www.surveymonkey.com/s/LGV85RH
>
> I think the API here is mostly done; the spec and tutorial material
> still need work.
>
> Usual password.
>
>

From tim at peierls.net  Wed Apr 17 14:06:47 2013
From: tim at peierls.net (Tim Peierls)
Date: Wed, 17 Apr 2013 17:06:47 -0400
Subject: Survey: API review for Collectors
In-Reply-To: <516EFE59.8060006@oracle.com>
References: <516EEE87.3000103@oracle.com>
	<516EFE59.8060006@oracle.com>
Message-ID: <CA+F8eeRKKZE0GTqmu=kMm-yNLaHyaeEs6OjHuWaJhyACQ42q2A@mail.gmail.com>

On Wed, Apr 17, 2013 at 3:56 PM, Brian Goetz <brian.goetz at oracle.com> wrote:

> TO be able to do this right, we'd have to use a different construction of
> the stringjoiner for non-root results.  Extending Collector to handle all
> these cases (efficiently) was going to be pretty disruptive.  So we said
> goodbye to these pretty use cases.


Good! "Cure" would have been worse than the disease.

--tim
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20130417/7f27d603/attachment.html 

From brian.goetz at oracle.com  Wed Apr 17 18:21:02 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Wed, 17 Apr 2013 21:21:02 -0400
Subject: mergers
Message-ID: <516F4A7E.2080602@oracle.com>

Collectors defines three merge functions:

   throwingMerger -- always throws
   firstWinsMerger -- takes first
   lastWinsMerger -- takes last

These are plain old BinaryOperators that can be used for Map.merge as 
well as the toMap collectors.

Someone commented that these look a little out of place in Collectors, 
and they are certainly not Collector-specific.  Is there a better place 
for them?


From paul.sandoz at oracle.com  Thu Apr 18 01:35:28 2013
From: paul.sandoz at oracle.com (Paul Sandoz)
Date: Thu, 18 Apr 2013 10:35:28 +0200
Subject: mergers
In-Reply-To: <516F4A7E.2080602@oracle.com>
References: <516F4A7E.2080602@oracle.com>
Message-ID: <9DBCAEC3-5E4F-494D-9352-4D830FEC8716@oracle.com>


On Apr 18, 2013, at 3:21 AM, Brian Goetz <brian.goetz at Oracle.COM> wrote:

> Collectors defines three merge functions:
> 
>  throwingMerger -- always throws
>  firstWinsMerger -- takes first
>  lastWinsMerger -- takes last
> 
> These are plain old BinaryOperators that can be used for Map.merge as well as the toMap collectors.
> 
> Someone commented that these look a little out of place in Collectors, and they are certainly not Collector-specific.  Is there a better place for them?
> 

Someone also commented (separately from a survey) that  method refs could be used instead? e.g. like using Integer::sum.

e.g. Objects::first, Objects::second, Objects:throwing

But i thought that might make it harder to correlate with map merging. 

They tend to read well when used with toMap code, but perhaps make more sense as static methods on Map due to Map.merge being present?

Paul.


From Paul.Sandoz at oracle.com  Thu Apr 18 02:25:54 2013
From: Paul.Sandoz at oracle.com (Paul Sandoz)
Date: Thu, 18 Apr 2013 11:25:54 +0200
Subject: Setting of UNORDERED on concurrent collectors
In-Reply-To: <516DAAD4.1070506@oracle.com>
References: <516315C1.3080509@oracle.com> <516DAAD4.1070506@oracle.com>
Message-ID: <27593794-2973-4D5B-943D-A8DD750678E4@oracle.com>


On Apr 16, 2013, at 9:47 PM, Brian Goetz <brian.goetz at Oracle.COM> wrote:

> We never converged on this one.  Here's another stab at framing the problem.  (I'm pretty much ready to time out and make these collectors declare UNORDERED unless someone can convince me otherwise.)
> 
> Streams consist of source + intermediate ops + terminal.
> 
> Denote ordered/unordered variants of these as SO/SU, IO/IU/IA (A=agnostic), and TA/TU.  We can define the ordered-ness of any stream pipeline as follows:
> 
> ordered(SO) = true
> ordered(SU) = false
> 
> ordered(X+IO) = true
> ordered(X+IU) = false
> ordered(X+IA) = ordered(X)
> 
> ordered(X+TA) = ordered(X)
> ordered(X+TU) = false
> 
> A concurrent calculation may be performed if the stream is unordered *and* the destination is concurrent.
> 
> Collectors like toSet() are marked TU, and toList() are marked TA. Collectors like groupingByConcurrent will definitely be marked concurrent.  Question is, should it be marked TA or TU?  Either choice is defensible.
> 

I think it should be TU,  even though it is only triggered when the upstream is unordered. The intent is, when triggered, that our concurrent collectors should be used with a forEach-like mechanism by which the collector concurrently receives elements in a temporal order.
 
Paul.


> Note that collectors individually get to choose whether they are TA or TU.  Choices we make for our canned collectors need not affect user-written collectors.  The model can handle both and users can predict the behavior of both.

From brian.goetz at oracle.com  Thu Apr 18 10:40:40 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 18 Apr 2013 13:40:40 -0400
Subject: mergers
In-Reply-To: <9DBCAEC3-5E4F-494D-9352-4D830FEC8716@oracle.com>
References: <516F4A7E.2080602@oracle.com>
	<9DBCAEC3-5E4F-494D-9352-4D830FEC8716@oracle.com>
Message-ID: <51703018.3070403@oracle.com>

I am OK with using method refs instead of function-returning methods. 
But I think key is that "merge" needs to appear in the name, because, 
while a function that returns the first of its arguments is useful, the 
key here is that we're trying to identify a set of reasonable merging 
policies that are useful when doing "dump a stream into a map".  I think 
even these three simple ones will greatly reduce people's need to write 
mergers themselves for toMap.

Having them live in some place more Mappy would be fine too, but I don't 
want to create a Maps class for them.  Are they important enough to be 
static methods on Map?  (I doubt it.)  So it mostly seems like they're 
in the "desirable to have, but not a great place to shove them" place 
now.  Is Collectors good enough, or do we have to think harder about 
making a better place?

On 4/18/2013 4:35 AM, Paul Sandoz wrote:
>
> On Apr 18, 2013, at 3:21 AM, Brian Goetz <brian.goetz at Oracle.COM> wrote:
>
>> Collectors defines three merge functions:
>>
>>   throwingMerger -- always throws
>>   firstWinsMerger -- takes first
>>   lastWinsMerger -- takes last
>>
>> These are plain old BinaryOperators that can be used for Map.merge as well as the toMap collectors.
>>
>> Someone commented that these look a little out of place in Collectors, and they are certainly not Collector-specific.  Is there a better place for them?
>>
>
> Someone also commented (separately from a survey) that  method refs could be used instead? e.g. like using Integer::sum.
>
> e.g. Objects::first, Objects::second, Objects:throwing
>
> But i thought that might make it harder to correlate with map merging.
>
> They tend to read well when used with toMap code, but perhaps make more sense as static methods on Map due to Map.merge being present?
>
> Paul.
>
>
>

From brian.goetz at oracle.com  Thu Apr 18 19:18:58 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Thu, 18 Apr 2013 22:18:58 -0400
Subject: Dividing Streams.java
In-Reply-To: <5166F804.50101@oracle.com>
References: <5166F804.50101@oracle.com>
Message-ID: <5170A992.80703@oracle.com>

Now that we've cleared away the spliterator methods from Streams, all, 
or nearly all, of the remaining methods in Streams are candidates for 
moving to the respective interfaces.  And in many ways get nicer when 
they do.

We've got:

   builder()
   emptyStream()
   singletonStream()
   iterate()
   generate()

for all the types (so emptyStream(), emptyIntStream(), etc), plus ranges 
for the numeric types.  Plus

   concat()
   zip()

for ref streams.

All of these are good candidate for statics in their respective interfaces:

   Stream.builder()
   Stream.emptyStream();
   IntStream.generate(f);
   IntStream.range(f);

They read well, most are "important" enough to live with the main 
interface, and the names get less redundant since we don't have to say 
"intRange" but just "range".

All of them?  Most of them?  None of them?

On 4/11/2013 1:51 PM, Brian Goetz wrote:
> Joe quite correctly pointed out in the survey that Streams.java is a mix
> of two things for two audiences:
>
>   - Utility methods for users to generate streams, like intRange()
>   - Low level methods for library writers to generate streams from
> things like iterators or spliterators.
>
> Merging them in one file is confusing, because users come away with the
> idea that writing spliterators is something they're supposed to do,
> whereas in reality, if we've done our jobs, they should never even be
> aware that spliterators exist.  So I think we should separate them into
> a "high level" and "low level" bag of tricks.
>
> Since today, Paul has added some new ones:
>   - singletonStream(v) (four flavors)
>   - builder() (four flavors)
>
> So, we have to identify appropriate homes for the two groupings, and
> separate them.  Here's a first cut at separating them:
>
> High level:
>    xxxRange
>    xxxBuilder
>    emptyXxxStream
>    singletonXxxStream
>    concat
>    zip
>
> Low level:
>    all spliterator-related stream building methods
>
> Not sure where (or even if):
>    iterate (given T0 and f, infinite stream of T0, f(T0), f(f(T0)), ...)
>    generate (infinite stream of independent applications of a generator,
> good for infinite constant and random streams, though not much else,
> used by impl of Random.{ints,longs,gaussians}).
>
> Others that we've talked about adding:
>    ints(), longs()  // to enable things like ints().filter(...).limit(n)
>    indexedGenerate(i -> T)
>
>
>
> I think the high-level stuff should stay in Streams.  So we need a name
> for the low-level stuff.  (Which also then becomes the right home for
> "how do I turn my data sturcture into a stream" doc.)
>
> What should we call that?

From paul.sandoz at oracle.com  Fri Apr 19 03:01:41 2013
From: paul.sandoz at oracle.com (Paul Sandoz)
Date: Fri, 19 Apr 2013 12:01:41 +0200
Subject: mergers
In-Reply-To: <51703018.3070403@oracle.com>
References: <516F4A7E.2080602@oracle.com>
	<9DBCAEC3-5E4F-494D-9352-4D830FEC8716@oracle.com>
	<51703018.3070403@oracle.com>
Message-ID: <35E1D4C2-B71C-4088-B35C-7E7E8DAA7D9A@oracle.com>


On Apr 18, 2013, at 7:40 PM, Brian Goetz <brian.goetz at Oracle.COM> wrote:

> I am OK with using method refs instead of function-returning methods. But I think key is that "merge" needs to appear in the name, because, while a function that returns the first of its arguments is useful, the key here is that we're trying to identify a set of reasonable merging policies that are useful when doing "dump a stream into a map".  I think even these three simple ones will greatly reduce people's need to write mergers themselves for toMap.

> 
> Having them live in some place more Mappy would be fine too, but I don't want to create a Maps class for them.  Are they important enough to be static methods on Map?  (I doubt it.)  So it mostly seems like they're in the "desirable to have, but not a great place to shove them" place now.  Is Collectors good enough, or do we have to think harder about making a better place?
> 

My inclination is Collectors is OK since those methods are designed to be closely associated with Collectors.toMap.

FWIW i think it is also possible to offset some of the need for "merge" name with some documentation in Collectors.toMap, however i still like the way it reads in code when those methods are used.

Paul.

> On 4/18/2013 4:35 AM, Paul Sandoz wrote:
>> 
>> On Apr 18, 2013, at 3:21 AM, Brian Goetz <brian.goetz at Oracle.COM> wrote:
>> 
>>> Collectors defines three merge functions:
>>> 
>>>  throwingMerger -- always throws
>>>  firstWinsMerger -- takes first
>>>  lastWinsMerger -- takes last
>>> 
>>> These are plain old BinaryOperators that can be used for Map.merge as well as the toMap collectors.
>>> 
>>> Someone commented that these look a little out of place in Collectors, and they are certainly not Collector-specific.  Is there a better place for them?
>>> 
>> 
>> Someone also commented (separately from a survey) that  method refs could be used instead? e.g. like using Integer::sum.
>> 
>> e.g. Objects::first, Objects::second, Objects:throwing
>> 
>> But i thought that might make it harder to correlate with map merging.
>> 
>> They tend to read well when used with toMap code, but perhaps make more sense as static methods on Map due to Map.merge being present?
>> 
>> Paul.
>> 
>> 
>> 


From spullara at gmail.com  Fri Apr 19 08:29:58 2013
From: spullara at gmail.com (Sam Pullara)
Date: Fri, 19 Apr 2013 08:29:58 -0700
Subject: Dividing Streams.java
In-Reply-To: <5170A992.80703@oracle.com>
References: <5166F804.50101@oracle.com> <5170A992.80703@oracle.com>
Message-ID: <6E8C35D9-3391-4F3A-9B6E-80F3DDF6F9C4@gmail.com>

I think it is a good idea to move all of them to their interfaces. Much easier to find.

Sam

On Apr 18, 2013, at 7:18 PM, Brian Goetz <brian.goetz at oracle.com> wrote:

> Now that we've cleared away the spliterator methods from Streams, all, or nearly all, of the remaining methods in Streams are candidates for moving to the respective interfaces.  And in many ways get nicer when they do.
> 
> We've got:
> 
>  builder()
>  emptyStream()
>  singletonStream()
>  iterate()
>  generate()
> 
> for all the types (so emptyStream(), emptyIntStream(), etc), plus ranges for the numeric types.  Plus
> 
>  concat()
>  zip()
> 
> for ref streams.
> 
> All of these are good candidate for statics in their respective interfaces:
> 
>  Stream.builder()
>  Stream.emptyStream();
>  IntStream.generate(f);
>  IntStream.range(f);
> 
> They read well, most are "important" enough to live with the main interface, and the names get less redundant since we don't have to say "intRange" but just "range".
> 
> All of them?  Most of them?  None of them?
> 
> On 4/11/2013 1:51 PM, Brian Goetz wrote:
>> Joe quite correctly pointed out in the survey that Streams.java is a mix
>> of two things for two audiences:
>> 
>>  - Utility methods for users to generate streams, like intRange()
>>  - Low level methods for library writers to generate streams from
>> things like iterators or spliterators.
>> 
>> Merging them in one file is confusing, because users come away with the
>> idea that writing spliterators is something they're supposed to do,
>> whereas in reality, if we've done our jobs, they should never even be
>> aware that spliterators exist.  So I think we should separate them into
>> a "high level" and "low level" bag of tricks.
>> 
>> Since today, Paul has added some new ones:
>>  - singletonStream(v) (four flavors)
>>  - builder() (four flavors)
>> 
>> So, we have to identify appropriate homes for the two groupings, and
>> separate them.  Here's a first cut at separating them:
>> 
>> High level:
>>   xxxRange
>>   xxxBuilder
>>   emptyXxxStream
>>   singletonXxxStream
>>   concat
>>   zip
>> 
>> Low level:
>>   all spliterator-related stream building methods
>> 
>> Not sure where (or even if):
>>   iterate (given T0 and f, infinite stream of T0, f(T0), f(f(T0)), ...)
>>   generate (infinite stream of independent applications of a generator,
>> good for infinite constant and random streams, though not much else,
>> used by impl of Random.{ints,longs,gaussians}).
>> 
>> Others that we've talked about adding:
>>   ints(), longs()  // to enable things like ints().filter(...).limit(n)
>>   indexedGenerate(i -> T)
>> 
>> 
>> 
>> I think the high-level stuff should stay in Streams.  So we need a name
>> for the low-level stuff.  (Which also then becomes the right home for
>> "how do I turn my data sturcture into a stream" doc.)
>> 
>> What should we call that?


From brian.goetz at oracle.com  Sat Apr 20 08:46:10 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Sat, 20 Apr 2013 11:46:10 -0400
Subject: Varargs stream factory methods
Message-ID: <5172B842.5090804@oracle.com>

Currently we have, in Arrays:

     public static <T> Stream<T> stream(T[] array) {
         return stream(array, 0, array.length);
     }

     public static IntStream stream(int[] array) {
         return stream(array, 0, array.length);
     }

etc.

We *could* make these varargs methods, which is useful as creating 
ad-hoc stream literals:

   Arrays.stream(1, 2, 4, 8).map(...)

The downside is that we would have to lose (or rename) methods like:

     public static IntStream stream(int[] array,
                                    int fromIndex, int toIndex) {

since stream(1, 2, 3) would be ambiguous.

Probably better, make these static factories in the various stream 
interfaces:

   Stream.of("foo", "bar")

   IntStream.of(1, 2, 4, 8)


From brian.goetz at oracle.com  Sat Apr 20 08:46:46 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Sat, 20 Apr 2013 11:46:46 -0400
Subject: Dividing Streams.java
In-Reply-To: <6E8C35D9-3391-4F3A-9B6E-80F3DDF6F9C4@gmail.com>
References: <5166F804.50101@oracle.com> <5170A992.80703@oracle.com>
	<6E8C35D9-3391-4F3A-9B6E-80F3DDF6F9C4@gmail.com>
Message-ID: <5172B866.3050007@oracle.com>

Unless anyone objects, I plan to do this.

On 4/19/2013 11:29 AM, Sam Pullara wrote:
> I think it is a good idea to move all of them to their interfaces. Much easier to find.
>
> Sam
>
> On Apr 18, 2013, at 7:18 PM, Brian Goetz <brian.goetz at oracle.com> wrote:
>
>> Now that we've cleared away the spliterator methods from Streams, all, or nearly all, of the remaining methods in Streams are candidates for moving to the respective interfaces.  And in many ways get nicer when they do.
>>
>> We've got:
>>
>>   builder()
>>   emptyStream()
>>   singletonStream()
>>   iterate()
>>   generate()
>>
>> for all the types (so emptyStream(), emptyIntStream(), etc), plus ranges for the numeric types.  Plus
>>
>>   concat()
>>   zip()
>>
>> for ref streams.
>>
>> All of these are good candidate for statics in their respective interfaces:
>>
>>   Stream.builder()
>>   Stream.emptyStream();
>>   IntStream.generate(f);
>>   IntStream.range(f);
>>
>> They read well, most are "important" enough to live with the main interface, and the names get less redundant since we don't have to say "intRange" but just "range".
>>
>> All of them?  Most of them?  None of them?
>>
>> On 4/11/2013 1:51 PM, Brian Goetz wrote:
>>> Joe quite correctly pointed out in the survey that Streams.java is a mix
>>> of two things for two audiences:
>>>
>>>   - Utility methods for users to generate streams, like intRange()
>>>   - Low level methods for library writers to generate streams from
>>> things like iterators or spliterators.
>>>
>>> Merging them in one file is confusing, because users come away with the
>>> idea that writing spliterators is something they're supposed to do,
>>> whereas in reality, if we've done our jobs, they should never even be
>>> aware that spliterators exist.  So I think we should separate them into
>>> a "high level" and "low level" bag of tricks.
>>>
>>> Since today, Paul has added some new ones:
>>>   - singletonStream(v) (four flavors)
>>>   - builder() (four flavors)
>>>
>>> So, we have to identify appropriate homes for the two groupings, and
>>> separate them.  Here's a first cut at separating them:
>>>
>>> High level:
>>>    xxxRange
>>>    xxxBuilder
>>>    emptyXxxStream
>>>    singletonXxxStream
>>>    concat
>>>    zip
>>>
>>> Low level:
>>>    all spliterator-related stream building methods
>>>
>>> Not sure where (or even if):
>>>    iterate (given T0 and f, infinite stream of T0, f(T0), f(f(T0)), ...)
>>>    generate (infinite stream of independent applications of a generator,
>>> good for infinite constant and random streams, though not much else,
>>> used by impl of Random.{ints,longs,gaussians}).
>>>
>>> Others that we've talked about adding:
>>>    ints(), longs()  // to enable things like ints().filter(...).limit(n)
>>>    indexedGenerate(i -> T)
>>>
>>>
>>>
>>> I think the high-level stuff should stay in Streams.  So we need a name
>>> for the low-level stuff.  (Which also then becomes the right home for
>>> "how do I turn my data sturcture into a stream" doc.)
>>>
>>> What should we call that?
>

From tim at peierls.net  Sat Apr 20 08:50:35 2013
From: tim at peierls.net (Tim Peierls)
Date: Sat, 20 Apr 2013 11:50:35 -0400
Subject: Varargs stream factory methods
In-Reply-To: <5172B842.5090804@oracle.com>
References: <5172B842.5090804@oracle.com>
Message-ID: <CA+F8eeQdd7nEefM1vxtb6E5tfDhNxC2ycevAgGHBwr22BH3AgQ@mail.gmail.com>

On Sat, Apr 20, 2013 at 11:46 AM, Brian Goetz <brian.goetz at oracle.com>wrote:

> Currently we have, in Arrays:
>
>     public static <T> Stream<T> stream(T[] array) {
>         return stream(array, 0, array.length);
>     }
>
>     public static IntStream stream(int[] array) {
>         return stream(array, 0, array.length);
>     }
>
> etc.
>
> We *could* make these varargs methods, which is useful as creating ad-hoc
> stream literals:
>
>   Arrays.stream(1, 2, 4, 8).map(...)
>
> The downside is that we would have to lose (or rename) methods like:
>
>     public static IntStream stream(int[] array,
>                                    int fromIndex, int toIndex) {
>
> since stream(1, 2, 3) would be ambiguous.
>
> Probably better, make these static factories in the various stream
> interfaces:
>
>   Stream.of("foo", "bar")
>
>   IntStream.of(1, 2, 4, 8)
>


I'm used to varargs static factories named "of" from Guava, so that last
approach appeals to me.

--tim
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20130420/79e5b936/attachment.html 

From spullara at gmail.com  Sat Apr 20 10:02:12 2013
From: spullara at gmail.com (Sam Pullara)
Date: Sat, 20 Apr 2013 10:02:12 -0700
Subject: Varargs stream factory methods
In-Reply-To: <5172B842.5090804@oracle.com>
References: <5172B842.5090804@oracle.com>
Message-ID: <6244092293909388376@unknownmsgid>

I like the .of() idea better than overloading .stream().

Sam

On Apr 20, 2013, at 8:47 AM, Brian Goetz <brian.goetz at oracle.com> wrote:

> Currently we have, in Arrays:
>
>    public static <T> Stream<T> stream(T[] array) {
>        return stream(array, 0, array.length);
>    }
>
>    public static IntStream stream(int[] array) {
>        return stream(array, 0, array.length);
>    }
>
> etc.
>
> We *could* make these varargs methods, which is useful as creating ad-hoc stream literals:
>
>  Arrays.stream(1, 2, 4, 8).map(...)
>
> The downside is that we would have to lose (or rename) methods like:
>
>    public static IntStream stream(int[] array,
>                                   int fromIndex, int toIndex) {
>
> since stream(1, 2, 3) would be ambiguous.
>
> Probably better, make these static factories in the various stream interfaces:
>
>  Stream.of("foo", "bar")
>
>  IntStream.of(1, 2, 4, 8)
>

From forax at univ-mlv.fr  Sat Apr 20 13:38:28 2013
From: forax at univ-mlv.fr (Remi Forax)
Date: Sat, 20 Apr 2013 22:38:28 +0200
Subject: Varargs stream factory methods
In-Reply-To: <6244092293909388376@unknownmsgid>
References: <5172B842.5090804@oracle.com> <6244092293909388376@unknownmsgid>
Message-ID: <5172FCC4.5030403@univ-mlv.fr>

On 04/20/2013 07:02 PM, Sam Pullara wrote:
> I like the .of() idea better than overloading .stream().
>
> Sam

I agree,
'of' is already used in EnumSet for that purpose.

R?mi

>
> On Apr 20, 2013, at 8:47 AM, Brian Goetz <brian.goetz at oracle.com> wrote:
>
>> Currently we have, in Arrays:
>>
>>     public static <T> Stream<T> stream(T[] array) {
>>         return stream(array, 0, array.length);
>>     }
>>
>>     public static IntStream stream(int[] array) {
>>         return stream(array, 0, array.length);
>>     }
>>
>> etc.
>>
>> We *could* make these varargs methods, which is useful as creating ad-hoc stream literals:
>>
>>   Arrays.stream(1, 2, 4, 8).map(...)
>>
>> The downside is that we would have to lose (or rename) methods like:
>>
>>     public static IntStream stream(int[] array,
>>                                    int fromIndex, int toIndex) {
>>
>> since stream(1, 2, 3) would be ambiguous.
>>
>> Probably better, make these static factories in the various stream interfaces:
>>
>>   Stream.of("foo", "bar")
>>
>>   IntStream.of(1, 2, 4, 8)
>>


From brian.goetz at oracle.com  Sat Apr 20 14:47:49 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Sat, 20 Apr 2013 17:47:49 -0400
Subject: Drop Arrays.parallelStream()?
Message-ID: <51730D05.5030007@oracle.com>

We dropped the parallel versions of all the static generator/factory 
methods in Streams a while ago, in favor of just letting people do (say) 
IntStream.range(...).parallel().  Since then, we have also greatly 
reduce the runtime cost of Stream.parallel().

We still have the separate .parallelStream() method on Collection and in 
the static methods in Arrays.

I still really like Collection.parallelStream; it has huge 
discoverability advantages, and offers a pretty big return on API 
surface area -- one more method, but provides value in a lot of places, 
since Collection will be a really common case of a stream source.

Arrays are in a middle ground.  We have eight Arrays.stream() methods 
and eight Arrays.parallelStream() methods (four types, both whole-array 
and slice versions).  I'm having a bit of a YAGNI twinge for the 
Arrays.parallelStream forms, and could see ditching them.  (The 
implementations are trivial and small, so that is not an argument to 
ditch them -- we should make this decision purely on API considerations.)

If we did this, Collection would have the sole parallelStream method; 
everything else would have to go through .parallel().  Which seems fine 
to me.


From mike.duigou at oracle.com  Sat Apr 20 15:01:29 2013
From: mike.duigou at oracle.com (Mike Duigou)
Date: Sat, 20 Apr 2013 15:01:29 -0700
Subject: Drop Arrays.parallelStream()?
In-Reply-To: <51730D05.5030007@oracle.com>
References: <51730D05.5030007@oracle.com>
Message-ID: <D3C4D46C-5495-4D26-A5E1-8084A06972AC@oracle.com>


On Apr 20 2013, at 14:47 , Brian Goetz wrote:

> We dropped the parallel versions of all the static generator/factory methods in Streams a while ago, in favor of just letting people do (say) IntStream.range(...).parallel().  Since then, we have also greatly reduce the runtime cost of Stream.parallel().
> 
> We still have the separate .parallelStream() method on Collection and in the static methods in Arrays.
> 
> I still really like Collection.parallelStream; it has huge discoverability advantages, and offers a pretty big return on API surface area -- one more method, but provides value in a lot of places, since Collection will be a really common case of a stream source.
> 
> Arrays are in a middle ground.  We have eight Arrays.stream() methods and eight Arrays.parallelStream() methods (four types, both whole-array and slice versions).  I'm having a bit of a YAGNI twinge for the Arrays.parallelStream forms, and could see ditching them.  (The implementations are trivial and small, so that is not an argument to ditch them -- we should make this decision purely on API considerations.)
> 
> If we did this, Collection would have the sole parallelStream method; everything else would have to go through .parallel().  Which seems fine to me.
> 

I would probably always use always .stream().parallel() idiomatically for consistency unless parallelStream() told me why I should use it instead. I say toss all of the parallelStream() methods unless there's an impl efficiency dependent reason to retain some of them.

Mike

From tim at peierls.net  Sat Apr 20 15:10:33 2013
From: tim at peierls.net (Tim Peierls)
Date: Sat, 20 Apr 2013 18:10:33 -0400
Subject: Drop Arrays.parallelStream()?
In-Reply-To: <D3C4D46C-5495-4D26-A5E1-8084A06972AC@oracle.com>
References: <51730D05.5030007@oracle.com>
	<D3C4D46C-5495-4D26-A5E1-8084A06972AC@oracle.com>
Message-ID: <CA+F8eeTMpoALe7k59UPSTMHq0Up44020BWjM3rVSbL56uQA19A@mail.gmail.com>

On Sat, Apr 20, 2013 at 6:01 PM, Mike Duigou <mike.duigou at oracle.com> wrote:

> I would probably always use always .stream().parallel() idiomatically for
> consistency unless parallelStream() told me why I should use it instead. I
> say toss all of the parallelStream() methods unless there's an impl
> efficiency dependent reason to retain some of them.
>

Agreed.

I see the discoverability of Collection.parallelStream() as a potential
pedagogical drawback. "Do I use parallelStream() or stream().parallel()?"

For most folks, the expectation and intuition will be sequential, so take
advantage of that: Let people come to c.stream().parallel() slowly and
deliberately, after getting their feet wet with c.stream().

--tim
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20130420/18ff0775/attachment.html 

From joe.bowbeer at gmail.com  Sat Apr 20 15:16:46 2013
From: joe.bowbeer at gmail.com (Joe Bowbeer)
Date: Sat, 20 Apr 2013 15:16:46 -0700
Subject: Drop Arrays.parallelStream()?
In-Reply-To: <CA+F8eeTMpoALe7k59UPSTMHq0Up44020BWjM3rVSbL56uQA19A@mail.gmail.com>
References: <51730D05.5030007@oracle.com>
	<D3C4D46C-5495-4D26-A5E1-8084A06972AC@oracle.com>
	<CA+F8eeTMpoALe7k59UPSTMHq0Up44020BWjM3rVSbL56uQA19A@mail.gmail.com>
Message-ID: <CAHzJPEr7WyWEqADjAu3_qSKA1fezsPZS+A=95UuFRkrDw+p8=A@mail.gmail.com>

I agree with Mike and Tim.  I'd remove all the parallelStream() methods now
- and add some or all back later if they ARE needed.

I don't like the inconsistency of having parallelStream available on some
stream factories and not on others.


On Sat, Apr 20, 2013 at 3:10 PM, Tim Peierls <tim at peierls.net> wrote:

> On Sat, Apr 20, 2013 at 6:01 PM, Mike Duigou <mike.duigou at oracle.com>wrote:
>
>> I would probably always use always .stream().parallel() idiomatically for
>> consistency unless parallelStream() told me why I should use it instead. I
>> say toss all of the parallelStream() methods unless there's an impl
>> efficiency dependent reason to retain some of them.
>>
>
> Agreed.
>
> I see the discoverability of Collection.parallelStream() as a potential
> pedagogical drawback. "Do I use parallelStream() or stream().parallel()?"
>
> For most folks, the expectation and intuition will be sequential, so take
> advantage of that: Let people come to c.stream().parallel() slowly and
> deliberately, after getting their feet wet with c.stream().
>
> --tim
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20130420/057828b5/attachment.html 

From brian.goetz at oracle.com  Sat Apr 20 15:28:04 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Sat, 20 Apr 2013 18:28:04 -0400
Subject: Drop Arrays.parallelStream()?
In-Reply-To: <CA+F8eeTMpoALe7k59UPSTMHq0Up44020BWjM3rVSbL56uQA19A@mail.gmail.com>
References: <51730D05.5030007@oracle.com>
	<D3C4D46C-5495-4D26-A5E1-8084A06972AC@oracle.com>
	<CA+F8eeTMpoALe7k59UPSTMHq0Up44020BWjM3rVSbL56uQA19A@mail.gmail.com>
Message-ID: <51731674.4010401@oracle.com>

> For most folks, the expectation and intuition will be sequential, so
> take advantage of that: Let people come to c.stream().parallel() slowly
> and deliberately, after getting their feet wet with c.stream().

I have a slightly different viewpoint about the value of this sequential 
intuition -- I view the pervasive "sequential expectation" as one if the 
biggest challenges of this entire effort; people are *constantly* 
bringing their incorrect sequential bias, which leads them to do stupid 
things like using a one-element array as a way to "trick" the "stupid" 
compiler into letting them capture a mutable local, or using lambdas as 
arguments to map that mutate state that will be used during the 
computation (in a non-thread-safe way), and then, when its pointed out 
that what they're doing, shrug it off and say "yeah, but I'm not doing 
it in parallel."

We've made a lot of design tradeoffs to merge sequential and parallel 
streams.  The result, I believe, is a clean one and will add to the 
library's chances of still being useful in 10+ years, but I don't 
particularly like the idea of encouraging people to think this is a 
sequential library with some parallel bags nailed on the side.


From brian.goetz at oracle.com  Sat Apr 20 15:37:08 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Sat, 20 Apr 2013 18:37:08 -0400
Subject: Drop Arrays.parallelStream()?
In-Reply-To: <CAHzJPEr7WyWEqADjAu3_qSKA1fezsPZS+A=95UuFRkrDw+p8=A@mail.gmail.com>
References: <51730D05.5030007@oracle.com>
	<D3C4D46C-5495-4D26-A5E1-8084A06972AC@oracle.com>
	<CA+F8eeTMpoALe7k59UPSTMHq0Up44020BWjM3rVSbL56uQA19A@mail.gmail.com>
	<CAHzJPEr7WyWEqADjAu3_qSKA1fezsPZS+A=95UuFRkrDw+p8=A@mail.gmail.com>
Message-ID: <51731894.80107@oracle.com>

For what its worth, the internal tracking title of this project is "Bulk 
data-parallel operations on Collections."  I'm not willing to relegate 
such central functionality to something that is tucked into a remote 
corner of the API -- it was *already* a huge (but warranted) 
discoverability compromise to have the stream/parallelStream "bun" 
methods in the first place!  Two buns for this case -- which will be 
probably 90% of stream constructions -- would be too much.  So, I cannot 
see my way to removing Collection.parallelStream.  However, I am willing 
to ditch the parallel versions of the static stream factory methods, 
largely on the basis that the Collection versions will be used 100x as 
much as any one of the static factories.

The "inconsistency" of this position doesn't bother me one tiny bit; it 
is a pragmatic compromise.  (In fact, I'm not even sure its an 
inconsistency at all, since they're kind of different beasts -- one is a 
static factory, the other is a view onto an existing data structure.)

So I'm willing to meet you 95% of the way there.


On 4/20/2013 6:16 PM, Joe Bowbeer wrote:
> I agree with Mike and Tim.  I'd remove all the parallelStream() methods
> now - and add some or all back later if they ARE needed.
>
> I don't like the inconsistency of having parallelStream available on
> some stream factories and not on others.
>
>
>
>
> On Sat, Apr 20, 2013 at 3:10 PM, Tim Peierls <tim at peierls.net
> <mailto:tim at peierls.net>> wrote:
>
>     On Sat, Apr 20, 2013 at 6:01 PM, Mike Duigou <mike.duigou at oracle.com
>     <mailto:mike.duigou at oracle.com>> wrote:
>
>         I would probably always use always .stream().parallel()
>         idiomatically for consistency unless parallelStream() told me
>         why I should use it instead. I say toss all of the
>         parallelStream() methods unless there's an impl efficiency
>         dependent reason to retain some of them.
>
>
>     Agreed.
>
>     I see the discoverability of Collection.parallelStream() as a
>     potential pedagogical drawback. "Do I use parallelStream() or
>     stream().parallel()?"
>
>     For most folks, the expectation and intuition will be sequential, so
>     take advantage of that: Let people come to c.stream().parallel()
>     slowly and deliberately, after getting their feet wet with c.stream().
>
>     --tim
>
>

From brian.goetz at oracle.com  Sat Apr 20 15:38:34 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Sat, 20 Apr 2013 18:38:34 -0400
Subject: Drop Arrays.parallelStream()?
In-Reply-To: <CA+F8eeTMpoALe7k59UPSTMHq0Up44020BWjM3rVSbL56uQA19A@mail.gmail.com>
References: <51730D05.5030007@oracle.com>
	<D3C4D46C-5495-4D26-A5E1-8084A06972AC@oracle.com>
	<CA+F8eeTMpoALe7k59UPSTMHq0Up44020BWjM3rVSbL56uQA19A@mail.gmail.com>
Message-ID: <517318EA.7060801@oracle.com>

> For most folks, the expectation and intuition will be sequential, so
> take advantage of that: Let people come to c.stream().parallel() slowly
> and deliberately, after getting their feet wet with c.stream().

I have a slightly different viewpoint about the value of this sequential 
intuition -- I view the pervasive "sequential expectation" as one if the 
biggest challenges of this entire effort; people are *constantly* 
bringing their incorrect sequential bias, which leads them to do stupid 
things like using a one-element array as a way to "trick" the "stupid" 
compiler into letting them capture a mutable local, or using lambdas as 
arguments to map that mutate state that will be used during the 
computation (in a non-thread-safe way), and then, when its pointed out 
that what they're doing, shrug it off and say "yeah, but I'm not doing 
it in parallel."

We've made a lot of design tradeoffs to merge sequential and parallel 
streams.  The result, I believe, is a clean one and will add to the 
library's chances of still being useful in 10+ years, but I don't 
particularly like the idea of encouraging people to think this is a 
sequential library with some parallel bags nailed on the side.


From joe.bowbeer at gmail.com  Sat Apr 20 16:12:12 2013
From: joe.bowbeer at gmail.com (Joe Bowbeer)
Date: Sat, 20 Apr 2013 16:12:12 -0700
Subject: Drop Arrays.parallelStream()?
In-Reply-To: <51731894.80107@oracle.com>
References: <51730D05.5030007@oracle.com>
	<D3C4D46C-5495-4D26-A5E1-8084A06972AC@oracle.com>
	<CA+F8eeTMpoALe7k59UPSTMHq0Up44020BWjM3rVSbL56uQA19A@mail.gmail.com>
	<CAHzJPEr7WyWEqADjAu3_qSKA1fezsPZS+A=95UuFRkrDw+p8=A@mail.gmail.com>
	<51731894.80107@oracle.com>
Message-ID: <CAHzJPEqisZJrH=9G329igyHDypCMkNk2DW8LRM4ZPGR_7xNkFA@mail.gmail.com>

Brian, What do you mean by the following?

it was *already* a huge (but warranted) discoverability compromise to have
> the stream/parallelStream "bun" methods in the first place! Two buns for
> this case [...] would be too much.


Are you referring to the fact the there is no ParallelStream type?

Note that the common point that Mike, Tim, and I raised is consistency.
 Your proposal to remove methods is creating the inconsistency, so I don't
understand the comment that you're meeting us 95% of the way there...

That said, I think I can view Collection and Arrays as two different things
that have little bearing on each other (if I squint).

Still, why, if you're so interested in advertising the parallel features,
do you *want* to remove these methods from Arrays?

Finally, Brian writes:

but I don't particularly like the idea of encouraging people to think this
> is a sequential library with some parallel bags nailed on the side


Then again, users like consistency...

Joe


On Sat, Apr 20, 2013 at 3:37 PM, Brian Goetz <brian.goetz at oracle.com> wrote:

> For what its worth, the internal tracking title of this project is "Bulk
> data-parallel operations on Collections."  I'm not willing to relegate such
> central functionality to something that is tucked into a remote corner of
> the API -- it was *already* a huge (but warranted) discoverability
> compromise to have the stream/parallelStream "bun" methods in the first
> place!  Two buns for this case -- which will be probably 90% of stream
> constructions -- would be too much.  So, I cannot see my way to removing
> Collection.parallelStream.  However, I am willing to ditch the parallel
> versions of the static stream factory methods, largely on the basis that
> the Collection versions will be used 100x as much as any one of the static
> factories.
>
> The "inconsistency" of this position doesn't bother me one tiny bit; it is
> a pragmatic compromise.  (In fact, I'm not even sure its an inconsistency
> at all, since they're kind of different beasts -- one is a static factory,
> the other is a view onto an existing data structure.)
>
> So I'm willing to meet you 95% of the way there.
>
>
>
> On 4/20/2013 6:16 PM, Joe Bowbeer wrote:
>
>> I agree with Mike and Tim.  I'd remove all the parallelStream() methods
>> now - and add some or all back later if they ARE needed.
>>
>> I don't like the inconsistency of having parallelStream available on
>> some stream factories and not on others.
>>
>>
>>
>>
>> On Sat, Apr 20, 2013 at 3:10 PM, Tim Peierls <tim at peierls.net
>> <mailto:tim at peierls.net>> wrote:
>>
>>     On Sat, Apr 20, 2013 at 6:01 PM, Mike Duigou <mike.duigou at oracle.com
>>     <mailto:mike.duigou at oracle.com**>> wrote:
>>
>>         I would probably always use always .stream().parallel()
>>         idiomatically for consistency unless parallelStream() told me
>>         why I should use it instead. I say toss all of the
>>         parallelStream() methods unless there's an impl efficiency
>>         dependent reason to retain some of them.
>>
>>
>>     Agreed.
>>
>>     I see the discoverability of Collection.parallelStream() as a
>>     potential pedagogical drawback. "Do I use parallelStream() or
>>     stream().parallel()?"
>>
>>     For most folks, the expectation and intuition will be sequential, so
>>     take advantage of that: Let people come to c.stream().parallel()
>>     slowly and deliberately, after getting their feet wet with c.stream().
>>
>>     --tim
>>
>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20130420/0433d9fe/attachment.html 

From brian.goetz at oracle.com  Sat Apr 20 16:29:00 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Sat, 20 Apr 2013 19:29:00 -0400
Subject: Drop Arrays.parallelStream()?
In-Reply-To: <CAHzJPEqisZJrH=9G329igyHDypCMkNk2DW8LRM4ZPGR_7xNkFA@mail.gmail.com>
References: <51730D05.5030007@oracle.com>
	<D3C4D46C-5495-4D26-A5E1-8084A06972AC@oracle.com>
	<CA+F8eeTMpoALe7k59UPSTMHq0Up44020BWjM3rVSbL56uQA19A@mail.gmail.com>
	<CAHzJPEr7WyWEqADjAu3_qSKA1fezsPZS+A=95UuFRkrDw+p8=A@mail.gmail.com>
	<51731894.80107@oracle.com>
	<CAHzJPEqisZJrH=9G329igyHDypCMkNk2DW8LRM4ZPGR_7xNkFA@mail.gmail.com>
Message-ID: <517324BC.2050003@oracle.com>

> Brian, What do you mean by the following?
>
>     it was *already* a huge (but warranted) discoverability compromise
>     to have the stream/parallelStream "bun" methods in the first place!
>     Two buns for this case [...] would be too much.
>
> Are you referring to the fact the there is no ParallelStream type?

No, that's a big plus -- one Stream to rule them all!

The negative was having to have the

   .stream()
and
   .parallelStream()

methods at all.  We originally really liked the idea of

   collection.filter(..)...
             ^ no view method!

but for various reasons reluctantly concluded it was untenable.  But 
that still doesn't mean we like having the new functionality be so far 
removed from Collection.  And if one layer removed is suboptimal, two is 
worse.

> Note that the common point that Mike, Tim, and I raised is consistency.
>   Your proposal to remove methods is creating the inconsistency, so I

Not really.  We were already inconsistent; they were present for 
Collection and for Array factories but not for range factories, 
generator factories, etc.  You could argue the new proposed state is 
more consistent (all the view methods have a parallel counterpart; all 
the static factories don't) but that's not what I consider its primary 
benefit.

> don't understand the comment that you're meeting us 95% of the way there...

Removing all but one of the parallelStream methods.

> Still, why, if you're so interested in advertising the parallel
> features, do you *want* to remove these methods from Arrays?

Simply: return on API surface area.  The return for having it the one 
extra method on Collection is large; the return for having 100 extra 
methods for 100 infrequently-used factory methods is small (even in the 
aggregate).  Arrays.parallelStream() was eight more methods -- four 
types x two forms.  (Others here have argued that "too many forms of the 
same method is a smell"; also there have been plenty of calls for a 
round of YAGNIism.)

>     but I don't particularly like the idea of encouraging people to
>     think this is a sequential library with some parallel bags nailed on
>     the side
>
> Then again, users like consistency...

I'm not saying consistency is unimportant.  But in my experience, 
"consistency" can be used to justify nearly any position -- one can 
always find a precedent in a complex system to be "consistent" with.  So 
I want more than mere consistency.  (Not to mention many consistencies 
are of the "foolish hobgoblin" variety.)

From joe.bowbeer at gmail.com  Sat Apr 20 16:47:08 2013
From: joe.bowbeer at gmail.com (Joe Bowbeer)
Date: Sat, 20 Apr 2013 16:47:08 -0700
Subject: Drop Arrays.parallelStream()?
In-Reply-To: <517324BC.2050003@oracle.com>
References: <51730D05.5030007@oracle.com>
	<D3C4D46C-5495-4D26-A5E1-8084A06972AC@oracle.com>
	<CA+F8eeTMpoALe7k59UPSTMHq0Up44020BWjM3rVSbL56uQA19A@mail.gmail.com>
	<CAHzJPEr7WyWEqADjAu3_qSKA1fezsPZS+A=95UuFRkrDw+p8=A@mail.gmail.com>
	<51731894.80107@oracle.com>
	<CAHzJPEqisZJrH=9G329igyHDypCMkNk2DW8LRM4ZPGR_7xNkFA@mail.gmail.com>
	<517324BC.2050003@oracle.com>
Message-ID: <CAHzJPErq_Uh95rC-3qq2gxA-kW0Ce7Q-ASKtL2o2D06+7T7JcA@mail.gmail.com>

Thanks for clarifying.

Most of your justifications seem to be a matter of taste, so there is no
use arguing. (Taste, like foolishness, defies argument.)

Alas, after cleansing my mind of "foolish hobgoblins" and other distracting
remarks, I think your proposal is an improvement, even without squinting.
On Apr 20, 2013 4:29 PM, "Brian Goetz" <brian.goetz at oracle.com> wrote:

> Brian, What do you mean by the following?
>>
>>     it was *already* a huge (but warranted) discoverability compromise
>>     to have the stream/parallelStream "bun" methods in the first place!
>>     Two buns for this case [...] would be too much.
>>
>> Are you referring to the fact the there is no ParallelStream type?
>>
>
> No, that's a big plus -- one Stream to rule them all!
>
> The negative was having to have the
>
>   .stream()
> and
>   .parallelStream()
>
> methods at all.  We originally really liked the idea of
>
>   collection.filter(..)...
>             ^ no view method!
>
> but for various reasons reluctantly concluded it was untenable.  But that
> still doesn't mean we like having the new functionality be so far removed
> from Collection.  And if one layer removed is suboptimal, two is worse.
>
>  Note that the common point that Mike, Tim, and I raised is consistency.
>>   Your proposal to remove methods is creating the inconsistency, so I
>>
>
> Not really.  We were already inconsistent; they were present for
> Collection and for Array factories but not for range factories, generator
> factories, etc.  You could argue the new proposed state is more consistent
> (all the view methods have a parallel counterpart; all the static factories
> don't) but that's not what I consider its primary benefit.
>
>  don't understand the comment that you're meeting us 95% of the way
>> there...
>>
>
> Removing all but one of the parallelStream methods.
>
>  Still, why, if you're so interested in advertising the parallel
>> features, do you *want* to remove these methods from Arrays?
>>
>
> Simply: return on API surface area.  The return for having it the one
> extra method on Collection is large; the return for having 100 extra
> methods for 100 infrequently-used factory methods is small (even in the
> aggregate).  Arrays.parallelStream() was eight more methods -- four types x
> two forms.  (Others here have argued that "too many forms of the same
> method is a smell"; also there have been plenty of calls for a round of
> YAGNIism.)
>
>      but I don't particularly like the idea of encouraging people to
>>     think this is a sequential library with some parallel bags nailed on
>>     the side
>>
>> Then again, users like consistency...
>>
>
> I'm not saying consistency is unimportant.  But in my experience,
> "consistency" can be used to justify nearly any position -- one can always
> find a precedent in a complex system to be "consistent" with.  So I want
> more than mere consistency.  (Not to mention many consistencies are of the
> "foolish hobgoblin" variety.)
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20130420/c4a631c5/attachment.html 

From tim at peierls.net  Sat Apr 20 16:49:18 2013
From: tim at peierls.net (Tim Peierls)
Date: Sat, 20 Apr 2013 19:49:18 -0400
Subject: Drop Arrays.parallelStream()?
In-Reply-To: <CAHzJPErq_Uh95rC-3qq2gxA-kW0Ce7Q-ASKtL2o2D06+7T7JcA@mail.gmail.com>
References: <51730D05.5030007@oracle.com>
	<D3C4D46C-5495-4D26-A5E1-8084A06972AC@oracle.com>
	<CA+F8eeTMpoALe7k59UPSTMHq0Up44020BWjM3rVSbL56uQA19A@mail.gmail.com>
	<CAHzJPEr7WyWEqADjAu3_qSKA1fezsPZS+A=95UuFRkrDw+p8=A@mail.gmail.com>
	<51731894.80107@oracle.com>
	<CAHzJPEqisZJrH=9G329igyHDypCMkNk2DW8LRM4ZPGR_7xNkFA@mail.gmail.com>
	<517324BC.2050003@oracle.com>
	<CAHzJPErq_Uh95rC-3qq2gxA-kW0Ce7Q-ASKtL2o2D06+7T7JcA@mail.gmail.com>
Message-ID: <CA+F8eeQmgBz8Ok0fxek1S_wt5v+OsPpOb_znDvdn0V7kE9N7Jg@mail.gmail.com>

On Sat, Apr 20, 2013 at 7:47 PM, Joe Bowbeer <joe.bowbeer at gmail.com> wrote:

> Thanks for clarifying.
>
> Most of your justifications seem to be a matter of taste, so there is no
> use arguing. (Taste, like foolishness, defies argument.)
>
> Alas, after cleansing my mind of "foolish hobgoblins" and other
> distracting remarks, I think your proposal is an improvement, even without
> squinting.
>

Why "alas"? Or did was it auto-corrected/mistyped from "Also"?

--tim
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20130420/23889aa1/attachment-0001.html 

From joe.bowbeer at gmail.com  Sat Apr 20 16:56:16 2013
From: joe.bowbeer at gmail.com (Joe Bowbeer)
Date: Sat, 20 Apr 2013 16:56:16 -0700
Subject: Drop Arrays.parallelStream()?
In-Reply-To: <CA+F8eeQmgBz8Ok0fxek1S_wt5v+OsPpOb_znDvdn0V7kE9N7Jg@mail.gmail.com>
References: <51730D05.5030007@oracle.com>
	<D3C4D46C-5495-4D26-A5E1-8084A06972AC@oracle.com>
	<CA+F8eeTMpoALe7k59UPSTMHq0Up44020BWjM3rVSbL56uQA19A@mail.gmail.com>
	<CAHzJPEr7WyWEqADjAu3_qSKA1fezsPZS+A=95UuFRkrDw+p8=A@mail.gmail.com>
	<51731894.80107@oracle.com>
	<CAHzJPEqisZJrH=9G329igyHDypCMkNk2DW8LRM4ZPGR_7xNkFA@mail.gmail.com>
	<517324BC.2050003@oracle.com>
	<CAHzJPErq_Uh95rC-3qq2gxA-kW0Ce7Q-ASKtL2o2D06+7T7JcA@mail.gmail.com>
	<CA+F8eeQmgBz8Ok0fxek1S_wt5v+OsPpOb_znDvdn0V7kE9N7Jg@mail.gmail.com>
Message-ID: <CAHzJPEp=ri2QH7pa01Y13LGCw+cyZUCwGkLzfWjeNLUOw+hzrg@mail.gmail.com>

Strike "Alas".  Thanks.


On Sat, Apr 20, 2013 at 4:49 PM, Tim Peierls <tim at peierls.net> wrote:

> On Sat, Apr 20, 2013 at 7:47 PM, Joe Bowbeer <joe.bowbeer at gmail.com>wrote:
>
>> Thanks for clarifying.
>>
>> Most of your justifications seem to be a matter of taste, so there is no
>> use arguing. (Taste, like foolishness, defies argument.)
>>
>> Alas, after cleansing my mind of "foolish hobgoblins" and other
>> distracting remarks, I think your proposal is an improvement, even without
>> squinting.
>>
>
> Why "alas"? Or did was it auto-corrected/mistyped from "Also"?
>
> --tim
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20130420/8add6737/attachment.html 

From brian.goetz at oracle.com  Sun Apr 21 11:19:35 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Sun, 21 Apr 2013 14:19:35 -0400
Subject: Static methods on Stream and friends
Message-ID: <51742DB7.1000302@oracle.com>

I moved the following from Streams to Stream:

   Stream.builder()
   Stream.empty()
   Stream.singleton(T)
   Stream.of(T...)
   Stream.iterate(T, T -> T)
   Stream.generate(i -> T)

with the same on {Int,Long,Double}Stream, and also

   {Int,Long,Double}Stream.range(start, end)
   {Int,Long,Double}Stream.range(start, end, step)

It was suggested on lambda-dev that we should rename singleton to simply 
be an overload of "of":

   Stream.of(T)
   Stream.of(T...)

which seems reasonable.

Remaining open issues:
  - Some people are unhappy that range is half-open (which also means 
people are constrained to ranges topping out at MAX_VALUE-1 rather than 
MAX_VALUE).  Some options:
    - Add XxxStream.rangeExclusive(start, end)
    - Further doc hints, such as renaming the parameters to 
startInclusive / endExclusive
    - Nothing
  - Paul has suggested that generate be finite.  While this is kind of 
yucky, the practical difference between infinite and long-sized is 
pretty much negligible, and the version based on 
LongStream.range().map() parallellizes much better.

I propose to accept the suggestion of s/singleton/of/, go the "doc hint" 
route on range, and go finite on generate.

Also never closed on whether there was value to ints() / longs() -- 
these show up in lots of teaching examples, though less so in real-world 
code.  Still, teaching people how to think about this stuff is important.


From tim at peierls.net  Sun Apr 21 11:30:26 2013
From: tim at peierls.net (Tim Peierls)
Date: Sun, 21 Apr 2013 14:30:26 -0400
Subject: Static methods on Stream and friends
In-Reply-To: <51742DB7.1000302@oracle.com>
References: <51742DB7.1000302@oracle.com>
Message-ID: <CA+F8eeT2eS=ZnU61nPt=0Tn-VWFtwvtkKMYZoPioJWXBFK6Uxw@mail.gmail.com>

On Sun, Apr 21, 2013 at 2:19 PM, Brian Goetz <brian.goetz at oracle.com> wrote:

> It was suggested on lambda-dev that we should rename singleton to simply
> be an overload of "of":
>
>   Stream.of(T)
>   Stream.of(T...)
>
> which seems reasonable.
>

Aren't there ambiguity problems with that pair of signatures? I would have
thought something like this:

Stream.of() // for empty
Stream.of(T) // for singleton
Stream.of(T, T, T...) // for two or more

--tim
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20130421/fdcf3f3e/attachment.html 

From brian.goetz at oracle.com  Sun Apr 21 11:35:32 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Sun, 21 Apr 2013 14:35:32 -0400
Subject: Static methods on Stream and friends
In-Reply-To: <CA+F8eeT2eS=ZnU61nPt=0Tn-VWFtwvtkKMYZoPioJWXBFK6Uxw@mail.gmail.com>
References: <51742DB7.1000302@oracle.com>
	<CA+F8eeT2eS=ZnU61nPt=0Tn-VWFtwvtkKMYZoPioJWXBFK6Uxw@mail.gmail.com>
Message-ID: <51743174.1000007@oracle.com>

Two things here:

1.  Tim may be suggesting to go further and rename "Stream.empty" to 
"Stream.of()"?

2.  Query about method selection.

Method selection proceeds in three phases (see JLS 7/e 15.12.2):

   1.  no boxing or unboxing
   2.  with boxing/unboxing, but no varargs
   3.  with varargs.

So, Stream.of(T) will be considered before Stream.of(T...) is -- even 
for boxed streams like Stream<Integer>.  So I believe there is no need 
to extend the variable arity signature to of(T, T, T...).

On 4/21/2013 2:30 PM, Tim Peierls wrote:
> On Sun, Apr 21, 2013 at 2:19 PM, Brian Goetz <brian.goetz at oracle.com
> <mailto:brian.goetz at oracle.com>> wrote:
>
>     It was suggested on lambda-dev that we should rename singleton to
>     simply be an overload of "of":
>
>        Stream.of(T)
>        Stream.of(T...)
>
>     which seems reasonable.
>
>
> Aren't there ambiguity problems with that pair of signatures? I would
> have thought something like this:
>
> Stream.of() // for empty
> Stream.of(T) // for singleton
> Stream.of(T, T, T...) // for two or more
>
> --tim

From david.lloyd at redhat.com  Mon Apr 22 06:42:51 2013
From: david.lloyd at redhat.com (David M. Lloyd)
Date: Mon, 22 Apr 2013 08:42:51 -0500
Subject: Drop Arrays.parallelStream()?
In-Reply-To: <517318EA.7060801@oracle.com>
References: <51730D05.5030007@oracle.com>
	<D3C4D46C-5495-4D26-A5E1-8084A06972AC@oracle.com>
	<CA+F8eeTMpoALe7k59UPSTMHq0Up44020BWjM3rVSbL56uQA19A@mail.gmail.com>
	<517318EA.7060801@oracle.com>
Message-ID: <51753E5B.2040600@redhat.com>

On 04/20/2013 05:38 PM, Brian Goetz wrote:
>> For most folks, the expectation and intuition will be sequential, so
>> take advantage of that: Let people come to c.stream().parallel() slowly
>> and deliberately, after getting their feet wet with c.stream().
>
> I have a slightly different viewpoint about the value of this sequential
> intuition -- I view the pervasive "sequential expectation" as one if the
> biggest challenges of this entire effort; people are *constantly*
> bringing their incorrect sequential bias, which leads them to do stupid
> things like using a one-element array as a way to "trick" the "stupid"
> compiler into letting them capture a mutable local, or using lambdas as
> arguments to map that mutate state that will be used during the
> computation (in a non-thread-safe way), and then, when its pointed out
> that what they're doing, shrug it off and say "yeah, but I'm not doing
> it in parallel."
>
> We've made a lot of design tradeoffs to merge sequential and parallel
> streams.  The result, I believe, is a clean one and will add to the
> library's chances of still being useful in 10+ years, but I don't
> particularly like the idea of encouraging people to think this is a
> sequential library with some parallel bags nailed on the side.

Well, just the term "stream" really screams "sequential", so there's that.


-- 
- DML

From paul.sandoz at oracle.com  Mon Apr 22 09:09:56 2013
From: paul.sandoz at oracle.com (Paul Sandoz)
Date: Mon, 22 Apr 2013 18:09:56 +0200
Subject: Pattern.splitAsStream/asPredicate
Message-ID: <869735E7-2AE3-4D0B-B7B8-D7FC462F718F@oracle.com>

Hi,

It seems useful to provide an ability to create a stream from matches of a pattern, plus as a bonus create a predicate for matches of a pattern.

See below for more details:

  http://cr.openjdk.java.net/~psandoz/lambda/jdk-8012646/webrev/

Thoughts?

Paul.


From joe.bowbeer at gmail.com  Wed Apr 24 09:23:43 2013
From: joe.bowbeer at gmail.com (Joe Bowbeer)
Date: Wed, 24 Apr 2013 09:23:43 -0700
Subject: Pattern.splitAsStream/asPredicate
In-Reply-To: <869735E7-2AE3-4D0B-B7B8-D7FC462F718F@oracle.com>
References: <869735E7-2AE3-4D0B-B7B8-D7FC462F718F@oracle.com>
Message-ID: <CAHzJPEr3q-XYF-s8BaPwRxWc3CPySgsd-2KwJ2Ni0H9QEamuyw@mail.gmail.com>

Makes sense to me that one might want to generate a stream from a Pattern.
Is there more to this than splitAsStream?

It's also interesting to consider the absence of parallel stream options at
this and the other stream factory sites.


On Apr 22, 2013 9:10 AM, "Paul Sandoz" <paul.sandoz at oracle.com> wrote:

> Hi,
>
> It seems useful to provide an ability to create a stream from matches of a
> pattern, plus as a bonus create a predicate for matches of a pattern.
>
> See below for more details:
>
>   http://cr.openjdk.java.net/~psandoz/lambda/jdk-8012646/webrev/
>
> Thoughts?
>
> Paul.
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20130424/b6306adc/attachment.html 

From brian.goetz at oracle.com  Wed Apr 24 10:13:35 2013
From: brian.goetz at oracle.com (Brian Goetz)
Date: Wed, 24 Apr 2013 12:13:35 -0500
Subject: Pattern.splitAsStream/asPredicate
In-Reply-To: <CAHzJPEr3q-XYF-s8BaPwRxWc3CPySgsd-2KwJ2Ni0H9QEamuyw@mail.gmail.com>
References: <869735E7-2AE3-4D0B-B7B8-D7FC462F718F@oracle.com>
	<CAHzJPEr3q-XYF-s8BaPwRxWc3CPySgsd-2KwJ2Ni0H9QEamuyw@mail.gmail.com>
Message-ID: <CEDBB674-B052-4374-9C5D-30D0C7D960E1@oracle.com>

There definitely could be more to this.  For example, a common usage pattern for matching is:

 while (more) { 
     // get the next match
     // get the stuff between the last match and the start of this match
     // do something with that
     // do something with the current match
 }

So while getting the matches is good, getting at the stuff between the matches is also sometimes useful.  Is there an easy way to do that, such as providing a Stream<Match>?  

There's an easy way for streams like this to be never-parallel -- create them from a Spliterator whose trySplit always returns null.  Then, even parallel execution will always be serial.  I don't think there's a need for an abstraction for that -- just build off a non-splittable iterator.  

But, there may also be some parallelism to extract, if the post-processing on a match is high-Q.  Then you might still be able to overcome the sequentiality of generating matches if the per-match post processing is high enough.  

On Apr 24, 2013, at 11:23 AM, Joe Bowbeer wrote:

> Makes sense to me that one might want to generate a stream from a Pattern. Is there more to this than splitAsStream?
> 
> It's also interesting to consider the absence of parallel stream options at this and the other stream factory sites.
> 
> 
> 
> 
> On Apr 22, 2013 9:10 AM, "Paul Sandoz" <paul.sandoz at oracle.com> wrote:
> Hi,
> 
> It seems useful to provide an ability to create a stream from matches of a pattern, plus as a bonus create a predicate for matches of a pattern.
> 
> See below for more details:
> 
>   http://cr.openjdk.java.net/~psandoz/lambda/jdk-8012646/webrev/
> 
> Thoughts?
> 
> Paul.
> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20130424/146a6826/attachment.html 

From forax at univ-mlv.fr  Wed Apr 24 10:16:55 2013
From: forax at univ-mlv.fr (Remi Forax)
Date: Wed, 24 Apr 2013 19:16:55 +0200
Subject: Pattern.splitAsStream/asPredicate
In-Reply-To: <869735E7-2AE3-4D0B-B7B8-D7FC462F718F@oracle.com>
References: <869735E7-2AE3-4D0B-B7B8-D7FC462F718F@oracle.com>
Message-ID: <51781387.90902@univ-mlv.fr>

On 04/22/2013 06:09 PM, Paul Sandoz wrote:
> Hi,
>
> It seems useful to provide an ability to create a stream from matches of a pattern, plus as a bonus create a predicate for matches of a pattern.
>
> See below for more details:
>
>    http://cr.openjdk.java.net/~psandoz/lambda/jdk-8012646/webrev/
>
> Thoughts?
>
> Paul.
>

Hi Paul,
MatcherIterator should not be a local class of splitAsStream,
because the reference to the current Pattern will be kept
even if the Matcher not reference if anymore
(note that the current implementation of the Matcher always references
the Pattern object but maybe at some point the automata will be
transformed to bytecode as by example V8 does).
To summarize, the class MatcherIterator defines 4 fields instead of 3.

The is no need to initialize current and nextElement to their default 
values,
javac emits bytecodes for that.

in next(), the else is useless and it's rare in the jdk sources to find 
a else after a throw.
in hasNext(), you can re-order the branch of the first test to avoid the 
code to be shifted to the right.
    if (nextElement != null) {
      return true;
    }
    if (current == input.length()) {
      ...

and yes, this method is useful :)

cheers,
R?mi


From forax at univ-mlv.fr  Fri Apr 26 02:22:07 2013
From: forax at univ-mlv.fr (Remi Forax)
Date: Fri, 26 Apr 2013 11:22:07 +0200
Subject: RFR : JDK-8001642 : Add Optional<T>, OptionalDouble, OptionalInt, 
	OptionalLong
In-Reply-To: <CAGKkBktCFBXvmBed904ahm=NrQ_dFUH3tcRYHr0-DCyzGrDShg@mail.gmail.com>
References: <C6D1CA05-31CF-491A-9D2F-8DF7E04CC503@oracle.com>
	<513710CC.3010903@univ-mlv.fr>
	<CAGKkBktCFBXvmBed904ahm=NrQ_dFUH3tcRYHr0-DCyzGrDShg@mail.gmail.com>
Message-ID: <517A473F.3030906@univ-mlv.fr>

On 03/28/2013 07:23 PM, Kevin Bourrillion wrote:
> I do NOT wish to restart this discussion; I just noticed a falsehood 
> that was never exposed:

What I should have written is that Guava unlike the JDK allows to create 
an Optional from null,
the fact that it stores null or not is an implementation detail.

R?mi

>
>
> On Wed, Mar 6, 2013 at 1:47 AM, Remi Forax <forax at univ-mlv.fr 
> <mailto:forax at univ-mlv.fr>> wrote:
>
>     Google's Guava, which is a popular library, defines a class named
>     Optional, but allow to store null unlike the current proposed
>     implementation, this will generate a lot of confusions and
>     frustrations.
>
>
> Guava's Optional /cannot/ be used to hold null. So this particular 
> concern is not a concern at all.
>
>
> -- 
> Kevin Bourrillion | Java Librarian | Google, Inc. |kevinb at google.com 
> <mailto:kevinb at google.com>


From paul.sandoz at oracle.com  Fri Apr 26 03:37:59 2013
From: paul.sandoz at oracle.com (Paul Sandoz)
Date: Fri, 26 Apr 2013 12:37:59 +0200
Subject: Pattern.splitAsStream/asPredicate
In-Reply-To: <51781387.90902@univ-mlv.fr>
References: <869735E7-2AE3-4D0B-B7B8-D7FC462F718F@oracle.com>
	<51781387.90902@univ-mlv.fr>
Message-ID: <7C8F8184-93B4-4895-8F03-46340B84177A@oracle.com>


On Apr 24, 2013, at 7:16 PM, Remi Forax <forax at univ-mlv.fr> wrote:

> On 04/22/2013 06:09 PM, Paul Sandoz wrote:
>> Hi,
>> 
>> It seems useful to provide an ability to create a stream from matches of a pattern, plus as a bonus create a predicate for matches of a pattern.
>> 
>> See below for more details:
>> 
>>   http://cr.openjdk.java.net/~psandoz/lambda/jdk-8012646/webrev/
>> 
>> Thoughts?
>> 
>> Paul.
>> 
> 
> Hi Paul,
> MatcherIterator should not be a local class of splitAsStream,
> because the reference to the current Pattern will be kept
> even if the Matcher not reference if anymore
> (note that the current implementation of the Matcher always references
> the Pattern object but maybe at some point the automata will be
> transformed to bytecode as by example V8 does).

Matcher returns it too:

    /**
     * Returns the pattern that is interpreted by this matcher.
     *
     * @return  The pattern for which this matcher was created
     */
    public Pattern pattern() {


> To summarize, the class MatcherIterator defines 4 fields instead of 3.
> 

Yes, it's an inner class, but  I prefer the locality, since splitAsStream is the only method that uses the class.


> The is no need to initialize current and nextElement to their default values,
> javac emits bytecodes for that.
> 
> in next(), the else is useless and it's rare in the jdk sources to find a else after a throw.
> in hasNext(), you can re-order the branch of the first test to avoid the code to be shifted to the right.
>   if (nextElement != null) {
>     return true;
>   }
>   if (current == input.length()) {
>     ...
> 

Thanks i have cleaned up that code.

Paul.


> and yes, this method is useful :)
> 
> cheers,
> R?mi
> 
> 
> 
> 
> 


From paul.sandoz at oracle.com  Fri Apr 26 04:00:27 2013
From: paul.sandoz at oracle.com (Paul Sandoz)
Date: Fri, 26 Apr 2013 13:00:27 +0200
Subject: Pattern.splitAsStream/asPredicate
In-Reply-To: <CEDBB674-B052-4374-9C5D-30D0C7D960E1@oracle.com>
References: <869735E7-2AE3-4D0B-B7B8-D7FC462F718F@oracle.com>
	<CAHzJPEr3q-XYF-s8BaPwRxWc3CPySgsd-2KwJ2Ni0H9QEamuyw@mail.gmail.com>
	<CEDBB674-B052-4374-9C5D-30D0C7D960E1@oracle.com>
Message-ID: <42E53CA6-122F-41B8-9B49-F394396F5DAB@oracle.com>


On Apr 24, 2013, at 7:13 PM, Brian Goetz <brian.goetz at Oracle.COM> wrote:

> There definitely could be more to this.  For example, a common usage pattern for matching is:
> 
>  while (more) { 
>      // get the next match
>      // get the stuff between the last match and the start of this match
>      // do something with that
>      // do something with the current match
>  }
> 
> So while getting the matches is good, getting at the stuff between the matches is also sometimes useful.  Is there an easy way to do that, such as providing a Stream<Match>?  
> 

It's awkward with the current types. A Matcher of a Pattern is mutable and MatchResult (which would need to be cloned via Matcher.toMatchResult) only provides access to a match. 

The prefix characters before a match need to be tracked independently, as do the remaining characters after no further matches. So we would require a stream of say (String prefix, MatchResult r) where r is null, or an empty match, for the last tuple in the stream.

We can add methods to Matcher that behave the same way as the String bearing methods:

    public String replaceAll(Function<MatchResult, String> f)

    public String replaceFirst(Function<MatchResult, String> f)
 

> There's an easy way for streams like this to be never-parallel -- create them from a Spliterator whose trySplit always returns null.  Then, even parallel execution will always be serial.  I don't think there's a need for an abstraction for that -- just build off a non-splittable iterator.  
> 
> But, there may also be some parallelism to extract, if the post-processing on a match is high-Q.  Then you might still be able to overcome the sequentiality of generating matches if the per-match post processing is high enough.  
> 

Right, i think it would be incorrect to make any predictions about Q.

Paul.

From tim at peierls.net  Fri Apr 26 04:45:37 2013
From: tim at peierls.net (Tim Peierls)
Date: Fri, 26 Apr 2013 07:45:37 -0400
Subject: RFR : JDK-8001642 : Add Optional<T>, OptionalDouble, OptionalInt,
	OptionalLong
In-Reply-To: <517A473F.3030906@univ-mlv.fr>
References: <C6D1CA05-31CF-491A-9D2F-8DF7E04CC503@oracle.com>
	<513710CC.3010903@univ-mlv.fr>
	<CAGKkBktCFBXvmBed904ahm=NrQ_dFUH3tcRYHr0-DCyzGrDShg@mail.gmail.com>
	<517A473F.3030906@univ-mlv.fr>
Message-ID: <CA+F8eeQ0FCSZQEUoWCT5DpWi7Ye5q9=T0zSpxSL9gCAh-oa+RQ@mail.gmail.com>

On Fri, Apr 26, 2013 at 5:22 AM, Remi Forax <forax at univ-mlv.fr> wrote:

> On 03/28/2013 07:23 PM, Kevin Bourrillion wrote:
>
>> I do NOT wish to restart this discussion; I just noticed a falsehood that
>> was never exposed: Guava's Optional /cannot/ be used to hold null. So this
>> particular concern is not a concern at all.
>
>
> What I should have written is that Guava unlike the JDK allows to create
> an Optional from null,
> the fact that it stores null or not is an implementation detail.
>

Kevin's point was that there's no need to worry about confusion over this
particular difference. The method used in Guava to create an Optional from
a reference that might be null has a name that makes this very clear:
Optional.fromNullable. Both Guava and JDK have null-rejecting Optional.of()
methods.

--tim
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20130426/16114902/attachment.html