From dl at cs.oswego.edu Mon Apr 1 05:37:59 2013 From: dl at cs.oswego.edu (Doug Lea) Date: Mon, 01 Apr 2013 08:37:59 -0400 Subject: sorting and stability In-Reply-To: <515318BB.2030805@cs.oswego.edu> References: <5150DE09.3020505@cs.oswego.edu> <515318BB.2030805@cs.oswego.edu> Message-ID: <51597FA7.9070207@cs.oswego.edu> On 03/27/13 12:05, Doug Lea wrote: > > * The previous versions required temp workspace > arrays as large as source array, even if only a > portion was being sorted. Now they don't. Except (and this took an embarrassingly long time to track down), DualPivotQuickSort itself (as of JDK7) sometimes creates a temp array, and if so, allocates it to be as long as the array, not the slice. Now fixed. This led to crazy anomalies during tests that made me suspect all kinds of other problems with parallel versions. As a side benefit though, it did lead to a few minor improvements made while rechecking everything except what I should have been looking at. Another implementation note: While I cleaned up some of it, Arrays.java and the internal DualPivotQuickSort, TimSort, and ComparableTimSort classes are still inconsistent about which side does range checks and call conversion into internal forms vs propagating convenience methods. Someday this should be straightened out, so that only Arrays.java does these, calling only expanded internal forms in the sorter classes. Hilariously, this requires that method rangeCheck be removed from TimSort. Paul Sandoz: Please find sorts2.tar in the usual place. 
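Doug's fix amounts to sizing any scratch buffer to the slice being sorted rather than to the whole array. Below is a minimal sketch of that idea. It assumes a plain stable top-down merge sort, not the JDK's DualPivotQuicksort or TimSort, which are far more elaborate. The SliceMergeSort class is hypothetical. In the spirit of the note on range checks, it validates the range once at the public entry point and leaves the internal helper unchecked.

```java
// Hypothetical example class, not part of the JDK: a stable top-down merge
// sort of the slice a[from, to) whose scratch buffer has length (to - from),
// i.e. it is proportional to the slice rather than to the whole array.
final class SliceMergeSort {
    private SliceMergeSort() {}

    // Public entry point: the only place that range-checks its arguments.
    static void sort(int[] a, int from, int to) {
        if (from < 0 || from > to || to > a.length) {
            throw new IllegalArgumentException("bad range [" + from + ", " + to + ")");
        }
        if (to - from < 2) {
            return;
        }
        int[] work = new int[to - from]; // slice-sized workspace
        mergeSort(a, from, to, work, from);
    }

    // Internal form: assumes a valid range. work[i - base] is scratch space for a[i].
    private static void mergeSort(int[] a, int lo, int hi, int[] work, int base) {
        if (hi - lo < 2) {
            return;
        }
        int mid = (lo + hi) >>> 1;
        mergeSort(a, lo, mid, work, base);
        mergeSort(a, mid, hi, work, base);
        // Copy both sorted runs into the workspace, then merge them back into a.
        System.arraycopy(a, lo, work, lo - base, hi - lo);
        int i = lo - base, iEnd = mid - base;
        int j = mid - base, jEnd = hi - base;
        for (int k = lo; k < hi; k++) {
            // "<=" takes from the left run on ties, which keeps the sort stable.
            if (j >= jEnd || (i < iEnd && work[i] <= work[j])) {
                a[k] = work[i++];
            } else {
                a[k] = work[j++];
            }
        }
    }
}
```

For example, sorting a[2, 7) of a nine-element array allocates a five-element buffer and leaves the elements outside the slice untouched.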
-Doug From kevinb at google.com Mon Apr 1 07:23:27 2013 From: kevinb at google.com (Kevin Bourrillion) Date: Mon, 1 Apr 2013 07:23:27 -0700 Subject: Spec and API review for {Int,Long,Double}SummaryStatistics In-Reply-To: <1E7C3B20-8B4A-4782-BD59-B82ACD7AF4DB@oracle.com> References: <514CD46F.9020508@oracle.com> <1BC38610-51E9-4A69-A1E7-192880618E5F@oracle.com> <1E7C3B20-8B4A-4782-BD59-B82ACD7AF4DB@oracle.com> Message-ID: I'm confused, but I've seen nothing to change my impression that exposing sumOfSquares is not helpful. As unpleasant as it may seem, if we want to address the variance case at all, I think we have little choice but to expose sampleVariance() and populationVariance() ourselves, and then *those* can use Kahan summation or whatever (which internally computes "sum of squares of deltas", not sum of squares, as I (don't) understand it). On Fri, Mar 29, 2013 at 3:16 PM, Brian Goetz wrote: > > Also, while I'm here... > > > > Exposing sumOfSquares() does not permit users to safely calculate > variance, which I believe makes it fairly useless and even dangerous: > > > > "The failure of Cauchy's fundamental inequality is another important > example of the breakdown of traditional algebra in the presence of floating > point arithmetic...Novice programmers who calculate the standard deviation > of some observations by using the textbook formula [formula for the > standard deviation in terms of the sum of squares] often find themselves > taking the square root of a negative number!" (Knuth AoCP vol 2, section > 4.2.2) > > Thanks for raising this issue again -- I'd meant to respond earlier. I > ran this by our numerics guys. > > Basically, the problem is that for floating point numbers, since squaring > makes small numbers smaller and big numbers bigger, summing squares in the > obvious way risks the usual problem with adding numbers of grossly > differing magnitudes. 
So while the naive factoring of population/sample > variance allows you to compute them from sum(x) and sum(x^2), the latter is > potentially numerically challenged. (Note that this problem doesn't exist > for int/long, assuming a long is big enough to compute sum(x^2) without > overflow.) > > Still, I am not sure we do users a favor by leaving this out. Many of > them are likely to simply extend DoubleSummaryStatistics to calculate > sum(x^2) anyway. And the only other alternative is horrible; stream the > data into a collection and make two passes on it, one for mean and one for > variance. That's at least 3x as expensive, if you can fit the whole thing > in memory in the first place. > > The Knuth section you cite also offers a means to calculate variance more > effectively in a single pass using a recurrence relation based on Kahan > summation. So I think the winning move is to provide a better > implementation of sumsq than either of the naive implementations above, one > that uses real numerics fu. (We intend to provide a better implementation > of summation for DoubleSummaryStatistics as well, based on Kahan.) > > Of course the crappy implementation that is in there now is less than > ideal. > > > -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com From brian.goetz at oracle.com Mon Apr 1 07:47:23 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 01 Apr 2013 10:47:23 -0400 Subject: Spec and API review for {Int,Long,Double}SummaryStatistics In-Reply-To: References: <514CD46F.9020508@oracle.com> <1BC38610-51E9-4A69-A1E7-192880618E5F@oracle.com> <1E7C3B20-8B4A-4782-BD59-B82ACD7AF4DB@oracle.com> Message-ID: <51599DFB.1020405@oracle.com> The motivation for sumOfSquares() is indeed to help in calculation of variance. As you've noted, there are multiple forms this can take (e.g., sample vs population). 
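Both messages point to Knuth's single-pass alternative. Below is a minimal sketch of it, assuming a hypothetical RunningVariance accumulator (not DoubleSummaryStatistics or any other JDK class).

- `accept` uses the recurrence usually credited to Welford (the one given in Knuth's section 4.2.2). It tracks the running mean and the running sum of squared deviations instead of sum(x^2).
- `combine` uses the standard pairwise merge formula, which a parallel collector would need.
- The naive sum/sum-of-squares formula is included for comparison.

```java
// Hypothetical accumulator (not java.util.DoubleSummaryStatistics): single-pass
// variance via Welford's recurrence, plus the textbook formula for comparison.
final class RunningVariance {
    private long n;
    private double mean; // running mean of the values seen so far
    private double m2;   // running sum of squared deviations from the mean

    void accept(double x) {
        n++;
        double delta = x - mean;  // deviation from the previous mean
        mean += delta / n;        // updated mean
        m2 += delta * (x - mean); // uses both the old and the new mean
    }

    // Merges another partial result (standard pairwise update), as the
    // combiner of a parallel reduction would have to.
    void combine(RunningVariance other) {
        if (other.n == 0) {
            return;
        }
        if (n == 0) {
            n = other.n;
            mean = other.mean;
            m2 = other.m2;
            return;
        }
        long total = n + other.n;
        double delta = other.mean - mean;
        mean += delta * other.n / total;
        m2 += other.m2 + delta * delta * ((double) n * other.n / total);
        n = total;
    }

    double populationVariance() { return n > 0 ? m2 / n : Double.NaN; }

    double sampleVariance() { return n > 1 ? m2 / (n - 1) : Double.NaN; }

    // Textbook single-pass formula from sum(x) and sum(x^2); numerically fragile.
    static double naivePopulationVariance(double[] xs) {
        double sum = 0, sumSq = 0;
        for (double x : xs) {
            sum += x;
            sumSq += x * x;
        }
        return (sumSq - sum * sum / xs.length) / xs.length;
    }
}
```

Consider four values near 10^9: 4, 7, 13 and 16, each shifted by 10^9, with an exact population variance of 22.5. In double precision the naive formula comes out negative, which is the failure Knuth warns about. The recurrence returns 22.5 whether the values are accumulated in one pass or split in two and combined.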
Modulo numerical issues, sum(sq) is an input to all the various forms, so we theoretically stay out of whack-a-mole territory by providing this form rather than trying to provide all the various forms people might want. Note that *not* providing any help here is a disaster for those who want it; they have to materialize the collection and then make two passes. It's not like those users can just (safely) extend the summary statistics to also calculate the part they need. Note also that for numeric types like long, there are no numerical issues. So punishing long for his brother's instability just seems mean. (For those following at home: the formulae for variance involve sum((x_i - \bar x)^2), where \bar x is the average of x. This means you would first have to make a pass to find the average, and then make another pass to calculate the sum of squares of deviations from the mean. Expanding the square: (x_i - \bar x)^2 = x_i^2 - 2 x_i \bar x + (\bar x)^2, and summing over the n values gives sum((x_i - \bar x)^2) = sum(x_i^2) - n (\bar x)^2. So the sum can be expressed in terms of the average and the sum of squares, and done in a single pass. But unfortunately, since squaring makes big numbers bigger and small numbers smaller, you end up risking adding 10^20 and 10^-20 and losing data when done in floating point.) On 4/1/2013 10:23 AM, Kevin Bourrillion wrote: > I'm confused, but I've seen nothing to change my impression that > exposing sumOfSquares is not helpful. As unpleasant as it may seem, if > we want to address the variance case at all, I think we have little > choice but to expose sampleVariance() and populationVariance() > ourselves, and then /those/ can use Kahan summation or whatever (which > internally computes "sum of squares of deltas", not sum of squares, as I > (don't) understand it). > > > On Fri, Mar 29, 2013 at 3:16 PM, Brian Goetz wrote: > > > Also, while I'm here...
> > > > Exposing sumOfSquares() does not permit users to safely calculate > variance, which I believe makes it fairly useless and even dangerous: > > > > "The failure of Cauchy's fundamental inequality is another > important example of the breakdown of traditional algebra in the > presence of floating point arithmetic...Novice programmers who > calculate the standard deviation of some observations by using the > textbook formula [formula for the standard deviation in terms of the > sum of squares] often find themselves taking the square root of a > negative number!" (Knuth AoCP vol 2, section 4.2.2) > > Thanks for raising this issue again -- I'd meant to respond earlier. > I ran this by our numerics guys. > > Basically, the problem is that for floating point numbers, since > squaring makes small numbers smaller and big numbers bigger, summing > squares in the obvious way risks the usual problem with adding > numbers of grossly differing magnitudes. So while the naive > factoring of population/sample variance allows you to compute them > from sum(x) and sum(x^2), the latter is potentially numerically > challenged. (Note that this problem doesn't exist for int/long, > assuming a long is big enough to compute sum(x^2) without overflow.) > > Still, I am not sure we do users a favor by leaving this out. Many > of them are likely to simply extend DoubleSummaryStatistics to > calculate sum(x^2) anyway. And the only other alternative is > horrible; stream the data into a collection and make two passes on > it, one for mean and one for variance. That's at least 3x as > expensive, if you can fit the whole thing in memory in the first place. > > The Knuth section you cite also offers a means to calculate variance > more effectively in a single pass using a recurrence relation based > on Kahan summation. So I think the winning move is to provide a > better implementation of sumsq than either of the naive > implementations above, one that uses real numerics fu. 
(We intend > to provide a better implementation of summation for > DoubleSummaryStatistics as well, based on Kahan.) > > Of course the crappy implementation that is in there now is less > than ideal. > > > > > > -- > Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com > From jim at pentastich.org Mon Apr 1 10:53:32 2013 From: jim at pentastich.org (Jim Mayer) Date: Mon, 1 Apr 2013 13:53:32 -0400 Subject: Spec and API review for {Int,Long,Double}SummaryStatistics In-Reply-To: <51599DFB.1020405@oracle.com> References: <514CD46F.9020508@oracle.com> <1BC38610-51E9-4A69-A1E7-192880618E5F@oracle.com> <1E7C3B20-8B4A-4782-BD59-B82ACD7AF4DB@oracle.com> <51599DFB.1020405@oracle.com> Message-ID: On Mon, Apr 1, 2013 at 10:47 AM, Brian Goetz wrote: > The motivation for sumOfSquares() is indeed to help in calculation of > variance. As you've noted, there are multiple forms this can take (e.g., > sample vs population). Modulo numerical issues, sum(sq) is an input to all > the various forms, so we theoretically stay out of whack-a-mole territory > by providing this form rather than trying to provide all the various forms > people might want. > > Note that *not* providing any help here is a disaster for those who want > it; they have to materialize the collection and then make two passes. It's > not like those users can just (safely) extend the summary statistics to > also calculate the part they need. > > Note also that for numeric types like long, there are no numerical issues. > So punishing long for his brother's instability just seems mean. > > Sadly, while this is true for sumsq, it is not true for the calculation of variance using the sum of squares. The problems occur when either the individual values or N is large. Here's an example: Sample size: 100 Values: all values are 1, except for one that is 2. sumsq -> 103 sum(x)^2/N -> 102.01 sumsq-sum(x)^2/N -> 0.99 Sample size: 1000000 Values: all values are 1, except for one that is 2.
sumsq -> 1000003 sum(x)^2/N -> 1000002.000001 sumsq-sum(x)^2/N -> 0.999999 Basically, as N gets bigger the two sums get larger and larger while their difference (the sum of squared deviations from the mean, i.e. N times the population variance) approaches one, so the result is obtained by cancelling two large, nearly equal numbers. This is an unstable computation. Jim Mayer From brian.goetz at oracle.com Mon Apr 1 12:28:07 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 01 Apr 2013 15:28:07 -0400 Subject: Survey results Message-ID: <5159DFC7.4070001@oracle.com> Survey results for the last two surveys are here: https://www.surveymonkey.com/sr.aspx?sm=Rmxo_2fOmocQqW5Txn1rPztBT4bwQsjNcCWzomugR5Fsg_3d https://www.surveymonkey.com/sr.aspx?sm=KxnVsqG2kS7L_2bayV3Kg_2bu2Qi40QNOfB8penEX2R4Cuc_3d Mike has already responded to the comments for XxxSummaryStatistics. I have integrated the comments for Stream into a recent push, and propagated them forward to {Int,Long,Double}Stream. I have removed forEachUntilCancelled based on evidence that people find it too confusing. We're working on a new proposal for cancelation, stay tuned. From paul.sandoz at oracle.com Tue Apr 2 06:34:06 2013 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Tue, 2 Apr 2013 15:34:06 +0200 Subject: RFR JDK-8010096 : Initial java.util.Spliterator putback In-Reply-To: <515AD100.5040103@oracle.com> References: <09A8DF98-6FF6-452E-8150-E86D9113E580@oracle.com> <515AD100.5040103@oracle.com> Message-ID: On Apr 2, 2013, at 2:37 PM, Chris Hegarty wrote: > Nice work Paul, some small comments. > > - new javadocs tags, @implSpec, @apiNote, etc. I really like the use of > implSpec to define the behavior of this implementation's default > methods. There is probably a separate thread, but any idea when these > will be generated in the javadoc, not just the lambda docs? > I do not know, Mike is the one who is very likely to know more. > - Iterator.remove @since 1.8? 
I see there is a conflict here between > when the method was originally added and its default > Right, that is most likely a mistake. How can we express that the default method is there since 1.8?
> - Spliterator class level examples are not showing in the specdiff. > Are these really API Notes? Maybe they are. > The examples are non-normative, so I think such docs can be categorized under @apiNote. See here for generated JavaDoc from the lambda repo: http://cr.openjdk.java.net/~psandoz/lambda/spliterator/jdk-8010096/api/java/util/Spliterator.html Paul. From brian.goetz at oracle.com Wed Apr 3 10:27:32 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 03 Apr 2013 13:27:32 -0400 Subject: Additional Collectors Message-ID: <515C6684.8020007@oracle.com> There's been some feedback on lambda-dev and from the recent Lambda Hack Day on Collectors. There were two big categories: 1. Need more / better docs. 2. We want some more collectors. The first is obvious and we've been working on those. Here are some suggestions for simple additions to the Collector set. - count() (and possibly sum, min, max) These are straightforward analogues of the specialized stream methods; they serve as a "gentle on-ramp" to understanding reduction. People also expressed concern that the "toMap()" (née mappedTo, joiningWith) is not flexible enough. As a reminder, what toMap does is take a Stream<T> and a function T->U and produce a Map<T,U>. Some people call this "backwards"; they would rather have something that takes a Stream<T> and function T->K and produces a Map<K,T>. And others would rather have something that takes two functions T->K and T->U and produces a Map<K,U>. All of these are useful enough. The question is how to fit them into the API. I think the name "toMap" is a bit of a challenge, since there are several "modes" and not all of them can be easily handled by overloads. Maybe: toMap(T->U) // first version toMap(T->K, T->U) // third version and leave the second version out, since the third version can easily simulate the second?
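For comparison, here is a sketch of how the three modes (and count()) can be written with the collectors that shipped in Java SE 8, Collectors.toMap(keyMapper, valueMapper) and Collectors.counting(). These are the released signatures, not the draft ones under discussion here. The ToMapModes class and its method names are made up for illustration. The second, "indexing" mode falls out of the two-function form by passing Function.identity() as the value mapper.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Illustrative only: the three toMap "modes" from this thread, expressed with
// the two-function Collectors.toMap that is part of the released Java SE 8 API.
final class ToMapModes {
    private ToMapModes() {}

    // First mode: elements become keys, T -> U gives the value (Map<T,U>).
    static Map<String, Integer> lengths(List<String> words) {
        return words.stream().collect(Collectors.toMap(Function.identity(), String::length));
    }

    // Second mode ("indexing"): T -> K gives the key, elements become values (Map<K,T>).
    static Map<Character, String> indexByInitial(List<String> words) {
        return words.stream().collect(Collectors.toMap(w -> w.charAt(0), Function.identity()));
    }

    // Third mode: independent key and value functions (Map<K,U>).
    static Map<Character, Integer> initialToLength(List<String> words) {
        return words.stream().collect(Collectors.toMap(w -> w.charAt(0), String::length));
    }

    // The proposed count() collector corresponds to Collectors.counting().
    static long countLongWords(List<String> words) {
        return words.stream().filter(w -> w.length() > 4).collect(Collectors.counting());
    }
}
```

All three calls assume the keys are distinct (the sample words below have distinct initials). The two-function toMap fails on a duplicate key.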
From forax at univ-mlv.fr Wed Apr 3 10:41:17 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Wed, 03 Apr 2013 19:41:17 +0200 Subject: Additional Collectors In-Reply-To: <515C6684.8020007@oracle.com> References: <515C6684.8020007@oracle.com> Message-ID: <515C69BD.3090505@univ-mlv.fr> On 04/03/2013 07:27 PM, Brian Goetz wrote: > There's been some feedback on lambda-dev and from the recent Lambda > Hack Day on Collectors. There were two big categories: > > 1. Need more / better docs. > > 2. We want some more collectors. > > The first is obvious and we've been working on those. Here are some > suggestions for simple additions to the Collector set. > > - count() (and possibly sum, min, max) > > These are straightforward analogues of the specialized stream methods; > they serve as a "gentle on-ramp" to understanding reduction. > > People also expressed concern that the "toMap()" (née mappedTo, > joiningWith) is not flexible enough. As a reminder, what toMap does > is take a Stream<T> and a function T->U and produce a Map<T,U>. Some > people call this "backwards"; they would rather have something that > takes a Stream<T> and function T->K and produces a Map<K,T>. And > others would rather have something that takes two functions T->K and > T->U and produces a Map<K,U>. > > All of these are useful enough. The question is how to fit them into > the API. I think the name "toMap" is a bit of a challenge, since > there are several "modes" and not all of them can be easily handled by > overloads. Maybe: It would be better if you renamed U to V. > > toMap(T->V) // first version produces a Map<T,V> > toMap(T->K, T->V) // third version produces a Map<K,V>. Why is toMap(T -> V) not simply toMap(T -> T, T -> V)? In that case, we only need one toMap. > > and leave the second version out, since the third version can easily > simulate the second?
> cheers, Rémi From tim at peierls.net Wed Apr 3 10:49:40 2013 From: tim at peierls.net (Tim Peierls) Date: Wed, 3 Apr 2013 13:49:40 -0400 Subject: Additional Collectors In-Reply-To: <515C6684.8020007@oracle.com> References: <515C6684.8020007@oracle.com> Message-ID: On Wed, Apr 3, 2013 at 1:27 PM, Brian Goetz wrote: > People also expressed concern that the "toMap()" (née mappedTo, > joiningWith) is not flexible enough. As a reminder, what toMap does is > take a Stream<T> and a function T->U and produce a Map<T,U>. Some people > call this "backwards"; they would rather have something that takes a > Stream<T> and function T->K and produces a Map<K,T>. And others would > rather have something that takes two functions T->K and T->U and produces a > Map<K,U>. > The second form (Stream<T> and T->K producing Map<K,T>) could be called "indexing", so toIndexMap or toIndex would seem appropriate. I don't have a sense of a natural name for the third form. toMap still seems good for the first form. > All of these are useful enough. The question is how to fit them into the > API. I think the name "toMap" is a bit of a challenge, since there are > several "modes" and not all of them can be easily handled by overloads. > Maybe: > > toMap(T->U) // first version > toMap(T->K, T->U) // third version > > and leave the second version out, since the third version can easily > simulate the second? > Maybe, but I like the thought of toIndex or something like that. --tim From brian.goetz at oracle.com Wed Apr 3 10:53:30 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 03 Apr 2013 13:53:30 -0400 Subject: Performance update Message-ID: <515C6C9A.2000105@oracle.com> With Doug's help, we've been beating on the performance of the Streams implementation. We've been in pretty good shape all along with per-element overhead, since we departed from Iterator very early on. But we've been struggling with startup overhead.
As the API has stabilized and many simplifying assumptions have been made (e.g., recent simplification of sequential/parallel, outlawing "reuse", outlawing "forked" streams, etc.), we've recently been able to make a refactoring pass that reduces the object count for setting up a stream. Highlights of this include: - merging PipelineHelper into AbstractPipeline; - eliminating the Supplier capture even when the client provides a late-binding Spliterator; - recasting the Op implementations as "extends XxxPipeline" instead of having the pipeline object encapsulate the Op (2x reduction); - merging TerminalOp and TerminalSink for some operations, including forEach. Some of this is already in, but the rest should be going in the next few days. It does not affect the public API at all. We've also opened the door to implementing some parallel stateful operations without full barriers, so they can be better pipelined. For example, limit/substream on a stream that is SIZED+SUBSIZED can be expressed as a wrapping spliterator without touching the data or computing elements that won't be part of the result. The new implementation strategy permits this, though we have to do some more work to upgrade the candidate operations. Further, many of the "expensive" setup operations are now avoided for sequential and stateless parallel pipelines, and are paid only by parallel pipelines with stateful ops. From brian.goetz at oracle.com Wed Apr 3 11:02:33 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 03 Apr 2013 14:02:33 -0400 Subject: Additional Collectors In-Reply-To: References: <515C6684.8020007@oracle.com> Message-ID: <515C6EB9.5030301@oracle.com> There are basically three strategies here we could take: 1. Anoint one direction as the "natural" direction and make the other either fit into the general form (as I proposed) or have a modified name (as Tim proposes). I am fine with either of these. (There will always be those who say "you picked the wrong direction to anoint.")
2. Lard up both names with a directionality. 3. Pick totally new names, such as mappedTo and indexedFrom. Also we want to avoid variant overload. Right now we have two versions each of toMap and toConcurrentMap: a simple (mapping function only) one, and a kitchen-sink (mapping function + map ctor + merge function) one. This was already a compromise to keep the count low. If we go with my suggestion (keep the T->U form, plus add one more general form), that is 4 new methods. If we decide to provide all three forms, that's 8 new methods. I think if we do Map<T,U> toMap(T->U) and Map<K,U> toMap(T->K, T->U) we can call them both toMap and people will get it. On 4/3/2013 1:49 PM, Tim Peierls wrote: > On Wed, Apr 3, 2013 at 1:27 PM, Brian Goetz wrote: > > People also expressed concern that the "toMap()" (née mappedTo, > joiningWith) is not flexible enough. As a reminder, what toMap does > is take a Stream<T> and a function T->U and produce a Map<T,U>. > Some people call this "backwards"; they would rather have > something that takes a Stream<T> and function T->K and produces a > Map<K,T>. And others would rather have something that takes two > functions T->K and T->U and produces a Map<K,U>. > > > The second form (Stream<T> and T->K producing Map<K,T>) could be called > "indexing", so toIndexMap or toIndex would seem appropriate. I don't > have a sense of a natural name for the third form. toMap still seems > good for the first form. > > All of these are useful enough. The question is how to fit them > into the API. I think the name "toMap" is a bit of a challenge, > since there are several "modes" and not all of them can be easily > handled by overloads. Maybe: > > toMap(T->U) // first version > toMap(T->K, T->U) // third version > > and leave the second version out, since the third version can easily > simulate the second? > > > Maybe, but I like the thought of toIndex or something like that.
> > --tim > From ali.ebrahimi1781 at gmail.com Wed Apr 3 11:21:47 2013 From: ali.ebrahimi1781 at gmail.com (Ali Ebrahimi) Date: Wed, 3 Apr 2013 22:51:47 +0430 Subject: Additional Collectors In-Reply-To: <515C6684.8020007@oracle.com> References: <515C6684.8020007@oracle.com> Message-ID: Hi Brian, I have concerns about the toMap method: it may produce unexpected and unpredictable results in user programs, and it only makes sense for unique collections (Set) and streams (such as those resulting from Stream.distinct). Consider this example: class Entity{ int id; //key field String name; // override equals and hashcode .... } Entity foo = new Entity(1, "Foo"); Entity bar = new Entity(1, "Bar"); List<Entity> entities = list(new Entity(0, "Some"),foo,..., bar,...) Map<Entity,String> entitymap=entities.stream().collect(toMap(e -> e.name)); What is the result of entitymap.get(foo)? "Foo" or "Bar"? Map<Entity,String> entitymap2=entities.parallelStream().collect(toMap(e -> e.name)); What is the result of entitymap2.get(foo)? "Foo" or "Bar"? Suggestion 1: get rid of toMap. Suggestion 2: maybe we need to consider adding a subclass UniqueStream with an additional method toMap, and changing the return type of Stream.distinct and Set.stream to UniqueStream. What do you think? Ali Ebrahimi On Wed, Apr 3, 2013 at 9:57 PM, Brian Goetz wrote: > There's been some feedback on lambda-dev and from the recent Lambda Hack > Day on Collectors. There were two big categories: > > 1. Need more / better docs. > > 2. We want some more collectors. > > The first is obvious and we've been working on those. Here are some > suggestions for simple additions to the Collector set. > > - count() (and possibly sum, min, max) > > These are straightforward analogues of the specialized stream methods; they > serve as a "gentle on-ramp" to understanding reduction. > > People also expressed concern that the "toMap()" (née mappedTo, > joiningWith) is not flexible enough. 
As a reminder, what toMap does is > take a Stream<T> and a function T->U and produce a Map<T,U>.
Some people > call this "backwards"; they would rather have something that takes a > Stream<T> and function T->K and produces a Map<K,T>. And others would > rather have something that takes two functions T->K and T->U and produces a > Map<K,U>. > > All of these are useful enough. The question is how to fit them into the > API. I think the name "toMap" is a bit of a challenge, since there are > several "modes" and not all of them can be easily handled by overloads. > Maybe: > > toMap(T->U) // first version > toMap(T->K, T->U) // third version > > and leave the second version out, since the third version can easily > simulate the second? > > From brian.goetz at oracle.com Wed Apr 3 11:31:52 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 03 Apr 2013 14:31:52 -0400 Subject: Additional Collectors In-Reply-To: References: <515C6684.8020007@oracle.com> Message-ID: <515C7598.2010603@oracle.com> This should be handled by the two overloadings of toMap. The general one: toMap(Function, Supplier, BinaryOperator) takes a merge function which resolves duplicates. The default form: toMap(Function) implicitly uses a merge function which throws. The doc says: If the input elements contain duplicates (according to {@link Object#equals(Object)}), an {@code IllegalStateException} is thrown when the collection operation is performed. for the basic form and documents the use of the merge function for the more general form. Is that not adequate? On 4/3/2013 2:21 PM, Ali Ebrahimi wrote: > Hi brian, > I have concerns about toMap method and this may result in unexpected and > unpredictable results in user program, and this method only have mean > for unique collections (Set) and streams (resulted for Stream.distinct). > Consider this example: > > class Entity{ > int id; //key field > String name; > // override equals and hashcode > .... > } > > Entity foo = new Entity(1, "Foo"); > Entity bar = new Entity(1, "Bar"); > List<Entity> entities = list(new Entity(0, "Some"),foo,..., bar,...)
> > Map<Entity,String> entitymap=entities.stream().collect(toMap(e -> e.name)); > > what is result of entitymap.get(foo)? "Foo" or "Bar" > > Map<Entity,String> entitymap2=entities.parallelStream().collect(toMap(e -> e.name)); > > what is result of entitymap2.get(foo)? "Foo" or "Bar" > > Suggestion1: get rid of toMap > Suggestion 2: May be we need consider adding subclass UniqueStream with > additional method toMap and change return type of Stream.distinct and > Set.stream to UniqueStream. > > What do you think? > > Ali Ebrahimi > > > > > > On Wed, Apr 3, 2013 at 9:57 PM, Brian Goetz wrote: > > There's been some feedback on lambda-dev and from the recent Lambda > Hack Day on Collectors. There were two big categories: > > 1. Need more / better docs. > > 2. We want some more collectors. > > The first is obvious and we've been working on those. Here are some > suggestions for simple additions to the Collector set. > > - count() (and possibly sum, min, max) > > These are straightforward analogues of the specialized stream > methods; they serve as a "gentle on-ramp" to understanding reduction. > > People also expressed concern that the "toMap()" (née mappedTo, > joiningWith) is not flexible enough. As a reminder, what toMap does > is take a Stream<T> and a function T->U and produce a Map<T,U>. > Some people call this "backwards"; they would rather have > something that takes a Stream<T> and function T->K and produces a > Map<K,T>. And others would rather have something that takes two > functions T->K and T->U and produces a Map<K,U>. > > All of these are useful enough. The question is how to fit them > into the API. I think the name "toMap" is a bit of a challenge, > since there are several "modes" and not all of them can be easily > handled by overloads. Maybe: > > toMap(T->U) // first version > toMap(T->K, T->U) // third version > > and leave the second version out, since the third version can easily > simulate the second?
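To make the duplicate-key case concrete, here is a sketch of Ali's Entity example, where equality is by id only, against the released Java SE 8 overloads. The draft signatures quoted in this thread differ. The two-argument Collectors.toMap throws IllegalStateException on a duplicate key. The three-argument form takes a BinaryOperator merge function, so "first wins" and "last wins" are just (a, b) -> a and (a, b) -> b. Entity and MergePolicies are illustrative names, and the Entity constructor is an addition to Ali's sketch.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Illustrative version of the Entity example: equals/hashCode use id only,
// so two entities with the same id collide as map keys.
final class Entity {
    final int id;
    final String name;

    Entity(int id, String name) {
        this.id = id;
        this.name = name;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Entity && ((Entity) o).id == id;
    }

    @Override
    public int hashCode() {
        return Integer.hashCode(id);
    }
}

final class MergePolicies {
    private MergePolicies() {}

    // Two-argument form: a duplicate key makes collect() throw IllegalStateException.
    static Map<Entity, String> throwing(List<Entity> es) {
        return es.stream().collect(Collectors.toMap(Function.identity(), e -> e.name));
    }

    // Explicit merge functions: keep the earlier value, or replace it with the later one.
    static Map<Entity, String> firstWins(List<Entity> es) {
        return es.stream().collect(Collectors.toMap(Function.identity(), e -> e.name, (a, b) -> a));
    }

    static Map<Entity, String> lastWins(List<Entity> es) {
        return es.stream().collect(Collectors.toMap(Function.identity(), e -> e.name, (a, b) -> b));
    }
}
```

Collectors.toConcurrentMap is documented as an unordered collector, so with that variant the choice of "first" and "last" is not tied to encounter order.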
> > From tim at peierls.net Wed Apr 3 11:34:39 2013 From: tim at peierls.net (Tim Peierls) Date: Wed, 3 Apr 2013 14:34:39 -0400 Subject: Additional Collectors In-Reply-To: <515C6EB9.5030301@oracle.com> References: <515C6684.8020007@oracle.com> <515C6EB9.5030301@oracle.com> Message-ID: On Wed, Apr 3, 2013 at 2:02 PM, Brian Goetz wrote: > I think if we do > Map<T,U> toMap(T->U) > and > Map<K,U> toMap(T->K, T->U) > > we can call them both toMap and people will get it. I think that's fine, but if you do that it would be really nice to show how to get the "toIndexMap" behavior in the docs for the two-arg toMap. --tim From ali.ebrahimi1781 at gmail.com Wed Apr 3 12:08:24 2013 From: ali.ebrahimi1781 at gmail.com (Ali Ebrahimi) Date: Wed, 3 Apr 2013 23:38:24 +0430 Subject: Additional Collectors In-Reply-To: <515C7598.2010603@oracle.com> References: <515C6684.8020007@oracle.com> <515C7598.2010603@oracle.com> Message-ID: Hi, On Wed, Apr 3, 2013 at 11:01 PM, Brian Goetz wrote: > This should be handled by the two overloadings of toMap. The general one: > > toMap(Function, Supplier, BinaryOperator) > > takes a merge function which resolves duplicates. > > The default form: > > toMap(Function) > > implicitly uses a merge function which throws. The doc says: > > If the input elements contain duplicates > (according to {@link Object#equals(Object)}), an {@code > IllegalStateException} is thrown when the > collection operation is performed. > > for the basic form and documents the use of the merge function for the > more general form. > > Is that not adequate? Yes, but this is a runtime-safe solution. My suggestion was compile-time safe. Ali Ebrahimi > > > On 4/3/2013 2:21 PM, Ali Ebrahimi wrote: > >> Hi brian, >> I have concerns about toMap method and this may result in unexpected and >> unpredictable results in user program, and this method only have mean >> for unique collections (Set) and streams (resulted for Stream.distinct).
>> Consider this example: >> >> class Entity{ >> int id; //key field >> String name; >> // override equals and hashcode >> .... >> } >> >> Entity foo = new Entity(1, "Foo"); >> Entity bar = new Entity(1, "Bar"); >> List<Entity> entities = list(new Entity(0, "Some"),foo,..., bar,...) >> >> Map<Entity,String> entitymap=entities.stream().collect(toMap(e -> e.name)); >> >> >> what is result of entitymap.get(foo)? "Foo" or "Bar" >> >> Map<Entity,String> entitymap2=entities.parallelStream().collect(toMap(e -> e.name)); >> >> >> what is result of entitymap2.get(foo)? "Foo" or "Bar" >> >> Suggestion1: get rid of toMap >> Suggestion 2: May be we need consider adding subclass UniqueStream with >> additional method toMap and change return type of Stream.distinct and >> Set.stream to UniqueStream. >> >> What do you think? >> >> Ali Ebrahimi >> >> >> >> >> >> On Wed, Apr 3, 2013 at 9:57 PM, Brian Goetz wrote: >> >> There's been some feedback on lambda-dev and from the recent Lambda >> Hack Day on Collectors. There were two big categories: >> >> 1. Need more / better docs. >> >> 2. We want some more collectors. >> >> The first is obvious and we've been working on those. Here are some >> suggestions for simple additions to the Collector set. >> >> - count() (and possibly sum, min, max) >> >> These are straightforward analogues of the specialized stream >> methods; they serve as a "gentle on-ramp" to understanding reduction. >> >> People also expressed concern that the "toMap()" (née mappedTo, >> joiningWith) is not flexible enough. As a reminder, what toMap does >> is take a Stream<T> and a function T->U and produce a Map<T,U>. >> Some people call this "backwards"; they would rather have >> something that takes a Stream<T> and function T->K and produces a >> Map<K,T>. And others would rather have something that takes two >> functions T->K and T->U and produces a Map<K,U>. >> >> All of these are useful enough. The question is how to fit them >> into the API.
I think the name "toMap" is a bit of a challenge, >> since there are several "modes" and not all of them can be easily >> handled by overloads. Maybe: >> >> toMap(T->U) // first version >> toMap(T->K, T->U) // third version >> >> and leave the second version out, since the third version can easily >> simulate the second? >> >> >> From brian.goetz at oracle.com Wed Apr 3 12:13:47 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 03 Apr 2013 15:13:47 -0400 Subject: Additional Collectors In-Reply-To: References: <515C6684.8020007@oracle.com> <515C7598.2010603@oracle.com> Message-ID: <515C7F6B.5020908@oracle.com> The short answer is: we have deliberately moved away from trying to capture these properties statically. Not only does having UniqueStream and ParallelStream and SortedStream result in a combinatorial explosion of interfaces (IntUniqueParallelSortedStream), but most of the raw information needed to get static safety does not actually exist in the static type system. For example, it is quite common that you know that an array contains no duplicates or is sorted, but you'd have to do some sort of "cast" to teach the static type system that: Arrays.stream(array) .pretendIAmUnique() .pretendIAmSorted() ... And the pretendXxx() call here is like an unsafe cast; the compiler cannot verify that you are doing so safely. So we're back to dynamic detection, with a little extra documentation, and a monstrous API bloat as the cost of that extra documentation. Instead, we can provide canned merge policies such as "throw", "first wins", and "last wins".
From brian.goetz at oracle.com Wed Apr 3 20:00:25 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 03 Apr 2013 23:00:25 -0400 Subject: unrdered() Message-ID: <515CECC9.6080101@oracle.com> At one point, we had an unordered() op. I think it may be time to bring it back. There are a growing number of ops that have optimized implementations for unordered streams: - distinct can be implemented with concurrent insertion into a CHS instead of merging if we don't care about order. Not only is this less work (merging is expensive), but it makes distinct lazy (elements can flow through immediately once they've not been found in the CHS, instead of waiting for all the elements to be seen.) - sorted is non-stable in unordered streams.
- limit/subsequence are far lighter for unordered streams (and can similarly be made lazy) So a way of saying "I know you think this stream has ordering, but I don't care about it" is a way of opting into these optimizations. Implementation is trivial. Adding .unordered() could also enable us to get rid of .collectUnordered(), and allow more of the reduce-like ops to benefit from the embrace of "disorder" without API explosion. From paul.sandoz at oracle.com Thu Apr 4 02:11:51 2013 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Thu, 4 Apr 2013 11:11:51 +0200 Subject: unrdered() In-Reply-To: <515CECC9.6080101@oracle.com> References: <515CECC9.6080101@oracle.com> Message-ID: On Apr 4, 2013, at 5:00 AM, Brian Goetz wrote: > At one point, we had an unordered() op. I think it may be time to bring it back. > > There are a growing number of ops that have optimized implementations for unordered streams: > > - distinct can be implemented with concurrent insertion into a CHS instead of merging if we don't care about order. Not only is this less work (merging is expensive), but it makes distinct lazy (elements can flow through immediately once they've not been found in the CHS, instead of waiting for all the elements to be seen.) > > - sorted is non-stable in unordered streams. > > - limit/subsequence are far lighter for unordered streams (and can similarly be made lazy) > > So a way of saying "I know you think this stream has ordering, but I don't care about it" is a way of opting into these optimizations. > Right, we previously thought "well, let's just go with what the two ends of the pipeline define in terms of having order and preserving order respectively". AFAICT unordered() would be useful for parallel pipelines with: 1) a source that has order; 2) stateful operations that can be optimized if order need not be preserved; and 3) an order-preserving terminal operation, implying that unordered() should be declared close to the source. > Implementation is trivial.
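The unordered `distinct` described above (concurrent insertion into a CHS, with each element flowing downstream as soon as it is first seen) can be sketched with the API that eventually shipped in JDK 8. This is only an illustration under that assumption, not the library's internal implementation: `ConcurrentHashMap.newKeySet()` stands in for the CHS, and the class and method names are hypothetical.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class UnorderedDistinctSketch {
    // Hypothetical helper: an unordered, lazy "distinct" built on a concurrent set.
    // Set.add returns true only for the first insertion of an element, so each
    // distinct element passes the filter exactly once, whichever thread sees it
    // first. Nothing waits for the whole input, which is what makes it lazy.
    // (Stateful predicates like this are normally discouraged; it is used here
    // only to mirror the CHS idea.)
    static <T> Stream<T> unorderedDistinct(Stream<T> source) {
        Set<T> seen = ConcurrentHashMap.newKeySet();
        return source.unordered().filter(seen::add);
    }

    public static void main(String[] args) {
        List<Integer> out = unorderedDistinct(Stream.of(3, 1, 3, 2, 1, 2, 3).parallel())
                .collect(Collectors.toList());
        // The contents are 1, 2 and 3; their order in the list is not specified.
        System.out.println(out.size() + " distinct elements: " + out);
    }
}
```

The trade-off is the one described in the thread: the ordered version must remember which occurrence came first in encounter order, while this one simply keeps whichever occurrence reaches the set first.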
> > Adding .unordered() could also enable us to get rid of .collectUnordered(), and allow more of the reduce-like ops to benefit from the embrace of "disorder" without API explosion. > Although collectUnordered also back propagates lack of order upstream (just like forEach, or findAny). To remove collectUnordered we would need collectors to define whether they preserve order or not (I see in a recent change set to lambda you started work on that). So a collect(toConcurrentMap()) should back propagate lack of order since CHM is used, but for the supplier version we cannot guarantee that since ConcurrentSkipListMap might be used. Ugh, it's all a bit complex. Paul. From paul.sandoz at oracle.com Thu Apr 4 05:02:36 2013 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Thu, 4 Apr 2013 14:02:36 +0200 Subject: Additional Collectors In-Reply-To: References: <515C6684.8020007@oracle.com> <515C6EB9.5030301@oracle.com> Message-ID: On Apr 3, 2013, at 8:34 PM, Tim Peierls wrote: > On Wed, Apr 3, 2013 at 2:02 PM, Brian Goetz wrote: > >> I think if we do >> Map<T,U> toMap(T->U) >> and >> Map<K,U> toMap(T->K, T->U) >> >> we can call them both toMap and people will get it. > > > I think that's fine, but if you do that it would be really nice to show how > to get the "toIndexMap" behavior in the docs for the two-arg toMap. > I quite like the fact, at the moment, that toMap always uses an element as the key and maps elements to values. It is an easy rule to remember. Whereas groupingBy always requires a classifying function to map an element to a key, plus many ways to collect the elements associated with the same key, the canonical one being to collect elements to a list [*]. There seems to be another basic use-case, which is toUniqueIndexMap: Map<U,T> toUniqueIndexMap(T->U) and then we don't need to merge values of T. If that is required then groupingBy could be used instead. Perhaps documentation-wise it may be helpful to provide examples of how toMap etc. could be implemented using groupingBy? Paul.
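Paul's `toUniqueIndexMap`, Tim's "toIndexMap" request, and the documentation idea of expressing such a map with `groupingBy` can be illustrated with the collectors that eventually shipped in JDK 8. Those differ from the drafts discussed here: the final two-argument `Collectors.toMap(keyMapper, valueMapper)` throws `IllegalStateException` on a duplicate key, which already gives unique-index behavior, and the three-argument form accepts a merge function such as Brian's "first wins" policy. This is a sketch under those assumptions; `uniqueIndex`, `indexViaGroupingBy` and `firstWinsIndex` are hypothetical helper names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class IndexMapSketch {
    // toUniqueIndexMap(T->U): a Map<U,T> from each element's index to the element.
    // JDK 8's two-argument toMap throws IllegalStateException on a duplicate key.
    static <T, U> Map<U, T> uniqueIndex(Stream<T> s, Function<? super T, ? extends U> index) {
        return s.collect(Collectors.toMap(index, Function.identity()));
    }

    // The same map expressed with groupingBy: collect each group to a list,
    // then reject any key that is shared by more than one element.
    static <T, U> Map<U, T> indexViaGroupingBy(Stream<T> s, Function<? super T, ? extends U> index) {
        return s.collect(Collectors.groupingBy(index,
                Collectors.collectingAndThen(Collectors.toList(), (List<T> group) -> {
                    if (group.size() != 1) {
                        throw new IllegalStateException("Duplicate key for " + group);
                    }
                    return group.get(0);
                })));
    }

    // A "first wins" merge policy via the three-argument toMap; on a sequential
    // stream the earliest element with a given key is kept.
    static <T, U> Map<U, T> firstWinsIndex(Stream<T> s, Function<? super T, ? extends U> index) {
        return s.collect(Collectors.toMap(index, Function.identity(), (first, second) -> first));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "bb", "ccc");
        System.out.println(uniqueIndex(words.stream(), String::length));
        System.out.println(indexViaGroupingBy(words.stream(), String::length));
        // "b" collides with "a" on key 1, so "a" is kept.
        System.out.println(firstWinsIndex(Stream.of("a", "b"), String::length));
        try {
            uniqueIndex(Stream.of("a", "b"), String::length);
        } catch (IllegalStateException e) {
            System.out.println("duplicate key rejected");
        }
    }
}
```

The `groupingBy` version shows why Paul finds it the more general tool: the downstream collector decides what happens to a group, so "reject duplicates", "keep the first", and "keep them all in a list" differ only in that one argument.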
[*] I'm wondering whether we really need the explicit List-returning variants of groupingBy. From joe.bowbeer at gmail.com Thu Apr 4 06:01:09 2013 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Thu, 4 Apr 2013 08:01:09 -0500 Subject: unrdered() In-Reply-To: <515CECC9.6080101@oracle.com> References: <515CECC9.6080101@oracle.com> Message-ID: Consider making forEach ordered by default, and relying on unordered() to disable this. From brian.goetz at oracle.com Thu Apr 4 06:56:53 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 04 Apr 2013 09:56:53 -0400 Subject: unrdered() In-Reply-To: References: <515CECC9.6080101@oracle.com> Message-ID: <515D86A5.1030709@oracle.com> > Although collectUnordered also back propagates lack of order upstream > (just like forEach, or findAny).
To remove collectUnordered we would > need collectors to define whether they preserve order or not (I see > in a recent change set to lambda you started work on that). With foo.unordered()....collect() vs foo...collectUnordered() unless any of the ops in ... inject order (only candidate I can think of is sort, when you probably really want the order!), it will be unordered for all of the ... ops -- so do we really need the back-propagation? From paul.sandoz at oracle.com Thu Apr 4 07:08:24 2013 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Thu, 4 Apr 2013 16:08:24 +0200 Subject: unrdered() In-Reply-To: <515D86A5.1030709@oracle.com> References: <515CECC9.6080101@oracle.com> <515D86A5.1030709@oracle.com> Message-ID: On Apr 4, 2013, at 3:56 PM, Brian Goetz wrote: >> Although collectUnordered also back propagates lack of order upstream >> (just like forEach, or findAny). To remove collectUnordered we would >> need collectors to define whether they preserve order or not (I see >> in a recent change set to lambda you started work on that). > > With > foo.unordered()....collect() > vs > foo...collectUnordered() > > unless any of the ops in ... inject order (only candidate I can think of is sort, when you probably really want the order!), it will be unordered for all of the ... ops -- so do we really need the back-propagation? > I was thinking the same thing, we can get rid of the back propagation. It is complex, plus annoying to implement :-) We can then also achieve what Joe proposes with forEach. Paul. From brian.goetz at oracle.com Thu Apr 4 07:18:32 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 04 Apr 2013 10:18:32 -0400 Subject: unrdered() In-Reply-To: References: <515CECC9.6080101@oracle.com> Message-ID: <515D8BB8.6080601@oracle.com> > Consider making forEach ordered by default, and relying on unordered() > to disable this. 
We did consider this, and it is weird that this is the only terminal that has unordered as its behavior, but I think the current behavior is right. If someone does: seqStream.forEach(action) They will expect that the action is performed sequentially in the calling thread. If they do: seqStream.parallel().forEach(action) I believe they will (reasonably) expect the action to happen in parallel across threads. Constraining to encounter order gives up the vast majority of the parallelism. I think if we did this people would say "parallel streams don't work." Separately, Paul quite correctly points out that back-propagating unordered from the terminal is a pain. In: seqStream.parallel().distinct().forEach() Here, since the forEach will be unordered, there's no point in doing the more expensive ordered processing for distinct. This only shows up for parallel pipelines with stateful operations. In that case, we can walk backwards injecting unordered, but have to stop when we hit a short-circuit operation. (Though we could just omit this for now, it's just an optimization.) From howard.lovatt at gmail.com Thu Apr 4 15:55:20 2013 From: howard.lovatt at gmail.com (Howard Lovatt) Date: Fri, 5 Apr 2013 09:55:20 +1100 Subject: unrdered() In-Reply-To: <515D8BB8.6080601@oracle.com> References: <515CECC9.6080101@oracle.com> <515D8BB8.6080601@oracle.com> Message-ID: See comments on sequential streams. A Limits interface could provide an unordered method that defaults to true and if you want to force ordered you override to return true. Sent from my iPad
From howard.lovatt at gmail.com Thu Apr 4 16:51:03 2013 From: howard.lovatt at gmail.com (Howard Lovatt) Date: Fri, 5 Apr 2013 10:51:03 +1100 Subject: Consumers and Suppliers Message-ID: If the method in Consumer was called put and the method in Supplier was called take, then Queue could be retrofitted to extend both Consumer and Supplier and then many collections could be Consumers and Suppliers. I think this would be useful - what do others think? -- Howard. From david.holmes at oracle.com Thu Apr 4 18:54:48 2013 From: david.holmes at oracle.com (David Holmes) Date: Fri, 05 Apr 2013 11:54:48 +1000 Subject: Consumers and Suppliers In-Reply-To: References: Message-ID: <515E2EE8.7070102@oracle.com> On 5/04/2013 9:51 AM, Howard Lovatt wrote: > If the method in Consumer was called put and the method in Supplier was > called take, then Queue could be retrofitted to extend both Consumer and > Supplier and then many collections could be Consumers and Suppliers. > > I think this would be useful - what do others think?
I would find that rather confusing because in the common producer/consumer sense the producer puts things into the queue and the consumer removes them from the queue. The queue itself is not considered to be producer/supplier nor consumer. David > -- Howard. > From howard.lovatt at gmail.com Fri Apr 5 00:15:12 2013 From: howard.lovatt at gmail.com (Howard Lovatt) Date: Fri, 5 Apr 2013 18:15:12 +1100 Subject: Consumers and Suppliers In-Reply-To: <515E2EE8.7070102@oracle.com> References: <515E2EE8.7070102@oracle.com> Message-ID: <9D92D285-1280-4782-8137-1E51999C921D@gmail.com> I didn't mean that the collection was typically simultaneously a Consumer and a Supplier, I meant that it was typically one or the other (I should have been clearer in what I said). In my own collection library I do this and I find it very convenient. Sent from my iPad On 05/04/2013, at 12:54 PM, David Holmes wrote: > On 5/04/2013 9:51 AM, Howard Lovatt wrote: >> If the method in Consumer was called put and the method in Supplier was >> called take, then Queue could be retrofitted to extend both Consumer and >> Supplier and then many collections could be Consumers and Suppliers. >> >> I think this would be useful - what do others think? > > I would find that rather confusing because in the common producer/consumer sense the producer puts things into the queue and the consumer removes them from the queue. The queue itself is not considered to be producer/supplier nor consumer. > > David > >> -- Howard. >> From brian.goetz at oracle.com Fri Apr 5 06:21:39 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 05 Apr 2013 09:21:39 -0400 Subject: Consumers and Suppliers In-Reply-To: References: <515E2EE8.7070102@oracle.com> <9D92D285-1280-4782-8137-1E51999C921D@gmail.com> Message-ID: <515ECFE3.5080107@oracle.com> Also, this request is kind of letting the tail (the way methods happen to be named in one particular collection implementation) wag the dog (names for function types.)
The java.util.function types are for naming *functions*. A queue is not a function. On 4/5/2013 9:17 AM, Vitaly Davidovich wrote: > I'm with David on this one. It's not that a queue is both at same time, > but rather it's neither - it's just a handoff/communication channel between > producers/consumers. > > I'd say a queue can be a message sink (receives) and message source > (returns), but supplier/consumer is too close of a name to what people > typically think of in that case, and it's not the queue. > > My $.02 > > Thanks > > Sent from my phone > I didn't mean that the collection was typically simultaneously a Consumer > and a Supplier, I meant that it was typically one or the other (I should > have been clearer in what I said). In my own collection library I do this > and I find it very convenient. > > Sent from my iPad > > On 05/04/2013, at 12:54 PM, David Holmes wrote: > >> On 5/04/2013 9:51 AM, Howard Lovatt wrote: >>> If the method in Consumer was called put and the method in Supplier was >>> called take, then Queue could be retrofitted to extend both Consumer and >>> Supplier and then many collections could be Consumers and Suppliers. >>> >>> I think this would be useful - what do others think? >> >> I would find that rather confusing because in the common > producer/consumer sense the producer puts things into the queue and the > consumer removes them from the queue. The queue itself is not considered to > be producer/supplier nor consumer. >> >> David >> >>> -- Howard. 
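The point that a queue is not itself a function can be made concrete: a method reference adapts a queue to the `java.util.function` shapes at the point of use, without `Queue` extending `Consumer` or `Supplier`. One caveat about the JDK API: `BlockingQueue.put` and `take` declare `InterruptedException`, so `queue::put` and `queue::take` do not fit `Consumer` and `Supplier` directly, while the non-blocking `add` and `poll` do. This is a minimal sketch; the class and helper names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;
import java.util.function.Supplier;

class QueueAsFunctions {
    // The producer side sees the queue only as a Consumer.
    static <T> Consumer<T> asSink(Queue<T> q) {
        return q::add;
    }

    // The consumer side sees it only as a Supplier; poll() yields null when empty.
    static <T> Supplier<T> asSource(Queue<T> q) {
        return q::poll;
    }

    // BlockingQueue.put declares InterruptedException, so q::put does not fit
    // Consumer; this wrapper restores the interrupt flag and rethrows unchecked.
    static <T> Consumer<T> blockingSink(BlockingQueue<T> q) {
        return t -> {
            try {
                q.put(t);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting to put", e);
            }
        };
    }

    public static void main(String[] args) {
        Queue<String> queue = new ArrayDeque<>();
        Consumer<String> sink = asSink(queue);
        Supplier<String> source = asSource(queue);
        sink.accept("first");
        sink.accept("second");
        System.out.println(source.get()); // first
        System.out.println(source.get()); // second
        System.out.println(source.get()); // null

        BlockingQueue<String> handoff = new LinkedBlockingQueue<>();
        blockingSink(handoff).accept("x");
        System.out.println(handoff.poll()); // x
    }
}
```

Each side of the hand-off gets exactly the function type it needs, and the queue keeps its own method names.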
>>> > From brian.goetz at oracle.com Fri Apr 5 09:28:16 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 05 Apr 2013 12:28:16 -0400 Subject: Fwd: hg: lambda/lambda/jdk: Add .unordered() operation; eliminate .collectUnordered() In-Reply-To: <20130405160957.BAADE480D2@hg.openjdk.java.net> References: <20130405160957.BAADE480D2@hg.openjdk.java.net> Message-ID: <515EFBA0.3010804@oracle.com> Stream.collectUnordered has been removed in favor of a more general .unordered() method (which may be a no-op if the stream is already unordered.) This allows more stateful and terminal ops to gain the benefit of opting out of ordering. The collect(Collector) method currently performs a concurrent collection when all of the following are true: - the stream is parallel - the collector is concurrent - the collector is unordered OR the stream is unordered Currently the groupingByConcurrent / toConcurrentMap collectors are not UNORDERED. Meaning that users still have to have an unordered source (or ask for unordered explicitly) to get concurrent collection. I'm currently working through what it would look like if this were reversed, and these collectors declared UNORDERED. -------- Original Message -------- Subject: hg: lambda/lambda/jdk: Add .unordered() operation; eliminate .collectUnordered() Date: Fri, 05 Apr 2013 16:09:44 +0000 From: brian.goetz at oracle.com To: lambda-dev at openjdk.java.net Changeset: adc363b47e78 Author: briangoetz Date: 2013-04-05 12:09 -0400 URL: http://hg.openjdk.java.net/lambda/lambda/jdk/rev/adc363b47e78 Add .unordered() operation; eliminate .collectUnordered() ! src/share/classes/java/util/stream/AbstractPipeline.java ! src/share/classes/java/util/stream/BaseStream.java ! src/share/classes/java/util/stream/Collector.java ! src/share/classes/java/util/stream/DelegatingStream.java ! src/share/classes/java/util/stream/DoublePipeline.java ! src/share/classes/java/util/stream/IntPipeline.java ! 
src/share/classes/java/util/stream/LongPipeline.java ! src/share/classes/java/util/stream/ReferencePipeline.java ! src/share/classes/java/util/stream/Stream.java ! test-ng/bootlib/java/util/stream/OpTestCase.java ! test-ng/bootlib/java/util/stream/StreamTestData.java ! test-ng/tests/org/openjdk/tests/java/util/stream/TabulatorsTest.java From zhong.j.yu at gmail.com Fri Apr 5 10:27:28 2013 From: zhong.j.yu at gmail.com (Zhong Yu) Date: Fri, 5 Apr 2013 12:27:28 -0500 Subject: Consumers and Suppliers In-Reply-To: References: Message-ID: If you need to address a queue as a Consumer/Supplier, anything wrong with queue::put and queue::take? About the choice of name of "Consumer", I seriously hate it. It has a connotation that the ownership of the object is transferred, and object might be dissolved. In many of my use cases, "consumer" simply sounds grotesque and totally misleading. I wish the choice was Procedure instead. (I'm just venting, not asking for change.) Zhong Yu On Thu, Apr 4, 2013 at 6:51 PM, Howard Lovatt wrote: > If the method in Consumer was called put and the method in Supplier was > called take, then Queue could be retrofitted to extend both Consumer and > Supplier and then many collections could be Consumers and Suppliers. > > I think this would be useful - what do others think? > > -- Howard. > From brian.goetz at oracle.com Fri Apr 5 10:36:42 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 05 Apr 2013 13:36:42 -0400 Subject: Consumers and Suppliers In-Reply-To: References: Message-ID: <515F0BAA.6010601@oracle.com> > About the choice of name of "Consumer", I seriously hate it. It has a > connotation that the ownership of the object is transferred, and object > might be dissolved. In many of my use cases, "consumer" simply sounds > grotesque and totally misleading. I wish the choice was Procedure instead. > (I'm just venting, not asking for change.) Yes, finding names was very hard.
Suffice it to say that there was no name considered that *someone* didn't hate. From brian.goetz at oracle.com Fri Apr 5 12:47:35 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 05 Apr 2013 15:47:35 -0400 Subject: API and spec review for Stream In-Reply-To: <514CD7E0.6030102@oracle.com> References: <514CD7E0.6030102@oracle.com> Message-ID: <515F2A57.8080507@oracle.com> Updated based on comments from last survey. New survey is up at: https://www.surveymonkey.com/s/VQ8MYBN Includes Stream, IntStream, LongStream, DoubleStream, and a rough version of package doc. On 3/22/2013 6:14 PM, Brian Goetz wrote: > I have posted a survey at: > https://www.surveymonkey.com/s/59CTHS8 > > This is a hopefully-final review of the API and preliminary review of > the specification for the single class Stream. Docs are linked from the > survey. Usual password. Any and all constructive comments welcome. > > It is known that the specs are incomplete; what is here is a start. > Suggestions for improvement are welcome. From brian.goetz at oracle.com Fri Apr 5 12:49:41 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 05 Apr 2013 15:49:41 -0400 Subject: API and spec review for Collector Message-ID: <515F2AD5.90007@oracle.com> I have posted a survey at: https://www.surveymonkey.com/s/VWC55PD This is a review for the Collector API. (Collectors will be separate.) From brian.goetz at oracle.com Fri Apr 5 12:51:32 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 05 Apr 2013 15:51:32 -0400 Subject: API and spec review for FlatMapper Message-ID: <515F2B44.2000901@oracle.com> I have posted a survey at: https://www.surveymonkey.com/s/VW5TZ5W for the FlatMapper API. 
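The conditions listed in the `.unordered()` changeset note above (a parallel stream, a CONCURRENT collector, and either an UNORDERED collector or an unordered stream) can be exercised against the API that eventually shipped in JDK 8, where `groupingByConcurrent` reports both `CONCURRENT` and `UNORDERED` characteristics. This is a sketch under that assumption. The class and method names are hypothetical, and whether the concurrent path is actually taken is an implementation detail that the result does not reveal.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class ConcurrentCollectSketch {
    // Groups words by length on a parallel stream with a concurrent collector.
    // The map is filled concurrently, so the order of the words inside each
    // group is not specified.
    static ConcurrentMap<Integer, List<String>> byLength(List<String> words) {
        return words.parallelStream()
                    .unordered() // explicit opt-out of encounter order (redundant here, since the collector is UNORDERED)
                    .collect(Collectors.groupingByConcurrent(String::length));
    }

    // Exposes the characteristics the collector declares.
    static Set<Collector.Characteristics> characteristics() {
        return Collectors.<String, Integer>groupingByConcurrent(String::length).characteristics();
    }

    public static void main(String[] args) {
        ConcurrentMap<Integer, List<String>> groups = byLength(Arrays.asList("a", "bb", "cc", "ddd"));
        System.out.println(groups.get(2).size()); // 2
        System.out.println(characteristics().contains(Collector.Characteristics.CONCURRENT)); // true
    }
}
```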
From brian.goetz at oracle.com Fri Apr 5 13:00:58 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 05 Apr 2013 16:00:58 -0400 Subject: Concurrent / unordered collection Message-ID: <515F2D7A.4030109@oracle.com> We've had some improvements in the model for managing ordering, so it's time to take a look at whether these can flow into Collector as well. - There is now an .unordered() operation, and we removed the special-purpose .collectUnordered(). - Terminal operations can have flags/characteristics just like intermediate operations. The allowable flags for a terminal operation are ORDERED/NOT_ORDERED and SHORT_CIRCUIT. The unordered status can back-propagate from a terminal up the pipeline: stream.distinct().forEach(...) In the above, ordinarily distinct would be constrained to encounter order. But, because we know there is an unordered forEach operation coming downstream, we can back-propagate UNORDERED up the chain, enabling the more efficient unordered version of distinct(). - The Collector API has been enhanced with characteristic flags, just like Spliterator and Stream. Defined characteristics include CONCURRENT and UNORDERED. UNORDERED-ness of a Collector flows into the terminal flags of a collect() operation. So, for example, toSet() is an unordered collector. Until now, you only got a concurrent reduction when BOTH of the following were true: - the Collector is CONCURRENT - the source is unordered This was because a concurrent collection fundamentally interferes with encounter order. So if a user did: stream.collect(groupingByConcurrent()) they would NOT get a concurrent collection because the stream is ordered. But, I think this may be surprising to users. Now that a Collector can indicate that it is UNORDERED, I think we should consider making the concurrent-map collectors UNORDERED. So if a user says: stream.collect(groupingByConcurrent()) he truly gets a concurrent collection.
If he is surprised that this is an unordered collection, it is an opportunity to learn more about ordering. But I think this is more consistent with user expectations and we can now more easily represent this in the API. From brian.goetz at oracle.com Sun Apr 7 15:47:15 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Sun, 07 Apr 2013 18:47:15 -0400 Subject: Whither FlatMapper? Message-ID: <5161F773.6050705@oracle.com> I started to work through the survey comments on FlatMapper, which amounted to "hate the name", "need more examples", "hard to understand." As I started to write more examples, and consider some of the things that have changed in the implementation recently, I am starting to think that maybe now we *can* actually get away with only the "obvious" (but still less performant) form. What people think they want is: flatMap(T -> Stream<U>) And, in a perfect world, they would be right. The reason this has historically been a bad fit is that the performance cost of this version over the "explicit" version was enormous. (It was merely bad for the "I already have a collection lying around" case, but horrible for the "I am generating values" case.) But, a lot has happened recently in the implementation. Previously, each *iteration* would have generated a Spliterator, a Supplier, a Pipeline, a PipelineHelper, and a ForEachTask -- just to pass the values down the stream. Since then, the supplier and helper are gone, the spliterator can likely be merged with the pipeline, and the forEach eliminated in most cases. And there is still quite a bit more running room to further decrease the cost of building small streams. There's a dozen small things we can do -- many implementation-only, but some are small API additions (such as singletonStream(T)) -- to bring this cost down further. Even with the general forms available, almost no one understands how they work, and even those who figure it out still can't figure out why they would want it.
The pretty version is just so attractive that no one is willing to believe that it is painfully slow compared to the ugly version. Given that this adds seven new SAMs (a significant fraction of the public API surface area of java.util.stream), I'm having second thoughts on including these now. So, concrete proposal: - Drop all FlatMapper.* SAMs; - Drop all forms of flatMap(FlatMapper*) - Add back flatMapToXxx(Function References: <5161F773.6050705@oracle.com> Message-ID: I'm a big fan of the current FlatMapper stuff that takes a Consumer. Much more efficient and straightforward when you don't have a stream or collection to just return. Here is some code that uses 3 of them for good effect: https://github.com/spullara/twitterprocessor/blob/master/src/main/java/twitterprocessor/App.java
From alahijani at gmail.com Sun Apr 7 18:24:53 2013 From: alahijani at gmail.com (Ali Lahijani) Date: Mon, 8 Apr 2013 05:54:53 +0430 Subject: Whither FlatMapper? Message-ID: Or you can keep both forms, and do it in an elegant way. Define an abstraction, Generator, for anything that supports forEach(): interface Generator { void forEach(Consumer c); } Trivially, all Collections and Streams are Generators.
But you can also define Generators in push mode: Generator g = (c) -> { for (int i = 0; i < 100; i++) { c.accept(i); } }; Now: - Add flatMapToXxx(Function) to Stream The body of the function can return a Collection, s.flatMapToInt(e -> Arrays.asList(e, e+1, e+2)); or a Stream, s.flatMapToInt(e -> Arrays.asList(e, e+1, e+2).stream()); But since Generator is itself a functional interface, advanced users like Sam can return a Generator: s.flatMapToInt(e -> c -> { c.accept(e); c.accept(e + 1); c.accept(e + 2); }) which might be more efficient. Of course you can - Drop all FlatMapper.* SAMs; - Drop all forms of flatMap(FlatMapper*) Best From howard.lovatt at gmail.com Sun Apr 7 21:10:19 2013 From: howard.lovatt at gmail.com (Howard Lovatt) Date: Mon, 8 Apr 2013 14:10:19 +1000 Subject: Whither FlatMapper? In-Reply-To: References: <5161F773.6050705@oracle.com> Message-ID: I am also a fan of the Consumer form for efficiency reasons. On 8 April 2013 09:01, Sam Pullara wrote: > I'm a big fan of the current FlatMapper stuff that takes a Consumer. Much > more efficient and straightforward when you don't have a stream or > collection to just return. Here is some code that uses 3 of them for good > effect: > > > https://github.com/spullara/twitterprocessor/blob/master/src/main/java/twitterprocessor/App.java > > > On Sun, Apr 7, 2013 at 3:47 PM, Brian Goetz > wrote: > > > I started to work through the survey comments on FlatMapper, which > > amounted to "hate the name", "need more examples", "hard to understand." > > As I started to write more examples, and consider some of the things > that > > have changed in the implementation recently, I am starting to think that > > maybe now we *can* actually get away with only the "obvious" (but still > > less performant) form. > > > > What people think they want is: > > > > flatMap(T -> Stream) > > > > And, in a perfect world, they would be right.
The reason this has > > historically been a bad fit is that the performance cost of this version > > over the "explicit" version was enormous. (It was merely bad for the "I > > already have a collection lying around" case, but horrible for the "I am > > generating values" case.) > > > > But, a lot has happened recently in the implementation. Previously, each > > *iteration* would have generated a Spliterator, a Supplier, > a > > Pipeline, a PipelineHelper, and a ForEachTask -- just to pass the values > > down the stream. Since then, the supplier and helper are gone, the > > spliterator can likely be merged with the pipeline, and the forEach > > eliminated in most cases. And there is still quite a bit more running > room > > to further decrease the cost of building small streams. There's a dozen > > small things we can do -- many implementation-only, but some are small > API > > additions (such as singletonStream(T)) -- to bring this cost down > further. > > > > Even with the general forms available, almost no one understands how they > > work, and even those who figure it out still can't figure out why they > > would want it. The pretty version is just so attractive that no one is > > willing to believe that it is painfully slow compared to the ugly > version. > > Given that this adds seven new SAMs (a significant fraction of the > public > > API surface area of java.util.stream), I'm having second thoughts on > > including these now. > > > > So, concrete proposal: > > - Drop all FlatMapper.* SAMs; > > - Drop all forms of flatMap(FlatMapper*) > > - Add back flatMapToXxx(Function > > > > -- -- Howard. From vitalyd at gmail.com Fri Apr 5 06:17:35 2013 From: vitalyd at gmail.com (Vitaly Davidovich) Date: Fri, 5 Apr 2013 09:17:35 -0400 Subject: Consumers and Suppliers In-Reply-To: <9D92D285-1280-4782-8137-1E51999C921D@gmail.com> References: <515E2EE8.7070102@oracle.com> <9D92D285-1280-4782-8137-1E51999C921D@gmail.com> Message-ID: I'm with David on this one. 
It's not that a queue is both at the same time, but rather it's neither - it's just a handoff/communication channel between producers/consumers. I'd say a queue can be a message sink (receives) and message source (returns), but supplier/consumer is too close of a name to what people typically think of in that case, and it's not the queue. My $.02 Thanks Sent from my phone I didn't mean that the collection was typically simultaneously a Consumer and a Supplier, I meant that it was typically one or the other (I should have been clearer in what I said). In my own collection library I do this and I find it very convenient. Sent from my iPad On 05/04/2013, at 12:54 PM, David Holmes wrote: > On 5/04/2013 9:51 AM, Howard Lovatt wrote: >> If the method in Consumer was called put and the method in Supplier was >> called take, then Queue could be retrofitted to extend both Consumer and >> Supplier and then many collections could be Consumers and Suppliers. >> >> I think this would be useful - what do others think? > > I would find that rather confusing because in the common producer/consumer sense the producer puts things into the queue and the consumer removes them from the queue. The queue itself is not considered to be producer/supplier nor consumer. > > David > >> -- Howard. >> From jed at wesleysmith.io Sat Apr 6 18:25:01 2013 From: jed at wesleysmith.io (Jed Wesley-Smith) Date: Sun, 7 Apr 2013 11:25:01 +1000 Subject: Consumers and Suppliers In-Reply-To: <515F0BAA.6010601@oracle.com> References: <515F0BAA.6010601@oracle.com> Message-ID: We call it Effect, because the only possible thing a method that returns void can do, is side-effects. https://bitbucket.org/atlassian/fugue/src/master/src/main/java/com/atlassian/fugue/Effect.java On 6 April 2013 04:36, Brian Goetz wrote: >> About the choice of name of "Consumer", I seriously hate it. It has a >> connotation that the ownership of the object is transferred, and object >> might be dissolved.
In many of my use cases, "consumer" simply sounds >> grotesque and totally misleading. I wish the choice was Procedure instead. >> (I'm just venting, not asking for change.) > > Yes, finding names was very hard. Suffice it to say that there was no > name considered that *someone* didn't hate. > From pbenedict at apache.org Mon Apr 8 07:32:10 2013 From: pbenedict at apache.org (Paul Benedict) Date: Mon, 8 Apr 2013 09:32:10 -0500 Subject: Surveys Message-ID: When the EG gets asked to do a survey, it's impossible for observers to see what the survey is (it's password protected). Okay, but could you at least post a copy of the survey in the mailing list so we can "observe" what is being asked? Thank you. From brian.goetz at oracle.com Mon Apr 8 12:08:49 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 08 Apr 2013 15:08:49 -0400 Subject: Setting of UNORDERED on concurrent collectors Message-ID: <516315C1.3080509@oracle.com> Now that we've removed collectUnordered in favor of a more general unordered() op, we should consider what should be the default behavior for: orderedStream.collect(groupingByConcurrent(f)) Currently, the collect-to-ConcurrentMap collectors are *not* defined as UNORDERED. Which means, if the stream is ordered, we will attempt to do an ordered collection anyway, which is incompatible with concurrent collection, and we will do the plain old partition-and-merge with ConcurrentMap. Here, we have competing evidence for the user intent. On the one hand, the stream is ordered, and the user could have chosen unordered. On the other, the user has asked for concurrent grouping. It's not 100% obvious which should win. On the other hand, ordered map collections are so awful that they will almost certainly be unhappy with the performance if they forget to say unordered here in the parallel case (and it makes no difference in the sequential case.) So I'm inclined to make groupingByConcurrent / toConcurrentMap be UNORDERED collections.
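For reference, a small sketch (not part of the original thread) of what the proposed choice looks like from the caller's side, written against the API as it eventually shipped in Java SE 8, where groupingByConcurrent reports both the CONCURRENT and UNORDERED characteristics. The class and method names are illustrative only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class ConcurrentGroupingDemo {
    // The characteristics the shipped groupingByConcurrent collector reports.
    static Set<Collector.Characteristics> characteristics() {
        Collector<Integer, ?, ConcurrentMap<Integer, List<Integer>>> c =
                Collectors.groupingByConcurrent(e -> e % 2);
        return c.characteristics();
    }

    // An ordered source made parallel: because the collector is CONCURRENT
    // and UNORDERED, elements are accumulated into one shared ConcurrentMap,
    // so each group's list may come back in any order.
    static ConcurrentMap<Integer, List<Integer>> groupByParity(List<Integer> input) {
        return input.parallelStream()
                    .collect(Collectors.groupingByConcurrent(e -> e % 2));
    }

    public static void main(String[] args) {
        System.out.println(characteristics());
        System.out.println(groupByParity(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8)));
    }
}
```

Since ordering is already given up, comparing the groups after sorting them is the only meaningful check.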
From dl at cs.oswego.edu Mon Apr 8 12:27:53 2013 From: dl at cs.oswego.edu (Doug Lea) Date: Mon, 08 Apr 2013 15:27:53 -0400 Subject: Whither FlatMapper? In-Reply-To: References: <5161F773.6050705@oracle.com> Message-ID: <51631A39.30001@cs.oswego.edu> On 04/07/13 19:01, Sam Pullara wrote: > I'm a big fan of the current FlatMapper stuff that takes a Consumer. Much more > efficient and straightforward when you don't have a stream or collection to just > return. Here is some code that uses 3 of them for good effect: I think the main issue is whether, given the user reactions so far, we should insist on people using a generally better but non-obvious approach to flat-mapping. Considering that anyone *could* write their own FlatMappers layered on top of existing functionality (we could even show how to do it as a code example somewhere), I'm with Brian on this: give people the obvious forms in the API. People who are most likely to use it are the least likely to be obsessive about its performance. And when they are, they can learn about alternatives. -Doug From joe.bowbeer at gmail.com Mon Apr 8 12:36:37 2013 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Mon, 8 Apr 2013 12:36:37 -0700 Subject: Setting of UNORDERED on concurrent collectors In-Reply-To: <516315C1.3080509@oracle.com> References: <516315C1.3080509@oracle.com> Message-ID: What is groupingByConcurrent good for? What's the difference between parallel and concurrent in this context? I've re-read the last 5 emails that mention groupingByConcurrent and it is not clear to me what's going on. 
The most succinct indication of its function is: The collect(Collector) method currently performs a concurrent collection > when all of the following are true: > - the stream is parallel > - the collector is *concurrent* > - the collector is unordered OR the stream is unordered In other words, *if* I happen to use groupingByConcurrent *then* maybe a concurrent collection will be performed, but maybe not, depending on a couple other factors... Can we make this simpler and more intuitive/predictable? I realize that's what you're addressing now, but can't we go a lot farther? Can we, say, get rid of groupingByConcurrent and just assume that if the stream is parallel? What do we lose? Do we lose any functionality that can't be derived another way? Please educate me! --Joe On Mon, Apr 8, 2013 at 12:08 PM, Brian Goetz wrote: > Now that we've removed collectUnordered in favor of a more general > unordered() op, we should consider what should be the default behavior for: > > orderedStream.collect(groupingByConcurrent(f)) > > Currently, the collect-to-ConcurrentMap collectors are *not* defined as > UNORDERED. Which means, if the stream is ordered, we will attempt to do an > ordered collection anyway, which is incompatible with concurrent > collection, and we will do the plain old partition-and-merge with > ConcurrentMap. > > Here, we have competing evidence for the user intent. On the one hand, > the stream is ordered, and the user could have chosen unordered. On the > other, the user has asked for concurrent grouping. It's not 100% obvious > which should win. > > On the other hand, ordered map collections are so awful that they will > almost certainly be unhappy with the performance if they forget to say > unordered here in the parallel case (and it makes no difference in the > sequential case.) So I'm inclined to make groupingByConcurrent / > toConcurrentMap be UNORDERED collections.
> From brian.goetz at oracle.com Mon Apr 8 12:50:48 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 08 Apr 2013 15:50:48 -0400 Subject: Setting of UNORDERED on concurrent collectors In-Reply-To: References: <516315C1.3080509@oracle.com> Message-ID: <51631F98.8030304@oracle.com> > What is groupingByConcurrent good for? What's the difference between > parallel and concurrent in this context? For sequential streams, concurrent is irrelevant. So this is only relevant for parallel streams. When doing a reduction on a parallel stream, there are two obvious ways to do it: - Partition the input, reduce the chunks separately into isolated subresults, then combine the subresults "up the tree" into a complete result (call this a "traditional" parallel reduction) - Use some sort of thread-safe combiner, and blast input elements at some shared combiner from all threads (call this a "concurrent" reduction.) This is more like a forEach than a reduce. Requirements for this to be safe include: the combiner must be thread-safe, and the user must not care about order, since there's no telling in what order the elements will be blasted. When the reduction is a groupBy into a Map, this can make a big difference because of the merging performance of HashMap. The traditional reduction looks like this: - create a HashMap per partition - Insert the elements of this partition into the HashMap - Go up the tree, merging two HashMaps into one. This involves iterating a key-by-key merge. This is slow. The concurrent reduction looks like: - create one ConcurrentHashMap - Blast elements into it using atomic methods like putIfAbsent - Return that, no merging In most reasonable cases, concurrent parallel reduction with CHM blows away traditional parallel reduction with HashMap. On the other hand, one of the casualties of the concurrent approach is ordering. 
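To make the cost of the "traditional" strategy described above concrete, here is a minimal sequential sketch of partition-and-merge grouping. It splits the input in half recursively, groups each leaf chunk into its own HashMap, and then merges maps key by key on the way back up. A real parallel implementation would fork the two halves (for example as ForkJoin tasks); the class and method names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class TraditionalGroupingSketch {
    // Leaf step: group one chunk into its own, unshared HashMap.
    static <T, K> Map<K, List<T>> groupChunk(List<T> chunk, Function<T, K> classifier) {
        Map<K, List<T>> m = new HashMap<>();
        for (T t : chunk) {
            m.computeIfAbsent(classifier.apply(t), k -> new ArrayList<>()).add(t);
        }
        return m;
    }

    // Combine step: merge the right map into the left one, key by key.
    // Appending right after left preserves encounter order; this per-key
    // loop is the cost that the concurrent strategy avoids entirely.
    static <K, T> Map<K, List<T>> mergeMaps(Map<K, List<T>> left, Map<K, List<T>> right) {
        for (Map.Entry<K, List<T>> e : right.entrySet()) {
            left.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).addAll(e.getValue());
        }
        return left;
    }

    // Partition the input, reduce each half separately, combine "up the tree".
    static <T, K> Map<K, List<T>> group(List<T> input, Function<T, K> classifier) {
        if (input.size() <= 2) {
            return groupChunk(input, classifier);
        }
        int mid = input.size() / 2;
        Map<K, List<T>> left = group(input.subList(0, mid), classifier);
        Map<K, List<T>> right = group(input.subList(mid, input.size()), classifier);
        return mergeMaps(left, right);
    }

    public static void main(String[] args) {
        System.out.println(group(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8), e -> e % 2));
    }
}
```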
If your input is (ordered): [ 1, 2, 3, 4, 5, 6, 7, 8 ] and your classifier function is: e % 2 then the traditional approach must yield: { 0 => [ 2, 4, 6, 8 ], 1 => [ 1, 3, 5, 7] } but the concurrent approach could yield: { 0 => [ 6, 2, 4, 8 ], 1 => [ 7, 1, 3, 5 ] } So the question is, when confronted with an obvious desire to use a concurrent-safe collector, do we infer that the user must not care about ordering? > The most succinct indication of its function is: > > The collect(Collector) method currently performs a > concurrent collection when all of the following are true: > - the stream is parallel > - the collector is *concurrent* > - the collector is unordered OR the stream is unordered This is the current rule about whether or not collect() does a concurrent reduction. My question here is whether we wish to make our existing concurrent collectors always be unordered, so that the last bullet is trivially satisfied for the built-in concurrent collectors. > In other words, *if* I happen to use groupingByConcurrent *then* maybe a > concurrent collection will be performed, but maybe not, depending on a > couple other factors... That is the current state of affairs. > Can we make this simpler and more intuitive/predictable? I realize > that's what you're addressing now, but can't we go a lot farther? > > Can we, say, get rid of groupingByConcurrent and just assume that if the > stream is parallel? What do we lose? Do we lose any functionality that > can't be derived another way? That would cause us to access a non-thread-safe HashMap concurrently from multiple threads. From brian.goetz at oracle.com Mon Apr 8 13:05:45 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 08 Apr 2013 16:05:45 -0400 Subject: Whither FlatMapper? 
In-Reply-To: <51631A39.30001@cs.oswego.edu> References: <5161F773.6050705@oracle.com> <51631A39.30001@cs.oswego.edu> Message-ID: <51632319.4040704@oracle.com> A slight correction: if we remove the flatMap(FlatMapper), there is no fluent form that is as efficient as the removed form that accepts (T, Consumer), since there's no other way to get your hands on the downstream Sink. (Not that this dampens my enthusiasm for removing it much.) For the truly diffident, a middle ground does exist: remove FlatMapper and its six brothers as a named SAM, and replace it with BiConsumer>, leaving both forms of flatMap methods in place: flatMap(Function>) flatMap(BiConsumer>) The main advantage being that the package javadoc is not polluted by seven forms of FlatMapper. On 4/8/2013 3:27 PM, Doug Lea wrote: > On 04/07/13 19:01, Sam Pullara wrote: >> I'm a big fan of the current FlatMapper stuff that takes a Consumer. >> Much more >> efficient and straightforward when you don't have a stream or >> collection to just >> return. Here is some code that uses 3 of them for good effect: > > I think the main issue is whether, given the user reactions so far, we > should insist on people using a generally better but non-obvious > approach to flat-mapping. Considering that anyone *could* write their own > FlatMappers layered on top of existing functionality (we could > even show how to do it as a code example somewhere), I'm with > Brian on this: give people the obvious forms in the API. People > who are most likely to use it are the least likely to be obsessive > about its performance. And when they are, they can learn about > alternatives.
> > -Doug > From joe.bowbeer at gmail.com Mon Apr 8 13:07:40 2013 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Mon, 8 Apr 2013 13:07:40 -0700 Subject: Setting of UNORDERED on concurrent collectors In-Reply-To: <51631F98.8030304@oracle.com> References: <516315C1.3080509@oracle.com> <51631F98.8030304@oracle.com> Message-ID: > > > That would cause us to access a non-thread-safe HashMap concurrently from > multiple threads. I assumed that a concurrent collection would use a concurrent map. Isn't it reasonable to assume that operations on a parallel stream will use thread-safe collections? BTW, the other downside of the current state of affairs is experienced by the user who specifies a parallel stream and even declares it unordered, but still gets a non-concurrent collection because groupingBy was used instead of groupingByConcurrent. In your examples, the difference between the two results is primarily one of order, not concurrency. Can we reflect this choice more directly in the API? Joe On Mon, Apr 8, 2013 at 12:50 PM, Brian Goetz wrote: > What is groupingByConcurrent good for? What's the difference between >> parallel and concurrent in this context? >> > > For sequential streams, concurrent is irrelevant. So this is only > relevant for parallel streams. > > When doing a reduction on a parallel stream, there are two obvious ways to > do it: > > - Partition the input, reduce the chunks separately into isolated > subresults, then combine the subresults "up the tree" into a complete > result (call this a "traditional" parallel reduction) > > - Use some sort of thread-safe combiner, and blast input elements at some > shared combiner from all threads (call this a "concurrent" reduction.) > This is more like a forEach than a reduce. Requirements for this to be > safe include: the combiner must be thread-safe, and the user must not care > about order, since there's no telling in what order the elements will be > blasted. 
> > When the reduction is a groupBy into a Map, this can make a big difference > because of the merging performance of HashMap. > > The traditional reduction looks like this: > - create a HashMap per partition > - Insert the elements of this partition into the HashMap > - Go up the tree, merging two HashMaps into one. This involves iterating > a key-by-key merge. This is slow. > > The concurrent reduction looks like: > - create one ConcurrentHashMap > - Blast elements into it using atomic methods like putIfAbsent > - Return that, no merging > > In most reasonable cases, concurrent parallel reduction with CHM blows > away traditional parallel reduction with HashMap. On the other hand, one > of the casualties of the concurrent approach is ordering. > > If your input is (ordered): > > [ 1, 2, 3, 4, 5, 6, 7, 8 ] > > and your classifier function is: > > e % 2 > > then the traditional approach must yield: > { 0 => [ 2, 4, 6, 8 ], > 1 => [ 1, 3, 5, 7] } > > but the concurrent approach could yield: > > { 0 => [ 6, 2, 4, 8 ], > 1 => [ 7, 1, 3, 5 ] } > > So the question is, when confronted with an obvious desire to use a > concurrent-safe collector, do we infer that the user must not care about > ordering? > > > The most succinct indication of its function is: >> >> The collect(Collector) method currently performs a >> concurrent collection when all of the following are true: >> - the stream is parallel >> - the collector is *concurrent* >> - the collector is unordered OR the stream is unordered >> > > This is the current rule about whether or not collect() does a concurrent > reduction. My question here is whether we wish to make our existing > concurrent collectors always be unordered, so that the last bullet is > trivially satisfied for the built-in concurrent collectors. > > > In other words, *if* I happen to use groupingByConcurrent *then* maybe a >> concurrent collection will be performed, but maybe not, depending on a >> couple other factors... 
>> > > That is the current state of affairs. > > > Can we make this simpler and more intuitive/predictable? I realize >> that's what you're addressing now, but can't we go a lot farther? >> >> Can we, say, get rid of groupingByConcurrent and just assume that if the >> stream is parallel? What do we lose? Do we lose any functionality that >> can't be derived another way? >> > > That would cause us to access a non-thread-safe HashMap concurrently from > multiple threads. > > From brian.goetz at oracle.com Mon Apr 8 13:19:16 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 08 Apr 2013 16:19:16 -0400 Subject: Setting of UNORDERED on concurrent collectors In-Reply-To: References: <516315C1.3080509@oracle.com> <51631F98.8030304@oracle.com> Message-ID: <51632644.2020502@oracle.com> > I assumed that a concurrent collection would use a concurrent map. > Isn't it reasonable to assume that operations on a parallel stream > will use thread-safe collections? ABSOLUTELY NOT! Any non-thread-safe collection can be used as a source for a parallel stream, without any more synchronization than is already implicit in the FJ library. (Some may partition better than others, though; linked lists are never going to be parallel screamers.) Similarly, any reduction can be done in parallel into a non-thread-safe collection. Many of our collectors use non-thread-safe result containers like ArrayList, StringBuilder, or HashMap but are still perfectly parallel-safe. The library provides the necessary isolation, so that these non-thread-safe containers are serially thread-confined and still we can get decent parallelism. The only thing the user has to be careful of in order to not undermine this wonderful gift is to avoid interference. Interference includes things like: - Modifying the source while you're doing a stream operation on it. - Using "lambdas" that depend on state that might be modified during the course of the stream operation. 
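A small illustration of the isolation point above, assuming the source is not modified during the query: a parallel collect into plain, non-thread-safe ArrayLists, using the three-argument collect(supplier, accumulator, combiner) form, gives the same ordered result as a sequential run. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ThreadConfinedCollect {
    // Each parallel leaf task gets its own fresh ArrayList (supplier), fills
    // it with add (accumulator), and finished lists are joined with addAll
    // (combiner) in encounter order. No ArrayList is shared between threads
    // while it is being modified, so the non-thread-safe container is fine.
    static List<Integer> squares(int n) {
        return IntStream.range(0, n)
                        .parallel()
                        .map(i -> i * i)
                        .boxed()
                        .collect(ArrayList::new, ArrayList::add, ArrayList::addAll);
    }

    // By contrast, calling add on one shared ArrayList from a parallel forEach
    // would be a data race: exactly the kind of interference to avoid.

    public static void main(String[] args) {
        List<Integer> parallel = squares(10_000);
        List<Integer> sequential = IntStream.range(0, 10_000)
                                            .map(i -> i * i)
                                            .boxed()
                                            .collect(Collectors.toList());
        System.out.println(parallel.equals(sequential));
    }
}
```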
In other words, as long as you can hold relevant state constant for the duration of your query, you get all this parallelism for free without having to think about thread safety or use thread-safe collections. Effective immutability is a very powerful thing. > BTW, the other downside of the current state of affairs is experienced > by the user who specifies a parallel stream and even declares it > unordered, but still gets a non-concurrent collection because groupingBy > was used instead of groupingByConcurrent. Right. But he will still get a parallel reduction. It just may be that in some cases, he gets a reduction that parallelizes poorly, because the combine step of the reduction happens to be way more expensive than the accumulate step, as it is when the combine step is a merge-maps-by-key. (We have no way of knowing this a priori. Some non-concurrent reductions will parallelize with fine performance and have no need of the additional benefit that a concurrent collection gives.) > In your examples, the difference between the two results is primarily > one of order, not concurrency. Can we reflect this choice more directly > in the API? We used to have that -- the selection of ordering (collect vs collectUnordered) was orthogonal to the collector, and we did a concurrent collection if we were in the (unordered, concurrent) quadrant. That's the most explicit. From joe.bowbeer at gmail.com Mon Apr 8 13:41:49 2013 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Mon, 8 Apr 2013 13:41:49 -0700 Subject: Setting of UNORDERED on concurrent collectors In-Reply-To: <51632644.2020502@oracle.com> References: <516315C1.3080509@oracle.com> <51631F98.8030304@oracle.com> <51632644.2020502@oracle.com> Message-ID: > > In other words, as long as you can hold relevant state constant for the > duration of your query, you get all this parallelism for free without > having to think about thread safety or use thread-safe collections.
I'm using the forms of collect that hide the collections completely (except as a return type). I was only thinking about the order vs unorder and parallel vs sequential aspects -- and I'd prefer to keep it that way. So, for example: collect(unordered+parallel) should perform a concurrent collection? (But you've already indicated that yes I do, in addition, need to think about the collection type in this case even if I don't handle the construction, right?) Whereas your question is: collectConcurrent(ordered+parallel) should disregard order? I'm OK with this, but I wish groupingByConcurrent could go away. --Joe On Mon, Apr 8, 2013 at 1:19 PM, Brian Goetz wrote: > I assumed that a concurrent collection would use a concurrent map. >> Isn't it reasonable to assume that operations on a parallel stream >> will use thread-safe collections? >> > > ABSOLUTELY NOT! > > Any non-thread-safe collection can be used as a source for a parallel > stream, without any more synchronization than is already implicit in the FJ > library. (Some may partition better than others, though; linked lists are > never going to be parallel screamers.) > > Similarly, any reduction can be done in parallel into a non-thread-safe > collection. Many of our collectors use non-thread-safe result containers > like ArrayList, StringBuilder, or HashMap but are still perfectly > parallel-safe. The library provides the necessary isolation, so that these > non-thread-safe containers are serially thread-confined and still we can > get decent parallelism. > > The only thing the user has to be careful of in order to not undermine > this wonderful gift is to avoid interference. Interference includes things > like: > - Modifying the source while you're doing a stream operation on it. > - Using "lambdas" that depend on state that might be modified during the > course of the stream operation. 
> > In other words, as long as you can hold relevant state constant for the > duration of your query, you get all this parallelism for free without > having to think about thread safety or use thread-safe collections. > Effective immutability is a very powerful thing. > > > BTW, the other downside of the current state of affairs is experienced >> by the user who specifies a parallel stream and even declares it >> unordered, but still gets a non-concurrent collection because groupingBy >> was used instead of groupingByConcurrent. >> > > Right. But he will still get a parallel reduction. It just may be that > in some cases, he gets a reduction that parallelizes poorly, because the > combine step of the reduction happens to be way more expensive than the > accumulate step, as it is when the combine step is a merge-maps-by-key. > (We have no way of knowing this a priori. Some non-concurrent reductions > will parallelize with fine performance and have no need of the additional > benefit that a concurrent collection gives.) > > > In your examples, the difference between the two results is primarily >> one of order, not concurrency. Can we reflect this choice more directly >> in the API? >> > > We used to have that -- the selection of ordering (collect vs > collectUnordered) was orthogonal to the collector, and we did a concurrent > collection if we were in the (unordered, concurrent) quadrant. That's the > most explicit. > > > From brian.goetz at oracle.com Mon Apr 8 13:46:00 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 08 Apr 2013 16:46:00 -0400 Subject: Setting of UNORDERED on concurrent collectors In-Reply-To: References: <516315C1.3080509@oracle.com> <51631F98.8030304@oracle.com> <51632644.2020502@oracle.com> Message-ID: <51632C88.4090309@oracle.com> > I'm using the forms of collect that hide the collections completely > (except as a return type).
I was only thinking about the order vs > unorder and parallel vs sequential aspects -- and I'd prefer to keep it > that way. So, for example: > > collect(unordered+parallel) should perform a concurrent collection? Most of the collectors hide the return type, but expose the concurrent-ness of the return type in their name. Earlier, we had a separate bag (ConcurrentCollectors) for concurrent collectors. I disliked this because, with the obvious static imports, the user couldn't tell whether parStream.collect(groupingBy(f)) would be a concurrent (unordered) reduction or a traditional (ordered) one. > (But you've already indicated that yes I do, in addition, need to think > about the collection type in this case even if I don't handle the > construction, right?) Not the specific collection type. You do need to reason about shape (List vs Map), and you need to reason about concurrent vs not (HashMap vs CHM), but not necessarily about List vs Set. > Whereas your question is: > > collectConcurrent(ordered+parallel) should disregard order? More whether: Collectors.groupingByConcurrent(f) should declare itself to be an unordered Collector, just as toSet() is. From dl at cs.oswego.edu Mon Apr 8 13:57:34 2013 From: dl at cs.oswego.edu (Doug Lea) Date: Mon, 08 Apr 2013 16:57:34 -0400 Subject: Setting of UNORDERED on concurrent collectors In-Reply-To: References: <516315C1.3080509@oracle.com> <51631F98.8030304@oracle.com> <51632644.2020502@oracle.com> Message-ID: <51632F3E.70505@cs.oswego.edu> On 04/08/13 16:41, Joe Bowbeer wrote: > I'm OK with this, but I wish groupingByConcurrent could go away. > These were the kinds of thoughts that led me last fall to suggest that we just tell people to do it themselves as a little idiom: chm = ... c.parallelStream().forEach(x -> chm.merge(keyFor(x), x, mergefn)); The main downside is that this, the most commonly recommended way of doing parallel groupBy, would not be in the family of collect methods. Still maybe worth reconsidering though.
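A runnable instance of the forEach-plus-merge idiom, as a sketch: keyFor and mergefn in the message are placeholders, so this version assumes the key is a word's first letter and the merge function is Integer::sum (a count). ConcurrentHashMap.merge performs each update atomically, which is what makes calling it from a parallel forEach safe. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

class MergeIdiom {
    // Every worker thread merges straight into one shared ConcurrentHashMap,
    // so there are no per-thread maps to combine afterwards.
    static ConcurrentHashMap<Character, Integer> countByInitial(List<String> words) {
        ConcurrentHashMap<Character, Integer> chm = new ConcurrentHashMap<>();
        words.parallelStream()
             .forEach(w -> chm.merge(w.charAt(0), 1, Integer::sum));
        return chm;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "avocado", "banana", "blueberry", "cherry");
        System.out.println(countByInitial(words));
    }
}
```

The trade-off is the one raised in the thread: this works well for a flat merge, but a nested grouping has to be written out by hand rather than composed from collectors.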
-Doug From brian.goetz at oracle.com Mon Apr 8 14:09:44 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 08 Apr 2013 17:09:44 -0400 Subject: Setting of UNORDERED on concurrent collectors In-Reply-To: <51632F3E.70505@cs.oswego.edu> References: <516315C1.3080509@oracle.com> <51631F98.8030304@oracle.com> <51632644.2020502@oracle.com> <51632F3E.70505@cs.oswego.edu> Message-ID: <51633218.6070906@oracle.com> That option is always available regardless of what we do with Collectors. Remember, where Collector really shines is not the simple things like this, but composite collections, like Map> biggestTransactionByBuyerSeller = stream.collect(groupingBy(Txn::buyer, groupingBy(Txn::seller, maxBy(comparing(Txn::amount)) The groupBy combinator lets you compose complex collections out of building blocks. These would have to be manually inlined with the explicit parallel forEach version. On 4/8/2013 4:57 PM, Doug Lea wrote: > On 04/08/13 16:41, Joe Bowbeer wrote: > >> I'm OK with this, but I wish groupingByConcurrent could go away. >> > > These were the kinds of thoughts that led me last fall to suggest > that we just tell people to do it themselves as a little idiom: > chm = ... > c.parallelStream().forEach( chm.merge(x->keyFor(x), x, mergefn); } > > The main downside is that this, the most commonly recommended > way of doing parallel groupBy, would not be in the family of > collect methods. Still maybe worth reconsidering though. > > -Doug > From spullara at gmail.com Mon Apr 8 14:40:32 2013 From: spullara at gmail.com (Sam Pullara) Date: Mon, 8 Apr 2013 14:40:32 -0700 Subject: Whither FlatMapper? In-Reply-To: <51632319.4040704@oracle.com> References: <5161F773.6050705@oracle.com> <51631A39.30001@cs.oswego.edu> <51632319.4040704@oracle.com> Message-ID: <61E16080-A6C3-4B76-A42B-9D5E84A4D133@gmail.com> I like this plan. I'd hate to lose the lower level API. 
Sam On Apr 8, 2013, at 1:05 PM, Brian Goetz wrote: > A slight correction: if we remove the flatMap(FlatMapper), there is no fluent form that is as efficient as the removed form that accepts (T, Consumer), since there's no other way to get your hands on the downstream Sink. (Not that this dampens my enthusiasm for removing it much.) > > For the truly diffident, a middle ground does exist: remove FlatMapper and its six brothers as a named SAM, and replace it with BiConsumer>, leaving both forms of flatMap methods in place: > flatMap(Function>) > flapMap(BiConsumer>) > > The main advantage being that the package javadoc is not polluted by seven forms of FlatMapper. > > On 4/8/2013 3:27 PM, Doug Lea wrote: >> On 04/07/13 19:01, Sam Pullara wrote: >>> I'm a big fan of the current FlatMapper stuff that takes a Consumer. >>> Much more >>> efficient and straightforward when you don't have a stream or >>> collection to just >>> return. Here is some code that uses 3 of them for good effect: >> >> I think the main issue is whether, given the user reactions so far, we >> should insist on people using a generally better but non-obvious >> approach to flat-mapping. Considering that anyone *could* write their own >> FlatMappers layered on top of existing functionality (we could >> even show how to do it as a code example somewhere), I'm with >> Brian on this: give people the obvious forms in the API. People >> who are most likely to use it are the least likely to be obsessive >> about its performance. And when they are, they can learn about >> alternatives. 
>> >> -Doug >> From joe.bowbeer at gmail.com Mon Apr 8 14:47:30 2013 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Mon, 8 Apr 2013 14:47:30 -0700 Subject: Setting of UNORDERED on concurrent collectors In-Reply-To: <51633218.6070906@oracle.com> References: <516315C1.3080509@oracle.com> <51631F98.8030304@oracle.com> <51632644.2020502@oracle.com> <51632F3E.70505@cs.oswego.edu> <51633218.6070906@oracle.com> Message-ID: On Mon, Apr 8, 2013 at 2:09 PM, Brian Goetz wrote: > That option is always available regardless of what we do with Collectors. > > Remember, where Collector really shines is not the simple things like > this, but composite collections, like > > Map> > biggestTransactionByBuyerSeller = > stream.collect(groupingBy(Txn::buyer, > groupingBy(Txn::seller, > maxBy(comparing(Txn::amount)) > > This is where groupingBy really shines :) But how is someone supposed to decide if any or some or all of these groupingBy's should really be groupingByConcurrent's? If we eliminated groupingByConcurrent in favor of a more explicit form in those cases, would that ruin the shine? --Joe > The groupBy combinator lets you compose complex collections out of > building blocks. These would have to be manually inlined with the explicit > parallel forEach version. > > > > On 4/8/2013 4:57 PM, Doug Lea wrote: > >> On 04/08/13 16:41, Joe Bowbeer wrote: >> >> I'm OK with this, but I wish groupingByConcurrent could go away. >>> >>> >> These were the kinds of thoughts that led me last fall to suggest >> that we just tell people to do it themselves as a little idiom: >> chm = ... >> c.parallelStream().forEach( chm.merge(x->keyFor(x), x, mergefn); } >> >> The main downside is that this, the most commonly recommended >> way of doing parallel groupBy, would not be in the family of >> collect methods. Still maybe worth reconsidering though.
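Here is a sketch of the composed collection discussed in this exchange, written against the `Collectors` API as it shipped in JDK 8. `Txn` and its `buyer`/`seller`/`amount` accessors are hypothetical stand-ins. `comparingInt` replaces `comparing` to avoid boxing. Only the outer level changes to get a concurrent reduction, and `groupingByConcurrent` reports the `UNORDERED` characteristic.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collector;

import static java.util.Comparator.comparingInt;
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.groupingByConcurrent;
import static java.util.stream.Collectors.maxBy;

public class BiggestTxn {

    // Hypothetical transaction type standing in for Txn in the thread.
    public static final class Txn {
        private final String buyer;
        private final String seller;
        private final int amount;

        public Txn(String buyer, String seller, int amount) {
            this.buyer = buyer;
            this.seller = seller;
            this.amount = amount;
        }
        public String buyer()  { return buyer; }
        public String seller() { return seller; }
        public int amount()    { return amount; }
    }

    // Ordered form: buyer -> (seller -> biggest transaction between them).
    public static Map<String, Map<String, Optional<Txn>>> biggest(List<Txn> txns) {
        return txns.stream().collect(
                groupingBy(Txn::buyer,
                        groupingBy(Txn::seller,
                                maxBy(comparingInt(Txn::amount)))));
    }

    // Same composition; only the outer level is a concurrent, unordered reduction.
    public static ConcurrentMap<String, Map<String, Optional<Txn>>> biggestConcurrent(List<Txn> txns) {
        return txns.parallelStream().collect(
                groupingByConcurrent(Txn::buyer,
                        groupingBy(Txn::seller,
                                maxBy(comparingInt(Txn::amount)))));
    }

    public static void main(String[] args) {
        List<Txn> txns = Arrays.asList(
                new Txn("ann", "bob", 10), new Txn("ann", "bob", 30),
                new Txn("ann", "cy", 5), new Txn("dee", "bob", 7));
        System.out.println(biggest(txns).get("ann").get("bob").get().amount()); // 30
        System.out.println(biggest(txns).equals(biggestConcurrent(txns)));      // true
        // groupingByConcurrent declares itself UNORDERED (as toSet() does).
        Collector<Txn, ?, ?> c = groupingByConcurrent(Txn::buyer);
        System.out.println(c.characteristics().contains(Collector.Characteristics.UNORDERED)); // true
    }
}
```

The two results agree here because no buyer/seller pair has tied amounts. With ties, the concurrent form may keep a different maximal element.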
>> >> -Doug >> >> From brian.goetz at oracle.com Mon Apr 8 14:54:33 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 08 Apr 2013 17:54:33 -0400 Subject: Setting of UNORDERED on concurrent collectors In-Reply-To: References: <516315C1.3080509@oracle.com> <51631F98.8030304@oracle.com> <51632644.2020502@oracle.com> <51632F3E.70505@cs.oswego.edu> <51633218.6070906@oracle.com> Message-ID: <51633C99.3030100@oracle.com> > Remember, where Collector really shines is not the simple things > like this, but composite collections, like > > Map> > biggestTransactionByBuyerSeller = > stream.collect(groupingBy(Txn::buyer, > groupingBy(Txn::seller, > > maxBy(comparing(Txn::amount)) > > This is where groupingBy really shines :) > > But how is someone supposed to decide if any or some or all of these > groupingBy's should really be groupingByConcurrent's? Basically, if they care more about performance than ordering. But groupingByConcurrent can do all the same cool composed collections that groupingBy can do. > If we eliminated groupingByConcurrent in favor of a more explicit form > in those cases, would that ruin the shine? I think that would be silly; instead of choosing between "fast" and "ordered", you would have to choose between "fast" and "ordered and powerful and flexible." Given that they can have powerful and flexible if they're willing to give up ordered, why would we do that? From forax at univ-mlv.fr Mon Apr 8 15:09:18 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Tue, 09 Apr 2013 00:09:18 +0200 Subject: Whither FlatMapper?
In-Reply-To: <51632319.4040704@oracle.com> References: <5161F773.6050705@oracle.com> <51631A39.30001@cs.oswego.edu> <51632319.4040704@oracle.com> Message-ID: <5163400E.70002@univ-mlv.fr> On 04/08/2013 10:05 PM, Brian Goetz wrote: > A slight correction: if we remove the flatMap(FlatMapper), there is no > fluent form that is as efficient as the removed form that accepts (T, > Consumer), since there's no other way to get your hands on the > downstream Sink. (Not that this dampens my enthusiasm for removing it > much.) > > For the truly diffident, a middle ground does exist: remove FlatMapper > and its six brothers as a named SAM, and replace it with BiConsumer Consumer>, leaving both forms of flatMap methods in place: > flatMap(Function>) > flapMap(BiConsumer>) > me trying to understand ... we don't have more forms due to the primitive specialization ? > The main advantage being that the package javadoc is not polluted by > seven forms of FlatMapper. Rémi > > On 4/8/2013 3:27 PM, Doug Lea wrote: >> On 04/07/13 19:01, Sam Pullara wrote: >>> I'm a big fan of the current FlatMapper stuff that takes a Consumer. >>> Much more >>> efficient and straightforward when you don't have a stream or >>> collection to just >>> return. Here is some code that uses 3 of them for good effect: >> >> I think the main issue is whether, given the user reactions so far, we >> should insist on people using a generally better but non-obvious >> approach to flat-mapping. Considering that anyone *could* write their >> own >> FlatMappers layered on top of existing functionality (we could >> even show how to do it as a code example somewhere), I'm with >> Brian on this: give people the obvious forms in the API. People >> who are most likely to use it are the least likely to be obsessive >> about its performance. And when they are, they can learn about >> alternatives.
>> >> -Doug >> From brian.goetz at oracle.com Mon Apr 8 15:33:52 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 08 Apr 2013 18:33:52 -0400 Subject: Whither FlatMapper? In-Reply-To: <5163400E.70002@univ-mlv.fr> References: <5161F773.6050705@oracle.com> <51631A39.30001@cs.oswego.edu> <51632319.4040704@oracle.com> <5163400E.70002@univ-mlv.fr> Message-ID: <516345D0.3080103@oracle.com> OK, let me be more explicit. We currently have: Stream: Stream flatMap(Function> mapper); Stream flatMap(FlatMapper mapper); IntStream flatMapToInt(FlatMapper.ToInt mapper); LongStream flatMapToLong(FlatMapper.ToLong mapper); DoubleStream flatMapToDouble(FlatMapper.ToDouble mapper); Plus two forms in each of {Int,Long,Double}Stream: DoubleStream flatMap(DoubleFunction mapper); DoubleStream flatMap(FlatMapper.OfDoubleToDouble mapper); Plus seven variants of FlatMapper: FlatMapper FlatMapper.Of{Int,Long,Double} FlatMapper.OfXToX for X={Int,Long,Double} The proposal was to: - Keep the first form under Stream - Keep the first form under each of {Int,Long,Double}Stream - Remove the other forms - Remove all FlatMapper SAM variants - Add back 3 new Obj-to-int specializations to Stream: Stream flatMapToXxx(Function mapper); Then *all* the flatMap forms would take some form of element -> Stream function. The motivation is: no one can understand the (element, Consumer) versions of these, and, even when explained, most people can't understand why they would ever not use the T->Stream form, and the (element, Consumer) forms generate a lot of API surface area (including 7 classes in java.util.stream). The downside is that the T->STream form *is* intrinsically slower, though we've made pretty big progress lately on stream startup cost and anticipate making more. The objection to the proposal, coming from a few advanced users, is: "but, now that I *finally* figured out how the (element, Consumer) versions work, I realize they're faster, so I don't want to give them up." 
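The two flatMap shapes in this proposal can be made concrete with a sketch, assuming Java 8+ `Stream` methods. `chars` uses the element -> stream form (`flatMapToInt` with a `Function<String, IntStream>`). `fromPusher` is a hypothetical helper, not a JDK method, in the spirit of the wrapping helper mentioned later in this thread. It adapts (element, Consumer) logic to the element -> stream form by buffering into a `Stream.Builder`. It allocates one builder per element, so it is not the ThreadLocal variant.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FlatMapForms {

    // element -> Stream form: each word becomes an IntStream of its chars.
    public static List<Integer> chars(List<String> words) {
        return words.stream()
                .flatMapToInt(String::chars)
                .boxed()
                .collect(Collectors.toList());
    }

    // Hypothetical adapter: wraps an (element, Consumer) "push" function as an
    // element -> Stream function by collecting pushed values in a Stream.Builder.
    public static <T, R> Function<T, Stream<R>> fromPusher(BiConsumer<T, Consumer<R>> pusher) {
        return t -> {
            Stream.Builder<R> sb = Stream.builder();
            pusher.accept(t, sb);   // a Stream.Builder<R> is also a Consumer<R>
            return sb.build();
        };
    }

    // Consumer-style logic: emit nothing for short words, two values for longer ones.
    public static List<String> expand(List<String> words) {
        Function<String, Stream<String>> f = fromPusher((String w, Consumer<String> sink) -> {
            if (w.length() > 3) {
                sink.accept(w);
                sink.accept(w.toUpperCase());
            }
        });
        return words.stream().flatMap(f).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(chars(Arrays.asList("ab", "c")));          // [97, 98, 99]
        System.out.println(expand(Arrays.asList("hi", "java", "fj"))); // [java, JAVA]
    }
}
```

Later JDKs (16 onward) added `Stream.mapMulti`, which accepts this (element, Consumer) style directly.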
(Note that we can still always add them later.) The fallback position is to keep the methods as is, but drop the FlatMapper name, and instead fall back to BiConsumer>. Frankly, I think that makes the advanced forms even harder to understand. I still like the original proposal. On 4/8/2013 6:09 PM, Remi Forax wrote: > On 04/08/2013 10:05 PM, Brian Goetz wrote: >> A slight correction: if we remove the flatMap(FlatMapper), there is no >> fluent form that is as efficient as the removed form that accepts (T, >> Consumer), since there's no other way to get your hands on the >> downstream Sink. (Not that this dampens my enthusiasm for removing it >> much.) >> >> For the truly diffident, a middle ground does exist: remove FlatMapper >> and its six brothers as a named SAM, and replace it with BiConsumer> Consumer>, leaving both forms of flatMap methods in place: >> flatMap(Function>) >> flapMap(BiConsumer>) >> > > me trying to understand ... > we don't have more forms due to the primitive specialization ? > >> The main advantage being that the package javadoc is not polluted by >> seven forms of FlatMapper. > > Rémi > >> >> On 4/8/2013 3:27 PM, Doug Lea wrote: >>> On 04/07/13 19:01, Sam Pullara wrote: >>>> I'm a big fan of the current FlatMapper stuff that takes a Consumer. >>>> Much more >>>> efficient and straightforward when you don't have a stream or >>>> collection to just >>>> return. Here is some code that uses 3 of them for good effect: >>> >>> I think the main issue is whether, given the user reactions so far, we >>> should insist on people using a generally better but non-obvious >>> approach to flat-mapping. Considering that anyone *could* write their >>> own >>> FlatMappers layered on top of existing functionality (we could >>> even show how to do it as a code example somewhere), I'm with >>> Brian on this: give people the obvious forms in the API. People >>> who are most likely to use it are the least likely to be obsessive >>> about its performance.
And when they are, they can learn about >>> alternatives. >>> >>> -Doug >>> > From brian.goetz at oracle.com Mon Apr 8 16:02:02 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 08 Apr 2013 19:02:02 -0400 Subject: Whither FlatMapper? In-Reply-To: <51632319.4040704@oracle.com> References: <5161F773.6050705@oracle.com> <51631A39.30001@cs.oswego.edu> <51632319.4040704@oracle.com> Message-ID: <51634C6A.1080301@oracle.com> Actually, there is an allocation-free path to get almost the Consumer-version performance with the non-consumer version, using the proposed StreamBuilder type (that also implements Spliterator and Stream, so "building" is allocation-free), and stuffing that into a ThreadLocal: ThreadLocal tl = ... ... stream.flatMap(e -> { StreamBuilder sb = tl.get(); sb.init(); // stuff elements into sb return sb.build(); // basically a no-op }); So I recant my earlier statement that there's no efficient way to simulate the consumer form. Its just ugly. And the above can be captured by a wrapping helper: Function> = wrapWithThreadLocalStreamBuilder( (T t, Consumer target) -> { /* old way */ }); So, I'm even more firmly in the "remove it" camp. On 4/8/2013 4:05 PM, Brian Goetz wrote: > A slight correction: if we remove the flatMap(FlatMapper), there is no > fluent form that is as efficient as the removed form that accepts (T, > Consumer), since there's no other way to get your hands on the > downstream Sink. (Not that this dampens my enthusiasm for removing it > much.) > > For the truly diffident, a middle ground does exist: remove FlatMapper > and its six brothers as a named SAM, and replace it with BiConsumer Consumer>, leaving both forms of flatMap methods in place: > flatMap(Function>) > flapMap(BiConsumer>) > > The main advantage being that the package javadoc is not polluted by > seven forms of FlatMapper. 
> > On 4/8/2013 3:27 PM, Doug Lea wrote: >> On 04/07/13 19:01, Sam Pullara wrote: >>> I'm a big fan of the current FlatMapper stuff that takes a Consumer. >>> Much more >>> efficient and straightforward when you don't have a stream or >>> collection to just >>> return. Here is some code that uses 3 of them for good effect: >> >> I think the main issue is whether, given the user reactions so far, we >> should insist on people using a generally better but non-obvious >> approach to flat-mapping. Considering that anyone *could* write their own >> FlatMappers layered on top of existing functionality (we could >> even show how to do it as a code example somewhere), I'm with >> Brian on this: give people the obvious forms in the API. People >> who are most likely to use it are the least likely to be obsessive >> about its performance. And when they are, they can learn about >> alternatives. >> >> -Doug >> From spullara at gmail.com Mon Apr 8 16:14:34 2013 From: spullara at gmail.com (Sam Pullara) Date: Mon, 8 Apr 2013 16:14:34 -0700 Subject: Whither FlatMapper? In-Reply-To: <51634C6A.1080301@oracle.com> References: <5161F773.6050705@oracle.com> <51631A39.30001@cs.oswego.edu> <51632319.4040704@oracle.com> <51634C6A.1080301@oracle.com> Message-ID: That seems reasonable to me. Sam On Apr 8, 2013, at 4:02 PM, Brian Goetz wrote: > Actually, there is an allocation-free path to get almost the Consumer-version performance with the non-consumer version, using the proposed StreamBuilder type (that also implements Spliterator and Stream, so "building" is allocation-free), and stuffing that into a ThreadLocal: > > ThreadLocal tl = ... > > ... > > stream.flatMap(e -> { > StreamBuilder sb = tl.get(); > sb.init(); > // stuff elements into sb > return sb.build(); // basically a no-op > }); > > So I recant my earlier statement that there's no efficient way to simulate the consumer form. Its just ugly. 
> > And the above can be captured by a wrapping helper: > > Function> = wrapWithThreadLocalStreamBuilder( > (T t, Consumer target) -> { /* old way */ }); > > So, I'm even more firmly in the "remove it" camp. > > On 4/8/2013 4:05 PM, Brian Goetz wrote: >> A slight correction: if we remove the flatMap(FlatMapper), there is no >> fluent form that is as efficient as the removed form that accepts (T, >> Consumer), since there's no other way to get your hands on the >> downstream Sink. (Not that this dampens my enthusiasm for removing it >> much.) >> >> For the truly diffident, a middle ground does exist: remove FlatMapper >> and its six brothers as a named SAM, and replace it with BiConsumer> Consumer>, leaving both forms of flatMap methods in place: >> flatMap(Function>) >> flapMap(BiConsumer>) >> >> The main advantage being that the package javadoc is not polluted by >> seven forms of FlatMapper. >> >> On 4/8/2013 3:27 PM, Doug Lea wrote: >>> On 04/07/13 19:01, Sam Pullara wrote: >>>> I'm a big fan of the current FlatMapper stuff that takes a Consumer. >>>> Much more >>>> efficient and straightforward when you don't have a stream or >>>> collection to just >>>> return. Here is some code that uses 3 of them for good effect: >>> >>> I think the main issue is whether, given the user reactions so far, we >>> should insist on people using a generally better but non-obvious >>> approach to flat-mapping. Considering that anyone *could* write their own >>> FlatMappers layered on top of existing functionality (we could >>> even show how to do it as a code example somewhere), I'm with >>> Brian on this: give people the obvious forms in the API. People >>> who are most likely to use it are the least likely to be obsessive >>> about its performance. And when they are, they can learn about >>> alternatives. 
>>> >>> -Doug >>> From brian.goetz at oracle.com Mon Apr 8 17:08:02 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 08 Apr 2013 20:08:02 -0400 Subject: Convenience Collector forms Message-ID: <51635BE2.1010909@oracle.com> One of the feedback items from the recent London Lambda Hack Day was "more convenience forms for Collectors please!". One suggested was "count()" (related are min/max/sum). Another is a dedicated form for frequency counting. The idea is that: - They are easier to read than their obvious reduce expansion; everyone understands count(), even if they don't understand reduce (this was an argument in favor of sum() and friends on IntStream). - They provide more on-ramp for understanding reduction and composition of reduction; the Javadoc for count() can explain itself in terms of reduction, and simple examples like this help connect the dots better. - They are more discoverable that some of the idioms they expand to (once someone discovers Collectors.) The implementations are of course trivial. So, on the block are: - Collector counting() - Collector minBy(Comparator) - Collector maxBy(Comparator) - Collector sumBy(Function) - Collector> countingFrequency() - Collector> countingFrequency(T -> K classifier) Q: Other Collector names are all of the form either toXxx or xxxing, which read relatively english-like: collect(groupingBy(f)) collect(toList()) The minBy, maxBy, and sumBy don't follow this form, though still don't read terribly. Sum can easily be "summingBy" but "minningBy" sucks. Is this naming OK? Q: Do we need separate long and int versions for sumBy()? 
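For comparison with the convenience forms proposed above, this sketch uses the names that ended up in `java.util.stream.Collectors` in JDK 8. Those are `counting`, `minBy`/`maxBy`, and `summingInt` (with `summingLong` and `summingDouble` siblings, so summing is split by result type). Frequency counting is written as `groupingBy(identity(), counting())` instead of a dedicated `countingFrequency`. The class and method names are illustrative. The sketch also shows `counting()` next to the reduce expansion it abbreviates.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;
import java.util.stream.Collectors;

public class ConvenienceCollectors {

    // counting() and the reduce expansion it abbreviates.
    public static Long count(List<String> words) {
        return words.stream().collect(Collectors.counting());
    }

    public static Long countByReduce(List<String> words) {
        return words.stream().collect(Collectors.reducing(0L, w -> 1L, Long::sum));
    }

    // maxBy (like minBy) takes a Comparator and yields an Optional,
    // which is empty for an empty stream.
    public static Optional<String> longest(List<String> words) {
        return words.stream().collect(Collectors.maxBy(Comparator.comparingInt(String::length)));
    }

    // Summing is split by result type: summingInt, summingLong, summingDouble.
    public static int totalLength(List<String> words) {
        return words.stream().collect(Collectors.summingInt(String::length));
    }

    // Frequency counting composed from groupingBy and counting.
    public static Map<String, Long> frequencies(List<String> words) {
        return words.stream().collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("map", "reduce", "filter", "map", "collect", "map", "reduce");
        System.out.println(count(words) + " " + countByReduce(words)); // 7 7
        System.out.println(longest(words).get());                      // collect
        System.out.println(totalLength(words));                        // 34
        System.out.println(frequencies(words).get("map"));             // 3
    }
}
```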
From brian.goetz at oracle.com Mon Apr 8 17:10:34 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 08 Apr 2013 20:10:34 -0400 Subject: Additional Collectors In-Reply-To: <515C6684.8020007@oracle.com> References: <515C6684.8020007@oracle.com> Message-ID: <51635C7A.9000805@oracle.com> And, still need to close on this one: > People also expressed concern that the "toMap()" (nee mappedTo, > joiningWith) is not flexible enough. As a reminder, what toMap does is > take a Stream and a function T->U and produces a Map. Some > people call this "backwards"; they would rather have something that > takes a Stream and function T->K and produces a Map. And others > would rather have something that takes two functions T->K and T->U and > produces a Map. > > All of these are useful enough. The question is how to fit them into > the API. I think the name "toMap" is a bit of a challenge, since there > are several "modes" and not all of them can be easily handled by > overloads. Maybe: > > toMap(T->U) // first version > toMap(T->K, T->U) // third version > > and leave the second version out, since the third version can easily > simulate the second? From joe.bowbeer at gmail.com Mon Apr 8 19:32:37 2013 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Mon, 8 Apr 2013 19:32:37 -0700 Subject: Convenience Collector forms In-Reply-To: <51635BE2.1010909@oracle.com> References: <51635BE2.1010909@oracle.com> Message-ID: Q: Why is the method on the block called counting() instead of the proposed count()? Except for possibly count(), I'm not liking any of these, because: 1. There is already enough exposed "reduce" surface area in max/min/sum. 2. map/reduce is where it's at. It's easier for me to read code that uses those familiar forms than it is to familiarize myself with a bunch of new convenience methods. I don't think these new forms are going to make Collectors easier to learn, or collectors code easier to read (except at a very superficial level). 
On Mon, Apr 8, 2013 at 5:08 PM, Brian Goetz wrote: > One of the feedback items from the recent London Lambda Hack Day was "more > convenience forms for Collectors please!". One suggested was "count()" > (related are min/max/sum). Another is a dedicated form for frequency > counting. > > The idea is that: > - They are easier to read than their obvious reduce expansion; everyone > understands count(), even if they don't understand reduce (this was an > argument in favor of sum() and friends on IntStream). > - They provide more on-ramp for understanding reduction and composition > of reduction; the Javadoc for count() can explain itself in terms of > reduction, and simple examples like this help connect the dots better. > - They are more discoverable that some of the idioms they expand to (once > someone discovers Collectors.) > > The implementations are of course trivial. > > So, on the block are: > > - Collector counting() > - Collector minBy(Comparator) > - Collector maxBy(Comparator) > - Collector sumBy(Function) > - Collector> countingFrequency() > - Collector> countingFrequency(T -> K classifier) > > Q: Other Collector names are all of the form either toXxx or xxxing, which > read relatively english-like: > > collect(groupingBy(f)) > collect(toList()) > > The minBy, maxBy, and sumBy don't follow this form, though still don't > read terribly. Sum can easily be "summingBy" but "minningBy" sucks. Is > this naming OK? > > Q: Do we need separate long and int versions for sumBy()? > > From ali.ebrahimi1781 at gmail.com Mon Apr 8 22:27:52 2013 From: ali.ebrahimi1781 at gmail.com (Ali Ebrahimi) Date: Tue, 9 Apr 2013 09:57:52 +0430 Subject: Convenience Collector forms In-Reply-To: References: <51635BE2.1010909@oracle.com> Message-ID: Hi, I suggest "counter". Ali Ebrahimi On Tue, Apr 9, 2013 at 7:02 AM, Joe Bowbeer wrote: > Q: Why is the method on the block called counting() instead of the proposed > count()? 
> > Except for possibly count(), I'm not liking any of these, because: > > 1. There is already enough exposed "reduce" surface area in max/min/sum. > > 2. map/reduce is where it's at. It's easier for me to read code that uses > those familiar forms than it is to familiarize myself with a bunch of new > convenience methods. > > I don't think these new forms are going to make Collectors easier to learn, > or collectors code easier to read (except at a very superficial level). > > > > On Mon, Apr 8, 2013 at 5:08 PM, Brian Goetz > wrote: > > > One of the feedback items from the recent London Lambda Hack Day was > "more > > convenience forms for Collectors please!". One suggested was "count()" > > (related are min/max/sum). Another is a dedicated form for frequency > > counting. > > > > The idea is that: > > - They are easier to read than their obvious reduce expansion; everyone > > understands count(), even if they don't understand reduce (this was an > > argument in favor of sum() and friends on IntStream). > > - They provide more on-ramp for understanding reduction and composition > > of reduction; the Javadoc for count() can explain itself in terms of > > reduction, and simple examples like this help connect the dots better. > > - They are more discoverable that some of the idioms they expand to > (once > > someone discovers Collectors.) > > > > The implementations are of course trivial. > > > > So, on the block are: > > > > - Collector counting() > > - Collector minBy(Comparator) > > - Collector maxBy(Comparator) > > - Collector sumBy(Function) > > - Collector> countingFrequency() > > - Collector> countingFrequency(T -> K classifier) > > > > Q: Other Collector names are all of the form either toXxx or xxxing, > which > > read relatively english-like: > > > > collect(groupingBy(f)) > > collect(toList()) > > > > The minBy, maxBy, and sumBy don't follow this form, though still don't > > read terribly. Sum can easily be "summingBy" but "minningBy" sucks. 
Is > > this naming OK? > > > > Q: Do we need separate long and int versions for sumBy()? > > > > > From tim at peierls.net Tue Apr 9 06:27:06 2013 From: tim at peierls.net (Tim Peierls) Date: Tue, 9 Apr 2013 09:27:06 -0400 Subject: Convenience Collector forms In-Reply-To: References: <51635BE2.1010909@oracle.com> Message-ID: On Mon, Apr 8, 2013 at 10:32 PM, Joe Bowbeer wrote: > Q: Why is the method on the block called counting() instead of the > proposed count()? > I like the adverbial form because it reads more like English. Except for possibly count(), I'm not liking any of these, because: > > 1. There is already enough exposed "reduce" surface area in max/min/sum. > > 2. map/reduce is where it's at. It's easier for me to read code that uses > those familiar forms than it is to familiarize myself with a bunch of new > convenience methods. > > I don't think these new forms are going to make Collectors easier to > learn, or collectors code easier to read (except at a very superficial > level). > I think there are many folks for whom these convenience Collectors will make the difference between ignoring and using streams. As long as they're bundled as static factory methods in a Collectors class, I don't see the problem. --tim From brian.goetz at oracle.com Tue Apr 9 12:54:44 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Tue, 09 Apr 2013 15:54:44 -0400 Subject: Survey results Message-ID: <51647204.4010108@oracle.com> Closing three surveys, responses are here: FlatMapper: https://www.surveymonkey.com/sr.aspx?sm=eqAnAfK4z0IjPKVllUu24NW38AIeF5NiPcBxcrdMTVc_3d Resolution: FlatMapper removed as per discussion. Collector: https://www.surveymonkey.com/sr.aspx?sm=eqAnAfK4z0IjPKVllUu24C2CNuL68Gm6quYPmGqoZ9A_3d Resolution: spec adjusted as per comments -- additional spec work still needed. 
Stream: https://www.surveymonkey.com/sr.aspx?sm=QyMHR9lw9a4qhahv_2bP4ePapqLUvfdbFSi_2fYBYkt2zgA_3d From brian.goetz at oracle.com Tue Apr 9 13:22:29 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Tue, 09 Apr 2013 16:22:29 -0400 Subject: Survey results In-Reply-To: <51647204.4010108@oracle.com> References: <51647204.4010108@oracle.com> Message-ID: <51647885.6020604@oracle.com> Updated Javadoc at: http://cr.openjdk.java.net/~briangoetz/JDK-8008682/api/java/util/stream/ On 4/9/2013 3:54 PM, Brian Goetz wrote: > Closing three surveys, responses are here: > > FlatMapper: > https://www.surveymonkey.com/sr.aspx?sm=eqAnAfK4z0IjPKVllUu24NW38AIeF5NiPcBxcrdMTVc_3d > > > Resolution: FlatMapper removed as per discussion. > > Collector: > https://www.surveymonkey.com/sr.aspx?sm=eqAnAfK4z0IjPKVllUu24C2CNuL68Gm6quYPmGqoZ9A_3d > > > Resolution: spec adjusted as per comments -- additional spec work still > needed. > > Stream: > https://www.surveymonkey.com/sr.aspx?sm=QyMHR9lw9a4qhahv_2bP4ePapqLUvfdbFSi_2fYBYkt2zgA_3d > > From brian.goetz at oracle.com Tue Apr 9 13:25:28 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Tue, 09 Apr 2013 16:25:28 -0400 Subject: Survey: API review for static factory methods Message-ID: <51647938.5080708@oracle.com> I've posted a survey for the static factory methods in Streams at: https://www.surveymonkey.com/s/5WZ7NJL We are also planning to add singletonStream() factories. Usual password. From brian.goetz at oracle.com Tue Apr 9 14:16:58 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Tue, 09 Apr 2013 17:16:58 -0400 Subject: Possible groupingBy simplification? Message-ID: <5164854A.3000006@oracle.com> Paul suggested the following possible simplification for groupingBy. It is somewhat counterintuitive at first glance, in that it removes the most commonly used form (!), but might make things easier to grasp in the long run (aided by good docs.) 
Recall we currently have four forms of groupingBy: // classifier only -- maps keys to list of matching elements Collector>> groupingBy(Function classifier) // Like above, but with explicit map ctor >> Collector groupingBy(Function classifier, Supplier mapFactory) // basic cascaded form Collector> groupingBy(Function classifier, Collector downstream) // cascaded form with explicit ctor > Collector groupingBy(Function classifier, Supplier mapFactory, Collector downstream) Plus four corresponding forms for groupingByConcurrent. The first form is likely to be the most common, as it is the traditional "group by". It is equivalent to: groupingBy(classifier, toList()); The proposal is: Drop the first two forms. Just as users can learn that to collect elements into a list, you do: collect(toList()) people can learn that to do the simple form of groupBy, you can do: collect(groupingBy(f, toList()); Which also reads perfectly well. By cutting the number of forms in half, it helps users to realize that groupingBy does just one thing -- classifies elements by key, and collects elements associated with that key. Obviously the docs for groupingBy can show examples of the simple grouping as well as more sophisticated groupings. From brian.goetz at oracle.com Tue Apr 9 14:29:21 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Tue, 09 Apr 2013 17:29:21 -0400 Subject: toMap options Message-ID: <51648831.4060301@oracle.com> Currently we have: Collector> toMap(Function mapper) and > Collector toMap(Function mapper, Supplier mapSupplier, BinaryOperator mergeFunction) (plus concurrent versions of both of these.) The former is just sugar for: toMap(mapper, HashMap::new, throwingMerger()) (We have predefined merge functions for throw-on-duplicates, first-wins, and last-wins, called throwingMerger, firstWinsMerger, and lastWinsMerger.) 
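The toMap variants in this thread can be compared with what JDK 8 eventually shipped. That release has the two-function `toMap(keyMapper, valueMapper)`, plus overloads that add a merge function and a map supplier. Here is a sketch, assuming Java 8+. The method names are illustrative. The "first wins" merge is written inline because the named mergers mentioned above did not become public API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

public class ToMapForms {

    // Elements become keys (what the single-function toMap(T -> U) did).
    public static Map<String, Integer> lengthByWord(List<String> words) {
        return words.stream().collect(Collectors.toMap(Function.identity(), String::length));
    }

    // Elements become values; the key is computed from each element.
    // With no merge function, a duplicate key makes the collector throw
    // IllegalStateException.
    public static Map<Character, String> wordByInitial(List<String> words) {
        return words.stream().collect(Collectors.toMap(w -> w.charAt(0), Function.identity()));
    }

    // Explicit "first wins" merge function plus a map supplier.
    public static TreeMap<Character, String> firstWordByInitial(List<String> words) {
        return words.stream().collect(
                Collectors.toMap(w -> w.charAt(0), Function.identity(),
                                 (first, second) -> first, TreeMap::new));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "banana", "avocado");
        System.out.println(lengthByWord(words).get("avocado")); // 7
        System.out.println(firstWordByInitial(words));          // {a=apple, b=banana}
        try {
            wordByInitial(words);
        } catch (IllegalStateException e) {
            System.out.println("duplicate key rejected");
        }
    }
}
```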
As has been noted, we do not yet serve the use case of creating a map where the stream elements are the values of the map instead of the keys of the map. Options for addressing this are: 1. Leave toMap as is, add toIndexedMap (or toKeyedMap) variants. 2. Leave toMap as is, add a two-function version of toMap: Collector> toMap(Function keyMapper, Function valueMapper) in which case the regular toMap becomes sugar for toMap(Function.identity(), mapper) 3. Get rid of the current form of toMap, and just have the two-function form as in (2). 4. Break free of the toMap naming (recall that until recently this was called mappedTo, and prior to that, joiningWith), and have two versions: mappedTo and mappedFrom. This is explicit, but also doesn't address the use case where both key and value are functions of the stream elements. Others? From joe.bowbeer at gmail.com Tue Apr 9 14:56:49 2013 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Tue, 9 Apr 2013 14:56:49 -0700 Subject: Possible groupingBy simplification? In-Reply-To: <5164854A.3000006@oracle.com> References: <5164854A.3000006@oracle.com> Message-ID: I like the most popular form. In fact, I think it's the only one that I've used. The argument that users will gain by removing their most common form seems kind of far-fetched. In my experience, I do a ctrl-space and look for my target return type on the right-hand-side of the IDE popup, and then I try to fill in the missing information, such as parameters. In this case, having to provide toList() would probably be a stumbling block for me, as the IDE is not as good when it comes to suggesting expressions for parameters. I sort of like the symmetry with collect(toList()) but not enough to make up for the loss. On Tue, Apr 9, 2013 at 2:16 PM, Brian Goetz wrote: > Paul suggested the following possible simplification for groupingBy. 
It > is somewhat counterintuitive at first glance, in that it removes the most > commonly used form (!), but might make things easier to grasp in the long > run (aided by good docs.) > > Recall we currently have four forms of groupingBy: > > // classifier only -- maps keys to list of matching elements > Collector>> > groupingBy(Function classifier) > > // Like above, but with explicit map ctor > >> > Collector > groupingBy(Function classifier, > Supplier mapFactory) > > // basic cascaded form > Collector> > groupingBy(Function classifier, > Collector downstream) > > // cascaded form with explicit ctor > > > Collector > groupingBy(Function classifier, > Supplier mapFactory, > Collector downstream) > > Plus four corresponding forms for groupingByConcurrent. > > The first form is likely to be the most common, as it is the traditional > "group by". It is equivalent to: > > groupingBy(classifier, toList()); > > The proposal is: Drop the first two forms. Just as users can learn that > to collect elements into a list, you do: > > collect(toList()) > > people can learn that to do the simple form of groupBy, you can do: > > collect(groupingBy(f, toList()); > > Which also reads perfectly well. > > By cutting the number of forms in half, it helps users to realize that > groupingBy does just one thing -- classifies elements by key, and collects > elements associated with that key. Obviously the docs for groupingBy can > show examples of the simple grouping as well as more sophisticated > groupings. > > From joe.bowbeer at gmail.com Tue Apr 9 15:03:42 2013 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Tue, 9 Apr 2013 15:03:42 -0700 Subject: Convenience Collector forms In-Reply-To: References: <51635BE2.1010909@oracle.com> Message-ID: I didn't understand that these were proposed for the Collectors tools class. I don't see a problem with that either. 
On Tue, Apr 9, 2013 at 6:27 AM, Tim Peierls wrote: > On Mon, Apr 8, 2013 at 10:32 PM, Joe Bowbeer wrote: > >> Q: Why is the method on the block called counting() instead of the >> proposed count()? >> > > I like the adverbial form because it reads more like English. > > > Except for possibly count(), I'm not liking any of these, because: >> >> 1. There is already enough exposed "reduce" surface area in max/min/sum. >> >> 2. map/reduce is where it's at. It's easier for me to read code that >> uses those familiar forms than it is to familiarize myself with a bunch of >> new convenience methods. >> >> I don't think these new forms are going to make Collectors easier to >> learn, or collectors code easier to read (except at a very superficial >> level). >> > > I think there are many folks for whom these convenience Collectors will > make the difference between ignoring and using streams. As long as they're > bundled as static factory methods in a Collectors class, I don't see the > problem. > > --tim > From joe.bowbeer at gmail.com Tue Apr 9 15:34:45 2013 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Tue, 9 Apr 2013 15:34:45 -0700 Subject: Possible groupingBy simplification? In-Reply-To: References: <5164854A.3000006@oracle.com> Message-ID: On a positive note, the shining example would be unchanged by this proposal: Map> biggestTransactionByBuyerSeller = stream.collect(groupingBy(Txn::buyer, groupingBy(Txn::seller, maxBy(comparing(Txn::amount))))); I suggest leading users to the general form by illustrating the equivalence in the groupingBy(f) documentation. On Tue, Apr 9, 2013 at 2:56 PM, Joe Bowbeer wrote: > I like the most popular form. In fact, I think it's the only one that > I've used. > > The argument that users will gain by removing their most common form seems > kind of far-fetched.
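The counting() debate and Joe's cascaded "biggest transaction" example can both be tried out with the following sketch. It uses the Collectors API as it shipped in JDK 8, where maxBy yields an Optional. The Txn class and its sample data are hypothetical, since the thread never defines Txn. countByBuyerReducing spells counting() as an explicit map/reduce, which is the form Joe says he finds easier to read.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

class TxnExamples {
    // Hypothetical transaction type; the thread never defines Txn.
    static final class Txn {
        private final String buyer;
        private final String seller;
        private final long amount;

        Txn(String buyer, String seller, long amount) {
            this.buyer = buyer;
            this.seller = seller;
            this.amount = amount;
        }

        String buyer()  { return buyer; }
        String seller() { return seller; }
        long amount()   { return amount; }
    }

    // Convenience form: count transactions per buyer.
    static Map<String, Long> countByBuyer(List<Txn> txns) {
        return txns.stream().collect(Collectors.groupingBy(Txn::buyer, Collectors.counting()));
    }

    // The same count written as an explicit map/reduce: map each element to 1L, then sum.
    static Map<String, Long> countByBuyerReducing(List<Txn> txns) {
        return txns.stream().collect(Collectors.groupingBy(Txn::buyer,
                Collectors.reducing(0L, t -> 1L, Long::sum)));
    }

    // Cascaded grouping: largest transaction per (buyer, seller) pair.
    static Map<String, Map<String, Optional<Txn>>> biggestByBuyerSeller(List<Txn> txns) {
        return txns.stream().collect(
                Collectors.groupingBy(Txn::buyer,
                        Collectors.groupingBy(Txn::seller,
                                Collectors.maxBy(Comparator.comparingLong(Txn::amount)))));
    }

    public static void main(String[] args) {
        List<Txn> txns = Arrays.asList(
                new Txn("ann", "bob", 10), new Txn("ann", "bob", 40),
                new Txn("ann", "cy", 5), new Txn("dee", "bob", 7));
        System.out.println(countByBuyer(txns).get("ann"));                                   // 3
        System.out.println(countByBuyerReducing(txns).equals(countByBuyer(txns)));          // true
        System.out.println(biggestByBuyerSeller(txns).get("ann").get("bob").get().amount()); // 40
    }
}
```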
> > In my experience, I do a ctrl-space and look for my target return type on > the right-hand-side of the IDE popup, and then I try to fill in the missing > information, such as parameters. In this case, having to provide toList() > would probably be a stumbling block for me, as the IDE is not as good when > it comes to suggesting expressions for parameters. > > I sort of like the symmetry with collect(toList()) but not enough to make > up for the loss. > > > > On Tue, Apr 9, 2013 at 2:16 PM, Brian Goetz wrote: > >> Paul suggested the following possible simplification for groupingBy. It >> is somewhat counterintuitive at first glance, in that it removes the most >> commonly used form (!), but might make things easier to grasp in the long >> run (aided by good docs.) >> >> Recall we currently have four forms of groupingBy: >> >> // classifier only -- maps keys to list of matching elements >> Collector>> >> groupingBy(Function classifier) >> >> // Like above, but with explicit map ctor >> >> >> Collector >> groupingBy(Function classifier, >> Supplier mapFactory) >> >> // basic cascaded form >> Collector> >> groupingBy(Function classifier, >> Collector downstream) >> >> // cascaded form with explicit ctor >> > >> Collector >> groupingBy(Function classifier, >> Supplier mapFactory, >> Collector downstream) >> >> Plus four corresponding forms for groupingByConcurrent. >> >> The first form is likely to be the most common, as it is the traditional >> "group by". It is equivalent to: >> >> groupingBy(classifier, toList()); >> >> The proposal is: Drop the first two forms. Just as users can learn that >> to collect elements into a list, you do: >> >> collect(toList()) >> >> people can learn that to do the simple form of groupBy, you can do: >> >> collect(groupingBy(f, toList()); >> >> Which also reads perfectly well. 
>> >> By cutting the number of forms in half, it helps users to realize that >> groupingBy does just one thing -- classifies elements by key, and collects >> elements associated with that key. Obviously the docs for groupingBy can >> show examples of the simple grouping as well as more sophisticated >> groupings. >> >> > From Donald.Raab at gs.com Tue Apr 9 15:56:47 2013 From: Donald.Raab at gs.com (Raab, Donald) Date: Tue, 9 Apr 2013 18:56:47 -0400 Subject: toMap options In-Reply-To: <51648831.4060301@oracle.com> References: <51648831.4060301@oracle.com> Message-ID: <6712820CB52CFB4D842561213A77C05404C97093FF@GSCMAMP09EX.firmwide.corp.gs.com> 3 sounds good to me. This is the only form we've supported over the years. I don't recall anyone complaining about the lack of more sugar here. http://www.goldmansachs.com/gs-collections/javadoc/3.0.0/com/gs/collections/api/RichIterable.html > 1. Leave toMap as is, add toIndexedMap (or toKeyedMap) variants. > > 2. Leave toMap as is, add a two-function version of toMap: > > > Collector> > toMap(Function keyMapper, > Function valueMapper) > > in which case the regular toMap becomes sugar for > > toMap(Function.identity(), mapper) > > 3. Get rid of the current form of toMap, and just have the two- > function form as in (2). > > 4. Break free of the toMap naming (recall that until recently this was > called mappedTo, and prior to that, joiningWith), and have two > versions: > mappedTo and mappedFrom. This is explicit, but also doesn't address > the use case where both key and value are functions of the stream > elements. > > Others? From spullara at gmail.com Tue Apr 9 16:28:29 2013 From: spullara at gmail.com (Sam Pullara) Date: Tue, 9 Apr 2013 16:28:29 -0700 Subject: toMap options In-Reply-To: <51648831.4060301@oracle.com> References: <51648831.4060301@oracle.com> Message-ID: I like version 3 as well. 
Sam On Apr 9, 2013, at 2:29 PM, Brian Goetz wrote: > Currently we have: > > Collector> > toMap(Function mapper) > > and > > > > Collector > toMap(Function mapper, > Supplier mapSupplier, > BinaryOperator mergeFunction) > > (plus concurrent versions of both of these.) The former is just sugar for: > > toMap(mapper, HashMap::new, throwingMerger()) > > (We have predefined merge functions for throw-on-duplicates, first-wins, and last-wins, called throwingMerger, firstWinsMerger, and lastWinsMerger.) > > As has been noted, we do not yet serve the use case of creating a map where the stream elements are the values of the map instead of the keys of the map. Options for addressing this are: > > 1. Leave toMap as is, add toIndexedMap (or toKeyedMap) variants. > > 2. Leave toMap as is, add a two-function version of toMap: > > > Collector> > toMap(Function keyMapper, > Function valueMapper) > > in which case the regular toMap becomes sugar for > > toMap(Function.identity(), mapper) > > 3. Get rid of the current form of toMap, and just have the two-function form as in (2). > > 4. Break free of the toMap naming (recall that until recently this was called mappedTo, and prior to that, joiningWith), and have two versions: mappedTo and mappedFrom. This is explicit, but also doesn't address the use case where both key and value are functions of the stream elements. > > Others? > From brian.goetz at oracle.com Tue Apr 9 16:33:45 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Tue, 09 Apr 2013 19:33:45 -0400 Subject: toMap options In-Reply-To: References: <51648831.4060301@oracle.com> Message-ID: <5164A559.1070600@oracle.com> I'm good with #3. Any objections? On 4/9/2013 7:28 PM, Sam Pullara wrote: > I like version 3 as well. 
> > Sam > > On Apr 9, 2013, at 2:29 PM, Brian Goetz wrote: > >> Currently we have: >> >> Collector> >> toMap(Function mapper) >> >> and >> >> > >> Collector >> toMap(Function mapper, >> Supplier mapSupplier, >> BinaryOperator mergeFunction) >> >> (plus concurrent versions of both of these.) The former is just sugar for: >> >> toMap(mapper, HashMap::new, throwingMerger()) >> >> (We have predefined merge functions for throw-on-duplicates, first-wins, and last-wins, called throwingMerger, firstWinsMerger, and lastWinsMerger.) >> >> As has been noted, we do not yet serve the use case of creating a map where the stream elements are the values of the map instead of the keys of the map. Options for addressing this are: >> >> 1. Leave toMap as is, add toIndexedMap (or toKeyedMap) variants. >> >> 2. Leave toMap as is, add a two-function version of toMap: >> >> >> Collector> >> toMap(Function keyMapper, >> Function valueMapper) >> >> in which case the regular toMap becomes sugar for >> >> toMap(Function.identity(), mapper) >> >> 3. Get rid of the current form of toMap, and just have the two-function form as in (2). >> >> 4. Break free of the toMap naming (recall that until recently this was called mappedTo, and prior to that, joiningWith), and have two versions: mappedTo and mappedFrom. This is explicit, but also doesn't address the use case where both key and value are functions of the stream elements. >> >> Others? >> > From tim at peierls.net Tue Apr 9 16:48:50 2013 From: tim at peierls.net (Tim Peierls) Date: Tue, 9 Apr 2013 19:48:50 -0400 Subject: toMap options In-Reply-To: <5164A559.1070600@oracle.com> References: <51648831.4060301@oracle.com> <5164A559.1070600@oracle.com> Message-ID: No objection, but now it makes me wonder: How do you get the effect of toMultimap(T->K, T->V)? In other words, how would you get a Map> from a Stream given T->K and T->V mappings? --tim On Tue, Apr 9, 2013 at 7:33 PM, Brian Goetz wrote: > I'm good with #3. Any objections? 
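Option 3, the two-function form that Donald, Sam and Brian converge on here, covers both directions with a single method. The following sketch assumes the toMap(keyMapper, valueMapper) that shipped in JDK 8, which throws IllegalStateException on duplicate keys. The word lists are made-up sample data.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class TwoFunctionToMap {
    // Elements become keys: the old one-function toMap(mapper) is toMap(identity(), mapper).
    static Map<String, Integer> lengthOf(List<String> words) {
        return words.stream().collect(Collectors.toMap(Function.identity(), String::length));
    }

    // Elements become values, indexed by a function of the element (the use case not yet served).
    static Map<Integer, String> byLength(List<String> words) {
        return words.stream().collect(Collectors.toMap(String::length, Function.identity()));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "bb", "ccc");
        System.out.println(lengthOf(words).get("bb")); // 2
        System.out.println(byLength(words).get(3));    // ccc
        try {
            byLength(Arrays.asList("aa", "bb"));       // both elements map to key 2
        } catch (IllegalStateException e) {
            System.out.println("duplicate key rejected");
        }
    }
}
```

Because both key and value are functions, the case that option 4 left open (key and value both derived from the element) needs no extra method.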
> > > On 4/9/2013 7:28 PM, Sam Pullara wrote: > >> I like version 3 as well. >> >> Sam >> >> On Apr 9, 2013, at 2:29 PM, Brian Goetz wrote: >> >> Currently we have: >>> >>> Collector> >>> toMap(Function mapper) >>> >>> and >>> >>> > >>> Collector >>> toMap(Function mapper, >>> Supplier mapSupplier, >>> BinaryOperator mergeFunction) >>> >>> (plus concurrent versions of both of these.) The former is just sugar >>> for: >>> >>> toMap(mapper, HashMap::new, throwingMerger()) >>> >>> (We have predefined merge functions for throw-on-duplicates, first-wins, >>> and last-wins, called throwingMerger, firstWinsMerger, and lastWinsMerger.) >>> >>> As has been noted, we do not yet serve the use case of creating a map >>> where the stream elements are the values of the map instead of the keys of >>> the map. Options for addressing this are: >>> >>> 1. Leave toMap as is, add toIndexedMap (or toKeyedMap) variants. >>> >>> 2. Leave toMap as is, add a two-function version of toMap: >>> >>> >>> Collector> >>> toMap(Function keyMapper, >>> Function valueMapper) >>> >>> in which case the regular toMap becomes sugar for >>> >>> toMap(Function.identity(), mapper) >>> >>> 3. Get rid of the current form of toMap, and just have the two-function >>> form as in (2). >>> >>> 4. Break free of the toMap naming (recall that until recently this was >>> called mappedTo, and prior to that, joiningWith), and have two versions: >>> mappedTo and mappedFrom. This is explicit, but also doesn't address the >>> use case where both key and value are functions of the stream elements. >>> >>> Others? >>> >>> >> From brian.goetz at oracle.com Tue Apr 9 16:51:15 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Tue, 09 Apr 2013 19:51:15 -0400 Subject: toMap options In-Reply-To: References: <51648831.4060301@oracle.com> <5164A559.1070600@oracle.com> Message-ID: <5164A973.10402@oracle.com> So you've got a Stream, and you want a Map>, and you've got a T->K called "f" and a T->V called "g". 
Easy: Map> multiMap = stream.collect(groupingBy(f, mapping(g, toList()))); On 4/9/2013 7:48 PM, Tim Peierls wrote: > No objection, but now it makes me wonder: How do you get the effect of > toMultimap(T->K, T->V)? In other words, how would you get a Map Collection> from a Stream given T->K and T->V mappings? > > --tim > > On Tue, Apr 9, 2013 at 7:33 PM, Brian Goetz > wrote: > > I'm good with #3. Any objections? > > > On 4/9/2013 7:28 PM, Sam Pullara wrote: > > I like version 3 as well. > > Sam > > On Apr 9, 2013, at 2:29 PM, Brian Goetz > wrote: > > Currently we have: > > Collector> > toMap(Function mapper) > > and > > > > Collector > toMap(Function mapper, > Supplier mapSupplier, > BinaryOperator mergeFunction) > > (plus concurrent versions of both of these.) The former is just sugar for: > > toMap(mapper, HashMap::new, throwingMerger()) > > (We have predefined merge functions for throw-on-duplicates, first-wins, and last-wins, called throwingMerger, firstWinsMerger, and lastWinsMerger.) > > As has been noted, we do not yet serve the use case of creating a map where the stream elements are the values of the map instead of the keys of the map. Options for addressing this are: > > 1. Leave toMap as is, add toIndexedMap (or toKeyedMap) variants. > > 2. Leave toMap as is, add a two-function version of toMap: > > > Collector> > toMap(Function keyMapper, > Function valueMapper) > > in which case the regular toMap becomes sugar for > > toMap(Function.identity(), mapper) > > 3. Get rid of the current form of toMap, and just have the two-function form as in (2). > > 4. Break free of the toMap naming (recall that until recently this was called mappedTo, and prior to that, joiningWith), and have two versions: mappedTo and mappedFrom. This is explicit, but also doesn't address the use case where both key and value are functions of the stream elements. > > Others?
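Brian's one-liner can be wrapped in a small generic helper as a sketch. The name toMultimap is hypothetical (it echoes Tim's question and is not a JDK method), and the sample words are made up. One caveat: with toList() as the downstream collector, the value type comes out as List<V>, not Collection<V>.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class MultimapSketch {
    // Hypothetical helper: f is the T->K function and g the T->V function from the thread.
    // groupingBy classifies by f; mapping applies g before the downstream toList().
    static <T, K, V> Map<K, List<V>> toMultimap(Stream<T> stream,
                                               Function<? super T, ? extends K> f,
                                               Function<? super T, ? extends V> g) {
        return stream.collect(Collectors.groupingBy(f, Collectors.mapping(g, Collectors.toList())));
    }

    public static void main(String[] args) {
        Map<Character, List<Integer>> lengthsByInitial =
                toMultimap(Stream.of("apple", "avocado", "banana"), s -> s.charAt(0), String::length);
        System.out.println(lengthsByInitial.get('a')); // [5, 7]
        System.out.println(lengthsByInitial.get('b')); // [6]
    }
}
```

If a Set of values per key is wanted instead, swapping toList() for toSet() is the only change needed.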
> > > From tim at peierls.net Tue Apr 9 17:23:13 2013 From: tim at peierls.net (Tim Peierls) Date: Tue, 9 Apr 2013 20:23:13 -0400 Subject: toMap options In-Reply-To: <5164A973.10402@oracle.com> References: <51648831.4060301@oracle.com> <5164A559.1070600@oracle.com> <5164A973.10402@oracle.com> Message-ID: Easy if you know how! At any rate, it's doable, and this might serve as an example for groupingBy. --tim On Tue, Apr 9, 2013 at 7:51 PM, Brian Goetz wrote: > So you've got a Stream, and you want a Map>, and > you've got a T->K called "f" and a T->V called "g". Easy: > > Map> multiMap > stream.collect(groupingBy(f, mapping(g, toList())); > > > On 4/9/2013 7:48 PM, Tim Peierls wrote: > >> No objection, but now it makes me wonder: How do you get the effect of >> toMultimap(T->K, T->V)? In other words, how would you get a Map> Collection> from a Stream given T->K and T->V mappings? >> >> --tim >> >> On Tue, Apr 9, 2013 at 7:33 PM, Brian Goetz > > wrote: >> >> I'm good with #3. Any objections? >> >> >> On 4/9/2013 7:28 PM, Sam Pullara wrote: >> >> I like version 3 as well. >> >> Sam >> >> On Apr 9, 2013, at 2:29 PM, Brian Goetz > > wrote: >> >> Currently we have: >> >> Collector> >> toMap(Function mapper) >> >> and >> >> > >> Collector >> toMap(Function mapper, >> Supplier mapSupplier, >> BinaryOperator mergeFunction) >> >> (plus concurrent versions of both of these.) The former is >> just sugar for: >> >> toMap(mapper, HashMap::new, throwingMerger()) >> >> (We have predefined merge functions for throw-on-duplicates, >> first-wins, and last-wins, called throwingMerger, >> firstWinsMerger, and lastWinsMerger.) >> >> As has been noted, we do not yet serve the use case of >> creating a map where the stream elements are the values of >> the map instead of the keys of the map. Options for >> addressing this are: >> >> 1. Leave toMap as is, add toIndexedMap (or toKeyedMap) >> variants. >> >> 2. 
Leave toMap as is, add a two-function version of toMap: >> >> >> Collector> >> toMap(Function keyMapper, >> Function valueMapper) >> >> in which case the regular toMap becomes sugar for >> >> toMap(Function.identity(), mapper) >> >> 3. Get rid of the current form of toMap, and just have the >> two-function form as in (2). >> >> 4. Break free of the toMap naming (recall that until >> recently this was called mappedTo, and prior to that, >> joiningWith), and have two versions: mappedTo and >> mappedFrom. This is explicit, but also doesn't address the >> use case where both key and value are functions of the >> stream elements. >> >> Others? >> >> >> >> From brian.goetz at oracle.com Tue Apr 9 18:52:39 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Tue, 09 Apr 2013 21:52:39 -0400 Subject: toMap options In-Reply-To: References: <51648831.4060301@oracle.com> <5164A559.1070600@oracle.com> <5164A973.10402@oracle.com> Message-ID: <5164C5E7.5040501@oracle.com> Note also that the separation of Collector from Stream allows you to write your own, and allows Guava to publish properly typed collectors for, say, their implementation of Multimap. The pedagogical question remains, though -- how to spread these examples throughout Javadoc so people are exposed to the idioms. On 4/9/2013 8:23 PM, Tim Peierls wrote: > Easy if you know how! At any rate, it's doable, and this might serve as > an example for groupingBy. > > --tim > > On Tue, Apr 9, 2013 at 7:51 PM, Brian Goetz > wrote: > > So you've got a Stream, and you want a Map>, and > you've got a T->K called "f" and a T->V called "g". Easy: > > Map> multiMap > stream.collect(groupingBy(f, mapping(g, toList())); > > > On 4/9/2013 7:48 PM, Tim Peierls wrote: > > No objection, but now it makes me wonder: How do you get the > effect of > toMultimap(T->K, T->V)? In other words, how would you get a Map Collection> from a Stream given T->K and T->V mappings? 
> > --tim > > On Tue, Apr 9, 2013 at 7:33 PM, Brian Goetz > > __>> wrote: > > I'm good with #3. Any objections? > > > On 4/9/2013 7:28 PM, Sam Pullara wrote: > > I like version 3 as well. > > Sam > > On Apr 9, 2013, at 2:29 PM, Brian Goetz > > __>> wrote: > > Currently we have: > > Collector> > toMap(Function mapper) > > and > > > > Collector > toMap(Function mapper, > Supplier mapSupplier, > BinaryOperator mergeFunction) > > (plus concurrent versions of both of these.) The > former is > just sugar for: > > toMap(mapper, HashMap::new, throwingMerger()) > > (We have predefined merge functions for > throw-on-duplicates, > first-wins, and last-wins, called throwingMerger, > firstWinsMerger, and lastWinsMerger.) > > As has been noted, we do not yet serve the use case of > creating a map where the stream elements are the > values of > the map instead of the keys of the map. Options for > addressing this are: > > 1. Leave toMap as is, add toIndexedMap (or toKeyedMap) > variants. > > 2. Leave toMap as is, add a two-function version > of toMap: > > > Collector> > toMap(Function keyMapper, > Function valueMapper) > > in which case the regular toMap becomes sugar for > > toMap(Function.identity(), mapper) > > 3. Get rid of the current form of toMap, and just > have the > two-function form as in (2). > > 4. Break free of the toMap naming (recall that until > recently this was called mappedTo, and prior to that, > joiningWith), and have two versions: mappedTo and > mappedFrom. This is explicit, but also doesn't > address the > use case where both key and value are functions of the > stream elements. > > Others? > > > > From paul.sandoz at oracle.com Wed Apr 10 02:35:33 2013 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Wed, 10 Apr 2013 11:35:33 +0200 Subject: Possible groupingBy simplification? In-Reply-To: References: <5164854A.3000006@oracle.com> Message-ID: On Apr 9, 2013, at 11:56 PM, Joe Bowbeer wrote: > I like the most popular form. 
In fact, I think it's the only one that I've > used. > > The argument that users will gain by removing their most common form seems > kind of far-fetched. > If each method in Collectors does just one conceptual thing that we can concisely express in documentation, it is easier to remember and therefore easier to read the code, easier to find in documentation be it using the IDE or otherwise. Thus to me that suggests removing conceptual variants or renaming them. If the list variants were called, say, groupingByToList, that would ensure the "one conceptual thing": classifies elements by key, and collects elements associated with that key to a list. But I suspect we might not require those methods if the leap of stream.collect(toList()) can be grasped. The same applies to toMap. I think it is easier to understand/read if it does just one conceptual thing: elements are keys, elements are mapped to values, conflicting keys result in an exception. If that does not fit one's requirements, use groupingBy. Paul. > In my experience, I do a ctrl-space and look for my target return type on > the right-hand-side of the IDE popup, and then I try to fill in the missing > information, such as parameters. In this case, having to provide toList() > would probably be a stumbling block for me, as the IDE is not as good when > it comes to suggesting expressions for parameters. > > I sort of like the symmetry with collect(toList()) but not enough to make > up for the loss. > > > > On Tue, Apr 9, 2013 at 2:16 PM, Brian Goetz wrote: > >> Paul suggested the following possible simplification for groupingBy. It >> is somewhat counterintuitive at first glance, in that it removes the most >> commonly used form (!), but might make things easier to grasp in the long >> run (aided by good docs.)
>> >> Recall we currently have four forms of groupingBy: >> >> // classifier only -- maps keys to list of matching elements >> Collector>> >> groupingBy(Function classifier) >> >> // Like above, but with explicit map ctor >> >> >> Collector >> groupingBy(Function classifier, >> Supplier mapFactory) >> >> // basic cascaded form >> Collector> >> groupingBy(Function classifier, >> Collector downstream) >> >> // cascaded form with explicit ctor >> > >> Collector >> groupingBy(Function classifier, >> Supplier mapFactory, >> Collector downstream) >> >> Plus four corresponding forms for groupingByConcurrent. >> >> The first form is likely to be the most common, as it is the traditional >> "group by". It is equivalent to: >> >> groupingBy(classifier, toList()); >> >> The proposal is: Drop the first two forms. Just as users can learn that >> to collect elements into a list, you do: >> >> collect(toList()) >> >> people can learn that to do the simple form of groupBy, you can do: >> >> collect(groupingBy(f, toList()); >> >> Which also reads perfectly well. >> >> By cutting the number of forms in half, it helps users to realize that >> groupingBy does just one thing -- classifies elements by key, and collects >> elements associated with that key. Obviously the docs for groupingBy can >> show examples of the simple grouping as well as more sophisticated >> groupings. >> >> From joe.bowbeer at gmail.com Wed Apr 10 09:37:48 2013 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Wed, 10 Apr 2013 09:37:48 -0700 Subject: Possible groupingBy simplification? In-Reply-To: References: <5164854A.3000006@oracle.com> Message-ID: For consistency with minBy and friends, all the 'By' methods should take a single argument: f. Hence grouping(f). No-arg and one-arg forms are the easiest to use and maintain. Just the additional comma, and which pair of parens contains it, is a significant burden. 
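The forms at issue can be compared directly with the following sketch, which uses the API as it shipped in JDK 8. That release kept the one-argument groupingBy, and its Javadoc describes the result as similar to groupingBy(classifier, toList()), much as Joe suggests documenting the equivalence. The word list is made-up sample data. The sketch checks that the one- and two-argument forms agree, and shows the map-factory form producing sorted keys.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class GroupingForms {
    // One-argument form: the traditional "group by", values collected into Lists.
    static Map<Integer, List<String>> oneArg(List<String> words) {
        return words.stream().collect(Collectors.groupingBy(String::length));
    }

    // Two-argument form with an explicit downstream collector.
    static Map<Integer, List<String>> twoArg(List<String> words) {
        return words.stream().collect(Collectors.groupingBy(String::length, Collectors.toList()));
    }

    // Three-argument form: explicit map factory plus downstream collector.
    static TreeMap<Integer, List<String>> sortedKeys(List<String> words) {
        return words.stream().collect(
                Collectors.groupingBy(String::length, TreeMap::new, Collectors.toList()));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "bb", "cc", "d");
        System.out.println(oneArg(words).equals(twoArg(words))); // true
        System.out.println(sortedKeys(words));                    // {1=[a, d], 2=[bb, cc]}
    }
}
```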
The most readable forms of collect that have an explicit toList() would be of the form: collect(grouping(f)).toList(); or maybe collect(toList(), groupingBy(f)); Joe On Apr 10, 2013 2:35 AM, "Paul Sandoz" wrote: > > On Apr 9, 2013, at 11:56 PM, Joe Bowbeer wrote: > > > I like the most popular form. In fact, I think it's the only one that > I've > > used. > > > > The argument that users will gain by removing their most common form > seems > > kind of far-fetched. > > > > If each method in Collectors does just one conceptual thing we can > concisely express in documentation it is easier to remember and therefore > easier to read the code, easier to find in documentation be it using the > IDE or otherwise. Thus to me that suggests removing conceptual variants or > renaming them. > > If the list variants were called say groupingByToList that would ensure > the "one conceptual thing": classifies elements by key, and collects > elements associated with that key to a list. But i suspect we might not > require those methods if the leap of stream.collector(toList()) can be > grasped. > > The same applies to toMap. I think it is easier to understand/read if it > does just one conceptual thing: elements are keys, elements are mapped to > values, conflicting keys result in an exception. If that does not fit ones > requirements use groupingBy. > > Paul. > > > In my experience, I do a ctrl-space and look for my target return type on > > the right-hand-side of the IDE popup, and then I try to fill in the > missing > > information, such as parameters. In this case, having to provide > toList() > > would probably be a stumbling block for me, as the IDE is not as good > when > > it comes to suggesting expressions for parameters. > > > > I sort of like the symmetry with collect(toList()) but not enough to make > > up for the loss. > > > > > > > > On Tue, Apr 9, 2013 at 2:16 PM, Brian Goetz > wrote: > > > >> Paul suggested the following possible simplification for groupingBy. 
It > >> is somewhat counterintuitive at first glance, in that it removes the > most > >> commonly used form (!), but might make things easier to grasp in the > long > >> run (aided by good docs.) > >> > >> Recall we currently have four forms of groupingBy: > >> > >> // classifier only -- maps keys to list of matching elements > >> Collector>> > >> groupingBy(Function classifier) > >> > >> // Like above, but with explicit map ctor > >> >> > >> Collector > >> groupingBy(Function classifier, > >> Supplier mapFactory) > >> > >> // basic cascaded form > >> Collector> > >> groupingBy(Function classifier, > >> Collector downstream) > >> > >> // cascaded form with explicit ctor > >> > > >> Collector > >> groupingBy(Function classifier, > >> Supplier mapFactory, > >> Collector downstream) > >> > >> Plus four corresponding forms for groupingByConcurrent. > >> > >> The first form is likely to be the most common, as it is the traditional > >> "group by". It is equivalent to: > >> > >> groupingBy(classifier, toList()); > >> > >> The proposal is: Drop the first two forms. Just as users can learn that > >> to collect elements into a list, you do: > >> > >> collect(toList()) > >> > >> people can learn that to do the simple form of groupBy, you can do: > >> > >> collect(groupingBy(f, toList()); > >> > >> Which also reads perfectly well. > >> > >> By cutting the number of forms in half, it helps users to realize that > >> groupingBy does just one thing -- classifies elements by key, and > collects > >> elements associated with that key. Obviously the docs for groupingBy > can > >> show examples of the simple grouping as well as more sophisticated > >> groupings. > >> > >> > > From joe.bowbeer at gmail.com Wed Apr 10 09:42:28 2013 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Wed, 10 Apr 2013 09:42:28 -0700 Subject: Possible groupingBy simplification? 
In-Reply-To: References: <5164854A.3000006@oracle.com> Message-ID: Correction: All the grouping(f) should be groupingBy(f) On Apr 10, 2013 9:37 AM, "Joe Bowbeer" wrote: > For consistency with minBy and friends, all the 'By' methods should take a > single argument: f. Hence grouping(f). > > No-arg and one-arg forms are the easiest to use and maintain. Just the > additional comma, and which pair of parens contains it, is a significant > burden. > > The most readable forms of collect that have an explicit toList() would be > of the form: > > collect(grouping(f)).toList(); > > or maybe > > collect(toList(), groupingBy(f)); > > Joe > On Apr 10, 2013 2:35 AM, "Paul Sandoz" wrote: > >> >> On Apr 9, 2013, at 11:56 PM, Joe Bowbeer wrote: >> >> > I like the most popular form. In fact, I think it's the only one that >> I've >> > used. >> > >> > The argument that users will gain by removing their most common form >> seems >> > kind of far-fetched. >> > >> >> If each method in Collectors does just one conceptual thing we can >> concisely express in documentation it is easier to remember and therefore >> easier to read the code, easier to find in documentation be it using the >> IDE or otherwise. Thus to me that suggests removing conceptual variants or >> renaming them. >> >> If the list variants were called say groupingByToList that would ensure >> the "one conceptual thing": classifies elements by key, and collects >> elements associated with that key to a list. But i suspect we might not >> require those methods if the leap of stream.collector(toList()) can be >> grasped. >> >> The same applies to toMap. I think it is easier to understand/read if it >> does just one conceptual thing: elements are keys, elements are mapped to >> values, conflicting keys result in an exception. If that does not fit ones >> requirements use groupingBy. >> >> Paul. 
>> >> > In my experience, I do a ctrl-space and look for my target return type >> on >> > the right-hand-side of the IDE popup, and then I try to fill in the >> missing >> > information, such as parameters. In this case, having to provide >> toList() >> > would probably be a stumbling block for me, as the IDE is not as good >> when >> > it comes to suggesting expressions for parameters. >> > >> > I sort of like the symmetry with collect(toList()) but not enough to >> make >> > up for the loss. >> > >> > >> > >> > On Tue, Apr 9, 2013 at 2:16 PM, Brian Goetz >> wrote: >> > >> >> Paul suggested the following possible simplification for groupingBy. >> It >> >> is somewhat counterintuitive at first glance, in that it removes the >> most >> >> commonly used form (!), but might make things easier to grasp in the >> long >> >> run (aided by good docs.) >> >> >> >> Recall we currently have four forms of groupingBy: >> >> >> >> // classifier only -- maps keys to list of matching elements >> >> Collector>> >> >> groupingBy(Function classifier) >> >> >> >> // Like above, but with explicit map ctor >> >> >> >> >> Collector >> >> groupingBy(Function classifier, >> >> Supplier mapFactory) >> >> >> >> // basic cascaded form >> >> Collector> >> >> groupingBy(Function classifier, >> >> Collector downstream) >> >> >> >> // cascaded form with explicit ctor >> >> > >> >> Collector >> >> groupingBy(Function classifier, >> >> Supplier mapFactory, >> >> Collector downstream) >> >> >> >> Plus four corresponding forms for groupingByConcurrent. >> >> >> >> The first form is likely to be the most common, as it is the >> traditional >> >> "group by". It is equivalent to: >> >> >> >> groupingBy(classifier, toList()); >> >> >> >> The proposal is: Drop the first two forms. 
Just as users can learn >> that >> >> to collect elements into a list, you do: >> >> >> >> collect(toList()) >> >> >> >> people can learn that to do the simple form of groupBy, you can do: >> >> >> >> collect(groupingBy(f, toList()); >> >> >> >> Which also reads perfectly well. >> >> >> >> By cutting the number of forms in half, it helps users to realize that >> >> groupingBy does just one thing -- classifies elements by key, and >> collects >> >> elements associated with that key. Obviously the docs for groupingBy >> can >> >> show examples of the simple grouping as well as more sophisticated >> >> groupings. >> >> >> >> >> >> From forax at univ-mlv.fr Wed Apr 10 10:10:25 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Wed, 10 Apr 2013 19:10:25 +0200 Subject: Possible groupingBy simplification? In-Reply-To: References: <5164854A.3000006@oracle.com> Message-ID: <51659D01.2020606@univ-mlv.fr> Joe, collect(toList(), groupingBy(f)); => how do you express the fact that you may want to group in cascade ? collect(groupingBy(f)).toList() => what is the resulting type of collect(groupingBy(f)) ? is it a super-type of Stream ? Brian, I'm fine with the proposed changes. Rémi On 04/10/2013 06:42 PM, Joe Bowbeer wrote: > > Correction: All the grouping(f) should be groupingBy(f) > > On Apr 10, 2013 9:37 AM, "Joe Bowbeer" > wrote: > > For consistency with minBy and friends, all the 'By' methods > should take a single argument: f. Hence grouping(f). > > No-arg and one-arg forms are the easiest to use and maintain. Just > the additional comma, and which pair of parens contains it, is a > significant burden. > > The most readable forms of collect that have an explicit toList() > would be of the form: > > collect(grouping(f)).toList(); > > or maybe > > collect(toList(), groupingBy(f)); > > Joe > > On Apr 10, 2013 2:35 AM, "Paul Sandoz" > wrote: > > > On Apr 9, 2013, at 11:56 PM, Joe Bowbeer > > wrote: > > > I like the most popular form.
In fact, I think it's the > only one that I've > > used. > > > > The argument that users will gain by removing their most > common form seems > > kind of far-fetched. > > > > If each method in Collectors does just one conceptual thing we > can concisely express in documentation it is easier to > remember and therefore easier to read the code, easier to find > in documentation be it using the IDE or otherwise. Thus to me > that suggests removing conceptual variants or renaming them. > > If the list variants were called say groupingByToList that > would ensure the "one conceptual thing": classifies elements > by key, and collects elements associated with that key to a > list. But i suspect we might not require those methods if the > leap of stream.collector(toList()) can be grasped. > > The same applies to toMap. I think it is easier to > understand/read if it does just one conceptual thing: elements > are keys, elements are mapped to values, conflicting keys > result in an exception. If that does not fit ones requirements > use groupingBy. > > Paul. > > > In my experience, I do a ctrl-space and look for my target > return type on > > the right-hand-side of the IDE popup, and then I try to fill > in the missing > > information, such as parameters. In this case, having to > provide toList() > > would probably be a stumbling block for me, as the IDE is > not as good when > > it comes to suggesting expressions for parameters. > > > > I sort of like the symmetry with collect(toList()) but not > enough to make > > up for the loss. > > > > > > > > On Tue, Apr 9, 2013 at 2:16 PM, Brian Goetz > > wrote: > > > >> Paul suggested the following possible simplification for > groupingBy. It > >> is somewhat counterintuitive at first glance, in that it > removes the most > >> commonly used form (!), but might make things easier to > grasp in the long > >> run (aided by good docs.) 
> >> > >> Recall we currently have four forms of groupingBy: > >> > >> // classifier only -- maps keys to list of matching elements > >> Collector>> > >> groupingBy(Function classifier) > >> > >> // Like above, but with explicit map ctor > >> >> > >> Collector > >> groupingBy(Function classifier, > >> Supplier mapFactory) > >> > >> // basic cascaded form > >> Collector> > >> groupingBy(Function classifier, > >> Collector downstream) > >> > >> // cascaded form with explicit ctor > >> > > >> Collector > >> groupingBy(Function classifier, > >> Supplier mapFactory, > >> Collector downstream) > >> > >> Plus four corresponding forms for groupingByConcurrent. > >> > >> The first form is likely to be the most common, as it is > the traditional > >> "group by". It is equivalent to: > >> > >> groupingBy(classifier, toList()); > >> > >> The proposal is: Drop the first two forms. Just as users > can learn that > >> to collect elements into a list, you do: > >> > >> collect(toList()) > >> > >> people can learn that to do the simple form of groupBy, you > can do: > >> > >> collect(groupingBy(f, toList()); > >> > >> Which also reads perfectly well. > >> > >> By cutting the number of forms in half, it helps users to > realize that > >> groupingBy does just one thing -- classifies elements by > key, and collects > >> elements associated with that key. Obviously the docs for > groupingBy can > >> show examples of the simple grouping as well as more > sophisticated > >> groupings. > >> > >> > From brian.goetz at oracle.com Wed Apr 10 11:11:19 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 10 Apr 2013 14:11:19 -0400 Subject: Possible groupingBy simplification? 
In-Reply-To: <51659D01.2020606@univ-mlv.fr> References: <5164854A.3000006@oracle.com> <51659D01.2020606@univ-mlv.fr> Message-ID: <5165AB47.5050408@oracle.com> After staring at groupingBy and toMap for a while, I think there's a nice middle ground which should address the key use cases while reducing a little bit of the "which one do I use": groupingBy(f) groupingBy(f, downstreamCollector) groupingBy(f, mapSupplier, downstreamCollector) toMap(keyFn, valFn) toMap(keyFn, valFn, mergeFn) toMap(keyFn, valFn, mergeFn, mapSupplier) This cuts variants of each from 4 to 3, but more importantly, orders them into a nice telescoping set. Those wanting the groupingBy(f, mapSupplier) version should be able to figure out easily (with aid from doc) that they can use groupingBy(f, mapSupplier, toList()). On 4/10/2013 1:10 PM, Remi Forax wrote: > Joe, > collect(toList(), groupingBy(f)); > => how do you express the fact that you may want to group in cascade ? > > collect(groupingBy(f)).toList() > => what is the resulting type of collect(groupingBy(f)) ? > is it a super-type of Stream ? > > Brian, > I'm fine with the proposed changes. > > Rémi > > On 04/10/2013 06:42 PM, Joe Bowbeer wrote: >> >> Correction: All the grouping(f) should be groupingBy(f) >> >> On Apr 10, 2013 9:37 AM, "Joe Bowbeer" > > wrote: >> >> For consistency with minBy and friends, all the 'By' methods >> should take a single argument: f. Hence grouping(f). >> >> No-arg and one-arg forms are the easiest to use and maintain. Just >> the additional comma, and which pair of parens contains it, is a >> significant burden. >> >> The most readable forms of collect that have an explicit toList() >> would be of the form: >> >> collect(grouping(f)).toList(); >> >> or maybe >> >> collect(toList(), groupingBy(f)); >> >> Joe >> >> On Apr 10, 2013 2:35 AM, "Paul Sandoz" > > wrote: >> >> >> On Apr 9, 2013, at 11:56 PM, Joe Bowbeer >> > wrote: >> >> > I like the most popular form.
In fact, I think it's the >> only one that I've >> > used. >> > >> > The argument that users will gain by removing their most >> common form seems >> > kind of far-fetched. >> > >> >> If each method in Collectors does just one conceptual thing we >> can concisely express in documentation it is easier to >> remember and therefore easier to read the code, easier to find >> in documentation be it using the IDE or otherwise. Thus to me >> that suggests removing conceptual variants or renaming them. >> >> If the list variants were called say groupingByToList that >> would ensure the "one conceptual thing": classifies elements >> by key, and collects elements associated with that key to a >> list. But i suspect we might not require those methods if the >> leap of stream.collector(toList()) can be grasped. >> >> The same applies to toMap. I think it is easier to >> understand/read if it does just one conceptual thing: elements >> are keys, elements are mapped to values, conflicting keys >> result in an exception. If that does not fit ones requirements >> use groupingBy. >> >> Paul. >> >> > In my experience, I do a ctrl-space and look for my target >> return type on >> > the right-hand-side of the IDE popup, and then I try to fill >> in the missing >> > information, such as parameters. In this case, having to >> provide toList() >> > would probably be a stumbling block for me, as the IDE is >> not as good when >> > it comes to suggesting expressions for parameters. >> > >> > I sort of like the symmetry with collect(toList()) but not >> enough to make >> > up for the loss. >> > >> > >> > >> > On Tue, Apr 9, 2013 at 2:16 PM, Brian Goetz >> > wrote: >> > >> >> Paul suggested the following possible simplification for >> groupingBy. It >> >> is somewhat counterintuitive at first glance, in that it >> removes the most >> >> commonly used form (!), but might make things easier to >> grasp in the long >> >> run (aided by good docs.) 
>> >> >> >> Recall we currently have four forms of groupingBy: >> >> >> >> // classifier only -- maps keys to list of matching >> elements >> >> Collector>> >> >> groupingBy(Function classifier) >> >> >> >> // Like above, but with explicit map ctor >> >> >> >> >> Collector >> >> groupingBy(Function classifier, >> >> Supplier mapFactory) >> >> >> >> // basic cascaded form >> >> Collector> >> >> groupingBy(Function classifier, >> >> Collector downstream) >> >> >> >> // cascaded form with explicit ctor >> >> > >> >> Collector >> >> groupingBy(Function classifier, >> >> Supplier mapFactory, >> >> Collector downstream) >> >> >> >> Plus four corresponding forms for groupingByConcurrent. >> >> >> >> The first form is likely to be the most common, as it is >> the traditional >> >> "group by". It is equivalent to: >> >> >> >> groupingBy(classifier, toList()); >> >> >> >> The proposal is: Drop the first two forms. Just as users >> can learn that >> >> to collect elements into a list, you do: >> >> >> >> collect(toList()) >> >> >> >> people can learn that to do the simple form of groupBy, you >> can do: >> >> >> >> collect(groupingBy(f, toList()); >> >> >> >> Which also reads perfectly well. >> >> >> >> By cutting the number of forms in half, it helps users to >> realize that >> >> groupingBy does just one thing -- classifies elements by >> key, and collects >> >> elements associated with that key. Obviously the docs for >> groupingBy can >> >> show examples of the simple grouping as well as more >> sophisticated >> >> groupings. >> >> >> >> >> > From joe.bowbeer at gmail.com Wed Apr 10 12:45:34 2013 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Wed, 10 Apr 2013 12:45:34 -0700 Subject: Possible groupingBy simplification? In-Reply-To: <5165AB47.5050408@oracle.com> References: <5164854A.3000006@oracle.com> <51659D01.2020606@univ-mlv.fr> <5165AB47.5050408@oracle.com> Message-ID: Looks good. I like the retention of the simple forms, and the telescopes. 
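For readers following the thread, here is a minimal sketch of the three telescoping groupingBy forms from Brian's proposal. It is written against java.util.stream.Collectors as it eventually shipped in Java 8, which settled on exactly these three forms; the April 2013 lambda-repository snapshot under discussion may have had different Collector signatures. The word list, the class name, and the first-letter classifier are invented for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

class GroupingDemo {
    // Illustrative classifier (not from the thread): group words by first letter.
    static final Function<String, Character> FIRST_LETTER = w -> w.charAt(0);

    // groupingBy(f): the classic "group by"; each group is collected into a List.
    static Map<Character, List<String>> byLetter(List<String> words) {
        return words.stream().collect(Collectors.groupingBy(FIRST_LETTER));
    }

    // groupingBy(f, downstream): a downstream collector reduces each group.
    static Map<Character, Long> countByLetter(List<String> words) {
        return words.stream().collect(Collectors.groupingBy(FIRST_LETTER, Collectors.counting()));
    }

    // groupingBy(f, mapSupplier, downstream): the dropped groupingBy(f, mapSupplier)
    // form is spelled by passing toList() explicitly as the downstream collector.
    static TreeMap<Character, List<String>> sortedByLetter(List<String> words) {
        return words.stream().collect(
                Collectors.groupingBy(FIRST_LETTER, TreeMap::new, Collectors.toList()));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "avocado", "banana", "blueberry", "cherry");
        System.out.println(byLetter(words).get('a'));       // [apple, avocado]
        System.out.println(countByLetter(words).get('b'));  // 2
        System.out.println(sortedByLetter(words).keySet()); // [a, b, c]
    }
}
```

Each form adds exactly one argument to the previous one, which is the "telescoping" property Joe and Brian refer to: the simplest call stays available, and a user who needs a specific map type reaches for the three-argument form.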
On Apr 10, 2013 11:11 AM, "Brian Goetz" wrote: > After staring at groupingBy and toMap for a while, I think there's a nice > middle ground which should address the key use cases while reducing a > little bit of the "which one do I use": > > groupingBy(f) > groupingBy(f, downstreamCollector) > groupingBy(f, mapSupplier, downstreamCollector) > > toMap(keyFn, valFn) > toMap(keyFn, valFn, mergeFn) > toMap(keyFn, valFn, mergeFn, mapSupplier) > > This cuts variants of each from 4 to 3, but more importantly, orders them > into a nice telescoping set. > > Those wanting the groupingBy(f, mapSUpplier) version should be able to > figure out easily (with aid from doc) that they can use groupingBy(f, > mapSUpplier, toList()). > > On 4/10/2013 1:10 PM, Remi Forax wrote: > >> Joe, >> collect(toList(), groupingBy(f)); >> => how do you express the fact that you may want to group in cascade ? >> >> collect(groupingBy(f)).toList(**) >> => what is the resulting type of collect(groupingBy(f)) ? >> is it a super-type of Stream ? >> >> Brian, >> I'm fine with the proposed changes. >> >> R?mi >> >> On 04/10/2013 06:42 PM, Joe Bowbeer wrote: >> >>> >>> Correction: All the grouping(f) should be groupingBy(f) >>> >>> On Apr 10, 2013 9:37 AM, "Joe Bowbeer" >> **> wrote: >>> >>> For consistency with minBy and friends, all the 'By' methods >>> should take a single argument: f. Hence grouping(f). >>> >>> No-arg and one-arg forms are the easiest to use and maintain. Just >>> the additional comma, and which pair of parens contains it, is a >>> significant burden. >>> >>> The most readable forms of collect that have an explicit toList() >>> would be of the form: >>> >>> collect(grouping(f)).toList(); >>> >>> or maybe >>> >>> collect(toList(), groupingBy(f)); >>> >>> Joe >>> >>> On Apr 10, 2013 2:35 AM, "Paul Sandoz" >> > wrote: >>> >>> >>> On Apr 9, 2013, at 11:56 PM, Joe Bowbeer >>> **> wrote: >>> >>> > I like the most popular form. In fact, I think it's the >>> only one that I've >>> > used. 
>>> > >>> > The argument that users will gain by removing their most >>> common form seems >>> > kind of far-fetched. >>> > >>> >>> If each method in Collectors does just one conceptual thing we >>> can concisely express in documentation it is easier to >>> remember and therefore easier to read the code, easier to find >>> in documentation be it using the IDE or otherwise. Thus to me >>> that suggests removing conceptual variants or renaming them. >>> >>> If the list variants were called say groupingByToList that >>> would ensure the "one conceptual thing": classifies elements >>> by key, and collects elements associated with that key to a >>> list. But i suspect we might not require those methods if the >>> leap of stream.collector(toList()) can be grasped. >>> >>> The same applies to toMap. I think it is easier to >>> understand/read if it does just one conceptual thing: elements >>> are keys, elements are mapped to values, conflicting keys >>> result in an exception. If that does not fit ones requirements >>> use groupingBy. >>> >>> Paul. >>> >>> > In my experience, I do a ctrl-space and look for my target >>> return type on >>> > the right-hand-side of the IDE popup, and then I try to fill >>> in the missing >>> > information, such as parameters. In this case, having to >>> provide toList() >>> > would probably be a stumbling block for me, as the IDE is >>> not as good when >>> > it comes to suggesting expressions for parameters. >>> > >>> > I sort of like the symmetry with collect(toList()) but not >>> enough to make >>> > up for the loss. >>> > >>> > >>> > >>> > On Tue, Apr 9, 2013 at 2:16 PM, Brian Goetz >>> > >>> wrote: >>> > >>> >> Paul suggested the following possible simplification for >>> groupingBy. It >>> >> is somewhat counterintuitive at first glance, in that it >>> removes the most >>> >> commonly used form (!), but might make things easier to >>> grasp in the long >>> >> run (aided by good docs.) 
>>> >> >>> >> Recall we currently have four forms of groupingBy: >>> >> >>> >> // classifier only -- maps keys to list of matching >>> elements >>> >> Collector>> >>> >> groupingBy(Function classifier) >>> >> >>> >> // Like above, but with explicit map ctor >>> >> >> >>> >> Collector >>> >> groupingBy(Function classifier, >>> >> Supplier mapFactory) >>> >> >>> >> // basic cascaded form >>> >> Collector> >>> >> groupingBy(Function classifier, >>> >> Collector downstream) >>> >> >>> >> // cascaded form with explicit ctor >>> >> > >>> >> Collector >>> >> groupingBy(Function classifier, >>> >> Supplier mapFactory, >>> >> Collector downstream) >>> >> >>> >> Plus four corresponding forms for groupingByConcurrent. >>> >> >>> >> The first form is likely to be the most common, as it is >>> the traditional >>> >> "group by". It is equivalent to: >>> >> >>> >> groupingBy(classifier, toList()); >>> >> >>> >> The proposal is: Drop the first two forms. Just as users >>> can learn that >>> >> to collect elements into a list, you do: >>> >> >>> >> collect(toList()) >>> >> >>> >> people can learn that to do the simple form of groupBy, you >>> can do: >>> >> >>> >> collect(groupingBy(f, toList()); >>> >> >>> >> Which also reads perfectly well. >>> >> >>> >> By cutting the number of forms in half, it helps users to >>> realize that >>> >> groupingBy does just one thing -- classifies elements by >>> key, and collects >>> >> elements associated with that key. Obviously the docs for >>> groupingBy can >>> >> show examples of the simple grouping as well as more >>> sophisticated >>> >> groupings. >>> >> >>> >> >>> >>> >> From tim at peierls.net Wed Apr 10 13:00:19 2013 From: tim at peierls.net (Tim Peierls) Date: Wed, 10 Apr 2013 16:00:19 -0400 Subject: Possible groupingBy simplification? In-Reply-To: References: <5164854A.3000006@oracle.com> <51659D01.2020606@univ-mlv.fr> <5165AB47.5050408@oracle.com> Message-ID: Agreed. What mergeFn is used in two-arg toMap? 
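Tim's question about the merge function in two-argument toMap was eventually settled the way Sam suggests below. In the Java 8 release, Collectors.toMap(keyMapper, valueMapper) throws IllegalStateException when two elements map to the same key. The three- and four-argument forms take an explicit merge function and then a map factory. The sketch below illustrates that shipped behaviour. The word list, the class name, and the comma-joining merge function are illustrative choices, not part of the thread.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

class ToMapDemo {
    // toMap(keyFn, valFn): a duplicate key is an error (IllegalStateException).
    static Map<Integer, String> byLengthStrict(List<String> words) {
        return words.stream().collect(Collectors.toMap(String::length, Function.identity()));
    }

    // toMap(keyFn, valFn, mergeFn): the caller decides how colliding values combine.
    // Here the existing value comes first and the new value is appended.
    static Map<Integer, String> byLengthJoined(List<String> words) {
        return words.stream().collect(
                Collectors.toMap(String::length, Function.identity(), (a, b) -> a + "," + b));
    }

    // toMap(keyFn, valFn, mergeFn, mapSupplier): also choose the Map implementation.
    static TreeMap<Integer, String> byLengthSorted(List<String> words) {
        return words.stream().collect(
                Collectors.toMap(String::length, Function.identity(), (a, b) -> a + "," + b, TreeMap::new));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("fig", "kiwi", "pear", "plum");
        System.out.println(byLengthJoined(words).get(4));     // kiwi,pear,plum
        System.out.println(byLengthSorted(words).firstKey()); // 3
        try {
            byLengthStrict(words); // "kiwi", "pear" and "plum" all have length 4
        } catch (IllegalStateException e) {
            System.out.println("duplicate key rejected");
        }
    }
}
```

Throwing on a collision keeps the two-argument form honest: as Sam notes, a caller who omits the merge function is asserting that the keys are unique, so a silent overwrite would hide a bug.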
--tim On Wed, Apr 10, 2013 at 3:45 PM, Joe Bowbeer wrote: > Looks good. I like the retention of the simple forms, and the telescopes. > On Apr 10, 2013 11:11 AM, "Brian Goetz" wrote: > >> After staring at groupingBy and toMap for a while, I think there's a nice >> middle ground which should address the key use cases while reducing a >> little bit of the "which one do I use": >> >> groupingBy(f) >> groupingBy(f, downstreamCollector) >> groupingBy(f, mapSupplier, downstreamCollector) >> >> toMap(keyFn, valFn) >> toMap(keyFn, valFn, mergeFn) >> toMap(keyFn, valFn, mergeFn, mapSupplier) >> >> This cuts variants of each from 4 to 3, but more importantly, orders them >> into a nice telescoping set. >> >> Those wanting the groupingBy(f, mapSUpplier) version should be able to >> figure out easily (with aid from doc) that they can use groupingBy(f, >> mapSUpplier, toList()). >> >> On 4/10/2013 1:10 PM, Remi Forax wrote: >> >>> Joe, >>> collect(toList(), groupingBy(f)); >>> => how do you express the fact that you may want to group in cascade ? >>> >>> collect(groupingBy(f)).toList(**) >>> => what is the resulting type of collect(groupingBy(f)) ? >>> is it a super-type of Stream ? >>> >>> Brian, >>> I'm fine with the proposed changes. >>> >>> R?mi >>> >>> On 04/10/2013 06:42 PM, Joe Bowbeer wrote: >>> >>>> >>>> Correction: All the grouping(f) should be groupingBy(f) >>>> >>>> On Apr 10, 2013 9:37 AM, "Joe Bowbeer" >>> **> wrote: >>>> >>>> For consistency with minBy and friends, all the 'By' methods >>>> should take a single argument: f. Hence grouping(f). >>>> >>>> No-arg and one-arg forms are the easiest to use and maintain. Just >>>> the additional comma, and which pair of parens contains it, is a >>>> significant burden. 
>>>> >>>> The most readable forms of collect that have an explicit toList() >>>> would be of the form: >>>> >>>> collect(grouping(f)).toList(); >>>> >>>> or maybe >>>> >>>> collect(toList(), groupingBy(f)); >>>> >>>> Joe >>>> >>>> On Apr 10, 2013 2:35 AM, "Paul Sandoz" >>> > wrote: >>>> >>>> >>>> On Apr 9, 2013, at 11:56 PM, Joe Bowbeer >>>> **> wrote: >>>> >>>> > I like the most popular form. In fact, I think it's the >>>> only one that I've >>>> > used. >>>> > >>>> > The argument that users will gain by removing their most >>>> common form seems >>>> > kind of far-fetched. >>>> > >>>> >>>> If each method in Collectors does just one conceptual thing we >>>> can concisely express in documentation it is easier to >>>> remember and therefore easier to read the code, easier to find >>>> in documentation be it using the IDE or otherwise. Thus to me >>>> that suggests removing conceptual variants or renaming them. >>>> >>>> If the list variants were called say groupingByToList that >>>> would ensure the "one conceptual thing": classifies elements >>>> by key, and collects elements associated with that key to a >>>> list. But i suspect we might not require those methods if the >>>> leap of stream.collector(toList()) can be grasped. >>>> >>>> The same applies to toMap. I think it is easier to >>>> understand/read if it does just one conceptual thing: elements >>>> are keys, elements are mapped to values, conflicting keys >>>> result in an exception. If that does not fit ones requirements >>>> use groupingBy. >>>> >>>> Paul. >>>> >>>> > In my experience, I do a ctrl-space and look for my target >>>> return type on >>>> > the right-hand-side of the IDE popup, and then I try to fill >>>> in the missing >>>> > information, such as parameters. In this case, having to >>>> provide toList() >>>> > would probably be a stumbling block for me, as the IDE is >>>> not as good when >>>> > it comes to suggesting expressions for parameters. 
>>>> > >>>> > I sort of like the symmetry with collect(toList()) but not >>>> enough to make >>>> > up for the loss. >>>> > >>>> > >>>> > >>>> > On Tue, Apr 9, 2013 at 2:16 PM, Brian Goetz >>>> > >>>> wrote: >>>> > >>>> >> Paul suggested the following possible simplification for >>>> groupingBy. It >>>> >> is somewhat counterintuitive at first glance, in that it >>>> removes the most >>>> >> commonly used form (!), but might make things easier to >>>> grasp in the long >>>> >> run (aided by good docs.) >>>> >> >>>> >> Recall we currently have four forms of groupingBy: >>>> >> >>>> >> // classifier only -- maps keys to list of matching >>>> elements >>>> >> Collector>> >>>> >> groupingBy(Function classifier) >>>> >> >>>> >> // Like above, but with explicit map ctor >>>> >> >> >>>> >> Collector >>>> >> groupingBy(Function classifier, >>>> >> Supplier mapFactory) >>>> >> >>>> >> // basic cascaded form >>>> >> Collector> >>>> >> groupingBy(Function classifier, >>>> >> Collector downstream) >>>> >> >>>> >> // cascaded form with explicit ctor >>>> >> > >>>> >> Collector >>>> >> groupingBy(Function classifier, >>>> >> Supplier mapFactory, >>>> >> Collector downstream) >>>> >> >>>> >> Plus four corresponding forms for groupingByConcurrent. >>>> >> >>>> >> The first form is likely to be the most common, as it is >>>> the traditional >>>> >> "group by". It is equivalent to: >>>> >> >>>> >> groupingBy(classifier, toList()); >>>> >> >>>> >> The proposal is: Drop the first two forms. Just as users >>>> can learn that >>>> >> to collect elements into a list, you do: >>>> >> >>>> >> collect(toList()) >>>> >> >>>> >> people can learn that to do the simple form of groupBy, you >>>> can do: >>>> >> >>>> >> collect(groupingBy(f, toList()); >>>> >> >>>> >> Which also reads perfectly well. 
>>>> >> >>>> >> By cutting the number of forms in half, it helps users to >>>> realize that >>>> >> groupingBy does just one thing -- classifies elements by >>>> key, and collects >>>> >> elements associated with that key. Obviously the docs for >>>> groupingBy can >>>> >> show examples of the simple grouping as well as more >>>> sophisticated >>>> >> groupings. >>>> >> >>>> >> >>>> >>>> >>> From spullara at gmail.com Wed Apr 10 13:38:21 2013 From: spullara at gmail.com (Sam Pullara) Date: Wed, 10 Apr 2013 13:38:21 -0700 Subject: Possible groupingBy simplification? In-Reply-To: References: <5164854A.3000006@oracle.com> <51659D01.2020606@univ-mlv.fr> <5165AB47.5050408@oracle.com> Message-ID: Don't know what it is, but I'd like it to throw an exception on clobber. My assumption is that in that case you know the keys are unique. Sam On Apr 10, 2013, at 1:00 PM, Tim Peierls wrote: > Agreed. > > What mergeFn is used in two-arg toMap? > > --tim > > On Wed, Apr 10, 2013 at 3:45 PM, Joe Bowbeer wrote: > Looks good. I like the retention of the simple forms, and the telescopes. > > On Apr 10, 2013 11:11 AM, "Brian Goetz" wrote: > After staring at groupingBy and toMap for a while, I think there's a nice middle ground which should address the key use cases while reducing a little bit of the "which one do I use": > > groupingBy(f) > groupingBy(f, downstreamCollector) > groupingBy(f, mapSupplier, downstreamCollector) > > toMap(keyFn, valFn) > toMap(keyFn, valFn, mergeFn) > toMap(keyFn, valFn, mergeFn, mapSupplier) > > This cuts variants of each from 4 to 3, but more importantly, orders them into a nice telescoping set. > > Those wanting the groupingBy(f, mapSUpplier) version should be able to figure out easily (with aid from doc) that they can use groupingBy(f, mapSUpplier, toList()). > > On 4/10/2013 1:10 PM, Remi Forax wrote: > Joe, > collect(toList(), groupingBy(f)); > => how do you express the fact that you may want to group in cascade ? 
> > collect(groupingBy(f)).toList() > => what is the resulting type of collect(groupingBy(f)) ? > is it a super-type of Stream ? > > Brian, > I'm fine with the proposed changes. > > R?mi > > On 04/10/2013 06:42 PM, Joe Bowbeer wrote: > > Correction: All the grouping(f) should be groupingBy(f) > > On Apr 10, 2013 9:37 AM, "Joe Bowbeer" > wrote: > > For consistency with minBy and friends, all the 'By' methods > should take a single argument: f. Hence grouping(f). > > No-arg and one-arg forms are the easiest to use and maintain. Just > the additional comma, and which pair of parens contains it, is a > significant burden. > > The most readable forms of collect that have an explicit toList() > would be of the form: > > collect(grouping(f)).toList(); > > or maybe > > collect(toList(), groupingBy(f)); > > Joe > > On Apr 10, 2013 2:35 AM, "Paul Sandoz" > wrote: > > > On Apr 9, 2013, at 11:56 PM, Joe Bowbeer > > wrote: > > > I like the most popular form. In fact, I think it's the > only one that I've > > used. > > > > The argument that users will gain by removing their most > common form seems > > kind of far-fetched. > > > > If each method in Collectors does just one conceptual thing we > can concisely express in documentation it is easier to > remember and therefore easier to read the code, easier to find > in documentation be it using the IDE or otherwise. Thus to me > that suggests removing conceptual variants or renaming them. > > If the list variants were called say groupingByToList that > would ensure the "one conceptual thing": classifies elements > by key, and collects elements associated with that key to a > list. But i suspect we might not require those methods if the > leap of stream.collector(toList()) can be grasped. > > The same applies to toMap. I think it is easier to > understand/read if it does just one conceptual thing: elements > are keys, elements are mapped to values, conflicting keys > result in an exception. 
If that does not fit ones requirements > use groupingBy. > > Paul. > > > In my experience, I do a ctrl-space and look for my target > return type on > > the right-hand-side of the IDE popup, and then I try to fill > in the missing > > information, such as parameters. In this case, having to > provide toList() > > would probably be a stumbling block for me, as the IDE is > not as good when > > it comes to suggesting expressions for parameters. > > > > I sort of like the symmetry with collect(toList()) but not > enough to make > > up for the loss. > > > > > > > > On Tue, Apr 9, 2013 at 2:16 PM, Brian Goetz > > wrote: > > > >> Paul suggested the following possible simplification for > groupingBy. It > >> is somewhat counterintuitive at first glance, in that it > removes the most > >> commonly used form (!), but might make things easier to > grasp in the long > >> run (aided by good docs.) > >> > >> Recall we currently have four forms of groupingBy: > >> > >> // classifier only -- maps keys to list of matching > elements > >> Collector>> > >> groupingBy(Function classifier) > >> > >> // Like above, but with explicit map ctor > >> >> > >> Collector > >> groupingBy(Function classifier, > >> Supplier mapFactory) > >> > >> // basic cascaded form > >> Collector> > >> groupingBy(Function classifier, > >> Collector downstream) > >> > >> // cascaded form with explicit ctor > >> > > >> Collector > >> groupingBy(Function classifier, > >> Supplier mapFactory, > >> Collector downstream) > >> > >> Plus four corresponding forms for groupingByConcurrent. > >> > >> The first form is likely to be the most common, as it is > the traditional > >> "group by". It is equivalent to: > >> > >> groupingBy(classifier, toList()); > >> > >> The proposal is: Drop the first two forms. 
Just as users > can learn that > >> to collect elements into a list, you do: > >> > >> collect(toList()) > >> > >> people can learn that to do the simple form of groupBy, you > can do: > >> > >> collect(groupingBy(f, toList()); > >> > >> Which also reads perfectly well. > >> > >> By cutting the number of forms in half, it helps users to > realize that > >> groupingBy does just one thing -- classifies elements by > key, and collects > >> elements associated with that key. Obviously the docs for > groupingBy can > >> show examples of the simple grouping as well as more > sophisticated > >> groupings. > >> > >> > > > From paul.sandoz at oracle.com Thu Apr 11 05:09:58 2013 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Thu, 11 Apr 2013 14:09:58 +0200 Subject: Whither FlatMapper? In-Reply-To: References: <5161F773.6050705@oracle.com> <51631A39.30001@cs.oswego.edu> <51632319.4040704@oracle.com> <51634C6A.1080301@oracle.com> Message-ID: <6CCCD759-F544-48C8-9C79-62C316200B11@oracle.com> An initial version of StreamBuilder has been pushed: http://hg.openjdk.java.net/lambda/lambda/jdk/rev/105d2c765fae It is optimized for 0 and 1 elements (reused for singleton streams). In addition, an optimization has been implemented when using forEach on the head of the stream. Those two optimizations should reduce the performance gap between the stream-based flatMap and the consumer-based flatMap. Currently StreamBuilder does not allow for reuse, though that is easy to add. Paul. On Apr 9, 2013, at 1:14 AM, Sam Pullara wrote: > That seems reasonable to me. > > Sam > > On Apr 8, 2013, at 4:02 PM, Brian Goetz wrote: > >> Actually, there is an allocation-free path to get almost the Consumer-version performance with the non-consumer version, using the proposed StreamBuilder type (that also implements Spliterator and Stream, so "building" is allocation-free), and stuffing that into a ThreadLocal: >> >> ThreadLocal tl = ... >> >> ...
>> >> stream.flatMap(e -> { >> StreamBuilder sb = tl.get(); >> sb.init(); >> // stuff elements into sb >> return sb.build(); // basically a no-op >> }); >> >> So I recant my earlier statement that there's no efficient way to simulate the consumer form. It's just ugly. >> >> And the above can be captured by a wrapping helper: >> >> Function> = wrapWithThreadLocalStreamBuilder( >> (T t, Consumer target) -> { /* old way */ }); >> >> So, I'm even more firmly in the "remove it" camp. >> >> On 4/8/2013 4:05 PM, Brian Goetz wrote: >>> A slight correction: if we remove the flatMap(FlatMapper), there is no >>> fluent form that is as efficient as the removed form that accepts (T, >>> Consumer), since there's no other way to get your hands on the >>> downstream Sink. (Not that this dampens my enthusiasm for removing it >>> much.) >>> >>> For the truly diffident, a middle ground does exist: remove FlatMapper >>> and its six brothers as a named SAM, and replace it with BiConsumer>> Consumer>, leaving both forms of flatMap methods in place: >>> flatMap(Function>) >>> flatMap(BiConsumer>) >>> >>> The main advantage being that the package javadoc is not polluted by >>> seven forms of FlatMapper. >>> >>> On 4/8/2013 3:27 PM, Doug Lea wrote: >>>> On 04/07/13 19:01, Sam Pullara wrote: >>>>> I'm a big fan of the current FlatMapper stuff that takes a Consumer. >>>>> Much more >>>>> efficient and straightforward when you don't have a stream or >>>>> collection to just >>>>> return. Here is some code that uses 3 of them for good effect: >>>> >>>> I think the main issue is whether, given the user reactions so far, we >>>> should insist on people using a generally better but non-obvious >>>> approach to flat-mapping. Considering that anyone *could* write their own >>>> FlatMappers layered on top of existing functionality (we could >>>> even show how to do it as a code example somewhere), I'm with >>>> Brian on this: give people the obvious forms in the API. People
People >>>> who are most likely to use it are the least likely to be obsessive >>>> about its performance. And when they are, they can learn about >>>> alternatives. >>>> >>>> -Doug >>>> > From mike.duigou at oracle.com Thu Apr 11 09:32:09 2013 From: mike.duigou at oracle.com (Mike Duigou) Date: Thu, 11 Apr 2013 09:32:09 -0700 Subject: Map Default Methods Message-ID: Hi Doug; I wanted to call your attention to three points in the current ongoing review of the proposed Map default methods. - I've added an additional default getOrDefault() to ConcurrentMap which preserves the atomic behaviour of ConcurrentMap at the cost of not supporting null values in maps. - I've changed the method documentation warning regarding synchronization, atomicity, concurrency. Please ensure that it still matches your intent: *

The default implementation makes no guarantees about synchronization * or atomicity properties of this method. Any class which wishes to provide * specific synchronization, atomicity or concurrency behaviour should * override this method. - The retry behaviour of the compute(), computeIfPresent() and merge() defaults makes sense for concurrent maps but possibly not for non-concurrent maps. For non-concurrent maps the retry behaviour will mask concurrent usage errors. How do you feel about moving these defaults (along with computeIfAbsent()) to ConcurrentMap and providing implementations that generate ConcurrentModificationException for the Map defaults? Thanks, Mike From brian.goetz at oracle.com Thu Apr 11 09:49:24 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 11 Apr 2013 12:49:24 -0400 Subject: Map Default Methods In-Reply-To: References: Message-ID: <5166E994.5000603@oracle.com> If getOrDefault is going to hold up the train here, we should consider peeling it off and handling it separately, since it was only added as a "while we're here" and not currently used by any of the code that this putback is blocking. On 4/11/2013 12:32 PM, Mike Duigou wrote: > Hi Doug; > > I wanted to call your attention to three points in the current ongoing review of the proposed Map default methods. > > - I've added an additional default getOrDefault() to ConcurrentMap which preserves the atomic behaviour of ConcurrentMap at the cost of not supporting null values in maps. > > - I've changed the method documentation warning regarding synchronization, atomicity, concurrency. Please ensure that it still matches your intent: > > *

The default implementation makes no guarantees about synchronization > * or atomicity properties of this method. Any class which wishes to provide > * specific synchronization, atomicity or concurrency behaviour should > * override this method. > > - The retry behaviour of the compute(), computeIfPresent() and merge() defaults makes sense for concurrent maps but possibly not for non-concurrent maps. For non-concurrent maps the retry behaviour will mask concurrent usage errors. How do you feel about moving these defaults (along with computeIfAbsent()) to ConcurrentMap and providing implementations that generate ConcurrentModificationException for the Map defaults? > > Thanks, > > Mike > From mike.duigou at oracle.com Thu Apr 11 09:52:58 2013 From: mike.duigou at oracle.com (Mike Duigou) Date: Thu, 11 Apr 2013 09:52:58 -0700 Subject: Map Default Methods In-Reply-To: <5166E994.5000603@oracle.com> References: <5166E994.5000603@oracle.com> Message-ID: <565BCBEF-41E4-450A-9D36-B23A377CE9BD@oracle.com> I don't think any of these are blockers for the current review. We can change our answers later in future commits. Mike On Apr 11 2013, at 09:49 , Brian Goetz wrote: > If getOrDefault is going to hold up the train here, we should consider peeling it off and handling it separately, since it was only added as a "while we're here" and not currently used by any of the code that this putback is blocking. > > On 4/11/2013 12:32 PM, Mike Duigou wrote: >> Hi Doug; >> >> I wanted to call your attention to three points in the current ongoing review of the proposed Map default methods. >> >> - I've added an additional default getOrDefault() to ConcurrentMap which preserves the atomic behaviour of ConcurrentMap at the cost of not supporting null values in maps. >> >> - I've changed the method documentation warning regarding synchronization, atomicity, concurrency. Please ensure that it still matches your intent: >> >> *

The default implementation makes no guarantees about synchronization >> * or atomicity properties of this method. Any class which wishes to provide >> * specific synchronization, atomicity or concurrency behaviour should >> * override this method. >> >> - The retry behaviour of the compute(), computeIfPresent() and merge() defaults makes sense for concurrent maps but possibly not for non-concurrent maps. For non-concurrent maps the retry behaviour will mask concurrent usage errors. How do you feel about moving these defaults (along with computeIfAbsent()) to ConcurrentMap and providing implementations that generate ConcurrentModificationException for the Map defaults? >> >> Thanks, >> >> Mike >> From dl at cs.oswego.edu Thu Apr 11 10:31:19 2013 From: dl at cs.oswego.edu (Doug Lea) Date: Thu, 11 Apr 2013 13:31:19 -0400 Subject: Map Default Methods In-Reply-To: References: Message-ID: <5166F367.70202@cs.oswego.edu> On 04/11/13 12:32, Mike Duigou wrote: > - I've added an additional default getOrDefault() to ConcurrentMap which preserves the atomic behaviour of ConcurrentMap at the cost of not supporting null values in maps. > I suppose this is OK. As mentioned in some list discussion, the unfortunate part is that ConcurrentMap does not explicitly ban null either. So all this does is push the issue one level deeper. On the other hand, all known implementations ban nulls because it would be stupid to support them -- for example putIfAbsent is useless in such cases. So the on-paper issue doesn't have any interesting impact. > - I've changed the method documentation warning regarding synchronization, atomicity, concurrency. Please ensure that it still matches your intent: > > *

The default implementation makes no guarantees about synchronization > * or atomicity properties of this method. Any class which wishes to provide > * specific synchronization, atomicity or concurrency behaviour should > * override this method. > Change it to the following, to avoid wishery: ... Any implementation providing atomicity guarantees must override this method and document its concurrency properties. -Doug From brian.goetz at oracle.com Thu Apr 11 10:51:00 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 11 Apr 2013 13:51:00 -0400 Subject: Dividing Streams.java Message-ID: <5166F804.50101@oracle.com> Joe quite correctly pointed out in the survey that Streams.java is a mix of two things for two audiences: - Utility methods for users to generate streams, like intRange() - Low level methods for library writers to generate streams from things like iterators or spliterators. Merging them in one file is confusing, because users come away with the idea that writing spliterators is something they're supposed to do, whereas in reality, if we've done our jobs, they should never even be aware that spliterators exist. So I think we should separate them into a "high level" and "low level" bag of tricks. Since today, Paul has added some new ones: - singletonStream(v) (four flavors) - builder() (four flavors) So, we have to identify appropriate homes for the two groupings, and separate them. Here's a first cut at separating them: High level: xxxRange xxxBuilder emptyXxxStream singletonXxxStream concat zip Low level: all spliterator-related stream building methods Not sure where (or even if): iterate (given T0 and f, infinite stream of T0, f(T0), f(f(T0)), ...) generate (infinite stream of independent applications of a generator, good for infinite constant and random streams, though not much else, used by impl of Random.{ints,longs,gaussians}).
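The two "not sure where" factories are easier to judge with a concrete example. Below is a minimal sketch using the released Java 8 API, which provides them as the static methods `Stream.iterate(T, UnaryOperator)` and `Stream.generate(Supplier)`; the prototype `Streams` versions discussed here may have differed. Both produce infinite streams, so the sketch truncates them with `limit()`. The names `IterateGenerateDemo`, `powersOfTwo` and `constants` are illustrative only.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class; names are illustrative, not part of any API.
class IterateGenerateDemo {
    // iterate: seed, f(seed), f(f(seed)), ... is infinite, so truncate with limit()
    static List<Integer> powersOfTwo(int n) {
        return Stream.iterate(1, x -> x * 2)
                     .limit(n)
                     .collect(Collectors.toList());
    }

    // generate: independent calls to a Supplier; here a constant stream
    static List<String> constants(String value, int n) {
        return Stream.generate(() -> value)
                     .limit(n)
                     .collect(Collectors.toList());
    }
}
```

With these definitions, `powersOfTwo(5)` yields `[1, 2, 4, 8, 16]`.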
Others that we've talked about adding: ints(), longs() // to enable things like ints().filter(...).limit(n) indexedGenerate(i -> T) I think the high-level stuff should stay in Streams. So we need a name for the low-level stuff. (Which also then becomes the right home for "how do I turn my data structure into a stream" doc.) What should we call that? From tim at peierls.net Thu Apr 11 11:08:44 2013 From: tim at peierls.net (Tim Peierls) Date: Thu, 11 Apr 2013 14:08:44 -0400 Subject: Dividing Streams.java In-Reply-To: <5166F804.50101@oracle.com> References: <5166F804.50101@oracle.com> Message-ID: On Thu, Apr 11, 2013 at 1:51 PM, Brian Goetz wrote: > I think the high-level stuff should stay in Streams. So we need a name > for the low-level stuff. (Which also then becomes the right home for "how > do I turn my data structure into a stream" doc.) > > What should we call that? > Streams.Internal Never mind that they aren't really internal. It needs to sound like you're breaking the manufacturer's seal if you use it. And having it nested means it's not too far away, but not in your face if you're looking at Streams. --tim From joe.bowbeer at gmail.com Thu Apr 11 15:05:49 2013 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Thu, 11 Apr 2013 15:05:49 -0700 Subject: Dividing Streams.java In-Reply-To: References: <5166F804.50101@oracle.com> Message-ID: I would hide everything that mentions Spliterator (or descendant) in its signature. I would not hide infinite stream toys such as iterate or generate. These are easy to understand and use, even if they have limited use -- which is not the case with doubleParallelStream and friends. On Thu, Apr 11, 2013 at 11:08 AM, Tim Peierls wrote: > On Thu, Apr 11, 2013 at 1:51 PM, Brian Goetz wrote: > >> I think the high-level stuff should stay in Streams. So we need a name >> for the low-level stuff. (Which also then becomes the right home for "how >> do I turn my data structure into a stream" doc.) >> >> What should we call that?
>> > > Streams.Internal > > Never mind that they aren't really internal. It needs to sound like you're > breaking the manufacturer's seal if you use it. > > And having it nested means it's not too far away, but not in your face if > you're looking at Streams. > > --tim > > From brian.goetz at oracle.com Thu Apr 11 15:20:11 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 11 Apr 2013 18:20:11 -0400 Subject: Dividing Streams.java In-Reply-To: References: <5166F804.50101@oracle.com> Message-ID: <5167371B.9020605@oracle.com> StreamImplementors ? StreamViews ? SpliteratorToStream ? On 4/11/2013 6:05 PM, Joe Bowbeer wrote: > I would hide everything that mentions Spliterator (or descendant) in its > signature. > > I would not hide infinite stream toys such as iterate or generate. > These are easy to understand and use, even if they have limited use -- > which is not the case with doubleParallelStream and friends. > > > On Thu, Apr 11, 2013 at 11:08 AM, Tim Peierls > wrote: > > On Thu, Apr 11, 2013 at 1:51 PM, Brian Goetz > wrote: > > I think the high-level stuff should stay in Streams. So we need > a name for the low-level stuff. (Which also then becomes the > right home for "how do I turn my data sturcture into a stream" doc.) > > What should we call that? > > > Streams.Internal > > Never mind that they aren't really internal. It needs to sound like > you're breaking the manufacturer's seal if you use it. > > And having it nested means it's not too far away, but not in your > face if you're looking at Streams. 
> > --tim > > From dl at cs.oswego.edu Thu Apr 11 15:49:28 2013 From: dl at cs.oswego.edu (Doug Lea) Date: Thu, 11 Apr 2013 18:49:28 -0400 Subject: Dividing Streams.java In-Reply-To: <5166F804.50101@oracle.com> References: <5166F804.50101@oracle.com> Message-ID: <51673DF8.1050301@cs.oswego.edu> On 04/11/13 13:51, Brian Goetz wrote: > Joe quite correctly pointed out in the survey that Streams.java is a mix of two > things for two audiences: > > - Utility methods for users to generate streams, like intRange() > - Low level methods for library writers to generate streams from things like > iterators or spliterators. > I'm not too tempted by this. Classes Collections and Arrays have lots of stuff and people don't seem to complain. -Doug From joe.bowbeer at gmail.com Thu Apr 11 16:01:44 2013 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Thu, 11 Apr 2013 16:01:44 -0700 Subject: Dividing Streams.java In-Reply-To: <51673DF8.1050301@cs.oswego.edu> References: <5166F804.50101@oracle.com> <51673DF8.1050301@cs.oswego.edu> Message-ID: But I am (and represent) Joe Programmer, and I've already complained :O At the top of the list is the confusing name doubleParallelStream, which does not create two parallel streams! It's very difficult to find anything useful in there, and the ones that take Spliterator arguments are a devil to figure out how to use, which adds to Joe's frustration. Simply removing everything that references a spliterator thing cleans it up a lot. On Thu, Apr 11, 2013 at 3:49 PM, Doug Lea

wrote: > On 04/11/13 13:51, Brian Goetz wrote: > >> Joe quite correctly pointed out in the survey that Streams.java is a mix >> of two >> things for two audiences: >> >> - Utility methods for users to generate streams, like intRange() >> - Low level methods for library writers to generate streams from things >> like >> iterators or spliterators. >> >> > I'm not too tempted by this. Classes Collections and Arrays have lots > of stuff and people don't seem to complain. > > -Doug > > From tim at peierls.net Thu Apr 11 16:13:44 2013 From: tim at peierls.net (Tim Peierls) Date: Thu, 11 Apr 2013 19:13:44 -0400 Subject: Dividing Streams.java In-Reply-To: References: <5166F804.50101@oracle.com> <51673DF8.1050301@cs.oswego.edu> Message-ID: Agreed. What harm is there in parceling the Spliterator stuff off? On Thu, Apr 11, 2013 at 7:01 PM, Joe Bowbeer wrote: > But I am (and represent) Joe Programmer, and I've already complained :O > > At the top of the list is the confusing name doubleParallelStream, which > does not create two parallel streams! > > It's very difficult to find anything useful in there, and the ones that > take Spliterator arguments are a devil to figure out how to use, which adds > to Joe's frustration. > > Simply removing everything that references a spliterator thing cleans it > up a lot. > > > On Thu, Apr 11, 2013 at 3:49 PM, Doug Lea
wrote: > >> On 04/11/13 13:51, Brian Goetz wrote: >> >>> Joe quite correctly pointed out in the survey that Streams.java is a mix >>> of two >>> things for two audiences: >>> >>> - Utility methods for users to generate streams, like intRange() >>> - Low level methods for library writers to generate streams from >>> things like >>> iterators or spliterators. >>> >>> >> I'm not too tempted by this. Classes Collections and Arrays have lots >> of stuff and people don't seem to complain. >> >> -Doug >> >> > From dl at cs.oswego.edu Thu Apr 11 16:20:28 2013 From: dl at cs.oswego.edu (Doug Lea) Date: Thu, 11 Apr 2013 19:20:28 -0400 Subject: Dividing Streams.java In-Reply-To: <5166F804.50101@oracle.com> References: <5166F804.50101@oracle.com> Message-ID: <5167453C.8070102@cs.oswego.edu> On 04/11/13 13:51, Brian Goetz wrote: > What should we call that? Still not too tempted, but I don't care enough to argue. One name with precedent is StreamSupport (like j.u.c.locks.LockSupport). -Doug From tim at peierls.net Thu Apr 11 16:24:53 2013 From: tim at peierls.net (Tim Peierls) Date: Thu, 11 Apr 2013 19:24:53 -0400 Subject: Dividing Streams.java In-Reply-To: <5167453C.8070102@cs.oswego.edu> References: <5166F804.50101@oracle.com> <5167453C.8070102@cs.oswego.edu> Message-ID: I like StreamSupport. --tim On Thu, Apr 11, 2013 at 7:20 PM, Doug Lea
wrote: > On 04/11/13 13:51, Brian Goetz wrote: > >> What should we call that? >> > > Still not too tempted, but I don't care enough to argue. > > One name with precedent is StreamSupport (like j.u.c.locks.LockSupport). > > -Doug > > > > From joe.bowbeer at gmail.com Thu Apr 11 16:39:51 2013 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Thu, 11 Apr 2013 16:39:51 -0700 Subject: Dividing Streams.java In-Reply-To: References: <5166F804.50101@oracle.com> <5167453C.8070102@cs.oswego.edu> Message-ID: I also like StreamSupport On Thu, Apr 11, 2013 at 4:24 PM, Tim Peierls wrote: > I like StreamSupport. > > --tim > > > On Thu, Apr 11, 2013 at 7:20 PM, Doug Lea
wrote: > >> On 04/11/13 13:51, Brian Goetz wrote: >> >>> What should we call that? >>> >> >> Still not too tempted, but I don't care enough to argue. >> >> One name with precedent is StreamSupport (like j.u.c.locks.LockSupport). >> >> -Doug >> >> >> >> > From brian.goetz at oracle.com Thu Apr 11 17:14:38 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 11 Apr 2013 20:14:38 -0400 Subject: Dividing Streams.java In-Reply-To: References: <5166F804.50101@oracle.com> <51673DF8.1050301@cs.oswego.edu> Message-ID: <516751EE.7090207@oracle.com> I'm with Joe on this one. Because streams are new, people are looking for the best way to get the stream they want, and often settle (incorrectly) on "I guess I have to write a spliterator." Which is not the best way to have a good experience. We have the same problem with docs. There's a whole lot of documentation (that needs to be written) for people writing spliterators, that is totally confusing and overwhelming for people who just want an integer range. On 4/11/2013 7:01 PM, Joe Bowbeer wrote: > But I am (and represent) Joe Programmer, and I've already complained :O > > At the top of the list is the confusing name doubleParallelStream, which > does not create two parallel streams! > > It's very difficult to find anything useful in there, and the ones that > take Spliterator arguments are a devil to figure out how to use, which > adds to Joe's frustration. > > Simply removing everything that references a spliterator thing cleans it > up a lot. > > > On Thu, Apr 11, 2013 at 3:49 PM, Doug Lea
> wrote: > > On 04/11/13 13:51, Brian Goetz wrote: > > Joe quite correctly pointed out in the survey that Streams.java > is a mix of two > things for two audiences: > > - Utility methods for users to generate streams, like intRange() > - Low level methods for library writers to generate streams > from things like > iterators or spliterators. > > > I'm not too tempted by this. Classes Collections and Arrays have lots > of stuff and people don't seem to complain. > > -Doug > > From brian.goetz at oracle.com Thu Apr 11 17:15:10 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 11 Apr 2013 20:15:10 -0400 Subject: Dividing Streams.java In-Reply-To: References: <5166F804.50101@oracle.com> <5167453C.8070102@cs.oswego.edu> Message-ID: <5167520E.6010206@oracle.com> Better than anything I came up with! StreamSupport it is. On 4/11/2013 7:39 PM, Joe Bowbeer wrote: > I also like StreamSupport > > > On Thu, Apr 11, 2013 at 4:24 PM, Tim Peierls > wrote: > > I like StreamSupport. > > --tim > > > On Thu, Apr 11, 2013 at 7:20 PM, Doug Lea
> wrote: > > On 04/11/13 13:51, Brian Goetz wrote: > > What should we call that? > > > Still not too tempted, but I don't care enough to argue. > > One name with precedent is StreamSupport (like > j.u.c.locks.LockSupport). > > -Doug > > > > > From brian.goetz at oracle.com Thu Apr 11 17:37:32 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 11 Apr 2013 20:37:32 -0400 Subject: Dividing Streams.java In-Reply-To: References: <5166F804.50101@oracle.com> <5167453C.8070102@cs.oswego.edu> Message-ID: <5167574C.7090902@oracle.com> Done. On 4/11/2013 7:39 PM, Joe Bowbeer wrote: > I also like StreamSupport > > > On Thu, Apr 11, 2013 at 4:24 PM, Tim Peierls > wrote: > > I like StreamSupport. > > --tim > > > On Thu, Apr 11, 2013 at 7:20 PM, Doug Lea
> wrote: > > On 04/11/13 13:51, Brian Goetz wrote: > > What should we call that? > > > Still not too tempted, but I don't care enough to argue. > > One name with precedent is StreamSupport (like > j.u.c.locks.LockSupport). > > -Doug > > > > > From brian.goetz at oracle.com Thu Apr 11 18:13:04 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 11 Apr 2013 21:13:04 -0400 Subject: Dividing Streams.java In-Reply-To: <5166F804.50101@oracle.com> References: <5166F804.50101@oracle.com> Message-ID: <51675FA0.3080800@oracle.com> > Not sure where (or even if): > iterate (given T0 and f, infinite stream of T0, f(T0), f(f(T0)), ...) > generate (infinite stream of independent applications of a generator, > good for infinite constant and random streams, though not much else, > used by impl of Random.{ints,longs,gaussians}). Anyone want to argue for narrowing or expanding this list? > Others that we've talked about adding: > ints(), longs() // to enable things like ints().filter(...).limit(n) Anyone compelled by these? I kind of like them. Do we want to add inclusive as well as half-open ranges? > indexedGenerate(i -> T) Anyone compelled by this one? From paul.sandoz at oracle.com Fri Apr 12 07:50:49 2013 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Fri, 12 Apr 2013 16:50:49 +0200 Subject: Streams.generate: infinite or finite? Message-ID: <898A6DD0-35CC-46C8-990A-05B334927A79@oracle.com> Hi, Currently Streams.generate produces an infinite stream. This is theoretically nice but splits poorly (right-balanced trees). 
Implementation-wise Streams.generate creates a spliterator from an iterator: public static Stream generate(Supplier s) { Objects.requireNonNull(s); InfiniteIterator iterator = s::get; return StreamSupport.stream(Spliterators.spliteratorUnknownSize( iterator, Spliterator.ORDERED | Spliterator.IMMUTABLE)); } The method is used in java.util.Random: public IntStream ints() { return Streams.generateInt(this::nextInt); } There might be a nasty surprise in store for developers that expect the randomly generated stream of int values to have the best parallel performance. We can change Streams.generate to be finite (or not known to be finite in the time allotted to do some computation) by implementing as follows: public static Stream generate(Supplier s) { return Streams.longRange(0, Long.MAX_VALUE).mapToObj(i -> s.get()); } This will yield better parallel performance because the splits are balanced. We can further change to: public static Stream generate(Supplier s) { return Streams.longs().mapToObj(i -> s.get()); } if we introduce the longs() idiom. I think we should go finite! and add Streams.longs(). Agree? or disagree? Then it is actually questionable if Streams.generate should exist at all. It does have some pedagogic value since the idiom Streams.longs().map() may not be obvious. So I would be mostly inclined to keep it for that reason. Paul. From brian.goetz at oracle.com Fri Apr 12 08:14:45 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 12 Apr 2013 11:14:45 -0400 Subject: Streams.generate: infinite or finite? In-Reply-To: <898A6DD0-35CC-46C8-990A-05B334927A79@oracle.com> References: <898A6DD0-35CC-46C8-990A-05B334927A79@oracle.com> Message-ID: <516824E5.6000409@oracle.com> I think this is slightly unfortunate but I think we're probably stuck doing it anyway. The theoretical benefit of generating an infinite stream is not really worth the very real cost to people trying to use these in parallel and getting surprising performance.
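Paul's proposed finite `generate` can be sketched against the released Java 8 API, where the closest counterpart to the prototype `Streams.longRange` is assumed to be `LongStream.range`. This is a sketch under that assumption, not the code that was committed, and `FiniteGenerate` and `generateFinite` are hypothetical names. It shows why the change helps. A range-backed source reports `SIZED`, so a parallel pipeline can halve it evenly. An iterator-backed infinite source can only peel off batches from the left.

```java
import java.util.function.Supplier;
import java.util.stream.LongStream;
import java.util.stream.Stream;

// Hypothetical helper sketching the "finite generate" idea with Java 8 APIs.
class FiniteGenerate {
    // Backed by a range of Long.MAX_VALUE indices, so the source is SIZED and
    // splits into balanced halves. The index itself is ignored; every element
    // is an independent call to the supplier.
    static <T> Stream<T> generateFinite(Supplier<T> s) {
        return LongStream.range(0, Long.MAX_VALUE).mapToObj(i -> s.get());
    }
}
```

Strictly speaking the result is finite (Long.MAX_VALUE elements), but no realistic computation will exhaust it. That is the trade-off being accepted in this thread.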
+1 on ints(), longs() +1 on making these finite +1 on making generate(f) essentially be longs().map(f::get) On 4/12/2013 10:50 AM, Paul Sandoz wrote: > Hi, > > Currently Streams.generate produces an infinite stream. This is theoretically nice but splits poorly (right-balanced trees). > > Implementation-wise Streams.generate creates a spliterator from an iterator: > > public static Stream generate(Supplier s) { > Objects.requireNonNull(s); > InfiniteIterator iterator = s::get; > return StreamSupport.stream(Spliterators.spliteratorUnknownSize( > iterator, > Spliterator.ORDERED | Spliterator.IMMUTABLE)); > } > > The method is used in java.util.Random: > > public IntStream ints() { > return Streams.generateInt(this::nextInt); > } > > There might be a nasty surprise in store for developers that expect the randomly generated stream of int values to have the best parallel performance. > > > We can change Streams.generate to be finite (or not known to be finite in the time allotted to do some computation) by implementing as follows: > > public static Stream generate(Supplier s) { > return Streams.longRange(0, Long.MAX_VALUE).mapToObj(i -> s.get()); > } > > This will yield better parallel performance because the splits are balanced. > > We can further change to: > > public static Stream generate(Supplier s) { > return Streams.longs().mapToObj(i -> s.get()); > } > > if we introduce the longs() idiom. > > > I think we should go finite! and add Streams.longs(). Agree? or disagree? > > Then it is actually questionable if Streams.generate should exist at all. It does have some pedagogic value since the idiom Streams.longs().map() may not be obvious. So I would be mostly inclined to keep it for that reason. > > Paul. > From jim at pentastich.org Fri Apr 12 14:18:00 2013 From: jim at pentastich.org (Jim Mayer) Date: Fri, 12 Apr 2013 17:18:00 -0400 Subject: Streams.generate: infinite or finite?
In-Reply-To: <898A6DD0-35CC-46C8-990A-05B334927A79@oracle.com> References: <898A6DD0-35CC-46C8-990A-05B334927A79@oracle.com> Message-ID: Perhaps I'm missing something, but what makes it hard to do a high performance parallel split of an infinite stream? Sure, you can't break it up into even chunks, but is that really all that different than breaking up Long.MAX_VALUE values? On an 8 core machine you're not going to break them up into 2^60 element chunks either, right? Yours confusedly, Jim Mayer On Fri, Apr 12, 2013 at 10:50 AM, Paul Sandoz wrote: > Hi, > > Currently Streams.generate produces an infinite stream. This is > theoretically nice but splits poorly (right-balanced trees). > > Implementation-wise Streams.generate creates a spliterator from an > iterator: > > public static Stream generate(Supplier s) { > Objects.requireNonNull(s); > InfiniteIterator iterator = s::get; > return StreamSupport.stream(Spliterators.spliteratorUnknownSize( > iterator, > Spliterator.ORDERED | Spliterator.IMMUTABLE)); > } > > The method is used in java.util.Random: > > public IntStream ints() { > return Streams.generateInt(this::nextInt); > } > > There might be a nasty surprise in store for developers that expect the > randomly generated stream of int values to have the best parallel > performance. > > > We can change Streams.generate to be finite (or not known to be finite in > the time allotted to do some computation) by implementing as follows: > > public static Stream generate(Supplier s) { > return Streams.longRange(0, Long.MAX_VALUE).mapToObj(i -> s.get()); > } > > This will yield better parallel performance because the splits are > balanced. > > We can further change to: > > public static Stream generate(Supplier s) { > return Streams.longs().mapToObj(i -> s.get()); > } > > if we introduce the longs() idiom. > > > I think we should go finite! and add Streams.longs(). Agree? or disagree? > > Then it is actually questionable if Streams.generate should exist at all.
> It does have some pedagogic value since the idiom Streams.longs().map() may > not be obvious. So I would be mostly inclined to keep it for that reason. > > Paul. From brian.goetz at oracle.com Fri Apr 12 14:22:22 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 12 Apr 2013 17:22:22 -0400 Subject: StreamBuilder Message-ID: <51687B0E.2030602@oracle.com> In the wake of taking away flatMap(FlatMapper), we had to provide a way for people to build streams by generation. For object-valued streams, they could just use an ArrayList, but for primitive-valued streams, there's no easy buffering tool. (Hopefully also we can make StreamBuffer more efficient than ArrayList (at least it doesn't have to copy elements on resize)). What we've got now is: interface StreamBuilder extends Consumer { Stream build(); } with nested specializations for OfInt, OfLong, OfDouble, and factories in Streams to get one: static StreamBuilder builder(); Someone commented that it wasn't obvious that StreamBuilder is just a buffer, and the Stream class itself is a sort of builder for streams (you add stages one by one), so maybe a better name might be StreamBuffer? And I guess the corresponding factories are Streams.makeBuffer()? .newBuffer()? From brian.goetz at oracle.com Sat Apr 13 08:24:02 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Sat, 13 Apr 2013 11:24:02 -0400 Subject: Stream constructors for stream(Iterator) in StreamSupport? Message-ID: <51697892.5010205@oracle.com> Currently StreamSupport contains seq/par versions of stream(Spliterator) stream(Supplier) for ref/int/long/double. In java.util.Spliterators, there are adapters to turn an Iterator into a Spliterator. I think we should add convenience factories for stream(Iterator) to StreamSupport as well. From tim at peierls.net Sat Apr 13 09:06:40 2013 From: tim at peierls.net (Tim Peierls) Date: Sat, 13 Apr 2013 12:06:40 -0400 Subject: Stream constructors for stream(Iterator) in StreamSupport?
In-Reply-To: <51697892.5010205@oracle.com> References: <51697892.5010205@oracle.com> Message-ID: Doesn't that seem like something that belongs in Streams? If you're stuck with a legacy API that exposes Iterator but not Iterable, you'd still want to be able to make a Stream out of it, and you wouldn't want to have to look in StreamSupport for that. It's a lot different from stream(Spliterator). On Sat, Apr 13, 2013 at 11:24 AM, Brian Goetz wrote: > Currently StreamSupport contains seq/par versions of > stream(Spliterator) > stream(Supplier) > for ref/int/long/double. > > In java.util.Spliterators, there are adapters to turn an Iterator into a > Spliterator. > > I think we should add convenience factories for > > stream(Iterator) > > to StreamSupport as well. > From brian.goetz at oracle.com Sat Apr 13 12:25:54 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Sat, 13 Apr 2013 15:25:54 -0400 Subject: Stream constructors for stream(Iterator) in StreamSupport? In-Reply-To: References: <51697892.5010205@oracle.com> Message-ID: <5169B142.6000002@oracle.com> Good question. Here's my reasoning about why I thought it lives better in SS than S; let me know if you find this argument compelling. (Also, this speaks to an area currently missing in the docs.) There are lots of ways to make a stream, and some are better than others. The absolute worst is via an Iterator. Best way is to get one from your data source directly (e.g., ArrayList.stream()). The streams provided by collections and other JDK classes have highly optimized spliterators (thanks Doug!), work directly with knowledge of the data structure, are late-binding to minimize CME-like interference, and preserve the most information (such as sorted-ness, sized-ness, distinct-ness) that the streams framework can use directly to optimize execution. The next best way is via one of the factories in Streams -- things like intRange, iterate, generate. 
These are more flexible than they first appear; for example, if you have a function int -> T, and you want to generate a sequence of f(0), f(1), ... f(n-1) in a parallel-friendly way, you can just do: intRange(0, n).mapToObj(f); The next best way is via a Spliterator that properly declares its properties, is SIZED, SUBSIZED, and has a good trySplit implementation. These will ensure that things decompose well. Many of the JDK spliterators have these characteristics. We then slide down the scale of spliterator quality; SUBSIZED is probably the first to go, then SIZED, then trySplit. As the spliterator quality degrades, the quality of decomposition and opportunity for pipeline optimization degrades too. We then come to the bottom of the barrel, iterators. Making a Spliterator from an iterator sucks in at least the following ways: - Splitting will suck. We can still extract some parallelism for high-Q problems, but it will never be good, placing a lid on how much parallelism you can get. - Iterators throw away a lot of useful information about the underlying data source, such as its size. It may be that whoever wrote the Iterator knows the size, but the Iterator does not. (We've got an iterator+size to spliterator conversion, but that's brittle because of "early binding" to the size information.) - Element access overhead. One of the reasons for doing Spliterator is that Iterator sucks so badly! (High per-element cost; two method calls per element, often with redundant computation due to required defensive coding; Iterator protocol often requires lookahead and buffering; inherent race between hasNext() and next().) So you're taking a sucky way to get elements out of a source, and wrapping it with more junk. So, while Iterator to Stream is still a fine last resort, putting it in Streams will likely have the unfortunate effect of guiding users to the worst way of making a stream, without fully understanding the tradeoffs.
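The ranking above can be made concrete with a minimal sketch in released Java 8 terms. Because this thread predates the final API, the sketch assumes `intRange` maps to `IntStream.range` and renders the iterator adapter as `Spliterators.spliteratorUnknownSize` plus `StreamSupport.stream(Spliterator, boolean)`. `StreamSources` and its method names are hypothetical.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.IntFunction;
import java.util.stream.IntStream;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical helper class, ordered from best to worst way of making a stream.
class StreamSources {
    // Best: ask the data source directly; a List's spliterator knows its size.
    static Stream<String> fromSource(List<String> list) {
        return list.stream();
    }

    // Parallel-friendly indexed generation: f(0), f(1), ..., f(n - 1).
    static <T> Stream<T> indexed(int n, IntFunction<T> f) {
        return IntStream.range(0, n).mapToObj(f);
    }

    // Last resort: wrap a bare Iterator. The size is unknown, so the spliterator
    // is not SIZED and can only split by copying batches into arrays.
    static <T> Stream<T> fromIterator(Iterator<T> it) {
        Spliterator<T> sp = Spliterators.spliteratorUnknownSize(it, Spliterator.ORDERED);
        return StreamSupport.stream(sp, false); // false = sequential
    }
}
```

The information loss described above shows up directly in the characteristics. A list-backed stream's spliterator reports `SIZED`, but the iterator-backed one cannot.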
On 4/13/2013 12:06 PM, Tim Peierls wrote: > Doesn't that seem like something that belongs in Streams? If you're > stuck with a legacy API that exposes Iterator but not Iterable, you'd > still want to be able to make a Stream out of it, and you wouldn't want > to have to look in StreamSupport for that. It's a lot different from > stream(Spliterator). > > On Sat, Apr 13, 2013 at 11:24 AM, Brian Goetz > wrote: > > Currently StreamSupport contains seq/par versions of > stream(Spliterator) > stream(Supplier) > for ref/int/long/double. > > In java.util.Spliterators, there are adapters to turn an Iterator > into a Spliterator. > > I think we should add convenience factories for > > stream(Iterator) > > to StreamSupport as well. > > From tim at peierls.net Sat Apr 13 13:57:20 2013 From: tim at peierls.net (Tim Peierls) Date: Sat, 13 Apr 2013 16:57:20 -0400 Subject: Stream constructors for stream(Iterator) in StreamSupport? In-Reply-To: <5169B142.6000002@oracle.com> References: <51697892.5010205@oracle.com> <5169B142.6000002@oracle.com> Message-ID: On Sat, Apr 13, 2013 at 3:25 PM, Brian Goetz wrote: > There are lots of ways to make a stream, and some are better than others. > ... > > Best way is to get one from your data source directly (e.g., > ArrayList.stream()). ... > The next best way is via one of the factories in Streams -- things like > intRange, iterate, generate. ... > The next best way is via a Spliterator that properly declares its > properties, is SIZED, SUBSIZED, and has a good trySplit implementation. > We then slide down the scale of spliterator quality; ... > We then come to the bottom of the barrel, iterators. ... > So, while Iterator to Stream is still a fine last resort, putting it in > Streams will likely have the unfortunate effect of guiding users to the > worst way of making a stream, without fully understanding the tradeoffs. 
That's a great taxonomy of ways to make a stream, but the division of static factory methods into Streams and StreamSupport wasn't, as I understood it, along those lines. It was about keeping concepts that most users aren't going to want to mess with (i.e., Spliterator) out of their line of sight. If all you have is an Iterator, you don't want to have to go down into the basement to get something that turns it into a Stream. Put those tradeoffs on the packaging but leave the package in the kitchen. --tim From brian.goetz at oracle.com Sat Apr 13 14:15:30 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Sat, 13 Apr 2013 17:15:30 -0400 Subject: Stream constructors for stream(Iterator) in StreamSupport? In-Reply-To: References: <51697892.5010205@oracle.com> <5169B142.6000002@oracle.com> Message-ID: <5169CAF2.5070109@oracle.com> > That's a great taxonomy of ways to make a stream, but the division of > static factory methods into Streams and StreamSupport wasn't, as I > understood it, along those lines. It was about keeping concepts that > most users aren't going to want to mess with (i.e., Spliterator) out of > their line of sight. That was indeed part of it. But the other part of it was guiding them away from low-level tools they might mistakenly (mis)use because we'd not sufficiently labeled things into bins of "for users" and "for library writers." I still think stream-from-iterator is low-level, because it involves choosing things like stream flags (and doing it wrong will have bad results.) > If all you have is an Iterator, you don't want have to go down into the > basement to get something that turns it into a Stream. Put those > tradeoffs on the packaging but leave the package in the kitchen.
So, the tension here is: - helping the poor users for whom all they can get is an Iterator, and they want a stream; - avoiding the moral hazard of encouraging people to think that Iterator is actually a *good* way to make a stream (which might even encourage them to write more Iterators!) What I want them to think is "Iterator is the last possible resort for making a stream, including a number of resorts that I should learn about first before writing an Iterator." The current status quo is either better or worse in this, depending on which of the two above forces you are more compelled by. The way to make a Stream from an iterator currently is: Streams.stream(Spliterators.spliteratorUnknownSize(iterator, flags)); Streams.stream(Spliterators.spliterator(iterator, size, flags)); These do the job but suffer from poor discoverability. On the other hand, they have none of the moral hazard -- it's pretty clear you're nailing bags on bags, and I don't think this status quo is so awful. Another direction (as discussed previously without convergence) would be to augment Iterable with a stream() method. This helps users of non-Collection Iterable classes, but still has some of the moral hazard as it does not put enough pressure on writers of Iterable classes to write better stream() implementations. From joe.bowbeer at gmail.com Sat Apr 13 14:32:05 2013 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Sat, 13 Apr 2013 14:32:05 -0700 Subject: Stream constructors for stream(Iterator) in StreamSupport? In-Reply-To: <5169CAF2.5070109@oracle.com> References: <51697892.5010205@oracle.com> <5169B142.6000002@oracle.com> <5169CAF2.5070109@oracle.com> Message-ID: I think the signature fits better in support. (The default method alternative negates this.) However another argument is based on the expected users. If they are not library writers then it should not go in the support class.
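For comparison, here is a sketch of Brian's two status-quo idioms as they look in the released Java 8 API, where the entry point ended up as `StreamSupport.stream(spliterator, parallel)` instead of the pre-release `Streams.stream(...)`. The third method shows the plain-`Iterable` case, which goes through the default `Iterable.spliterator()` (itself an unknown-size wrapper over `iterator()`). The class name is a hypothetical wrapper.

```java
import java.util.Iterator;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

final class IteratorStreams {
    // Size unknown: splitting can only peel off batches copied into arrays.
    static <T> Stream<T> unknownSize(Iterator<T> it, boolean parallel) {
        Spliterator<T> s = Spliterators.spliteratorUnknownSize(it, Spliterator.ORDERED);
        return StreamSupport.stream(s, parallel);
    }

    // Size known up front: the spliterator reports SIZED, but the size is bound
    // early, so the source must not change between now and traversal.
    static <T> Stream<T> knownSize(Iterator<T> it, long size, boolean parallel) {
        Spliterator<T> s = Spliterators.spliterator(it, size, Spliterator.ORDERED);
        return StreamSupport.stream(s, parallel);
    }

    // A non-Collection Iterable: the default spliterator() wraps iterator()
    // with unknown size, so this is no better than unknownSize(...).
    static <T> Stream<T> fromIterable(Iterable<T> iterable) {
        return StreamSupport.stream(iterable.spliterator(), false);
    }
}
```

The flags argument is exactly the "choosing things like stream flags" risk mentioned in the thread. If you declare a characteristic the iterator does not honor (for example `SORTED` or `DISTINCT`), you can get wrong results.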
On Apr 13, 2013 2:15 PM, "Brian Goetz" wrote: > That's a great taxonomy of ways to make a stream, but the division of >> static factory methods into Streams and StreamSupport wasn't, as I >> understood it, along those lines. It was about keeping concepts that >> most users aren't going to want to mess with (i.e., Spliterator) out of >> their line of sight. >> > > That was indeed part of it. But the other part of it was guiding them > away from low-level tools they might mistakenly (mis)use because we'd not > sufficiently labeled things into bins of "for users" and "for library > writers." I still think stream-from-iterator is low-level, because it > involves choosing things like stream flags (and doing it wrong will have > bad results.) > > If all you have is an Iterator, you don't want have to go down into the >> basement to get something that turns it into a Stream. Put those >> tradeoffs on the packaging but leave the package in the kitchen. >> > > So, the tension here is: > - helping the poor users for whom all they can get is an Iterator, and > they want a stream; > - avoiding the moral hazard of encouraging people to think that Iterator > is actually a *good* way to make a stream (which might even encourage them > to write more Iterators!) I want them to think is "Iterator is the last > possible resort for making a stream, including a number of resorts that I > should learn about first before writing an Iterator." > > The current status quo is either better or worse in this, depending on > which of the two above forces you are more compelled by. The way to make a > Stream from an iterator currently is: > > Streams.stream(Spliterators.spliteratorUnknownSize(iterator, > flags)); > Streams.stream(Spliterators.spliterator(iterator, size, flags)); > > Which do the job but suffer from poor discoverability. On the other hand, > it has none of the moral hazard -- its pretty clear you're nailing bags on > bags, and I don't think this status quo is so awful.
> > > Another direction (as discussed previously without convergence) would be > to augment Iterable with a stream() method. This helps users of > non-Collection Iterable classes, but still has some of the moral hazard as > it does not put enough pressure on writers of Iterable classes to write > better stream() implementations. > > From brian.goetz at oracle.com Sat Apr 13 15:02:55 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Sat, 13 Apr 2013 18:02:55 -0400 Subject: Stream constructors for stream(Iterator) in StreamSupport? In-Reply-To: References: <51697892.5010205@oracle.com> <5169B142.6000002@oracle.com> <5169CAF2.5070109@oracle.com> Message-ID: <5169D60F.3010300@oracle.com> I think Tim's concern is that he recognizes two categories of potential users: - Library writers who want to expose a stream() method but are not ready to take the plunge to Spliterator; - Poor users who want a stream and all they can get out of their damn library is an Iterator. The question is, is there a way to not hose the second category without making things worse? On 4/13/2013 5:32 PM, Joe Bowbeer wrote: > I think the signature fits better in support. (The default method > alternative negates this.) However another argument is based on the > expected users. If they are not library writers then it should not go in > the support class. > > On Apr 13, 2013 2:15 PM, "Brian Goetz" > wrote: > > That's a great taxonomy of ways to make a stream, but the > division of > static factory methods into Streams and StreamSupport wasn't, as I > understood it, along those lines. It was about keeping concepts that > most users aren't going to want to mess with (i.e., Spliterator) > out of > their line of sight. > > > That was indeed part of it. But the other part of it was guiding > them away from low-level tools they might mistakenly (mis)use > because we'd not sufficiently labeled things into bins of "for > users" and "for library writers." 
I still think > stream-from-iterator is low-level, because it involves choosing > things like stream flags (and doing it wrong will have bad results.) > > If all you have is an Iterator, you don't want have to go down > into the > basement to get something that turns it into a Stream. Put those > tradeoffs on the packaging but leave the package in the kitchen. > > > So, the tension here is: > - helping the poor users for whom all they can get is an Iterator, > and they want a stream; > - avoiding the moral hazard of encouraging people to think that > Iterator is actually a *good* way to make a stream (which might even > encourage them to write more Iterators!) I want them to think is > "Iterator is the last possible resort for making a stream, including > a number of resorts that I should learn about first before writing > an Iterator." > > The current status quo is either better or worse in this, depending > on which of the two above forces you are more compelled by. The way > to make a Stream from an iterator currently is: > > Streams.stream(Spliterators.spliteratorUnknownSize(iterator, > flags)); > Streams.stream(Spliterators.spliterator(iterator, size, flags)); > > Which do the job but suffer from poor discoverability. On the other > hand, it has none of the moral hazard -- its pretty clear you're > nailing bags on bags, and I don't think this status quo is so awful. > > > Another direction (as discussed previously without convergence) > would be to augment Iterable with a stream() method. This helps > users of non-Collection Iterable classes, but still has some of the > moral hazard as it does not put enough pressure on writers of > Iterable classes to write better stream() implementations. > From jim at pentastich.org Sat Apr 13 19:08:40 2013 From: jim at pentastich.org (Jim Mayer) Date: Sat, 13 Apr 2013 22:08:40 -0400 Subject: Stream constructors for stream(Iterator) in StreamSupport?
In-Reply-To: <5169D60F.3010300@oracle.com> References: <51697892.5010205@oracle.com> <5169B142.6000002@oracle.com> <5169CAF2.5070109@oracle.com> <5169D60F.3010300@oracle.com> Message-ID: Hi Brian, How about introducing a new method name, like "adapt", "asStream", or "convert" that implies overhead? Another possibility would be to introduce a new class, like "StreamConverters" whose name, again, implies the additional overhead. Jim On Sat, Apr 13, 2013 at 6:02 PM, Brian Goetz wrote: > I think Tim's concern is that he recognizes two categories of potential > users: > > - Library writers who want to expose a stream() method but are not ready > to take the plunge to Spliterator; > - Poor users who want a stream and all they can get out of their damn > library is an Iterator. > > The question is, is there a way to not hose the second category without > making things worse? > > > On 4/13/2013 5:32 PM, Joe Bowbeer wrote: > >> I think the signature fits better in support. (The default method >> alternative negates this.) However another argument is based on the >> expected users. If they are not library writers then it should not go in >> the support class. >> >> On Apr 13, 2013 2:15 PM, "Brian Goetz" > > wrote: >> >> That's a great taxonomy of ways to make a stream, but the >> division of >> static factory methods into Streams and StreamSupport wasn't, as I >> understood it, along those lines. It was about keeping concepts >> that >> most users aren't going to want to mess with (i.e., Spliterator) >> out of >> their line of sight. >> >> >> That was indeed part of it. But the other part of it was guiding >> them away from low-level tools they might mistakenly (mis)use >> because we'd not sufficiently labeled things into bins of "for >> users" and "for library writers." I still think >> stream-from-iterator is low-level, because it involves choosing >> things like stream flags (and doing it wrong will have bad results.) 
>> >> If all you have is an Iterator, you don't want have to go down >> into the >> basement to get something that turns it into a Stream. Put those >> tradeoffs on the packaging but leave the package in the kitchen. >> >> >> So, the tension here is: >> - helping the poor users for whom all they can get is an Iterator, >> and they want a stream; >> - avoiding the moral hazard of encouraging people to think that >> Iterator is actually a *good* way to make a stream (which might even >> encourage them to write more Iterators!) I want them to think is >> "Iterator is the last possible resort for making a stream, including >> a number of resorts that I should learn about first before writing >> an Iterator." >> >> The current status quo is either better or worse in this, depending >> on which of the two above forces you are more compelled by. The way >> to make a Stream from an iterator currently is: >> >> Streams.stream(Spliterators.spliteratorUnknownSize( >> iterator, >> flags)); >> Streams.stream(Spliterators.spliterator(iterator, size, >> flags)); >> >> >> Which do the job but suffer from poor discoverability. On the other >> hand, it has none of the moral hazard -- its pretty clear you're >> nailing bags on bags, and I don't think this status quo is so awful. >> >> >> Another direction (as discussed previously without convergence) >> would be to augment Iterable with a stream() method. This helps >> users of non-Collection Iterable classes, but still has some of the >> moral hazard as it does not put enough pressure on writers of >> Iterable classes to write better stream() implementations. >> >> From brian.goetz at oracle.com Sun Apr 14 15:48:11 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Sun, 14 Apr 2013 18:48:11 -0400 Subject: Default methods for SAM types Message-ID: <516B322B.8070207@oracle.com> Here's a list of the default *instance* methods we've currently got (or should have, for consistency) for SAM types in java.util.function.
Static methods will follow in a separate message. Predicate: Predicate and(Predicate) Predicate or(Predicate) Predicate xor(Predicate) Predicate negate() (same for {Int,Long,Double}Predicate, BiPredicate.) Function: Function compose(Function before) Function andThen(Function after) BiFunction: BiFunction andThen(Function after) Consumer: Consumer chain(Consumer other) (Same for {Int,Long,Double}Consumer, BiConsumer.) This seems a reasonable minimal set; not even clear whether BiFunction.andThen carries its weight. Is there anything that's obviously missing? Are there any of these that don't carry their weight? From brian.goetz at oracle.com Sun Apr 14 17:27:24 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Sun, 14 Apr 2013 20:27:24 -0400 Subject: Static methods for SAM types Message-ID: <516B496C.9040506@oracle.com> As of Java 8 we can have static methods in interfaces. Here's a small set of static methods for the java.util.function SAM types. Function: public static Function identity() Actually, that's the end of my must-have list! But there have also been some others suggested, here's a sampling. Do any of these speak to anyone? Predicate: // Like o::isEquals, but also works if target is null public static Predicate isEqual(Object target) Function: static Function substitute(T subOut, T subIn) { return t -> Objects.equals(subOut, t) ? subIn : t; } static Function constant(R constant) { return t -> constant; } // Or could be default Predicate.asFunction(forTrue, forFalse) static forPredicate(Predicate, R forTrue, R forFalse) // Like map::get, but throws if not present static Function forMap(Map map) From mike.duigou at oracle.com Mon Apr 15 16:30:08 2013 From: mike.duigou at oracle.com (Mike Duigou) Date: Mon, 15 Apr 2013 16:30:08 -0700 Subject: RFR : 8010953: Add primitive summary statistics utils Message-ID: Hello all; Another integration review in the JSR-335 libraries series. 
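The default and static methods Brian lists for the java.util.function SAM types can be tried out with the following sketch against the released Java 8 API. Several names there differ from the proposal, so treat the mapping as an observation about the final JDK, not about this thread. `Consumer.chain` shipped as `andThen`, `Predicate.xor` did not ship, and of the static helpers only `Function.identity()` and `Predicate.isEqual(...)` exist. The `constant` method below is a user-written helper, not a JDK method.

```java
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Predicate;

final class SamCombinators {
    static final Predicate<String> NON_EMPTY = s -> !s.isEmpty();
    static final Predicate<String> SHORT = s -> s.length() < 4;

    // Default methods on Predicate: and / or / negate.
    static final Predicate<String> SHORT_NON_EMPTY = NON_EMPTY.and(SHORT);
    static final Predicate<String> LONG_OR_EMPTY = SHORT_NON_EMPTY.negate();

    // compose runs its argument first; andThen runs its argument second.
    static final Function<Integer, Integer> PLUS_ONE = x -> x + 1;
    static final Function<Integer, Integer> TIMES_TWO = x -> x * 2;
    static final Function<Integer, Integer> DOUBLE_THEN_INC = PLUS_ONE.compose(TIMES_TWO);
    static final Function<Integer, Integer> INC_THEN_DOUBLE = PLUS_ONE.andThen(TIMES_TWO);

    // Static methods that did ship: identity() and a null-safe equality test.
    static final Function<String, String> ID = Function.identity();
    static final Predicate<Object> IS_NULL = Predicate.isEqual(null);

    // Consumer chaining (proposed here as chain) is Consumer.andThen in Java 8.
    static String log(String... items) {
        StringBuilder sb = new StringBuilder();
        Consumer<String> append = sb::append;
        Consumer<String> appendThenSeparate = append.andThen(s -> sb.append(';'));
        for (String item : items) {
            appendThenSeparate.accept(item);
        }
        return sb.toString();
    }

    // Hypothetical user-side version of the proposed constant(...) helper.
    static <T, R> Function<T, R> constant(R value) {
        return t -> value;
    }
}
```

For example, `DOUBLE_THEN_INC.apply(3)` computes `3 * 2 + 1`, and `INC_THEN_DOUBLE.apply(3)` computes `(3 + 1) * 2`.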
These three classes provide a utility for conveniently finding count, sum, min, max and average of ints, longs or doubles. They can be used with existing code but will most likely be used with the Collectors utilities or directly with primitive or boxed streams. http://cr.openjdk.java.net/~mduigou/JDK-8010953/1/webrev/ (this is an updated version of the webrev sent to core-libs-dev). Mike From paul.sandoz at oracle.com Tue Apr 16 01:13:12 2013 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Tue, 16 Apr 2013 10:13:12 +0200 Subject: Streams.generate: infinite or finite? In-Reply-To: References: <898A6DD0-35CC-46C8-990A-05B334927A79@oracle.com> Message-ID: On Apr 12, 2013, at 11:18 PM, Jim Mayer wrote: > Perhaps I'm missing something, but what makes it hard to do a > high performance parallel split of an infinite stream? Sure, you can't > break it up into even chunks, but is that really all that different than > breaking up Long.MAX_VALUE values? On an 8 core machine you're not going > to break them up into 2^60 element chunks either, right? I think it comes down to resource management. There is no ideal solution but some solutions might be more efficient than others. Splitting from an iterator produces a right-balanced tree of a depth that increases as one keeps splitting. Each left-split is created by copying elements into an array. The current implementation will produce a right-balanced tree of depth 128 for 2^32 elements. With Streams.longRange(0, Long.MAX_VALUE) the maximum depth will be 63, the tree is balanced, and no buffering is required when splitting. But to work effectively we require a wrapping spliterator that performs the limit operation and discards underlying splits that are not in the required range. We probably need to bias splits for such large ranges say 1:7 or 1:15 so that it is quicker to get to the left but without creating trees of unduly large depth [*].
The assumption being the range to limit will often be 0 to something much much less than 2^63 - 1. So while the latter is not perfect i think it can avoid some costs of the former. Brian suggested an idea of: nulls() that just keeps producing nulls, thus generate would be nulls().map(n -> s.get()). I think it is possible for nulls() to use Streams.longRange(0, Long.MAX_VALUE) to get a truly infinite stream. i.e. the left split of nulls() does the equivalent of Streams.longRange(0, Long.MAX_VALUE) and the right split is of unknown size. However, practically i wonder if this is really worth it? Paul. [*] It is not possible to hint to a Spliterator.trySplit to bias the splitting. > > Yours confusedly, > > Jim Mayer > > > On Fri, Apr 12, 2013 at 10:50 AM, Paul Sandoz wrote: > >> Hi, >> >> Currently Streams.generate produces an infinite stream. This is >> theoretically nice but splits poorly (right-balanced trees). >> >> Implementation-wise Streams.generate creates a spliterator from an >> iterator: >> >> public static Stream generate(Supplier s) { >> Objects.requireNonNull(s); >> InfiniteIterator iterator = s::get; >> return StreamSupport.stream(Spliterators.spliteratorUnknownSize( >> iterator, >> Spliterator.ORDERED | Spliterator.IMMUTABLE)); >> } >> >> The method is used in java.util.Random: >> >> public IntStream ints() { >> return Streams.generateInt(this::nextInt); >> } >> >> There might be a nasty surprise in store for developers that expect the >> randomly generated stream of int values to have the best parallel >> performance. >> >> >> We can change Streams.generate to be finite (or not know to be finite in >> the time allotted to do some computation) by implementing as follows: >> >> public static Stream generate(Supplier s) { >> return Streams.longRange(0, Long.MAX_VALUE).mapToObj(i -> s.get()); >> } >> >> This will yield better parallel performance because the splits are >> balanced.
>> >> We can further change to: >> >> public static Stream generate(Supplier s) { >> return Streams.longs().mapToObj(i -> s.get()); >> } >> >> if we introduce the longs() idiom. >> >> >> I think we should go finite! and add Streams.longs(). Agree? or disagree? >> >> Then it is actually questionable if Streams.generate should exist at all. >> It does have some pedagogic value since the idiom Streams.longs().map() may >> not be obvious. So i would be mostly inclined to keep it for that reason. >> >> Paul. From paul.sandoz at oracle.com Tue Apr 16 01:41:21 2013 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Tue, 16 Apr 2013 10:41:21 +0200 Subject: Streams.generate: infinite or finite? In-Reply-To: References: <898A6DD0-35CC-46C8-990A-05B334927A79@oracle.com> Message-ID: On Apr 16, 2013, at 10:13 AM, Paul Sandoz wrote: > > On Apr 12, 2013, at 11:18 PM, Jim Mayer wrote: > >> Perhaps I'm missing something, but what makes it hard to do a >> high performance parallel split of an infinite stream? Sure, you can't >> break it up into even chunks, but is that really all that different than >> breaking up Long.MAX_VALUE values? On an 8 core machine you're not going >> to break them up into 2^60 element chunks either, right? > > I think comes down to resource management. There is no ideal solution but some solutions might be more efficient than others. > > Splitting from an iterator produces a right-balanced tree of a depth that increases as one keeps splitting. Each left-split is created by copying elements into an array. The current implementation will produce a right-balanced tree of depth 128 for 2^32 elements. > That should be a depth of 2896 for 2^32 elements. (I forgot to take into account the arithmetic progression of increasing sizes when splitting. The left split size starts at 2^10 and increases by 2^10 until it reaches 2^25) Paul. 
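Paul's corrected figure can be checked with a little arithmetic. Assume, as he describes, that each iterator split copies a left batch of $1024k$ elements on the $k$-th split, capped at $2^{25}$. After $d$ splits the left batches hold $1024 \cdot d(d+1)/2$ elements. Covering $2^{32}$ elements therefore needs $d(d+1) \ge 2^{23} = 8{,}388{,}608$. The smallest such $d$ is 2896, because $2896 \cdot 2897 = 8{,}389{,}712$. The cap is never reached, since $2896 \cdot 1024 < 2^{25}$. A balanced halving of a SIZED range needs only $\lceil \log_2 n \rceil$ levels, which is 63 for `Long.MAX_VALUE`. The sketch below models that arithmetic only; it is not the JDK's spliterator code.

```java
final class SplitDepth {
    // Assumed batch policy, taken from Paul's description:
    // grow by 2^10 elements per split, capped at 2^25.
    static final long BATCH_UNIT = 1L << 10;
    static final long MAX_BATCH = 1L << 25;

    // Each split peels one array batch off the left, so the right spine
    // grows one level deeper per split until the source is exhausted.
    static int iteratorSplitDepth(long size) {
        long remaining = size;
        long batch = 0;
        int depth = 0;
        while (remaining > 0) {
            batch = Math.min(batch + BATCH_UNIT, MAX_BATCH);
            remaining -= batch;
            depth++;
        }
        return depth;
    }

    // Halving a SIZED range: depth is ceil(log2(size)).
    static int balancedSplitDepth(long size) {
        return size <= 1 ? 0 : 64 - Long.numberOfLeadingZeros(size - 1);
    }
}
```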
> With Streams.longRange(0, Long.MAX_VALUE) the maximum depth will be 63, the tree is balanced, and no buffering is required when splitting. But to work effectively we require a wrapping spliterator that performs the limit operation and discards underlying splits that are not in the required range. We probably need to bias splits for such large ranges say 1:7 or 1:15 so that is quicker to get to the left but without creating trees of unduly large depth [*]. The assumption being the range to limit will often be 0 to something much much less than 2^63 - 1. > > So while the latter is not perfect i think it can avoid some costs of the former. > > Brian suggested an idea of: > > nulls() > > that just keeps producing nulls, thus generate would be nulls().map(n -> s.get()). > > I think it possible to for nulls() to use Streams.longRange(0, Long.MAX_VALUE) to get a truly infinite stream. i.e. the left split of nulls() does the equivalent of Streams.longRange(0, Long.MAX_VALUE) and the right split is of unknown size. However, practically i wonder if this is really worth it? > > Paul. > > [*] It is not possible to hint to a Spliterator.trySplit to bias the splitting. >> >> Yours confusedly, >> >> Jim Mayer >> >> >> On Fri, Apr 12, 2013 at 10:50 AM, Paul Sandoz wrote: >> >>> Hi, >>> >>> Currently Streams.generate produces an infinite stream. This is >>> theoretically nice but splits poorly (right-balanced trees).
>>> >>> Implementation-wise Streams.generate creates a spliterator from an >>> iterator: >>> >>> public static Stream generate(Supplier s) { >>> Objects.requireNonNull(s); >>> InfiniteIterator iterator = s::get; >>> return StreamSupport.stream(Spliterators.spliteratorUnknownSize( >>> iterator, >>> Spliterator.ORDERED | Spliterator.IMMUTABLE)); >>> } >>> >>> The method is used in java.util.Random: >>> >>> public IntStream ints() { >>> return Streams.generateInt(this::nextInt); >>> } >>> >>> There might be a nasty surprise in store for developers that expect the >>> randomly generated stream of int values to have the best parallel >>> performance. >>> >>> >>> We can change Streams.generate to be finite (or not know to be finite in >>> the time allotted to do some computation) by implementing as follows: >>> >>> public static Stream generate(Supplier s) { >>> return Streams.longRange(0, Long.MAX_VALUE).mapToObj(i -> s.get()); >>> } >>> >>> This will yield better parallel performance because the splits are >>> balanced. >>> >>> We can further change to: >>> >>> public static Stream generate(Supplier s) { >>> return Streams.longs().mapToObj(i -> s.get()); >>> } >>> >>> if we introduce the longs() idiom. >>> >>> >>> I think we should go finite! and add Streams.longs(). Agree? or disagree? >>> >>> Then it is actually questionable if Streams.generate should exist at all. >>> It does have some pedagogic value since the idiom Streams.longs().map() may >>> not be obvious. So i would be mostly inclined to keep it for that reason. >>> >>> Paul. > From david.holmes at oracle.com Tue Apr 16 03:10:17 2013 From: david.holmes at oracle.com (David Holmes) Date: Tue, 16 Apr 2013 20:10:17 +1000 Subject: RFR : 8010953: Add primitive summary statistics utils In-Reply-To: References: Message-ID: <516D2389.2050609@oracle.com> Hi Mike, On 16/04/2013 9:30 AM, Mike Duigou wrote: > Hello all; > > Another integration review in the JSR-335 libraries series. 
These three classes provide a utility for conveniently finding count, sum, min, max and average of ints, longs or doubles. They can be used with existing code but will most likely be used with the Collectors utilities or directly with primitive or boxed streams. > > http://cr.openjdk.java.net/~mduigou/JDK-8010953/1/webrev/ > > (this is an updated version of the webrev sent to core-libs-dev). A couple of minor nits: DoubleSummaryStatistics: getMin/getMax: The main doc should read the same as the @return. Presently the initial sentence: 120 * Returns the recorded value closest to {@code Double.NEGATIVE_INFINITY}, 121 * {@code Double.POSITIVE_INFINITY} if no values have been recorded or if 122 * any recorded value is NaN, then the result is NaN. is very difficult to read and parse. The @return is much simpler - just say minimum/maximum value recorded, rather than "value closest to ...". In all classes: minimal -> minimum maximal -> maximum David > Mike > From brian.goetz at oracle.com Tue Apr 16 12:47:32 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Tue, 16 Apr 2013 15:47:32 -0400 Subject: Setting of UNORDERED on concurrent collectors In-Reply-To: <516315C1.3080509@oracle.com> References: <516315C1.3080509@oracle.com> Message-ID: <516DAAD4.1070506@oracle.com> We never converged on this one. Here's another stab at framing the problem. (I'm pretty much ready to time out and make these collectors declare UNORDERED unless someone can convince me otherwise.) Streams consist of source + intermediate ops + terminal. Denote ordered/unordered variants of these as SO/SU, IO/IU/IA (A=agnostic), and TA/TU. We can define the ordered-ness of any stream pipeline as follows:
ordered(SO) = true
ordered(SU) = false
ordered(X+IO) = true
ordered(X+IU) = false
ordered(X+IA) = ordered(X)
ordered(X+TA) = ordered(X)
ordered(X+TU) = false
A concurrent calculation may be performed if the stream is unordered *and* the destination is concurrent.
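Brian's ordered-ness rules can be read as a left-to-right fold over the pipeline stages. The following is a minimal model of that fold, using his SO/SU, IO/IU/IA and TA/TU labels as a hypothetical enum. It models the rules as stated in this message and is not how the JDK tracks stream flags internally.

```java
import java.util.List;

final class Orderedness {
    // Brian's labels: sources, intermediate ops, terminal ops.
    enum Stage { SO, SU, IO, IU, IA, TA, TU }

    // Folds the rules left to right; the first stage must be a source.
    static boolean ordered(List<Stage> pipeline) {
        if (pipeline.isEmpty()
                || (pipeline.get(0) != Stage.SO && pipeline.get(0) != Stage.SU)) {
            throw new IllegalArgumentException("pipeline must start with a source");
        }
        boolean ordered = false;
        for (Stage s : pipeline) {
            switch (s) {
                case SO: case IO:
                    ordered = true;   // ordered(SO), ordered(X+IO)
                    break;
                case SU: case IU: case TU:
                    ordered = false;  // ordered(SU), ordered(X+IU), ordered(X+TU)
                    break;
                case IA: case TA:
                    break;            // agnostic: inherit ordered(X)
            }
        }
        return ordered;
    }

    // Concurrent collection only for an unordered pipeline into a concurrent destination.
    static boolean mayCollectConcurrently(List<Stage> pipeline, boolean concurrentDestination) {
        return !ordered(pipeline) && concurrentDestination;
    }
}
```

Under this model, the open question in the thread is whether `groupingByConcurrent` should behave like TA (so that an ordered source blocks concurrent collection) or TU (so that it never does).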
Collectors like toSet() are marked TU, and toList() are marked TA. Collectors like groupingByConcurrent will definitely be marked concurrent. Question is, should it be marked TA or TU? Either choice is defensible. Note that collectors individually get to choose whether they are TA or TU. Choices we make for our canned collectors need not affect user-written collectors. The model can handle both and users can predict the behavior of both. On 4/8/2013 3:08 PM, Brian Goetz wrote: > Now that we've removed collectUnordered in favor of a more general > unordered() op, we should consider what should be the default behavior for: > > orderedStream.collect(groupingByConcurrent(f)) > > Currently, the collect-to-ConcurrentMap collectors are *not* defined as > UNORDERED. Which means, if the stream is ordered, we will attempt to do > an ordered collection anyway, which is incompatible with concurrent > collection, and we will do the plain old partition-and-merge with > ConcurrentMap. > > Here, we have competing evidence for the user intent. On the one hand, > the stream is ordered, and the user could have chosen unordered. On the > other, the user has asked for concurrent grouping. Its not 100% obvious > which should win. > > On the other hand, ordered map collections are so awful that they will > almost certainly be unhappy with the performance if they forget to say > unordered here in the parallel case (and it makes no difference in the > sequential case.) So I'm inclined to make groupingByConcurrent / > toConcurrentMap be UNORDERED collections. 
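For reference, the released Java 8 `Collectors.groupingByConcurrent` collectors report `Collector.Characteristics.UNORDERED` alongside `CONCURRENT`, which corresponds to the TU option in Brian's model. The sketch below checks that flag and also adds an explicit `unordered()`, so the intent stays clear whichever way the collector is marked. Class and method names are illustrative.

```java
import java.util.List;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collector;
import java.util.stream.Collectors;

final class ConcurrentGrouping {
    // Count words by length with a concurrent collector; unordered() states
    // explicitly that encounter order does not matter for the result.
    static ConcurrentMap<Integer, Long> countByLength(List<String> words) {
        return words.parallelStream()
                    .unordered()
                    .collect(Collectors.groupingByConcurrent(String::length,
                                                             Collectors.counting()));
    }

    // Inspect how the shipped collector declares itself.
    static boolean reportsUnordered() {
        Collector<String, ?, ConcurrentMap<Integer, List<String>>> c =
                Collectors.groupingByConcurrent(String::length);
        return c.characteristics().contains(Collector.Characteristics.UNORDERED);
    }
}
```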
From mike.duigou at oracle.com Tue Apr 16 12:45:26 2013 From: mike.duigou at oracle.com (Mike Duigou) Date: Tue, 16 Apr 2013 12:45:26 -0700 Subject: RFR : 8010953: Add primitive summary statistics utils In-Reply-To: <516D2389.2050609@oracle.com> References: <516D2389.2050609@oracle.com> Message-ID: <3E218825-9EE6-4F59-BC76-6424A6C67F19@oracle.com> On Apr 16 2013, at 03:10 , David Holmes wrote: > Hi Mike, > > On 16/04/2013 9:30 AM, Mike Duigou wrote: >> Hello all; >> >> Another integration review in the JSR-335 libraries series. These three classes provide a utility for conveniently finding count, sum, min, max and average of ints, longs or doubles. They can be used with existing code but will most likely be used with the Collectors utilities or directly with primitive or boxed streams. >> >> http://cr.openjdk.java.net/~mduigou/JDK-8010953/1/webrev/ >> >> (this is an updated version of the webrev sent to core-libs-dev). > > A couple of minor nits: > > DoubleSummaryStatistics: > > getMin/getMax: > > The main doc should read the same as the @return. Presently the initial sentence: > > 120 * Returns the recorded value closest to {@code Double.NEGATIVE_INFINITY}, > 121 * {@code Double.POSITIVE_INFINITY} if no values have been recorded or if > 122 * any recorded value is NaN, then the result is NaN. > > if very difficult to read and parse. The @return is much simpler - just say minimum/maximum value recorded, rather than "value closest to ...". Done. (the "value closest to ..." text was copied from Math.min/max"). > > In all classes: > > minimal -> minimum > maximal -> maximum Done. > David > >> Mike >> From howard.lovatt at gmail.com Tue Apr 16 18:25:06 2013 From: howard.lovatt at gmail.com (Howard Lovatt) Date: Wed, 17 Apr 2013 11:25:06 +1000 Subject: looking for FAQ on interconversion of IntFoo and Foo In-Reply-To: <516D0B67.9020206@univ-mlv.fr> References: <516D0B67.9020206@univ-mlv.fr> Message-ID: There is a way to get IntStream extends Stream etc. 
to play well with type inference. 1. Have interfaces such that the primitive form extends the reference form, e.g.: @FunctionalInterface public interface SupplierOfInt extends Supplier { default Integer get() { return getAsInt(); } int getAsInt(); } 2. Put all your makers, builders, factories, etc., in a single class for each kind as static methods (i.e. do not have Arrays containing some, Streams containing others etc. - one class containing all the makers). One class per kind, i.e. Streams, StreamsOfInt, StreamsOfLong, and StreamsOfDouble. EG: import java.util.function.SupplierOfInt; // No need to import java.util.function.Supplier since it isn't needed and hence no inference problem ... public final class StreamsOfInt { private StreamsOfInt() {} public static final SupplierOfInt forever(final int value) { return () -> value; } ... } 3. When you want to use StreamsOfInt you do a single static import when you want to use a stream, e.g. import static java.util.stream.StreamsOfInt.*; This way the compiler doesn't get confused because you have not imported java.util.function.SupplierOfInt and java.util.function.Supplier into one compilation unit (unlike the current package arrangement). If you do statically import say Streams and StreamsOfInt into one compilation unit you will have to qualify the start of the stream, e.g. Streams.forever(new Object())... and StreamsOfInt.forever(1), and importantly you do not have to qualify the rest of the stream code. In practice I have found multiple stream source types rare in one file and when you do have them the qualification is a good idea anyway. I do this in my own stream library and it works great. -- Howard. PS I also dropped the dot in Supplier.OfInt etc. since Netbeans hates this! It also doesn't seem to add value and makes the Javadoc confusing. 
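Here is a compilable version of Howard's first point, with the generics that the archive stripped restored. `SupplierOfInt` and `forever` are his hypothetical names. The JDK's `IntSupplier` does not extend `Supplier<Integer>`, for the inference reasons Rémi gives in his reply. Because `get()` becomes a default method, `getAsInt()` is the only abstract method, so a lambda can still target the interface.

```java
import java.util.function.Supplier;

// Hypothetical interface following Howard's sketch (not part of the JDK).
@FunctionalInterface
interface SupplierOfInt extends Supplier<Integer> {
    int getAsInt();

    // Boxing bridge: a SupplierOfInt can be passed wherever Supplier<Integer> is expected.
    @Override
    default Integer get() {
        return getAsInt();
    }
}

final class SupplierOfIntDemo {
    // Hypothetical factory mirroring Howard's StreamsOfInt.forever(int).
    static SupplierOfInt forever(int value) {
        return () -> value;
    }

    // Generic code written only against the boxed interface.
    static int sumTwice(Supplier<Integer> s) {
        return s.get() + s.get();
    }
}
```

The catch is that overloads which accept both `Supplier<Integer>` and `SupplierOfInt` become harder for the compiler to resolve when given an implicit lambda. That is the interaction the thread warns about, and Howard avoids it by importing only one family at a time.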
On 16 April 2013 18:27, Remi Forax wrote: > On 04/16/2013 09:01 AM, John Rose wrote: > > Where is the standard place to find the design discussion for > primitive-type specializations of the new interfaces (functions, producers, > consumers, optionals...)? > > > > In particular, will users be able to ignore higher-order functions of > unboxed values and later adjust code locally for efficiency? > > > > If so, are there conversion operators that correspond to auto-box and > auto-unbox of non-functional, which can be used to make adjustments at the > boundaries? > > From the compiler point of view, in a lambda conversion an Integer is > not a boxed int, > so there is no such auto-boxing. Said differently, a Stream of Integers > and a Stream of ints are not the same object. > > > > > Finally (and this is what prompted me to ask) why not make IntSupplier > or OptionalInt be sub-interfaces of the reference-bearing ones, with > autoboxing around the edges (see below)? > > At some point in the past, IntSupplier was a subtype of Supplier<Integer>, but it > doesn't play well with type inference > and method resolution in the compiler. > Supplier and IntSupplier, because they are functional interfaces, are seen > by the compiler as structural types, > and mixing structural types and classical subtyping relationships has > some hairy interactions. > > > > > Since this is probably a rehash of past discussions, I'm looking to be > pointed at some sort of Email thread or even (!) a wiki page. > > > > Best, > > -- John > > cheers, > Rémi > > > > > P.S. It looks like this sort of stuff was in the repo at first and then > was yanked recently.
> > > > diff --git a/src/share/classes/java/util/function/IntSupplier.java > b/src/share/classes/java/util/function/IntSupplier.java > > --- a/src/share/classes/java/util/function/IntSupplier.java > > +++ b/src/share/classes/java/util/function/IntSupplier.java > > @@ -32,7 +32,12 @@ > > * @since 1.8 > > */ > > @FunctionalInterface > > -public interface IntSupplier { > > +public interface IntSupplier > > + extends Supplier<Integer> > > +{ > > + /** Returns the result of {@code getAsInt}, boxed. */ > > + // in my dreams, this would allow IntSupplier to convert to > Supplier<Integer> > > + public default Integer get() { return getAsInt(); } > > > > /** > > * Returns an {@code int} value. > > > > > > > -- -- Howard. From brian.goetz at oracle.com Wed Apr 17 11:48:39 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 17 Apr 2013 14:48:39 -0400 Subject: Survey: API review for Collectors Message-ID: <516EEE87.3000103@oracle.com> I've posted a survey for the static methods in Collectors at: https://www.surveymonkey.com/s/LGV85RH I think the API here is mostly done; the spec and tutorial material still need work. Usual password. From brian.goetz at oracle.com Wed Apr 17 12:56:09 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 17 Apr 2013 15:56:09 -0400 Subject: Survey: API review for Collectors In-Reply-To: <516EEE87.3000103@oracle.com> References: <516EEE87.3000103@oracle.com> Message-ID: <516EFE59.8060006@oracle.com> Sam asks: Why not specifically return an immutable set from toSet()? I'd like this too. This is due to the limitation of not being able to support a post-function in Collector. (See my recent post on this on lambda-dev: http://mail.openjdk.java.net/pipermail/lambda-dev/2013-April/009394.html). Related question: why does toStringJoiner not expose prefix/suffix? This one is related -- we don't have any easy way to treat the root calculation differently from sub-results.
If we used () -> new StringJoiner(", ", "[", "]") as our result container, then if we did a parallel collect of intRange(1,6) where it happens to get split in half, the result would be: [1,2,3],[4,5,6] instead of [1,2,3,4,5,6]. To be able to do this right, we'd have to use a different construction of the StringJoiner for non-root results. Extending Collector to handle all these cases (efficiently) was going to be pretty disruptive. So we said goodbye to these pretty use cases. On 4/17/2013 2:48 PM, Brian Goetz wrote: > I've posted a survey for the static methods in Collectors at: > https://www.surveymonkey.com/s/LGV85RH > > I think the API here is mostly done; the spec and tutorial material > still need work. > > Usual password. > > From tim at peierls.net Wed Apr 17 14:06:47 2013 From: tim at peierls.net (Tim Peierls) Date: Wed, 17 Apr 2013 17:06:47 -0400 Subject: Survey: API review for Collectors In-Reply-To: <516EFE59.8060006@oracle.com> References: <516EEE87.3000103@oracle.com> <516EFE59.8060006@oracle.com> Message-ID: On Wed, Apr 17, 2013 at 3:56 PM, Brian Goetz wrote: > To be able to do this right, we'd have to use a different construction of > the StringJoiner for non-root results. Extending Collector to handle all > these cases (efficiently) was going to be pretty disruptive. So we said > goodbye to these pretty use cases. Good! "Cure" would have been worse than the disease. --tim From brian.goetz at oracle.com Wed Apr 17 18:21:02 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 17 Apr 2013 21:21:02 -0400 Subject: mergers Message-ID: <516F4A7E.2080602@oracle.com> Collectors defines three merge functions: throwingMerger -- always throws firstWinsMerger -- takes first lastWinsMerger -- takes last These are plain old BinaryOperators that can be used for Map.merge as well as the toMap collectors. Someone commented that these look a little out of place in Collectors, and they are certainly not Collector-specific. Is there a better place for them?
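To make the "plain old BinaryOperators" point concrete, here is a sketch of the three merge policies as ordinary BinaryOperator<T> values. The helper names mirror the ones in Brian's message but are written here as hypothetical local methods, not as a claim about what Collectors finally ships. Collectors.toMap(keyMapper, valueMapper, mergeFunction) and Map.merge(key, value, remappingFunction) are the standard JDK 8 methods they plug into; in both, the first argument to the operator is the value already present and the second is the incoming one.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.stream.Collectors;

public class MergerDemo {

    // Hypothetical helpers mirroring the three policies named in the thread.
    static <T> BinaryOperator<T> throwingMerger() {
        return (a, b) -> { throw new IllegalStateException("Duplicate key, values " + a + " and " + b); };
    }

    static <T> BinaryOperator<T> firstWinsMerger() {
        return (existing, incoming) -> existing;
    }

    static <T> BinaryOperator<T> lastWinsMerger() {
        return (existing, incoming) -> incoming;
    }

    // Index words by their first letter; key collisions are resolved by 'merger'.
    static Map<Character, String> byInitial(List<String> words, BinaryOperator<String> merger) {
        return words.stream().collect(Collectors.toMap(w -> w.charAt(0), w -> w, merger));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "avocado", "banana");
        System.out.println(byInitial(words, firstWinsMerger()).get('a')); // apple
        System.out.println(byInitial(words, lastWinsMerger()).get('a'));  // avocado

        // The same operator also works with Map.merge, outside any stream.
        BinaryOperator<Integer> keepFirst = firstWinsMerger();
        Map<String, Integer> m = new HashMap<>();
        m.merge("k", 1, keepFirst);
        m.merge("k", 2, keepFirst);
        System.out.println(m.get("k")); // 1

        try {
            byInitial(words, throwingMerger());
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Because nothing in these operators is Collector-specific, where they should live is purely a discoverability question, which is the question the thread goes on to debate.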
From paul.sandoz at oracle.com Thu Apr 18 01:35:28 2013 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Thu, 18 Apr 2013 10:35:28 +0200 Subject: mergers In-Reply-To: <516F4A7E.2080602@oracle.com> References: <516F4A7E.2080602@oracle.com> Message-ID: <9DBCAEC3-5E4F-494D-9352-4D830FEC8716@oracle.com> On Apr 18, 2013, at 3:21 AM, Brian Goetz wrote: > Collectors defines three merge functions: > > throwingMerger -- always throws > firstWinsMerger -- takes first > lastWinsMerger -- takes last > > These are plain old BinaryOperators that can be used for Map.merge as well as the toMap collectors. > > Someone commented that these look a little out of place in Collectors, and they are certainly not Collector-specific. Is there a better place for them? > Someone also commented (separately from a survey) that method refs could be used instead? e.g. like using Integer::sum. e.g. Objects::first, Objects::second, Objects::throwing But I thought that might make it harder to correlate with map merging. They tend to read well when used with toMap code, but perhaps make more sense as static methods on Map due to Map.merge being present? Paul. From Paul.Sandoz at oracle.com Thu Apr 18 02:25:54 2013 From: Paul.Sandoz at oracle.com (Paul Sandoz) Date: Thu, 18 Apr 2013 11:25:54 +0200 Subject: Setting of UNORDERED on concurrent collectors In-Reply-To: <516DAAD4.1070506@oracle.com> References: <516315C1.3080509@oracle.com> <516DAAD4.1070506@oracle.com> Message-ID: <27593794-2973-4D5B-943D-A8DD750678E4@oracle.com> On Apr 16, 2013, at 9:47 PM, Brian Goetz wrote: > We never converged on this one. Here's another stab at framing the problem. (I'm pretty much ready to time out and make these collectors declare UNORDERED unless someone can convince me otherwise.) > > Streams consist of source + intermediate ops + terminal. > > Denote ordered/unordered variants of these as SO/SU, IO/IU/IA (A=agnostic), and TA/TU.
We can define the ordered-ness of any stream pipeline as follows: > > ordered(SO) = true > ordered(SU) = false > > ordered(X+IO) = true > ordered(X+IU) = false > ordered(X+IA) = ordered(X) > > ordered(X+TA) = ordered(X) > ordered(X+TU) = false > > A concurrent calculation may be performed if the stream is unordered *and* the destination is concurrent. > > Collectors like toSet() are marked TU, and toList() are marked TA. Collectors like groupingByConcurrent will definitely be marked concurrent. Question is, should it be marked TA or TU? Either choice is defensible. > I think it should be TU, even though it is only triggered when the upstream is unordered. The intent is, when triggered, that our concurrent collectors should be used with a forEach-like mechanism by which the collector concurrently receives elements in a temporal order. Paul. > Note that collectors individually get to choose whether they are TA or TU. Choices we make for our canned collectors need not affect user-written collectors. The model can handle both and users can predict the behavior of both. From brian.goetz at oracle.com Thu Apr 18 10:40:40 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 18 Apr 2013 13:40:40 -0400 Subject: mergers In-Reply-To: <9DBCAEC3-5E4F-494D-9352-4D830FEC8716@oracle.com> References: <516F4A7E.2080602@oracle.com> <9DBCAEC3-5E4F-494D-9352-4D830FEC8716@oracle.com> Message-ID: <51703018.3070403@oracle.com> I am OK with using method refs instead of function-returning methods. But I think the key is that "merge" needs to appear in the name, because, while a function that returns the first of its arguments is useful, the key here is that we're trying to identify a set of reasonable merging policies that are useful when doing "dump a stream into a map". I think even these three simple ones will greatly reduce people's need to write mergers themselves for toMap.
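The ordered-ness rules quoted in the UNORDERED thread above amount to a left fold over the pipeline. The sketch below encodes them with hypothetical enums (SourceKind, OpKind, TerminalKind) standing for SO/SU, IO/IU/IA and TA/TU. It models only the rules as written in that message, not the stream implementation's actual flag handling.

```java
import java.util.Arrays;
import java.util.List;

public class OrderednessModel {

    enum SourceKind { ORDERED, UNORDERED }           // SO / SU
    enum OpKind { ORDERED, UNORDERED, AGNOSTIC }     // IO / IU / IA
    enum TerminalKind { AGNOSTIC, UNORDERED }        // TA / TU

    // Start from ordered(SO) = true / ordered(SU) = false, then apply each op left to right.
    static boolean ordered(SourceKind src, List<OpKind> ops, TerminalKind terminal) {
        boolean ordered = (src == SourceKind.ORDERED);
        for (OpKind op : ops) {
            switch (op) {
                case ORDERED:   ordered = true;  break;  // ordered(X+IO) = true
                case UNORDERED: ordered = false; break;  // ordered(X+IU) = false
                case AGNOSTIC:  break;                   // ordered(X+IA) = ordered(X)
            }
        }
        // ordered(X+TA) = ordered(X); ordered(X+TU) = false
        return terminal == TerminalKind.AGNOSTIC && ordered;
    }

    // "A concurrent calculation may be performed if the stream is unordered
    //  *and* the destination is concurrent."
    static boolean mayCollectConcurrently(SourceKind src, List<OpKind> ops,
                                          TerminalKind terminal, boolean concurrentDestination) {
        return !ordered(src, ops, terminal) && concurrentDestination;
    }

    public static void main(String[] args) {
        List<OpKind> ops = Arrays.asList(OpKind.AGNOSTIC, OpKind.AGNOSTIC);
        // Ordered source, agnostic ops, concurrent destination:
        System.out.println(mayCollectConcurrently(SourceKind.ORDERED, ops, TerminalKind.AGNOSTIC, true));  // false (TA)
        System.out.println(mayCollectConcurrently(SourceKind.ORDERED, ops, TerminalKind.UNORDERED, true)); // true (TU)
    }
}
```

Under these rules, marking a concurrent collector TU lets it run concurrently even over an ordered source, whereas TA restricts concurrency to pipelines that are already unordered. That is the choice the UNORDERED thread is weighing.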
Having them live in some place more Mappy would be fine too, but I don't want to create a Maps class for them. Are they important enough to be static methods on Map? (I doubt it.) So it mostly seems like they're in the "desirable to have, but not a great place to shove them" place now. Is Collectors good enough, or do we have to think harder about making a better place? On 4/18/2013 4:35 AM, Paul Sandoz wrote: > > On Apr 18, 2013, at 3:21 AM, Brian Goetz wrote: > >> Collectors defines three merge functions: >> >> throwingMerger -- always throws >> firstWinsMerger -- takes first >> lastWinsMerger -- takes last >> >> These are plain old BinaryOperators that can be used for Map.merge as well as the toMap collectors. >> >> Someone commented that these look a little out of place in Collectors, and they are certainly not Collector-specific. Is there a better place for them? >> > > Someone also commented (separately from a survey) that method refs could be used instead? e.g. like using Integer::sum. > > e.g. Objects::first, Objects::second, Objects::throwing > > But I thought that might make it harder to correlate with map merging. > > They tend to read well when used with toMap code, but perhaps make more sense as static methods on Map due to Map.merge being present? > > Paul. > > > From brian.goetz at oracle.com Thu Apr 18 19:18:58 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 18 Apr 2013 22:18:58 -0400 Subject: Dividing Streams.java In-Reply-To: <5166F804.50101@oracle.com> References: <5166F804.50101@oracle.com> Message-ID: <5170A992.80703@oracle.com> Now that we've cleared away the spliterator methods from Streams, all, or nearly all, of the remaining methods in Streams are candidates for moving to the respective interfaces. And in many ways they get nicer when they do. We've got: builder() emptyStream() singletonStream() iterate() generate() for all the types (so emptyStream(), emptyIntStream(), etc), plus ranges for the numeric types.
Plus concat() zip() for ref streams. All of these are good candidates for statics in their respective interfaces: Stream.builder() Stream.emptyStream(); IntStream.generate(f); IntStream.range(f); They read well, most are "important" enough to live with the main interface, and the names get less redundant since we don't have to say "intRange" but just "range". All of them? Most of them? None of them? On 4/11/2013 1:51 PM, Brian Goetz wrote: > Joe quite correctly pointed out in the survey that Streams.java is a mix > of two things for two audiences: > > - Utility methods for users to generate streams, like intRange() > - Low level methods for library writers to generate streams from > things like iterators or spliterators. > > Merging them in one file is confusing, because users come away with the > idea that writing spliterators is something they're supposed to do, > whereas in reality, if we've done our jobs, they should never even be > aware that spliterators exist. So I think we should separate them into > a "high level" and "low level" bag of tricks. > > Since today, Paul has added some new ones: > - singletonStream(v) (four flavors) > - builder() (four flavors) > > So, we have to identify appropriate homes for the two groupings, and > separate them. Here's a first cut at separating them: > > High level: > xxxRange > xxxBuilder > emptyXxxStream > singletonXxxStream > concat > zip > > Low level: > all spliterator-related stream building methods > > Not sure where (or even if): > iterate (given T0 and f, infinite stream of T0, f(T0), f(f(T0)), ...) > generate (infinite stream of independent applications of a generator, > good for infinite constant and random streams, though not much else, > used by impl of Random.{ints,longs,gaussians}). > > Others that we've talked about adding: > ints(), longs() // to enable things like ints().filter(...).limit(n) > indexedGenerate(i -> T) > > > > I think the high-level stuff should stay in Streams.
So we need a name > for the low-level stuff. (Which also then becomes the right home for > "how do I turn my data structure into a stream" doc.) > > What should we call that? From paul.sandoz at oracle.com Fri Apr 19 03:01:41 2013 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Fri, 19 Apr 2013 12:01:41 +0200 Subject: mergers In-Reply-To: <51703018.3070403@oracle.com> References: <516F4A7E.2080602@oracle.com> <9DBCAEC3-5E4F-494D-9352-4D830FEC8716@oracle.com> <51703018.3070403@oracle.com> Message-ID: <35E1D4C2-B71C-4088-B35C-7E7E8DAA7D9A@oracle.com> On Apr 18, 2013, at 7:40 PM, Brian Goetz wrote: > I am OK with using method refs instead of function-returning methods. But I think the key is that "merge" needs to appear in the name, because, while a function that returns the first of its arguments is useful, the key here is that we're trying to identify a set of reasonable merging policies that are useful when doing "dump a stream into a map". I think even these three simple ones will greatly reduce people's need to write mergers themselves for toMap. > > Having them live in some place more Mappy would be fine too, but I don't want to create a Maps class for them. Are they important enough to be static methods on Map? (I doubt it.) So it mostly seems like they're in the "desirable to have, but not a great place to shove them" place now. Is Collectors good enough, or do we have to think harder about making a better place? > My inclination is that Collectors is OK, since those methods are designed to be closely associated with Collectors.toMap. FWIW I think it is also possible to offset some of the need for the "merge" name with some documentation in Collectors.toMap; however, I still like the way it reads in code when those methods are used. Paul.
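For comparison with the "Dividing Streams.java" proposal above (and the varargs "of" factories discussed later in this thread), here is how interface-level static factories read in use. This sketch is written against the java.util.stream API as it eventually shipped in JDK 8 (Stream.of, Stream.iterate, Stream.generate, Stream.builder, Stream.empty, Stream.concat, IntStream.range). Those final names postdate these messages, so treat them as one possible outcome of the discussion rather than what was on the table at the time.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class StreamFactoriesDemo {

    static List<Integer> examples() {
        // Varargs factory on the interface, instead of overloading Arrays.stream.
        Stream<Integer> literal = Stream.of(1, 2, 4, 8);

        // Infinite stream T0, f(T0), f(f(T0)), ... made finite with limit().
        Stream<Integer> powers = Stream.iterate(16, x -> x * 2).limit(2);

        // Independent applications of a generator (here a constant supplier).
        Stream<Integer> zeros = Stream.generate(() -> 0).limit(1);

        // Incrementally built stream.
        Stream<Integer> built = Stream.<Integer>builder().add(-1).build();

        // concat() appends one stream to another; empty() contributes nothing.
        Stream<Integer> head = Stream.concat(literal, powers);
        Stream<Integer> tail = Stream.concat(Stream.concat(zeros, built), Stream.<Integer>empty());
        return Stream.concat(head, tail).collect(Collectors.toList());
    }

    // The range factory lives on IntStream, so no "intRange" prefix is needed.
    static int sumRange(int from, int to) {
        return IntStream.range(from, to).sum(); // half-open: [from, to)
    }

    public static void main(String[] args) {
        System.out.println(examples());     // [1, 2, 4, 8, 16, 32, 0, -1]
        System.out.println(sumRange(1, 6)); // 15
    }
}
```

Placing each factory on the type it produces is what removes the name redundancy Brian mentions: the receiver type already says "Int", so the method can simply be called range.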
> On 4/18/2013 4:35 AM, Paul Sandoz wrote: >> >> On Apr 18, 2013, at 3:21 AM, Brian Goetz wrote: >> >>> Collectors defines three merge functions: >>> >>> throwingMerger -- always throws >>> firstWinsMerger -- takes first >>> lastWinsMerger -- takes last >>> >>> These are plain old BinaryOperators that can be used for Map.merge as well as the toMap collectors. >>> >>> Someone commented that these look a little out of place in Collectors, and they are certainly not Collector-specific. Is there a better place for them? >>> >> >> Someone also commented (separately from a survey) that method refs could be used instead? e.g. like using Integer::sum. >> >> e.g. Objects::first, Objects::second, Objects::throwing >> >> But I thought that might make it harder to correlate with map merging. >> >> They tend to read well when used with toMap code, but perhaps make more sense as static methods on Map due to Map.merge being present? >> >> Paul. >> >> >> From spullara at gmail.com Fri Apr 19 08:29:58 2013 From: spullara at gmail.com (Sam Pullara) Date: Fri, 19 Apr 2013 08:29:58 -0700 Subject: Dividing Streams.java In-Reply-To: <5170A992.80703@oracle.com> References: <5166F804.50101@oracle.com> <5170A992.80703@oracle.com> Message-ID: <6E8C35D9-3391-4F3A-9B6E-80F3DDF6F9C4@gmail.com> I think it is a good idea to move all of them to their interfaces. Much easier to find. Sam On Apr 18, 2013, at 7:18 PM, Brian Goetz wrote: > Now that we've cleared away the spliterator methods from Streams, all, or nearly all, of the remaining methods in Streams are candidates for moving to the respective interfaces. And in many ways they get nicer when they do. > > We've got: > > builder() > emptyStream() > singletonStream() > iterate() > generate() > > for all the types (so emptyStream(), emptyIntStream(), etc), plus ranges for the numeric types. Plus > > concat() > zip() > > for ref streams.
> > All of these are good candidates for statics in their respective interfaces: > > Stream.builder() > Stream.emptyStream(); > IntStream.generate(f); > IntStream.range(f); > > They read well, most are "important" enough to live with the main interface, and the names get less redundant since we don't have to say "intRange" but just "range". > > All of them? Most of them? None of them? > > On 4/11/2013 1:51 PM, Brian Goetz wrote: >> Joe quite correctly pointed out in the survey that Streams.java is a mix >> of two things for two audiences: >> >> - Utility methods for users to generate streams, like intRange() >> - Low level methods for library writers to generate streams from >> things like iterators or spliterators. >> >> Merging them in one file is confusing, because users come away with the >> idea that writing spliterators is something they're supposed to do, >> whereas in reality, if we've done our jobs, they should never even be >> aware that spliterators exist. So I think we should separate them into >> a "high level" and "low level" bag of tricks. >> >> Since today, Paul has added some new ones: >> - singletonStream(v) (four flavors) >> - builder() (four flavors) >> >> So, we have to identify appropriate homes for the two groupings, and >> separate them. Here's a first cut at separating them: >> >> High level: >> xxxRange >> xxxBuilder >> emptyXxxStream >> singletonXxxStream >> concat >> zip >> >> Low level: >> all spliterator-related stream building methods >> >> Not sure where (or even if): >> iterate (given T0 and f, infinite stream of T0, f(T0), f(f(T0)), ...) >> generate (infinite stream of independent applications of a generator, >> good for infinite constant and random streams, though not much else, >> used by impl of Random.{ints,longs,gaussians}).
>> >> Others that we've talked about adding: >> ints(), longs() // to enable things like ints().filter(...).limit(n) >> indexedGenerate(i -> T) >> >> >> >> I think the high-level stuff should stay in Streams. So we need a name >> for the low-level stuff. (Which also then becomes the right home for >> "how do I turn my data structure into a stream" doc.) >> >> What should we call that? From brian.goetz at oracle.com Sat Apr 20 08:46:10 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Sat, 20 Apr 2013 11:46:10 -0400 Subject: Varargs stream factory methods Message-ID: <5172B842.5090804@oracle.com> Currently we have, in Arrays: public static <T> Stream<T> stream(T[] array) { return stream(array, 0, array.length); } public static IntStream stream(int[] array) { return stream(array, 0, array.length); } etc. We *could* make these varargs methods, which is useful for creating ad-hoc stream literals: Arrays.stream(1, 2, 4, 8).map(...) The downside is that we would have to lose (or rename) methods like: public static IntStream stream(int[] array, int fromIndex, int toIndex) { since stream(1, 2, 3) would be ambiguous. Probably better, make these static factories in the various stream interfaces: Stream.of("foo", "bar") IntStream.of(1, 2, 4, 8) From brian.goetz at oracle.com Sat Apr 20 08:46:46 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Sat, 20 Apr 2013 11:46:46 -0400 Subject: Dividing Streams.java In-Reply-To: <6E8C35D9-3391-4F3A-9B6E-80F3DDF6F9C4@gmail.com> References: <5166F804.50101@oracle.com> <5170A992.80703@oracle.com> <6E8C35D9-3391-4F3A-9B6E-80F3DDF6F9C4@gmail.com> Message-ID: <5172B866.3050007@oracle.com> Unless anyone objects, I plan to do this. On 4/19/2013 11:29 AM, Sam Pullara wrote: > I think it is a good idea to move all of them to their interfaces. Much easier to find.
> > Sam > > On Apr 18, 2013, at 7:18 PM, Brian Goetz wrote: > >> Now that we've cleared away the spliterator methods from Streams, all, or nearly all, of the remaining methods in Streams are candidates for moving to the respective interfaces. And in many ways they get nicer when they do. >> >> We've got: >> >> builder() >> emptyStream() >> singletonStream() >> iterate() >> generate() >> >> for all the types (so emptyStream(), emptyIntStream(), etc), plus ranges for the numeric types. Plus >> >> concat() >> zip() >> >> for ref streams. >> >> All of these are good candidates for statics in their respective interfaces: >> >> Stream.builder() >> Stream.emptyStream(); >> IntStream.generate(f); >> IntStream.range(f); >> >> They read well, most are "important" enough to live with the main interface, and the names get less redundant since we don't have to say "intRange" but just "range". >> >> All of them? Most of them? None of them? >> >> On 4/11/2013 1:51 PM, Brian Goetz wrote: >>> Joe quite correctly pointed out in the survey that Streams.java is a mix >>> of two things for two audiences: >>> >>> - Utility methods for users to generate streams, like intRange() >>> - Low level methods for library writers to generate streams from >>> things like iterators or spliterators. >>> >>> Merging them in one file is confusing, because users come away with the >>> idea that writing spliterators is something they're supposed to do, >>> whereas in reality, if we've done our jobs, they should never even be >>> aware that spliterators exist. So I think we should separate them into >>> a "high level" and "low level" bag of tricks. >>> >>> Since today, Paul has added some new ones: >>> - singletonStream(v) (four flavors) >>> - builder() (four flavors) >>> >>> So, we have to identify appropriate homes for the two groupings, and >>> separate them.
Here's a first cut at separating them: >>> >>> High level: >>> xxxRange >>> xxxBuilder >>> emptyXxxStream >>> singletonXxxStream >>> concat >>> zip >>> >>> Low level: >>> all spliterator-related stream building methods >>> >>> Not sure where (or even if): >>> iterate (given T0 and f, infinite stream of T0, f(T0), f(f(T0)), ...) >>> generate (infinite stream of independent applications of a generator, >>> good for infinite constant and random streams, though not much else, >>> used by impl of Random.{ints,longs,gaussians}). >>> >>> Others that we've talked about adding: >>> ints(), longs() // to enable things like ints().filter(...).limit(n) >>> indexedGenerate(i -> T) >>> >>> >>> >>> I think the high-level stuff should stay in Streams. So we need a name >>> for the low-level stuff. (Which also then becomes the right home for >>> "how do I turn my data structure into a stream" doc.) >>> >>> What should we call that? > From tim at peierls.net Sat Apr 20 08:50:35 2013 From: tim at peierls.net (Tim Peierls) Date: Sat, 20 Apr 2013 11:50:35 -0400 Subject: Varargs stream factory methods In-Reply-To: <5172B842.5090804@oracle.com> References: <5172B842.5090804@oracle.com> Message-ID: On Sat, Apr 20, 2013 at 11:46 AM, Brian Goetz wrote: > Currently we have, in Arrays: > > public static <T> Stream<T> stream(T[] array) { > return stream(array, 0, array.length); > } > > public static IntStream stream(int[] array) { > return stream(array, 0, array.length); > } > > etc. > > We *could* make these varargs methods, which is useful for creating ad-hoc > stream literals: > > Arrays.stream(1, 2, 4, 8).map(...) > > The downside is that we would have to lose (or rename) methods like: > > public static IntStream stream(int[] array, > int fromIndex, int toIndex) { > > since stream(1, 2, 3) would be ambiguous.
> > Probably better, make these static factories in the various stream > interfaces: > > Stream.of("foo", "bar") > > IntStream.of(1, 2, 4, 8) > I'm used to varargs static factories named "of" from Guava, so that last approach appeals to me. --tim From spullara at gmail.com Sat Apr 20 10:02:12 2013 From: spullara at gmail.com (Sam Pullara) Date: Sat, 20 Apr 2013 10:02:12 -0700 Subject: Varargs stream factory methods In-Reply-To: <5172B842.5090804@oracle.com> References: <5172B842.5090804@oracle.com> Message-ID: <6244092293909388376@unknownmsgid> I like the .of() idea better than overloading .stream(). Sam On Apr 20, 2013, at 8:47 AM, Brian Goetz wrote: > Currently we have, in Arrays: > > public static <T> Stream<T> stream(T[] array) { > return stream(array, 0, array.length); > } > > public static IntStream stream(int[] array) { > return stream(array, 0, array.length); > } > > etc. > > We *could* make these varargs methods, which is useful for creating ad-hoc stream literals: > > Arrays.stream(1, 2, 4, 8).map(...) > > The downside is that we would have to lose (or rename) methods like: > > public static IntStream stream(int[] array, > int fromIndex, int toIndex) { > > since stream(1, 2, 3) would be ambiguous. > > Probably better, make these static factories in the various stream interfaces: > > Stream.of("foo", "bar") > > IntStream.of(1, 2, 4, 8) > From forax at univ-mlv.fr Sat Apr 20 13:38:28 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Sat, 20 Apr 2013 22:38:28 +0200 Subject: Varargs stream factory methods In-Reply-To: <6244092293909388376@unknownmsgid> References: <5172B842.5090804@oracle.com> <6244092293909388376@unknownmsgid> Message-ID: <5172FCC4.5030403@univ-mlv.fr> On 04/20/2013 07:02 PM, Sam Pullara wrote: > I like the .of() idea better than overloading .stream(). > > Sam I agree, 'of' is already used in EnumSet for that purpose.
Rémi > > On Apr 20, 2013, at 8:47 AM, Brian Goetz wrote: > >> Currently we have, in Arrays: >> >> public static <T> Stream<T> stream(T[] array) { >> return stream(array, 0, array.length); >> } >> >> public static IntStream stream(int[] array) { >> return stream(array, 0, array.length); >> } >> >> etc. >> >> We *could* make these varargs methods, which is useful for creating ad-hoc stream literals: >> >> Arrays.stream(1, 2, 4, 8).map(...) >> >> The downside is that we would have to lose (or rename) methods like: >> >> public static IntStream stream(int[] array, >> int fromIndex, int toIndex) { >> >> since stream(1, 2, 3) would be ambiguous. >> >> Probably better, make these static factories in the various stream interfaces: >> >> Stream.of("foo", "bar") >> >> IntStream.of(1, 2, 4, 8) >> From brian.goetz at oracle.com Sat Apr 20 14:47:49 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Sat, 20 Apr 2013 17:47:49 -0400 Subject: Drop Arrays.parallelStream()? Message-ID: <51730D05.5030007@oracle.com> We dropped the parallel versions of all the static generator/factory methods in Streams a while ago, in favor of just letting people do (say) IntStream.range(...).parallel(). Since then, we have also greatly reduced the runtime cost of Stream.parallel(). We still have the separate .parallelStream() method on Collection and in the static methods in Arrays. I still really like Collection.parallelStream; it has huge discoverability advantages, and offers a pretty big return on API surface area -- one more method, but provides value in a lot of places, since Collection will be a really common case of a stream source. Arrays are in a middle ground. We have eight Arrays.stream() methods and eight Arrays.parallelStream() methods (four types, both whole-array and slice versions). I'm having a bit of a YAGNI twinge for the Arrays.parallelStream forms, and could see ditching them.
(The implementations are trivial and small, so that is not an argument to ditch them -- we should make this decision purely on API considerations.) If we did this, Collection would have the sole parallelStream method; everything else would have to go through .parallel(). Which seems fine to me. From mike.duigou at oracle.com Sat Apr 20 15:01:29 2013 From: mike.duigou at oracle.com (Mike Duigou) Date: Sat, 20 Apr 2013 15:01:29 -0700 Subject: Drop Arrays.parallelStream()? In-Reply-To: <51730D05.5030007@oracle.com> References: <51730D05.5030007@oracle.com> Message-ID: On Apr 20 2013, at 14:47 , Brian Goetz wrote: > We dropped the parallel versions of all the static generator/factory methods in Streams a while ago, in favor of just letting people do (say) IntStream.range(...).parallel(). Since then, we have also greatly reduced the runtime cost of Stream.parallel(). > > We still have the separate .parallelStream() method on Collection and in the static methods in Arrays. > > I still really like Collection.parallelStream; it has huge discoverability advantages, and offers a pretty big return on API surface area -- one more method, but provides value in a lot of places, since Collection will be a really common case of a stream source. > > Arrays are in a middle ground. We have eight Arrays.stream() methods and eight Arrays.parallelStream() methods (four types, both whole-array and slice versions). I'm having a bit of a YAGNI twinge for the Arrays.parallelStream forms, and could see ditching them. (The implementations are trivial and small, so that is not an argument to ditch them -- we should make this decision purely on API considerations.) > > If we did this, Collection would have the sole parallelStream method; everything else would have to go through .parallel(). Which seems fine to me. > I would probably always use .stream().parallel() idiomatically for consistency unless parallelStream() told me why I should use it instead.
I say toss all of the parallelStream() methods unless there's an impl efficiency dependent reason to retain some of them. Mike From tim at peierls.net Sat Apr 20 15:10:33 2013 From: tim at peierls.net (Tim Peierls) Date: Sat, 20 Apr 2013 18:10:33 -0400 Subject: Drop Arrays.parallelStream()? In-Reply-To: References: <51730D05.5030007@oracle.com> Message-ID: On Sat, Apr 20, 2013 at 6:01 PM, Mike Duigou wrote: > I would probably always use .stream().parallel() idiomatically for > consistency unless parallelStream() told me why I should use it instead. I > say toss all of the parallelStream() methods unless there's an impl > efficiency dependent reason to retain some of them. > Agreed. I see the discoverability of Collection.parallelStream() as a potential pedagogical drawback. "Do I use parallelStream() or stream().parallel()?" For most folks, the expectation and intuition will be sequential, so take advantage of that: Let people come to c.stream().parallel() slowly and deliberately, after getting their feet wet with c.stream(). --tim From joe.bowbeer at gmail.com Sat Apr 20 15:16:46 2013 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Sat, 20 Apr 2013 15:16:46 -0700 Subject: Drop Arrays.parallelStream()? In-Reply-To: References: <51730D05.5030007@oracle.com> Message-ID: I agree with Mike and Tim. I'd remove all the parallelStream() methods now - and add some or all back later if they ARE needed. I don't like the inconsistency of having parallelStream available on some stream factories and not on others. On Sat, Apr 20, 2013 at 3:10 PM, Tim Peierls wrote: > On Sat, Apr 20, 2013 at 6:01 PM, Mike Duigou wrote: > >> I would probably always use .stream().parallel() idiomatically for >> consistency unless parallelStream() told me why I should use it instead. I >> say toss all of the parallelStream() methods unless there's an impl >> efficiency dependent reason to retain some of them. >> > > Agreed.
> > I see the discoverability of Collection.parallelStream() as a potential > pedagogical drawback. "Do I use parallelStream() or stream().parallel()?" > > For most folks, the expectation and intuition will be sequential, so take > advantage of that: Let people come to c.stream().parallel() slowly and > deliberately, after getting their feet wet with c.stream(). > > --tim > From brian.goetz at oracle.com Sat Apr 20 15:28:04 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Sat, 20 Apr 2013 18:28:04 -0400 Subject: Drop Arrays.parallelStream()? In-Reply-To: References: <51730D05.5030007@oracle.com> Message-ID: <51731674.4010401@oracle.com> > For most folks, the expectation and intuition will be sequential, so > take advantage of that: Let people come to c.stream().parallel() slowly > and deliberately, after getting their feet wet with c.stream(). I have a slightly different viewpoint about the value of this sequential intuition -- I view the pervasive "sequential expectation" as one of the biggest challenges of this entire effort; people are *constantly* bringing their incorrect sequential bias, which leads them to do stupid things like using a one-element array as a way to "trick" the "stupid" compiler into letting them capture a mutable local, or using lambdas as arguments to map that mutate state that will be used during the computation (in a non-thread-safe way), and then, when it's pointed out what they're doing, shrug it off and say "yeah, but I'm not doing it in parallel." We've made a lot of design tradeoffs to merge sequential and parallel streams. The result, I believe, is a clean one and will add to the library's chances of still being useful in 10+ years, but I don't particularly like the idea of encouraging people to think this is a sequential library with some parallel bags nailed on the side.
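To make the "one-element array" anti-pattern Brian describes concrete, here is a minimal sketch (class and method names are illustrative) contrasting a lambda that mutates captured state with letting the library perform the reduction. It also shows that Collection.parallelStream() and stream().parallel() both yield a parallel stream, the equivalence being debated in this thread. The racy variant is included only as a warning: under parallel() its unsynchronized read-modify-write can lose updates, so its result is not reliable.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.IntStream;

public class ParallelSumDemo {

    // Anti-pattern: smuggling a mutable local into a lambda via a one-element array.
    // The unsynchronized += races when the pipeline runs in parallel.
    static long racySum(int n) {
        long[] total = {0};
        IntStream.rangeClosed(1, n).parallel().forEach(i -> total[0] += i);
        return total[0];
    }

    // Preferred: express the computation as a reduction; correct sequentially and in parallel.
    static long safeSum(int n) {
        return IntStream.rangeClosed(1, n).parallel().asLongStream().sum();
    }

    public static void main(String[] args) {
        List<Integer> xs = Arrays.asList(1, 2, 3, 4);
        // Both spellings produce a parallel stream over the same elements.
        int a = xs.parallelStream().mapToInt(Integer::intValue).sum();
        int b = xs.stream().parallel().mapToInt(Integer::intValue).sum();
        System.out.println(a + " " + b);         // prints "10 10"

        System.out.println(safeSum(1_000_000));  // prints 500000500000
        System.out.println(racySum(1_000_000));  // may print less than 500000500000 (lost updates)
    }
}
```

The remedy is not to add locking around total[0] but to restate the computation as a reduction (sum(), reduce(), or collect()), which is exactly the style the library is designed around, whether or not the pipeline happens to run in parallel.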
From brian.goetz at oracle.com Sat Apr 20 15:37:08 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Sat, 20 Apr 2013 18:37:08 -0400 Subject: Drop Arrays.parallelStream()? In-Reply-To: References: <51730D05.5030007@oracle.com> Message-ID: <51731894.80107@oracle.com> For what it's worth, the internal tracking title of this project is "Bulk data-parallel operations on Collections." I'm not willing to relegate such central functionality to something that is tucked into a remote corner of the API -- it was *already* a huge (but warranted) discoverability compromise to have the stream/parallelStream "bun" methods in the first place! Two buns for this case -- which will probably be 90% of stream constructions -- would be too much. So, I cannot see my way to removing Collection.parallelStream. However, I am willing to ditch the parallel versions of the static stream factory methods, largely on the basis that the Collection versions will be used 100x as much as any one of the static factories. The "inconsistency" of this position doesn't bother me one tiny bit; it is a pragmatic compromise. (In fact, I'm not even sure it's an inconsistency at all, since they're kind of different beasts -- one is a static factory, the other is a view onto an existing data structure.) So I'm willing to meet you 95% of the way there. On 4/20/2013 6:16 PM, Joe Bowbeer wrote: > I agree with Mike and Tim. I'd remove all the parallelStream() methods > now - and add some or all back later if they ARE needed. > > I don't like the inconsistency of having parallelStream available on > some stream factories and not on others. > > > > > On Sat, Apr 20, 2013 at 3:10 PM, Tim Peierls > wrote: > > On Sat, Apr 20, 2013 at 6:01 PM, Mike Duigou > wrote: > > I would probably always use always .stream().parallel() > idiomatically for consistency unless parallelStream() told me > why I should use it instead.
I say toss all of the > parallelStream() methods unless there's an impl efficiency > dependent reason to retain some of them. > > > Agreed. > > I see the discoverability of Collection.parallelStream() as a > potential pedagogical drawback. "Do I use parallelStream() or > stream().parallel()?" > > For most folks, the expectation and intuition will be sequential, so > take advantage of that: Let people come to c.stream().parallel() > slowly and deliberately, after getting their feet wet with c.stream(). > > --tim > > From brian.goetz at oracle.com Sat Apr 20 15:38:34 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Sat, 20 Apr 2013 18:38:34 -0400 Subject: Drop Arrays.parallelStream()? In-Reply-To: References: <51730D05.5030007@oracle.com> Message-ID: <517318EA.7060801@oracle.com> > For most folks, the expectation and intuition will be sequential, so > take advantage of that: Let people come to c.stream().parallel() slowly > and deliberately, after getting their feet wet with c.stream(). I have a slightly different viewpoint about the value of this sequential intuition -- I view the pervasive "sequential expectation" as one of the biggest challenges of this entire effort; people are *constantly* bringing their incorrect sequential bias, which leads them to do stupid things like using a one-element array as a way to "trick" the "stupid" compiler into letting them capture a mutable local, or using lambdas as arguments to map that mutate state that will be used during the computation (in a non-thread-safe way), and then, when it's pointed out what they're doing, shrug it off and say "yeah, but I'm not doing it in parallel." We've made a lot of design tradeoffs to merge sequential and parallel streams. The result, I believe, is a clean one and will add to the library's chances of still being useful in 10+ years, but I don't particularly like the idea of encouraging people to think this is a sequential library with some parallel bags nailed on the side.
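Under the compromise Brian proposes (keep Collection.parallelStream(), send everything else through .parallel()), the entry points look roughly like the sketch below. It is written against the Java 8 API as it eventually shipped, where java.util.Arrays has stream(...) but no parallelStream(...); the class and method names are illustrative only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.IntStream;

class ParallelEntryPoints {
    // Collection keeps the one-call "bun": a (possibly) parallel stream over the collection.
    static boolean viaParallelStream(List<String> words) {
        return words.parallelStream().isParallel();
    }

    // The general route: obtain a stream, then switch it to parallel mode.
    static boolean viaStreamThenParallel(List<String> words) {
        return words.stream().parallel().isParallel();
    }

    // Static factories such as Arrays.stream and IntStream.range have no
    // parallel twin; they go through .parallel() as well.
    static int parallelArraySum(int[] data) {
        return Arrays.stream(data).parallel().sum();
    }

    static long parallelEvenCount(int n) {
        return IntStream.range(0, n).parallel().filter(i -> i % 2 == 0).count();
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("alpha", "beta", "gamma");
        System.out.println(viaParallelStream(words));     // true on OpenJDK; the Javadoc only promises "possibly parallel"
        System.out.println(viaStreamThenParallel(words)); // true
        System.out.println(parallelArraySum(new int[] {3, 1, 4, 1, 5, 9})); // 23
        System.out.println(parallelEvenCount(1000));      // 500
    }
}
```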
From joe.bowbeer at gmail.com Sat Apr 20 16:12:12 2013 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Sat, 20 Apr 2013 16:12:12 -0700 Subject: Drop Arrays.parallelStream()? In-Reply-To: <51731894.80107@oracle.com> References: <51730D05.5030007@oracle.com> <51731894.80107@oracle.com> Message-ID: Brian, What do you mean by the following? it was *already* a huge (but warranted) discoverability compromise to have > the stream/parallelStream "bun" methods in the first place! Two buns for > this case [...] would be too much. Are you referring to the fact the there is no ParallelStream type? Note that the common point that Mike, Tim, and I raised is consistency. Your proposal to remove methods is creating the inconsistency, so I don't understand the comment that you're meeting us 95% of the way there... That said, I think I can view Collection and Arrays as two different things that have little bearing on each other (if I squint). Still, why, if you're so interested in advertising the parallel features, do you *want* to remove these methods from Arrays? Finally, Brian writes: but I don't particularly like the idea of encouraging people to think this > is a sequential library with some parallel bags nailed on the side Then again, users like consistency... Joe On Sat, Apr 20, 2013 at 3:37 PM, Brian Goetz wrote: > For what its worth, the internal tracking title of this project is "Bulk > data-parallel operations on Collections." I'm not willing to relegate such > central functionality to something that is tucked into a remote corner of > the API -- it was *already* a huge (but warranted) discoverability > compromise to have the stream/parallelStream "bun" methods in the first > place! Two buns for this case -- which will be probably 90% of stream > constructions -- would be too much. So, I cannot see my way to removing > Collection.parallelStream. 
However, I am willing to ditch the parallel > versions of the static stream factory methods, largely on the basis that > the Collection versions will be used 100x as much as any one of the static > factories. > > The "inconsistency" of this position doesn't bother me one tiny bit; it is > a pragmatic compromise. (In fact, I'm not even sure its an inconsistency > at all, since they're kind of different beasts -- one is a static factory, > the other is a view onto an existing data structure.) > > So I'm willing to meet you 95% of the way there. > > > > On 4/20/2013 6:16 PM, Joe Bowbeer wrote: > >> I agree with Mike and Tim. I'd remove all the parallelStream() methods >> now - and add some or all back later if they ARE needed. >> >> I don't like the inconsistency of having parallelStream available on >> some stream factories and not on others. >> >> >> >> >> On Sat, Apr 20, 2013 at 3:10 PM, Tim Peierls > > wrote: >> >> On Sat, Apr 20, 2013 at 6:01 PM, Mike Duigou > > wrote: >> >> I would probably always use always .stream().parallel() >> idiomatically for consistency unless parallelStream() told me >> why I should use it instead. I say toss all of the >> parallelStream() methods unless there's an impl efficiency >> dependent reason to retain some of them. >> >> >> Agreed. >> >> I see the discoverability of Collection.parallelStream() as a >> potential pedagogical drawback. "Do I use parallelStream() or >> stream().parallel()?" >> >> For most folks, the expectation and intuition will be sequential, so >> take advantage of that: Let people come to c.stream().parallel() >> slowly and deliberately, after getting their feet wet with c.stream(). >> >> --tim >> >> >> From brian.goetz at oracle.com Sat Apr 20 16:29:00 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Sat, 20 Apr 2013 19:29:00 -0400 Subject: Drop Arrays.parallelStream()? 
In-Reply-To: References: <51730D05.5030007@oracle.com> <51731894.80107@oracle.com> Message-ID: <517324BC.2050003@oracle.com> > Brian, What do you mean by the following? > > it was *already* a huge (but warranted) discoverability compromise > to have the stream/parallelStream "bun" methods in the first place! > Two buns for this case [...] would be too much. > > Are you referring to the fact the there is no ParallelStream type? No, that's a big plus -- one Stream to rule them all! The negative was having to have the .stream() and .parallelStream() methods at all. We originally really liked the idea of collection.filter(..)... ^ no view method! but for various reasons reluctantly concluded it was untenable. But that still doesn't mean we like having the new functionality be so far removed from Collection. And if one layer removed is suboptimal, two is worse. > Note that the common point that Mike, Tim, and I raised is consistency. > Your proposal to remove methods is creating the inconsistency, so I Not really. We were already inconsistent; they were present for Collection and for Array factories but not for range factories, generator factories, etc. You could argue the new proposed state is more consistent (all the view methods have a parallel counterpart; all the static factories don't) but that's not what I consider its primary benefit. > don't understand the comment that you're meeting us 95% of the way there... Removing all but one of the parallelStream methods. > Still, why, if you're so interested in advertising the parallel > features, do you *want* to remove these methods from Arrays? Simply: return on API surface area. The return for having the one extra method on Collection is large; the return for having 100 extra methods for 100 infrequently-used factory methods is small (even in the aggregate). Arrays.parallelStream() was eight more methods -- four types x two forms.
(Others here have argued that "too many forms of the same method is a smell"; also there have been plenty of calls for a round of YAGNIism.) > but I don't particularly like the idea of encouraging people to > think this is a sequential library with some parallel bags nailed on > the side > > Then again, users like consistency... I'm not saying consistency is unimportant. But in my experience, "consistency" can be used to justify nearly any position -- one can always find a precedent in a complex system to be "consistent" with. So I want more than mere consistency. (Not to mention many consistencies are of the "foolish hobgoblin" variety.) From jgetino at telefonica.net Sat Apr 20 16:41:33 2013 From: jgetino at telefonica.net (Jose) Date: Sun, 21 Apr 2013 01:41:33 +0200 Subject: lambda-libs-spec-observers Digest, Vol 8, Issue 30 In-Reply-To: References: Message-ID: -----Original Message----- From: lambda-libs-spec-observers-bounces at openjdk.java.net [mailto:lambda-libs-spec-observers-bounces at openjdk.java.net] On Behalf Of lambda-libs-spec-observers-request at openjdk.java.net Sent: Sunday, April 21, 2013 0:39 To: lambda-libs-spec-observers at openjdk.java.net Subject: lambda-libs-spec-observers Digest, Vol 8, Issue 30 Today's Topics: 1. Re: Varargs stream factory methods (Remi Forax) 2. Drop Arrays.parallelStream()? (Brian Goetz) 3. Re: Drop Arrays.parallelStream()?
(Mike Duigou) 4. Re: Drop Arrays.parallelStream()? (Tim Peierls) 5. Re: Drop Arrays.parallelStream()? (Joe Bowbeer) 6. Re: Drop Arrays.parallelStream()? (Brian Goetz) 7. Re: Drop Arrays.parallelStream()? (Brian Goetz) ---------------------------------------------------------------------- Message: 1 Date: Sat, 20 Apr 2013 22:38:28 +0200 From: Remi Forax Subject: Re: Varargs stream factory methods To: lambda-libs-spec-experts at openjdk.java.net Message-ID: <5172FCC4.5030403 at univ-mlv.fr> Content-Type: text/plain; charset=ISO-8859-1; format=flowed On 04/20/2013 07:02 PM, Sam Pullara wrote: > I like the .of() idea better than overloading .stream(). > > Sam I agree, 'of' is already used in EnumSet for that purpose. Rémi > > On Apr 20, 2013, at 8:47 AM, Brian Goetz wrote: > >> Currently we have, in Arrays: >> >> public static <T> Stream<T> stream(T[] array) { >> return stream(array, 0, array.length); >> } >> >> public static IntStream stream(int[] array) { >> return stream(array, 0, array.length); >> } >> >> etc. >> >> We *could* make these varargs methods, which is useful for creating ad-hoc stream literals: >> >> Arrays.stream(1, 2, 4, 8).map(...) >> >> The downside is that we would have to lose (or rename) methods like: >> >> public static IntStream stream(int[] array, >> int fromIndex, int toIndex) { >> >> since stream(1, 2, 3) would be ambiguous. >> >> Probably better, make these static factories in the various stream interfaces: >> >> Stream.of("foo", "bar") >> >> IntStream.of(1, 2, 4, 8) >> ------------------------------ Message: 2 Date: Sat, 20 Apr 2013 17:47:49 -0400 From: Brian Goetz Subject: Drop Arrays.parallelStream()? To: "lambda-libs-spec-experts at openjdk.java.net" Message-ID: <51730D05.5030007 at oracle.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed We dropped the parallel versions of all the static generator/factory methods in Streams a while ago, in favor of just letting people do (say) IntStream.range(...).parallel().
Since then, we have also greatly reduced the runtime cost of Stream.parallel(). We still have the separate .parallelStream() method on Collection and in the static methods in Arrays. I still really like Collection.parallelStream; it has huge discoverability advantages, and offers a pretty big return on API surface area -- one more method, but provides value in a lot of places, since Collection will be a really common case of a stream source. Arrays are in a middle ground. We have eight Arrays.stream() methods and eight Arrays.parallelStream() methods (four types, both whole-array and slice versions). I'm having a bit of a YAGNI twinge for the Arrays.parallelStream forms, and could see ditching them. (The implementations are trivial and small, so that is not an argument to ditch them -- we should make this decision purely on API considerations.) If we did this, Collection would have the sole parallelStream method; everything else would have to go through .parallel(). Which seems fine to me. ------------------------------ Message: 3 Date: Sat, 20 Apr 2013 15:01:29 -0700 From: Mike Duigou Subject: Re: Drop Arrays.parallelStream()? To: lambda-libs-spec-experts at openjdk.java.net Message-ID: Content-Type: text/plain; charset=iso-8859-1 On Apr 20 2013, at 14:47 , Brian Goetz wrote: > We dropped the parallel versions of all the static generator/factory methods in Streams a while ago, in favor of just letting people do (say) IntStream.range(...).parallel(). Since then, we have also greatly reduced the runtime cost of Stream.parallel(). > > We still have the separate .parallelStream() method on Collection and in the static methods in Arrays. > > I still really like Collection.parallelStream; it has huge discoverability advantages, and offers a pretty big return on API surface area -- one more method, but provides value in a lot of places, since Collection will be a really common case of a stream source. > > Arrays are in a middle ground.
We have eight Arrays.stream() methods and eight Arrays.parallelStream() methods (four types, both whole-array and slice versions). I'm having a bit of a YAGNI twinge for the Arrays.parallelStream forms, and could see ditching them. (The implementations are trivial and small, so that is not an argument to ditch them -- we should make this decision purely on API considerations.) > > If we did this, Collection would have the sole parallelStream method; everything else would have to go through .parallel(). Which seems fine to me. > I would probably always use always .stream().parallel() idiomatically for consistency unless parallelStream() told me why I should use it instead. I say toss all of the parallelStream() methods unless there's an impl efficiency dependent reason to retain some of them. Mike ------------------------------ Message: 4 Date: Sat, 20 Apr 2013 18:10:33 -0400 From: Tim Peierls Subject: Re: Drop Arrays.parallelStream()? To: Mike Duigou Cc: lambda-libs-spec-experts at openjdk.java.net Message-ID: Content-Type: text/plain; charset=ISO-8859-1 On Sat, Apr 20, 2013 at 6:01 PM, Mike Duigou wrote: > I would probably always use always .stream().parallel() idiomatically for > consistency unless parallelStream() told me why I should use it instead. I > say toss all of the parallelStream() methods unless there's an impl > efficiency dependent reason to retain some of them. > Agreed. I see the discoverability of Collection.parallelStream() as a potential pedagogical drawback. "Do I use parallelStream() or stream().parallel()?" For most folks, the expectation and intuition will be sequential, so take advantage of that: Let people come to c.stream().parallel() slowly and deliberately, after getting their feet wet with c.stream(). --tim ------------------------------ Message: 5 Date: Sat, 20 Apr 2013 15:16:46 -0700 From: Joe Bowbeer Subject: Re: Drop Arrays.parallelStream()? 
To: Tim Peierls Cc: "lambda-libs-spec-experts at openjdk.java.net" Message-ID: Content-Type: text/plain; charset=ISO-8859-1 I agree with Mike and Tim. I'd remove all the parallelStream() methods now - and add some or all back later if they ARE needed. I don't like the inconsistency of having parallelStream available on some stream factories and not on others. On Sat, Apr 20, 2013 at 3:10 PM, Tim Peierls wrote: > On Sat, Apr 20, 2013 at 6:01 PM, Mike Duigou wrote: > >> I would probably always use always .stream().parallel() idiomatically for >> consistency unless parallelStream() told me why I should use it instead. I >> say toss all of the parallelStream() methods unless there's an impl >> efficiency dependent reason to retain some of them. >> > > Agreed. > > I see the discoverability of Collection.parallelStream() as a potential > pedagogical drawback. "Do I use parallelStream() or stream().parallel()?" > > For most folks, the expectation and intuition will be sequential, so take > advantage of that: Let people come to c.stream().parallel() slowly and > deliberately, after getting their feet wet with c.stream(). > > --tim > ------------------------------ Message: 6 Date: Sat, 20 Apr 2013 18:28:04 -0400 From: Brian Goetz Subject: Re: Drop Arrays.parallelStream()? To: Tim Peierls Cc: lambda-libs-spec-experts at openjdk.java.net Message-ID: <51731674.4010401 at oracle.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed > For most folks, the expectation and intuition will be sequential, so > take advantage of that: Let people come to c.stream().parallel() slowly > and deliberately, after getting their feet wet with c.stream(). 
I have a slightly different viewpoint about the value of this sequential intuition -- I view the pervasive "sequential expectation" as one if the biggest challenges of this entire effort; people are *constantly* bringing their incorrect sequential bias, which leads them to do stupid things like using a one-element array as a way to "trick" the "stupid" compiler into letting them capture a mutable local, or using lambdas as arguments to map that mutate state that will be used during the computation (in a non-thread-safe way), and then, when its pointed out that what they're doing, shrug it off and say "yeah, but I'm not doing it in parallel." We've made a lot of design tradeoffs to merge sequential and parallel streams. The result, I believe, is a clean one and will add to the library's chances of still being useful in 10+ years, but I don't particularly like the idea of encouraging people to think this is a sequential library with some parallel bags nailed on the side. ------------------------------ Message: 7 Date: Sat, 20 Apr 2013 18:37:08 -0400 From: Brian Goetz Subject: Re: Drop Arrays.parallelStream()? To: Joe Bowbeer Cc: "lambda-libs-spec-experts at openjdk.java.net" Message-ID: <51731894.80107 at oracle.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed For what its worth, the internal tracking title of this project is "Bulk data-parallel operations on Collections." I'm not willing to relegate such central functionality to something that is tucked into a remote corner of the API -- it was *already* a huge (but warranted) discoverability compromise to have the stream/parallelStream "bun" methods in the first place! Two buns for this case -- which will be probably 90% of stream constructions -- would be too much. So, I cannot see my way to removing Collection.parallelStream. 
However, I am willing to ditch the parallel versions of the static stream factory methods, largely on the basis that the Collection versions will be used 100x as much as any one of the static factories. The "inconsistency" of this position doesn't bother me one tiny bit; it is a pragmatic compromise. (In fact, I'm not even sure its an inconsistency at all, since they're kind of different beasts -- one is a static factory, the other is a view onto an existing data structure.) So I'm willing to meet you 95% of the way there. On 4/20/2013 6:16 PM, Joe Bowbeer wrote: > I agree with Mike and Tim. I'd remove all the parallelStream() methods > now - and add some or all back later if they ARE needed. > > I don't like the inconsistency of having parallelStream available on > some stream factories and not on others. > > > > > On Sat, Apr 20, 2013 at 3:10 PM, Tim Peierls > wrote: > > On Sat, Apr 20, 2013 at 6:01 PM, Mike Duigou > wrote: > > I would probably always use always .stream().parallel() > idiomatically for consistency unless parallelStream() told me > why I should use it instead. I say toss all of the > parallelStream() methods unless there's an impl efficiency > dependent reason to retain some of them. > > > Agreed. > > I see the discoverability of Collection.parallelStream() as a > potential pedagogical drawback. "Do I use parallelStream() or > stream().parallel()?" > > For most folks, the expectation and intuition will be sequential, so > take advantage of that: Let people come to c.stream().parallel() > slowly and deliberately, after getting their feet wet with c.stream(). > > --tim > > End of lambda-libs-spec-observers Digest, Vol 8, Issue 30 ********************************************************* From joe.bowbeer at gmail.com Sat Apr 20 16:47:08 2013 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Sat, 20 Apr 2013 16:47:08 -0700 Subject: Drop Arrays.parallelStream()? 
In-Reply-To: <517324BC.2050003@oracle.com> References: <51730D05.5030007@oracle.com> <51731894.80107@oracle.com> <517324BC.2050003@oracle.com> Message-ID: Thanks for clarifying. Most of your justifications seem to be a matter of taste, so there is no use arguing. (Taste, like foolishness, defies argument.) Alas, after cleansing my mind of "foolish hobgoblins" and other distracting remarks, I think your proposal is an improvement, even without squinting. On Apr 20, 2013 4:29 PM, "Brian Goetz" wrote: > Brian, What do you mean by the following? >> >> it was *already* a huge (but warranted) discoverability compromise >> to have the stream/parallelStream "bun" methods in the first place! >> Two buns for this case [...] would be too much. >> >> Are you referring to the fact the there is no ParallelStream type? >> > > No, that's a big plus -- one Stream to rule them all! > > The negative was having to have the > > .stream() > and > .parallelStream() > > methods at all. We originally really liked the idea of > > collection.filter(..)... > ^ no view method! > > but for various reasons reluctantly concluded it was untenable. But that > still doesn't mean we like having the new functionality be so far removed > from Collection. And if one layer removed is suboptimal, two is worse. > > Note that the common point that Mike, Tim, and I raised is consistency. >> Your proposal to remove methods is creating the inconsistency, so I >> > > Not really. We were already inconsistent; they were present for > Collection and for Array factories but not for range factories, generator > factories, etc. You could argue the new proposed state is more consistent > (all the view methods have a parallel counterpart; all the static factories > don't) but that's not what I consider its primary benefit. > > don't understand the comment that you're meeting us 95% of the way >> there... >> > > Removing all but one of the parallelStream methods. 
> > Still, why, if you're so interested in advertising the parallel >> features, do you *want* to remove these methods from Arrays? >> > > Simply: return on API surface area. The return for having it the one > extra method on Collection is large; the return for having 100 extra > methods for 100 infrequently-used factory methods is small (even in the > aggregate). Arrays.parallelStream() was eight more methods -- four types x > two forms. (Others here have argued that "too many forms of the same > method is a smell"; also there have been plenty of calls for a round of > YAGNIism.) > > but I don't particularly like the idea of encouraging people to >> think this is a sequential library with some parallel bags nailed on >> the side >> >> Then again, users like consistency... >> > > I'm not saying consistency is unimportant. But in my experience, > "consistency" can be used to justify nearly any position -- one can always > find a precedent in a complex system to be "consistent" with. So I want > more than mere consistency. (Not to mention many consistencies are of the > "foolish hobgoblin" variety.) > From tim at peierls.net Sat Apr 20 16:49:18 2013 From: tim at peierls.net (Tim Peierls) Date: Sat, 20 Apr 2013 19:49:18 -0400 Subject: Drop Arrays.parallelStream()? In-Reply-To: References: <51730D05.5030007@oracle.com> <51731894.80107@oracle.com> <517324BC.2050003@oracle.com> Message-ID: On Sat, Apr 20, 2013 at 7:47 PM, Joe Bowbeer wrote: > Thanks for clarifying. > > Most of your justifications seem to be a matter of taste, so there is no > use arguing. (Taste, like foolishness, defies argument.) > > Alas, after cleansing my mind of "foolish hobgoblins" and other > distracting remarks, I think your proposal is an improvement, even without > squinting. > Why "alas"? Or did was it auto-corrected/mistyped from "Also"? 
--tim From joe.bowbeer at gmail.com Sat Apr 20 16:56:16 2013 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Sat, 20 Apr 2013 16:56:16 -0700 Subject: Drop Arrays.parallelStream()? In-Reply-To: References: <51730D05.5030007@oracle.com> <51731894.80107@oracle.com> <517324BC.2050003@oracle.com> Message-ID: Strike "Alas". Thanks. On Sat, Apr 20, 2013 at 4:49 PM, Tim Peierls wrote: > On Sat, Apr 20, 2013 at 7:47 PM, Joe Bowbeer wrote: > >> Thanks for clarifying. >> >> Most of your justifications seem to be a matter of taste, so there is no >> use arguing. (Taste, like foolishness, defies argument.) >> >> Alas, after cleansing my mind of "foolish hobgoblins" and other >> distracting remarks, I think your proposal is an improvement, even without >> squinting. >> > > Why "alas"? Or did was it auto-corrected/mistyped from "Also"? > > --tim > From brian.goetz at oracle.com Sun Apr 21 11:19:35 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Sun, 21 Apr 2013 14:19:35 -0400 Subject: Static methods on Stream and friends Message-ID: <51742DB7.1000302@oracle.com> I moved the following from Streams to Stream: Stream.builder() Stream.empty() Stream.singleton(T) Stream.of(T...) Stream.iterate(T, T -> T) Stream.generate(i -> T) with the same on {Int,Long,Double}Stream, and also {Int,Long,Double}Stream.range(start, end) {Int,Long,Double}Stream.range(start, end, step) It was suggested on lambda-dev that we should rename singleton to simply be an overload of "of": Stream.of(T) Stream.of(T...) which seems reasonable. Remaining open issues: - Some people are unhappy that range is half-open (which also means people are constrained to ranges topping out at MAX_VALUE-1 rather than MAX_VALUE). Some options: - Add XxxStream.rangeExclusive(start, end) - Further doc hints, such as renaming the parameters to startInclusive / endExclusive - Nothing - Paul has suggested that generate be finite. 
While this is kind of yucky, the practical difference between infinite and long-sized is pretty much negligible, and the version based on LongStream.range().map() parallelizes much better. I propose to accept the suggestion of s/singleton/of/, go the "doc hint" route on range, and go finite on generate. Also never closed on whether there was value to ints() / longs() -- these show up in lots of teaching examples, though less so in real-world code. Still, teaching people how to think about this stuff is important. From tim at peierls.net Sun Apr 21 11:30:26 2013 From: tim at peierls.net (Tim Peierls) Date: Sun, 21 Apr 2013 14:30:26 -0400 Subject: Static methods on Stream and friends In-Reply-To: <51742DB7.1000302@oracle.com> References: <51742DB7.1000302@oracle.com> Message-ID: On Sun, Apr 21, 2013 at 2:19 PM, Brian Goetz wrote: > It was suggested on lambda-dev that we should rename singleton to simply > be an overload of "of": > > Stream.of(T) > Stream.of(T...) > > which seems reasonable. > Aren't there ambiguity problems with that pair of signatures? I would have thought something like this: Stream.of() // for empty Stream.of(T) // for singleton Stream.of(T, T, T...) // for two or more --tim From brian.goetz at oracle.com Sun Apr 21 11:35:32 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Sun, 21 Apr 2013 14:35:32 -0400 Subject: Static methods on Stream and friends In-Reply-To: References: <51742DB7.1000302@oracle.com> Message-ID: <51743174.1000007@oracle.com> Two things here: 1. Tim may be suggesting to go further and rename "Stream.empty" to "Stream.of()"? 2. Query about method selection. Method selection proceeds in three phases (see JLS 7/e 15.12.2): 1. no boxing or unboxing 2. with boxing/unboxing, but no varargs 3. with varargs. So, Stream.of(T) will be considered before Stream.of(T...) is -- even for boxed streams like Stream. So I believe there is no need to extend the variable arity signature to of(T, T, T...).
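The three-phase method selection cited above (JLS §15.12.2) can be observed with a pair of hypothetical overloads that mirror the proposed Stream.of(T) / Stream.of(T...). This is a sketch for illustration, not the Stream API itself: each overload returns a label, so the compiler's choice becomes visible at run time.

```java
class OverloadPhases {
    // Hypothetical stand-in for Stream.of(T): fixed arity.
    static <T> String of(T t) {
        return "fixed-arity";
    }

    // Hypothetical stand-in for Stream.of(T...): variable arity.
    @SafeVarargs
    static <T> String of(T... ts) {
        return "varargs(" + ts.length + ")";
    }

    public static void main(String[] args) {
        System.out.println(of("a"));      // fixed-arity: applicable in phase 1 (no boxing, no varargs)
        System.out.println(of(42));       // fixed-arity: phase 2 (boxing int -> Integer) still precedes varargs
        System.out.println(of("a", "b")); // varargs(2): only phase 3 finds an applicable method
        System.out.println(of());         // varargs(0): zero arguments match only the varargs form
    }
}
```

Because the single-argument call is resolved before varargs is even considered, the plain of(T) / of(T...) pair is unambiguous, which is the point being made against the of(T, T, T...) variant.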
On 4/21/2013 2:30 PM, Tim Peierls wrote: > On Sun, Apr 21, 2013 at 2:19 PM, Brian Goetz > wrote: > > It was suggested on lambda-dev that we should rename singleton to > simply be an overload of "of": > > Stream.of(T) > Stream.of(T...) > > which seems reasonable. > > > Aren't there ambiguity problems with that pair of signatures? I would > have thought something like this: > > Stream.of() // for empty > Stream.of(T) // for singleton > Stream.of(T, T, T...) // for two or more > > --tim From ali.ebrahimi1781 at gmail.com Sun Apr 21 14:05:19 2013 From: ali.ebrahimi1781 at gmail.com (Ali Ebrahimi) Date: Mon, 22 Apr 2013 01:35:19 +0430 Subject: Static methods on Stream and friends In-Reply-To: <51743174.1000007@oracle.com> References: <51742DB7.1000302@oracle.com> <51743174.1000007@oracle.com> Message-ID: Hi, don't you think varargs version support the two others? On Sun, Apr 21, 2013 at 11:05 PM, Brian Goetz wrote: > Two things here: > > 1. Tim may be suggesting to go further and rename "Stream.empty" to > "Stream.of()"? > > 2. Query about method selection. > > Method selection proceeds in three phases (see JLS 7/e 15.12.2): > > 1. no boxing or unboxing > 2. with boxing/unboxing, but no varargs > 3. with varargs. > > So, Stream.of(T) will be considered before Stream.of(T...) is -- even for > boxed streams like Stream. So I believe there is no need to > extend the variable arity signature to of(T, T, T...). > > > On 4/21/2013 2:30 PM, Tim Peierls wrote: > >> On Sun, Apr 21, 2013 at 2:19 PM, Brian Goetz > > wrote: >> >> It was suggested on lambda-dev that we should rename singleton to >> simply be an overload of "of": >> >> Stream.of(T) >> Stream.of(T...) >> >> which seems reasonable. >> >> >> Aren't there ambiguity problems with that pair of signatures? I would >> have thought something like this: >> >> Stream.of() // for empty >> Stream.of(T) // for singleton >> Stream.of(T, T, T...) 
// for two or more >> >> --tim >> > From kasperni at gmail.com Sun Apr 21 23:32:06 2013 From: kasperni at gmail.com (Kasper Nielsen) Date: Mon, 22 Apr 2013 13:32:06 +0700 Subject: Static methods on Stream and friends In-Reply-To: References: <51742DB7.1000302@oracle.com> Message-ID: > Stream.of() // for empty > Stream.of(T) // for singleton > Stream.of(T, T, T...) // for two or more > I've stopped using the latter form in my code, because sometimes you actually want to create something from an existing array and not just use the varargs signature. Having to do something like Stream.of(a[0], a[1], Arrays.copyOfRange(a, 2, a.length)) is annoying and a waste of resources for larger arrays. From david.lloyd at redhat.com Mon Apr 22 06:42:51 2013 From: david.lloyd at redhat.com (David M. Lloyd) Date: Mon, 22 Apr 2013 08:42:51 -0500 Subject: Drop Arrays.parallelStream()? In-Reply-To: <517318EA.7060801@oracle.com> References: <51730D05.5030007@oracle.com> <517318EA.7060801@oracle.com> Message-ID: <51753E5B.2040600@redhat.com> On 04/20/2013 05:38 PM, Brian Goetz wrote: >> For most folks, the expectation and intuition will be sequential, so >> take advantage of that: Let people come to c.stream().parallel() slowly >> and deliberately, after getting their feet wet with c.stream(). > > I have a slightly different viewpoint about the value of this sequential > intuition -- I view the pervasive "sequential expectation" as one of the > biggest challenges of this entire effort; people are *constantly* > bringing their incorrect sequential bias, which leads them to do stupid > things like using a one-element array as a way to "trick" the "stupid" > compiler into letting them capture a mutable local, or using lambdas as > arguments to map that mutate state that will be used during the > computation (in a non-thread-safe way), and then, when it's pointed out > what they're doing, shrug it off and say "yeah, but I'm not doing > it in parallel."
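A minimal sketch of the "one-element array" trick described above, next to the reduction it ought to be. The class and method names are hypothetical; the only library calls are standard Java 8 stream methods. The array version gives the right answer here only because its stream is sequential.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class CaptureVsReduce {
    // Builds the boxed list [1, 2, ..., n] used by the examples.
    static List<Integer> oneTo(int n) {
        return IntStream.rangeClosed(1, n).boxed().collect(Collectors.toList());
    }

    // The "trick": the array reference is effectively final, so the compiler
    // accepts the lambda, but its single element is shared mutable state.
    // Correct only because this stream is sequential; on a parallel stream
    // the unsynchronized += would be a data race.
    static int sumWithArrayTrick(List<Integer> xs) {
        int[] total = {0};
        xs.stream().forEach(x -> total[0] += x);
        return total[0];
    }

    // The same computation expressed as a reduction: no shared state, so the
    // result does not depend on whether the pipeline runs sequentially or in parallel.
    static int sumWithReduction(List<Integer> xs, boolean parallel) {
        return (parallel ? xs.parallelStream() : xs.stream())
                .mapToInt(Integer::intValue)
                .sum();
    }
}
```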
> > We've made a lot of design tradeoffs to merge sequential and parallel > streams. The result, I believe, is a clean one and will add to the > library's chances of still being useful in 10+ years, but I don't > particularly like the idea of encouraging people to think this is a > sequential library with some parallel bags nailed on the side. Well, just the term "stream" really screams "sequential", so there's that. -- - DML From paul.sandoz at oracle.com Mon Apr 22 09:09:56 2013 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Mon, 22 Apr 2013 18:09:56 +0200 Subject: Pattern.splitAsStream/asPredicate Message-ID: <869735E7-2AE3-4D0B-B7B8-D7FC462F718F@oracle.com> Hi, It seems useful to provide an ability to create a stream from matches of a pattern, plus as a bonus create a predicate for matches of a pattern. See below for more details: http://cr.openjdk.java.net/~psandoz/lambda/jdk-8012646/webrev/ Thoughts? Paul. From joe.bowbeer at gmail.com Wed Apr 24 09:23:43 2013 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Wed, 24 Apr 2013 09:23:43 -0700 Subject: Pattern.splitAsStream/asPredicate In-Reply-To: <869735E7-2AE3-4D0B-B7B8-D7FC462F718F@oracle.com> References: <869735E7-2AE3-4D0B-B7B8-D7FC462F718F@oracle.com> Message-ID: Makes sense to me that one might want to generate a stream from a Pattern. Is there more to this than splitAsStream? It's also interesting to consider the absence of parallel stream options at this and the other stream factory sites. On Apr 22, 2013 9:10 AM, "Paul Sandoz" wrote: > Hi, > > It seems useful to provide an ability to create a stream from matches of a > pattern, plus as a bonus create a predicate for matches of a pattern. > > See below for more details: > > http://cr.openjdk.java.net/~psandoz/lambda/jdk-8012646/webrev/ > > Thoughts? > > Paul. 
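For reference, the two methods proposed here did ship in Java 8 as Pattern.splitAsStream(CharSequence) and Pattern.asPredicate(). The usage sketch below is illustrative only; the class name PatternStreams and the sample patterns are assumptions.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class PatternStreams {
    private static final Pattern COMMA = Pattern.compile("\\s*,\\s*");
    private static final Pattern DIGIT = Pattern.compile("\\d");

    // Splits the input around each match of the pattern, producing a Stream<String>.
    static List<String> fields(String line) {
        return COMMA.splitAsStream(line).collect(Collectors.toList());
    }

    // asPredicate() tests whether the pattern is found somewhere in the string
    // (Matcher.find semantics), not whether the whole string matches.
    static List<String> withDigits(List<String> words) {
        Predicate<String> hasDigit = DIGIT.asPredicate();
        return words.stream().filter(hasDigit).collect(Collectors.toList());
    }
}
```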
> > From brian.goetz at oracle.com Wed Apr 24 10:13:35 2013 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 24 Apr 2013 12:13:35 -0500 Subject: Pattern.splitAsStream/asPredicate In-Reply-To: References: <869735E7-2AE3-4D0B-B7B8-D7FC462F718F@oracle.com> Message-ID: There definitely could be more to this. For example, a common usage pattern for matching is: while (more) { // get the next match // get the stuff between the last match and the start of this match // do something with that // do something with the current match } So while getting the matches is good, getting at the stuff between the matches is also sometimes useful. Is there an easy way to do that, such as providing a Stream? There's an easy way for streams like this to be never-parallel -- create them from a Spliterator whose trySplit always returns null. Then, even parallel execution will always be serial. I don't think there's a need for an abstraction for that -- just build off a non-splittable iterator. But, there may also be some parallelism to extract, if the post-processing on a match is high-Q. Then you might still be able to overcome the sequentiality of generating matches if the per-match post processing is high enough. On Apr 24, 2013, at 11:23 AM, Joe Bowbeer wrote: > Makes sense to me that one might want to generate a stream from a Pattern. Is there more to this than splitAsStream? > > It's also interesting to consider the absence of parallel stream options at this and the other stream factory sites. > > > > > On Apr 22, 2013 9:10 AM, "Paul Sandoz" wrote: > Hi, > > It seems useful to provide an ability to create a stream from matches of a pattern, plus as a bonus create a predicate for matches of a pattern. > > See below for more details: > > http://cr.openjdk.java.net/~psandoz/lambda/jdk-8012646/webrev/ > > Thoughts? > > Paul. 
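Brian's "while (more)" loop can be written with a plain Matcher by remembering where the previous match ended. This is a hedged sketch: MatchWalker and the "text:" / "match:" tags are made up for illustration; only Matcher.find(), start(), end() and group() are real API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchWalker {
    // For each match, first records the text between the previous match and
    // this one, then the match itself; finally records whatever follows the
    // last match. Entries are tagged only so the two kinds can be told apart.
    static List<String> walk(Pattern p, CharSequence input) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(input);
        int last = 0; // end index of the previous match
        while (m.find()) {
            out.add("text:" + input.subSequence(last, m.start()));
            out.add("match:" + m.group());
            last = m.end();
        }
        out.add("text:" + input.subSequence(last, input.length()));
        return out;
    }
}
```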
> From forax at univ-mlv.fr Wed Apr 24 10:16:55 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Wed, 24 Apr 2013 19:16:55 +0200 Subject: Pattern.splitAsStream/asPredicate In-Reply-To: <869735E7-2AE3-4D0B-B7B8-D7FC462F718F@oracle.com> References: <869735E7-2AE3-4D0B-B7B8-D7FC462F718F@oracle.com> Message-ID: <51781387.90902@univ-mlv.fr> On 04/22/2013 06:09 PM, Paul Sandoz wrote: > Hi, > > It seems useful to provide an ability to create a stream from matches of a pattern, plus as a bonus create a predicate for matches of a pattern. > > See below for more details: > > http://cr.openjdk.java.net/~psandoz/lambda/jdk-8012646/webrev/ > > Thoughts? > > Paul. > Hi Paul, MatcherIterator should not be a local class of splitAsStream, because the reference to the current Pattern will be kept even if the Matcher no longer references it (note that the current implementation of the Matcher always references the Pattern object, but maybe at some point the automata will be transformed to bytecode, as V8 does for example). To summarize, the class MatcherIterator defines 4 fields instead of 3. There is no need to initialize current and nextElement to their default values; javac emits redundant bytecodes for that. In next(), the else is useless, and it's rare in the JDK sources to find an else after a throw. In hasNext(), you can reorder the branches of the first test to avoid shifting the code to the right. if (nextElement != null) { return true; } if (current == input.length()) { ...
and yes, this method is useful :) cheers, Rémi From forax at univ-mlv.fr Fri Apr 26 02:22:07 2013 From: forax at univ-mlv.fr (Remi Forax) Date: Fri, 26 Apr 2013 11:22:07 +0200 Subject: RFR : JDK-8001642 : Add Optional, OptionalDouble, OptionalInt, OptionalLong In-Reply-To: References: <513710CC.3010903@univ-mlv.fr> Message-ID: <517A473F.3030906@univ-mlv.fr> On 03/28/2013 07:23 PM, Kevin Bourrillion wrote: > I do NOT wish to restart this discussion; I just noticed a falsehood > that was never exposed: What I should have written is that Guava, unlike the JDK, allows creating an Optional from null; whether it stores null or not is an implementation detail. Rémi > > > On Wed, Mar 6, 2013 at 1:47 AM, Remi Forax > wrote: > > Google's Guava, which is a popular library, defines a class named > Optional, but allows storing null, unlike the currently proposed > implementation; this will generate a lot of confusion and > frustration. > > > Guava's Optional /cannot/ be used to hold null. So this particular > concern is not a concern at all. > > > -- > Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com > From paul.sandoz at oracle.com Fri Apr 26 03:37:59 2013 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Fri, 26 Apr 2013 12:37:59 +0200 Subject: Pattern.splitAsStream/asPredicate In-Reply-To: <51781387.90902@univ-mlv.fr> References: <869735E7-2AE3-4D0B-B7B8-D7FC462F718F@oracle.com> <51781387.90902@univ-mlv.fr> Message-ID: <7C8F8184-93B4-4895-8F03-46340B84177A@oracle.com> On Apr 24, 2013, at 7:16 PM, Remi Forax wrote: > On 04/22/2013 06:09 PM, Paul Sandoz wrote: >> Hi, >> >> It seems useful to provide an ability to create a stream from matches of a pattern, plus as a bonus create a predicate for matches of a pattern. >> >> See below for more details: >> >> http://cr.openjdk.java.net/~psandoz/lambda/jdk-8012646/webrev/ >> >> Thoughts? >> >> Paul.
>> > > Hi Paul, > MatcherIterator should not be a local class of splitAsStream, > because the reference to the current Pattern will be kept > even if the Matcher no longer references it > (note that the current implementation of the Matcher always references > the Pattern object, but maybe at some point the automata will be > transformed to bytecode, as V8 does for example). Matcher returns it too: /** * Returns the pattern that is interpreted by this matcher. * * @return The pattern for which this matcher was created */ public Pattern pattern() { > To summarize, the class MatcherIterator defines 4 fields instead of 3. > Yes, it's an inner class, but I prefer the locality, since splitAsStream is the only method that uses the class. > There is no need to initialize current and nextElement to their default values; > javac emits redundant bytecodes for that. > > In next(), the else is useless, and it's rare in the JDK sources to find an else after a throw. > In hasNext(), you can reorder the branches of the first test to avoid shifting the code to the right. > if (nextElement != null) { > return true; > } > if (current == input.length()) { > ... > Thanks, I have cleaned up that code. Paul. > and yes, this method is useful :) > > cheers, > Rémi > > > > > From paul.sandoz at oracle.com Fri Apr 26 04:00:27 2013 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Fri, 26 Apr 2013 13:00:27 +0200 Subject: Pattern.splitAsStream/asPredicate In-Reply-To: References: <869735E7-2AE3-4D0B-B7B8-D7FC462F718F@oracle.com> Message-ID: <42E53CA6-122F-41B8-9B49-F394396F5DAB@oracle.com> On Apr 24, 2013, at 7:13 PM, Brian Goetz wrote: > There definitely could be more to this.
For example, a common usage pattern for matching is: > > while (more) { > // get the next match > // get the stuff between the last match and the start of this match > // do something with that > // do something with the current match > } > > So while getting the matches is good, getting at the stuff between the matches is also sometimes useful. Is there an easy way to do that, such as providing a Stream? > It's awkward with the current types. A Matcher of a Pattern is mutable and MatchResult (which would need to be cloned via Matcher.toMatchResult) only provides access to a match. The prefix characters before a match need to be tracked independently, as do the remaining characters after no further matches. So we would require a stream of say (String prefix, MatchResult r) where r is null, or an empty match, for the last tuple in the stream. We can add methods to Matcher that behave the same way as the String bearing methods: public String replaceAll(Function f) public String replaceFirst(Function f) > There's an easy way for streams like this to be never-parallel -- create them from a Spliterator whose trySplit always returns null. Then, even parallel execution will always be serial. I don't think there's a need for an abstraction for that -- just build off a non-splittable iterator. > > But, there may also be some parallelism to extract, if the post-processing on a match is high-Q. Then you might still be able to overcome the sequentiality of generating matches if the per-match post processing is high enough. > Right, i think it would be incorrect to make any predictions about Q. Paul. 
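Brian's suggestion of a source whose trySplit() always returns null can be sketched as a small adapter over any Iterator. NonSplittingSpliterator is a hypothetical name, not a JDK class; StreamSupport.stream(Spliterator, boolean) is the Java 8 factory for building a stream from a spliterator.

```java
import java.util.Iterator;
import java.util.Spliterator;
import java.util.function.Consumer;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical adapter: wraps an Iterator in a Spliterator that never splits.
final class NonSplittingSpliterator<T> implements Spliterator<T> {
    private final Iterator<? extends T> it;

    NonSplittingSpliterator(Iterator<? extends T> it) {
        this.it = it;
    }

    @Override
    public boolean tryAdvance(Consumer<? super T> action) {
        if (!it.hasNext()) {
            return false;
        }
        action.accept(it.next());
        return true;
    }

    // Returning null means "cannot split": even a parallel pipeline then
    // traverses this source in a single task, in encounter order.
    @Override
    public Spliterator<T> trySplit() {
        return null;
    }

    @Override
    public long estimateSize() {
        return Long.MAX_VALUE; // size unknown
    }

    @Override
    public int characteristics() {
        return ORDERED;
    }

    static <E> Stream<E> stream(Iterator<? extends E> it, boolean parallel) {
        return StreamSupport.stream(new NonSplittingSpliterator<E>(it), parallel);
    }
}
```

Because the source never hands off work, asking for a parallel stream does not change how it is traversed; as Paul notes, any gain would have to come from expensive per-element work downstream, which a single-task traversal like this one does not exploit.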
From tim at peierls.net Fri Apr 26 04:45:37 2013 From: tim at peierls.net (Tim Peierls) Date: Fri, 26 Apr 2013 07:45:37 -0400 Subject: RFR : JDK-8001642 : Add Optional, OptionalDouble, OptionalInt, OptionalLong In-Reply-To: <517A473F.3030906@univ-mlv.fr> References: <513710CC.3010903@univ-mlv.fr> <517A473F.3030906@univ-mlv.fr> Message-ID: On Fri, Apr 26, 2013 at 5:22 AM, Remi Forax wrote: > On 03/28/2013 07:23 PM, Kevin Bourrillion wrote: > >> I do NOT wish to restart this discussion; I just noticed a falsehood that >> was never exposed: Guava's Optional /cannot/ be used to hold null. So this >> particular concern is not a concern at all. > > > What I should have written is that Guava unlike the JDK allows to create > an Optional from null, > the fact that it stores null or not is an implementation detail. > Kevin's point was that there's no need to worry about confusion over this particular difference. The method used in Guava to create an Optional from a reference that might be null has a name that makes this very clear: Optional.fromNullable. Both Guava and JDK have null-rejecting Optional.of() methods. --tim
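For comparison, the JDK class that eventually shipped in Java 8 draws the same line with different names: java.util.Optional.of rejects null, and Optional.ofNullable plays the role of Guava's fromNullable. The wrapper class below is only an illustrative sketch.

```java
import java.util.Optional;

class OptionalFactories {
    // Optional.ofNullable: a null reference becomes an empty Optional.
    static Optional<String> lookup(String maybeNull) {
        return Optional.ofNullable(maybeNull);
    }

    // Optional.of: rejects null eagerly with a NullPointerException.
    static boolean ofRejectsNull() {
        try {
            Optional.of((String) null);
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }
}
```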