From dl at cs.oswego.edu Mon Oct 1 07:37:55 2012 From: dl at cs.oswego.edu (Doug Lea) Date: Mon, 01 Oct 2012 10:37:55 -0400 Subject: Non interference enforcement In-Reply-To: <5063028B.8040800@univ-mlv.fr> References: <505399F5.80600@oracle.com> <50548FAB.5070602@cs.oswego.edu> <50573CFC.6070607@oracle.com> <505C626F.3080900@univ-mlv.fr> <505C9196.8020301@univ-mlv.fr> <505C9510.1090606@oracle.com> <505DA847.80207@cs.oswego.edu> <505DAE3A.1000702@univ-mlv.fr> <50604602.5040306@cs.oswego.edu> <50608569.7070603@oracle.com> <5060EB07.5060007@cs.oswego.edu> <5060F67B.6060802@oracle.com> <5061978C.5040605@cs.oswego.edu> <5061C43C.10501@oracle.com> <5063028B.8040800@univ-mlv.fr> Message-ID: <5069AAC3.5070903@cs.oswego.edu> On 09/26/12 09:26, Remi Forax wrote: > We currently ask users to write lambdas that don't interfere with the source > collection > of a stream, but it's not enforced in the code. > > For example, > list.stream().forEach(e -> { list.remove(e); }); > may work or not, depending on how the pipeline is implemented. > > This is a serious departure from the way the current java.util collections work, > and I wonder if we should not keep the fail-fast guarantee for those collections. > Relatedly: The soon (I hope) forthcoming StampedLock will make efficient read-write-locked collections etc. easier to build. Background: There is nothing you can do to fully automate such things, because you don't know for sure whether methods on an arbitrary collection that "should" be read-only actually are. But for those that do/will exist, iterators are still a big problem, since you cannot implement them as: readLock; process elements; unlock (or partially optimistic variants). Instead, these must do some expensive work on each iteration, and either fail or do some even more expensive work on interference.
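The fail-fast guarantee in question can be seen with today's java.util collections; a minimal sketch (names here are illustrative, not from the mail):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

public class FailFastDemo {
    public static String run() {
        List<Integer> list = new ArrayList<>(Arrays.asList(1, 2, 3, 4));
        try {
            for (Integer e : list) {
                list.remove(e); // structural modification while iterating
            }
            return "no exception";
        } catch (ConcurrentModificationException expected) {
            // ArrayList's iterator detects the interference on the next access
            return "fail-fast triggered";
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The remove succeeds once; the iterator's next access notices the modification count changed and fails fast, which is the behavior Remi would like streams to keep.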
So, at least for the StampedLock-based Vector work-alike class that will be in j.u.c preview (and may be releasable in j.u.c), I include: void forEachReadOnly(Block action) That will be much cheaper wrt sync, and hopefully not negated by megamorphism etc. It would *almost* be a good idea to default to using this for any Stream tie-ins, but it seems impossible to read people's minds about intent here. So for now I leave this issue as a future possibility. -Doug From andrey.breslav at jetbrains.com Sun Oct 7 09:52:32 2012 From: andrey.breslav at jetbrains.com (Andrey Breslav) Date: Sun, 7 Oct 2012 20:52:32 +0400 Subject: Stream operations -- current set In-Reply-To: <505399F5.80600@oracle.com> References: <505399F5.80600@oracle.com> Message-ID: > Intermediate / Lazy (Stateless) > ------------------------------- > > MapStream mapped(Mapper mapper); From my experience, the opposite one is much more useful: MapStream mapped(Mapper mapper); Are we implying that in this case one should use groupBy() (create many pointless collections) and then map() to throw the collections away? > Intermediate / Lazy (Stateful) > ------------------------------ > > Stream uniqueElements(); > > Stream sorted(Comparator comparator); I don't see how these two operations fit in here: sort() has no chance of being lazy, AFAIU, and uniqueElements() needs misleadingly much state to be stored to be considered "reasonably lazy". If I am wrong, please correct me. > Stream cumulate(BinaryOperator operator); Could you explain what this does? > T[] toArray(ArrayFactory) I support this and suggest to get rid of Object[] toArray() > Don has suggested a multi-valued version of groupBy: > > Map> groupByMulti(FlatMapper classifier); > > which is easy to implement and makes sense to me. A philosophical question: what is our take on how minimal we want to keep the Stream API? > There are a few others in the maybe-should-have list, including limit/skip/slice. 
But I'd like to nail down the details of the must-haves first. I wouldn't call some of those "must-haves". So, the minimality question above is important. -- Andrey Breslav http://jetbrains.com Develop with pleasure! From brian.goetz at oracle.com Sun Oct 7 10:52:21 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Sun, 07 Oct 2012 13:52:21 -0400 Subject: Stream operations -- current set In-Reply-To: References: <505399F5.80600@oracle.com> Message-ID: <5071C155.6060805@oracle.com> >> MapStream mapped(Mapper mapper); > From my experience, the opposite one is much more useful: > MapStream mapped(Mapper mapper); > Are we implying that in this case one should use groupBy() (create many pointless collections) and then map() to throw the collections away? That works, but cheaper is: foos.mapped(mapper).swap() which transposes keys and values (probably needs a better name.) >> Intermediate / Lazy (Stateful) >> ------------------------------ >> >> Stream uniqueElements(); >> >> Stream sorted(Comparator comparator); > > I don't see how these two operations fit in here: sort() has no chance of being lazy, AFAIU, and uniqueElements() needs misleadingly much state to be stored to be considered "reasonably lazy". If I am wrong, please correct me. They are both lazy in the sense that no significant computation happens when the sort() method is called. Unlike filter/map/flatMap/keys/swap/etc, they are both stateful. Sort does not disgorge the first element until all the elements have been seen, but it is still possible to get some laziness benefit anyway; for example, if you use a heap to sort, then foo.sorted().findFirst() can get the first element without having sorted the whole remainder of the stream. So there is still *some* laziness to be extracted. Duplicate removal is "even more lazy", in that it can operate in strict one-in, one-out fashion -- but has to accumulate a lot of state to do so. 
(Unless we know the stream to already be sorted, in which case we can use a one-element history buffer instead of a set of seen elements.) Alternately, we can fuse sort/uniq into a single operation if they appear contiguously. Also, such operations cannot be lazy in parallel, so they force a separate parallel pass. (However, such passes are often information-creating; for example, in foo.filter().sorted().map().toArray(), we don't know the size after filtering, but we do after sorting, meaning we can fuse mapping and array packing into one operation.) >> Stream cumulate(BinaryOperator operator); > Could you explain what this does? Also known as "prefix". Given a sequence of elements E1..En, and an associative operator *, computes the sequence E1, E1*E2, E1*E2*E3, etc. Perhaps surprisingly, it can be computed in parallel in O(log n) time. This shows up in all sorts of parallel algorithms, and is often the key to turning an n^2 algorithm into an n*log n algorithm. >> T[] toArray(ArrayFactory) > I support this and suggest to get rid of Object[] toArray() Alternately, Don's suggestion of T[] toArray(Class) >> Don has suggested a multi-valued version of groupBy: >> >> Map> groupByMulti(FlatMapper classifier); >> >> which is easy to implement and makes sense to me. > A philosophical question: what is our take on how minimal we want to keep the Stream API? Right. This is a good example of something in the grey area. 
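The cumulate/"prefix" operation described above can be sketched sequentially for int elements (Java 8 later shipped Arrays.parallelPrefix for the parallel O(log n) form); cumulate is an illustrative name:

```java
import java.util.Arrays;
import java.util.function.IntBinaryOperator;

public class PrefixDemo {
    // Sequential prefix: out[i] = e0 op e1 op ... op ei.
    static int[] cumulate(int[] in, IntBinaryOperator op) {
        int[] out = in.clone();
        for (int i = 1; i < out.length; i++) {
            out[i] = op.applyAsInt(out[i - 1], out[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        // Running sums of 1..4
        System.out.println(Arrays.toString(cumulate(new int[]{1, 2, 3, 4}, Integer::sum)));
        // [1, 3, 6, 10]
    }
}
```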
From andrey.breslav at jetbrains.com Sun Oct 7 11:15:44 2012 From: andrey.breslav at jetbrains.com (Andrey Breslav) Date: Sun, 7 Oct 2012 22:15:44 +0400 Subject: Stream operations -- current set In-Reply-To: <5071C155.6060805@oracle.com> References: <505399F5.80600@oracle.com> <5071C155.6060805@oracle.com> Message-ID: <4ED3CD8E-E3E5-47BE-B4DD-7335E1BD53FF@jetbrains.com> >>> Intermediate / Lazy (Stateful) >>> ------------------------------ >>> >>> Stream uniqueElements(); >>> >>> Stream sorted(Comparator comparator); >> >> I don't see how these two operations fit in here: sort() has no chance of being lazy, AFAIU, and uniqueElements() needs misleadingly much state to be stored to be considered "reasonably lazy". If I am wrong, please correct me. > > They are both lazy in the sense that no significant computation happens when the sort() method is called. Unlike filter/map/flatMap/keys/swap/etc, they are both stateful. > > Sort does not disgorge the first element until all the elements have been seen, but it is still possible to get some laziness benefit anyway; for example, if you use a heap to sort, then > > foo.sorted().findFirst() > > can get the first element without having sorted the whole remainder of the stream. So there is still *some* laziness to be extracted. I don't really buy this. I can see two big concerns in favor of laziness: time and memory. For sort() we have to spend O(N) time to save the rest of O(N*log(N)), which isn't enough gain to have this method at all. As for memory, we don't save anything. I propose to call sort() a terminal method and have it honestly return a List and not mislead the users. For uniqueElements() it's correct that it is "more lazy", although I'm still not convinced that it does not mislead the users. I have an unrelated request there: using "natural" equals() for defining what's a "duplicate" is unfair: we allow comparators for sorting, thus we should allow comparison strategies for uniqueElements() and other such methods.
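Such a comparison strategy for uniqueness could be sketched as a key-extractor variant; distinctBy and its shape here are hypothetical, not a proposed signature:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

public class DistinctByDemo {
    // Two elements count as duplicates when they map to the same key.
    static <T, K> List<T> distinctBy(List<T> in, Function<? super T, ? extends K> key) {
        Set<K> seen = new HashSet<>();
        List<T> out = new ArrayList<>();
        for (T t : in) {
            if (seen.add(key.apply(t))) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Case-insensitive uniqueness: keeps the first of "a"/"A"
        System.out.println(distinctBy(Arrays.asList("a", "A", "b"), String::toLowerCase));
    }
}
```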
> Duplicate removal is "even more lazy", in that it can operate in strict one-in, one-out fashion -- but has to accumulate a lot of state to do so. (Unless we know the stream to already be sorted, in which case we can use a one-element history buffer instead of a set of seen elements.) Alternately, we can fuse sort/uniq into a single operation if they appear contiguously. Could you explain "fusing" a little more? >>> Stream cumulate(BinaryOperator operator); >> Could you explain what this does? > > Also known as "prefix". Given a sequence of elements E1..En, and an associative operator *, computes the sequence > > E1, E1*E2, E1*E2*E3, etc. > > Perhaps surprisingly, it can be computed in parallel in O(log n) time. This shows up in all sorts of parallel algorithms, and is often the key to turning an n^2 algorithm into an n*log n algorithm. Makes sense. Thanks. Looks like we'd need to provide a textbook on implementing efficient algorithms over the new Collections API. >>> T[] toArray(ArrayFactory) >> I support this and suggest to get rid of Object[] toArray() > > Alternately, Don's suggestion of > T[] toArray(Class) Is there a performance penalty for creating arrays through reflection? BTW, we could create a Destination that wraps an array and the users could retrieve it from there. This way we'd get rid of toArray() altogether, which seems appealing to me. >>> Don has suggested a multi-valued version of groupBy: >>> >>> Map> groupByMulti(FlatMapper classifier); >>> >>> which is easy to implement and makes sense to me. >> A philosophical question: what is our take on how minimal we want to keep the Stream API? > > Right. This is a good example of something in the grey area. And still, do we have some kind of a one-paragraph description of what concepts should be in this API and what shouldn't? So far I've got one clear criterion: * if foo() is crucial for performant parallel algorithms, we add it. It would be nice if we had some other bullet-points like this.
-- Andrey Breslav http://jetbrains.com Develop with pleasure! From brian.goetz at oracle.com Sun Oct 7 12:02:59 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Sun, 07 Oct 2012 15:02:59 -0400 Subject: Stream operations -- current set In-Reply-To: <4ED3CD8E-E3E5-47BE-B4DD-7335E1BD53FF@jetbrains.com> References: <505399F5.80600@oracle.com> <5071C155.6060805@oracle.com> <4ED3CD8E-E3E5-47BE-B4DD-7335E1BD53FF@jetbrains.com> Message-ID: <5071D1E3.2070502@oracle.com> > For sort() we have to spend O(N) time to save the rest of O(N*log(N)), which isn't enough gain to have this method at all. As of memory, we don't save anything. It's not strictly about efficiency; it's about presenting a programming model that is consistent and effective for the users. Sorting shows up in the middle of a pipeline frequently enough that moving it outside the model makes it less useful. In an earlier version, where the stream methods were on Iterable rather than Stream, making sort a terminal operation would have been less disruptive than it is now, since you would have been able to keep going. Now, you'd have to say: collection.stream().filter().sort().stream().map().forEach() ^ extra internal bun The reality is that all the stateful ops are on a slippery slope between intermediate and terminal; saying "sort is terminal but uniq/limit/slice are not" is among the least consistent choices.
When we construct a pipeline of operations: collection - filter - sorted - map - toArray we build a linked-list representation of the stages, and evaluate it when we know the user wants an answer. When we evaluate the pipeline, we construct a chain of iterators or sinks and pull/push the data through it. Many of these operations are amenable to being combined into a single operation. For example, filter+map can easily be fused into a single operation, even if we don't externally expose a filterMap operation, and this shortens the chains of iterators/sinks. A more effective fusing is sort+uniq, since it eliminates a lot of intermediate state and computation. >>>> T[] toArray(ArrayFactory) >>> I support this and suggest to get rid of Object[] toArray() >> >> Alternately, Don's suggestion of >> T[] toArray(Class) > Is there a performance penalty for creating arrays through reflection? We're evaluating whether this is always intrinsified or not. > BTW, we could create a Destination that wraps an array and the users could retrieve it from there. This way we'd get rid of toArray() altogether, which seems appealing to me. Yes, such wrappers exist in Arrays. From andrey.breslav at jetbrains.com Sun Oct 7 12:10:31 2012 From: andrey.breslav at jetbrains.com (Andrey Breslav) Date: Sun, 7 Oct 2012 23:10:31 +0400 Subject: Stream operations -- current set In-Reply-To: <5071D1E3.2070502@oracle.com> References: <505399F5.80600@oracle.com> <5071C155.6060805@oracle.com> <4ED3CD8E-E3E5-47BE-B4DD-7335E1BD53FF@jetbrains.com> <5071D1E3.2070502@oracle.com> Message-ID: <8FDA276A-2E2B-44FA-8A5B-86D63A725C4D@jetbrains.com> > Its not strictly about efficiency; its about presenting a programming model that is consistent and effective for the users. Sorting shows up in the middle of a pipeline frequently enough that moving it outside the model makes it less useful.
Then, it seems that my problem would be solved by renaming some things: e.g., if we stop calling things "lazy" when they are lazy only in a special sense (albeit technically very correct). Or we could say something else instead of "stream", because sorting a stream feels wrong? Not that I had a particular name in mind, but it seems that there may be room for improvement. From the pedagogical perspective, it is super-important to keep the concepts named as neatly as possible, otherwise we'll struggle with lots of strings attached while explaining these APIs. >> BTW, we could create a Destination that wraps an array and the users could retrieve it from there. This way we'd get rid of toArray() altogether, which seems appealing to me. > Yes, such wrappers exist in Arrays. Then I'd be happy to drop toArray() from Stream. -- Andrey Breslav http://jetbrains.com Develop with pleasure! From andrey.breslav at jetbrains.com Sun Oct 7 12:38:43 2012 From: andrey.breslav at jetbrains.com (Andrey Breslav) Date: Sun, 7 Oct 2012 23:38:43 +0400 Subject: ArrayFactory SAM type / toArray In-Reply-To: References: <505A2D81.3050306@oracle.com> <505A30EF.3090201@univ-mlv.fr> <505A35DB.90601@oracle.com> <505A4397.6060501@oracle.com> <505AEF34.2070200@oracle.com> Message-ID: Just to repeat on this thread: I don't see why we need toArray() on Streams at all: we can make it work with .into() by providing corresponding Destination implementations: stream().into(new ArrayDestination(Class/Lambda/whatever)).array() And I doubt that there is such a huge number of use cases for creating arrays from streams that this solution isn't good enough. -- Andrey Breslav http://jetbrains.com Develop with pleasure!
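The sink-chain evaluation Brian describes above, with filter+map collapsed into one per-element step, can be sketched as follows; Sink and fusedFilterMap are illustrative names, not the real implementation:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

public class FuseDemo {
    // One stage of a push-style pipeline.
    interface Sink<T> { void accept(T t); }

    // filter + map fused into a single accept() body, shortening the chain of sinks.
    static <T, R> Sink<T> fusedFilterMap(Predicate<? super T> filter,
                                         Function<? super T, ? extends R> mapper,
                                         Sink<R> downstream) {
        return t -> {
            if (filter.test(t)) {
                downstream.accept(mapper.apply(t));
            }
        };
    }

    public static void main(String[] args) {
        List<Integer> out = new ArrayList<>();
        Sink<Integer> head = fusedFilterMap(i -> i % 2 == 0, i -> i * 10, out::add);
        for (int i = 1; i <= 5; i++) {
            head.accept(i); // push elements through the fused stage
        }
        System.out.println(out); // even elements, times ten
    }
}
```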
From brian.goetz at oracle.com Sun Oct 7 14:52:03 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Sun, 07 Oct 2012 17:52:03 -0400 Subject: ArrayFactory SAM type / toArray In-Reply-To: References: <505A2D81.3050306@oracle.com> <505A30EF.3090201@univ-mlv.fr> <505A35DB.90601@oracle.com> <505A4397.6060501@oracle.com> <505AEF34.2070200@oracle.com> Message-ID: <5071F983.3020406@oracle.com> That presumes you know the size of the result, or are willing to tolerate truncation/underflow. There are cases where the stream knows the size but the client does not, and in those cases, treating array as a destination leaves the user with no way to get an array without extra copying. On 10/7/2012 3:38 PM, Andrey Breslav wrote: > Just to repeat on this thread: > > I don't see why we need toArray() on Streams at all: we can make it work with .into() by providing corresponding Destination implementations: > > stream().into(new ArrayDestination(Class/Lambda/whatever)).array() > > And I doubt that there is such a huge number of use cases for creating arrays from streams that this solution isn't good enough. > > -- > Andrey Breslav > http://jetbrains.com > Develop with pleasure! > > From andrey.breslav at jetbrains.com Sun Oct 7 17:29:50 2012 From: andrey.breslav at jetbrains.com (Andrey Breslav) Date: Mon, 8 Oct 2012 04:29:50 +0400 Subject: ArrayFactory SAM type / toArray In-Reply-To: <5071F983.3020406@oracle.com> References: <505A2D81.3050306@oracle.com> <505A30EF.3090201@univ-mlv.fr> <505A35DB.90601@oracle.com> <505A4397.6060501@oracle.com> <505AEF34.2070200@oracle.com> <5071F983.3020406@oracle.com> Message-ID: <58CE2A9C-6725-464F-9368-281D4F1DD131@jetbrains.com> > That presumes you know the size of the result, or are willing to tolerate truncation/underflow. There are cases where the stream knows the size but the client does not, and in those cases, treating array as a destination leaves the user with no way to get an array without extra copying. 
Fair point, but is creating an array from a stream SO critical that an O(N) time/space cost is so much of a penalty here? -- Andrey Breslav http://jetbrains.com Develop with pleasure! From brian.goetz at oracle.com Sun Oct 7 21:19:14 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 08 Oct 2012 00:19:14 -0400 Subject: Stream operations -- current set In-Reply-To: <8FDA276A-2E2B-44FA-8A5B-86D63A725C4D@jetbrains.com> References: <505399F5.80600@oracle.com> <5071C155.6060805@oracle.com> <4ED3CD8E-E3E5-47BE-B4DD-7335E1BD53FF@jetbrains.com> <5071D1E3.2070502@oracle.com> <8FDA276A-2E2B-44FA-8A5B-86D63A725C4D@jetbrains.com> Message-ID: <50725442.2080708@oracle.com> >> Its not strictly about efficiency; its about presenting a >> programming model that is consistent and effective for the users. >> Sorting shows up in the middle of a pipeline frequently enough that >> moving it outside the model makes it less useful. > Then, it seems that my problem would be solved by renaming some > things: e.g., if we stop calling things "lazy" when they are lazy > only in a special sense (albeit technically very correct). Or we > could say something else instead of "stream", because sorting a > stream feels wrong? Not that I had a particular name in mind, but it > seems that there may be room for improvement. Agreed. In fact, we've done that already in the code -- but have lagged behind doing that in our discussions. We have renamed them "intermediate" and "terminal" operations, largely for the reasons you state. Intermediate operations have both stateless (truly lazy) and stateful (varying degrees of laziness) varieties. Are these names better? From forax at univ-mlv.fr Mon Oct 15 15:09:35 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Tue, 16 Oct 2012 00:09:35 +0200 Subject: Introduce Covariance/Contravariance at declaration site for Java 9 ?
Message-ID: <507C899F.1000907@univ-mlv.fr> I've just read the presentation of Stuart Marks at JavaOne [1]; all the examples after slide 32 (the first ones that use lambdas) are not written correctly, because the method signatures do not use wildcards. Brian, I know that we will not be able to introduce covariance/contravariance at declaration site for Java 8, so the solution we will deliver will be far from perfect, because nobody understands wildcards. Is there a way to free Dan and Maurizio enough time to investigate if covariance/contravariance can be added to Java 9? Rémi [1] https://stuartmarks.wordpress.com/2012/10/07/javaone-2012-jump-starting-lambda-programming/ From josh at bloch.us Mon Oct 15 17:24:52 2012 From: josh at bloch.us (Joshua Bloch) Date: Mon, 15 Oct 2012 17:24:52 -0700 Subject: Introduce Covariance/Contravariance at declaration site for Java 9 ? In-Reply-To: <507C899F.1000907@univ-mlv.fr> References: <507C899F.1000907@univ-mlv.fr> Message-ID: I believe that declaration site variance annotations are every bit as bad as use-site annotations. They're bad in a different way--they force you to write idiosyncratic types because natural types don't lend themselves to fixed variance restrictions--but they're still bad. Providing both use and declaration site variance in one language is the worst of both worlds (unless you're trying to kill the language). Josh On Mon, Oct 15, 2012 at 3:09 PM, Remi Forax wrote: > I've just read the presentation of Stuart Marks at JavaOne [1], > all examples after slide 32, the first one that use lambdas are not > written correctly > because all method signatures do not use wildcards. > > Brian, I know that we will not be able to introduce > covariance/contravariance > at declaration site for Java 8, so the solution we will deliver will be > far from perfect > because nobody understand wildcards. > Is there a way to free Dan and Maurizio enough time to investigate if > covariance/contravariance can be added to Java 9.
> Rémi > [1] https://stuartmarks.wordpress.com/2012/10/07/javaone-2012-jump-starting-lambda-programming/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20121015/430fd309/attachment.html From forax at univ-mlv.fr Mon Oct 15 17:40:43 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Tue, 16 Oct 2012 02:40:43 +0200 Subject: Introduce Covariance/Contravariance at declaration site for Java 9 ? In-Reply-To: References: <507C899F.1000907@univ-mlv.fr> Message-ID: <507CAD0B.6030907@univ-mlv.fr> On 10/16/2012 02:24 AM, Joshua Bloch wrote: > I believe that declaration site variance annotations are every bit as > bad as use-site annotations. They're bad in a different way--they > force you to write idiosyncratic types because natural types don't > lend themselves to fixed variance restrictions--but they're still bad. > Providing both use and declaration site variance in one language is > the worst of both worlds (unless you're trying to kill the language). I agree that declaration-site variance has an effect on the way people write APIs, and also agree that mixing the two kinds of variance just makes things more complex to understand. The real issue is having variance on SAMs, so it can be restricted to SAM types only. In that case, the variance can be inferred using the SAM descriptor without having to require user declaration. So instead of declaration-site variance, let's call it inferred variance at use-site for function types. > > Josh Rémi > > On Mon, Oct 15, 2012 at 3:09 PM, Remi Forax > wrote: > > I've just read the presentation of Stuart Marks at JavaOne [1], > all examples after slide 32, the first one that use lambdas are > not written correctly > because all method signatures do not use wildcards.
> > Brian, I know that we will not be able to introduce > covariance/contravariance > at declaration site for Java 8, so the solution we will deliver > will be far from perfect > because nobody understand wildcards. > Is there a way to free Dan and Maurizio enough time to investigate if > covariance/contravariance can be added to Java 9. > > Rémi > [1] > https://stuartmarks.wordpress.com/2012/10/07/javaone-2012-jump-starting-lambda-programming/ > > From daniel.smith at oracle.com Thu Oct 18 12:20:40 2012 From: daniel.smith at oracle.com (Dan Smith) Date: Thu, 18 Oct 2012 13:20:40 -0600 Subject: Introduce Covariance/Contravariance at declaration site for Java 9 ? In-Reply-To: References: <507C899F.1000907@univ-mlv.fr> Message-ID: <31095704-E327-43CB-ACB3-98E3DD71BD6B@oracle.com> I think it's a good idea, at least worth serious consideration. There would be no _requirement_ to design libraries in declaration-site-friendly ways, but the fact is we already have _lots_ of types that are inherently co-/contra- variant, and the "right" way to use those types is to always use a wildcard. It turns into a mechanical transformation that obscures the code behind layers of wildcards and pointlessly punishes users if they mess up; it would sure be nice to remove that burden from clients of variant types. Anyway, I can say it's on the radar. But maybe we will conclude it's a horrible idea; or maybe other things will take priority. --Dan On Oct 15, 2012, at 6:24 PM, Joshua Bloch wrote: > I believe that declaration site variance annotations are every bit as bad as use-site annotations. They're bad in a different way--they force you to write idiosyncratic types because natural types don't lend themselves to fixed variance restrictions--but they're still bad. Providing both use and declaration site variance in one language is the worst of both worlds (unless you're trying to kill the language).
> > Josh > > On Mon, Oct 15, 2012 at 3:09 PM, Remi Forax wrote: > I've just read the presentation of Stuart Marks at JavaOne [1], > all examples after slide 32, the first one that use lambdas are not written correctly > because all method signatures do not use wildcards. > > Brian, I know that we will not be able to introduce covariance/contravariance > at declaration site for Java 8, so the solution we will deliver will be far from perfect > because nobody understand wildcards. > Is there a way to free Dan and Maurizio enough time to investigate if > covariance/contravariance can be added to Java 9. > > Rémi > [1] https://stuartmarks.wordpress.com/2012/10/07/javaone-2012-jump-starting-lambda-programming/ > > From kevinb at google.com Thu Oct 18 12:25:09 2012 From: kevinb at google.com (Kevin Bourrillion) Date: Thu, 18 Oct 2012 12:25:09 -0700 Subject: Introduce Covariance/Contravariance at declaration site for Java 9 ? In-Reply-To: <31095704-E327-43CB-ACB3-98E3DD71BD6B@oracle.com> References: <507C899F.1000907@univ-mlv.fr> <31095704-E327-43CB-ACB3-98E3DD71BD6B@oracle.com> Message-ID: FTR, I agree fairly strongly with everything Dan says here. On Thu, Oct 18, 2012 at 12:20 PM, Dan Smith wrote: > I think it's a good idea, at least worth serious consideration. > > There would be no _requirement_ to design libraries in > declaration-site-friendly ways, but the fact is we already have _lots_ of > types that are inherently co-/contra- variant, and the "right" way to use > those types is to always use a wildcard. It turns into a mechanical > transformation that obscures the code behind layers of wildcards and > pointlessly punishes users if they mess up; it would sure be nice to remove > that burden from clients of variant types. > > Anyway, I can say it's on the radar. But maybe we will conclude it's a > horrible idea; or maybe other things will take priority.
> > --Dan > > On Oct 15, 2012, at 6:24 PM, Joshua Bloch wrote: > > > I believe that declaration site variance annotations are every bit as > bad as use-site annotations. They're bad in a different way--they force > you to write idiosyncratic types because natural types don't lend > themselves to fixed variance restrictions--but they're still bad. Providing > both use and declaration site variance in one language is the worst of both > worlds (unless you're trying to kill the language). > > > > Josh > > > > On Mon, Oct 15, 2012 at 3:09 PM, Remi Forax wrote: > > I've just read the presentation of Stuart Marks at JavaOne [1], > > all examples after slide 32, the first one that use lambdas are not > written correctly > > because all method signatures do not use wildcards. > > > > Brian, I know that we will not be able to introduce > covariance/contravariance > > at declaration site for Java 8, so the solution we will deliver will be > far from perfect > > because nobody understand wildcards. > > Is there a way to free Dan and Maurizio enough time to investigate if > > covariance/contravariance can be added to Java 9. > > > > Rémi > > [1] > https://stuartmarks.wordpress.com/2012/10/07/javaone-2012-jump-starting-lambda-programming/ > > > > > > -- Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20121018/74b564cc/attachment.html From brian.goetz at oracle.com Mon Oct 22 09:18:44 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Mon, 22 Oct 2012 12:18:44 -0400 Subject: Announcement: converting JSR-335 to JCP 2.8 Message-ID: <508571E4.8070804@oracle.com> JSR-335 was originally submitted in November 2010, under the rules of JCP 2.7. When JCP 2.8 (JSR 348) was approved in October 2011 [1], Oracle publicly committed to convert all in-flight JSRs led by Oracle to JCP 2.8.
To satisfy the transparency requirements of JCP 2.8 for Java SE JSRs we had to update the OpenJDK Terms of Use, which required spending lots of quality time with attorneys. That was finally done this past July [2]. JCP 2.8 also requires us to have a public issue tracker, but the OpenJDK JIRA system isn't available yet [3][4]. I've therefore set up an issue tracker on java.net [5]; we may migrate issues from that system to the OpenJDK system when the latter becomes available, depending on timing. Converting an existing JSR to JCP 2.8 requires answering a set of questions posed by the JCP PMO [6]. Appended below are our answers to those questions. All members of the JSR-335 EG have agreed to the change to JCP 2.8, having indicated their consent on the lambda-spec-experts list. Answers to JCP 2.8 conversion questions for JSR 335 (http://jcp.org/en/resources/change_jcp_version) What is the specific URL for the document archive? http://openjdk.java.net/projects/lambda What is the specific URL for the Issue Tracker? http://java.net/jira/browse/JSR_335 What is the specific URL for the EG communication archive? http://mail.openjdk.java.net/pipermail/lambda-spec-experts/ http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/ What is the description of your communications channel/how the public should provide feedback? We have set up "comments" aliases: lambda-libs-spec-comments at openjdk.java.net (libraries) lambda-spec-comments at openjdk.java.net (language) The archives of these are public. How will you consult with the Expert Group of your JSR on new Expert Group nominations? I will ask if anyone has anything to say for or against an incoming nomination. How will you provide details of the Expert Group nominations for your JSR to the public? Details will be provided on the publicly-readable EG list.
[1] https://blogs.oracle.com/pcurran/entry/no_more_smoke_filled_rooms [2] http://openjdk.java.net/legal/tou/ [3] http://mail.openjdk.java.net/pipermail/announce/2012-March/000120.html [4] https://blogs.oracle.com/darcy/entry/moving_monarchs_dragons [5] http://java.net/jira/browse/JSR_335 [6] http://jcp.org/en/resources/change_jcp_version From brian.goetz at oracle.com Mon Oct 22 21:05:17 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Tue, 23 Oct 2012 00:05:17 -0400 Subject: Encounter order Message-ID: <5086177D.8050706@oracle.com> For parallel calculations, there are several different things we might mean by order. In the context of bulk operations, I'll (try to) use these terms: - Encounter order. A source is said to have an encounter order when the order in which elements are yielded by iterator / forEach is predictable and meaningful. (This is a generalization of spatial order, since some sources may be purely generative rather than representing data structures.) Sources such as arrays, lists, queues, sorted collections, and IO channels have an encounter order; sources like HashSets or the key set of a HashMap do not (though their implementation may have a predictable iteration order anyway.) - Arrival order. Arrival order (aka temporal order) is the time order in which elements may arrive at a stage of an operation pipeline. In sequential calculations, the encounter order is generally the arrival order, but in parallel calculations may not be. Sometimes this is good, sometimes not. It is easy to track (via the stream flags mechanism) whether a stream source has (or more precisely, supports) an encounter order, and similarly easy to track whether an operation preserves that order. We also don't start any processing until we've seen the whole pipeline. Some operations have intrinsic semantic constraints that require us to produce a result that is consistent with processing the elements in encounter order. 
For example, a fold/reduce operation across a list using an associative but not commutative operation. (Though this can still be parallelized efficiently.) Others should probably be constrained to preserve order just because this is consistent with user expectations (e.g., applying the map function x -> x*2 to [ 1, 2, 3 ] should probably yield [ 2, 4, 6 ], not [ 4, 6, 2 ].) Other operations, such as forEach, may have neither constraint.

Like any other constraint, requiring processing to be done consistently with encounter order has a cost.

Finally, some operations have targets (like into or toArray), which might themselves support or not support an encounter order.

What I am trying to accomplish here is to identify reasonable user expectations about order preservation for the (source, operation, target) combinations we have, and whether we want to / need to add additional operations to give users more control over ordering.

Here are some starter questions. Let "list" contain the numbers 1..10. What should the following expressions compute?

  list.parallel().toArray()

  list.parallel().filter(e -> e > 5).toArray();

  list.parallel().into(new ArrayList<>());

  List target = new ArrayList();
  list.parallel().forEach(e -> target.add(e));

I think the first one "obviously" should result in an array [ 1..10 ]. And the last should "obviously" not be constrained at all as to order, only that it add the numbers 1 through 10 to the target in some order. For the third one, though, it isn't completely obvious what the right answer is.

What about this:

  Map m = list.parallel().groupBy(e -> e % 2);

This partitions the list into two parts; should the elements in each part be in the encounter order from the original list? (In other words, should this result in { 0 -> [ 2, 4, 6, 8, 10 ], 1 -> [ 1, 3, 5, 7, 9 ] } ?)

The model I'm targeting is: IF the source has an encounter order AND the operation must preserve encounter order, then we need to be sensitive to encounter order, otherwise anything goes.
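[A present-day footnote: the starter questions above can be checked against the java.util.stream API as it eventually shipped, with parallelStream() standing in for the draft parallel(), Collectors.toList()/groupingBy standing in for into()/groupBy. A runnable sketch under that assumed mapping:]

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class StarterQuestions {
    static final List<Integer> LIST =
            IntStream.rangeClosed(1, 10).boxed().collect(Collectors.toList());

    // Question 1: toArray() on a parallel stream over an ordered source
    // preserves the source's encounter order.
    static Object[] parallelToArray() {
        return LIST.parallelStream().toArray();
    }

    // Question 2: filter() is order-preserving; the surviving elements
    // keep their relative order.
    static Object[] parallelFilter() {
        return LIST.parallelStream().filter(e -> e > 5).toArray();
    }

    // The groupBy question: the (non-concurrent) groupingBy collector keeps
    // each bucket in encounter order even when the stream is parallel.
    static Map<Integer, List<Integer>> groupByParity() {
        return LIST.parallelStream().collect(Collectors.groupingBy(e -> e % 2));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parallelToArray())); // [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
        System.out.println(groupByParity());
    }
}
```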
Is this the right model? If so, what are the preservation settings for each operation?

(A secondary consideration is that there are often situations where the computation structurally suggests an encounter order, but the user doesn't actually care, and then is paying for ordering he doesn't want. But there are fixes for this.)

Lots of more detailed questions but I'll let people respond to these.

From joe.bowbeer at gmail.com Mon Oct 22 21:48:57 2012
From: joe.bowbeer at gmail.com (Joe Bowbeer)
Date: Mon, 22 Oct 2012 21:48:57 -0700
Subject: Encounter order
In-Reply-To: <5086177D.8050706@oracle.com>
References: <5086177D.8050706@oracle.com>
Message-ID:

I don't have any assumptions about ordering after .parallel().

list.parallel().toArray() is analogous to a parallel-ized for-loop. (See Parallel.ForEach in C#.)

If I wanted it to execute in order, I wouldn't parallel() it. Or, if I wanted to "preserve" the order, I would zip-with-index before parallel execution.

--Joe

On Mon, Oct 22, 2012 at 9:05 PM, Brian Goetz wrote:

> For parallel calculations, there are several different things we might
> mean by order. In the context of bulk operations, I'll (try to) use these
> terms:
>
> - Encounter order. A source is said to have an encounter order when the
> order in which elements are yielded by iterator / forEach is predictable
> and meaningful. (This is a generalization of spatial order, since some
> sources may be purely generative rather than representing data structures.)
>
> Sources such as arrays, lists, queues, sorted collections, and IO channels
> have an encounter order; sources like HashSets or the key set of a HashMap
> do not (though their implementation may have a predictable iteration order
> anyway.)
>
> - Arrival order. Arrival order (aka temporal order) is the time order in
> which elements may arrive at a stage of an operation pipeline.
> [...]

From brian.goetz at oracle.com Mon Oct 22 22:24:13 2012
From: brian.goetz at oracle.com (Brian Goetz)
Date: Tue, 23 Oct 2012 01:24:13 -0400
Subject: Encounter order
In-Reply-To:
References: <5086177D.8050706@oracle.com>
Message-ID: <508629FD.2040100@oracle.com>

> I don't have any assumptions about ordering after .parallel().
>
> list.parallel().toArray() is analogous to a parallel-ized for-loop.
> (See Parallel.ForEach in C#.)

Certainly, list.parallel().forEach() is analogous to a parallelized for-loop.

> If I wanted it to execute in order, I wouldn't parallel() it.

But I didn't say anything about *executing* in order. The question was, to what extent should order be preserved.
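[Brian's distinction between preserving encounter order and executing in order can be demonstrated with the stream API as it eventually shipped; a sketch, with IntStream standing in for the draft pipeline:]

```java
import java.util.Arrays;
import java.util.stream.IntStream;

public class PreserveVsExecute {
    // The chunks of the range are mapped by different threads in arbitrary
    // temporal (arrival) order, but each chunk's results are written into a
    // known slice of the output array, so the encounter order survives.
    static int[] doubled(int n) {
        return IntStream.rangeClosed(1, n).parallel().map(x -> x * 2).toArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(doubled(5))); // [2, 4, 6, 8, 10]
    }
}
```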
For example, array.parallel().map().toArray() parallelizes perfectly while preserving encounter order, though it does not execute in order.

> Or, if I wanted to "preserve" the order, I would zip-with-index before
> parallel execution.

Does that mean you assume that reducing functions passed to reduce are necessarily commutative?

What about

  array.parallel().sorted().toArray()

? Would you expect the result to appear in the array in sorted order?

> [...]

From david.holmes at oracle.com Mon Oct 22 22:45:02 2012
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 23 Oct 2012 15:45:02 +1000
Subject: Encounter order
In-Reply-To: <5086177D.8050706@oracle.com>
References: <5086177D.8050706@oracle.com>
Message-ID: <50862EDE.3060900@oracle.com>

Hi Brian,

My initial reaction was similar to Joe's: I don't have any expectations on order after parallel() has been used. If I need to re-apply the original order to the final set of elements then I would sort it - my only issue there is how do I know how to sort them in the same way?

On the other hand I might be a little upset if I have to re-sort my billion element collection after filtering out the blue blocks ;-)

Is it feasible to identify and implement order-preserving operations without either sacrificing direct performance (to do the op in an order-preserving way) or requiring an additional sort step?

David

On 23/10/2012 2:05 PM, Brian Goetz wrote:

> [...]
From joe.bowbeer at gmail.com Mon Oct 22 23:07:19 2012
From: joe.bowbeer at gmail.com (Joe Bowbeer)
Date: Mon, 22 Oct 2012 23:07:19 -0700
Subject: Encounter order
In-Reply-To: <508629FD.2040100@oracle.com>
References: <5086177D.8050706@oracle.com> <508629FD.2040100@oracle.com>
Message-ID:

Inline.

On Mon, Oct 22, 2012 at 10:24 PM, Brian Goetz wrote:

>> I don't have any assumptions about ordering after .parallel().
>>
>> list.parallel().toArray() is analogous to a parallel-ized for-loop.
>> (See Parallel.ForEach in C#.)
>
> Certainly, list.parallel().forEach() is analogous to a parallelized
> for-loop.
>
>> If I wanted it to execute in order, I wouldn't parallel() it.
>
> But I didn't say anything about *executing* in order. The question was,
> to what extent should order be preserved.
>
> For example, array.parallel().map().toArray() parallelizes perfectly
> while preserving encounter order, though does not execute in order.

I didn't know that. In PLINQ, for example, asParallel() is unordered, but asOrdered() can be added to preserve the order:

  http://msdn.microsoft.com/en-us/library/dd460677.aspx

(But I am not advocating for asOrdered.)

>> Or, if I wanted to "preserve" the order, I would zip-with-index before
>> parallel execution.
>
> Does that mean you assume that reducing functions passed to reduce are
> necessarily commutative?
>
> What about
>
>   array.parallel().sorted().toArray()
>
> ? Would you expect the result to appear in the array in sorted order?

After .parallel(), I would make no assumptions about order, but after .sorted(), I would expect the order to be sorted.

I'm curious to know what other people think.

> [...]

From paul.sandoz at oracle.com Tue Oct 23 04:11:53 2012
From: paul.sandoz at oracle.com (Paul Sandoz)
Date: Tue, 23 Oct 2012 13:11:53 +0200
Subject: Encounter order
In-Reply-To:
References: <5086177D.8050706@oracle.com> <508629FD.2040100@oracle.com>
Message-ID: <43D63FF7-CC92-4711-A790-60C42E7E93DC@oracle.com>

On Oct 23, 2012, at 8:07 AM, Joe Bowbeer wrote:

>>> Or, if I wanted to "preserve" the order, I would zip-with-index before
>>> parallel execution.
>>
>> What about
>>
>>   array.parallel().sorted().toArray()
>>
>> ? Would you expect the result to appear in the array in sorted order?
>
> After .parallel(), I would make no assumptions about order, but after
> .sorted(), I would expect the order to be sorted.

What about:

  sortedList = list.parallel().sorted().into(new ArrayList<>());

  // Is the filtered list sorted or does it need to be re-sorted?
  filteredList = sortedList.parallel().filter(...).into(new ArrayList<>());

or:

  TreeSet ts = ...
  filteredList = ts.parallel().filter(...).into(new ArrayList<>());

The cost of re-sorting is likely to be more expensive than preserving the order.

Paul.

> I'm curious to know what other people think.
From paul.sandoz at oracle.com Tue Oct 23 04:53:49 2012
From: paul.sandoz at oracle.com (Paul Sandoz)
Date: Tue, 23 Oct 2012 13:53:49 +0200
Subject: Encounter order
In-Reply-To: <5086177D.8050706@oracle.com>
References: <5086177D.8050706@oracle.com>
Message-ID:

Hi Brian,

My first inclination is that we need to be sensitive to encounter order, by default, because of the rather loosely defined "least surprise" principle:

Developer starts with:

  List serial = list.stream().filter(e -> e > 5).into(new ArrayList<>());

then later on changes to:

  List parallel = list.parallel().filter(e -> e > 5).into(new ArrayList<>());

Would most developers expect those two lists, serial and parallel, to be equal? Is their expectation that going parallel gives a performance boost without affecting the result?

Mathematica's Parallelize [1] states:

  Parallelize[expr] normally gives the same result as evaluating expr,
  except for side effects during the computation

Whatever the default position is on encounter order, I suspect we will require some way to either explicitly disable or enable order preservation (like what Joe highlighted in PLINQ).

Paul.

[1] http://reference.wolfram.com/mathematica/ref/Parallelize.html
    http://reference.wolfram.com/mathematica/ref/ParallelCombine.html

From brian.goetz at oracle.com Tue Oct 23 06:43:04 2012
From: brian.goetz at oracle.com (Brian Goetz)
Date: Tue, 23 Oct 2012 09:43:04 -0400
Subject: Encounter order
In-Reply-To: <50862EDE.3060900@oracle.com>
References: <5086177D.8050706@oracle.com> <50862EDE.3060900@oracle.com>
Message-ID: <50869EE8.8020200@oracle.com>

> On the other hand I might be a little upset if I have to re-sort my
> billion element collection after filtering out the blue blocks ;-)

Yes, this is the sort of user expectation I'm trying to make more precise.
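[Paul's "least surprise" pair can be written down and checked against the API as it eventually shipped, with collect(Collectors.toList()) standing in for the draft into(new ArrayList<>()); the shipped design did choose order preservation here, so the two results are equal. A sketch:]

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class LeastSurprise {
    // The same pipeline, run sequentially or in parallel; an ordered
    // collector on an ordered source yields the same list either way.
    static List<Integer> filterBig(List<Integer> list, boolean parallel) {
        return (parallel ? list.parallelStream() : list.stream())
                .filter(e -> e > 5)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> list =
                IntStream.rangeClosed(1, 10).boxed().collect(Collectors.toList());
        // Going parallel changes the execution, not the result.
        System.out.println(filterBig(list, false).equals(filterBig(list, true))); // true
    }
}
```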
It's easy to say "no order at all" (and this is fine for x.parallel().forEach(...)), but I think we do have expectations of order preservation, and I am trying to tease them out.

Whether an encounter order is provided by the source or an intermediate stage should not matter. So, if you expect:

  list.parallel().sort().toArray()

to result in a sorted array, then I think you should also expect

  sortedSet.parallel().toArray()

to result in a sorted array.

Similarly, if you expect the first one to be sorted, I'm pretty sure you expect this to be sorted too:

  list.sorted().filter(x -> x.color != BLUE).toArray()

Which means I think you expect filter to be order-preserving.

Similarly, take reduce. Typically a reducing function is expected to be associative but not commutative. But that places a constraint on order; we can't just feed them to the reducer in random order. And I think there is no getting around this one -- requiring reducing functions be commutative is too strong a requirement. So there's at least one example where we absolutely must pay attention to order. (The cost of preserving order here is a little extra bookkeeping in the decomposition; we have to keep track of the order of a given node's children (e.g., left and right child pointers), so the children's results can be combined properly.)

> Is it feasible to identify and implement order-preserving operations
> without either sacrificing direct performance (to do the op in an order
> preserving way) or requiring an additional sort step?

We of course want to minimize the cost of preserving order where needed. And many ops have friendly parallel solutions that don't involve arbitrary sequencing or extra copying.
For example, we can parallelize list.parallel().map().toArray() perfectly (modulo some assumptions about splitting), where we decompose the input list in such a way that we know where in the output array each chunk is going to go, and in parallel, compute and write the mapped results to exactly the right place in the output array. So order preservation is not always expensive. Other ops have minor costs of preserving order (like the reduce example.) Still others have bigger costs.

But we don't want to pay for ordering where it is not wanted, and this includes operations for which ordering is not valuable.

From brian.goetz at oracle.com Tue Oct 23 07:07:27 2012
From: brian.goetz at oracle.com (Brian Goetz)
Date: Tue, 23 Oct 2012 10:07:27 -0400
Subject: Encounter order
In-Reply-To: <50869EE8.8020200@oracle.com>
References: <5086177D.8050706@oracle.com> <50862EDE.3060900@oracle.com> <50869EE8.8020200@oracle.com>
Message-ID: <5086A49F.20205@oracle.com>

Here's a doc on Scala's behavior here -- they go out of their way to preserve encounter order, and require only that reducers be associative:

  http://docs.scala-lang.org/overviews/parallel-collections/overview.html

On 10/23/2012 9:43 AM, Brian Goetz wrote:

> [...]
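[Brian's associative-but-not-commutative constraint is concrete with string concatenation: (a+b)+c equals a+(b+c), but a+b differs from b+a. A parallel reduce may split the work anywhere it likes, yet for an ordered source it must combine the partial results left-to-right; a sketch against the reduce operation as it eventually shipped:]

```java
import java.util.stream.IntStream;

public class OrderedReduce {
    // String::concat is associative but not commutative, so a deterministic
    // result here depends on the partial strings being combined in
    // encounter order, regardless of how the range was decomposed.
    static String concatDigits(int n) {
        return IntStream.rangeClosed(1, n)
                .parallel()
                .mapToObj(Integer::toString)
                .reduce("", String::concat);
    }

    public static void main(String[] args) {
        System.out.println(concatDigits(9)); // 123456789
    }
}
```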
From andrey.breslav at jetbrains.com Tue Oct 23 11:06:01 2012 From: andrey.breslav at jetbrains.com (Andrey Breslav) Date: Tue, 23 Oct 2012 22:06:01 +0400 Subject: Encounter order In-Reply-To: <50869EE8.8020200@oracle.com> References: <5086177D.8050706@oracle.com> <50862EDE.3060900@oracle.com> <50869EE8.8020200@oracle.com> Message-ID: My first hunch is "of course parallel() has no order guarantees". And my hunch is that sorted() should be a terminal operation (for a different reason than its interaction with parallel(), which I expressed earlier on the lambda-spec-experts list), so the examples presented so far do not change my view that much. Which means I'm OK with having no order preserved after parallel. But if preserving the order comes at a price so low that we can afford it (performance-wise and, more importantly, explanation-wise), why not? So the question for me is only "how much does it cost?" On Oct 23, 2012, at 17:43 , Brian Goetz wrote: >> On the other hand I might be a little upset if I have to re-sort my >> billion element collection after filtering out the blue blocks ;-) > > Yes, this is the sort of user expectation I'm trying to make more precise. It's easy to say "no order at all" (and this is fine for x.parallel().forEach(...)), but I think we do have expectations of order preservation, and I am trying to tease them out. > > Whether an encounter order is provided by the source or an intermediate stage should not matter. So, if you expect: > > list.parallel().sort().toArray() > > to result in a sorted array, then I think you should also expect > > sortedSet.parallel().toArray() > > to result in a sorted array. > > Similarly, if you expect the first one to be sorted, I'm pretty sure you expect this to be sorted too: > > list.sorted().filter(x -> x.color != BLUE).toArray() > > Which means I think you expect filter to be order-preserving. > > > Similarly, take reduce. Typically a reducing function is expected to be associative but not commutative. 
But that places a constraint on order; we can't just feed them to the reducer in random order. And I think there is no getting around this one -- requiring reducing functions be commutative is too strong a requirement. So there's at least one example where we absolutely must pay attention to order. (The cost of preserving order here is a little extra bookkeeping in the decomposition; we have to keep track of the order of a given node's children (e.g., left and right child pointers), so the children's results can be combined properly.) > >> Is it feasible to identify and implement order-preserving operations >> without either sacrificing direct performance (to do the op in an order >> preserving way) or requiring an additional sort step? > > We of course want to minimize the cost of preserving order where needed. And many ops have friendly parallel solutions that don't involve arbitrary sequencing or extra copying. For example, we can parallelize list.parallel().map().toArray() perfectly (modulo some assumptions about splitting), where we decompose the input list in such a way that we know where in the output array each chunk is going to go, and in parallel, compute and write the mapped results to exactly the right place in the output array. So order preservation is not always expensive. Other ops have minor costs of preserving order (like the reduce example.) Still others have bigger costs. > > But we don't want to pay for ordering where it is not wanted, and this includes operations for which ordering is not valuable. -- Andrey Breslav http://jetbrains.com Develop with pleasure! From joe.bowbeer at gmail.com Tue Oct 23 11:17:03 2012 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Tue, 23 Oct 2012 11:17:03 -0700 Subject: Encounter order In-Reply-To: References: <5086177D.8050706@oracle.com> <50862EDE.3060900@oracle.com> <50869EE8.8020200@oracle.com> Message-ID: This is also my expectation/intuition regarding sorted(). 
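[The associative-but-not-commutative point quoted in this thread can be made concrete with string concatenation, which is associative but not commutative. The sketch below uses the java.util.stream API as it eventually shipped (parallelStream, reduce), which postdates this discussion, so the method names are an assumption relative to the draft API being debated here.]

```java
import java.util.Arrays;
import java.util.List;

public class OrderedReduce {
    public static void main(String[] args) {
        List<String> parts = Arrays.asList("a", "b", "c", "d");

        // Concatenation is associative: ("a" + "b") + "c" equals "a" + ("b" + "c"),
        // but not commutative: "a" + "b" differs from "b" + "a".
        String sequential = parts.stream().reduce("", String::concat);

        // A parallel reduce with an associative operator and a true identity
        // still yields the sequential answer, because per-chunk results are
        // combined in encounter order.
        String parallel = parts.parallelStream().reduce("", String::concat);

        System.out.println(sequential); // abcd
        System.out.println(parallel);   // abcd
    }
}
```

[If the framework fed chunks to the reducer in arbitrary order instead, a run could legitimately produce "cdab" -- exactly the outcome the associativity-not-commutativity argument rules out.]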
On Oct 23, 2012 11:06 AM, "Andrey Breslav" wrote: > My first hunch is "of course parallel() has no order guarantees". > > And my hunch is that sorted() should be a terminal operation (for a > different reason than its interaction with parallel(), which I expressed > earlier on the lambda-spec-experts list), > so the examples presented so far do not change my view that much. > > Which means I'm OK with having no order preserved after parallel. > > But if preserving the order comes at a price so low that we can afford it > (performance-wise and, more importantly, explanation-wise), why not? > > So the question for me is only "how much does it cost?" > > On Oct 23, 2012, at 17:43 , Brian Goetz wrote: > > >> On the other hand I might be a little upset if I have to re-sort my > >> billion element collection after filtering out the blue blocks ;-) > > > > Yes, this is the sort of user expectation I'm trying to make more > precise. It's easy to say "no order at all" (and this is fine for > x.parallel().forEach(...)), but I think we do have expectations of order > preservation, and I am trying to tease them out. > > > > Whether an encounter order is provided by the source or an intermediate > stage should not matter. So, if you expect: > > > > list.parallel().sort().toArray() > > > > to result in a sorted array, then I think you should also expect > > > > sortedSet.parallel().toArray() > > > > to result in a sorted array. > > > > Similarly, if you expect the first one to be sorted, I'm pretty sure you > expect this to be sorted too: > > > > list.sorted().filter(x -> x.color != BLUE).toArray() > > > > Which means I think you expect filter to be order-preserving. > > > > > > Similarly, take reduce. Typically a reducing function is expected to be > associative but not commutative. But that places a constraint on order; we > can't just feed them to the reducer in random order. 
And I think there is > no getting around this one -- requiring reducing functions be commutative > is too strong a requirement. So there's at least one example where we > absolutely must pay attention to order. (The cost of preserving order here > is a little extra bookkeeping in the decomposition; we have to keep track > of the order of a given node's children (e.g., left and right child > pointers), so the children's results can be combined properly.) > > > >> Is it feasible to identify and implement order-preserving operations > >> without either sacrificing direct performance (to do the op in an order > >> preserving way) or requiring an additional sort step? > > > > We of course want to minimize the cost of preserving order where needed. > And many ops have friendly parallel solutions that don't involve arbitrary > sequencing or extra copying. For example, we can parallelize > list.parallel().map().toArray() perfectly (modulo some assumptions about > splitting), where we decompose the input list in such a way that we know > where in the output array each chunk is going to go, and in parallel, > compute and write the mapped results to exactly the right place in the > output array. So order preservation is not always expensive. Other ops > have minor costs of preserving order (like the reduce example.) Still > others have bigger costs. > > > > But we don't want to pay for ordering where it is not wanted, and this > includes operations for which ordering is not valuable. > > -- > Andrey Breslav > http://jetbrains.com > Develop with pleasure! > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20121023/41d26fc5/attachment.html From brian.goetz at oracle.com Tue Oct 23 13:45:56 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Tue, 23 Oct 2012 16:45:56 -0400 Subject: Encounter order In-Reply-To: References: <5086177D.8050706@oracle.com> <50862EDE.3060900@oracle.com> <50869EE8.8020200@oracle.com> Message-ID: <50870204.1070906@oracle.com> OK, let me try this a different way. Let's separate *result* from *side effects*. Parallelism can change the timing of side-effects, but (I argue) should not change the result of a computation (e.g., summing integers in parallel should yield the same result as summing them sequentially (modulo overflow.)) For better or worse, we're used to a world where too much is done via side-effects, so perhaps that's where all of this "no ordering" assumption is coming from? If so, let's ignore side effects for a second, since many calculations can be expressed purely functionally, such as : [ 1, 2, 3 ].map( x -> x*2 ) Is there anyone claiming the answer is NOT required to be [ 2, 4, 6 ] ? Because this is what I'm hearing when I hear people (three of them, now) say "encounter order should be ignored." (Alternately, this claim amounts to saying that list.parallel().map() should return a multiset rather than a list.) On 10/23/2012 2:17 PM, Joe Bowbeer wrote: > This is also my expectation/intuition regarding sorted(). > > On Oct 23, 2012 11:06 AM, "Andrey Breslav" > wrote: > > My first hunch is "of course parallel() has no order guarantees". > > And my hunch is that sorted() should be a terminal operation (for a > different reason than its interaction with parallel(), which I > expressed earlier on the lambda-spec-experts list), > so the examples presented so far do not change my view that much. > > Which means I'm OK with having no order preserved after parallel. 
> > But if preserving the order comes at a price so low that we can > afford it (performance-wise and, more importantly, > explanation-wise), why not? > > So the question for me is only "how much does it cost?" > > On Oct 23, 2012, at 17:43 , Brian Goetz wrote: > > >> On the other hand I might be a little upset if I have to re-sort my > >> billion element collection after filtering out the blue blocks ;-) > > > > Yes, this is the sort of user expectation I'm trying to make more > precise. It's easy to say "no order at all" (and this is fine for > x.parallel().forEach(...)), but I think we do have expectations of > order preservation, and I am trying to tease them out. > > > > Whether an encounter order is provided by the source or an > intermediate stage should not matter. So, if you expect: > > > > list.parallel().sort().toArray() > > > > to result in a sorted array, then I think you should also expect > > > > sortedSet.parallel().toArray() > > > > to result in a sorted array. > > > > Similarly, if you expect the first one to be sorted, I'm pretty > sure you expect this to be sorted too: > > > > list.sorted().filter(x -> x.color != BLUE).toArray() > > > > Which means I think you expect filter to be order-preserving. > > > > > > Similarly, take reduce. Typically a reducing function is > expected to be associative but not commutative. But that places a > constraint on order; we can't just feed them to the reducer in > random order. And I think there is no getting around this one -- > requiring reducing functions be commutative is too strong a > requirement. So there's at least one example where we absolutely > must pay attention to order. (The cost of preserving order here is > a little extra bookkeeping in the decomposition; we have to keep > track of the order of a given node's children (e.g., left and right > child pointers), so the children's results can be combined properly.) 
> > > >> Is it feasible to identify and implement order-preserving operations > >> without either sacrificing direct performance (to do the op in > an order > >> preserving way) or requiring an additional sort step? > > > > We of course want to minimize the cost of preserving order where > needed. And many ops have friendly parallel solutions that don't > involve arbitrary sequencing or extra copying. For example, we can > parallelize list.parallel().map().toArray() perfectly (modulo some > assumptions about splitting), where we decompose the input list in > such a way that we know where in the output array each chunk is > going to go, and in parallel, compute and write the mapped results > to exactly the right place in the output array. So order > preservation is not always expensive. Other ops have minor costs of > preserving order (like the reduce example.) Still others have > bigger costs. > > > > But we don't want to pay for ordering where it is not wanted, and > this includes operations for which ordering is not valuable. > > -- > Andrey Breslav > http://jetbrains.com > Develop with pleasure! > > From paul.sandoz at oracle.com Wed Oct 24 06:23:31 2012 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Wed, 24 Oct 2012 15:23:31 +0200 Subject: Encounter order In-Reply-To: <50870204.1070906@oracle.com> References: <5086177D.8050706@oracle.com> <50862EDE.3060900@oracle.com> <50869EE8.8020200@oracle.com> <50870204.1070906@oracle.com> Message-ID: <043BA4AF-156B-4F5C-AA87-702084305191@oracle.com> On Oct 23, 2012, at 10:45 PM, Brian Goetz wrote: > OK, let me try this a different way. > > Let's separate *result* from *side effects*. Parallelism can change the timing of side-effects, but (I argue) should not change the result of a computation (e.g., summing integers in parallel should yield the same result as summing them sequentially (modulo overflow.)) Agreed. The details of the parallelism should be abstracted from the developer. 
The act of parallelism should for the most part be an implementation detail that may or may not be applied depending on the problem and resources at hand. Guy's talk is very relevant: http://www.infoq.com/presentations/Thinking-Parallel-Programming Especially slides 36-47 and the last 5 minutes of the talk. As Guy says this has costs and overheads just as garbage collectors do but those costs are considered acceptable because they free the developer to focus on what is important for solving their problem. Paul. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20121024/04932f0a/attachment.html From spullara at gmail.com Wed Oct 24 06:40:01 2012 From: spullara at gmail.com (Sam Pullara) Date: Wed, 24 Oct 2012 09:40:01 -0400 Subject: Encounter order In-Reply-To: <50870204.1070906@oracle.com> References: <5086177D.8050706@oracle.com> <50862EDE.3060900@oracle.com> <50869EE8.8020200@oracle.com> <50870204.1070906@oracle.com> Message-ID: <1230002283163840002@unknownmsgid> [2, 4, 6] is my expectation calculated in parallel or not. Sam All my photos are panoramas. On Oct 23, 2012, at 5:01 PM, Brian Goetz wrote: > OK, let me try this a different way. > > Let's separate *result* from *side effects*. Parallelism can change the timing of side-effects, but (I argue) should not change the result of a computation (e.g., summing integers in parallel should yield the same result as summing them sequentially (modulo overflow.)) For better or worse, we're used to a world where too much is done via side-effects, so perhaps that's where all of this "no ordering" assumption is coming from? If so, let's ignore side effects for a second, since many calculations can be expressed purely functionally, such as : > > [ 1, 2, 3 ].map( x -> x*2 ) > > Is there anyone claiming the answer is NOT required to be > > [ 2, 4, 6 ] > > ? 
Because this is what I'm hearing when I hear people (three of them, now) say "encounter order should be ignored." (Alternately, this claim amounts to saying that list.parallel().map() should return a multiset rather than a list.) > > On 10/23/2012 2:17 PM, Joe Bowbeer wrote: >> This is also my expectation/intuition regarding sorted(). >> >> On Oct 23, 2012 11:06 AM, "Andrey Breslav" > > wrote: >> >> My first hunch is "of course parallel() has no order guarantees". >> >> And my hunch is that sorted() should be a terminal operation (for a >> different reason than its interaction with parallel(), which I >> expressed earlier on the lambda-spec-experts list), >> so the examples presented so far do not change my view that much. >> >> Which means I'm OK with having no order preserved after parallel. >> >> But if preserving the order comes at a price so low that we can >> afford it (performance-wise and, more importantly, >> explanation-wise), why not? >> >> So the question for me is only "how much does it cost?" >> >> On Oct 23, 2012, at 17:43 , Brian Goetz wrote: >> >> >> On the other hand I might be a little upset if I have to re-sort my >> >> billion element collection after filtering out the blue blocks ;-) >> > >> > Yes, this is the sort of user expectation I'm trying to make more >> precise. It's easy to say "no order at all" (and this is fine for >> x.parallel().forEach(...)), but I think we do have expectations of >> order preservation, and I am trying to tease them out. >> > >> > Whether an encounter order is provided by the source or an >> intermediate stage should not matter. So, if you expect: >> > >> > list.parallel().sort().toArray() >> > >> > to result in a sorted array, then I think you should also expect >> > >> > sortedSet.parallel().toArray() >> > >> > to result in a sorted array. 
>> > >> > Similarly, if you expect the first one to be sorted, I'm pretty >> sure you expect this to be sorted too: >> > >> > list.sorted().filter(x -> x.color != BLUE).toArray() >> > >> > Which means I think you expect filter to be order-preserving. >> > >> > >> > Similarly, take reduce. Typically a reducing function is >> expected to be associative but not commutative. But that places a >> constraint on order; we can't just feed them to the reducer in >> random order. And I think there is no getting around this one -- >> requiring reducing functions be commutative is too strong a >> requirement. So there's at least one example where we absolutely >> must pay attention to order. (The cost of preserving order here is >> a little extra bookkeeping in the decomposition; we have to keep >> track of the order of a given node's children (e.g., left and right >> child pointers), so the children's results can be combined properly.) >> > >> >> Is it feasible to identify and implement order-preserving operations >> >> without either sacrificing direct performance (to do the op in >> an order >> >> preserving way) or requiring an additional sort step? >> > >> > We of course want to minimize the cost of preserving order where >> needed. And many ops have friendly parallel solutions that don't >> involve arbitrary sequencing or extra copying. For example, we can >> parallelize list.parallel().map().toArray() perfectly (modulo some >> assumptions about splitting), where we decompose the input list in >> such a way that we know where in the output array each chunk is >> going to go, and in parallel, compute and write the mapped results >> to exactly the right place in the output array. So order >> preservation is not always expensive. Other ops have minor costs of >> preserving order (like the reduce example.) Still others have >> bigger costs. 
>> > >> > But we don't want to pay for ordering where it is not wanted, and >> this includes operations for which ordering is not valuable. >> >> -- >> Andrey Breslav >> http://jetbrains.com >> Develop with pleasure! >> >> From daniel.smith at oracle.com Wed Oct 24 13:23:47 2012 From: daniel.smith at oracle.com (Dan Smith) Date: Wed, 24 Oct 2012 14:23:47 -0600 Subject: Encounter order In-Reply-To: <50870204.1070906@oracle.com> References: <5086177D.8050706@oracle.com> <50862EDE.3060900@oracle.com> <50869EE8.8020200@oracle.com> <50870204.1070906@oracle.com> Message-ID: On Oct 23, 2012, at 2:45 PM, Brian Goetz wrote: > OK, let me try this a different way. > > Let's separate *result* from *side effects*. Parallelism can change the timing of side-effects, but (I argue) should not change the result of a computation (e.g., summing integers in parallel should yield the same result as summing them sequentially (modulo overflow.)) Sounds like this discussion is mixing two different kinds of parallel streams. Parallel stream Kind A is a Stream that produces identical results to a serial stream, but is allowed to have out-of-order and out-of-thread side effects. Parallel stream Kind B is a different entity that makes no guarantees about iteration order. This is not a Stream at all, and should be represented with a different interface; many operations in Stream should not exist in this interface, and those that remain will likely have different contracts (e.g., 'reduce' requires a commutative operator). (Aside 1: Sorry I don't have better names. :-)) (Aside 2: I think the decision to merge serial and parallel streams under one interface was made with Kind A in mind.) The question is, I think, which kind of parallel stream we're really talking about when we say we want "parallel collections." It's possible the answer is "both," and if that's the case, I think they should be implemented separately, rather than trying to mesh the two into one.
It's also possible that the performance gains enabled by Kind B don't justify its existence. > [ 1, 2, 3 ].map( x -> x*2 ) > > Is there anyone claiming the answer is NOT required to be > > [ 2, 4, 6 ] You mean '[ 1, 2, 3 ].parallel().map( x -> x*2 )'. For Kind B, this means '{ 1, 2, 3 }.map( x -> x*2 )' (using braces to suggest a set). > (Alternately, this claim amounts to saying that list.parallel().map() should return a multiset rather than a list.) Even more fundamentally, a Kind B stream should be modeled so that 'list.parallel()' is already a multiset, before 'map' is called. -Dan From brian.goetz at oracle.com Wed Oct 24 14:12:19 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 24 Oct 2012 17:12:19 -0400 Subject: Encounter order In-Reply-To: References: <5086177D.8050706@oracle.com> <50862EDE.3060900@oracle.com> <50869EE8.8020200@oracle.com> <50870204.1070906@oracle.com> Message-ID: <508859B3.7070804@oracle.com> Right. While we could choose between Kind A and Kind B, I think choosing kind B is kind of silly. As you say, it already alters the semantics and would call for a different API -- results automatically become multisets. Since Java doesn't even have a multiset type, this will be foreign -- and also (IMO) not very useful. The general idea here is that parallelism is an optimization, which can affect side-effects, but not the result. (This is the choice that Scala, Clojure, Haskell, Mathematica, Fortress, and others have taken.) Since Java does so much with side-effects, sometimes it is hard to separate the result from the effects, but we should try anyway. Here's an example: stream.map(x -> { System.out.println(x); return x; }) .toArray(); Here, I would say that in either a serial or parallel stream, this should result in a copy of the stream source, but in parallel, the ordering (and more) of the side-effects (println) are unpredictable. (Operations like forEach are all side-effect.)
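[Brian's result-versus-side-effects example can be fleshed out as below. This uses the stream API as it eventually shipped (parallelStream, Collectors.toList), which is an assumption relative to the draft API under discussion; the point it illustrates is exactly the one in the message: the result is deterministic, the print order is not.]

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ResultVsEffects {
    public static void main(String[] args) {
        List<Integer> source = Arrays.asList(1, 2, 3, 4, 5);

        List<Integer> result = source.parallelStream()
                .map(x -> {
                    System.out.println(x); // side effect: print order is unpredictable
                    return x * 2;
                })
                .collect(Collectors.toList());

        // The result preserves encounter order no matter how the pipeline
        // was decomposed across threads.
        System.out.println(result); // [2, 4, 6, 8, 10]
    }
}
```

[Running this repeatedly shows the individual printlns interleaving differently from run to run while the final list never changes -- the separation of result from side effects being argued for here.]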
What this means is that the question of "what should parallel toArray do" isn't the right question; the question is "what should toArray do". For example, one question we could ask is "what happens when you call reduce on a stream without an encounter order." One possibility is "hope that the reducer is commutative"; another is to treat reduction on a order-less stream as an error. Unfortunately we don't have ways in the type system to tag lambdas as commutative or associative, so the user is on their own to avoid GIGO here. This isn't great, as it requires some nonmodular reasoning on the part of the user. On the other hand, this isn't all that different from having to know "is this collection thread-safe or not." On 10/24/2012 4:23 PM, Dan Smith wrote: > On Oct 23, 2012, at 2:45 PM, Brian Goetz wrote: > >> OK, let me try this a different way. >> >> Let's separate *result* from *side effects*. Parallelism can change the timing of side-effects, but (I argue) should not change the result of a computation (e.g., summing integers in parallel should yield the same result as summing them sequentially (modulo overflow.)) > > Sounds like this discussion is mixing two different kinds of parallel streams. > > Parallel stream Kind A is a Stream that produces identical results to a serial stream, but is allowed have out-of-order and out-of-thread side effects. > > Parallel stream Kind B is a different entity that makes no guarantees about iteration order. This is not a Stream at all, and should be represented with a different interface; many operations in Stream should not exist in this interface, and those that remain will likely have different contracts (e.g., 'reduce' requires a commutative operator). > > (Aside 1: Sorry I don't have better names. :-)) > (Aside 2: I think the decision to merge serial and parallel streams under one interface was made with Kind A in mind.) 
> > The question is, I think, which kind of parallel stream we're really talking about when we say we want "parallel collections." It's possible the answer is "both," and if that's the case, I think they should be implemented separately, rather than trying to mesh the two into one. It's also possible that the performance gains enabled by Kind B don't justify its existence. > >> [ 1, 2, 3 ].map( x -> x*2 ) >> >> Is there anyone claiming the answer is NOT required to be >> >> [ 2, 4, 6 ] > > You mean '[ 1, 2, 3 ].parallel().map( x -> x*2 )'. For Kind B, this means '{ 1, 2, 3 }.map( x -> x*2 )' (using braces to suggest a set). > >> (Alternately, this claim amounts to saying that list.parallel().map() should return a multiset rather than a list.) > > Even more fundamentally, a Kind B stream should be modeled so that 'list.parallel()' is already a multiset, before 'map' is called. > > ?Dan > From daniel.smith at oracle.com Wed Oct 24 15:04:54 2012 From: daniel.smith at oracle.com (Dan Smith) Date: Wed, 24 Oct 2012 16:04:54 -0600 Subject: Encounter order In-Reply-To: <508859B3.7070804@oracle.com> References: <5086177D.8050706@oracle.com> <50862EDE.3060900@oracle.com> <50869EE8.8020200@oracle.com> <50870204.1070906@oracle.com> <508859B3.7070804@oracle.com> Message-ID: <14DBDB86-2A76-459A-854F-EF9CDF50E73E@oracle.com> On Oct 24, 2012, at 3:12 PM, Brian Goetz wrote: > Right. While we could choose between Kind A and Kind B, I think choosing kind B is kind of silly. As you say, it already alters the semantics and would call for a different API -- results automatically become multisets. Since Java doesn't even have a multiset type, this will be foreign -- and also (IMO) not very useful. 
I think we're on the same page, but just to make sure my point is clear: choosing between Kind A and Kind B is fine; choosing both is also fine; what I would consider bad are: - Mixing Kind A and Kind B behavior into a single type (e.g., preserving order for 'map' but not 'into') - Making Kind B a Stream I'm not suggesting modeling Kind B with a new full-featured Collection -- just a ParallelStream interface that is unrelated to Stream. > The general idea here is that parallelism is an optimization, which can affect side-effects, but not the result. (This is the choice that Scala, Clojure, Haskell, Mathematica, Fortress, and others have taken.) > > Since Java does so much with side-effects, sometimes it is hard to separate the result from the effects, but we should try anyway. Here's an example: > > stream.map(x -> { System.out.println(x); return x; }) > .toArray(); > > Here, I would say that in either a serial or parallel stream, this should result in a copy of the stream source, but in parallel, the ordering (and more) of the side-effects (println) are unpredictable. (Operations like forEach are all side-effect.) > > What this means is that the question of "what should parallel toArray do" isn't the right question; the question is "what should toArray do". For Kind A, yes, absolutely. > For example, one question we could ask is "what happens when you call reduce on a stream without an encounter order." One possibility is "hope that the reducer is commutative"; another is to treat reduction on a order-less stream as an error. Unfortunately we don't have ways in the type system to tag lambdas as commutative or associative, so the user is on their own to avoid GIGO here. This isn't great, as it requires some nonmodular reasoning on the part of the user. On the other hand, this isn't all that different from having to know "is this collection thread-safe or not." So this is a Kind B stream. 
I'm saying an error would never be what we'd want -- since this is a new type unrelated to Stream, if we don't want to support an operation, it simply doesn't exist. On commutativity: 1) Kind A streams require associativity -- ultimately, the same problem. 2) Our functional interface story says that the type _can_ encode properties like this. They're just informal, so the compiler can't enforce them. (I'm not just being pedantic -- informal contracts are an important part of the language.) The user is responsible for providing a valid AssociativeOperator when that's what is asked for -- hence, GIGO. But I do think it's important that properties like this be expressed with the type, so that we're as clear as possible to clients about how the client's operation needs to behave (e.g., the IDE says "give me an AssociativeOperator", not "give me a binary function"). ?Dan > On 10/24/2012 4:23 PM, Dan Smith wrote: >> On Oct 23, 2012, at 2:45 PM, Brian Goetz wrote: >> >>> OK, let me try this a different way. >>> >>> Let's separate *result* from *side effects*. Parallelism can change the timing of side-effects, but (I argue) should not change the result of a computation (e.g., summing integers in parallel should yield the same result as summing them sequentially (modulo overflow.)) >> >> Sounds like this discussion is mixing two different kinds of parallel streams. >> >> Parallel stream Kind A is a Stream that produces identical results to a serial stream, but is allowed have out-of-order and out-of-thread side effects. >> >> Parallel stream Kind B is a different entity that makes no guarantees about iteration order. This is not a Stream at all, and should be represented with a different interface; many operations in Stream should not exist in this interface, and those that remain will likely have different contracts (e.g., 'reduce' requires a commutative operator). >> >> (Aside 1: Sorry I don't have better names. 
:-)) >> (Aside 2: I think the decision to merge serial and parallel streams under one interface was made with Kind A in mind.) >> >> The question is, I think, which kind of parallel stream we're really talking about when we say we want "parallel collections." It's possible the answer is "both," and if that's the case, I think they should be implemented separately, rather than trying to mesh the two into one. It's also possible that the performance gains enabled by Kind B don't justify its existence. >> >>> [ 1, 2, 3 ].map( x -> x*2 ) >>> >>> Is there anyone claiming the answer is NOT required to be >>> >>> [ 2, 4, 6 ] >> >> You mean '[ 1, 2, 3 ].parallel().map( x -> x*2 )'. For Kind B, this means '{ 1, 2, 3 }.map( x -> x*2 )' (using braces to suggest a set). >> >>> (Alternately, this claim amounts to saying that list.parallel().map() should return a multiset rather than a list.) >> >> Even more fundamentally, a Kind B stream should be modeled so that 'list.parallel()' is already a multiset, before 'map' is called. >> >> ?Dan >> From brian.goetz at oracle.com Wed Oct 24 15:12:20 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 24 Oct 2012 18:12:20 -0400 Subject: Encounter order In-Reply-To: <14DBDB86-2A76-459A-854F-EF9CDF50E73E@oracle.com> References: <5086177D.8050706@oracle.com> <50862EDE.3060900@oracle.com> <50869EE8.8020200@oracle.com> <50870204.1070906@oracle.com> <508859B3.7070804@oracle.com> <14DBDB86-2A76-459A-854F-EF9CDF50E73E@oracle.com> Message-ID: <508867C4.1040601@oracle.com> > I think we're on the same page, but just to make sure my point is clear: choosing between Kind A and Kind B is fine; choosing both is also fine; what I would consider bad are: > - Mixing Kind A and Kind B behavior into a single type (e.g., preserving order for 'map' but not 'into') Each operation can have its own semantics, which includes what it does with encounter order. 
For example, findFirst returns the first element of the stream in encounter order; findAny ignores encounter order completely, and is allowed to return a random element. Both are OK and useful. Similarly, I think we have latitude to define operations like into() either as producing a result or as having side-effects -- as long as we are clear which we are choosing. Or having forEach and forEachInOrder. But we should be clear that we are defining the semantics of the operation, not of the serial or parallel flavor of the operation. >> For example, one question we could ask is "what happens when you >> call reduce on a stream without an encounter order." One >> possibility is "hope that the reducer is commutative"; another is >> to treat reduction on a order-less stream as an error. >> Unfortunately we don't have ways in the type system to tag lambdas >> as commutative or associative, so the user is on their own to avoid >> GIGO here. This isn't great, as it requires some nonmodular >> reasoning on the part of the user. On the other hand, this isn't >> all that different from having to know "is this collection >> thread-safe or not." > > So this is a Kind B stream. I'm saying an error would never be what > we'd want -- since this is a new type unrelated to Stream, if we > don't want to support an operation, it simply doesn't exist. I don't think so. Stream sources are free to declare whether they have an encounter order or not. We could of course decide all streams have an encounter order (imposed by their iterators). But outlawing set.stream().reduce() seems pretty harsh. 
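[The findFirst/findAny contrast described above is easy to demonstrate. The sketch below assumes the API as it eventually shipped, where both operations return an Optional -- a detail not settled at the time of this thread.]

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class FindSemantics {
    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(5, 1, 4, 2, 3);

        // findFirst respects encounter order: it returns the first match in
        // source order, even when the pipeline runs in parallel.
        Optional<Integer> first = list.parallelStream()
                .filter(x -> x > 1)
                .findFirst();
        System.out.println(first.get()); // 5

        // findAny ignores encounter order: any matching element may come
        // back, possibly a different one on each run.
        Optional<Integer> any = list.parallelStream()
                .filter(x -> x > 1)
                .findAny();
        System.out.println(any.isPresent()); // true
    }
}
```

[Both operations are well-defined on the same stream; they simply make different promises about encounter order, which is the per-operation-semantics point being made here.]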
From daniel.smith at oracle.com Wed Oct 24 15:35:01 2012 From: daniel.smith at oracle.com (Dan Smith) Date: Wed, 24 Oct 2012 16:35:01 -0600 Subject: Encounter order In-Reply-To: <508867C4.1040601@oracle.com> References: <5086177D.8050706@oracle.com> <50862EDE.3060900@oracle.com> <50869EE8.8020200@oracle.com> <50870204.1070906@oracle.com> <508859B3.7070804@oracle.com> <14DBDB86-2A76-459A-854F-EF9CDF50E73E@oracle.com> <508867C4.1040601@oracle.com> Message-ID: <54C8D733-05CE-4C2B-8DA3-2920821136DE@oracle.com> On Oct 24, 2012, at 4:12 PM, Brian Goetz wrote: >>> For example, one question we could ask is "what happens when you >>> call reduce on a stream without an encounter order." One >>> possibility is "hope that the reducer is commutative"; another is >>> to treat reduction on a order-less stream as an error. >>> Unfortunately we don't have ways in the type system to tag lambdas >>> as commutative or associative, so the user is on their own to avoid >>> GIGO here. This isn't great, as it requires some nonmodular >>> reasoning on the part of the user. On the other hand, this isn't >>> all that different from having to know "is this collection >>> thread-safe or not." >> >> So this is a Kind B stream. I'm saying an error would never be what >> we'd want -- since this is a new type unrelated to Stream, if we >> don't want to support an operation, it simply doesn't exist. > > I don't think so. Stream sources are free to declare whether they have an encounter order or not. We could of course decide all streams have an encounter order (imposed by their iterators). But outlawing set.stream().reduce() seems pretty harsh. 'set.stream()', if it's parallel at all, is presumably a Kind A stream. Kind A streams are inherently ordered -- even if the source had to make arbitrary decisions about how to order it -- and can have operations that depend on order. 'set.asKindBParallel()' has no concept of order, and it makes no sense to give it order-dependent operations. 
(And, again, quite possibly Kind B streams can't justify their existence, and what you're designing is just Kind A.) -Dan From david.holmes at oracle.com Wed Oct 24 17:05:48 2012 From: david.holmes at oracle.com (David Holmes) Date: Thu, 25 Oct 2012 10:05:48 +1000 Subject: Encounter order In-Reply-To: <54C8D733-05CE-4C2B-8DA3-2920821136DE@oracle.com> References: <5086177D.8050706@oracle.com> <50862EDE.3060900@oracle.com> <50869EE8.8020200@oracle.com> <50870204.1070906@oracle.com> <508859B3.7070804@oracle.com> <14DBDB86-2A76-459A-854F-EF9CDF50E73E@oracle.com> <508867C4.1040601@oracle.com> <54C8D733-05CE-4C2B-8DA3-2920821136DE@oracle.com> Message-ID: <5088825C.70701@oracle.com> As I discussed on IM with Brian yesterday it doesn't make sense to ask what properties the parallel() stream has. parallel() in and of itself does not preserve or mutate order. The question is what does parallel().op() do - and that depends both on the op() and what entity you applied parallel() to. For some ops like sum(), max() (the commutative ones) it doesn't matter how you implement them (serial, parallel) the result is always the same - given the input there is only one possible answer. For other ops, like findAny, there may be multiple possible answers, so the result is a function of both the input data and the algorithm used to locate it - hence a parallel implementation may not only produce a different result to a serial implementation, it may produce a different result each time it is executed. My concern with all this is that when writing the specification for SomeClass.op we have to understand what actions can precede it in the pipeline and how they affect the semantics that op provides. David On 25/10/2012 8:35 AM, Dan Smith wrote: > On Oct 24, 2012, at 4:12 PM, Brian Goetz wrote: > >>>> For example, one question we could ask is "what happens when you >>>> call reduce on a stream without an encounter order." 
One >>>> possibility is "hope that the reducer is commutative"; another is >>>> to treat reduction on a order-less stream as an error. >>>> Unfortunately we don't have ways in the type system to tag lambdas >>>> as commutative or associative, so the user is on their own to avoid >>>> GIGO here. This isn't great, as it requires some nonmodular >>>> reasoning on the part of the user. On the other hand, this isn't >>>> all that different from having to know "is this collection >>>> thread-safe or not." >>> >>> So this is a Kind B stream. I'm saying an error would never be what >>> we'd want -- since this is a new type unrelated to Stream, if we >>> don't want to support an operation, it simply doesn't exist. >> >> I don't think so. Stream sources are free to declare whether they have an encounter order or not. We could of course decide all streams have an encounter order (imposed by their iterators). But outlawing set.stream().reduce() seems pretty harsh. > > 'set.stream()', if it's parallel at all, is presumably a Kind A stream. Kind A streams are inherently ordered -- even if the source had to make arbitrary decisions about how to order it -- and can have operations that depend on order. > > 'set.asKindBParallel()' has no concept of order, and it makes no sense to give it order-dependent operations. > > (And, again, quite possibly Kind B streams can't justify their existence, and what you're designing is just Kind A.) 
> > -Dan From brian.goetz at oracle.com Wed Oct 24 17:13:04 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Wed, 24 Oct 2012 20:13:04 -0400 Subject: Encounter order In-Reply-To: <5088825C.70701@oracle.com> References: <5086177D.8050706@oracle.com> <50862EDE.3060900@oracle.com> <50869EE8.8020200@oracle.com> <50870204.1070906@oracle.com> <508859B3.7070804@oracle.com> <14DBDB86-2A76-459A-854F-EF9CDF50E73E@oracle.com> <508867C4.1040601@oracle.com> <54C8D733-05CE-4C2B-8DA3-2920821136DE@oracle.com> <5088825C.70701@oracle.com> Message-ID: <50888410.4040903@oracle.com> > As I discussed on IM with Brian yesterday it doesn't make sense to ask > what properties the parallel() stream has. parallel() in and of itself > does not preserve or mutate order. The question is what does > parallel().op() do - and that depends both on the op() and what entity > you applied parallel() to. Right. The implementation maintains a set of flags describing properties of the source, such as "has encounter order", "is infinite", "is sorted by natural order", "is distinct", etc. Similarly, each op has a set of corresponding flag modifiers, of the form ({sets, clears, preserves}, flag) So map() preserves size and encounter order, sort injects encounter order and sortedness, filter preserves order but not size, etc. > For some ops like sum(), max() (the commutative ones) it doesn't matter > how you implement them (serial, parallel) the result is always the same > - given the input there is only one possible answer. True. Though we don't have sum(), min(), max() on Stream -- because we don't know enough about the (erased) type parameter to know whether the thing is summable, maxable, etc. So instead we have reduce(BinaryOperator). The argument to reduce is traditionally assumed to be associative but not commutative. 
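[Editorial aside: the flag machinery Brian describes above -- each op declaring which source properties it sets, clears, or preserves -- can be sketched roughly as follows. The enum and helper names here are invented for illustration; the eventual JDK-internal implementation (java.util.stream.StreamOpFlag) is considerably more elaborate.]

```java
import java.util.EnumSet;

public class StreamFlags {
    // Hypothetical flag names mirroring the properties mentioned above.
    enum Flag { ORDERED, SIZED, SORTED, DISTINCT }

    // An op declares which flags it injects and which it clears;
    // everything else is preserved from upstream.
    static EnumSet<Flag> apply(EnumSet<Flag> upstream,
                               EnumSet<Flag> sets, EnumSet<Flag> clears) {
        EnumSet<Flag> result = EnumSet.copyOf(upstream);
        result.removeAll(clears);
        result.addAll(sets);
        return result;
    }

    public static void main(String[] args) {
        // A List source has an encounter order and a known size.
        EnumSet<Flag> source = EnumSet.of(Flag.ORDERED, Flag.SIZED);
        // filter() preserves order but clears the size flag.
        EnumSet<Flag> afterFilter =
            apply(source, EnumSet.noneOf(Flag.class), EnumSet.of(Flag.SIZED));
        // sorted() injects sortedness (and keeps order).
        EnumSet<Flag> afterSort =
            apply(afterFilter, EnumSet.of(Flag.SORTED), EnumSet.noneOf(Flag.class));
        System.out.println(afterFilter + " " + afterSort);
    }
}
```

A terminal op then only needs to consult the accumulated flags, not the combinatorial history of the pipeline.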
> For other ops, like findAny, there may be multiple possible answers, so > the result is a function of both the input data and the algorithm used > to locate it - hence a parallel implementation may not only produce a > different result to a serial implementation, it may produce a different > result each time it is executed. Correct. > My concern with all this is that when writing the specification for > SomeClass.op we have to understand what actions can precede it in the > pipeline and how they affect the semantics that op provides. I don't think so. Instead, it only has to be a function of the known upstream flags alluded to above. Which is a lot easier. So, findFirst can say something about what it does if the stream has no defined encounter order, but need not address the combinatorial explosion of ways a stream might find itself lacking such an order. From joe.bowbeer at gmail.com Thu Oct 25 07:28:37 2012 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Thu, 25 Oct 2012 07:28:37 -0700 Subject: Encounter order Message-ID: If the term 'parallel' is not eliciting the right expectations, I wonder if a more cryptic term such as 'par' would serve better? On Oct 24, 2012 2:12 PM, "Brian Goetz" wrote: -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20121025/fe5ce64d/attachment.html From brian.goetz at oracle.com Thu Oct 25 09:00:22 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 25 Oct 2012 12:00:22 -0400 Subject: Encounter order In-Reply-To: <5086177D.8050706@oracle.com> References: <5086177D.8050706@oracle.com> Message-ID: <50896216.9080808@oracle.com> Since we seem to be agreed that the question is not "what should parallel foo() do", but "what should foo() do", lets walk through the operations. 
Stream filter(Predicate predicate); Stream map(Mapper mapper); Stream flatMap(FlatMapper mapper); Stream tee(Block block); For stateless intermediate operations (filter, map, flatMap, tee), they are all capable of preserving encounter order at basically no cost. So if the stream source has an encounter order and the terminal operation wants to use encounter order, these guys just play along. Stream limit(int n); Stream skip(int n); Optional findFirst(); These are basically *about* encounter order, so obviously they would respect it. So, question is, what do they do for streams that have no defined encounter order? (My sense is: infer one from the iteration order. set.stream().findFirst() is perfectly reasonable, its just that "first" doesn't mean as much as it does on a list.) Preserving encounter order in parallel does have a real cost, but if the user asked for the first seven in encounter order, that's what we should give them. Optional findAny(); This one is explicitly about *ignoring* encounter order and optimizing for fastest return. void forEach(Block block); This is only about side effects, so encounter order shouldn't enter into the calculation. Elements are fed to the block in whatever order and thread they are available. Object[] toArray(); This one seems pretty clear that you expect to see the elements in the array in encounter order. (Again, if no encounter order, we should make one up based on iterator/forEach order.) In general we can still parallelize efficiently here (this depends on how early we know the sizes of each chunk; if we know these at split time the cost is basically zero.) Stream sorted(Comparator comparator); Since sort explicitly reorders things, one might think that encounter order is totally irrelevant. However, stability (preserving encounter order for values that are equal according to the comparator) is a desirable property of sorting (and stability does add considerable cost to parallel sorting.) 
So there's a decision to make here. (And (Andrey) this is relevant whether sorting is an intermediate or terminal operation.) Stream cumulate(BinaryOperator operator); This one is explicitly about encounter order too. There are efficient parallel algorithms for this (which is kind of the whole point of including it at all.) boolean anyMatch(Predicate predicate); boolean allMatch(Predicate predicate); boolean noneMatch(Predicate predicate); These are independent of encounter order. Optional reduce(BinaryOperator op); T reduce(T base, BinaryOperator op); U fold(Factory baseFactory, Combiner reducer, BinaryOperator combiner); Whether these are sensitive to encounter order depends on whether the operators are commutative or not. Traditionally, reduce/fold operators are expected to be associative but not commutative. The cost of respecting order in parallel here is minor; basically the bookkeeping overhead of remembering who your children are and their order, and the delays associated with tasks not completing until all their children complete. If the source has no encounter order, my inclination here is (again) to assume that the programmer understood that, impute an encounter order from the implementation, and feed the elements in that order. If the user provides a reducer that is associative but not commutative (e.g., String::concat), he may get a scrambled result, but this is no different from what happens when you iterate over a Set today. Stream uniqueElements(); This one is not obvious to me. On the one hand, yielding results in encounter order seems polite (like stable sorting, but more so); on the other, preserving order in parallel is fairly expensive. Map> groupBy(Mapper classifier); Map reduceBy(Mapper classifier, Factory baseFactory, Combiner reducer); Again, I am not sure what to do about these. Preserving encounter order in parallel definitely has a cost here. 
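[Editorial aside: for uniqueElements(), the shipped API (where it became Stream.distinct()) ended up respecting encounter order for ordered streams: the first occurrence of each duplicated element is kept, in encounter order. A small illustration:]

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class DistinctOrder {
    public static void main(String[] args) {
        // On an ordered stream, distinct() is stable: for duplicates,
        // the element appearing first in encounter order is preserved.
        List<Integer> unique = Arrays.asList(3, 1, 3, 2, 1).stream()
                                     .distinct()
                                     .collect(Collectors.toList());
        System.out.println(unique); // [3, 1, 2]
    }
}
```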
> A into(A target); This one is interesting, because in addition to "can the target support an order", we also have to be careful about "is the target thread-safe." What we've done here is implement it so that the target decides what to do; it gets passed the stream, and can choose to extract elements in serial or in parallel. Standard non-thread-safe collections force sequential insertion (though upstream operations can still proceed in parallel, so if you do list.filter().into(new ArrayList()), the filtering still proceeds in parallel, then the elements are collected in order and can be sequentially inserted into the new list.) So overall, the problematic operations are sort, uniqueElements, groupBy, and reduceBy, because these are the ones where the cost is not minor. Secondarily, we need to vet whether our simplifying assumptions about imputing an encounter order when one is needed are acceptable. In cases where we know there is no encounter order, the implementation is free (and does) use the more efficient approach. It is also trivial and basically cost-free (O(1) time cost) to introduce a new op, "unordered()", which would strip a stream of its encounter order. So if uniqueElements were deemed to require preserving order, but if the user didn't care, he could: list.unordered().uniqueElements().toArray(); and get the faster implementation of uniqueElements. Given this, my recommendation is: - Have all the problematic (sort, uniqueElements, groupBy, reduceBy) ops respect encounter order even though it may be expensive - Provide an .unordered() op for people to opt out of the encounter order when they know they don't care. 
- Impute encounter order if we need one and there isn't one (rather than throwing) From andrey.breslav at jetbrains.com Thu Oct 25 13:32:23 2012 From: andrey.breslav at jetbrains.com (Andrey Breslav) Date: Fri, 26 Oct 2012 00:32:23 +0400 Subject: Encounter order In-Reply-To: <50896216.9080808@oracle.com> References: <5086177D.8050706@oracle.com> <50896216.9080808@oracle.com> Message-ID: <5089A1D7.3080802@jetbrains.com> > void forEach(Block block); > > This is only about side effects, so encounter order shouldn't enter > into the calculation. Elements are fed to the block in whatever order > and thread they are available. But do we guarantee the ordering of effects for sequential implementation? Just curious about what would the JavaDoc say for this method. > Given this, my recommendation is: > - Have all the problematic (sort, uniqueElements, groupBy, reduceBy) > ops respect encounter order even though it may be expensive > - Provide an .unordered() op for people to opt out of the encounter > order when they know they don't care. > - Impute encounter order if we need one and there isn't one (rather > than throwing) Overall, looks good to me. Maybe groupBy and reduceBy aren't worth the trouble: at least if something returns a Map, IMO nobody expects ordering in this map. Collections in groupBy are a different story: this is very much like stable sorting. From brian.goetz at oracle.com Thu Oct 25 13:45:17 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 25 Oct 2012 16:45:17 -0400 Subject: Encounter order In-Reply-To: <5089A1D7.3080802@jetbrains.com> References: <5086177D.8050706@oracle.com> <50896216.9080808@oracle.com> <5089A1D7.3080802@jetbrains.com> Message-ID: <5089A4DD.9050108@oracle.com> >> void forEach(Block block); >> >> This is only about side effects, so encounter order shouldn't enter >> into the calculation. Elements are fed to the block in whatever order >> and thread they are available. 
> But do we guarantee the ordering of effects for sequential > implementation? Just curious about what would the JavaDoc say for this > method. Good question. What do you think we should guarantee here? (Its pretty hard to imagine an implementation that doesn't do this, but that's a different consideration.) I do think it is reasonable for us to say that the effects are predictable in serial and not predictable in parallel -- this is different from the *result*. >> Given this, my recommendation is: >> - Have all the problematic (sort, uniqueElements, groupBy, reduceBy) >> ops respect encounter order even though it may be expensive >> - Provide an .unordered() op for people to opt out of the encounter >> order when they know they don't care. >> - Impute encounter order if we need one and there isn't one (rather >> than throwing) > Overall, looks good to me. > > Maybe groupBy and reduceBy aren't worth the trouble: at least if > something returns a Map, IMO nobody expects ordering in this map. > Collections in groupBy are a different story: this is very much like > stable sorting. People don't expect ordering of the *keys*, and that's fine. What I was talking about was ordering of the values, since the Map returned is a Map>. For example: list = [ 1..10 ] Map> map = list.groupBy(e -> e % 2 == 0); Should the map be { true: [ 2, 4, 6, 8, 10 ], false: [ 1, 3, 5, 7, 9] } // respect order or { true: [ random evens ], false: [ random odds] } // ignore order That's the choice I think we face in groupBy and reduceBy. 
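[Editorial aside: Brian's groupBy example can be run against the shipped API, where the draft's groupBy became Collectors.groupingBy and the boolean-classifier case became Collectors.partitioningBy. As it turned out, the value lists do respect encounter order for an ordered source:]

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class GroupByOrder {
    public static void main(String[] args) {
        // list = [1..10], classified by parity, as in the example above.
        Map<Boolean, List<Integer>> map =
            IntStream.rangeClosed(1, 10).boxed()
                     .collect(Collectors.partitioningBy(e -> e % 2 == 0));
        // The value lists come out in encounter order, not scrambled:
        System.out.println(map.get(true));  // [2, 4, 6, 8, 10]
        System.out.println(map.get(false)); // [1, 3, 5, 7, 9]
    }
}
```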
From forax at univ-mlv.fr Thu Oct 25 13:50:48 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Thu, 25 Oct 2012 22:50:48 +0200 Subject: Encounter order In-Reply-To: <5089A4DD.9050108@oracle.com> References: <5086177D.8050706@oracle.com> <50896216.9080808@oracle.com> <5089A1D7.3080802@jetbrains.com> <5089A4DD.9050108@oracle.com> Message-ID: <5089A628.5080600@univ-mlv.fr> On 10/25/2012 10:45 PM, Brian Goetz wrote: >>> void forEach(Block block); >>> >>> This is only about side effects, so encounter order shouldn't enter >>> into the calculation. Elements are fed to the block in whatever order >>> and thread they are available. >> But do we guarantee the ordering of effects for sequential >> implementation? Just curious about what would the JavaDoc say for this >> method. > > Good question. What do you think we should guarantee here? (Its > pretty hard to imagine an implementation that doesn't do this, but > that's a different consideration.) I do think it is reasonable for us > to say that the effects are predictable in serial and not predictable > in parallel -- this is different from the *result*. PriorityQueue iterator already returns values in the order maintained by the priority queue. So I think it's fine to just say the order should be predictable for the same JDK. Rémi From daniel.smith at oracle.com Thu Oct 25 16:14:12 2012 From: daniel.smith at oracle.com (Dan Smith) Date: Thu, 25 Oct 2012 17:14:12 -0600 Subject: Encounter order In-Reply-To: <50896216.9080808@oracle.com> References: <5086177D.8050706@oracle.com> <50896216.9080808@oracle.com> Message-ID: <94F7AE1E-7E10-4669-9794-EBC01F04E35C@oracle.com> On Oct 25, 2012, at 10:00 AM, Brian Goetz wrote: > list.unordered().uniqueElements().toArray(); > Given this, my recommendation is: > - Provide an .unordered() op for people to opt out of the encounter order when they know they don't care. Ah, 'unordered' is my 'asKindBParallel' operation. 
To make my previous discussion concrete, I'm arguing that the result of 'unordered' -- call it an 'UnorderedStream' -- should _not_ extend 'Stream', but rather should have its own set of operations that make sense in an unordered world (e.g., no 'findFirst', no 'skip', commutative-op-only 'reduce', etc.). (Back to my cave now...) ?Dan From brian.goetz at oracle.com Thu Oct 25 16:16:28 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Thu, 25 Oct 2012 19:16:28 -0400 Subject: Encounter order In-Reply-To: <94F7AE1E-7E10-4669-9794-EBC01F04E35C@oracle.com> References: <5086177D.8050706@oracle.com> <50896216.9080808@oracle.com> <94F7AE1E-7E10-4669-9794-EBC01F04E35C@oracle.com> Message-ID: <5089C84C.8050800@oracle.com> Maybe... If you accept that set.findFirst is not a syntax error (not yet agreed, but I'm pushing this story), then list.unordered.findFirst shouldn't be either. What .unordered would mean is "I know there's an encounter order, but I don't care about it, do things as fast as you can." In reality this wouldn't likely affect the performance of reduce, findFirst, etc. But it would likely improve the performance of groupBy, uniqueElements, etc. On 10/25/2012 7:14 PM, Dan Smith wrote: > On Oct 25, 2012, at 10:00 AM, Brian Goetz wrote: > >> list.unordered().uniqueElements().toArray(); > >> Given this, my recommendation is: >> - Provide an .unordered() op for people to opt out of the encounter order when they know they don't care. > > Ah, 'unordered' is my 'asKindBParallel' operation. To make my previous discussion concrete, I'm arguing that the result of 'unordered' -- call it an 'UnorderedStream' -- should _not_ extend 'Stream', but rather should have its own set of operations that make sense in an unordered world (e.g., no 'findFirst', no 'skip', commutative-op-only 'reduce', etc.). > > (Back to my cave now...) 
> > ?Dan > From david.holmes at oracle.com Thu Oct 25 17:47:04 2012 From: david.holmes at oracle.com (David Holmes) Date: Fri, 26 Oct 2012 10:47:04 +1000 Subject: Encounter order In-Reply-To: <50896216.9080808@oracle.com> References: <5086177D.8050706@oracle.com> <50896216.9080808@oracle.com> Message-ID: <5089DD88.2020304@oracle.com> Hi Brian, One comment regarding sorted() - we have already established that the parallelSort implementation for objects is not a stable sort, so I assume sorted() must behave similarly. I will also comment that not having looked at the implementation of any of this, it isn't obvious to me that certain operations either must be order-preserving, or that order-preservation is cheap. For example filter. I can imagine parallel algorithms that won't preserve encounter order without a lot of additional management put in place. David On 26/10/2012 2:00 AM, Brian Goetz wrote: > Since we seem to be agreed that the question is not "what should > parallel foo() do", but "what should foo() do", lets walk through the > operations. > > Stream filter(Predicate predicate); > Stream map(Mapper mapper); > Stream flatMap(FlatMapper mapper); > Stream tee(Block block); > > For stateless intermediate operations (filter, map, flatMap, tee), they > are all capable of preserving encounter order at basically no cost. So > if the stream source has an encounter order and the terminal operation > wants to use encounter order, these guys just play along. > > Stream limit(int n); > Stream skip(int n); > Optional findFirst(); > > These are basically *about* encounter order, so obviously they would > respect it. So, question is, what do they do for streams that have no > defined encounter order? (My sense is: infer one from the iteration > order. set.stream().findFirst() is perfectly reasonable, its just that > "first" doesn't mean as much as it does on a list.) 
Preserving encounter > order in parallel does have a real cost, but if the user asked for the > first seven in encounter order, that's what we should give them. > > Optional findAny(); > > This one is explicitly about *ignoring* encounter order and optimizing > for fastest return. > > void forEach(Block block); > > This is only about side effects, so encounter order shouldn't enter into > the calculation. Elements are fed to the block in whatever order and > thread they are available. > > Object[] toArray(); > > This one seems pretty clear that you expect to see the elements in the > array in encounter order. (Again, if no encounter order, we should make > one up based on iterator/forEach order.) In general we can still > parallelize efficiently here (this depends on how early we know the > sizes of each chunk; if we know these at split time the cost is > basically zero.) > > Stream sorted(Comparator comparator); > > Since sort explicitly reorders things, one might think that encounter > order is totally irrelevant. However, stability (preserving encounter > order for values that are equal according to the comparator) is a > desirable property of sorting (and stability does add considerable cost > to parallel sorting.) So there's a decision to make here. (And (Andrey) > this is relevant whether sorting is an intermediate or terminal operation.) > > Stream cumulate(BinaryOperator operator); > > This one is explicitly about encounter order too. There are efficient > parallel algorithms for this (which is kind of the whole point of > including it at all.) > > boolean anyMatch(Predicate predicate); > boolean allMatch(Predicate predicate); > boolean noneMatch(Predicate predicate); > > These are independent of encounter order. 
> > Optional reduce(BinaryOperator op); > T reduce(T base, BinaryOperator op); > U fold(Factory baseFactory, > Combiner reducer, > BinaryOperator combiner); > > Whether these are sensitive to encounter order depends on whether the > operators are commutative or not. Traditionally, reduce/fold operators > are expected to be associative but not commutative. The cost of > respecting order in parallel here are minor; basically the bookkeeping > overhead of remembering who your children are and their order, and the > delays associated with tasks not completing until all their children > complete. > > If the source has no encounter order, my inclination here is (again) to > assume that the programmer understood that, impute an encounter order > from the implementation, and feed the elements in that order. If the > user provides a reducer that is associative but not commutative (e.g., > String::concat), he may get a scrambled result, but this is no different > from what happens when you iterate over a Set today. > > Stream uniqueElements(); > > This one is not obvious to me. On the one hand, yielding results in > encounter order seems polite (like stable sorting, but more so); on the > other, preserving order in parallel is fairly expensive. > > Map> groupBy(Mapper > classifier); > Map reduceBy(Mapper classifier, > Factory baseFactory, > Combiner reducer); > > Again, I am not sure what to do about these. Preserving encounter order > in parallel definitely has a cost here. > > > A into(A target); > > This one is interesting, because in addition to "can the target support > an order", we also have to be careful about "is the target thread-safe." > What we've done here is implement it so that the target decides what to > do; it gets passed the stream, and can choose to extract elements in > serial or in parallel. 
Standard non-thread-safe collections force > sequential insertion (though upstream operations can still proceed in > parallel, so if you do list.filter().into(new ArrayList()), the > filtering still proceeds in parallel, then the elements are collected in > order and can be sequentially inserted into the new list.) > > > So overall, the problematic operations are sort, uniqueElements, > groupBy, and reduceBy, because these are the ones where the cost is not > minor. Secondarily, we need to vet whether our simplifying assumptions > about imputing an encounter order when one is needed are acceptable. > > In cases where we know there is no encounter order, the implementation > is free (and does) use the more efficient approach. It is also trivial > and basically cost-free (O(1) time cost) to introduce a new op, > "unordered()", which would strip a stream of its encounter order. So if > uniqueElements were deemed to require preserving order, but if the user > didn't care, he could: > > list.unordered().uniqueElements().toArray(); > > and get the faster implementation of uniqueElements. > > Given this, my recommendation is: > - Have all the problematic (sort, uniqueElements, groupBy, reduceBy) ops > respect encounter order even though it may be expensive > - Provide an .unordered() op for people to opt out of the encounter > order when they know they don't care. 
> - Impute encounter order if we need one and there isn't one (rather than > throwing) > From andrey.breslav at jetbrains.com Thu Oct 25 22:58:07 2012 From: andrey.breslav at jetbrains.com (Andrey Breslav) Date: Fri, 26 Oct 2012 09:58:07 +0400 Subject: Encounter order In-Reply-To: <5089A4DD.9050108@oracle.com> References: <5086177D.8050706@oracle.com> <50896216.9080808@oracle.com> <5089A1D7.3080802@jetbrains.com> <5089A4DD.9050108@oracle.com> Message-ID: <508A266F.7070202@jetbrains.com> >>> void forEach(Block block); >>> >>> This is only about side effects, so encounter order shouldn't enter >>> into the calculation. Elements are fed to the block in whatever order >>> and thread they are available. >> But do we guarantee the ordering of effects for sequential >> implementation? Just curious about what would the JavaDoc say for this >> method. > > Good question. What do you think we should guarantee here? (Its > pretty hard to imagine an implementation that doesn't do this, but > that's a different consideration.) I do think it is reasonable for us > to say that the effects are predictable in serial and not predictable > in parallel -- this is different from the *result*. I think we should guarantee that forEach() has effects in the iteration order for the serial case. What I wanted to point out was that we will say things like "in serial ... in parallel" for the generic Stream interface which is supposed to be abstracting these things away. Not that I thought that it is very bad. >>> Given this, my recommendation is: >>> - Have all the problematic (sort, uniqueElements, groupBy, reduceBy) >>> ops respect encounter order even though it may be expensive >>> - Provide an .unordered() op for people to opt out of the encounter >>> order when they know they don't care. >>> - Impute encounter order if we need one and there isn't one (rather >>> than throwing) >> Overall, looks good to me. 
>> >> Maybe groupBy and reduceBy aren't worth the trouble: at least if >> something returns a Map, IMO nobody expects ordering in this map. >> Collections in groupBy are a different story: this is very much like >> stable sorting. > > People don't expect ordering of the *keys*, and that's fine. What I > was talking about was ordering of the values, since the Map returned > is a Map>. Your reduceBy() looked like this: Map reduceBy(...) no collections anywhere. Was that a mistake? From andrey.breslav at jetbrains.com Thu Oct 25 22:59:49 2012 From: andrey.breslav at jetbrains.com (Andrey Breslav) Date: Fri, 26 Oct 2012 09:59:49 +0400 Subject: Encounter order In-Reply-To: <94F7AE1E-7E10-4669-9794-EBC01F04E35C@oracle.com> References: <5086177D.8050706@oracle.com> <50896216.9080808@oracle.com> <94F7AE1E-7E10-4669-9794-EBC01F04E35C@oracle.com> Message-ID: <508A26D5.9070701@jetbrains.com> > Ah, 'unordered' is my 'asKindBParallel' operation. To make my previous discussion concrete, I'm arguing that the result of 'unordered' -- call it an 'UnorderedStream' -- should _not_ extend 'Stream', but rather should have its own set of operations that make sense in an unordered world (e.g., no 'findFirst', no 'skip', commutative-op-only 'reduce', etc.). But as Brian pointed out, it needs to still have sorted(), groupBy() and such, because sometimes we need then stable, and sometimes we need them just fast. From brian.goetz at oracle.com Fri Oct 26 05:40:37 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 26 Oct 2012 08:40:37 -0400 Subject: Encounter order In-Reply-To: <508A266F.7070202@jetbrains.com> References: <5086177D.8050706@oracle.com> <50896216.9080808@oracle.com> <5089A1D7.3080802@jetbrains.com> <5089A4DD.9050108@oracle.com> <508A266F.7070202@jetbrains.com> Message-ID: <508A84C5.2010703@oracle.com> >> People don't expect ordering of the *keys*, and that's fine. What I >> was talking about was ordering of the values, since the Map returned >> is a Map>. 
> Your reduceBy() looked like this: > > Map reduceBy(...) > > no collections anywhere. Was that a mistake? groupBy returns a Map>. reduceBy returns a Map, because instead of accumulating elements in a K bucket and sticking them in a collection, we reduce all the elements in the K bucket. For example: Map longestDocByAuthor = docs.reduceBy(Document::getAuthor, (d1, d2) -> (d1.length() > d2.length() ? d1 : d2)); From andrey.breslav at jetbrains.com Fri Oct 26 05:55:08 2012 From: andrey.breslav at jetbrains.com (Andrey Breslav) Date: Fri, 26 Oct 2012 16:55:08 +0400 Subject: Encounter order In-Reply-To: <508A84C5.2010703@oracle.com> References: <5086177D.8050706@oracle.com> <50896216.9080808@oracle.com> <5089A1D7.3080802@jetbrains.com> <5089A4DD.9050108@oracle.com> <508A266F.7070202@jetbrains.com> <508A84C5.2010703@oracle.com> Message-ID: <508A882C.3030406@jetbrains.com> People don't expect ordering of the *keys*, and that's fine. What I >>> was talking about was ordering of the values, since the Map returned >>> is a Map>. >> Your reduceBy() looked like this: >> >> Map reduceBy(...) >> >> no collections anywhere. Was that a mistake? > > groupBy returns a Map>. > > reduceBy returns a Map, because instead of accumulating elements > in a K bucket and sticking them in a collection, we reduce all the > elements in the K bucket. > > For example: > > Map longestDocByAuthor > = docs.reduceBy(Document::getAuthor, > (d1, d2) -> (d1.length() > d2.length() ? d1 : d2)); > Yes. And my point was that thus no ordering considerations are applicable to the result of reduceBy. Or am I missing something? 
From brian.goetz at oracle.com Fri Oct 26 06:04:23 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 26 Oct 2012 09:04:23 -0400 Subject: Encounter order In-Reply-To: <508A882C.3030406@jetbrains.com> References: <5086177D.8050706@oracle.com> <50896216.9080808@oracle.com> <5089A1D7.3080802@jetbrains.com> <5089A4DD.9050108@oracle.com> <508A266F.7070202@jetbrains.com> <508A84C5.2010703@oracle.com> <508A882C.3030406@jetbrains.com> Message-ID: <508A8A57.6030803@oracle.com> It's the same problem of sensibility that we get with ordinary reduce. If we can only assume the reducer is associative, and we see that the input source structurally has an encounter order, then reordering the elements could result in the wrong answer. (i.e., it stands to reason that stream.reduce(r) should produce the same result as stream.reduceBy(e -> 1, r).get(1)). On 10/26/2012 8:55 AM, Andrey Breslav wrote: > People don't expect ordering of the *keys*, and that's fine. What I >>>> was talking about was ordering of the values, since the Map returned >>>> is a Map<K, Collection<V>>. >>> Your reduceBy() looked like this: >>> >>> Map<K, T> reduceBy(...) >>> >>> no collections anywhere. Was that a mistake? >> >> groupBy returns a Map<K, Collection<T>>. >> >> reduceBy returns a Map<K, T>, because instead of accumulating elements >> in a K bucket and sticking them in a collection, we reduce all the >> elements in the K bucket. >> >> For example: >> >> Map<Author, Document> longestDocByAuthor >> = docs.reduceBy(Document::getAuthor, >> (d1, d2) -> (d1.length() > d2.length() ? d1 : d2)); >> > Yes. And my point was that, therefore, no ordering considerations are > applicable to the result of reduceBy. Or am I missing something? 
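Brian's consistency claim can be checked against the API that shipped: reducing the whole stream and reducing under a constant key should agree whenever the operator is associative. A sketch, with groupingBy plus Collectors.reducing standing in for reduceBy:

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.BinaryOperator;
import java.util.stream.Collectors;

public class ReduceConsistency {
    // stream.reduce(r) ...
    static Optional<Integer> direct(List<Integer> xs, BinaryOperator<Integer> r) {
        return xs.stream().reduce(r);
    }

    // ... should agree with stream.reduceBy(e -> 1, r).get(1)
    static Optional<Integer> viaConstantKey(List<Integer> xs, BinaryOperator<Integer> r) {
        Map<Integer, Optional<Integer>> m = xs.stream()
                .collect(Collectors.groupingBy(e -> 1, Collectors.reducing(r)));
        return m.get(1);
    }

    public static void main(String[] args) {
        List<Integer> xs = List.of(3, 1, 4, 1, 5);
        BinaryOperator<Integer> max = Integer::max;  // associative (and commutative)
        System.out.println(direct(xs, max).get());         // 5
        System.out.println(viaConstantKey(xs, max).get()); // 5
    }
}
```

With a non-associative reducer, neither side is well-defined under parallel reordering, which is exactly the point being made.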
From andrey.breslav at jetbrains.com Fri Oct 26 06:16:42 2012 From: andrey.breslav at jetbrains.com (Andrey Breslav) Date: Fri, 26 Oct 2012 17:16:42 +0400 Subject: Encounter order In-Reply-To: <508A8A57.6030803@oracle.com> References: <5086177D.8050706@oracle.com> <50896216.9080808@oracle.com> <5089A1D7.3080802@jetbrains.com> <5089A4DD.9050108@oracle.com> <508A266F.7070202@jetbrains.com> <508A84C5.2010703@oracle.com> <508A882C.3030406@jetbrains.com> <508A8A57.6030803@oracle.com> Message-ID: <508A8D3A.4000105@jetbrains.com> Makes sense. Thanks On 26.10.2012 17:04, Brian Goetz wrote: > It's the same problem of sensibility that we get with ordinary reduce. > If we can only assume the reducer is associative, and we see that the > input source structurally has an encounter order, then reordering the > elements could result in the wrong answer. (i.e., it stands to reason > that stream.reduce(r) should produce the same result as > stream.reduceBy(e -> 1, r).get(1)). > > On 10/26/2012 8:55 AM, Andrey Breslav wrote: >> People don't expect ordering of the *keys*, and that's fine. What I >>>>> was talking about was ordering of the values, since the Map returned >>>>> is a Map<K, Collection<V>>. >>>> Your reduceBy() looked like this: >>>> >>>> Map<K, T> reduceBy(...) >>>> >>>> no collections anywhere. Was that a mistake? >>> >>> groupBy returns a Map<K, Collection<T>>. >>> >>> reduceBy returns a Map<K, T>, because instead of accumulating elements >>> in a K bucket and sticking them in a collection, we reduce all the >>> elements in the K bucket. >>> >>> For example: >>> >>> Map<Author, Document> longestDocByAuthor >>> = docs.reduceBy(Document::getAuthor, >>> (d1, d2) -> (d1.length() > d2.length() ? d1 : d2)); >>> >> Yes. And my point was that, therefore, no ordering considerations are >> applicable to the result of reduceBy. Or am I missing something? 
From brian.goetz at oracle.com Fri Oct 26 10:32:38 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Fri, 26 Oct 2012 13:32:38 -0400 Subject: This Week in Lambda Libs Message-ID: <508AC936.6030505@oracle.com> I had promised to do a weekly update of what was happening in the repo, but fell down on that due to travel. Here's an accumulated update to date. (More info in the hg logs.) *Sept 25, 2012 - Oct 24, 2012* *Added Stream.concat (Paul).* Added Stream.concat(Stream otherStream). Not clear whether this should be an instance method on Stream, or a static method; also not clear whether it should take Stream or Streamable (the latter works better if we want to go to detached pipelines.) *Refactor of Stream Op base types (Brian and Paul).* Numerous refactors in the hierarchy for operations; the current hierarchy is a single base type StreamOp, with subtypes for IntermediateOp, StatefulOp, and TerminalOp. *Stream flags improvements (Paul).* Added an "encounter order" flag. Define flags with an enum. Make flags into declarative properties of Ops, where an op can declare that it preserves, injects, or clears a given flag, and move responsibility for flag computation into AbstractPipeline. Pass flags to wrap{Sink,Iterator} so they can act on the current set of flags. *Introduction of PipelineHelper (Brian and Paul).* All computations about the current state of the pipeline are now centralized in PipelineHelper (serial and parallel versions), which is passed to StreamOp.evaluate{Serial,Parallel}. *New parallel implementations (Brian and Paul).* Parallel implementations of groupBy, removeDuplicates, toArray. *Improvements in Node and NodeBuilder (Paul).* Node is a simple conc-tree representation, which can be either the output or the source of a stream. Parallel operations that can produce flat arrays are encouraged to do so; tree shape is known to Node, so operations like toArray can become no-ops if upstream ops have already produced a flattened array. 
More optimizations to effectively use known source/split size to avoid copying. *Removal of MapStream (Brian).* While at first having map-shaped streams was an "obvious" requirement, it turns out that they failed a cost-benefit analysis. They introduced a lot of complexity into the implementation, and in our explorations for use cases, we found that most were well-enough handled by map.{keys,values,entrySet}().stream(), or by stream.reduceBy (the fused group+reduce operation). *Common implementations of forEach/reduce (Brian).* Factored out common mutable-reduce logic into OpUtils; implemented separate decomposition implementations in ForEachOp, GroupByOp, UniqOp, FoldUp, SeedlessFoldUp. Reduced boxing overhead in folding when the intermediate representation (e.g., T) was different from the final representation (e.g., Optional<T>). *Optimizations in StreamBuilder (Mike).* Implement a no-copy List optimized for append*-iterate*. *Spliterator implementation improvements (Mike).* Broader/better implementation of Spliterator throughout Collections. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20121026/5689a3a9/attachment.html From forax at univ-mlv.fr Sat Oct 27 03:19:52 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Sat, 27 Oct 2012 12:19:52 +0200 Subject: Introduce Covariance/Contravariance at declaration site for Java 9 ? In-Reply-To: References: <507C899F.1000907@univ-mlv.fr> <31095704-E327-43CB-ACB3-98E3DD71BD6B@oracle.com> Message-ID: <508BB548.20704@univ-mlv.fr> I've maybe found a solution :) First, it's not possible to let the compiler do the transformation automagically, because in that case adding a default method to a function interface may break a lot of user code. A function interface that is inherently co/contravariant should be declared as such in its declaration. 
Removing the burden for the user to write wildcards is a nice goal, but it means that the Java type system would be inherently more complex, and it may not be worth adding such complexity for little benefit. The issue is that users who create APIs forget to add wildcards, so the practical solution is to add a lint option to javac that warns users when they use a co/contravariant function interface without specifying wildcards. This solution doesn't change the type system, so it can be implemented without pain and fear of corner cases. Here is the proposal: - add two new type annotations (as defined by JSR 308) in java.lang.annotation, Covariant and Contravariant, that apply to type variables. These annotations are not inherited. - add a new lint pass to javac that checks parameters of methods (not return types; you should not use wildcards on return types). If the type of a parameter is a parameterized type with type variables annotated with Covariant (resp. Contravariant), emit a warning if the parameterized type is not ? extends X (resp. ? super X). So, for example, the function interface Mapper would be declared: interface Mapper<@Covariant U, @Contravariant T> { public U map(T element); } and Iterator and Iterable can be retrofitted like this: interface Iterator<@Covariant T> { ... } interface Iterable<@Covariant T> { ... } but ListIterator, which inherits from Iterator, cannot have its parameter T declared as covariant, because ListIterator defines methods add and set that take a T as parameter (that's why the @Covariant/@Contravariant annotations are declared as not inherited). Now if in a class there is a method declared like this: static <T, U> Iterator<U> mapIterator(Iterator<T> iterator, Mapper<U, T> mapper) { ... } the compiler will emit two warnings, because the parameter iterator should be an Iterator<? extends T> and mapper should be a Mapper<? extends U, ? super T>. 
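The proposal above can be sketched in compilable form. The @Covariant/@Contravariant annotations below are hypothetical (nothing in the JDK defines them), and the warnings described would come from a JSR-308-style checker rather than from this code; the mapIterator body is likewise just an illustration:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Iterator;

// Hypothetical annotations from the proposal; they carry no behavior here,
// they only record design intent for a checker to act on.
@Target(ElementType.TYPE_PARAMETER)
@Retention(RetentionPolicy.RUNTIME)
@interface Covariant {}

@Target(ElementType.TYPE_PARAMETER)
@Retention(RetentionPolicy.RUNTIME)
@interface Contravariant {}

interface Mapper<@Covariant U, @Contravariant T> {
    U map(T element);
}

public class VarianceSketch {
    // A checker implementing the lint pass would warn here: iterator should be
    // Iterator<? extends T> and mapper should be Mapper<? extends U, ? super T>.
    static <T, U> Iterator<U> mapIterator(Iterator<T> it, Mapper<U, T> mapper) {
        return new Iterator<U>() {
            public boolean hasNext() { return it.hasNext(); }
            public U next() { return mapper.map(it.next()); }
        };
    }

    public static void main(String[] args) {
        Iterator<Integer> lengths =
            mapIterator(java.util.List.of("ab", "cde").iterator(), String::length);
        System.out.println(lengths.next()); // 2
    }
}
```

Since the annotations change nothing at runtime, the lint pass can be prototyped entirely outside javac, which is the property Brian's reply picks up on.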
The only open question is whether lint should accept tagging a type parameter with @Covariant/@Contravariant even if the type parameter appears in a method of the interface at a position where it should not. C# does that. cheers, Rémi On 10/18/2012 09:25 PM, Kevin Bourrillion wrote: > FTR, I agree fairly strongly with everything Dan says here. > > > On Thu, Oct 18, 2012 at 12:20 PM, Dan Smith > wrote: > > I think it's a good idea, at least worth serious consideration. > > There would be no _requirement_ to design libraries in > declaration-site-friendly ways, but the fact is we already have > _lots_ of types that are inherently co-/contra- variant, and the > "right" way to use those types is to always use a wildcard. It > turns into a mechanical transformation that obscures the code > behind layers of wildcards and pointlessly punishes users if they > mess up; it would sure be nice to remove that burden from clients > of variant types. > > Anyway, I can say it's on the radar. But maybe we will conclude > it's a horrible idea; or maybe other things will take priority. > > - Dan > > On Oct 15, 2012, at 6:24 PM, Joshua Bloch > wrote: > > > I believe that declaration site variance annotations are every > bit as bad as use-site annotations. They're bad in a different > way--they force you to write idiosyncratic types because natural > types don't lend themselves to fixed variance restrictions--but > they're still bad. Providing both use and declaration site > variance in one language is the worst of both worlds (unless > you're trying to kill the language). > > > > Josh > > > > On Mon, Oct 15, 2012 at 3:09 PM, Remi Forax > > wrote: > > I've just read the presentation of Stuart Marks at JavaOne [1], > > all examples after slide 32, the first ones that use lambdas, are > not written correctly > > because all method signatures do not use wildcards. 
> > > > Brian, I know that we will not be able to introduce > covariance/contravariance > > at declaration site for Java 8, so the solution we will deliver > will be far from perfect > > because nobody understands wildcards. > > Is there a way to free Dan and Maurizio enough time to > investigate if > > covariance/contravariance can be added to Java 9. > > > > Rémi > > [1] > https://stuartmarks.wordpress.com/2012/10/07/javaone-2012-jump-starting-lambda-programming/ > > > -- > Kevin Bourrillion | Java Librarian | Google, Inc. | kevinb at google.com > > From brian.goetz at oracle.com Sat Oct 27 09:22:28 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Sat, 27 Oct 2012 12:22:28 -0400 Subject: Introduce Covariance/Contravariance at declaration site for Java 9 ? In-Reply-To: <508BB548.20704@univ-mlv.fr> References: <507C899F.1000907@univ-mlv.fr> <31095704-E327-43CB-ACB3-98E3DD71BD6B@oracle.com> <508BB548.20704@univ-mlv.fr> Message-ID: <508C0A44.9090304@oracle.com> This is definitely worthy of exploration. And, the good news is, it can be entirely implemented using a JSR-308 checker -- so (at least initially) no javac changes are required to develop and validate the concept. This fits nicely into the use of annotations as a means of capturing design intent, and enabling tools to verify design intent. (It's bad enough that we make these mistakes; it's worse that some of them then cannot be corrected because of compatibility concerns.) On 10/27/2012 6:19 AM, Remi Forax wrote: > I've maybe found a solution :) > > First, it's not possible to let the compiler do the transformation > automagically because > in that case adding a default method to a function interface may break a > lot of user code. > A function interface that is inherently co/contravariant should be > declared as such in its declaration. 
> > Removing the burden for the user to write wildcard is a nice goal, but > it means that > the Java type system will be inherently more complex and it may not > worth to add such > complexity for little benefit. > > The issue is that users that creates API forget to add wildcards, so the > practical solution is to > add a lint option to javac that warn user when they use a > co/contravariant function interfaces > without specifying wildcards. > This solution doesn't change the type system so it can be implemented > without pain and fear of corner cases. > > Here is the proposal: > - adds two new type annotations (as defined by JSR308) in > java.lang.annotation, Covariant and Contravariant, > that applies on type variables. These annotations are not inherited. > - add a new lint pass to javac that checks parameters of methods (not > returns type, you should not use wildcards > on return type). If a type of a parameter is a parametrized type with > type variables annotated with Covariant > (resp. contravariant), emit a warning is the parametrized type is not a > ? extends X (resp. ? super X). > > so by example, the function interface Mapper will be declared: > interface Mapper<@Covariant U, @Contravariant T> { > public U map(T element); > } > and Iterator and Iterable can be retrofitted like this: > interface Iterator<@Covariant T> { ... } > interface Iterable<@Covariant T> { ... } > but ListIterator, that inherits from Iterator can not have it's > parameter T declared as covariant > because ListIterator defined methods add and set that takes a T as > parameter > (that's why annotations @Covariant/@Contravariant should be declared are > not inheritable). > > now if in a class there is a method declared like this: > static Iterator mapIterator(Iterator iterator, Mapper T> mapper) { ... } > the compiler will emit two warnings because the parameter iterator > should be an Iterator > and mapper should be a Mapper. 
> > The only open question is does lint should accept to tag a type > parameter with @Covariant/@Contravariant > even if the type parameter appear in method of the interface at position > it should not. C# does that. > > cheers, > R?mi > > On 10/18/2012 09:25 PM, Kevin Bourrillion wrote: >> FTR, I agree fairly strongly with everything Dan says here. >> >> >> On Thu, Oct 18, 2012 at 12:20 PM, Dan Smith > > wrote: >> >> I think it's a good idea, at least worth serious consideration. >> >> There would be no _requirement_ to design libraries in >> declaration-site-friendly ways, but the fact is we already have >> _lots_ of types that are inherently co-/contra- variant, and the >> "right" way to use those types is to always use a wildcard. It >> turns into a mechanical transformation that obscures the code >> behind layers of wildcards and pointlessly punishes users if they >> mess up; it would sure be nice to remove that burden from clients >> of variant types. >> >> Anyway, I can say it's on the radar. But maybe we will conclude >> it's a horrible idea; or maybe other things will take priority. >> >> ?Dan >> >> On Oct 15, 2012, at 6:24 PM, Joshua Bloch > > wrote: >> >> > I believe that declaration site variance annotations are every >> bit as bad as use-site annotations. They're bad in a different >> way--they force you to write idiosyncratic types because natural >> types don't lend themselves to fixed variance restrictions--but >> they're still bad. Providing both use and declaration site >> variance in one language is the worst of both worlds (unless >> you're trying to kill the language). >> > >> > Josh >> > >> > On Mon, Oct 15, 2012 at 3:09 PM, Remi Forax > > wrote: >> > I've just read the presentation of Stuart Marks at JavaOne [1], >> > all examples after slide 32, the first one that use lambdas are >> not written correctly >> > because all method signatures do not use wildcards. 
>> > >> > Brian, I know that we will not be able to introduce >> covariance/contravariance >> > at declaration site for Java 8, so the solution we will deliver >> will be far from perfect >> > because nobody understand wildcards. >> > Is there a way to free Dan and Maurizio enough time to >> investigate if >> > covariance/contravariance can be added to Java 9. >> > >> > R?mi >> > [1] >> >> https://stuartmarks.wordpress.com/2012/10/07/javaone-2012-jump-starting-lambda-programming/ >> >> > >> > >> >> >> >> >> -- >> Kevin Bourrillion | Java Librarian | Google, Inc. |kevinb at google.com >> >> > From andrey.breslav at jetbrains.com Sat Oct 27 09:35:16 2012 From: andrey.breslav at jetbrains.com (Andrey Breslav) Date: Sat, 27 Oct 2012 20:35:16 +0400 Subject: Introduce Covariance/Contravariance at declaration site for Java 9 ? In-Reply-To: <508C0A44.9090304@oracle.com> References: <507C899F.1000907@univ-mlv.fr> <31095704-E327-43CB-ACB3-98E3DD71BD6B@oracle.com> <508BB548.20704@univ-mlv.fr> <508C0A44.9090304@oracle.com> Message-ID: <1C474EFE-1292-4F92-BE49-CD0DFCA2E275@jetbrains.com> Looks like a decent practical solution. An IDE would make it a breeze to put all those wildcards in. On Oct 27, 2012, at 20:22 , Brian Goetz wrote: > This is definitely worthy of exploration. And, the good news is, it can be entirely implemented using a JSR-308 checker -- so (at least initially) no javac changes are required to develop and validate the concept. This fits nicely into the use of annotations as a means of capturing design intent, and enabling tools to verify design intent. > > (Its bad enough we make these mistakes; its worse that some of these then cannot be corrected because of compatibility concerns.) > > On 10/27/2012 6:19 AM, Remi Forax wrote: >> I've maybe found a solution :) >> >> First, it's not possible to let the compiler do the transformation >> automagically because >> in that case adding a default method to a function interface may break a >> lot of user code. 
>> A function interface that is inherently co/contravariant should be >> declared as such in its declaration. >> >> Removing the burden for the user to write wildcard is a nice goal, but >> it means that >> the Java type system will be inherently more complex and it may not >> worth to add such >> complexity for little benefit. >> >> The issue is that users that creates API forget to add wildcards, so the >> practical solution is to >> add a lint option to javac that warn user when they use a >> co/contravariant function interfaces >> without specifying wildcards. >> This solution doesn't change the type system so it can be implemented >> without pain and fear of corner cases. >> >> Here is the proposal: >> - adds two new type annotations (as defined by JSR308) in >> java.lang.annotation, Covariant and Contravariant, >> that applies on type variables. These annotations are not inherited. >> - add a new lint pass to javac that checks parameters of methods (not >> returns type, you should not use wildcards >> on return type). If a type of a parameter is a parametrized type with >> type variables annotated with Covariant >> (resp. contravariant), emit a warning is the parametrized type is not a >> ? extends X (resp. ? super X). >> >> so by example, the function interface Mapper will be declared: >> interface Mapper<@Covariant U, @Contravariant T> { >> public U map(T element); >> } >> and Iterator and Iterable can be retrofitted like this: >> interface Iterator<@Covariant T> { ... } >> interface Iterable<@Covariant T> { ... } >> but ListIterator, that inherits from Iterator can not have it's >> parameter T declared as covariant >> because ListIterator defined methods add and set that takes a T as >> parameter >> (that's why annotations @Covariant/@Contravariant should be declared are >> not inheritable). >> >> now if in a class there is a method declared like this: >> static Iterator mapIterator(Iterator iterator, Mapper> T> mapper) { ... 
} >> the compiler will emit two warnings because the parameter iterator >> should be an Iterator >> and mapper should be a Mapper. >> >> The only open question is does lint should accept to tag a type >> parameter with @Covariant/@Contravariant >> even if the type parameter appear in method of the interface at position >> it should not. C# does that. >> >> cheers, >> R?mi >> >> On 10/18/2012 09:25 PM, Kevin Bourrillion wrote: >>> FTR, I agree fairly strongly with everything Dan says here. >>> >>> >>> On Thu, Oct 18, 2012 at 12:20 PM, Dan Smith >> > wrote: >>> >>> I think it's a good idea, at least worth serious consideration. >>> >>> There would be no _requirement_ to design libraries in >>> declaration-site-friendly ways, but the fact is we already have >>> _lots_ of types that are inherently co-/contra- variant, and the >>> "right" way to use those types is to always use a wildcard. It >>> turns into a mechanical transformation that obscures the code >>> behind layers of wildcards and pointlessly punishes users if they >>> mess up; it would sure be nice to remove that burden from clients >>> of variant types. >>> >>> Anyway, I can say it's on the radar. But maybe we will conclude >>> it's a horrible idea; or maybe other things will take priority. >>> >>> ?Dan >>> >>> On Oct 15, 2012, at 6:24 PM, Joshua Bloch >> > wrote: >>> >>> > I believe that declaration site variance annotations are every >>> bit as bad as use-site annotations. They're bad in a different >>> way--they force you to write idiosyncratic types because natural >>> types don't lend themselves to fixed variance restrictions--but >>> they're still bad. Providing both use and declaration site >>> variance in one language is the worst of both worlds (unless >>> you're trying to kill the language). 
>>> > >>> > Josh >>> > >>> > On Mon, Oct 15, 2012 at 3:09 PM, Remi Forax >> > wrote: >>> > I've just read the presentation of Stuart Marks at JavaOne [1], >>> > all examples after slide 32, the first one that use lambdas are >>> not written correctly >>> > because all method signatures do not use wildcards. >>> > >>> > Brian, I know that we will not be able to introduce >>> covariance/contravariance >>> > at declaration site for Java 8, so the solution we will deliver >>> will be far from perfect >>> > because nobody understand wildcards. >>> > Is there a way to free Dan and Maurizio enough time to >>> investigate if >>> > covariance/contravariance can be added to Java 9. >>> > >>> > R?mi >>> > [1] >>> >>> https://stuartmarks.wordpress.com/2012/10/07/javaone-2012-jump-starting-lambda-programming/ >>> >>> > >>> > >>> >>> >>> >>> >>> -- >>> Kevin Bourrillion | Java Librarian | Google, Inc. |kevinb at google.com >>> >>> >> -- Andrey Breslav http://jetbrains.com Develop with pleasure! From brian.goetz at oracle.com Sun Oct 28 09:12:13 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Sun, 28 Oct 2012 12:12:13 -0400 Subject: How far to go in replacing inner classes with lambdas? Message-ID: <508D595D.8050104@oracle.com> The IntelliJ EAP builds already have inspections for "replace inner with lambda" and "replace lambda with method reference" (nice job, guys!) In browsing through the inspection results in the JDK, we find ourselves in the same position we did with Coin features like Diamond; many code snippets can be simplified, but not all code snippets that can be simplified should be. 
Here's an example: private final static Executor defaultExecutor = new Executor() { // DirectExecutor using caller thread public void execute(Runnable r) { r.run(); } }; We could simplify this to: private final static Executor defaultExecutor = r -> r.run(); or further to private final static Executor defaultExecutor = Runnable::run; (Executor is a SAM interface with one method, execute(Runnable).) What guidelines can we offer users on when they should and should not replace inner classes with lambdas or method references? Does this last one look a little weird only because the powerful "unbound method reference" construct is still a little unfamiliar? From josh at bloch.us Sun Oct 28 09:53:45 2012 From: josh at bloch.us (Joshua Bloch) Date: Sun, 28 Oct 2012 09:53:45 -0700 Subject: How far to go in replacing inner classes with lambdas? In-Reply-To: <508D595D.8050104@oracle.com> References: <508D595D.8050104@oracle.com> Message-ID: Brian, I would stop at the first refactoring: I'm not convinced that the unbound method reference construct will *ever* look familiar. That said, neither of the refactorings is as easy to understand as the original code declaration. In the first, you lose the fact that the argument to an Executor's sole abstract method is of type Runnable. When you're dealing with two SAM types at once (Executor and Runnable in this case), things can get a bit confusing. 
> > In browsing through the inspection results in the JDK, we find ourselves > in the same position we did with Coin features like Diamond; many code > snippets can be simplified, but not all code snippets that can be > simplified should be. > > Here's an example: > > private final static Executor defaultExecutor = new Executor() { > // DirectExecutor using caller thread > public void execute(Runnable r) { > r.run(); > } > }; > > We could simplify this to: > > private final static Executor defaultExecutor = r -> r.run(); > > or further to > > private final static Executor defaultExecutor = Runnable::run; > > (Executor is a SAM interface with one method, execute(Runnable).) > > > What guidelines can we offer users on when they should and should not > replace inner classes with lambdas or method reference? Does this last one > look a little weird only because the powerful "unbound method reference" > construct is still a little unfamiliar? From forax at univ-mlv.fr Sun Oct 28 10:18:59 2012 From: forax at univ-mlv.fr (Remi Forax) Date: Sun, 28 Oct 2012 18:18:59 +0100 Subject: How far to go in replacing inner classes with lambdas? In-Reply-To: References: <508D595D.8050104@oracle.com> Message-ID: <508D6903.6070907@univ-mlv.fr> On 10/28/2012 05:53 PM, Joshua Bloch wrote: > Brian, > > I would stop at the first refactoring: I'm not convinced that the > unbound method reference construct will /ever/ look familiar. That > said, neither of the refactorings is as easy to understand as the > original code declaration. In the first, you lose the fact that the > argument to an Executor's sole abstract method is of type Runnable. > When you're dealing with two SAM types at once (Executor and Runnable > in this case), things can get a bit confusing. 
The second refactoring > does make the argument type (Runnable) explicit, but it's just > unreadable. > > A casualty in both refactorings is the comment: > > // DirectExecutor using caller thread > > It's invaluable in keeping the code readable, and must be maintained. > > Josh > > P.S. While you're cleaning up the code, "final static" should be > "static final." Brian, you can go a step further, because both r -> r.run() and Runnable::run are non-capturing lambdas, so they don't need to be stored in a static final field. So you can replace public NotificationBroadcasterSupport(Executor executor, MBeanNotificationInfo... info) { this.executor = (executor != null) ? executor : defaultExecutor; ... } by public NotificationBroadcasterSupport(Executor executor, MBeanNotificationInfo... info) { // if the executor is null, the runnable is executed by the current thread this.executor = (executor != null) ? executor : r -> r.run(); ... } You can apply the same idea to j.u.functions.Mappers and Predicates. cheers, Rémi > > On Sun, Oct 28, 2012 at 9:12 AM, Brian Goetz > wrote: > > The IntelliJ EAP builds already have inspections for "replace > inner with lambda" and "replace lambda with method reference" > (nice job, guys!) > > In browsing through the inspection results in the JDK, we find > ourselves in the same position we did with Coin features like > Diamond; many code snippets can be simplified, but not all code > snippets that can be simplified should be. > > Here's an example: > > private final static Executor defaultExecutor = new Executor() { > // DirectExecutor using caller thread > public void execute(Runnable r) { > r.run(); > } > }; > > We could simplify this to: > > private final static Executor defaultExecutor = r -> r.run(); > > or further to > > private final static Executor defaultExecutor = Runnable::run; > > (Executor is a SAM interface with one method, execute(Runnable).) 
> > > What guidelines can we offer users on when they should and should > not replace inner classes with lambdas or method reference? Does > this last one look a little weird only because the powerful > "unbound method reference" construct is still a little unfamiliar? > > From sam at sampullara.com Sun Oct 28 10:26:50 2012 From: sam at sampullara.com (Sam Pullara) Date: Sun, 28 Oct 2012 10:26:50 -0700 Subject: How far to go in replacing inner classes with lambdas? In-Reply-To: <508D6903.6070907@univ-mlv.fr> References: <508D595D.8050104@oracle.com> <508D6903.6070907@univ-mlv.fr> Message-ID: <19C1906C-D316-4DD8-9C01-6B6AB8AB6DAE@sampullara.com> On Oct 28, 2012, at 10:18 AM, Remi Forax wrote: > On 10/28/2012 05:53 PM, Joshua Bloch wrote: >> I would stop at the first refactoring: I'm not convinced that the unbound method reference construct will /ever/ look familiar. That said, neither of the refactorings is as easy to understand as the original code declaration. In the first, you lose the fact that the argument to an Executor's sole abstract method is of type Runnable. When you're dealing with two SAM types at once (Executor and Runnable in this case), things can get a bit confusing. The second refactoring does make the argument type (Runnable) explicit, but it's just unreadable. I agree with the unbound method reference being very hard to read; not sure how long it will take me to get used to them. I'm ok with the Runnable type being lost, mostly because it isn't that interesting in this case since you don't do anything with it. >> >> A casualty in both refactorings is the comment: >> >> // DirectExecutor using caller thread Agree. >> >> It's invaluable in keeping the code readable, and must be maintained. >> >> Josh >> >> P.S. While you're cleaning up the code, "final static" should be "static final." 
> > Brian, > you can go a step further because either r -> r.run() or Runnable::run are non capturing lambdas > thus they doesn't need to be stored in a static final field. > So you can replace > > public NotificationBroadcasterSupport(Executor executor, MBeanNotificationInfo... info) { > this.executor = (executor != null) ? executor : defaultExecutor; > ... > } > > by > > public NotificationBroadcasterSupport(Executor executor, MBeanNotificationInfo... info) { > // if the executor is null, the runnable is executed by the current thread > this.executor = (executor != null) ? executor : r -> r.run(); > ... > } This is pretty nice that we no longer have to have separate fields for stuff like this. Also, putting the lambda in context rather than in a field, I think makes it much more clear what is going on. Almost don't need the comment with this code. Sam > > You can apply the same idea to j.u.functions.Mappers and Predicates. > > cheers, > R?mi > >> >> On Sun, Oct 28, 2012 at 9:12 AM, Brian Goetz > wrote: >> >> The IntelliJ EAP builds already have inspections for "replace >> inner with lambda" and "replace lambda with method reference" >> (nice job, guys!) >> >> In browsing through the inspection results in the JDK, we find >> ourselves in the same position we did with Coin features like >> Diamond; many code snippets can be simplified, but not all code >> snippets that can be simplified should be. >> >> Here's an example: >> >> private final static Executor defaultExecutor = new Executor() { >> // DirectExecutor using caller thread >> public void execute(Runnable r) { >> r.run(); >> } >> }; >> >> We could simplify this to: >> >> private final static Executor defaultExecutor = r -> r.run(); >> >> or further to >> >> private final static Executor defaultExecutor = Runnable::run; >> >> (Executor is a SAM interface with one method, execute(Runnable).) 
>> >> What guidelines can we offer users on when they should and should >> not replace inner classes with lambdas or method reference? Does >> this last one look a little weird only because the powerful >> "unbound method reference" construct is still a little unfamiliar? >> >> > From joe.bowbeer at gmail.com Sun Oct 28 10:44:03 2012 From: joe.bowbeer at gmail.com (Joe Bowbeer) Date: Sun, 28 Oct 2012 10:44:03 -0700 Subject: How far to go in replacing inner classes with lambdas? In-Reply-To: <508D6903.6070907@univ-mlv.fr> References: <508D595D.8050104@oracle.com> <508D6903.6070907@univ-mlv.fr> Message-ID: The r -> r.run() refactoring looks OK to me provided it is not transforming a static nested class into an inner class. If it is performing this transformation, then I will need to inspect the use of this lambda and convince myself that it is not retaining or leaking memory. We're all aware of the SIC warning from FindBugs: http://findbugs.sourceforge.net/bugDescriptions.html#SIC_INNER_SHOULD_BE_STATIC which in my experience is the leading cause of memory leaks. Joe On Sun, Oct 28, 2012 at 10:18 AM, Remi Forax wrote: > Brian, > you can go a step further because either r -> r.run() or Runnable::run are > non capturing lambdas > thus they don't need to be stored in a static final field. > So you can replace > > public NotificationBroadcasterSupport(Executor executor, > MBeanNotificationInfo... info) { > this.executor = (executor != null) ? executor : defaultExecutor; > ... > } > > by > > public NotificationBroadcasterSupport(Executor executor, > MBeanNotificationInfo... info) { > // if the executor is null, the runnable is executed by the current > thread > this.executor = (executor != null) ? executor : r -> r.run(); > ... > } > > You can apply the same idea to j.u.functions.Mappers and Predicates. > > cheers, > Rémi > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.openjdk.java.net/pipermail/lambda-libs-spec-experts/attachments/20121028/bba9a45d/attachment.html From brian.goetz at oracle.com Sun Oct 28 10:47:57 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Sun, 28 Oct 2012 13:47:57 -0400 Subject: How far to go in replacing inner classes with lambdas? In-Reply-To: References: <508D595D.8050104@oracle.com> <508D6903.6070907@univ-mlv.fr> Message-ID: <508D6FCD.1080906@oracle.com> > The r -> r.run() refactoring looks OK to me provided it is not > transforming a static nested class into an inner class. Actually, the compiler doesn't generate an extra class at all. It generates a method for the body, some metadata in the constant pool describing the target type (e.g., Executor), and an invokedynamic to create the lambda object. With the current (naive) runtime translation strategy, the generated class is a static one, and further, for stateless (non-capturing) lambdas, all evaluations of the same lambda expression are translated to the same cached object -- with that cached object being lazily initialized on first capture. So, if you say Executor e = r -> r.run(); at runtime, there will only be one instance of that lambda, no matter how many times the above line of code is executed. From brian.goetz at oracle.com Sun Oct 28 11:00:28 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Sun, 28 Oct 2012 14:00:28 -0400 Subject: How far to go in replacing inner classes with lambdas? In-Reply-To: References: <508D595D.8050104@oracle.com> Message-ID: <508D72BC.2010806@oracle.com> > I would stop at the first refactoring Indeed, I think that's a pretty sensible place to stop *in this case* (at least in 2012; how we feel about this code in 2017 might well be different after we've been using these features for a few years.) 
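Brian's description of the (current, naive) translation strategy above can be observed directly. A small sketch, with the caveat that the single-instance behavior is an implementation detail of this runtime strategy, not a language guarantee -- code must never rely on lambda identity:

```java
import java.util.concurrent.Executor;

public class LambdaCachingDemo {
    // Each call evaluates the same non-capturing lambda expression; with the
    // translation strategy described above, the invokedynamic call site hands
    // back one lazily initialized, cached instance. This is current runtime
    // behavior, not a specification guarantee.
    static Executor direct() {
        return r -> r.run();
    }

    public static void main(String[] args) {
        Executor a = direct();
        Executor b = direct();
        // Typically true under the caching strategy described above.
        System.out.println("same instance: " + (a == b));
        a.execute(() -> System.out.println("ran on caller thread"));
    }
}
```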
The reason I bring this up is that there are not yet well-understood guidelines about when and how best to use these new features, but the earlier we can come up with proto-EJ-like material, the better chance we have of encouraging people to use these features right. There are a lot of cases where you CAN but it's not clear you SHOULD. Many programmers will erroneously think that smaller code is necessarily better, and, especially when the IDE is goading them on to do so, may take it too far. On the other hand, unbound method refs are sometimes exactly what the doctor ordered. But, this doesn't really answer my question yet. In this example, we had a pretty clear intuition about what the right degree was. But, different cases will be different, and there are plenty of cases where the method ref is arguably *more* clear than the lambda. So, what I was trying to get at was: what guidelines can we come up with for when we recommend one vs the other? > I'm not convinced that the > unbound method reference construct will /ever /look familiar. I think in cases where it is obvious what the semantics of the target type are, it will become familiar quite quickly: Comparator c = Comparators.comparing(Person::getLastName); or shapes.stream().filter(Shape::isBlue).into(blueShapes); I don't think anyone who understands what filter() or comparing() does (filter is obvious; comparing takes a function which extracts a Comparable key from a type T, and returns a Comparator) will have any trouble getting used to this construct. 
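For readers trying these idioms against the API as it eventually shipped, the factory ended up as Comparator.comparing rather than Comparators.comparing. A runnable sketch with a hypothetical Person class, showing the method-reference and lambda forms side by side:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ComparingDemo {
    // Hypothetical Person class, for illustration only.
    static class Person {
        final String lastName;
        Person(String lastName) { this.lastName = lastName; }
        String getLastName() { return lastName; }
    }

    public static void main(String[] args) {
        List<Person> people = new ArrayList<>(Arrays.asList(
                new Person("Goetz"), new Person("Bloch"), new Person("Lea")));

        // Method-reference form: reads as "compare by last name".
        people.sort(Comparator.comparing(Person::getLastName));

        // Equivalent lambda form -- same meaning, arguably noisier.
        people.sort(Comparator.comparing(p -> p.getLastName()));

        List<String> names = new ArrayList<>();
        for (Person p : people) names.add(p.getLastName());
        System.out.println(String.join(",", names));
    }
}
```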
What is most confusing about this example is the higher-order-function-ness of comparing(), but once you get past that, it's pretty clear, and I would argue that both of these examples are more readable than the lambda alternatives: Comparator c = Comparators.comparing(p -> p.getLastName()); shapes.stream().filter(s -> s.isBlue()).into(blueShapes); I think what makes these different from the Executor example is that it is not completely obvious from context what an Executor does, so you have to go and look at the definition of Executor and reason through what is happening, which is less intuitive. I don't expect anyone to have a candidate rule off the top of their head. But, I think it would be extremely useful if we could offer some guidelines on when different approaches are warranted. > That said, > neither of the refactorings is as easy to understand as the original > code declaration. In the first, you lose the fact that the argument to > an Executor's sole abstract method is of type Runnable. When you're > dealing with two SAM types at once (Executor and Runnable in this case), > things can get a bit confusing. The second refactoring does make the > argument type (Runnable) explicit, but it's just unreadable. > > A casualty in both refactorings is the comment: > > // DirectExecutor using caller thread Yes, I had actually intended to retain the comment in the refactored version, but got distracted while writing my e-mail. > It's invaluable in keeping the code readable, and must be maintained. > > Josh > > P.S. While you're cleaning up the code, "final static" should be "static > final." > > On Sun, Oct 28, 2012 at 9:12 AM, Brian Goetz > wrote: > > The IntelliJ EAP builds already have inspections for "replace inner > with lambda" and "replace lambda with method reference" (nice job, > guys!) 
> > In browsing through the inspection results in the JDK, we find > ourselves in the same position we did with Coin features like > Diamond; many code snippets can be simplified, but not all code > snippets that can be simplified should be. > > Here's an example: > > private final static Executor defaultExecutor = new Executor() { > // DirectExecutor using caller thread > public void execute(Runnable r) { > r.run(); > } > }; > > We could simplify this to: > > private final static Executor defaultExecutor = r -> r.run(); > > or further to > > private final static Executor defaultExecutor = Runnable::run; > > (Executor is a SAM interface with one method, execute(Runnable).) > > > What guidelines can we offer users on when they should and should > not replace inner classes with lambdas or method reference? Does > this last one look a little weird only because the powerful "unbound > method reference" construct is still a little unfamiliar? > > From paul.sandoz at oracle.com Mon Oct 29 03:04:53 2012 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Mon, 29 Oct 2012 11:04:53 +0100 Subject: Encounter order In-Reply-To: <50896216.9080808@oracle.com> References: <5086177D.8050706@oracle.com> <50896216.9080808@oracle.com> Message-ID: <5A500F23-B149-4F69-BF2E-E0091CCECBE2@oracle.com> On Oct 25, 2012, at 6:00 PM, Brian Goetz wrote: > > Given this, my recommendation is: > - Have all the problematic (sort, uniqueElements, groupBy, reduceBy) ops respect encounter order even though it may be expensive > - Provide an .unordered() op for people to opt out of the encounter order when they know they don't care. > - Impute encounter order if we need one and there isn't one (rather than throwing) > +1 Thinking positively we might even be able to find/develop sufficiently optimized implementations over time (even if they are never as fast as non-order-preserving implementations). 
So it may be sufficient right now for uniqueElements() if each leaf has a LinkedHashSet and the merge/combine adds the right hand set to the left hand set, and the latter pushes up to the parent (similar to how groupBy is currently implemented). Then we go searching/thinking about parallel map merge/combine algorithms... Paul. From dl at cs.oswego.edu Tue Oct 30 09:14:26 2012 From: dl at cs.oswego.edu (Doug Lea) Date: Tue, 30 Oct 2012 12:14:26 -0400 Subject: misc catchup Message-ID: <508FFCE2.2010004@cs.oswego.edu> As most of you probably surmised from http://cs.oswego.edu/pipermail/concurrency-interest/2012-October/010182.html implicit use of the ForkJoinPool.commonPool can/should replace the internal ForkJoinUtils in lambda builds. Besides being easier to use, and reducing resource use when people are otherwise tempted to create their own "default" pools, it exploits knowledge of the pool to improve execution times. In particular, fjt.invoke() will often be faster than FJP.invoke(fjt); more so when people use some new idioms (including those I've thought of, and probably more). As usual, I've been using ConcurrentHashMap to explore maximal customized bulk op performance (using extremely specialized tasks). Using it with commonPool leads to a nicer API and better performance. While I'm at it, a quick digression on all the Encounter Order posts: I confess to not understanding "encounter order". I understand bulk op semantics on random-access data structures -- arrays and associative arrays (i.e., ConcurrentHashMaps), which is why I focus on them. Everything else seems to be a matter of consensus interpretation. I understand that everyone wants parallel ops to work on structures besides arrays and hash maps, and I am happy to live with Brian's attempts to make as much sense of them as possible and have the least sucky performance as possible. But both of these concerns will always be relative. 
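Paul's per-leaf LinkedHashSet idea above can be sketched concretely -- the helper names here are hypothetical, and a real stream implementation would differ in detail, but the merge step shows how encounter order survives combining:

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

public class OrderedDistinctSketch {
    // Each leaf of a split source collects into a LinkedHashSet,
    // which removes duplicates while preserving encounter order.
    static <T> LinkedHashSet<T> leaf(List<T> chunk) {
        return new LinkedHashSet<>(chunk);
    }

    // Combining appends the right-hand set after the left-hand one;
    // elements already present in 'left' keep their earlier position.
    static <T> LinkedHashSet<T> combine(LinkedHashSet<T> left,
                                        LinkedHashSet<T> right) {
        left.addAll(right);
        return left;
    }

    public static void main(String[] args) {
        // Simulate a source split into two leaves: [3,1,3] ++ [2,1,4]
        LinkedHashSet<Integer> result =
            combine(leaf(Arrays.asList(3, 1, 3)), leaf(Arrays.asList(2, 1, 4)));
        System.out.println(result);
    }
}
```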
(And one reason not to complain too much is that the Scala folks did approximately the same things and don't seem to regret it yet :-) -Doug From brian.goetz at oracle.com Tue Oct 30 10:49:24 2012 From: brian.goetz at oracle.com (Brian Goetz) Date: Tue, 30 Oct 2012 13:49:24 -0400 Subject: misc catchup In-Reply-To: <508FFCE2.2010004@cs.oswego.edu> References: <508FFCE2.2010004@cs.oswego.edu> Message-ID: <0D7976DF-F7FB-4F4B-B896-E38164176F7B@oracle.com> > As most of you probably probably surmised from > http://cs.oswego.edu/pipermail/concurrency-interest/2012-October/010182.html > implicit use of the ForkJoinPool.commonPool can/should replace > the internal ForkJoinUtils in lambda builds. Besides being easier > to use, and reducing resource use when people are otherwise > tempted to create their own "default" pools, it exploits knowledge > of the pool to improve execution times, In particular, fjt.invoke() > will often be faster than FJP.invoke(fjt); moreso when people use > some new idioms (including those I've thought of and probably more.) As soon as we have this merged, we'll migrate our implementation. > As usual, I've been using ConcurrentHashMap to explore maximal > customized bulk op performance (using extremely specialized > tasks). Using it with commonPool leads to a nicer API and > better performance. Hopefully ditto for streams. 
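Doug's fjt.invoke() idiom above can be sketched as follows. The ForkJoinTask methods used are real API; the task itself is a hypothetical divide-and-conquer example, and the sketch demonstrates the idiom, not the performance claim:

```java
import java.util.concurrent.RecursiveTask;

public class CommonPoolDemo {
    // Invoking a task directly (fjt.invoke()) from a non-pool thread lets
    // its forked subtasks run in ForkJoinPool.commonPool(), with no need
    // to construct a "default" pool of one's own.
    static class SumTask extends RecursiveTask<Long> {
        final long lo, hi;
        SumTask(long lo, long hi) { this.lo = lo; this.hi = hi; }

        @Override protected Long compute() {
            if (hi - lo <= 1_000) {          // small enough: compute directly
                long s = 0;
                for (long i = lo; i < hi; i++) s += i;
                return s;
            }
            long mid = (lo + hi) >>> 1;
            SumTask left = new SumTask(lo, mid);
            left.fork();                     // left half runs asynchronously
            return new SumTask(mid, hi).compute() + left.join();
        }
    }

    public static void main(String[] args) {
        long sum = new SumTask(0, 100_000).invoke();  // fjt.invoke() idiom
        System.out.println(sum);
    }
}
```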
From aleksey.shipilev at oracle.com Tue Oct 30 12:22:06 2012 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Tue, 30 Oct 2012 23:22:06 +0400 Subject: misc catchup In-Reply-To: <0D7976DF-F7FB-4F4B-B896-E38164176F7B@oracle.com> References: <508FFCE2.2010004@cs.oswego.edu> <0D7976DF-F7FB-4F4B-B896-E38164176F7B@oracle.com> Message-ID: <509028DE.1070605@oracle.com> On 10/30/2012 09:49 PM, Brian Goetz wrote: >> As most of you probably probably surmised from >> http://cs.oswego.edu/pipermail/concurrency-interest/2012-October/010182.html >> implicit use of the ForkJoinPool.commonPool can/should replace >> the internal ForkJoinUtils in lambda builds. Besides being easier >> to use, and reducing resource use when people are otherwise >> tempted to create their own "default" pools, it exploits knowledge >> of the pool to improve execution times, In particular, fjt.invoke() >> will often be faster than FJP.invoke(fjt); moreso when people use >> some new idioms (including those I've thought of and probably more.) > > As soon as we have this merged, we'll migrate our implementation. It's actually there in my patch on lambda-dev at . I had dumped the parts already implemented in FJP from ForkJoinUtils, as well as translate our usages of FJP.invoke(fjt) to fjt.invoke(), plus translating tryCompleter -> helpCompleter. You might want to review that patch more thoroughly, but the wakeup performance is clearly better. -Aleksey. From david.holmes at oracle.com Wed Oct 31 00:05:18 2012 From: david.holmes at oracle.com (David Holmes) Date: Wed, 31 Oct 2012 17:05:18 +1000 Subject: misc catchup In-Reply-To: <508FFCE2.2010004@cs.oswego.edu> References: <508FFCE2.2010004@cs.oswego.edu> Message-ID: <5090CDAE.5040306@oracle.com> Hi Doug, I didn't know this was being folded back into FJP. We'll have to update the JEP. A couple of comments: 1. The new spin-lock stuff using Unsafe directly isn't sending a very good message. 
We're basically saying ReentrantLock is not performant and we have to dive under the covers to get performance. :( That may be the truth but it is still not a good message. 2. I don't recall if we definitely established that a thread can always interrupt itself, but if not I can't quite convince myself that if tryAwaitMainLock throws SecurityException that everything continues to work ok - deregisterWorker seems problematic as the WorkQueue state has already been mutated. Cheers, David On 31/10/2012 2:14 AM, Doug Lea wrote: > > As most of you probably probably surmised from > http://cs.oswego.edu/pipermail/concurrency-interest/2012-October/010182.html > > implicit use of the ForkJoinPool.commonPool can/should replace > the internal ForkJoinUtils in lambda builds. Besides being easier > to use, and reducing resource use when people are otherwise > tempted to create their own "default" pools, it exploits knowledge > of the pool to improve execution times, In particular, fjt.invoke() > will often be faster than FJP.invoke(fjt); moreso when people use > some new idioms (including those I've thought of and probably more.) > > As usual, I've been using ConcurrentHashMap to explore maximal > customized bulk op performance (using extremely specialized > tasks). Using it with commonPool leads to a nicer API and > better performance. > > While I'm at it, a quick digression on all the Encounter Order > posts: I confess to not understanding "encounter order". > I understand bulk op semantics on random-access data structures -- > arrays and associative arrays (i.e., ConcurrentHashMaps), which > is why I focus on them. > Everything else seems to be a matter of consensus interpretation. > I understand that everyone wants parallel ops to work on structures > besides arrays and hash maps, and I am happy to live with Brian's > attempts to make as much sense of them as possible and have the least > sucky performance as possible. But both of these concerns will always > be relative. 
(And one reason not to complain too much is that the > Scala folks did approximately the same things and don't seem to > regret it yet :-) > > -Doug > > > > > > From dl at cs.oswego.edu Wed Oct 31 05:16:31 2012 From: dl at cs.oswego.edu (Doug Lea) Date: Wed, 31 Oct 2012 08:16:31 -0400 Subject: misc catchup In-Reply-To: <5090CDAE.5040306@oracle.com> References: <508FFCE2.2010004@cs.oswego.edu> <5090CDAE.5040306@oracle.com> Message-ID: <5091169F.1030908@cs.oswego.edu> On 10/31/12 03:05, David Holmes wrote: > Hi Doug, > > I didn't know this was being folded back into FJP. We'll have to update the JEP. > Sorry not to give more people advance warning, but I wanted to get most of this out before announcing, to internally sanity-check that it wouldn't hit some big snag. I expect to find ways to further improve now that the basic structure is stabilizing. > 1. The new spin-lock stuff using Unsafe directly isn't sending a very good > message. We're basically saying ReentrantLock is not performant and we have to > dive under the covers to get performance. :( That may be the truth but it is > still not a good message. No. The reason is entirely to reduce initial commonPool footprint -- there is no secondary allocation (just setup of 12 fields), not even for a Lock object needed to further initialize. This might shrink even further. (It takes some unfortunate luck to ever get that lock to block. Most runs never do. So using the CHM 2bit scheme works fine even though the construction is terrible if heavily contended.) (BTW, mostly unrelatedly, AQS improvements based on some of the new adaptive spin stuff in StampedLock are on the todo list.) > > 2. I don't recall if we definitely established that a thread can always > interrupt itself, but if not I can't quite convince myself that if We did, but ... > tryAwaitMainLock throws SecurityException that everything continues to work ok - > deregisterWorker seems problematic as the WorkQueue state has already been mutated. > ... 
for sake of paranoia, I added the catch (SecurityException ignore) -Doug From mike.duigou at oracle.com Wed Oct 31 13:16:25 2012 From: mike.duigou at oracle.com (Mike Duigou) Date: Wed, 31 Oct 2012 13:16:25 -0700 Subject: Review Request: CR#8001634 : Initial set of lambda functional interfaces Message-ID: There's a large set of library changes that will be coming with Lambda. We're getting near the end of the runway and there's lots left to do, so we want to start the process of getting some of the more stable pieces put back to the JDK8 repositories. We've spent some time slicing things into manageable chunks. This is the first bunch. We'd like to time-box this review at one week (until Nov. 7th), since there are many more pieces to follow. The first chunk is the basic set of functional interface types. While this set is not complete, it is enough to be able to proceed on some other pieces. This set contains no extension methods (we'll do those separately) and does not contain all the specializations we may eventually need. Doug has also suggested we have some sort of regularized, low-level naming scheme. There's nothing in this bunch that is inconsistent with that; if we had such a thing, the nominal SAMs here could easily implement the horribly named low-level versions. We're still thinking about how that might fit in, so while that's not directly reflected here, it hasn't been forgotten. The specification is limited; most of the interesting restrictions (side-effect-freedom, idempotency, stability) would really be imposed not by the SAM itself but by how the SAM is used in a calculation. However, some common doc for "how to write good SAMs" that we can stick in the package doc would be helpful. Suggestions welcome. Elements of this naming scheme include: - Each SAM type has a unique (arity, method name) pair. This allows SAMs to implement other SAMs without collision. 
- The argument lists are structured so that specializations act on the first argument(s), so IntMapper is a specialization of Mapper, and IntBinaryOperator is a specialization of BinaryOperator. - Multi-arg versions use prefix BiXxx, TriXxx, as suggested by Doug. However, the "natural" arity varies. No good two or three letter prefix for zero or one comes to mind (e.g., UnaryFactory or NilaryBlock, though the latter is the same as Runnable). So that could be improved. Please review and comment. http://cr.openjdk.java.net/~mduigou/8001634/2/webrev/ From david.holmes at oracle.com Wed Oct 31 17:45:19 2012 From: david.holmes at oracle.com (David Holmes) Date: Thu, 01 Nov 2012 10:45:19 +1000 Subject: misc catchup In-Reply-To: <5091169F.1030908@cs.oswego.edu> References: <508FFCE2.2010004@cs.oswego.edu> <5090CDAE.5040306@oracle.com> <5091169F.1030908@cs.oswego.edu> Message-ID: <5091C61F.4000406@oracle.com> On 31/10/2012 10:16 PM, Doug Lea wrote: > On 10/31/12 03:05, David Holmes wrote: >> Hi Doug, >> >> I didn't know this was being folded back into FJP. We'll have to >> update the JEP. >> > > Sorry not to give more people advance warning, but I wanted to > get most of this out before announcing, to internally sanity-check > that it wouldn't hit some big snag. I expect to find ways to further > improve now that the basic structure is stabilizing. > >> 1. The new spin-lock stuff using Unsafe directly isn't sending a very >> good >> message. We're basically saying ReentrantLock is not performant and we >> have to >> dive under the covers to get performance. :( That may be the truth but >> it is >> still not a good message. > > No. The reason is entirely to reduce initial commonPool footprint -- > there is no secondary allocation (just setup of 12 fields), not > even for a Lock object needed to further initialize. This might > shrink even further. Okay - so that raised a point I forgot to raise - this common pool is always created, not just on demand. 
I understand the expectation is that anyone loading FJP will in all likelihood be using the common pool. But my concern is that pool initialization is part of the class static initialization, so if anything goes wrong the class will be unusable. My main concern is potential SecurityExceptions from the getProperty calls. Am I being paranoid? :) Thanks, David ----- > (It takes some unfortunate luck to ever get that lock to block. > Most runs never do. So using the CHM 2bit scheme works fine > even though the construction is terrible if heavily contended.) > > (BTW, mostly unrelatedly, AQS improvements based on some of the > new adaptive spin stuff in StampedLock are on the todo list.) > >> >> 2. I don't recall if we definitely established that a thread can always >> interrupt itself, but if not I can't quite convince myself that if > > We did, but ... > >> tryAwaitMainLock throws SecurityException that everything continues to >> work ok - >> deregisterWorker seems problematic as the WorkQueue state has already >> been mutated. >> > > ... for sake of paranoia, I added the > catch (SecurityException ignore) > > -Doug > > From maurizio.cimadamore at oracle.com Mon Oct 29 02:54:48 2012 From: maurizio.cimadamore at oracle.com (Maurizio Cimadamore) Date: Mon, 29 Oct 2012 09:54:48 +0000 Subject: How far to go in replacing inner classes with lambdas? In-Reply-To: <508D595D.8050104@oracle.com> References: <508D595D.8050104@oracle.com> Message-ID: <508E5268.40707@oracle.com> On 28/10/12 16:12, Brian Goetz wrote: > The IntelliJ EAP builds already have inspections for "replace inner > with lambda" and "replace lambda with method reference" (nice job, guys!) Note that javac has been supporting a similar suggestion from its early days: it detects inner classes that are lambdifiable when -XDidentifyLambdaCandidates is enabled. 
I think at some point I even did an experiment to see how many instances I could find in the JDK - some snippets are reported below: *) Total SAM types in JDK: 807 *) Total potential lambda sites in JDK: 1862 [It is possible that the above numbers are a bit different now, because of changes in the nature of the functional interface definition and changes to the JDK codebase - however they should give a rough idea of how many potential lambda sites we have in our code]. Maurizio > > In browsing through the inspection results in the JDK, we find > ourselves in the same position we did with Coin features like Diamond; > many code snippets can be simplified, but not all code snippets that > can be simplified should be. > > Here's an example: > > private final static Executor defaultExecutor = new Executor() { > // DirectExecutor using caller thread > public void execute(Runnable r) { > r.run(); > } > }; > > We could simplify this to: > > private final static Executor defaultExecutor = r -> r.run(); > > or further to > > private final static Executor defaultExecutor = Runnable::run; > > (Executor is a SAM interface with one method, execute(Runnable).) > > > What guidelines can we offer users on when they should and should not > replace inner classes with lambdas or method reference? Does this > last one look a little weird only because the powerful "unbound method > reference" construct is still a little unfamiliar?
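Mike's naming-scheme rules from the review request above can be illustrated with a small sketch. The interface names below are the provisional ones under discussion, not the java.util.function names (Function, IntFunction, BiFunction) that eventually shipped:

```java
public class SamNamingDemo {
    // Provisional names from the review; each has a distinct
    // (arity, method name) pair, so SAMs can implement other SAMs
    // without signature collisions.
    interface Mapper<T, R>      { R map(T t); }        // generic, arity 1
    interface IntMapper<R>      { R map(int i); }      // specializes the first argument
    interface BiMapper<T, U, R> { R map(T t, U u); }   // BiXxx prefix for arity 2

    public static void main(String[] args) {
        IntMapper<String> hex = i -> Integer.toHexString(i);
        BiMapper<Integer, Integer, Integer> sum = (a, b) -> a + b;
        System.out.println(hex.map(255));
        System.out.println(sum.map(20, 22));
    }
}
```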